In the good old days, working as a Machine Learning Engineer meant spending 95% of the time on feature engineering and 5% on training models with the extracted features. This was a manually intensive and time-consuming process that usually led to inflexible proofs of concept that could hardly be adapted to new settings. Fortunately, Deep Learning came to the rescue, and now, using tons of data and simple data preprocessing, it’s possible to train algorithms that return better predictions than the classical approaches.

To see why this was a game-changer, consider the following example. Imagine you have to create a model that identifies the species of a plant based on a picture of a leaf. With the older methodology, the list of computer vision tasks needed to extract relevant features seems endless: removing the background, applying filters for edge detection and textural features, computing color features, computing the leaf cutout shape, detecting, masking, and extracting features from leaf veins, etc. After this pre-processing, the features would be ready to feed your classifier. Getting the right features and an accurate classifier would take weeks or even months of intensive work.

Nowadays, to get the same or even better results, you only need to use a pre-trained Convolutional Neural Network and fine-tune it with your images and labels. This process can be done in a single afternoon.

Deep Learning brought us a new technological age, indeed. With a small effort, we can build models that, on some tasks, perform better than humans, and the keyword to do it is data. But what if we only have a couple of thousand data points? What if our data is not a fair representation of the population? What if the human knowledge about the business is bigger than our data? Then, it’s time to Embed Domain Knowledge in our Deep Neural Networks (DNN)!

“With Embedding Domain Knowledge, we find a midpoint where we understand the business, we understand the technology, and we elevate the technology to the business, instead of dumbing down the business to fit the technology.” Kelwin Fernandes


Adding Invariances with Data Augmentation

As we already know, the more data we have, the better Deep Neural Networks (DNNs) perform. This happens because we do not explain to our DNNs how they should learn things; they simply make their inferences from what they see.

For example, if we want to train a model to detect cats, we do not tell the model to look for a small mammal with four legs, whiskers, a long tail, pointed ears, and with or without fur. Instead, we give it pictures of different cats and the model starts learning what a cat looks like. However, if we only show it pictures of black cats or cats sitting on a couch, our model might “think” those features are relevant to detect a cat, and it won’t be able to detect cats when those conditions are not present.

In this example, we want to build a model invariant to color, background, cat’s pose, and any other conditions that are not relevant to the cat detection task, using small amounts of data. One way to do this is using Data Augmentation.

With Data Augmentation, we can use the same image over and over again, by applying random or specific transformations that do not compromise the task. This way, if we want to build a model invariant to certain conditions, we should manipulate those conditions on the original data to create new data points.

Using the cat detection example, to make our model invariant, we could apply different transformations to the images, such as zooming out, distortion, rotation, contrast manipulation, brightness manipulation, and background manipulation (cutting the cat out of its original background, as with a blue screen, and placing it in new contexts). With these transformations, it’s easier for the model to learn what’s relevant and what’s not.

What not to do? Remember, Data Augmentation is a way to give some Domain Knowledge to the model, so be smart when applying it. Do not apply random transformations just to increase the number of data points. Instead, make sure the transformations you’re applying make sense inside your domain. For example, when you zoom in on your images, do not zoom in so much that you remove your cat from the picture. Likewise, do not zoom out so much that you end up with only a couple of pixels representing your cat.
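To make the idea more concrete, here is a minimal sketch of bounded, domain-aware augmentation written with NumPy only. The function name `augment`, its parameters, and the chosen ranges are our own assumptions for illustration; in a real project you would probably rely on the augmentation utilities of your Deep Learning framework and tune the ranges to your domain.

```python
import numpy as np

def augment(image, rng, max_brightness=0.2, contrast_range=(0.8, 1.2),
            zoom_range=(0.8, 1.0)):
    """Return a randomly transformed copy of an H x W x C image with values in [0, 1].

    All ranges are illustrative assumptions; keep them small enough that
    the object of interest stays recognisable.
    """
    h, w = image.shape[:2]
    out = image.astype(np.float64)  # astype returns a copy, the input is untouched

    # Random horizontal flip: a mirrored cat is still a cat.
    if rng.random() < 0.5:
        out = out[:, ::-1]

    # Bounded zoom-in: crop a random window covering 80-100% of each side,
    # then resize it back to H x W with nearest-neighbour sampling.
    scale = rng.uniform(*zoom_range)
    ch, cw = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = out[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    out = crop[rows][:, cols]

    # Random brightness shift and contrast scaling around the image mean.
    out = out + rng.uniform(-max_brightness, max_brightness)
    mean = out.mean()
    out = (out - mean) * rng.uniform(*contrast_range) + mean

    return np.clip(out, 0.0, 1.0)
```

Calling `augment(image, np.random.default_rng())` several times on the same picture yields different training samples, while the bounded `zoom_range` keeps the cat inside the frame.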

In addition to Data Augmentation, you can also use Output Augmentation. This approach feeds several augmented versions of the same input to the model and uses the average of their predictions as the output. This way, the final prediction is less sensitive to variations in the input.
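A rough sketch of this averaging step could look like the code below. The helper `predict_with_output_augmentation` and the toy model `toy_predict` are hypothetical names we introduce only for illustration; `predict` stands for any trained model that maps an image to a prediction.

```python
import numpy as np

def predict_with_output_augmentation(predict, image, transforms):
    """Average the predictions of `predict` over transformed copies of `image`."""
    predictions = [predict(transform(image)) for transform in transforms]
    return np.mean(predictions, axis=0)

def toy_predict(image):
    # Hypothetical stand-in for a trained model: its score depends only on
    # the left half of the image, so on its own it is NOT flip-invariant.
    return np.array([image[:, : image.shape[1] // 2].mean()])

# Transformations that should not change the answer: identity and horizontal flip.
transforms = [lambda im: im, lambda im: im[:, ::-1]]
```

With this set of transformations, an image and its mirrored version receive the same averaged prediction, even though `toy_predict` alone would score them differently.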


Adding Invariances with Tailored DNN

Another way to add invariances to DNN models is manipulating the Neural Network itself, which can be done using different approaches. In this post, we give you 3 examples of DNN tailoring by manipulation of the loss function, kernels, and architectures.

Loss Function Manipulation

The loss function is used to penalize the model when it deviates from the target. In this case, our goal is to get accurate and invariant predictions. Therefore, we can use the loss function in our favor by penalizing variances. For example, adding a term of the form (f(x) - f(ax + b))^2 to the loss, where f is the model, x is the feature value, and a and b define a linear transformation of it, pushes the model toward being invariant to such transformations.
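As a minimal sketch, the penalty can be added to a regular task loss as shown below. The mean squared error task loss, the `weight` parameter, and the two toy models are our assumptions for illustration; in practice, a and b would typically be sampled at random for each batch.

```python
import numpy as np

def invariance_penalty(f, x, a, b):
    """Mean of (f(x) - f(a*x + b))^2 over the batch."""
    return np.mean((f(x) - f(a * x + b)) ** 2)

def total_loss(f, x, y, a, b, weight=1.0):
    """Task loss (mean squared error, an assumption) plus the weighted penalty."""
    task_loss = np.mean((f(x) - y) ** 2)
    return task_loss + weight * invariance_penalty(f, x, a, b)

# Two hypothetical models that map a batch of feature vectors to one score each.
def raw_model(x):
    # Sums the raw features, so scaling or shifting x changes the output.
    return x.sum(axis=1)

def standardized_model(x):
    # Standardizes each sample first, which cancels a*x + b for any a > 0.
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    return z[:, 0]
```

For a positive `a`, the penalty is essentially zero for `standardized_model` and clearly positive for `raw_model`, so minimizing `total_loss` favors models that behave like the former.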

Kernels Manipulation

In the previous section, we saw we can add invariances to illumination and contrast with Data Augmentation. Another way to add these invariances is by applying local normalization. We can do this in preprocessing, or by setting some CNN kernels to perform local normalization and freezing those layers during training.
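One possible form of local normalization, sketched here as a preprocessing step, subtracts the local mean and divides by the local standard deviation in a small window around each pixel. The function name, the window size, and the reflect padding are our assumptions. Note that the local mean is equivalent to a convolution with a fixed uniform kernel, which is the kind of kernel you could hard-code and freeze inside a CNN.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_normalize(image, size=5, eps=1e-6):
    """Normalize a 2D grayscale image by its local mean and standard deviation.

    `size` must be odd; each pixel is normalized using the size x size
    window centred on it (borders are handled with reflect padding).
    """
    image = np.asarray(image, dtype=np.float64)
    pad = size // 2
    padded = np.pad(image, pad, mode="reflect")
    # windows has shape (H, W, size, size): one window per pixel.
    windows = sliding_window_view(padded, (size, size))
    local_mean = windows.mean(axis=(-2, -1))
    local_std = windows.std(axis=(-2, -1))
    return (image - local_mean) / (local_std + eps)
```

Up to the small `eps` term, the output is unchanged when the input brightness is shifted or its contrast is scaled by a positive factor.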

Tailored Architectures

One invariance that is hard to add to CNNs is related to pose. A 2D representation of a 3D object is always associated with a pose, i.e. a translation and a rotation. This concept is not new for the human brain, which is able to deconstruct a 2D representation and match it with objects in the real world (3D). In fact, our brain is so used to doing Inverse Graphics that it even gets tricked by optical illusions.

In contrast, CNNs are not able to make these spatial associations, since part of that information gets lost with max pooling. To overcome this problem, Geoffrey Hinton and his collaborators proposed a novel architecture where neurons are replaced by capsules, creating the concept of Capsule Networks.

In a nutshell, capsules perform complex internal computations and encapsulate the output in a vector form. This vector has a direction and a length; the length represents the probability that a certain entity is present, and the direction represents the entity’s properties, including pose, lighting, and deformation.
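To illustrate how a vector length can act as a probability, the sketch below implements the "squashing" non-linearity used in the original Capsule Networks paper, as we understand it. It keeps the direction of a capsule’s raw output and compresses its length into the interval [0, 1). The `eps` term is our addition for numerical safety, and this is only one building block, not a full Capsule Network.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vectors to a length in [0, 1) while keeping their direction.

    Short vectors end up with a length close to 0 (entity probably absent),
    long vectors with a length close to 1 (entity probably present).
    """
    squared_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    norm = np.sqrt(squared_norm + eps)  # eps avoids division by zero
    return (squared_norm / (1.0 + squared_norm)) * (s / norm)
```

For instance, a raw output of length 5 is squashed to a length of about 0.96, while a raw output of length 0.1 ends up with a length of about 0.01.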

Currently, the computational cost of training a Capsule Network is too high for personal computers, but stay tuned and keep looking for architectures that best meet your needs.


Adding Group Invariances

In some cases, you might have variances related to some groups (race, gender, age, country). The association of a data point with one (or more) of these groups might be irrelevant to your model or even unfair. To avoid that, one option is to use an Average Voting approach that ensures group invariance. You can also check out our talk on Fairness in AI for more suggestions.

As shown in the above image, this method consists of two parts. First, train a different model for each group; then, compute the average of their predictions to get the final result.

This way, we have variant models combined to compute an invariant prediction.
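The two steps can be sketched as follows. To keep the example short and dependency-free, each group model is a least-squares linear model standing in for a DNN; the function names and this choice of model are our assumptions.

```python
import numpy as np

def fit_group_models(X, y, groups):
    """Fit one least-squares linear model per group (a stand-in for one DNN per group)."""
    models = {}
    for group in np.unique(groups):
        mask = groups == group
        # Append a column of ones so the model also learns a bias term.
        X_group = np.column_stack([X[mask], np.ones(mask.sum())])
        coef, *_ = np.linalg.lstsq(X_group, y[mask], rcond=None)
        models[group] = coef
    return models

def predict_average(models, X):
    """Average the predictions of all group models; group membership is never used."""
    X_bias = np.column_stack([X, np.ones(len(X))])
    return np.mean([X_bias @ coef for coef in models.values()], axis=0)
```

Because `predict_average` never receives the group of the data point, two inputs with identical features get the same prediction regardless of the group they belong to.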


Removing Invariances

In the previous sections, we saw how to add invariances to DNNs. However, there is one invariance that is already integrated into CNNs, due to their architecture: translational invariance. This feature is helpful for a large number of tasks. Nevertheless, in some cases, it is relevant to know the position of the object relative to the image.

For example, in colposcopy, every image is centered on the cervix, i.e. the outlines of the dataset images are very similar. Besides, for this exam, the most relevant area is the cervix itself. Therefore, for this task, it’s helpful to know the location of the pixels.

To prevent convolutional kernels from ignoring positional information, we can pass that information to the CNN as additional channels. For example, we can add 3 extra channels where each pixel takes the value of its row, its column, and its distance to the center, respectively.
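A minimal sketch of these extra channels is shown below. The function name and the scaling of rows and columns to the range [0, 1] are our assumptions; raw pixel indices would carry the same information.

```python
import numpy as np

def add_position_channels(image):
    """Append row, column, and distance-to-center channels to an H x W x C image."""
    h, w = image.shape[:2]
    # Row and column coordinates, scaled to [0, 1] (a normalization choice).
    rows, cols = np.meshgrid(np.linspace(0.0, 1.0, h),
                             np.linspace(0.0, 1.0, w), indexing="ij")
    # Euclidean distance of each pixel to the image center (0.5, 0.5).
    dist = np.sqrt((rows - 0.5) ** 2 + (cols - 0.5) ** 2)
    extra = np.stack([rows, cols, dist], axis=-1)
    return np.concatenate([image, extra], axis=-1)
```

An RGB image of shape (H, W, 3) becomes an input of shape (H, W, 6), so the first convolutional layer has to accept 6 input channels.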


Forcing Monotonic Behaviour

At this point, we have already seen how to add invariances to DNNs and how to remove them in our favor. In this section, we’ll see how to manipulate model variances.

In some problems, we have enough Domain Knowledge to know what to expect when a certain feature is modified. E.g. in sales forecasting, when nothing else changes, it is safe to assume that a price decrease will lead to a sales increase.

However, in real-world data, more than one feature changes at a time, which makes it harder for the algorithm to learn those expected correlations. E.g. in the hospitality sector, during the high season, usually both prices and sales increase. Nevertheless, there is no cause-effect relationship between the two. Both effects are caused by increased demand.

To help the model better understand these correlations, we can force it to have a Monotonic Behaviour. Wilson Silva et al. proposed an effective and intuitive methodology to do it in a paper with our collaboration. Their approach consists of splitting monotonic features from unconstrained features and making them follow two different streams, as shown in the figure below. One stream is designed for unconstrained features and consists of a conventional DNN. The second stream, designed for monotonic features, consists of a DNN where the weights are forced to be positive. In the end, the outputs of the two independent streams are concatenated to give the final output.

Going back to the previous example, with this approach, we can guarantee that a price increase will always negatively influence the sales forecast, no matter the season or the exceptions in the training data.
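The sketch below shows one way the forward pass of such a two-stream network could look. It is our own simplified illustration, not the implementation from the paper: we assume one hidden layer per stream, tanh activations, and a softplus reparametrization to keep the constrained weights positive. We also assume that the output weights attached to the monotonic stream must be positive, and that a decreasing relationship (price up, sales down) is obtained by feeding the negated feature.

```python
import numpy as np

def softplus(z):
    """Smooth function that maps any real number to a positive one."""
    return np.logaddexp(0.0, z)

def init_params(n_free, n_mono, hidden, rng):
    """Random parameters for the sketch (shapes are illustrative assumptions)."""
    return {
        "W_free": rng.normal(size=(n_free, hidden)), "b_free": np.zeros(hidden),
        "W_mono": rng.normal(size=(n_mono, hidden)), "b_mono": np.zeros(hidden),
        "w_out_free": rng.normal(size=hidden),
        "w_out_mono": rng.normal(size=hidden),
        "b_out": 0.0,
    }

def forward(params, x_free, x_mono):
    # Unconstrained stream: a conventional dense layer.
    h_free = np.tanh(x_free @ params["W_free"] + params["b_free"])
    # Monotonic stream: weights go through softplus, so they are always positive,
    # and tanh is increasing, so h_mono never decreases when x_mono increases.
    h_mono = np.tanh(x_mono @ softplus(params["W_mono"]) + params["b_mono"])
    # Concatenate both streams; the output weights of the monotonic part are
    # also kept positive so the final output stays monotonic in x_mono.
    h = np.concatenate([h_free, h_mono], axis=1)
    w_out = np.concatenate([params["w_out_free"], softplus(params["w_out_mono"])])
    return h @ w_out + params["b_out"]
```

Feeding `x_mono = -price` makes the prediction non-increasing in price for any value of the unconstrained features, whatever parameters the training procedure ends up with.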

Be careful when using this technique! Remember that you’re passing Domain Knowledge: do not force monotonic behavior if you’re not 100% sure about the features’ effect.



Did Deep Learning revolutionize artificial intelligence? Yes! Is it easier now to get better predictions with less effort? Also yes! Is it possible for a monkey to train a DNN? Some might answer “yes” to this question, but that’s not our opinion. DNNs should never be treated as black boxes; instead, we should understand what’s really happening inside them to get the most out of them.

Furthermore, never underestimate human intelligence and domain knowledge! Combine them with your models to improve them and yourself.

If you are a developer, do not forget to try this at home!

If you want to integrate Domain Knowledge, but the previous approaches do not fit your problem, give us a call, and let’s find a solution together.
