{"id":2884,"date":"2023-03-03T18:47:41","date_gmt":"2023-03-03T18:47:41","guid":{"rendered":"https:\/\/nilg.ai\/?p=2884"},"modified":"2025-11-21T17:10:12","modified_gmt":"2025-11-21T17:10:12","slug":"active_learning","status":"publish","type":"post","link":"https:\/\/nilg.ai\/pt\/202303\/active_learning\/","title":{"rendered":"Increasing Efficiency with Active Learning"},"content":{"rendered":"<p><img decoding=\"async\" class=\"aligncenter wp-image-3004 size-large\" src=\"https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/business-hand-robot-handshake-artificial-intelligence-digital-transformation-1024x683.jpg\" alt=\"\" width=\"1024\" height=\"683\" srcset=\"https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/business-hand-robot-handshake-artificial-intelligence-digital-transformation-1024x683.jpg 1024w, https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/business-hand-robot-handshake-artificial-intelligence-digital-transformation-300x200.jpg 300w, https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/business-hand-robot-handshake-artificial-intelligence-digital-transformation-768x512.jpg 768w, https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/business-hand-robot-handshake-artificial-intelligence-digital-transformation-600x400.jpg 600w, https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/business-hand-robot-handshake-artificial-intelligence-digital-transformation.jpg 1077w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<h2>The problem: Labeling data is boring (and expensive)<\/h2>\n<p><span style=\"font-weight: 400;\">So there you are. You have collected your data, analyzed it, processed it, and built your sophisticated model architecture. After many hours of training and evaluating, you have come to a very unpleasant conclusion: you need more data. 
Before you readjust your budget to fit the extra data acquisition and labeling, let me introduce you to a way of increasing efficiency with <a href=\"https:\/\/nilg.ai\/pt\/product\/the-machine-learning-spectrum\/\">Active Learning<\/a>!<\/span><\/p>\n<h2>The Solution: Active Learning<\/h2>\n<p><span style=\"font-weight: 400;\">So, what is this magical solution? <\/span><span style=\"font-weight: 400;\">Well, active learning is the idea that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns, i.e., if allowed to be curious (Settles, 2009).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In practice, we give a machine learning model our unlabeled data, and then from the predictions, we sample the data points that the model found hardest to predict. Then it\u2019s time to put the most elegant and important machine to work: the human brain. Once we have our sample, we give it to a human annotator (or oracle). They will decide whether or not the data is worth annotating. Finally, we feed the model the newly annotated data from our sample. And voil\u00e0: we hope it works.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The intuition is that giving the model a small sample of hard-to-predict data can often improve model performance almost as much as giving it the entire dataset. So far, the process is pretty straightforward. But you have been left wondering\u2026 how do you sample the data then?<\/span><a href=\"https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/Active-Learning.svg\"><img decoding=\"async\" class=\"size-full wp-image-3007 attachment-svg aligncenter\" src=\"https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/Active-Learning.svg\" alt=\"\" \/><\/a><\/p>\n<h2>Sampling your data<\/h2>\n<p><span style=\"font-weight: 400;\">Like anything in machine learning, there is no one-size-fits-all approach. 
When it comes to sampling, though, there are a few tried and tested solutions that are worth a go. For example, if you are building a probabilistic classification model, a good measure might be uncertainty. Uncertainty sampling is a very popular strategy that is based on evaluating how uncertain a model is when predicting a data point. A direct way to measure this is the least-confident method, which picks the data points whose most likely class has the lowest predicted probability, or the entropy of the predicted class probabilities. There are other methods that might be useful, such as estimating how much a data point would change the model itself if its label were known (Expected Model Change) or how much it would reduce the model\u2019s error on the remaining data (Expected Error Reduction). These methods are trickier and more computationally heavy, so they are not as easy to apply.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Suppose you have multiple models trained on the labeled data you already have. In that case, you can apply the Query-by-Committee method: each model predicts on the unlabeled data, and you select the cases where the models disagree the most, essentially allowing them to vote for the data to be labeled. A little democracy in your AI <\/span><span style=\"font-weight: 400;\">strategy can significantly improve your labeling efficiency. If you want to rig the election, you can always assign different voting weights to each model. We won\u2019t judge.<\/span><\/p>\n<p><a href=\"https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/Active-Learning-2-1.svg\"><img decoding=\"async\" class=\"alignnone size-full wp-image-3010 attachment-svg\" src=\"https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/Active-Learning-2-1.svg\" alt=\"\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Another aspect you might want to consider is representativeness. It is easy to understand that, when your model gives you the data it finds most difficult to predict, it will probably choose some outliers. 
This will again depend on your specific situation, but you will generally want to give your model data representative of the underlying data distribution. So, for example, if you are working with image data with millions of acquisitions, there is a chance that some of the images are pitch black, or completely blurry. Your model will have difficulty labeling those examples, but they won\u2019t help improve its performance.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2>Practical Considerations<\/h2>\n<p>So far, you already know what active learning is, and how it works. However, there are aspects beyond the theory that you should understand. To take full advantage of this tool, you must know how you should apply it and the ways it can be affected by external factors.<\/p>\n<h4>To err is human&#8230;<\/h4>\n<p><span style=\"font-weight: 400;\">Active learning\u2019s central tool is human intuition and the ability of the oracle (human annotator) to apply that same intuition to a problem. But, much like any other experiment in another field, the tools might not work as expected. Depending on the data that the oracle is analyzing, they may find some data points difficult to understand. Moreover, if the data is composed of, for example, medical images, the oracle may not even have the knowledge to annotate it, as some medical images are difficult to comprehend even for professionals. This means that annotations will vary from person to person.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Another important aspect of human nature is that people can be affected by distractions or fatigue. So annotations are subject to different annotators and are also impacted by the person\u2019s surroundings and the time they have spent labeling. 
Even if the person is focused and knowledgeable, they might still misunderstand the task, which is why it is important to build proper user interfaces and labeling protocols that provide the required information.<\/span><\/p>\n<h4>Mind the costs!<\/h4>\n<p><span style=\"font-weight: 400;\">One might think that reducing the amount of data required to train a model reduces the overall cost of training that model. However, part of that cost is shifted to the oracle (and to whoever hired them) in the form of human effort, time, and money. Naturally, the task of the oracle should be as effortless as possible, so the objective should not only be to reduce the amount of data to annotate, but also to reduce the effort required to annotate it. This is why, in some cases, it can be useful to let the model help by providing &#8220;pre-annotations&#8221;: predictions that the oracle only has to confirm or correct.<\/span><\/p>\n<h4>Knowing when to stop.<\/h4>\n<p><span style=\"font-weight: 400;\">When using interactive learning systems, it is important to understand at which point acquiring new data becomes more costly than the errors made by the current model. If it would require excessive resources (e.g., time, money) to generate relatively small gains, then it could be argued that, in some instances, it may not be worth it to use active learning. Past a certain point, active learning stops paying off, and it is important to know where that point lies.<\/span><\/p>\n<h4>It&#8217;s time to listen to your AI!<\/h4>\n<p><span style=\"font-weight: 400;\">Now that you know about active learning, give it a try! Let the model choose the data for you, while you sit and relax. Then spend some time annotating that data while the model sits and relaxes. 
AI is a two-way street, and you\u2019ll find that human-machine collaboration can significantly boost your project\u2019s efficiency.<\/span><\/p>\n<p>If you want to learn more about using model insights to improve your projects, feel free to contact me, and we can discuss what solution is best for you!<\/p>\n<p>&nbsp;<\/p>\n  \n\n <div class=\"author-cta\">\n\t\t<div class=\"author-cta-img\">\n\t\t    \n\t\t    <img decoding=\"async\" width=\"1024\" height=\"906\" src=\"https:\/\/nilg.ai\/wp-content\/uploads\/2022\/07\/Web-Rafael.png\" class=\"attachment-full size-full\" alt=\"Rafael Cavalheiro NILG.AI\" srcset=\"https:\/\/nilg.ai\/wp-content\/uploads\/2022\/07\/Web-Rafael.png 1024w, https:\/\/nilg.ai\/wp-content\/uploads\/2022\/07\/Web-Rafael-300x265.png 300w, https:\/\/nilg.ai\/wp-content\/uploads\/2022\/07\/Web-Rafael-768x680.png 768w, https:\/\/nilg.ai\/wp-content\/uploads\/2022\/07\/Web-Rafael-600x531.png 600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t    <\/div>\n\n<div class=\"author-cta-content\">\n\t<h3>Do you want to further discuss this idea?<\/h3><p>Book a meeting with <strong>Rafael Cavalheiro<\/strong><\/p>\t<a class=\"cta_btn\" onclick=\"Calendly.showPopupWidget('');return false;\"  \n\">Meet Rafael<\/a>\n\t\t\t\n\t<a href=\"https:\/\/nilg.ai\/pt\/?post_type=team&p=1650\" class=\"author-cta-link\">Saber mais<\/a>\n\t\t\t<\/div>\n\t<\/div>","protected":false},"excerpt":{"rendered":"<p>The problem: Labeling data is boring (and expensive) So there you are. You have collected your data, analyzed it, processed it, and built your sophisticated model architecture. After many hours of training and evaluating, you have come to a very unpleasant conclusion: you need more data. 
Before you readjust your budget to fit the extra [&hellip;]<\/p>\n","protected":false},"author":132,"featured_media":3004,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[53],"tags":[178,48,45],"class_list":["post-2884","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical","tag-active-learning","tag-ai4tech","tag-machine-learning"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Increasing Efficiency with Active Learning - NILG.AI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/nilg.ai\/pt\/202303\/active_learning\/\" \/>\n<meta property=\"og:locale\" content=\"pt_PT\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Increasing Efficiency with Active Learning - NILG.AI\" \/>\n<meta property=\"og:description\" content=\"The problem: Labeling data is boring (and expensive) So there you are. You have collected your data, analyzed it, processed it, and built your sophisticated model architecture. After many hours of training and evaluating, you have come to a very unpleasant conclusion: you need more data. 
Before you readjust your budget to fit the extra [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/nilg.ai\/pt\/202303\/active_learning\/\" \/>\n<meta property=\"og:site_name\" content=\"NILG.AI\" \/>\n<meta property=\"article:published_time\" content=\"2023-03-03T18:47:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-21T17:10:12+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/business-hand-robot-handshake-artificial-intelligence-digital-transformation.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1077\" \/>\n\t<meta property=\"og:image:height\" content=\"718\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Pedro Serrano\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@nilg_ai\" \/>\n<meta name=\"twitter:site\" content=\"@nilg_ai\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Pedro Serrano\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutos\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/nilg.ai\/202303\/active_learning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/nilg.ai\/202303\/active_learning\/\"},\"author\":{\"name\":\"Pedro Serrano\",\"@id\":\"https:\/\/nilg.ai\/#\/schema\/person\/fa2fadc135ac18cfca6588d50278f854\"},\"headline\":\"Increasing Efficiency with Active Learning\",\"datePublished\":\"2023-03-03T18:47:41+00:00\",\"dateModified\":\"2025-11-21T17:10:12+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/nilg.ai\/202303\/active_learning\/\"},\"wordCount\":1107,\"publisher\":{\"@id\":\"https:\/\/nilg.ai\/#organization\"},\"keywords\":[\"active learning\",\"AI4tech\",\"Machine Learning\"],\"articleSection\":[\"Technical\"],\"inLanguage\":\"pt-PT\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/nilg.ai\/202303\/active_learning\/\",\"url\":\"https:\/\/nilg.ai\/202303\/active_learning\/\",\"name\":\"Increasing Efficiency with Active Learning - NILG.AI\",\"isPartOf\":{\"@id\":\"https:\/\/nilg.ai\/#website\"},\"datePublished\":\"2023-03-03T18:47:41+00:00\",\"dateModified\":\"2025-11-21T17:10:12+00:00\",\"inLanguage\":\"pt-PT\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/nilg.ai\/202303\/active_learning\/\"]}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/nilg.ai\/#website\",\"url\":\"https:\/\/nilg.ai\/\",\"name\":\"NILG.AI\",\"description\":\"Create ever-improving businesses with AI\",\"publisher\":{\"@id\":\"https:\/\/nilg.ai\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/nilg.ai\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"pt-PT\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/nilg.ai\/#organization\",\"name\":\"NILG.AI\",\"url\":\"https:\/\/nilg.ai\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-PT\",\"@id\":\"https:\/\/nilg.ai\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/nilg.ai\/wp-content\/uploads\/2022\/03\/logo.svg\",\"contentUrl\":\"https:\/\/nilg.ai\/wp-content\/uploads\/2022\/03\/logo.svg\",\"caption\":\"NILG.AI\"},\"image\":{\"@id\":\"https:\/\/nilg.ai\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/twitter.com\/nilg_ai\",\"https:\/\/youtube.com\/@nilg_ai\",\"https:\/\/www.linkedin.com\/company\/nilg-ai\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/nilg.ai\/#\/schema\/person\/fa2fadc135ac18cfca6588d50278f854\",\"name\":\"Pedro Serrano\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-PT\",\"@id\":\"https:\/\/nilg.ai\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/906b2944931a56745771ac56707539ac9583e780070b2c1d44a8b5bc02fb1976?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/906b2944931a56745771ac56707539ac9583e780070b2c1d44a8b5bc02fb1976?s=96&d=mm&r=g\",\"caption\":\"Pedro Serrano\"},\"url\":\"https:\/\/nilg.ai\/pt\/author\/pedro-serrano\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Increasing Efficiency with Active Learning - NILG.AI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/nilg.ai\/pt\/202303\/active_learning\/","og_locale":"pt_PT","og_type":"article","og_title":"Increasing Efficiency with Active Learning - NILG.AI","og_description":"The problem: Labeling data is boring (and expensive) So there you are. You have collected your data, analyzed it, processed it, and built your sophisticated model architecture. 
After many hours of training and evaluating, you have come to a very unpleasant conclusion: you need more data. Before you readjust your budget to fit the extra [&hellip;]","og_url":"https:\/\/nilg.ai\/pt\/202303\/active_learning\/","og_site_name":"NILG.AI","article_published_time":"2023-03-03T18:47:41+00:00","article_modified_time":"2025-11-21T17:10:12+00:00","og_image":[{"width":1077,"height":718,"url":"https:\/\/nilg.ai\/wp-content\/uploads\/2023\/03\/business-hand-robot-handshake-artificial-intelligence-digital-transformation.jpg","type":"image\/jpeg"}],"author":"Pedro Serrano","twitter_card":"summary_large_image","twitter_creator":"@nilg_ai","twitter_site":"@nilg_ai","twitter_misc":{"Written by":"Pedro Serrano","Est. reading time":"6 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/nilg.ai\/202303\/active_learning\/#article","isPartOf":{"@id":"https:\/\/nilg.ai\/202303\/active_learning\/"},"author":{"name":"Pedro Serrano","@id":"https:\/\/nilg.ai\/#\/schema\/person\/fa2fadc135ac18cfca6588d50278f854"},"headline":"Increasing Efficiency with Active Learning","datePublished":"2023-03-03T18:47:41+00:00","dateModified":"2025-11-21T17:10:12+00:00","mainEntityOfPage":{"@id":"https:\/\/nilg.ai\/202303\/active_learning\/"},"wordCount":1107,"publisher":{"@id":"https:\/\/nilg.ai\/#organization"},"keywords":["active learning","AI4tech","Machine Learning"],"articleSection":["Technical"],"inLanguage":"pt-PT"},{"@type":"WebPage","@id":"https:\/\/nilg.ai\/202303\/active_learning\/","url":"https:\/\/nilg.ai\/202303\/active_learning\/","name":"Increasing Efficiency with Active Learning - 
NILG.AI","isPartOf":{"@id":"https:\/\/nilg.ai\/#website"},"datePublished":"2023-03-03T18:47:41+00:00","dateModified":"2025-11-21T17:10:12+00:00","inLanguage":"pt-PT","potentialAction":[{"@type":"ReadAction","target":["https:\/\/nilg.ai\/202303\/active_learning\/"]}]},{"@type":"WebSite","@id":"https:\/\/nilg.ai\/#website","url":"https:\/\/nilg.ai\/","name":"NILG.AI","description":"Create ever-improving businesses with AI","publisher":{"@id":"https:\/\/nilg.ai\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/nilg.ai\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"pt-PT"},{"@type":"Organization","@id":"https:\/\/nilg.ai\/#organization","name":"NILG.AI","url":"https:\/\/nilg.ai\/","logo":{"@type":"ImageObject","inLanguage":"pt-PT","@id":"https:\/\/nilg.ai\/#\/schema\/logo\/image\/","url":"https:\/\/nilg.ai\/wp-content\/uploads\/2022\/03\/logo.svg","contentUrl":"https:\/\/nilg.ai\/wp-content\/uploads\/2022\/03\/logo.svg","caption":"NILG.AI"},"image":{"@id":"https:\/\/nilg.ai\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/twitter.com\/nilg_ai","https:\/\/youtube.com\/@nilg_ai","https:\/\/www.linkedin.com\/company\/nilg-ai\/"]},{"@type":"Person","@id":"https:\/\/nilg.ai\/#\/schema\/person\/fa2fadc135ac18cfca6588d50278f854","name":"Pedro Serrano","image":{"@type":"ImageObject","inLanguage":"pt-PT","@id":"https:\/\/nilg.ai\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/906b2944931a56745771ac56707539ac9583e780070b2c1d44a8b5bc02fb1976?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/906b2944931a56745771ac56707539ac9583e780070b2c1d44a8b5bc02fb1976?s=96&d=mm&r=g","caption":"Pedro 
Serrano"},"url":"https:\/\/nilg.ai\/pt\/author\/pedro-serrano\/"}]}},"_links":{"self":[{"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/posts\/2884","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/users\/132"}],"replies":[{"embeddable":true,"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/comments?post=2884"}],"version-history":[{"count":10,"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/posts\/2884\/revisions"}],"predecessor-version":[{"id":5105,"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/posts\/2884\/revisions\/5105"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/media\/3004"}],"wp:attachment":[{"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/media?parent=2884"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/categories?post=2884"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nilg.ai\/pt\/wp-json\/wp\/v2\/tags?post=2884"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}