Every aspect of our daily routines was hit by COVID this year, from our work and industries to social interactions. Nine months have passed since we started lockdowns, and the numbers are still increasing.

The general public opinion is that governments worldwide have mismanaged the pandemic, with decisions driven mainly by political ideology. For example, we just watched the Portuguese government announce more restrictions on business operations and mobility in order to reduce family gatherings. The decision was based on the idea that most infections (67%) happened in that context (here). About a month later, it turned out that the figure was much lower, representing just 10% of the known cases (here), yet the decisions were not amended. I guess somebody skipped the class on survivorship bias!

The long-term impact of these decisions is much larger than we think. A major economic crisis is arising from this situation, along with thousands of apparently unrelated deaths that nobody wants to explain, much less be held accountable for. This is how putting biased beliefs above evidence backed by data becomes catastrophic.

However, there is an alternative: using data to support decision-making. Entrepreneurs, in general, have worked in highly uncertain scenarios for millennia. While most companies learned decades ago to embrace data as a way of supporting decisions, politicians on every side of the spectrum seem to be in love with populism-driven decision-making. It’s time for Data Scientists to take the wheel (or at least to be in the cockpit) so we can bring the debate back to scientific ground.

Where can we help?

We all know that tackling patient-level applications with AI is promising. However, my opinion is that we should shift part of our focus from diagnosing COVID and smart respirators to reducing the spread in the first place. As we discussed in our previous blog post, there’s a lot of room for AI both at the hospital level and at the societal level. With vaccines passing the tests required for market introduction, the hot topic for the next months (years? Hopefully not!) will be efficient vaccination policies. Even once we have a vaccine, the resources needed for global vaccination will take time to be fully deployed. So, we need strategies to vaccinate the right people first.

Let’s sketch an idea on how we could use AI to optimize vaccination.


Optimizing COVID vaccination with AI

What’s our goal here? I guess it’s achieving a certain level of group immunity with a minimum number of vaccines. I know there are other factors (operational costs, logistics, deaths), but the number of administered vaccines seems like an intuitive constraint for the next months.
Setting sentiment (and politics) aside, vaccinating the high-risk population first is not necessarily the ideal solution. For example, you would need to vaccinate dozens or hundreds of elders to considerably reduce the community risk for a nursing home, but only a few doses to vaccinate the staff who interact with these high-risk patients daily. The same holds, in general, for any community with a low number of inbound interactions (e.g., prisons).
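To make the nursing home example concrete, here is a minimal sketch. It rests on strong, purely illustrative assumptions: a made-up home with 50 residents and 5 staff members, no visitors, and an idealized vaccine that fully blocks transmission. The function name `reachable_people` and the `"outside"` node are hypothetical, not part of any real dataset or library.

```python
from collections import deque


def reachable_people(contacts, vaccinated, source="outside"):
    """Return everyone the virus could reach from `source`,
    moving only through people who are not vaccinated."""
    seen = {source}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for other in contacts.get(person, ()):
            if other not in seen and other not in vaccinated:
                seen.add(other)
                queue.append(other)
    seen.discard(source)
    return seen


# Hypothetical nursing home: residents only meet each other and the staff;
# only the staff are in contact with the outside world.
residents = [f"resident_{i}" for i in range(50)]
staff = [f"staff_{i}" for i in range(5)]

contacts = {"outside": set(staff)}
for s in staff:
    contacts[s] = set(residents) | {"outside"}
for r in residents:
    contacts[r] = set(staff) | (set(residents) - {r})

exposed_no_vaccine = reachable_people(contacts, vaccinated=set())
exposed_staff_vaccinated = reachable_people(contacts, vaccinated=set(staff))
exposed_half_residents = reachable_people(contacts, vaccinated=set(residents[:25]))

print(len(exposed_no_vaccine & set(residents)))        # 50 residents exposed
print(len(exposed_staff_vaccinated & set(residents)))  # 0 exposed, using 5 doses
print(len(exposed_half_residents & set(residents)))    # 25 exposed, using 25 doses
```

In this toy setting, 5 doses for the staff protect every resident, while 25 doses given to residents still leave the other 25 exposed. Real vaccines and real contact patterns are messier, so read this as an intuition pump rather than a policy simulation.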

Predictive Model to fill the gaps

The first thing we need to realize is that we are not aiming to vaccinate more people, but to immunize more contacts (e.g., a handshake is safe if at least one of the two people is immune to the disease). So, when designing these policies, we need to understand not just the “inbound” risk for a person but also the “outbound” risk. Namely, who will contribute the most to spreading the disease?
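The "immunized contacts" idea can be written down in a few lines. The sketch below assumes, as in the handshake example, that a contact is safe whenever at least one of the two people is immune. The contact list and the names in it are invented for illustration.

```python
def protected_contact_share(contacts, immune):
    """Fraction of contacts in which at least one of the two people is immune."""
    contacts = list(contacts)
    if not contacts:
        return 0.0
    protected = sum(1 for a, b in contacts if a in immune or b in immune)
    return protected / len(contacts)


# Hypothetical contacts: a cashier who meets three customers,
# and one customer who also meets a spouse at home.
contacts = [
    ("cashier", "customer_1"),
    ("cashier", "customer_2"),
    ("cashier", "customer_3"),
    ("customer_1", "spouse_1"),
]

share_if_cashier = protected_contact_share(contacts, immune={"cashier"})
share_if_spouse = protected_contact_share(contacts, immune={"spouse_1"})

print(share_if_cashier)  # 0.75
print(share_if_spouse)   # 0.25
```

One dose in each case, but vaccinating the cashier protects three times as many contacts as vaccinating the spouse.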

Governments and health authorities have collected some data about infections. So far, this data has focused on understanding the disease's impact on the patient and not so much on societal transmission. We need to address this ASAP! Then, let’s say that we are building a predictive model that, given an individual, predicts how many people will get infected by them (please consider transitivity, i.e., K-order infections). The model may look at attributes such as age, sex, home address, work address, public transportation usage, mobility patterns, profession, stats about the workplace, stats about relatives, etc., always with a focus on spread potential and not just on personal risk. Most of this information can be extracted from social security, tax collection, and public registries. Other data can be trickier to obtain, as it may cross data protection boundaries (when in doubt, privacy goes first). Take a look at DSSG PT’s open letter to DGS for good guidelines on data collection.
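One way to make the "K-order infections" target concrete is to simulate it. The sketch below is not the predictive model itself; it only illustrates the quantity such a model would try to predict. It assumes we already have per-contact transmission probabilities (which, in practice, we don't) and estimates the expected number of people infected within K hops of a given person by Monte Carlo. The function name, the graph, and the probabilities are all hypothetical.

```python
import random


def expected_downstream_infections(transmission, source, k, trials=1000, seed=0):
    """Estimate the expected number of people infected within `k` hops of
    `source`, where transmission[a][b] is the probability that a infects b."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        infected = {source}
        frontier = [source]
        for _ in range(k):
            next_frontier = []
            for person in frontier:
                for other, prob in transmission.get(person, {}).items():
                    if other not in infected and rng.random() < prob:
                        infected.add(other)
                        next_frontier.append(other)
            frontier = next_frontier
        total += len(infected) - 1  # do not count the source itself
    return total / trials


# Hypothetical transmission probabilities between known contacts.
transmission = {
    "driver": {"passenger_1": 0.3, "passenger_2": 0.3},
    "passenger_1": {"coworker": 0.2},
    "passenger_2": {"child": 0.5},
}

first_order = expected_downstream_infections(transmission, "driver", k=1)
second_order = expected_downstream_infections(transmission, "driver", k=2)
print(first_order, second_order)
```

Looking only at first-order infections would undervalue people whose contacts are themselves well connected, which is why the higher-order view matters here.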

There’s a problem when building such models: we don’t have reliable data on transmissions. Most infections have an unknown source. Therefore, the models we build must account for the fact that most data points will go unobserved and that what is observed is biased towards infected people with known infection sources (e.g., the child who infected their mother).

Building optimal policies: from predictions to decisions

Let’s say we have built such a model that, given an individual, predicts their expected outgoing degree of contagiousness. Who should we vaccinate now? Just the ones with the highest predicted transmission rate? Probably not. If we knew the links between people, we could compute the graph’s minimum cuts or a vertex cover to ensure COVID won’t move from one community to another. In some local cases, we will have access to such graphs (e.g., inside a hospital or a school), and we could aim for the best solution. In most cases, considering the highly dynamic world we live in, we won’t. So, we need to think about heuristic alternatives that deal with partial knowledge graphs by looking at known relationships (e.g., households and workplaces) and the predicted transmission rate per person.
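As a starting point for the discussion, here is one possible heuristic of that kind, sketched under heavy assumptions. It greedily picks the person with the highest score, where the score is the predicted transmission rate multiplied by one plus the number of their known contacts that are not yet protected. This is a simple greedy rule in the spirit of vertex cover, not an optimal solution, and the scoring formula, function name, and data are all invented for illustration.

```python
def greedy_vaccination_order(known_contacts, predicted_spread, doses):
    """Greedily choose up to `doses` people, scoring each candidate by
    predicted spread x (1 + number of still-unprotected known contacts)."""
    neighbors = {}
    for a, b in known_contacts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # A contact counts as protected once either endpoint is vaccinated.
    unprotected = {frozenset(pair) for pair in known_contacts}
    candidates = set(neighbors) | set(predicted_spread)
    chosen = []

    def score(person):
        open_contacts = sum(
            1
            for other in neighbors.get(person, ())
            if frozenset((person, other)) in unprotected
        )
        # The "1 +" keeps people with no known contacts in the running.
        return predicted_spread.get(person, 0.0) * (1 + open_contacts)

    for _ in range(doses):
        remaining = candidates - set(chosen)
        if not remaining:
            break
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        chosen.append(best)
        unprotected -= {frozenset((best, other)) for other in neighbors.get(best, ())}
    return chosen


# Hypothetical partial graph: households and workplaces we happen to know about.
known_contacts = [
    ("teacher", "student_1"),
    ("teacher", "student_2"),
    ("teacher", "student_3"),
    ("student_1", "parent_1"),
    ("courier", "parent_1"),
]
# Hypothetical model output: predicted transmission rate per person.
predicted_spread = {
    "teacher": 2.0, "student_1": 1.0, "student_2": 1.0, "student_3": 1.0,
    "parent_1": 1.5, "courier": 3.0, "street_vendor": 2.5,
}

order = greedy_vaccination_order(known_contacts, predicted_spread, doses=3)
print(order)  # ['teacher', 'courier', 'parent_1']
```

Note that the teacher is picked before the courier even though the courier has the higher predicted rate, because the teacher closes more known contacts. Also note that the street vendor, who has no known contacts, is still scored on predicted spread alone; how much weight such people should get is an open design choice.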


Note on fairness: The model may harm certain groups of people. This is especially true if we consider that some people work in informal economies, which limits the visibility of their contact networks. A naive method would therefore exclude them from fair and much-needed access to the vaccine. Take a look at our previous blog post on Fairness in AI for more details.

We understand that the solution described here may not be scalable or even feasible. However, this post’s goal is not to solve the problem but to open the discussion, creating a debate in the data community for proposing solutions that have been overlooked and that would undoubtedly increase the reach of such programs.

Do you have any more ideas for applying data science to optimize this massive logistical challenge? Let us know!


Disclaimer: the views, thoughts, and opinions expressed in the text belong solely to the author, and not necessarily to the author’s employer, organization, committee, or other group or individual.