Beyond what it is already doing, Machine Learning, with a little more audacity, could help us decipher the mysterious COVID-19 crisis and fight its successors.

After one full year, the COVID thing is still rife with mysteries.

Luckily, we already have vaccines and promising drugs, such as specific monoclonal antibodies, both approved for emergency use.

However, physicians, whose job is to apply general scientific guidelines to specific cases, still don’t quite know how to treat the disease, and the fortunes of many drugs and remedies used so far are swinging up and down in both scientific papers and medical practice.

Similarly, on other COVID subjects, authoritative scientific journals publish studies proclaiming dreadful or useful truths, only to issue stark denials after a few months.

All the above is a measure of the exceptional research effort as well as the overall uncertainty.

Malign SARS-CoV-2 variants are emerging, and we know little of their actual impact.

Estimates of the basic reproduction number R0 of the virus have been varying by almost an order of magnitude.

Germany seemed an inviolable fortress during the first wave but now appears on the verge of collapse every other day (and we certainly hope it won’t). The same goes for Israel and Greece.

Bewilderingly, the disease seems to have killed at least twice as many infected people in New York City, Boston, or Milan as in any other city of the world.

What to make of these, and many other, odd COVID facts? Is there something hidden in them that, if only we found it, would let us fight this virus, and future ones, more effectively?

Blurred data

In addition to the inherent complexity of the situation, most available data are buried in noise.

One example is the incidence of the infection. Territories never really know how many residents are infected; they can only guess, for instance by applying a vague ×6 multiplier to Confirmed Cases based on last summer’s seroprevalence testing in some regions. And that guess could be wildly off the mark.
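A multiplier like ×6 is, in essence, a ratio: the infections implied by a seroprevalence survey divided by the cases confirmed over the same period. The minimal sketch below uses entirely hypothetical numbers and function names; it only shows how such a multiplier is derived and how widely the estimate of true infections swings when the multiplier itself is uncertain.

```python
def infection_multiplier(seroprevalence, population, confirmed_cases):
    """Ratio of survey-implied past infections to confirmed cases (hypothetical helper)."""
    estimated_infections = seroprevalence * population
    return estimated_infections / confirmed_cases


def infection_range(confirmed_cases, low_multiplier, high_multiplier):
    """Lower and upper estimates of true infections for a plausible multiplier range."""
    return confirmed_cases * low_multiplier, confirmed_cases * high_multiplier


# Hypothetical region: 1,000,000 residents, 2.4% seropositive, 4,000 confirmed cases.
m = infection_multiplier(0.024, 1_000_000, 4_000)  # roughly 6

# If the "true" multiplier could be anywhere between 3 and 10,
# 10,000 confirmed cases mean anything from 30,000 to 100,000 infections.
low, high = infection_range(10_000, 3, 10)
```

A multiplier calibrated in one region and season can transfer poorly elsewhere, because it silently encodes that place’s testing capacity and policy at survey time.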

Another example is testing: some territories report Samples Tested while others report People Tested and, adding to the mess, procedures of substantially different reliability are used.

Other sources of noise include the differing, time-varying criteria that territories use to classify critical cases and to register deaths.

Under such circumstances, predicting economic or health-care outcomes is a gamble.

Enter Machine Learning

This looks like a good case for using Data Science / Machine Learning.

DS / ML is what scientists can sometimes turn to when they have no idea what on earth the mechanistic/analytic model of a phenomenon might be.

When a likely model is conceivable, the researcher will devise some equations, solve them, then use the data to check if the model does indeed explain the empirical evidence (and consequently has predictive power).

But when no model, not even a plausible storyline, is emerging, as with the SARS-CoV-2 epidemiological and clinical situation today, one may be better off “letting the data do the talking from the very start,” to use the words of MIT’s Max Tegmark.

Digging out hidden variables

Born within the domain of statistics, the latent-variable model is a method used in machine learning to discover hidden variables that, while not directly observable, can help explain a natural phenomenon.

Applications include medical diagnosis and recommending items to online users based on their past actions.

In the COVID context, the effective reproduction number R is an example of a useful latent variable. By estimating its value after the fact, we can tell whether certain non-pharmaceutical interventions (NPIs) are effective in slowing the infection.
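R is never observed directly; it has to be inferred from case counts. A standard way to do this is the renewal equation, which divides today’s new cases by the “infection pressure” exerted by recent cases, each weighted by the probability that a secondary case appears that many days after the primary one (the serial-interval distribution). The sketch below is a deliberately naive point estimate under that assumption; the serial-interval weights are hypothetical, and real estimators also smooth over time windows, correct for reporting delays, and report uncertainty intervals.

```python
def estimate_rt(incidence, serial_interval):
    """Naive renewal-equation estimate of the effective reproduction number R_t.

    incidence: daily new cases I_0, I_1, ..., I_T
    serial_interval: weights w_1, ..., w_k, where w_s is the (assumed) probability
        that a secondary case appears s days after its primary case; should sum to 1.
    Returns one value per day, or None when there is too little history
    or no infection pressure to divide by.
    """
    k = len(serial_interval)
    estimates = []
    for t in range(len(incidence)):
        if t < k:
            # Not enough past days to compute the full infection pressure.
            estimates.append(None)
            continue
        # Infection pressure: recent cases weighted by the serial interval.
        pressure = sum(serial_interval[s - 1] * incidence[t - s] for s in range(1, k + 1))
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


# Flat incidence gives R close to 1; doubling every serial interval gives R close to 2.
flat = estimate_rt([10] * 10, [0.2, 0.5, 0.3])
growing = estimate_rt([2 ** i for i in range(8)], [1.0])
```

Comparing such estimates before and after an intervention is what allows one to judge, after the fact, whether an NPI coincided with a drop in transmission.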

But we could get bolder and use unsupervised latent-variable models to try to figure out the actual mechanism behind some of the weirdest COVID facts, like the ones mentioned in the introduction of this post.

From understanding to forecasting

This could also lead to better forecasting. As physicist and neuroscience PhD student Manuel Brenner writes:

“If successfully learned, latent variable models not only give us the tools to explain the data, but they allow us to build a generative model of a given data distribution, making it possible to generate entirely new data following the same distribution by sampling from the latent variables”.
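To make the quote concrete, here is a deliberately tiny latent-variable model: a two-component Gaussian mixture fitted with the expectation-maximization (EM) algorithm. It has nothing to do with actual COVID data; the hidden variable is simply which of two groups produced each observation. Once fitted, the model generates new data by first sampling the latent variable and then sampling an observation given it, which is the two-step sampling Brenner describes. Function names and the synthetic data are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_gmm_1d(data, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture.

    The latent variable z in {0, 1} says which component generated each point.
    Returns (weights, means, standard deviations).
    """
    lo, hi = min(data), max(data)
    spread = (hi - lo) / 4 or 1.0
    weights, mus, sigmas = [0.5, 0.5], [lo, hi], [spread, spread]
    for _ in range(n_iter):
        # E-step: posterior probability ("responsibility") of each z for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, mus[k], sigmas[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue  # component received no points; keep its old parameters
            weights[k] = nk / len(data)
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)
    return weights, mus, sigmas


def sample(weights, mus, sigmas, n, rng):
    """Generate new data: draw the latent z first, then x given z."""
    out = []
    for _ in range(n):
        k = 0 if rng.random() < weights[0] else 1
        out.append(rng.gauss(mus[k], sigmas[k]))
    return out


# Synthetic data from two hidden groups, then brand-new samples from the fitted model.
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(10, 1) for _ in range(200)]
weights, mus, sigmas = fit_gmm_1d(data)
new_data = sample(weights, mus, sigmas, 1000, random.Random(1))
```

Real epidemiological latent structure would be far richer than two groups, and with scarce, noisy data the fitted parameters could easily be unstable.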

Remember GANs (generative adversarial networks; read the relevant paragraph here)? Remember the fake faces and videos?

Well, this is exactly the point. Figure out what the rules of the game are, then start playing.

So, for example, our anti-COVID measures, such as NPIs, vaccination plans, or medical care, could be adjusted based on the statistics of the latent variable(s), taming the virus in a shorter time.

We could also acquire a more reliable understanding of the effects of reopening schools or certain businesses or geographical areas.

Early attempts

In December, an Italian economist attempted something large-scale on COVID with his Synthetic COVID Index for Italy.

The study ends up as a mere search for a ‘strength of the pandemic’ index and does not explain the phenomena under way. It also lacks predictive power. This rather modest success may be due to the very rigor of the approach, which led the author to take as few risks as possible.

And/or it could be due to insufficient training: all the experts I consulted complained that data are currently too scarce to properly train an effective system.

I suspect they could be bolder (synthetic data augmentation? Careful interpolation of data from different but compatible regions?…). But they are the players: I am merely an incompetent spectator in the stands.

It might simply be that machine learning has not yet won over enough epidemiologists, sociologists, and political scientists.

I would like to see some applications before concluding that machine learning cannot help us master the epidemiological mechanisms of COVID and can only be applied to finding point solutions, however useful (see, for example, here, here, or here).

So what?

Despite the scientific progress made, after a year COVID still contains many unknowns and risks.

As a consequence, devising economic and social security policies is an adventure in all countries, except perhaps the fewer than 5% that were ready to act at the end of 2019.

The situation seems to be telling us something we still don’t grasp. A credible interpretation would be useful now and in future pandemics.

I am waiting for ‘AI’ to come to the rescue, in the form of well-thought-out, human-driven, bold machine learning efforts.

Stay tuned.

Paolo Magrassi Attribution 3.0 Unported (CC BY 3.0)