Unfolding the protein-folding fuss
Has a computer solved an outstanding 50-year-old scientific problem? Did General Artificial Intelligence arrive on November 30, 2020?
In this era, scientific results are announced like new movies, with trailers, interviews, and all the rest … months or years before the movie is released.
We had to come to terms with this reality once again, after witnessing the magniloquent media fanfare around DeepMind, a subsidiary of Alphabet Inc., and its great creation AlphaFold.
Last November its newest version, AlphaFold 2, won hands down a traditional contest held every two years, in which algorithms of various sorts are tested as possible alternatives to lengthy and costly laboratory projects to decipher protein structures.
Knowing the exact shape of a protein in a short time (we are talking hours instead of months here) could tremendously cut the cost and speed up the development of things like new drugs or sustainable fuels, suggest new ways to degrade waste plastic, and much more.
The trade press, including the inevitably ambiguous media outlets of revered scientific journals, was nearly unanimous in proclaiming that AlphaFold had solved one of the toughest scientific problems of our time.
Let’s keep calm
As we already discussed, the science lab is where Deep Learning, the cutting edge of today’s AI, is at its best. And AlphaFold’s step forward is indeed a big one.
However, we also know that the enthusiastic AI community has a tendency to announce results with hyperbole.
In this case, the fact that the AlphaFold software correctly predicted about two-thirds of experimental results has completely overshadowed the complementary finding: AlphaFold missed one-third of them.
In a real-world lab, that is, full-blown experimental work would still be necessary to double-check at least some of AlphaFold’s results.
Put differently: the proteins being analyzed were already known to the scientists who had organized the contest, whereas in normal, daily lab settings most molecules would be new. Hence the net-net is that AlphaFold’s predictions are only 67% reliable.
“Would you buy a satnav that was only 67% accurate?”
Stephen Curry, Professor of Structural Biology, Imperial College, London
In his blog, Dr. Curry expresses admiration for the achievements of the AlphaFold team but warns that they will advance biology rather than immediately transform it, as media reports, including from Nature News, would have had us believe on November 30.
Furthermore, AlphaFold had been trained on data describing existing protein structures, i.e. that had already been experimentally identified: the tool may not be as good at predicting structures that are too far from those represented in the training set.
That said, chemistry blogger and pharma expert Derek Lowe, a scholar usually skeptical of bombastic AI announcements, concurs that “getting that level of structural accuracy on that many varied proteins is something that has just never been done before”.
He is echoed by Lior Pachter, Bren Professor of Computational Biology at Caltech, who says that protein folding “is not a solved problem” because of AlphaFold, though the method “produced very impressive results”.
Along the same lines, chemistry Nobel Prize winner Venki Ramakrishnan called the result “a stunning advance on the protein folding problem.”
Is this General AI, making biologists irrelevant?
Unlike what happened in 2017 with their previous great achievement, AlphaGo Zero, this time around I have not heard DeepMind claim that their algorithm can learn tabula rasa, i.e. knowing nothing in advance and simply training by playing against itself in reinforcement learning style.
That claim did not convince many laypeople like me. And it irritated some scholars, who pointed out that numerous domain experts (including a Go grandmaster) had participated in the construction of the AlphaGo Zero algorithm.
AlphaFold was introduced differently from the beginning.
DeepMind made it clear both now and back then that the neural network features loads of computational biology competence. It contains a domain-related knowledge graph representing the protein and uses “evolutionarily related sequences, multiple sequence alignment […], and a representation of amino acid residue pairs” to refine it.
AlphaFold has tackled the challenge using compute brute force, the optimization capabilities of a deep neural network, and the domain knowledge of the experts who had instructed it on how to reason about the geometry of protein curls and curated the training data set.
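One flavor of that domain knowledge is the co-evolution signal hidden in multiple sequence alignments: residues that mutate in tandem across related sequences tend to sit close together in the folded structure. The sketch below is emphatically not AlphaFold’s algorithm, just a toy illustration of the idea, scoring column pairs of an invented five-sequence alignment by mutual information.

```python
# Toy sketch of MSA co-evolution analysis (NOT AlphaFold's method).
# Alignment columns that co-vary across a protein family hint at
# residue-residue contacts in the folded structure.
from collections import Counter
from itertools import product
from math import log2

# A hypothetical multiple sequence alignment of a tiny protein family;
# columns 1 and 3 were made to mutate in lockstep (R<->D, K<->E).
msa = [
    "ARNDC",
    "AKNEC",
    "ARQDC",
    "AKQEC",
    "ARNDC",
]

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(msa)
    pi = Counter(s[col_i] for s in msa)           # marginal of column i
    pj = Counter(s[col_j] for s in msa)           # marginal of column j
    pij = Counter((s[col_i], s[col_j]) for s in msa)  # joint distribution
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

# Score every residue pair; a high score flags co-varying columns.
scores = {(i, j): mutual_information(i, j)
          for i, j in product(range(5), repeat=2) if i < j}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # → (1, 3) 0.971
```

Real pipelines use far more sophisticated statistics over thousands of sequences, and AlphaFold 2 learns these correlations inside the neural network itself, but the underlying intuition is the same.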
Our current AI contains human intelligence and is best used side by side with humans, augmenting and aiding them.
As Lowe points out, “you cannot just compute your way out of problems like this one if you don’t have some solid ideas about where you’re going and how you’re going to find a path forward”.
AlphaFold’s results are overstated in the early media reports. The picture will become clearer when its creators publish a peer-reviewed scientific paper and release the software (the 2018 version, or chunks of it, has been on GitHub for a long time).
Regardless, by showing how one of the most laborious steps in biology could be scaled down by an order of magnitude, AlphaFold 2 is likely to provide us with yet another example, and a superb one, of what well-trained neural networks in the hands of domain experts can accomplish.
Developers of future business or social applications of AI should treasure this experience.
A scientific experiment is an excellent playground for Deep Learning algorithms because their masters are very good at teaching them how to learn, and the domain, however large, is controlled and as constrained (border-patrolled) as possible.
The more open the application domain becomes, and the more common sense matters, the more likely current AI algorithms are to fail, sometimes miserably and without warning.