After trawling through vast quantities of biological samples, a “ridiculously powerful” supercomputer has discovered over 100,000 new RNA viruses, including nine previously unknown coronaviruses.
The findings, published in the journal Nature, describe how an international team of researchers used supercomputers to sift through 20 million gigabytes of gene sequence data from 5.7 million biological samples collected from a variety of sources, ranging from ice core samples to animal poop.
The research turned up 132,000 RNA viruses, only 15,000 of which were previously known to science, and nine new species of coronaviruses.
The team hopes that, by compiling all of this vital information, their work can be used to prevent future disease outbreaks, and perhaps even address the next major pandemic.
The genetic and spatial diversity of viruses in nature, as well as the ways a wide variety of animals interact with them, is being studied at an unprecedented pace.
“The hope is that if something like SARS-CoV-2, the novel coronavirus that causes COVID-19, reemerges, we won’t be caught off guard like we were last time,” Dr. Artem Babaian, an independent researcher who worked on the project, said in a statement.
“These viruses can now be identified more easily, and their natural reservoirs can be identified more quickly as a result. The ultimate goal is for these infections to be identified and treated as soon as possible so that they do not spread and become pandemics.”
The blood of a patient who presents with an unidentified fever can be sequenced, allowing researchers to link the unknown virus in the human to a much larger database of known viruses.
“In the case of a patient in St. Louis who has a viral infection of unknown origin, you can now search through the database in about two minutes and connect that virus to, for example, an infected camel in Sub-Saharan Africa that was sampled in 2012,” Dr. Babaian explained.
Coronaviruses are notorious for spreading disease. The best-known coronavirus at the moment is SARS-CoV-2, but there is also SARS-CoV, which emerged in 2002, and MERS-CoV, which causes Middle East respiratory syndrome (MERS).
In addition to these particularly dangerous coronaviruses, there are four other coronaviruses that in most cases cause nothing more than a common cold in people.
The group of nine newly discovered coronaviruses is thought to have originated in animals such as pigs, birds, and bats, but it is not known whether they are capable of infecting humans.
The supercomputer used in this study has the computing capacity of 22,500 typical CPUs.
The project cost $24,000. A traditional supercomputer would take more than a year, and cost hundreds of thousands of dollars, to complete the same analysis.
While the study has the potential to improve pathogen surveillance and help anticipate and mitigate future pandemics, Dr. Babaian insists that he began it as a “fun side project.”
As side projects go, this one grew into something quite significant.