Deleted SARS-CoV-2 Data From Wuhan Finally Recovered After a Year to Shed Light on the Origins of COVID-19

In June 2020, American scientists discovered that over 200 gene sequences of SARS-CoV-2 were mysteriously missing from a public database called the Sequence Read Archive (SRA). The online database holds sequences of virus samples taken from patients in China during the early days of COVID-19.

But a recent report from Eminetra said that virologist Dr. Jesse Bloom of the Fred Hutchinson Cancer Center in Seattle has found early SARS-CoV-2 gene sequences that had been missing for over a year. He conducted a digital search and successfully tracked down 13 sequences on Google Cloud.

SARS-CoV-2 Deleted Virus Sequence

Early last year, Wuhan University scientists investigated new ways to test for the deadly novel coronavirus that was causing an epidemic in China. They took virus samples from 34 patients at Huoshenshan Hospital and sequenced short stretches of genetic material from them.

The researchers posted their findings on the preprint server medRxiv last year and, at the same time, uploaded the gene sequences to the Sequence Read Archive, an online database maintained by the National Institutes of Health.

However, Dr. Bloom noticed that the data was missing. He sent an email to a Chinese scientist on June 6 but did not get any response. He then posted a report on the matter, which was covered by several media outlets.

Addressing this concern, an NIH spokesperson said that the author of the study had asked in June 2020 for the sequences to be removed from the database because they would be updated and added to another database.


However, Dr. Bloom said that a year had already passed and the updated sequences had still not been published in any database. Two weeks after his report was published, researchers from China posted the data to a database maintained by the China National Center for Bioinformation.

During a press conference, Chinese officials rejected allegations that the pandemic began with a laboratory leak. They said that the researchers thought it was no longer necessary to store the data in the NCBI database, so they decided to take it out. The Germany-based journal Small issued a correction shortly afterward to clarify the error and posted it on Thursday.

Despite that, it remains unclear why the authors asked for the sequences to be removed from the SRA and why they waited a year to upload them to another database.

Origins of SARS-CoV-2 Remain a Mystery

According to WebMD, Dr. Bloom was able to recover SRA files that were still stored on Google Cloud. He then reconstructed the 13 sequences, which came from strains early in the epidemic, and found that the sequences from the Huanan Seafood Market do not fully represent the viruses that were in Wuhan when COVID-19 started spreading.

Dr. Bloom wrote that the initial sequences had three likely mutations that made them more similar to the coronavirus found in bats, but that the recovered sequences do not explain the origins of the coronavirus. He does believe, however, that the virus was already circulating in Wuhan before December 2019.

"This study provides no evidence either way," he told The Washington Post. "But it does indicate that we probably have not exhausted all relevant data."
