Microsoft Research made some bold claims earlier this year about its new medical artificial intelligence (AI), which is designed to answer queries about medicine and biology.
The software giant said in a Twitter post that its medical AI, called BioGPT, has achieved "human parity," meaning it can perform roughly as well as a person in certain situations. That post quickly went viral, and many riding the hype wave of ChatGPT greeted the new technology with enthusiasm. But how reliable is it?
BioGPT Outperforming Other Biomedical Language Models
In the paper "BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining," published in Briefings in Bioinformatics, Microsoft evaluated BioGPT on six biomedical natural language processing tasks and showed that the model beats earlier models on most of them.
The Microsoft Research team claims that BioGPT beats comparable models based on Google's BERT on biomedical question answering and end-to-end relation extraction benchmarks. The Decoder reports that it even outperformed the generically trained GPT-2 at generating text to answer biomedical questions.
When scaled up to the largest available GPT-2 architecture (GPT-2 XL), the fine-tuned BioGPT-Large achieved 81% accuracy on the PubMedQA benchmark, better than much larger general-purpose language models, which scored only 77% to 79%.
BioGPT demonstrates that small, domain-specific language models can compete within their own field with much larger general-purpose language models. Smaller models also have the advantage of requiring less data to train.
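Readers who want to probe the model's behavior themselves can do so: Microsoft has released BioGPT checkpoints on Hugging Face. Below is a minimal sketch, assuming the Hugging Face transformers library is installed and using the publicly available "microsoft/biogpt" checkpoint; prompt wording and generation settings are illustrative only.

```python
# Minimal sketch: querying the public BioGPT checkpoint with Hugging Face transformers.
from transformers import BioGptTokenizer, BioGptForCausalLM, pipeline, set_seed

# Load the publicly released base checkpoint (BioGPT-Large variants are also available).
tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

# Wrap the model in a text-generation pipeline and fix the seed for repeatable output.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
set_seed(42)

# BioGPT is a completion-style model, so prompts are phrased as statements to finish.
prompt = "COVID-19 is"
outputs = generator(prompt, max_length=50, num_return_sequences=1, do_sample=True)
print(outputs[0]["generated_text"])
```

Sampling (do_sample=True) makes the output vary from run to run, which is also why the same question can yield different, and sometimes contradictory, answers.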
Testing BioGPT's Accuracy
Despite those benchmark results, Futurism reports that the system is still prone to producing wildly inaccurate answers that no medical professional or researcher would endorse.
When Futurism tested it, the model produced nonsensical answers rooted in pseudoscience and the supernatural, and it sometimes generated misinformation that could be dangerous to poorly informed patients.
Furthermore, like other powerful AI systems known to "hallucinate" erroneous information, BioGPT regularly invents medical claims so absurd that they are unintentionally amusing.
When asked about the average number of ghosts haunting an American hospital, it cited nonexistent data from the American Hospital Association that claimed the "average number of ghosts per hospital was 1.4." The AI also said that those "who see the ghosts of their relatives have worse outcomes while those who see unrelated ghosts do not."
Other weaknesses of Microsoft's medical AI are more serious. It sometimes promotes conspiracy theories, such as suggesting that childhood vaccination can cause autism, even though studies have debunked this claim many times. When asked again, it slightly modified its answer, stating that vaccines are not the cause of autism but falsely claiming that the MMR vaccine had been withdrawn because of autism concerns.
To call the new medical AI "inaccurate" seems insufficient, Futurism reports. BioGPT appeared to simply grab words from scientific papers and arrange them into convincing sentences with little regard for factual accuracy or consistency.
Stanford University School of Medicine clinical scholar Roxana Daneshjou, who studies the use of AI in healthcare, told Futurism that BioGPT and similar models are trained only to produce plausible-sounding written language and are not optimized to output accurate information.
Check out more news and information on Artificial Intelligence in Science Times.