Faulty YouTube AI Algorithm Shows Explicit Captions in Videos for Kids

YouTube's captioning algorithm recently drew the attention of parents and researchers after the platform's automatic captions were found inserting explicit words into children's videos.

YouTube's Automatic Speech Transcription

The video-sharing platform's automatic speech recognition (ASR) system, which generates its automatic captions, has been observed displaying words that are not suitable for very young audiences, Wired reported.

The issue was investigated by a scholar from the Rochester Institute of Technology in New York. The study covered 7,000 videos gathered from 24 of the most-subscribed YouTube channels for kids.

Of the videos sampled, 40 percent included inappropriate words in their captions, and one percent included highly inappropriate words.
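As a rough illustration of how such a tally might be computed, the Python sketch below checks caption text against a word list and reports the share of videos with at least one match. This is not the study's actual method: the word list, function names, and sample captions are all hypothetical placeholders.

```python
import re

# Hypothetical placeholder list; the study's actual word lists are not reproduced here.
FLAGGED_WORDS = {"crap", "porn"}


def flagged_words_in(caption, flagged=FLAGGED_WORDS):
    """Return the flagged words that appear as whole words in a caption."""
    tokens = re.findall(r"[a-z']+", caption.lower())
    return set(tokens) & set(flagged)


def share_of_flagged_videos(captions_by_video, flagged=FLAGGED_WORDS):
    """Fraction of videos whose caption text contains at least one flagged word."""
    if not captions_by_video:
        return 0.0
    hits = sum(
        1 for text in captions_by_video.values() if flagged_words_in(text, flagged)
    )
    return hits / len(captions_by_video)


# Made-up sample captions, for illustration only.
sample = {
    "video_a": "you should also buy porn",
    "video_b": "the crab walks on the beach",
    "video_c": "let's count to ten",
}

print(f"{share_of_flagged_videos(sample):.0%} of sample videos contain a flagged word")
```

Matching on whole words keeps a harmless word such as "crab" from being counted just because it resembles an entry on the list.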

The study examined videos as played on the standard YouTube platform rather than on the parent-controlled YouTube Kids, because the latter does not produce automatic captions. In addition, the study found that many users still rely on the main version of the platform for their children's shows.

Better-quality language models that account for a wider variety of pronunciations could improve automatic transcription on such platforms, the authors said.

Some inappropriate and offensive captions have reportedly been removed, but the company's efforts appear to be far from resolving the problem.

Powered by artificial intelligence, YouTube's automatic captions make videos more accessible. Those who benefit most are people with hearing loss, who might otherwise need human assistance.

Experts concluded that the automatic speech recognition systems used by YouTube do produce caption content that is highly inappropriate for younger viewers as they transcribe videos on the platform.


YouTube's Automated Caption Problem

A girl watches a video on YouTube on a computer on February 27, 2013, in Chisseaux near Tours, central France. (Photo: ALAIN JOCARD/AFP via Getty Images)

The findings of the examination of YouTube's ASR suggest that "inappropriate content hallucinations are far from occasional." Moreover, the algorithm frequently produces explicit words with high confidence, the experts added.

YouTube's automatic transcription works much like speech-to-text software. It processes the audio from a video and attaches a timestamp to each word so that viewers can read the captions in real time.
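The idea of timestamped words can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not YouTube's implementation: the TimedWord type, the grouping rule of four words per caption, and the sample timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One recognized word with its position in the audio (hypothetical type)."""
    text: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def group_into_cues(words, max_words=4):
    """Group consecutive timed words into caption cues of at most max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w.text for w in chunk)
        cues.append((chunk[0].start, chunk[-1].end, text))
    return cues


def format_cue(start, end, text):
    """Render one cue as a simple '[start - end] text' line."""
    return f"[{start:06.2f} - {end:06.2f}] {text}"


# Made-up timings, for illustration only.
words = [
    TimedWord("you", 0.0, 0.3),
    TimedWord("should", 0.3, 0.7),
    TimedWord("also", 0.7, 1.1),
    TimedWord("buy", 1.1, 1.6),
    TimedWord("corn", 1.6, 2.2),
]

for cue in group_into_cues(words):
    print(format_cue(*cue))
```

Because each caption is assembled from individually recognized words, a single misheard word is displayed on screen exactly as the recognizer produced it.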

The system often fails to hear the words in a video correctly. The problem is especially common with speakers who have heavy accents or unclear enunciation.

The authors of the study say the problem could be addressed with broader language models that cover alternative pronunciations of common words.

One example reviewed by Wired came from the YouTube channel 'Rob the Robot.' In a video published in 2020, the captions rendered the spoken phrase 'strong and brave like Heracles' as 'strong and rape like Heracles.'

On Ryan's World, a separate channel, the spoken line 'you should also buy corn' was captioned as 'buy porn.'

YouTube spokesperson Jessica Gaby said children under 13 should use YouTube Kids, which does not have automated captions.

The company is working to improve its automatic captions and reduce errors, Gaby told Wired.

The study's preprint is titled "Beach to Bitch: Inadvertent Unsafe Transcription of Kids' Content on YouTube."

