The world is now in the midst of a digital revolution built on a foundation of invisible 1's and 0's known as bits. Today, information and knowledge flow around the world on a stream of 1's and 0's, and it is widely assumed that artificial intelligence (AI) systems prefer to speak in binary code.
This premise is rarely questioned, but a recent study from Columbia University's School of Engineering and Applied Science is challenging this belief.
Mechanical Engineering Professor Hod Lipson and his Ph.D. student Boyuan Chen described in their study, entitled "Beyond Categorical Label Representations For Image Classification," that AI systems might actually favor sound files of human language over numerical data labels, reaching a higher level of performance.
In a side-by-side comparison, the Columbia Engineering press release reported, the researchers found that neural networks trained with sound files of human voices as labels performed better at identifying images than networks programmed with traditional binary labels.
AI Systems Trained With Human Voices
The researchers said that understanding why AI performs better this way has significant implications for how neural networks are programmed.
Binary codes are compact and precise at conveying information, while human voices are tonal and analog. Because binary codes digitize data so efficiently, programmers rarely use anything else when developing a neural network.
Renowned roboticist Lipson and Ph.D. student Chen had a hunch that neural networks might not be reaching their full potential. They believed the networks could improve, and might even learn faster and better, if they were trained to recognize animals using human voices uttering specific words, Tech Xplore reported.
Together with two other students, Yu Li and Sunand Raghupathi, they set up a controlled experiment with two new neural networks to see whether training them to recognize ten objects across 50,000 images would support their hypothesis.
One AI system was trained the traditional way, fed a data table of binary-coded labels for images of animals or objects, while the other was trained with human voices saying the word for the depicted animal or object.
When tested, the first AI system spat out its answer as a series of ten 1's and 0's, while the experimental AI produced a voice saying the name of the image, although initially it sounded like a garble. Both neural networks correctly identified the images 92% of the time.
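To make the two label schemes concrete, here is a minimal Python sketch, not the authors' code: it contrasts a traditional one-hot (binary) label with a dense audio-like label, and shows how a voice-style output can be mapped back to a class by nearest match. The class names and the 1,024-sample "waveform" are illustrative assumptions; in the study the audio labels were real recordings of spoken words.

```python
import numpy as np

# Ten example classes; the study used ten objects across 50,000 images.
CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

def one_hot_label(class_index, num_classes=10):
    """Traditional categorical label: a vector of 0's with a single 1."""
    label = np.zeros(num_classes)
    label[class_index] = 1.0
    return label

def audio_label(class_index, samples_per_label=1024, seed=0):
    """Stand-in for a spoken-word label: a dense, waveform-like vector.
    The study used recorded human voices; here we fake a signal with a
    deterministic pseudo-random generator so the example is runnable."""
    rng = np.random.default_rng(seed + class_index)
    return rng.standard_normal(samples_per_label)

def decode_audio_output(output, num_classes=10):
    """Map an audio-like network output to a class by nearest label."""
    targets = np.stack([audio_label(i) for i in range(num_classes)])
    dists = np.linalg.norm(targets - output, axis=1)
    return int(np.argmin(dists))

cat_onehot = one_hot_label(CLASSES.index("cat"))  # [0 0 0 1 0 0 0 0 0 0]
cat_audio = audio_label(CLASSES.index("cat"))     # 1024-sample signal
```

The key design difference the sketch highlights: the binary-trained network's target is a sparse 10-digit code, while the voice-trained network must reproduce a much richer, high-dimensional signal, which is the distinction the researchers were probing.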
AI Systems Performed Better With Human Voices
In another side-by-side comparison, the researchers fed the networks only 2,500 photographs for training and tested them again. A neural network typically performs poorly when training data is sparse, and that is what happened to the numerically trained network in this second experiment.
According to Science Daily, the neural network trained with binary codes accurately identified the animals and objects in the photos only 35% of the time, in contrast to the 70% accuracy rate of the experimental neural network.
Intrigued, the researchers repeated the experiment on another classic AI image-recognition challenge, adding more difficult images that an AI would have a hard time understanding.
Still, the neural network trained with human voices fared better, with 50% accuracy compared to the 20% achieved by the numerically trained network.
The researchers suggested that human language, unlike binary code, has gone through thousands of years of optimization, which would make it perfect sense that AI systems perform better with the voice-trained neural network than with the numerically trained one.