Software with artificial intelligence is getting surprisingly good at having conversations, winning board games, and making art. But what about writing software itself? In a new paper, researchers at Google DeepMind say their AlphaCode program can keep up with the average human coder in standardized programming contests.
The researchers published their study in the journal Science.
Google's DeepMind Artificial Intelligence Solves Programming Challenges
Gizmodo said AlphaCode performed "approximately as well as a human" in the test and solved natural-language problems it had never seen before. It did this by predicting code segments and generating millions of possible solutions, which it then narrowed down to a maximum of 10 submissions. The researchers say all of these solutions were produced "without any built-in knowledge about how computer code works."
AlphaCode ranked in the top 54.3% on average in simulated evaluations of recent coding competitions on the Codeforces competitive programming platform, when limited to ten submissions per problem. Of the problems it solved, 66% were solved with its first submission.
That might not sound like much, especially compared with how well models perform against humans in complex board games, but the researchers say that doing well in coding competitions is especially hard. AlphaCode had to understand coding problems described in natural language and then "reason" about problems it had never seen before; it couldn't just memorize bits of code. AlphaCode solved problems it hadn't encountered before, and the researchers say they found no evidence that the model simply copied core logic from its training data. They argue these qualities make AlphaCode's performance a "big step forward."
Rising to the Challenge
Ars Technica said the system could suggest many possible solutions in code, but that approach created a significant problem of scale, because much of the code it produced on its own was not very good. Over 40% of the suggested solutions either used up all the memory of the system it was tested on or failed to finish in a reasonable amount of time.
One way to weed out the bad code was to check each candidate against the example cases given in the challenge and see whether a program made by AlphaCode could pass that simple test. This eliminated almost all of AlphaCode's initial suggestions, since only about 1% of the generated code passed.
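To make that filtering step concrete, here is a minimal sketch in Python, assuming candidates are Python source strings run through a small subprocess harness. The function name, the harness, and the two-second timeout are illustrative assumptions, not DeepMind's actual code.

import subprocess

def passes_examples(candidate_source, examples, timeout_s=2.0):
    """Return True if the candidate program reproduces every example output."""
    for stdin_text, expected_stdout in examples:
        try:
            result = subprocess.run(
                ["python3", "-c", candidate_source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,  # reject candidates that run too long
            )
        except subprocess.TimeoutExpired:
            return False
        if result.returncode != 0 or result.stdout.strip() != expected_stdout.strip():
            return False
    return True

# Keep only the candidates that pass the examples from the problem statement:
# survivors = [c for c in candidates if passes_examples(c, examples)]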
To reach the next level of filtering, DeepMind's staff relied on a simple observation: solutions that work tend to resemble one another, while incorrect programs produce essentially random outputs. That made it possible to spot related code in the vast sea of candidates. The system grouped programs that gave the same results on a set of inputs, found the ten largest groups, and chose one example from each, as in the sketch below.
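Here is a minimal sketch of that clustering idea, assuming a helper run_program(source, stdin_text) that executes a candidate on one input and returns its output (for instance, the subprocess harness above). The names and structure are hypothetical illustrations, not the paper's implementation.

from collections import defaultdict

def pick_submissions(candidates, test_inputs, run_program, max_submissions=10):
    """Group candidates by their outputs on shared inputs; pick one per large group."""
    clusters = defaultdict(list)
    for source in candidates:
        # Programs that behave identically on every input share a signature.
        signature = tuple(run_program(source, stdin_text) for stdin_text in test_inputs)
        clusters[signature].append(source)
    # Working solutions tend to form large clusters; broken ones scatter randomly.
    largest_first = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in largest_first[:max_submissions]]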
A little under a third of the time, this process produced a working, successful program. But humans also only sometimes manage to write successful code for these challenges, so the system placed in the top 54 percent of competitors; in other words, slightly more than half of the people who enter these contests could still beat the AI. The paper's authors said this performance corresponds to a novice programmer with a few months to a year of training.
Check out more news and information on Artificial Intelligence and Technology in Science Times.