DeepMind's human-bashing AlphaGo AI is now even stronger (GOOG)

Go is a Chinese board game that dates back thousands of years. There are more possible Go positions than there are atoms in the universe, so it has been incredibly tough for machines to crack.
The AlphaGo AI agent that has beaten Go world champions Lee Sedol and Ke Jie over the last 18 months has now been upgraded.
Unlike the previous versions of AlphaGo, the new "AlphaGo Zero" AI learns how to play Go without any human data. This is a major breakthrough in the field of AI.

DeepMind, the London-based artificial intelligence (AI) lab acquired by Google for a reported £400 million, announced on Wednesday that it has significantly improved its most famous AI agent: AlphaGo.

AlphaGo, an algorithm that's made headlines for mastering the ancient Chinese board game of Go and defeating some of the best human players in the world, has been modified and reprogrammed into a new AI called AlphaGo Zero.

DeepMind CEO Demis Hassabis told journalists at Google's UK headquarters in King's Cross, London, that AlphaGo Zero is "much stronger" than AlphaGo.

Go is a simple game yet highly complex at the same time. There are only a few rules, but the number of possible board positions that can arise is astronomically large: it is considerably higher than the number of atoms in the universe.
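A rough upper bound shows the scale. Each of the 361 points on a 19×19 board is empty, black or white, so there are at most 3^361 board arrangements. This sketch assumes the commonly quoted estimate of about 10^80 atoms in the observable universe. Its count also includes illegal arrangements, so the true number of legal positions is lower (roughly 2 × 10^170), but that is still vastly larger than 10^80.

```python
import math

POINTS = 19 * 19        # intersections on a standard Go board
STATES_PER_POINT = 3    # each point is empty, black or white

# Upper bound: every colouring of the board, legal or not.
upper_bound = STATES_PER_POINT ** POINTS
# Commonly quoted order-of-magnitude estimate (assumption, not a precise figure).
atoms_estimate = 10 ** 80

digits = len(str(upper_bound))
print(f"3^361 has {digits} digits (about 10^{POINTS * math.log10(3):.1f})")
# prints: 3^361 has 173 digits (about 10^172.2)
print(upper_bound > atoms_estimate)  # prints: True
```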

The original AlphaGo was impressive, but it's no match for AlphaGo Zero. After just three days of training, AlphaGo Zero beat the version of AlphaGo that defeated world champion Lee Sedol 4-1 in Seoul, South Korea, in March 2016, winning by 100 games to 0. And after 40 days of training, it beat "AlphaGo Master," which successfully took on current world champion Ke Jie in China in May.

Hassabis said that AlphaGo Zero has essentially acquired thousands of years of human knowledge during a period of just a few days, while also discovering new knowledge, Go strategies, and creative new Go moves.

AlphaGo Zero taught itself how to play Go without any human input

The main difference between the old AlphaGo AIs and the new one is that the old versions learned how to play Go from human data and the new one doesn't.

All previous versions of AlphaGo started by training on human data (amateur and professional Go matches) downloaded from online sites. They studied thousands of games and were told which moves human experts would make in given positions. AlphaGo Zero, by contrast, doesn't use any human data whatsoever. Instead, it has learned how to play Go by itself, entirely through self-play.

David Silver, DeepMind's lead AlphaGo researcher, explained how AlphaGo Zero learns completely tabula rasa (from scratch).

"It uses a novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher," said Silver, who met Hassabis while completing his undergraduate degree in computer science at the University of Cambridge. "The idea is that it starts off with a neural network that knows nothing about the game and it plays thousands of games against itself. Each move, what it does, is it combines this neural network with a powerful search algorithm and it uses that to actually pick the next move to play.

"And at the end of each of these games it actually trains a new neural network. It improves its neural network to predict the moves which AlphaGo Zero itself played and also predict the winner of these games. When it does this it actually produces an even more powerful neural network, which leads to another new iteration of the player. So we end up with a new version of AlphaGo Zero that's even stronger than what came before and in turn as this process is repeated, it gives rise to ever better quality data which is used to train even better neural networks, and the process repeats."
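Silver's description maps onto a simple loop: play games against yourself using search guided by the current network, train the network to predict the moves that were played and who won, then repeat with the improved network. The sketch below illustrates that loop on a toy game rather than Go, and every concrete choice in it is a hypothetical assumption made for illustration, not DeepMind's implementation: the game (a pile of 10 stones, each turn take 1 to 3, whoever takes the last stone wins), the lookup table `TableNet` standing in for a neural network, the one-move lookahead in `search` standing in for AlphaGo Zero's Monte Carlo tree search, and the learning rate `LR`.

```python
from typing import Dict, List, Tuple

# Hypothetical toy game (assumption): take 1-3 stones, whoever takes the last stone wins.
PILE = 10
MOVES = (1, 2, 3)
LR = 0.5  # hypothetical learning rate for the lookup-table updates


def legal_moves(stones: int) -> List[int]:
    return [m for m in MOVES if m <= stones]


class TableNet:
    """Lookup table standing in for a neural network (assumption): a policy head
    (move probabilities) and a value head (expected outcome in [-1, 1] for the
    player about to move). It starts off knowing nothing: uniform moves, value 0."""

    def __init__(self) -> None:
        self.value: Dict[int, float] = {s: 0.0 for s in range(1, PILE + 1)}
        self.policy: Dict[int, Dict[int, float]] = {
            s: {m: 1 / len(legal_moves(s)) for m in legal_moves(s)}
            for s in range(1, PILE + 1)
        }


def search(net: TableNet, stones: int) -> int:
    """One-move lookahead standing in for tree search: score each move by the
    network's value of the position it leaves for the opponent."""
    def score(m: int) -> float:
        left = stones - m
        # Taking the last stone wins outright; otherwise the opponent moves from `left`.
        return 1.0 if left == 0 else -net.value[left]
    return max(legal_moves(stones), key=score)


def self_play(net: TableNet, start: int) -> List[Tuple[int, int, float]]:
    """Play one game against itself. Returns (stones, move, outcome) records, where
    outcome is +1 if the player who made that move went on to win, else -1."""
    history = []  # (stones, move, player) for each turn
    stones, player = start, 0
    while stones > 0:
        move = search(net, stones)
        history.append((stones, move, player))
        stones -= move
        player = 1 - player
    winner = 1 - player  # the player who took the last stone
    return [(s, m, 1.0 if p == winner else -1.0) for s, m, p in history]


def train(net: TableNet, data: List[Tuple[int, int, float]]) -> None:
    """Nudge the policy toward the moves search chose and the value toward the winner."""
    for stones, move, outcome in data:
        net.value[stones] += LR * (outcome - net.value[stones])
        for m in net.policy[stones]:
            target = 1.0 if m == move else 0.0
            net.policy[stones][m] += LR * (target - net.policy[stones][m])


net = TableNet()
for _ in range(30):
    games = []
    # Start one game from every pile size so that every position gets visited.
    for start in range(1, PILE + 1):
        games.extend(self_play(net, start))
    # Train only after the whole batch, so each iteration is played by a fixed player.
    train(net, games)

best = {s: max(net.policy[s], key=net.policy[s].get) for s in range(1, PILE + 1)}
print(best[PILE])  # prints 2: leave the opponent 8 stones, a multiple of 4
```

Each round of training produces a new table, and therefore a new player whose games supply better training data for the next round, echoing the cycle Silver describes. Unlike AlphaGo Zero, this sketch plays greedily with no exploration noise, which is why it seeds a game from every pile size instead of relying on randomness to reach every position. In this toy game the winning strategy is to leave the opponent a multiple of 4 stones, and the learned policy settles on exactly those moves.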

Discovering New Knowledge

This isn't the first time AI researchers have programmed an algorithm to learn without human data. In August, OpenAI, the AI research firm backed by Elon Musk and others with $1 billion, revealed that it had created an AI that taught itself how to play the computer game "Dota 2" (short for "Defense of the Ancients") without any human input.

AlphaGo Zero also uses an order of magnitude less compute power than previous versions of AlphaGo, suggesting that algorithmic advances can drive more progress than extra compute power or data.

AlphaGo Zero is a major breakthrough that has earned DeepMind's research another spot in the scientific journal Nature.

Getting machines to become "superhuman" at certain tasks without feeding them human data has been a long-standing challenge in the AI research community. Approaches that rely on human data are held back when that data is too expensive, too unreliable, or simply unavailable.

Silver, whom Business Insider once crowned the unsung hero of Google DeepMind, added: "By not using this human data, features, or expertise in any fashion, we've actually removed the constraints of human knowledge. It's able to therefore create knowledge for itself from first principles, from a blank slate, and work out its own strategies, and its own novel ways of playing the game. This enables it to be much more powerful than previous versions."

When asked how many Alphabet dollars DeepMind used to fund all of its AlphaGo work, Hassabis said it was hard to quantify before admitting that the figure is "probably quite scary." Around 15 of DeepMind's top people, who are likely to be on six- or even seven-figure salaries, have worked on AlphaGo full-time, and the company has used a lot of Google's compute power.

Machines are still nowhere near what Hollywood portrays

While the AlphaGo Zero breakthrough is impressive, it's worth noting that researchers are still a long way from the AIs depicted in Hollywood films like "Ex Machina" or "Her". AI agents today can typically excel at one task (such as a game), but they'd struggle to do multiple tasks at the same time, especially if those tasks are in different domains.

DeepMind is, however, now looking at how it can apply algorithms built on the same principles as AlphaGo Zero to real scientific challenges like protein folding, reducing energy consumption, searching for new materials, or discovering new drugs.

"We're trying to build general purpose learning algorithms here and this is just one step towards that, but quite an exciting step," said Hassabis during the press briefing. "A lot of the AlphaGo team are now onto other projects and trying to apply this technology to other domains."