Google CEO Sundar Pichai was obviously excited when he spoke to developers about a blockbuster result from his machine-learning lab earlier this month. Researchers had figured out how to automate some of the work of crafting machine-learning software, something that could make it much easier to deploy the technology in new situations and industries.
But the project had already gained a reputation among AI researchers for another reason: the way it illustrated the vast computing resources needed to compete at the cutting edge of machine learning.
A paper from Google’s researchers says they simultaneously used as many as 800 of the powerful and expensive graphics processors that have been crucial to the recent uptick in the power of machine learning (see “10 Breakthrough Technologies 2013: Deep Learning”). They told MIT Technology Review that the project had tied up hundreds of the chips for two weeks solid—making the technique too resource-intensive to be more than a research project even at Google.
A coder without ready access to a giant collection of GPUs would need deep pockets to replicate the experiment. Renting 800 GPUs from Amazon’s cloud computing service for just a week would cost around $120,000 at the listed prices.
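As a rough check on that figure, the arithmetic can be sketched in a few lines of Python. The only inputs taken from the article are the 800 GPUs, the one-week rental, and the roughly $120,000 total. The per-GPU hourly rate is not stated in the article, so the sketch backs it out from those numbers rather than quoting a price list. The function names are illustrative, and the two-week figure is simply an extrapolation at that implied rate.

```python
# Back-of-envelope GPU rental arithmetic using only the article's figures.
GPUS = 800                  # GPUs cited in the article
HOURS_PER_WEEK = 7 * 24     # 168 hours in one week
QUOTED_COST_USD = 120_000   # approximate one-week cost cited in the article


def gpu_hours(gpus: int, hours: float) -> float:
    """Total GPU-hours consumed by `gpus` devices running for `hours`."""
    return gpus * hours


def implied_hourly_rate(total_cost: float, gpus: int, hours: float) -> float:
    """Per-GPU hourly rate implied by a total bill (an inferred value, not a listed price)."""
    return total_cost / gpu_hours(gpus, hours)


def rental_cost(gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Cost of renting `gpus` devices for `hours` at a flat hourly rate."""
    return gpus * hours * rate_per_gpu_hour


week_gpu_hours = gpu_hours(GPUS, HOURS_PER_WEEK)
rate = implied_hourly_rate(QUOTED_COST_USD, GPUS, HOURS_PER_WEEK)
# Assumption: the same flat rate applies if the job ran for two weeks instead of one.
two_week_cost = rental_cost(GPUS, 2 * HOURS_PER_WEEK, rate)

print(f"GPU-hours in one week: {week_gpu_hours:,.0f}")
print(f"Implied rate per GPU-hour: ${rate:.2f}")
print(f"Two weeks at the same rate: ${two_week_cost:,.0f}")
```

By this estimate, one week on 800 GPUs comes to 134,400 GPU-hours, which implies a rate of a little under a dollar per GPU-hour.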
Feeding data into deep-learning software to train it for a particular task is much more resource-intensive than running the system afterwards, but that still takes significant oomph. “Computing power is a bottleneck right now for machine learning,” says Reza Zadeh, an adjunct professor at Stanford University and founder and CEO of Matroid, a startup that helps companies use software to identify objects like cars and people in security footage and other video.
The sudden thirst for new power to drive AI comes at a time when the computing industry is adjusting to the loss of two things it has relied on for 50 years to keep chips getting more powerful. One is Moore’s Law, which forecast that the number of transistors that could be fitted into a given area of a chip would double every two years. The other is a phenomenon called Dennard scaling, which describes how the amount of power that transistors use scales down as they shrink.
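To get a feel for what a doubling every two years adds up to, here is a minimal Python sketch of the compounding. It assumes an idealized, perfectly regular two-year doubling sustained for the full 50 years, which is a simplification of how Moore’s Law is described above and not a claim about any particular chip. The function name is illustrative.

```python
def moore_growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Multiplicative growth in transistor density after `years`,
    assuming an idealized doubling every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# Fifty years of two-year doublings is 25 doublings in a row.
print(f"{moore_growth_factor(50):,.0f}x")
```

Under that assumption, 25 successive doublings multiply transistor density by 2^25, or about 33.5 million.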