I wrote back in September about why programming the GPU is going mainstream. That’s even more the case today, with Amazon’s announcement of a Cluster GPU instance for the Elastic Compute Cloud. It is also a vote of confidence for NVIDIA’s CUDA architecture. Each Cluster GPU instance has two NVIDIA Tesla M2050 GPUs installed and costs $2.10 per hour. If one GPU instance is not enough, you can use up to 8 by default, with more available on request.
GPU programming in the cloud makes sense in cases where you need the performance of a super-computer, but not very often. It could also enable some powerful mobile applications, maybe in financial analysis, or image manipulation, where you use a mobile device to input data and view the results, but cloud processing to do the heavy lifting.
One of the ideas I discussed with someone from Adobe at the NVIDIA GPU conference was to integrate a cloud processing service with PhotoShop, so you could send an image to the cloud, have some transformative magic done, and receive the processed image back.
The snag with this approach is that in many cases you have to shift a lot of data back and forth, which means you need a lot of bandwidth available before it makes sense. Still, Amazon has now provided the infrastructure to make processing as a service easy to offer. It is now over to the rest of us to find interesting ways to use it.
NVIDIA CEO Jen-Hsung Huang spoke to the press at the GPU Technology Conference and I took the opportunity to ask some questions.
I asked for his views on the cloud as a supercomputer and whether that would impact the need for local supercomputers of the kind GPU computing enables.
Although we expect more and more to happen in the cloud, in the meantime we’re going to keep buying devices with more and more solid state memory. The way to think about it is, storage is simply a surrogate for bandwidth. If we had infinite bandwidth none of us would need storage. As bandwidth improves the requirement for storage should reduce. But there’s another trend which is that the amount of data we collect is growing incredibly fast … It’s going to be quite a long time before our need for storage will reduce.
But what about local computing power, Gigaflops as opposed to storage?
Wherever there is storage, there’s GigaFlops. Local storage, local computing.
Next, I brought up a subject which has been puzzling me here at GTC. You can do GPU programming with NVIDIA’s CUDA C, which only works on NVIDIA GPUs, or with OpenCL which works with other vendor’s GPUs as well. Why is there more focus here on CUDA, when on the face of it developers would be better off with the cross-GPU approach? (Of course I know part of the answer, that NVIDIA does not mind locking developers to its own products).
The reason we focus all our evangelism and energy on CUDA is because CUDA requires us to, OpenCL does not. OpenCL has the benefit of IBM, AMD, Intel, and ourselves. Now CUDA is a little difference in that its programming approach is different. Instead of an API it’s a language extension. You program in C, it’s a different model.
The reason why CUDA is more adopted than OpenCL is because it is simply more advanced. We’ve invested in CUDA much longer. The quality of the compiler is much better. The robustness of the programming environment is better. The tools around it are better, and there are more people programming it. The ecosystem is richer.
People ask me how do we feel about the fact that it is proprietary. There’s two ways to think about it. There’s CUDA and there’s Tesla. Tesla’s not proprietary at all, Tesla supports OpenCL and CUDA. If you bought a server with Tesla in it, you’re not getting anything less, you’re getting CUDA more. That’s the reason Tesla has been adopted by all the OEMs. If you want a GPU cluster, would you want one that only does OpenCL? Or does OpenCL and CUDA? 80% of GPU computing today is CUDA, 20% is OpenCL. If you want to reach 100% of it, you’re better off using Tesla. Over time, if more people use OpenCL that’s fine with us. The most important thing is GPU computing, the next most important thing to us is NVIDIA’s GPUs, and the next is CUDA. It’s way down the list.
Next, a hot topic. Jen-Hsun Huang explained why he announced a roadmap for future graphics chip architectures – Kepler in 2011, Maxwell in 2013 – so that software developers engaged in GPU programming can plan their projects. I asked him why Fermi, the current chip architecture, had been so delayed, and whether there was good reason to have confidence in the newly announced dates.
He answered by explaining the Fermi delay in both technical and management terms.
The technical answer is that there’s a piece of functionality that is between the shared symmetric multiprocessors (SMs), 236 processors, that need to communicate with each other, and with memory of all different types. So there’s SMs up here, and underneath the memories. In between there is a very complicated inter-connecting system that is very fast. It’s nearly all wires, dense metal with very little logic … we call that the fabric.
When you have wires that are next to each other that closely they couple, they interfere … it’s a solid mesh of metal. We found a major breakdown between the models, the tools, and reality. We got the first Fermi back. That piece of fabric – imagine we are all processors. All of us seem to be working. But we can’t talk to each other. We found out it’s because the connection between us is completely broken. We re-engineered the whole thing and made it work.
Your question was deeper than that. Your question wasn’t just what broke with Fermi – it was the fabric – but the question is how would you not let it happen again? It won’t be fabric next time, it will be something else.
The reason why the fabric failed isn’t because it was hard, but because it sat between the responsibility of two groups. The fabric is complicated because there’s an architectural component, a logic design component, and there’s a physics component. My engineers who know physics and my engineers who know architecture are in two different organisations. We let it sit right in the middle. So the management lesson learned – there should always be a pilot in charge.
Huang spent some time discussing changes in the industry. He identifies mobile computing “superphones” and tablets as the focus of a major shift happening now. Someone asked “What does that mean for your Geforce business?”
I don’t think like that. The way I think is, “what is my personal computer business”. The personal computer business is Geforce plus Tegra. If you start a business, don’t think about the product you make. Think about the customer you’re making it for. I want to give them the best possible personal computing experience.
Tegra is NVIDIA’s complete system on a chip, including ARM processor and of course NVIDIA graphics, aimed at mobile devices. NVIDIA’s challenge is that its success with Geforce does not guarantee success with Tegra, for which it is early days.
The further implication is that the immediate future may not be easy, as traditional PC and laptop sales decline.
The mainstream business for the personal computer industry will be rocky for some time. The reason is not because of the economy but because of mobile computing. The PC … will be under disruption from tablets. The difference between a tablet and a PC is going to become very small. Over the next few years we’re going to see that more and more people use their mobile device as their primary computer.
[Holds up Blackberry] There’s no question right now that this is my primary computer.
The rise of mobile devices is a topic Huang has returned to on several occasions here. “ARM is the most important CPU architecture, instruction set architecture, of the future” he told the keynote audience.
Clearly NVIDIA’s business plans are not without risk; but you cannot fault Huang for enthusiasm or awareness of coming changes. It is clear to me that NVIDIA has the attention of the scientific and academic community for GPU computing, and workstation OEMs are scrambling to built Tesla GPU computing cards into their systems, but transitions in the market for its mass-market graphics cards will be tricky for the company.
Update: Huang’s comments about the reasons for Fermi’s delay raised considerable interest as apparently he had not spoken about this on record before. Journalist Nico Ernst captured the moment on video: