
Ubuntu 9.04 not so jaunty

I still love Ubuntu, but it’s hard to find much to enthuse about in the latest release, 9.04, also known as Jaunty Jackalope. As this post observes, most of the changes are under the hood, so users will not notice much difference from the previous release, Intrepid Ibex (8.10). Well, there’s faster start-up, and OpenOffice.org 3.0 – but then again, I installed OpenOffice.org 3.0 as soon as Intrepid came out, so this is not really exciting.

My own upgrade went better than the last one, but I’ve still had problems. Specifically:

  • I had to edit Grub’s menu.lst manually after the upgrade. I always have to do this, since it detects the hard drive configuration incorrectly.
  • My Adobe AIR installation was broken and had to be re-installed.
  • I’ve lost hardware graphics acceleration and desktop effects. This is a laptop with embedded Intel graphics; apparently this is a common problem, and Intel graphics support in Jaunty is a work in progress. See here for more details and an experimental suggested fix, which is not for the faint-hearted.

There are other updates, of course, and I was glad to see Mono 2.0.1 and MonoDevelop 2.0 available in the repository, for .NET development on Linux. If Jaunty is the same as before, but faster and more stable, that is no bad thing, though the shaky Intel graphics support undermines that argument.

My question: why is Canonical persevering with its policy of supposedly major releases every six months? This looks to me like a minor update; would it not be better to present it as an update to 8.10, and to focus efforts on 9.10 in October? Six-monthly releases must be a heavy burden for the team.

I don’t mean to put you off Ubuntu. It is well worth trying either as a companion or alternative to Windows and Mac.

Update:

I have fixed my desktop effects. How? First, a little more about the problem. DRI (Direct Rendering Infrastructure) was not enabled. My graphics card (from lspci -nn | grep VGA) is:

Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03)

The problem I had before was reported in Xorg.0.log as:

(EE) intel(0): [dri] DRIScreenInit failed. Disabling DRI.

I also noticed that /dev/dri/card0 did not exist on my system.

Well, I tried the technique described here. That is, I booted into an older version of the kernel, the oldest available on my system being 2.6.22-14. DRI magically started working. Then I rebooted into the latest version of the kernel, 2.6.28-11. DRI still works. So I am sorted. I’d be interested to know why this works.

Parallel Programming: five reasons for caution. Reflections from Intel’s Parallel Studio briefing.

I’m just back from an Intel software conference in Salzburg where the main topic was Parallel Studio, a new suite which adds Intel’s C/C++ compiler, debugging and profiling tools into Visual Studio. To some extent these are updates to existing tools like Thread Checker and VTune, though there are new features such as memory checking in Parallel Inspector (the equivalent of Thread Checker) and a new user interface for Parallel Amplifier (the equivalent of VTune). The third tool in the suite, Parallel Composer, comprises the compiler and libraries, including Threading Building Blocks and Intel Integrated Performance Primitives.

It is a little confusing. Mostly, Parallel Studio replaces the earlier products for Windows developers using Visual Studio, though we were told that some advanced features in products like VTune mean you might want to stick with them, or use both.

Intel’s fundamental point is that there is little value in multi-core PCs if the applications we run are unable to take advantage of them. Put another way, you can get remarkable performance gains by converting appropriate routines to use multiple threads, ideally as many threads as there are cores.
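As a concrete illustration of the kind of change involved (my sketch, not Intel’s code), the classic candidate is a loop over independent data, which OpenMP – one of the models Parallel Composer supports – can spread across all available cores with a single pragma:

    #include <cmath>
    #include <vector>

    // Each iteration is independent, so the OpenMP runtime can hand a
    // chunk of the loop to every core. Build with OpenMP enabled
    // (-fopenmp for gcc, /Qopenmp for the Intel compiler on Windows).
    void crunch(std::vector<double>& v) {
        #pragma omp parallel for
        for (long i = 0; i < (long)v.size(); ++i) {
            v[i] = std::sqrt(v[i]) * std::log(v[i] + 1.0);
        }
    }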

James Reinders, Intel’s Chief Evangelist for software products, introduced the products and explained their rationale. He is always worth listening to, and did a good job of summarising the “free lunch is over” argument and explaining Intel’s solution.

That said, there are a few caveats. Here are five reasons why adding parallelism to your code might not be a good idea:

1. Is it a problem worth solving? Users only care about performance improvements that they notice. If you have a financial analysis application that takes a while to number-crunch its data, then going parallel is a big win. If your application is a classic database forms client, it is probably a waste of time from a performance perspective. You care much more about how well your database server exploits multiple threads, because that is likely to be the bottleneck.

There is another reason to do background processing, and that is to keep the user interface responsive. This matters a lot to users. Intel said little about this aspect; Reinders told me it is categorised as convenience parallelism. Nevertheless, it is something you probably should be doing, though it requires a different approach from parallelising for performance.
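Here is a minimal sketch of the difference (mine, not Intel’s; it uses C++11’s std::async for brevity, where a 2009 codebase would more likely use raw Windows threads or a framework’s worker mechanism). The slow job runs on a background thread while the caller stays free to handle input:

    #include <cstdio>
    #include <future>
    #include <string>

    // Stands in for any long-running job: a report, a search, a file scan.
    std::string slowQuery() { return "result"; }

    int main() {
        // Launch the work on a background thread; the caller is not blocked.
        std::future<std::string> pending =
            std::async(std::launch::async, slowQuery);

        // ... a real application would keep pumping its UI event loop here ...

        // Collect the result once it is ready.
        std::printf("%s\n", pending.get().c_str());
        return 0;
    }

Note that this uses one extra thread however many cores you have: the goal is responsiveness, not throughput.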

2. Will it actually speed up your app? There is an overhead to multi-threading, as you now have to manage the threads as well as perform your calculations. The worst case, according to Reinders, is a dual-core machine, where you have all the overhead but only one additional core. If the day comes when we routinely have, say, 64 cores on our desktop or laptop, then the benefit becomes overwhelming.
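A rough back-of-envelope illustration (my numbers, not Intel’s): take Amdahl’s law and add a fixed threading overhead o, so the speedup on n cores for a workload whose parallelisable fraction is p becomes

    S(n) = 1 / ((1 - p) + p/n + o)

With p = 0.9 and o = 0.1, two cores give roughly 1.5x while 64 cores give roughly 4.7x; with p = 0.2 and the same overhead, two cores give exactly 1.0x – all of the cost, none of the benefit.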

3. Is it actually desirable on a multi-tasking operating system? Consider this: an ideally parallelised application, from a performance perspective, is one that uses 100% CPU across all cores until it completes its task. That’s great if it is the only application you are running, but what if you started four of these guys (same or different applications) simultaneously on a quad-core system? Now each application is contending with the others; there is no longer a performance benefit, and most likely the whole system will slow down. There is no perfect solution here: sometimes you want an application to go all-out and grab whatever CPU it needs to get the job done as quickly as possible, while at other times you would prefer it to run at lower priority because there are other things you care about more, such as a responsive operating system, other applications you want to use, or energy efficiency.

This is where something like Microsoft’s concurrency runtime (which Intel will support) could provide a solution. We want concurrent applications to talk to the operating system and to one another, to optimize overall use of resources. This is more promising than simply maxing out on concurrency in every individual application.

4. Will your code still run correctly? Edward Lee argues in a well-known paper, The Problem with Threads, that multi-threading is too dangerous for widespread use:

Many technologists are pushing for increased use of multithreading in software in order to take advantage of the predicted increases in parallelism in computer architectures. In this paper, I argue that this is not a good idea. Although threads seem to be a small step from sequential computation, in fact, they represent a huge step. They discard the most essential and appealing properties of sequential computation: understandability, predictability, and determinism. Threads, as a model of computation, are wildly nondeterministic, and the job of the programmer becomes one of pruning that nondeterminism. Although many research techniques improve the model by offering more effective pruning, I argue that this is approaching the problem backwards. Rather than pruning nondeterminism, we should build from essentially deterministic, composable components. Nondeterminism should be explicitly and judiciously introduced where needed, rather than removed where not needed.

I put this point to Reinders at the conference. He gave me a rather long answer, saying that it is partly a matter of using the right libraries and tools (Parallel Studio, naturally), and partly a matter of waiting for something better:

Lee articulates the dangers of threading. Did we magically fix it, or do we really know what we’re doing in inflicting this on the masses? It really comes down to determinism. If programmers make their program non-deterministic, getting out of that mess is something most programmers can’t do, and if they can, it’s horrendously expensive.

He’s right: if we stayed with Windows threads and Pthreads and programming at that level, we’re headed for disaster. What you need to see is tools and programming templates that avoid that. The evil thing is what we call shared mutable state. When you have things happening in parallel, the safest thing you can do is make them totally independent. This is one of the reasons that parallelism on servers works so well: you do lots and lots of transactions and they don’t bump into each other, or they only interface through the database.

Once we start opening up shared mutable state, encouraging threading, we set ourselves up for disaster. Parallel Inspector can help you figure out what disasters you create and get rid of them, but ultimately the answer is that you need to encourage people to use programming models like OpenMP or Threading Building Blocks. Those generally guide you away from those mistakes. You can still make them.

One of the open questions is: can you come up with programming techniques that completely avoid the problem? We do have one that we’ve just started talking about, called Ct … but I think we’re at the point now where OpenMP and Threading Building Blocks have proven that you can write code with them and get good results.
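To make the shared mutable state point concrete (my sketch, not Intel’s code): the naive way to sum an array in parallel has every thread updating one shared total, a textbook data race, whereas OpenMP’s reduction clause gives each thread a private copy and combines them at the end:

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> data(1000000, 1.0);
        double sum = 0.0;

        // reduction(+:sum) gives each thread its own private sum, merged
        // when the loop finishes, so there is no shared mutable state
        // inside the loop. Without the clause, the concurrent updates to
        // sum would be a data race.
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < (long)data.size(); ++i) {
            sum += data[i];
        }

        std::printf("sum = %.0f\n", sum);
        return 0;
    }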

Reinders went on to distinguish between three types of concurrent programming, referring to some diagrams by Microsoft’s David Callaghan. The first is explicit, unsafe parallelism, where the developer has to do it right. The second is explicit, safe parallelism. The best approach according to Reinders would be to use functional languages, but he thinks it unlikely that they will catch on in the mainstream. The third type is implicit parallelism that’s safe, where the developer does not even have to think about it. An example is the math kernel library in IPP (Intel Integrated Performance Primitives) where you just call an API that returns the right answers, and happens to use concurrency for its work.

Intel also has a project called Ct (C/C++ for Throughput), a dynamic runtime for data parallelism, which Reinders considers also falls into the implicit parallelism category.

It was a carefully nuanced answer, but proceed with caution.

5. Will your application need a complete rewrite? This is a big maybe. Intel’s claim is that many applications can be updated for parallelism with substantial benefits. A guy from Nero gave a presentation, though, and said that an attempt to parallelise one of their applications, a media transcoder, had failed because the architecture was not right, and it had to be completely redone. So I guess it depends.

This brings to mind another thing which everyone agrees is a hard challenge: how to design an application for effective parallelism. Intel has a tool in preparation called Parallel Advisor, to be part of Parallel Studio at a future date, which is meant to identify candidates for parallelism, but that will not be a complete answer.

Go parallel, or not?

None of the above refutes Intel’s essential point: that effective concurrent programming is essential to the future of computing. This is an evolutionary process though, and at this point there is every reason to be cautious rather than madly parallelising every piece of code you touch.

Additional Links

Microsoft has a handy Parallel Computing home page.

David Callaghan: Design considerations for Parallel Programming

VirtualBox is amazing, 50% faster than Virtual PC on my PC

It was only when Sun acquired it that I got round to trying VirtualBox, a free open source virtualization utility. I was immediately impressed, not least by its performance. It just felt snappy, something I’ve never been able to say about Microsoft’s Virtual PC, useful though it is. When I needed to set up a new virtual machine in order to do some Delphi 7 development, I decided to use VirtualBox rather than Virtual PC. Again, I’ve been very impressed. I thought it would be interesting to see if my perception of good performance would be verified by a test suite, so I dug out the PassMark suite and ran a few tests.

Note that both Virtual PC and VirtualBox can use Intel’s Virtualization Technology CPU extensions (AMD have similar extensions, but I’m running on an Intel Core 2 Quad). I ran PassMark on XP Pro with SP3, under both Virtual PC and VirtualBox, with hardware virtualization first enabled and then disabled. I ran it full screen, with as little as possible running on the underlying OS (Vista 32-bit). Guest additions were installed in both virtual machines, and each was given 512MB RAM. Here are the surprising (to me) results:

  • Virtual PC 2007 with hardware virtualization: 399.6
  • Virtual PC 2007 without hardware virtualization: 345.9
  • VirtualBox 1.5.6 with hardware virtualization: 542.9
  • VirtualBox 1.5.6 without hardware virtualization: 616.4

So on my machine (your results may vary) VirtualBox is faster without hardware virtualization, and more than 50% faster than the best result from Virtual PC.

I drilled into the results a little. On the CPU tests there was not a big difference; in some cases Virtual PC was ahead. On the Graphics 2D tests though, VirtualBox was dramatically faster – more than twice as fast on the GUI test, for example. It was also dramatically faster on disk I/O. For example:

Disk – Sequential Read: VirtualBox 143.4 MB per second vs Virtual PC 90.8 MB per second

Disk – Sequential Write: VirtualBox 97.4 MB per second vs Virtual PC 6.8 MB per second

I’m not surprised that this makes a big difference to perceived performance, since Windows spends much of its time reading and writing temporary files. This may also be why VirtualBox seems to start up and shut down much more quickly.

I don’t claim that my informal tests prove that VirtualBox is a faster performer in every case. Maybe there is some setting I could change that would improve Virtual PC’s speed; or maybe Virtual PC likes some hardware better than others. Still, it is a real-world experience, and enough to make me suggest that you give VirtualBox a try if you have yet to do so. By the way, both these products are free.

Finally, let me note that Vista running directly on the hardware scores a PassMark of around 1100 on this machine. Even VirtualBox is a lot slower than the real thing, as it were.

Update: If you try VirtualBox, make sure you get at least version 1.6.2 (6th June 2008), as 1.6.0 has broken networking on Windows.