Seeds of Discontent
I like reading Tom’s Hardware Guide. I liked it better in the site’s early days when it wasn’t so JavaScript and Flash heavy and the articles were idiomatically ‘German-English’. A lot happens in twelve or so years. Still, Tom’s is the best source of information on the web.
I read a CPU benchmark this morning which got me wondering whether it would be possible to benchmark on any operating system but Windows. I concluded that for the kind of benchmarks seen on Tom’s: not really.
Many (most?) of the benchmarks rely on software only available on Windows. The platforms are varied while holding the software and operating system constant. This makes sense as Tom’s Hardware is, well, primarily a hardware site.
But what could we learn by holding the hardware constant and varying the operating systems? A lot, I believe.
One point I’ve noticed in reading Tom’s benchmarks over the years is the increasingly predictable nature of the results. Motherboards often perform within a few percentage points of each other. There is some variance among different CPUs, but it is rarely unpredictable. The conclusions often center on cost-performance comparisons to see if the premium parts are justified.
What really motivated me to write this blog entry was a video card benchmark. The two major vendors (AMD née ATI, Nvidia) have long been locked in fierce battle. The benchmark seems to keep the software constant and vary only the video card hardware, but the lines are not so cleanly drawn. Each video card has a unique architectural philosophy and design. Software modules are written to detect and take special advantage of these differences in video cards.
I argue that a systems approach might make for an interesting benchmark and would likely yield surprising results. Furthermore, such a benchmark would spark the desktop religious wars that have subsided in recent years (e.g., 68000 vs x86, PowerPC vs x86, IPX vs TCP/IP, RLL vs ESDI, Word vs. WordPerfect, Microsoft vs. Novell …). It seems the good fight has now moved on to smartphones and tablets. The desktop market has become docile.
My aim is to whack the hive. This is my vision.
1. This benchmark has a name: “Seeds of Discontent”
Why this name? It fits. How?
Ah. A little about me. My first job out of college was writing assembly language graphics routines used in medical imaging. On a 12MHz 80286. This introduced me to graphics hardware leading to a job at National Semiconductor (yes, National once developed graphics chips) and a deeper understanding of PC architecture. In the late nineties, I moved on to a small consumer electronics startup and eventually into pure software. I’ve seen the industry from several perspectives.
The early PC days placed importance on CPU speed for better performance and hardware integration (combining many chips into a single chip) for lower cost. The early days are gone and the funeral was the death of Comdex which happened a few years before the last Comdex. Those last few Comdex shows were zombies and those of you who were there know what I mean.
Nowadays, the big advances in hardware don’t spur the kind of religious fervor of old. They evoke a semi-interested “Oh, that’s kinda cool” response (e.g., PATA to SATA, HDD to SSD, dual core to quad core to six core). Interesting but not controversial.
This isn’t to say the hardware designers are sleeping. They are not. It’s just that hardware advancements alone won’t produce the kinds of dramatic performance improvements year-over-year that we saw during the eighties and nineties. The kinds of hardware advances we now see require specialized software to take advantage of those advances. And the industry knows it.
The greatest potential for seeing surprising results in a benchmark is not in swapping out CPUs or mainboards. The greatest potential lies in swapping out the operating systems. If there is one scrap of religiosity left in the industry, it’s in operating systems. As it happens, the OS is also the component that now has the biggest potential to influence the industry.
Which brings me to my point. I believe the right set of benchmark tests could pit OS vs OS and spur the kind of religious spat that leads to real progress. With peace breaking out all over the industry, whipping up a bit of OS hooliganism might just whip up a real revolution. As David W. once told me, “Man’s best friend is his dogma.” Today, there is nothing more religious in the industry than operating systems.
2. “Seeds of Discontent” has a mission: I see you, OS
Chicken and egg.
Multicore CPUs and graphics engines both require specialized software to take advantage of the hardware. But today’s changing hardware is a moving target. It’s hard for a small software firm to write specialized software for an ever-changing platform. But without software running on a platform (e.g., graphics), there is no “tie down” for a hardware vendor to stick with a stable API (I’m looking at you, Nvidia and AMD).
In the multicore CPU arena, it takes a partner in the OS vendor to take full advantage of the multicore architecture. This isn’t to say multithreaded software is held back by today’s operating systems, but rather that today’s operating systems don’t do enough to create fertile ground for multithreaded/multicore software development. (Yeah, the penguinistas are going ape-shit about now but I stick with my proposition.)
The real problem, as I see it, is that it takes a coordinated effort to make the necessary software development changes. Nobody wants to spend their money first. Nobody. Especially those with no money (the solo developer with no resources other than time).
Disclaimer: I am a macinista.
Apple’s Snow Leopard (I have no information on Lion) arrived with much fanfare about Grand Central Dispatch, LLVM, OpenCL, blah blah blah and other multicore magic. I love the candy-coated icons for Core-X. But how has that translated into better software support for modern hardware? You don’t hear much about it (whether or not it has happened).
“Seeds of Discontent” is a set of benchmarks that exposes an adequacy gap, so that each OS has to buck up and meet the challenge of the others. Really. Can anyone name a benchmark today by which one OS can embarrass another? No.
The real value add an OS can deliver is hardware abstraction. (Unless you are the dominant OS, Microsoft, software developers are undermotivated to deliver software customized for specialized hardware under the OS.) I want OS vendors motivated to deliver better performance with cost-effective effort for software vendors. Not exclusively but in a multi-vendor environment.
Benchmarks give a measurable performance comparison between operating systems. And the operative word here is “System”. Given a fixed hardware platform, how do the various systems perform? It’s not which comes first, “chicken or egg”; what comes first is the benchmark and the “gap of shame.” Nobody wants to be last. Nobody.
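To make the “gap of shame” concrete, here is a minimal sketch of how it might be computed. The function name `gap_of_shame` and the OS labels are my own illustration, and the timings are made-up placeholders, not measurements.

```python
def gap_of_shame(seconds_by_os):
    """Return each OS's slowdown relative to the fastest one, in percent."""
    best = min(seconds_by_os.values())
    return {
        name: round(100.0 * (seconds - best) / best, 1)
        for name, seconds in seconds_by_os.items()
    }


# Placeholder timings in seconds for the same job on the same hardware.
# These numbers are invented for illustration only.
example = {"OS A": 120.0, "OS B": 100.0, "OS C": 110.0}

for name, gap in sorted(gap_of_shame(example).items(), key=lambda kv: kv[1]):
    print(f"{name}: {gap}% behind the leader")
```

The leader always scores 0%, so the number everyone else sees is exactly how far they have to go.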
3. Who?
Comparing operating systems presupposes a fixed hardware platform (I say). It also presupposes running the same software on all systems. Technically, that is impossible. Well, impossible except in a systems perspective.
Software makes OS calls to handle many functions. Mostly this is to abstract the hardware details from the application. What’s needed then is software that has been compiled for each of the target systems. Since I’m interested in system performance (i.e., what you see when you use the system), I don’t care about how the software is written. If a package is not optimized to use the special benefits offered by the OS, I don’t care. In my world, the experience is diminished.
So the one thing that is constant in a benchmark is the hardware. This isn’t such a problem for Linux and Windows but does present a problem for OS X. I want OS X included. Here lies a fork in the road. Does the benchmark use an Apple computer and install every OS on that machine? Or does the benchmark use a white-box PC and install every OS on that machine?
I vote for the latter. This means that the only way to make a proper benchmark is to build a hackintosh. This presents a problem since Apple forbids it. I recommend, then, including a “Brand-X” OS. Of course, the people actually performing the benchmark cannot comment but I would want Brand-X to be OS X, ostensibly the latest shipping version at the time of the benchmark. Of course, Brand-X could be anything. Maybe even Solaris ;^). But what about the other operating systems?
Brand-X
Windows
Ubuntu
Fedora
4. What?
Operating systems alone do not a benchmark make. What software applications could be used for the tests? Tom’s uses software that’s only available on Windows so those won’t work. I give here a short candidate list. This isn’t a definitive list but a starting point for discussion. All packages are available for compilation and execution on all platforms.
4.1 inkscape: complex vector graphics generation
4.2 gimp: complex bit image manipulation
4.3 blender: complex 3D image manipulation, rendering. (Could we get the raw sources to render the entirety of Sintel?)
4.4 aqsis renderer: another rendering engine
4.5 brl-cad: yet another rendering engine
4.6 handbrake: DVD ripping
4.7 ffmpeg: audio transcoding
4.8 R, rgl: converting data into visualization (e.g., httpd logfiles)
4.9 activemq: build with unit tests (requires java, maven)
4.10 tesseract-ocr: OCR, convert scanned text to text files
4.11 nanoc: convert text file + gutenberg to static HTML site
4.12 nginx: file server of static assets (see 4.11)
4.13 quake (II, 4, spasm): compile, performance benchmark
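For any of the candidates above, the measurement itself could be as simple as timing the same command on each OS. The sketch below is one possible harness, not a finished tool: `time_command` is a name I made up, and the workload shown is only a stand-in (an empty Python process) to be swapped for a real render, transcode or build.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing run raise instead of being timed silently.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload: start a Python interpreter that does nothing.
    # A real run would substitute one of the packages listed above.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"{sys.platform}: median {elapsed:.3f} s")
```

Taking the median of a few runs is a design choice to dampen one-off hiccups such as a cold disk cache; a serious benchmark would want more runs and a warm-up pass.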
Conclusion: Will this go anywhere? I hope so. But I believe for it to come alive requires a large community to care about it. For them to care about it requires a benchmark that defines a tangible result. The difference between 63 and 68 frames per second in some game that only runs on one platform is largely irrelevant; it only says that if you spend a few hundred dollars more, you get a few more frames per second.
What would make sense? Well, for example, how many frames per second could one render from the raw source files for the short film Sintel at HD 1080p? My guess is that right now it is under one frame per second. That sets a milestone. I really don’t know. Maybe it is greater than 1 FPS on a desktop machine running a quad-core CPU and some sort of GPU acceleration. Then again, maybe not. But there will be a price point for that. So maybe a better benchmark is $US/FPS or €/FPS (that is, cost of the system over frames per second).
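The cost-per-FPS figure is simple arithmetic; a small sketch, with a hypothetical function name and placeholder numbers that are not real prices or measurements:

```python
def cost_per_fps(system_cost, frames_rendered, seconds_elapsed):
    """Return the cost of the system divided by its achieved frames per second."""
    if frames_rendered <= 0 or seconds_elapsed <= 0:
        raise ValueError("frames_rendered and seconds_elapsed must be positive")
    fps = frames_rendered / seconds_elapsed
    return system_cost / fps


# Placeholder values: a 1500-unit system that renders 30 frames in 60 seconds
# runs at 0.5 FPS, so it costs 3000 currency units per FPS.
print(cost_per_fps(1500.0, 30, 60.0))
```

Lower is better, and the currency does not matter as long as every system in the comparison is priced in the same one.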
Another benchmark that puts pressure on the OS could be quake (pick your poison/version). If the only objective were FPS, then I’m confident that all OS versions could (with effort) come out about the same when running on the same platform. However, a benchmark which relied upon the OS (e.g., OpenGL, DirectX) for rendering would provide a more apples-to-apples comparison. The astute reader would immediately recognize that DirectX has an advantage in games. That’s the point. If non-Windows operating systems hope to compete, they have to compete with the experience of games on DirectX.
This extends to the other benchmarks as well. Apple’s (remember, I’m a fan) deployment of Grand Central Dispatch, OpenCL and LLVM in Snow Leopard was exciting but I want to see it make its way to real software. I also use CentOS on my servers so I have a vested interest in the evolution of Fedora. Since I’m interested in Fedora, I’m interested in the other major Linux OS, Ubuntu. And since OS X, Fedora and Ubuntu are minority players and these benchmarks set a bar, I’m interested in Windows. In the end, a battle for tangible system-level benchmark performance benefits all.
I don’t believe the desktop is dead. After all, software written for consoles, smartphones or tablets (or even other desktops) is written on desktops. The industry is asleep. I want to whack the hive.
“Breakin’ the law!” –Beavis