Inventor Performance Viewer
I recently put off the release of Inventor IDE Alpha 5b (probably coming in August, since my honours project is highest priority now) to get something ready for DemoCamp Vancouver that I’ve been wanting for a very long time. It’s a feature that probably won’t be released until Alpha 6, but you can be certain that I’ll try to make good use of it at D-Wave as soon as I can. I consider it the first truly awesome feature of Inventor IDE. It’s easiest to introduce with a picture:
(In case you’re wondering, the times are in clock cycles on a 1.66GHz laptop, so 100,000 clocks = 60 microseconds.)
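For anyone who wants to sanity-check that conversion, it's just the cycle count divided by the clock rate. A tiny Python sketch (the 1.66GHz figure is the laptop speed quoted above):

```python
# Convert a cycle count to wall-clock time, assuming a fixed 1.66 GHz clock.
CLOCK_HZ = 1.66e9

def cycles_to_microseconds(cycles):
    """Convert a CPU cycle count to microseconds at CLOCK_HZ."""
    return cycles / CLOCK_HZ * 1e6

# 100,000 cycles at 1.66 GHz is about 60 microseconds.
print(round(cycles_to_microseconds(100_000), 1))  # ~60.2
```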
A big problem in doing performance analysis is that getting reliable results and useful scaling info usually means tests that take minutes, so that the impact of the OS or other applications levels off to a constant factor. If you try to do short tests, you get issues like these:
These aren’t exactly reliable results; running a short test multiple times produces wildly different numbers because of just a few bad data points. On Tuesday around 4am (Pacific Daylight Time), I finally managed to find a decent approach that eliminates almost all bad data without eliminating much good data. It’s certainly not perfect, but here are the results of running the two Mersenne Twister tests thrice each while AQUA@Home is trying to max out the CPU and the hard drive is thrashing like mad (for unknown reasons):
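The exact filtering I settled on isn't shown here, but the general idea can be sketched with one common technique: interrupted runs show up as large positive outliers, while the true cost sits near the fastest observed samples, so anything far above that floor gets discarded rather than averaged in. A minimal Python sketch (the `tolerance` threshold is an illustrative choice, not the one Inventor IDE uses):

```python
import random

def filter_outliers(samples, tolerance=1.10):
    """Keep only samples within `tolerance` of the fastest observed time.

    Interrupted runs appear as large positive outliers; the true cost is
    near the minimum, so samples far above the floor are discarded
    rather than averaged in.
    """
    floor = min(samples)
    return [s for s in samples if s <= floor * tolerance]

# Simulated cycle counts: a stable cost of ~100,000 cycles, plus a few
# samples inflated by process switches and other interrupts.
random.seed(1)
samples = [100_000 + random.randint(0, 2_000) for _ in range(50)]
samples += [350_000, 900_000, 1_400_000]  # the "bad data"
good = filter_outliers(samples)
print(len(good))  # the three inflated samples are gone
```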
It’s worth pointing out that these times are in fact noticeably worse than the results when the CPU and hard drive aren’t under heavy load elsewhere (about 1.5 times as long). The important thing is that the results are still remarkably consistent, and Mersenne128 is still 3.8 times faster than Mersenne32. You can even still roughly see that Mersenne32 regenerates its data array every 624 elements and Mersenne128 regenerates every 2496 elements. In case you’re wondering why the CPU times would be consistently longer instead of scattered, it has to do with process switches flushing the cache and TLBs: the good data only see the resulting cache misses, whereas the bad data are the samples that actually contain the process switches and other interrupts (like the 1kHz tick and page faults).
I’m still shocked that I can effectively triple-click the test run button on the performance test functions and have the results show up as fast as Windows can show the windows. It’s even more responsive than the regular run button in the ImageProgram sample app, and that had shocked me. That said, I found the ironic-but-not-so-important problem that it names the output files with the time only down to the second, so a test that ran in the same second as another overwrote the first’s data file.
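The fix for that bug is straightforward. One way to do it, sketched in Python (the actual naming scheme in Inventor IDE isn't shown here, and the `perf` prefix is just an illustrative placeholder), is to append a counter whenever the second-resolution timestamp is already taken:

```python
import os
import time

def unique_output_path(directory, prefix="perf"):
    """Build an output filename that cannot collide within the same second.

    If the second-resolution timestamp is already in use, append an
    incrementing counter instead of overwriting the earlier file.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(directory, f"{prefix}-{stamp}.dat")
    counter = 1
    while os.path.exists(path):
        path = os.path.join(directory, f"{prefix}-{stamp}-{counter}.dat")
        counter += 1
    return path
```

Recording a millisecond-resolution timestamp would also work; the counter approach just guarantees uniqueness even for back-to-back runs.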
This kind of capability finally allows developers to rapidly iterate through possible performance improvements, not only making optimization faster, but much easier, because many more optimizations can be tried in the same amount of time. For example, the faster mergesort in the top graph comes from adding a special case for n=2. In that way, it can also be used as a learning tool for developers to determine what does and doesn’t improve performance. Best of all, because of the performance viewer’s language-independence, its platform-independence, and the many useful analyses it could do on the data, it’s applicable to many different situations; probably many I won’t think of myself, so I leave those up to you. I’ve just shown you the first glimpse into what I’ve got planned for it. 🙂
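To make the mergesort tweak concrete, here's a Python sketch of the idea (the tests in the graphs aren't Python, so this only illustrates the shape of the optimization): for two elements, a single compare-and-swap replaces the recursive split and merge machinery entirely.

```python
def merge_sort(a):
    """Recursive merge sort with a special case for n = 2."""
    n = len(a)
    if n <= 1:
        return a
    if n == 2:  # special case: one comparison, no recursion or merging
        return a if a[0] <= a[1] else [a[1], a[0]]
    mid = n // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # Standard merge of the two sorted halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

print(merge_sort([5, 2, 9, 1, 3]))  # [1, 2, 3, 5, 9]
```

The special case pays off because a huge fraction of all recursive calls bottom out at n=2, and each of those calls now does one comparison instead of two recursive calls plus a merge loop.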