Inventor Performance Viewer

I recently put off the release of Inventor IDE Alpha 5b (probably coming in August, since my honours project is highest priority now) to get something ready for DemoCamp Vancouver that I’ve been wanting for a very long time.  It’s a feature that probably won’t be released until Alpha 6, but you can be certain that I’ll try to make good use of it at D-Wave as soon as I can.  I consider it the first truly awesome feature of Inventor IDE.  It’s easiest to introduce with a picture:

Rock-solid sorting performance data literally in the blink of an eye

(In case you’re wondering, the times are in clock cycles on a 1.66GHz laptop, so 100,000 clocks = 60 microseconds.)
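For the skeptical, the conversion is just cycle count divided by clock frequency; a quick sanity check (using the 1.66GHz figure above):

```python
# Convert a cycle count to wall time at a fixed clock frequency.
cycles = 100_000
freq_hz = 1.66e9  # the 1.66GHz laptop clock mentioned above

seconds = cycles / freq_hz
print(f"{seconds * 1e6:.1f} microseconds")  # prints "60.2 microseconds"
```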

A big problem in doing performance analysis is that getting reliable results and useful scaling info usually means tests that take minutes, so that the impact of the OS or other applications levels off to a constant factor.  If you try to do short tests, you get issues like these:

Bad Performance Data for 2 Mersenne Twister Implementations

Bad Performance Data for 3 Sorting Implementations

These aren’t exactly reliable results; running a short test multiple times produces wildly different numbers because of just a few bad data points.  On Tuesday around 4am (Pacific Daylight Time), I finally managed to find a decent approach that eliminates almost all of the bad data without eliminating much good data.  It’s certainly not perfect, but here are the results of running the two Mersenne Twister tests three times each while AQUA@Home is trying to max out the CPU and the hard drive is thrashing like mad (for unknown reasons):

Much Better Mersenne Twister Performance Data

It’s worth pointing out that these times are noticeably worse than the results when the CPU and hard drive aren’t under heavy load elsewhere (about 1.5 times as long).  The important thing is that the results are still remarkably consistent, and Mersenne128 is still 3.8 times faster than Mersenne32.  You can even still roughly see that Mersenne32 regenerates its data array every 624 elements and that Mersenne128 regenerates every 2496 elements.  In case you’re wondering why the CPU times would be consistently longer instead of scattered, it comes down to process switches flushing the cache and TLBs: the good data only pick up the extra cache and TLB misses those flushes cause, which is a fairly constant cost, whereas the bad data still contain the process switches themselves and other interrupts (like the 1kHz tick and page faults).
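The post doesn’t spell out the filtering approach, but the general idea of rejecting samples inflated by interrupts can be sketched as follows: time the same operation repeatedly and keep only the samples close to the minimum, since an interrupt or process switch can only ever add time.  Everything here (the stand-in workload, the 10% threshold, using `time.perf_counter_ns` instead of a cycle counter) is my own illustrative assumption, not Inventor IDE’s actual method:

```python
import time

def work():
    # Hypothetical workload standing in for a sorting or RNG test.
    return sorted(range(1000, 0, -1))

def filtered_times(runs=50, tolerance=1.10):
    """Time `work` many times and keep only the samples within
    `tolerance` of the fastest run.  Interrupts and context switches
    only ever add time, so the minimum is the cleanest estimate and
    far-above-minimum samples are the 'bad data'."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        work()
        t1 = time.perf_counter_ns()
        samples.append(t1 - t0)
    best = min(samples)
    return [s for s in samples if s <= best * tolerance]

good = filtered_times()
print(len(good), "good samples, fastest:", min(good), "ns")
```

Taking the minimum as the reference is a classic timing idiom (Python’s own `timeit` documentation recommends it for the same reason): the noise is strictly additive, so the low end of the distribution is the signal.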

I’m still shocked that I can effectively triple-click the test run button on the performance test functions and have the results show up as fast as Windows can show the windows.  It’s even more responsive than the regular run button in the ImageProgram sample app, and that had already shocked me.  That said, I found an ironic but not-so-important problem: the output files are named with the time only down to the second, so a test that ran in the same second as another overwrote the first’s data file.

This kind of capability finally lets developers rapidly iterate through possible performance improvements, making optimization not only faster but easier, because many more ideas can be tried in the same amount of time.  For example, the faster mergesort in the top graph comes from adding a special case for n=2.  Used that way, it can also serve as a learning tool for developers to discover what does and doesn’t improve performance.  Best of all, because the performance viewer is language-independent and platform-independent, and because of the many useful analyses it could do on the data, it’s applicable to many different situations, probably including many I won’t think of myself, so I leave those up to you.  I’ve just shown you the first glimpse into what I’ve got planned for it.  🙂
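To illustrate the n=2 special case, here’s a sketch of the idea in Python (my own guess at the optimization, not the actual test code): cutting the recursion off one level early replaces the deepest, most numerous recursive calls and their merge overhead with a single compare-and-swap.

```python
def mergesort(a):
    """Mergesort with a special case for n <= 2: the deepest and most
    numerous recursive calls collapse into one compare-and-swap."""
    n = len(a)
    if n <= 1:
        return a
    if n == 2:  # the special case: no recursion, no merge loop
        return a if a[0] <= a[1] else [a[1], a[0]]
    mid = n // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    # Standard merge of the two sorted halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

print(mergesort([5, 2, 4, 1, 3]))  # prints [1, 2, 3, 4, 5]
```

Roughly half of all calls in a naive mergesort are on subarrays of length 1 or 2, which is why such a small special case shows up so clearly in the graph.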

~ by Neil Dickson on July 9, 2009.

3 Responses to “Inventor Performance Viewer”

  1. impressive, as always

    but let me know when this and pwn OS are up and running, i have an unused computer somewhere that i wanna hopefully throw this stuff onto

  2. This, Neil, is a step towards what you’ve been saying for a while: performance matters, not theoretical algorithmic complexity as the limit approaches infinity.
    Question: the y-axis is the time the test took; what’s the x-axis? The size of the input?

  3. Yep, x is the number of elements to sort in the case of the sorting, and x is the number of random numbers to generate in the case of Mersenne Twister.

    If I had time to spare, I’d throw up the forward-shifting version of insertion sort (i.e. instead of shifting the array starting from the back, use xchg repeatedly to go forward through it) and non-recursive mergesort. I might do it anyway, if only to satisfy curiosity, but I really need to get PwnOS going in order to have something to back up an argument in my honours project report. You can bet that I’ll have an analysis of memory allocation and thread switching performance in PwnOS if I have time.
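For anyone curious before the assembly version materializes, here is one reading of that forward-shifting idea in Python, with tuple swaps standing in for `xchg`; this is my interpretation of the description above, not the code being discussed:

```python
def insertion_sort_forward(a):
    """Insertion sort that walks forward through the sorted prefix,
    repeatedly exchanging (as with xchg) to carry displaced values
    forward, instead of shifting the prefix backward to open a hole."""
    for i in range(1, len(a)):
        x = a[i]
        for j in range(i):
            if a[j] > x:
                a[j], x = x, a[j]  # exchange: carry the larger value forward
        a[i] = x
    return a

print(insertion_sort_forward([3, 1, 2]))  # prints [1, 2, 3]
```

Each outer iteration scans the sorted prefix strictly front-to-back, exchanging the carried value with any larger element it meets, so memory is only ever touched in increasing address order.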
