Computer *Science* “Research”
I don’t want people to think, after my last post, that I’m just some academic hating on companies from an ivory tower; at the moment I blame academia about as much as companies for the lack of innovation, though for a very different set of reasons. My apologies for another ranty, depressing post, but I need to let off some steam after the latest encounter with my honours project supervisor. First, I’d like to share an anecdote about one of my roommates from when I was at Microsoft.
Joey Jo-jo Junior Shabadoo (a fake name, at the request of the real guy) is a PhD student (or postdoc, I can’t recall) in Sweden, specializing in wireless sensor networks (WSN), and he was on a team at Microsoft Research developing what sounded like a software toolkit for WSN. Everybody on his team had a PhD. There’s just one problem: PhD computer science researchers in North America aren’t required to know anything about software, or even computers… or even science. PhD computer science researchers in Sweden, on the other hand, are still required to know how to develop software; how else could you really determine whether a hypothesis makes sense? Why is this a problem?
He was the only person on his team who had actually developed much software in recent history. There was code from his team when he got to Redmond, but all of it was a mess, much of it was untested, and as it turned out much later, what little of it was tested had faulty tests. Worse yet, his team hadn’t yet decided what hardware they’d be using, and yet somehow they had code written for some unknown hardware. When they eventually determined what hardware they’d use, much of that code no longer made any sense.
Joey was pretty angry about all this, and if you knew him, you’d know that’s saying a lot, since he’s such a light-hearted, fun character. Moreover, he was determined to get this stuff working, and he worked his ass off to pick up the slack for his team. I’m talking 10 hours Monday through Friday and 8 hours Saturday and Sunday. He once spent 12 hours on one Saturday or Sunday chasing a bug that turned out to exist because a message-passing function that was supposedly tested had never been tested with an odd number of bytes, and it failed much of the time on odd-length messages. And no, he was not getting paid for overtime. By the end of it, he was on the verge of giving up on computer science entirely, and he sounded pretty depressed. I don’t blame him; I would’ve quit or killed myself, since it sounded as bad as or worse than my experience at RIM the term before. I hope he’s doing well, wherever he is now, and I’ll send him a link to this to check that I’ve told the story correctly.
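I never saw the offending code, so what follows is purely a hypothetical reconstruction (the function name and everything else is made up), but the classic form of that bug is a copy loop that moves two bytes at a time and silently drops the last byte whenever the length is odd:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reconstruction -- not the team's actual code. The loop
 * copies the message two bytes at a time, so len / 2 rounds down and
 * the final byte of any odd-length message is silently dropped. */
void copy_message(uint8_t *dest, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        ((uint16_t *)dest)[i] = ((const uint16_t *)src)[i];
    }
    /* The missing line that one test with an odd len would have forced:
     * if (len & 1) dest[len - 1] = src[len - 1]; */
}
```

A test suite that only ever sends even-length messages passes forever; the very first odd-length message in the field fails.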
> Scientific method refers to bodies of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. To be termed scientific, a method of inquiry must be based on gathering observable, empirical and measurable evidence subject to specific principles of reasoning. A scientific method consists of the collection of data through observation and experimentation, and the formulation and testing of hypotheses.
The above is from Wikipedia. What does it have to do with the story, or with my distaste for the state of computer science research (in North America)? We’ve completely lost the science in computer science. Not a single computer science course I’ve taken or heard of shows anything that would come close to meeting the above definition of scientific. We don’t make hypotheses and test them in any courses; we’re just told to come up with something that does X, often just on paper. As for computer science “researchers”, they often appear to just make hypotheses and publish them. The “test” is whether other researchers accept them, but there’s no real testing, no measurements, no experiments, often no principles of reasoning or evidence. When there is testing, it’s often completely irrational: either heavily skewed in favour of those researchers, or indicating absolutely nothing. Why? Because we’re all implicitly taught that there’s no value in testing hypotheses, so researchers have no idea how to select tests or do systematic testing; they never see it in undergrad, and it evidently doesn’t get through in grad studies either.
I could give examples of this until I become ill, but I’ll just give three very different ones:
- Here’s a paper that claims that OpenMP makes for faster code and is easier to use than MPI for parallelizing. Their testing: get students to use OpenMP on a single computer (which is what it’s designed for) and get students to use MPI across multiple computers (which is what it’s designed for) with the same total number of CPU cores; time the students and their code. If you’ve done anything with parallel computing, it’s pretty easy to come up with a correct hypothesis about this experiment. They neglect to mention that the numbers are completely incomparable. One is using several threads on a single computer using shared memory (fast) to communicate implicitly (easy), but doesn’t scale up past one machine (since it isn’t designed to); the other is using several processes spread across computers using a network (slow) to communicate explicitly (harder), but scales to hundreds of thousands of machines (there’s a sketch contrasting the two styles after these examples). Of course OpenMP will be faster and easier to use in their small-scale test case! If it weren’t, it’d be a complete failure on the part of OpenMP!
- Here’s a paper that “proves” that P=NP. It took me about 5 minutes to find a counterexample. Other people found counterexamples too. So, the author made a completely new algorithm and claimed that it “proves” that P=NP. I implemented it, picked an arbitrary 10-variable example I found, and it got the wrong answer; others have had the same result. He’s made other algorithms for P=NP that have been disproven as well. I was sad to learn that, as of recently, the author still thinks that his algorithms prove that P=NP. The problem is simply that he doesn’t test his work at all, and leaves it to others to prove him wrong. It’s taking “innocent [of academic fraud] until proven guilty [of academic fraud in every paper]” to the extreme.
- Here’s one that I wrote myself, on jump encoding. Yes, I’m even willing to bash my own work. It turns out that the statement of assumptions in the abstract is wrong. It’s true that the algorithm isn’t always space-optimal if an array is declared with a size dependent on the negative of the number of bytes in a section of code, but it’s also not always space-optimal if people put in preprocessor statements that exclude things based on the size of code. Of course, it then turns out to be NP-complete just to determine whether there’s a way to include/exclude code such that no paradoxes occur (e.g. if this code is included, exclude it, and if this code is excluded, include it). That’s why assemblers restrict what you can and can’t do with symbol references, but it is possible to construct a case where the algorithm isn’t space-optimal without running into those restrictions on some assemblers. I just happen not to care about those cases, since they’re ridiculous, and not supported by Inventor IDE (they’re hard enough to support at all, let alone being space-optimal).
What’s really missing from this paper, other than a formal proof of correctness, is any sort of empirical check of correctness or performance analysis, since I hadn’t even implemented it yet. The eventual implementation is fairly similar to what’s described, and it does check out empirically with the testing I’ve done, but I still haven’t done any performance testing to see whether it actually runs any faster in practice than the worst-case O(n^2) algorithms that are a bit simpler. A few people pointed out quite rightly that a very simple O(n^2) algorithm could likely beat the crap out of this O(n) algorithm, which I neglected to state in the paper. Hopefully I’ll get a chance to check that once a friend of mine has finished making the next big feature of Inventor IDE, since it’ll help a ton. Maybe I’ll post a new version then.
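For reference, the “very simple O(n^2) algorithm” is the standard relaxation loop most assemblers use, which looks roughly like the sketch below. This is the textbook approach, not the O(n) algorithm from the paper, and the item layout and struct names are made up:

```c
#include <stdbool.h>

#define SHORT_SIZE 2  /* e.g. x86 short jmp: opcode + 8-bit displacement  */
#define LONG_SIZE  5  /* e.g. x86 near jmp: opcode + 32-bit displacement  */

/* Hypothetical layout: each item is some fixed-size code followed by
 * one jump, and each jump targets the start of some item (0..n). */
typedef struct {
    int  code_size;  /* bytes of non-jump code before this jump */
    int  target;     /* index of the item this jump branches to */
    bool is_long;    /* current encoding guess                  */
} Item;

static void relax(Item *items, int n)
{
    bool changed;
    do {
        changed = false;
        int offs[n + 1];  /* offs[i] = byte offset where item i starts */
        offs[0] = 0;
        for (int i = 0; i < n; i++)
            offs[i + 1] = offs[i] + items[i].code_size +
                          (items[i].is_long ? LONG_SIZE : SHORT_SIZE);
        /* Widen any short jump whose displacement (measured from the
         * end of the jump instruction, as on x86) no longer fits in a
         * signed byte. Widening only grows distances, so this converges. */
        for (int i = 0; i < n; i++) {
            if (items[i].is_long)
                continue;
            int disp = offs[items[i].target] - offs[i + 1];
            if (disp < -128 || disp > 127) {
                items[i].is_long = true;
                changed = true;  /* offsets shifted; run another pass */
            }
        }
    } while (changed);  /* worst case: one widening per pass -> O(n^2) */
}
```

Starting everything short and only ever widening is what makes the fixed point space-optimal in this simple model (no size-dependent arrays or preprocessor tricks); the question I never measured is whether the O(n) algorithm actually beats this in practice, given how rarely the worst case shows up.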
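And going back to the first example, here’s the sketch I promised of why the OpenMP and MPI numbers don’t compare: the same dot product parallelized both ways. This isn’t code from the paper, just a minimal illustration of implicit shared memory versus explicit message passing:

```c
#include <mpi.h>  /* for the MPI version; the OpenMP one needs no header */

/* OpenMP: threads in ONE process share the arrays; communication is
 * implicit through shared memory. One pragma does all the work. */
double dot_openmp(const double *a, const double *b, long n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* MPI: each process owns only a SLICE of the arrays, possibly on a
 * different machine, and the partial sums must be combined with an
 * explicit collective operation that may cross a (slow) network.
 * Assumes MPI_Init has already been called. */
double dot_mpi(const double *a_local, const double *b_local, long n_local)
{
    double local = 0.0, total = 0.0;
    for (long i = 0; i < n_local; i++)
        local += a_local[i] * b_local[i];
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return total;
}
```

The OpenMP version builds with one flag (e.g. gcc -fopenmp) and never leaves one machine’s memory; the MPI version needs mpicc, a launcher like mpirun, and explicit decisions about who owns which slice of the data. Timing one against the other on a small cluster says nothing about the trade-off either was designed for.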
In order to move forward as a field, we absolutely need a course that teaches some form of experiment design and systematic testing, including performance testing. I don’t mean garbage like the QA courses I’ve heard of; I mean real testing of things that are really worth testing. In only one assignment I’ve ever had (assignment #1 of COMP 2402) did we measure the actual running time of a piece of code, and it was a toy example with only 5 data points. I find that outrageous and unacceptable.
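Just to be concrete about how low the bar is, here’s a minimal sketch of what I’d consider the starting point for a timing experiment: several input sizes, several repetitions per size, a monotonic clock, and output you can actually plot. The workload function is just a stand-in for whatever is under test:

```c
#include <stdio.h>
#include <time.h>

static volatile long sink;  /* stops the optimizer from deleting the work */

/* Stand-in for whatever code is actually under test. */
static void workload(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i ^ (i >> 3);
    sink = sum;
}

int main(void)
{
    const int reps = 5;  /* repeat each size to expose run-to-run variance */
    for (long n = 1000; n <= 10000000; n *= 10) {
        for (int r = 0; r < reps; r++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            workload(n);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                        (t1.tv_nsec - t0.tv_nsec) / 1e6;
            printf("n=%ld rep=%d: %.3f ms\n", n, r, ms);
        }
    }
    return 0;
}
```

That’s already 25 data points across five sizes, which is enough to see both the growth curve and the noise, and to start asking real questions like whether the measured curve matches the predicted complexity.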
In relation to innovation, the academic world is so full of useless papers that finding anything useful in some fields can be a life’s work. How can people possibly innovate when they can’t find out what’s actually useful and what’s just empty hypothesizing?
I haven’t even touched on the problem that so much research is done for the sake of “wouldn’t it be cool if ____” (the answer is no, by the way) or “this is popular, so it must be important”. Researchers are often so separated from the real world that they honestly don’t realize that most papers will probably never help anyone. It reminds me of how, a few weeks ago, some lawyers tried to convince me that most users actually read and understand the legal agreements on software. *sigh* 😦