Computer *Science* “Research”

I don’t want people to think that I’m just some academic hating on companies from an ivory tower in my last post, because at the moment, I blame academia about as much as I blame companies for the lack of innovation, though for a very different set of reasons.  My apologies for another ranty, depressing post, but I need to let off some steam after the latest encounter with my honours project supervisor.  I’d like to share an anecdote about one of my roommates from when I was at Microsoft.

Joey Jo-jo Junior Shabadoo (fake name at the request of the real guy) is a PhD student (or postdoc, I can’t recall) in Sweden, specializing in wireless sensor networks (WSN), and he was on a team at Microsoft Research developing what sounded like a software toolkit for WSN.  Everybody on his team had a PhD.   There’s just one problem: PhD computer science researchers in North America aren’t required to know anything about software, or even computers… or even science, whereas PhD computer science researchers in Sweden are still required to know how to develop software, since how else could you really determine whether a hypothesis makes sense?  Why is this a problem?

He was the only person on his team who had actually developed much software in recent history.  There was code from his team when he got to Redmond, but all of it was a mess, much of it was untested, and as it turned out much later, what little of it was tested had faulty tests.  Worse yet, his team hadn’t yet decided what hardware they’d be using, and somehow they already had code written for some unknown hardware.  When they eventually determined what hardware they’d use, much of the code no longer made any sense.

Joey was pretty angry about all this, and if you knew him, you’d know that’s saying a lot, since he’s such a light-hearted, fun character.  Moreover, he was determined to get this stuff working.  This guy worked his ass off to pick up the slack for his team.  I’m talking 10 hours Monday through Friday and 8 hours Saturday and Sunday.  He once worked 12 hours on a Saturday or Sunday trying to fix a bug caused by a message-passing function that was supposedly tested but had never been tested with an odd number of bytes; it failed much of the time when a message had an odd length.  And no, he was not getting paid for overtime.  By the end of it, he was on the verge of giving up on computer science entirely, and he sounded pretty depressed.  I don’t blame him; I would’ve quit or killed myself, since it sounded as bad as or worse than my experience at RIM the term before.  I hope he’s doing well, wherever he is now, and I’ll send him a link to this to check that I’ve told the story correctly.
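
(For anyone curious how a bug like that slips past a “tested” function, here’s a hypothetical C sketch of my own; I never saw the actual code, so the names and structure are made up. The send routine copies payload bytes two at a time and silently drops the last byte of any odd-length message, so a test suite that only ever sends even-length messages passes every time.)

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical radio transmit buffer; not from any real WSN toolkit. */
static uint8_t tx_buffer[64];

/* Buggy: copies the payload 16 bits at a time, so the final byte of an
 * odd-length message never makes it into the transmit buffer. */
static void send_message_buggy(const uint8_t *payload, size_t length)
{
    size_t i;
    for (i = 0; i + 1 < length; i += 2) {
        tx_buffer[i]     = payload[i];
        tx_buffer[i + 1] = payload[i + 1];
    }
    /* Missing: if (length & 1) tx_buffer[length - 1] = payload[length - 1]; */
}

int main(void)
{
    const uint8_t msg[5] = { 1, 2, 3, 4, 5 };

    memset(tx_buffer, 0, sizeof(tx_buffer));
    send_message_buggy(msg, sizeof(msg));

    /* Any test that only uses even lengths passes; the first odd-length
     * test exposes the dropped byte immediately. */
    assert(tx_buffer[3] == 4);  /* passes */
    assert(tx_buffer[4] == 5);  /* fails: the 5th byte was never copied */
    return 0;
}
```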

Scientific method refers to bodies of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. To be termed scientific, a method of inquiry must be based on gathering observable, empirical and measurable evidence subject to specific principles of reasoning. A scientific method consists of the collection of data through observation and experimentation, and the formulation and testing of hypotheses.

The above is from Wikipedia.  What does it have to do with the story or my distaste for the state of computer science research (in North America)?  We’ve completely lost the science in computer science.  There is not a single computer science course I’ve taken or heard of that actually shows anything that would come close to meeting the above definition of scientific.  We don’t make hypotheses and test them in any courses; we’re just told to come up with something that does X, often just on paper.  As for computer science “researchers”, they often appear to just make hypotheses and publish them.  The “test” is whether other researchers accept them, but there’s no real testing, no measurements, no experiments, and often no principles of reasoning or evidence.  When there is testing, it’s often completely irrational, and it’s either heavily skewed in favour of those researchers or indicates absolutely nothing.  Why?  Because we’re all implicitly taught that there’s no value in testing hypotheses, so researchers have no idea how to select tests or do systematic testing; they never see it in undergrad, and it evidently doesn’t get through in grad studies either.

I could give examples of this until I become ill, but I’ll just give three very different ones:

  • Here’s a paper that claims that OpenMP makes for faster code and is easier to use than MPI for parallelizing. Their testing: get students to use OpenMP on a single computer (which is what it’s designed for) and get students to use MPI across multiple computers (which is what it’s designed for) with the same total number of CPU cores; time the students and their code.  If you’ve done anything with parallel computing, it’s pretty easy to come up with a correct hypothesis about this experiment.  They neglect to mention that the numbers are completely incomparable.  One is using several threads on a single computer, using shared memory (fast) to communicate implicitly (easy), but doesn’t scale up past one machine (since it isn’t designed to); the other is using several processes spread across computers, using a network (slow) to communicate explicitly (harder), but scales to hundreds of thousands of machines (see the first sketch after this list for what that contrast looks like in code).  Of course OpenMP will be faster and easier to use in their small-scale test case!  If it wasn’t, it’d be a complete failure on the part of OpenMP!
  • Here’s a paper that “proves” that P=NP. It took me about 5 minutes to find a counterexample.  Other people found counterexamples too.  So the author made a completely new algorithm and claimed that it “proves” that P=NP.  I implemented it, picked an arbitrary 10-variable example I found, and it got the wrong answer; others have had the same result.  He’s made other algorithms for P=NP that have been disproven as well.  I was sad to learn that, even recently, the author still thought his algorithms prove that P=NP.  It comes down to the simple fact that he doesn’t test his work at all and leaves it to others to prove him wrong.  It’s taking “innocent [of academic fraud] until proven guilty [of academic fraud in every paper]” to the extreme.
  • Here’s one that I wrote myself, on jump encoding. Yes, I’m even willing to bash my own work. It turns out that the statement of assumptions in the abstract is wrong.  It’s true that the algorithm isn’t always space-optimal if an array is declared with a size dependent on the negative of the number of bytes in a section of code, but it’s also not always space-optimal if people put in preprocessor statements that exclude things based on the size of code.  Of course, it turns out that it’s then NP-complete just to determine whether there’s a way to include/exclude code such that no paradoxes occur (e.g. if this code is included, exclude it, and if this code is excluded, include it).  That’s why assemblers restrict what you can and can’t do with symbol references, but it is possible to construct cases where the algorithm isn’t space-optimal without running into those restrictions on some assemblers.  I just happen not to care about those cases, since they’re ridiculous and not supported by Inventor IDE (they’re hard enough to support at all, let alone space-optimally).
    What’s really missing from this paper, other than a formal proof of correctness, is any sort of empirical check of correctness or performance analysis, since I hadn’t even implemented it yet.  The eventual implementation is fairly similar to what’s described, and it does check out empirically with the testing I’ve done, but I still haven’t done any performance testing to see whether it actually runs faster in practice than the worst-case O(n^2) algorithms that are a bit simpler.  A few people pointed out quite rightly that a very simple O(n^2) algorithm could likely beat the crap out of this O(n) algorithm, which I neglected to state in the paper; the second sketch after this list shows the kind of simple algorithm they meant.  Hopefully I’ll get a chance to check that once a friend of mine has finished making the next big feature of Inventor IDE, since it’ll help a ton.  Maybe I’ll post a new version then.
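
To make it concrete how apples-to-oranges that first comparison is, here’s a hedged C sketch of the same trivial reduction written both ways; these are my own illustrative snippets, not code from the paper, and the file name and sizes are made up. The OpenMP version is one pragma over shared memory on a single machine; the MPI version has to run as separate processes, carve up the data, and explicitly combine partial results over the network, but it’s the only one of the two that keeps scaling past one box.

```c
/* Build with something like:  mpicc -fopenmp reduce_compare.c -o reduce_compare
 * (toolchain assumed; names here are mine, not from the paper) */
#include <stdio.h>
#include <omp.h>
#include <mpi.h>

#define N 1000000

/* OpenMP: one pragma, and the threads share the array in one machine's memory. */
static double sum_openmp(const double *a, int n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += a[i];
    return total;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* For simplicity every process fills the whole array, but each one only
     * sums its own slice; the partial sums then have to be combined explicitly
     * over the network.  That's the price of scaling past one machine. */
    static double a[N];
    for (int i = 0; i < N; ++i)
        a[i] = 1.0;

    int chunk = N / nprocs;
    int begin = rank * chunk;
    int end   = (rank == nprocs - 1) ? N : begin + chunk;

    /* Each process can still use OpenMP threads internally (a hybrid setup). */
    double local  = sum_openmp(a + begin, end - begin);
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f (expected %d)\n", global, N);
    MPI_Finalize();
    return 0;
}
```

You can even combine the two, with each MPI process running OpenMP threads internally, which is exactly why timing one against the other at a single machine’s scale tells you nothing useful.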
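
And for the jump-encoding item: here’s a hedged C sketch of the kind of very simple worst-case O(n^2) algorithm those people meant. This is the classic “start everything short, widen until nothing changes” approach with a made-up toy layout; it’s not the paper’s O(n) algorithm and not Inventor IDE’s actual code.

```c
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical, simplified layout (not Inventor IDE's data structures):
 * block 0, jump 0, block 1, jump 1, ..., jump n-1, block n.
 * target[i] is the block whose start jump i branches to. */
#define NUM_JUMPS 3
static const int block_bytes[NUM_JUMPS + 1] = { 10, 120, 10, 5 };
static const int target[NUM_JUMPS]          = { 1, 0, 2 };

#define SHORT_SIZE 2   /* short form: signed 8-bit displacement */
#define LONG_SIZE  5   /* long form: 32-bit displacement */

int main(void)
{
    int size[NUM_JUMPS];
    for (int i = 0; i < NUM_JUMPS; ++i)
        size[i] = SHORT_SIZE;            /* optimistic start: everything short */

    bool changed = true;
    while (changed) {
        changed = false;

        /* Recompute block start offsets under the current jump sizes. */
        int block_start[NUM_JUMPS + 1];
        block_start[0] = 0;
        for (int b = 0; b < NUM_JUMPS; ++b)
            block_start[b + 1] = block_start[b] + block_bytes[b] + size[b];

        /* Widen any short jump whose displacement no longer fits in a signed
         * byte.  Widening only ever pushes code further apart, so a widened
         * jump never needs to shrink back, and the loop reaches a fixed point. */
        for (int i = 0; i < NUM_JUMPS; ++i) {
            if (size[i] != SHORT_SIZE)
                continue;
            int jump_end = block_start[i] + block_bytes[i] + size[i];
            int disp = block_start[target[i]] - jump_end;
            if (disp < -128 || disp > 127) {
                size[i] = LONG_SIZE;
                changed = true;
            }
        }
    }

    for (int i = 0; i < NUM_JUMPS; ++i)
        printf("jump %d: %d bytes\n", i, size[i]);
    return 0;
}
```

Each pass is O(n), and each pass either widens at least one jump or stops, so there are at most n+1 passes; in practice very few jumps ever need widening, which is exactly why something this simple could well beat a fancier O(n) algorithm.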

In order to move forward as a field, we absolutely need a course that teaches some form of experiment design and systematic testing, including performance testing.  I don’t mean garbage like the QA courses I’ve heard of; I mean real testing of things that are really worth testing.  In only one assignment I’ve ever had (assignment #1 of COMP 2402) did we measure the actual running time of a piece of code, and it was a toy example with only 5 data points.  I find that outrageous and unacceptable.  Even a bare-bones timing experiment doesn’t take much, as the sketch below shows.
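
Here’s a hedged C sketch of what I mean (my own illustration, nothing to do with that assignment). The workload is a placeholder, but the structure is the point: several input sizes, several repetitions per size, and the best time reported, so you can actually see how the cost grows.

```c
/* Build with e.g.:  gcc -O2 timing.c -o timing -lrt   (older systems need -lrt) */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Placeholder workload; swap in whatever is actually being measured. */
static double work(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += (double)i * 0.5;
    return sum;
}

static double seconds_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

int main(void)
{
    const int sizes[] = { 1000, 10000, 100000, 1000000, 10000000 };
    const int trials = 10;           /* repeat each size; report the minimum */
    volatile double sink = 0.0;      /* keep the optimizer from deleting the work */

    printf("%10s %15s\n", "n", "best time (s)");
    for (size_t s = 0; s < sizeof(sizes) / sizeof(sizes[0]); ++s) {
        double best = 1e300;
        for (int t = 0; t < trials; ++t) {
            double start = seconds_now();
            sink += work(sizes[s]);
            double elapsed = seconds_now() - start;
            if (elapsed < best)
                best = elapsed;
        }
        printf("%10d %15.6f\n", sizes[s], best);
    }
    (void)sink;
    return 0;
}
```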

In relation to innovation, the academic world is so full of useless papers that finding anything useful in some fields can be a life’s work.  How can people possibly innovate when they can’t find out what’s actually useful and what’s just an empty hypothesis?

I haven’t even touched on the problem that so much research is done for the sake of “wouldn’t it be cool if ____” (the answer is no, by the way) or “this is popular, so it must be important”.  Researchers are often so separated from the real world that they honestly don’t even realize that most papers will probably never help anyone.  It reminds me of how, a few weeks ago, some lawyers tried to convince me that most users actually read and understand the legal agreements on software.  *sigh*  😦

~ by Neil Dickson on May 12, 2009.

8 Responses to “Computer *Science* “Research””

  1. Yipes…

    I believe you have a /very/ skewed view of the academic situation in North America by way of anecdotal evidence from a Swede. I’m sure you’re fully aware, but Comp Sci has been around far longer than computers have physically existed, and the unfortunate term is used to cover the grey area where mathematics and science overlap (computability, logic, etc). Even more unfortunately, it often encompasses the not-so-grey areas of discrete mathematics, etc.

    To say anyone researching in those fields is doing pointless work is going a bit too far — and I don’t imagine that’s what you had in mind — but there still needs to be respect for the unfortunate categorization of what the term Computer Science implies. Not all results that fall under the umbrella can be experimentally validated, and any attempt just seems fabricated / pointless. For a (hypothetical) example: the proof of a P-vs-NP result will not (and should not) require experimental validation. It’s a matter of logic, reasoning, and pure mathematics that lets one arrive at that result.

    As for the current state of affairs in graduate studies, I haven’t met a single researcher at this level who actively monitors (let alone even knows about) Arxiv. Not a good place to find examples of the state of research. A common criticism is actually quite the opposite of what you have described — far too often the peer-reviewed conferences / journals turn away good ideas because of a lack of thorough benchmarking / horse-race tests. I don’t fully agree with this criticism, but there are tremendously convincing anecdotal incidents to support it.

    Finally, the course spread at Carleton should not be taken as a stereotypical stamp of the continent’s academic focus. Aye, I had to play catchup to learn how to properly evaluate the research we do, but most of the North American students (and international ones) already had undergrad courses in experimental design and methodology.

    Don’t be so gloomy — it ain’t all bad ;).

    PS. I fully support the notion that doing research motivated by the question “wouldn’t it be cool if ____?” is a perfectly valid / wondrous path to take.

  2. I can’t say I’m as pessimistic about the sub-field of computer vision (and probably graphics, etc). I see what appear to be pretty good experiments in those papers. Not perfect, because the methods researchers have found for accomplishing things are still imperfect themselves. I can understand where you’re coming from, especially with some other subfields, though. (By the way, it’s not really the purpose of an undergrad degree to teach research, but I agree that doesn’t mean they shouldn’t teach real and proper testing.)

  3. @Christian: Computer Science was originally what we now call “Numerical Methods”. “Computer” was literally a job title. A computer was a person who’d receive calculations to compute and would compute them. At the time, computer science was a science, because hypotheses got tested pretty fast, and things were largely motivated by “our physicists/chemists/biologists/accountants need us to calculate _____.” There’s still a lot of important science to be done in computer science, and much of it isn’t being done. I could see a dialogue analogous to the following happening (it’s hyperbole, of course, but the point remains):

    Biochemist: “Hi, we’ve got some really tough computations that could help cure cancer and were wondering if you could help us.”
    Computer Scientist: “Okay, so why did you come to us? We’re computer scientists.”
    Biochemist: “Well, we figured you’d know more about computation than us, so you’d be able to help us do better simulations faster.”
    Computer Scientist: “That’s awfully arrogant of you. We’re busy making a distributed operating system to store files for 10 billion users.” (see the end of the introduction)
    Biochemist: “Arrogant? but we’re trying to cure cancer. It could save hundreds of millions of lives over just a few decades.”
    Computer Scientist: “Oh yeah? Well our project will help 10 billion people! and they’ll each pay us a monthly fee for our service.” (see just after the introduction)
    Biochemist: “…but there aren’t that many people in the wor–”
    Computer Scientist: “10 BILLION!”
    (after implementing their distributed OS)
    Computer Scientist: “Let’s test it with 100 simulated users. That ought to be enough.”

    As for the approval of papers, the first one I linked to somehow made it into SC05, one of the world’s most prestigious supercomputing conferences! They even quote a 24% acceptance rate, so somehow that paper was deemed better than 76% of the 260 papers submitted.

    You’re right that some things ultimately aren’t determined by experiment, and some things are logical enough not to require testing, but situations like Chazelle’s infamous linear-time triangulation algorithm are just sad. It’s so complicated that, as Jit put it, the only reason it’s widely accepted as a linear-time triangulation algorithm is because Tarjan accepts it as a linear-time triangulation algorithm. Some things just don’t lend themselves to being checked by humans.

  4. @Gail: You’re right that it’s unfair of me to lump all of computer science into the same pile. Computer vision is definitely a scientific area of computer science; it’s all about making hypotheses and testing them rigorously. High-performance computing is often also done fairly scientifically, despite the one example I cited above.

    I think that some areas could really benefit from more scientific study; the tendency is just to get so separated from reality that it becomes almost impossible to define what it means to be a “realistic” test of the work.

    Christian’s right that “it ain’t all bad”, but we sometimes need to keep our wits about us to try to avoid falling into the trap of “I don’t need to think about the real world; I’m a computer scientist.” I’ll admit to having done completely pointless things before out of curiosity, like my approximation for n choose n/2 in this post. I nonetheless try to think about what I do in terms of who’s likely to be affected by it and how. Maybe I’m a jerk for hoping that other people would sometimes do the same.

  5. I personally need a connection to the real world in my work, but not everyone feels the same way. The key is just to recognize which research is the theoretical stuff that may never be practical (or certainly not all that easy to implement, despite the implication to the contrary). I figure there’s a place for both camps – we just need to be aware of it. And posts like these are a good place to discuss it 😉

  6. Can’t always have that connection to the real world though. Some research is done simply out of curiosity, and only decades later does its true potential for impacting the world in a positive way become evident.

    The example you provide is more akin to what a biologist or chemist would do when looking for external software development contractors. I’ve witnessed this, and it isn’t a theoretical computer scientist they approach — it’s a well-versed software engineer. In fact, some in the department at U of T (about 4 years ago) acknowledged their inability to quickly produce good-quality code for a concept they had invented — the computer science department outsourced the software development to someone better equipped for the task.

    Different people have their strengths and weaknesses, and it should be perfectly acceptable for those who are badass about proving new bounds to do their thing, and those who can optimize the hell out of a piece of C to do their own magic. Trying to fit someone with the label ‘Computer Scientist’ into too many skillsets will just serve to weaken their overall ability to contribute to the human race (via the discovery of new knowledge).

    Now I know not everyone needs to specialize, and those that /can/ straddle the areas prove to be the most influential and have the highest impact… but I would venture to say specialization is the norm.

    PS. In no way does this excuse those in a position where they need to experimentally validate their claims from producing shitty work. If you’re innovating in a field that lends itself to experimental methodology, then it’s the researcher’s responsibility to verse themselves in what they need to know (and their peers’ responsibility to make sure those who don’t aren’t repeatedly rewarded (although this doesn’t always work… I’ve witnessed counterexamples too)). It’s just that not all fields under the Computer Science umbrella have such a requirement.

  7. @Christian: I absolutely agree with your statement “Different people have their strengths and weaknesses, and it should be perfectly acceptable for those who are badass about proving new bounds to do their thing, and those who can optimize the hell out of a piece of C to do their own magic.” One thing I’m really against is the growing mentality that if it’s practical, it’s of no academic value. I’m not really trying to argue that there’s no value in doing theoretical research, since there can be value in it; I’m just trying to say that theoretical researchers should stop dumping on more practical researchers for being practical. The world is missing out on a lot of useful research just because it’s shunned, which is why a lot of practical research gets relegated to only being done in companies, and from my last post, you can tell how I feel about how companies have done overall.

    I don’t mean to insult Anil, and my apologies to him for using him as my example here, since he’s really a nice person and he’s BY FAR the best teacher of the operating systems course at Carleton, but he’s told me straight up that I’m “very arrogant” for wanting research to help people, and that implementation is of no academic value (he used the synonym “scholarship”). Maybe he doesn’t intend it, but that kind of talk is incredibly discouraging toward practical research. If someone as kind as Anil discourages practical research so heavily, I can’t imagine how much it’s discouraged by less respectful professors. Maybe what I’m doing won’t work, and maybe it won’t end up helping people, and maybe it’ll end up being of no academic value, but is it really so wrong for me to try?

  8. Ahh, I see where you’re coming from. The thing to keep in mind is that the whole range of respectful tendencies (and pleasantness towards students) is mostly disjoint from the spectrum of where people think implementation details belong in the academic world. I can assure you that there are some areas in Computer Science (more specifically operations research) that have been turning away good /theoretical/ research in favour of mediocre “practical” implementations (which aren’t always done with the best of scientific methods).
