What I Used To Do

15 Jun 2008
Posted by david

So here is a description of something I did a couple of weekends ago at Uni for my thesis. Hopefully you'll get some idea of what I spent 3 years doing.

What I needed to do was a statistical significance test, that is, comparing two models to see if their results were actually any different. This guy, Dan Bikel, had written a script to do this, here: Randomized Parsing Evaluation Comparator. Only problem was it was in Perl, which as we all know, sucks hard, and also blows. So I rewrote the thing in Python, yay Python!

Here's what it does. You take two input files that look like this:
  Sent.                        Matched  Bracket   Cross        Correct Tag
 ID  Len.  Stat. Recal  Prec.  Bracket gold test Bracket Words  Tags Accracy
============================================================================
   1    8    0  100.00 100.00     5      5    5      0      6     5    83.33
   2   40    0   87.88  90.62    29     33   32      1     37    37   100.00
   3   31    0   69.57  76.19    16     23   21      2     26    26   100.00

…that goes on for a while, so I'll skip to the bottom…

2415   13    0  100.00 100.00    11     11   11      0     12    12   100.00
2416   13    0   70.00  77.78     7     10    9      2     12    11    91.67
============================================================================
                 87.88  88.38 40085  45612 45355   1553  49892 48706    97.62
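
If you wanted to read one of these files in Python, a minimal sketch might look something like this. It is not the actual script: the function name `parse_evalb_rows` is made up for illustration, and it assumes the columns come in the order shown in the header, so the matched, gold and test bracket counts are the 6th, 7th and 8th fields of each row.

```python
def parse_evalb_rows(text):
    """Return {sentence_id: (matched, gold, test)} for the per-sentence rows.

    Hypothetical helper: assumes 12 whitespace-separated columns per sentence,
    in the order ID, Len., Stat., Recal, Prec., Matched Bracket, Bracket gold,
    Bracket test, Cross Bracket, Words, Correct Tags, Tag Accracy.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-sentence rows have exactly 12 fields and start with an integer ID.
        # Header lines, the ==== rules and the summary line fail one of these.
        if len(fields) != 12 or not fields[0].isdigit():
            continue
        matched, gold, test = (int(f) for f in fields[5:8])
        rows[int(fields[0])] = (matched, gold, test)
    return rows


# The first three rows from the sample above.
SAMPLE = """\
 ID  Len.  Stat. Recal  Prec.  Bracket gold test Bracket Words  Tags Accracy
============================================================================
   1    8    0  100.00 100.00     5      5    5      0      6     5    83.33
   2   40    0   87.88  90.62    29     33   32      1     37    37   100.00
   3   31    0   69.57  76.19    16     23   21      2     26    26   100.00
"""

sample_rows = parse_evalb_rows(SAMPLE)
```

Keeping the raw bracket counts rather than the per-sentence percentages matters, because the overall precision and recall are computed from the summed counts, not by averaging the percentages.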

Then you pull out the important columns, and randomly swap them between the two models, one sentence at a time. Then you recalculate the overall precision, recall and F-score (those are measures of how well the model did), and see if the difference between the two models is bigger than it was originally. Then you repeat another 9999 times.
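
The shuffling loop described above can be sketched roughly like this. It is an illustration under assumptions rather than the real script: the names `f_score` and `shuffle_test` are invented, each sentence is assumed to be a (matched, gold, test) tuple of bracket counts, each sentence is assumed to be swapped with probability 0.5, and the comparison here is on the absolute difference in F-score.

```python
import random


def f_score(sentences):
    """Overall (precision, recall, F1) from summed (matched, gold, test) counts."""
    matched = sum(m for m, _, _ in sentences)
    gold = sum(g for _, g, _ in sentences)
    test = sum(t for _, _, t in sentences)
    precision = matched / test if test else 0.0
    recall = matched / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def shuffle_test(model_a, model_b, iterations=10000, seed=0):
    """Count how often a random shuffle gives an F-score gap >= the real one.

    model_a and model_b are equal-length lists of (matched, gold, test) tuples,
    one per sentence, in the same sentence order. Hypothetical sketch.
    """
    rng = random.Random(seed)
    observed = abs(f_score(model_a)[2] - f_score(model_b)[2])
    at_least_as_big = 0
    for _ in range(iterations):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(model_a, model_b):
            # Assumed: swap this sentence's counts between models half the time.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        gap = abs(f_score(shuffled_a)[2] - f_score(shuffled_b)[2])
        if gap >= observed:
            at_least_as_big += 1
    # (count + 1) / (iterations + 1) is one common way to turn the count
    # into a p-value estimate; treat it as an assumption here.
    return at_least_as_big, (at_least_as_big + 1) / (iterations + 1)
```

A count near 0 means the random shuffles almost never produced a gap as large as the real one, which is what you want to see if the two models genuinely differ.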

In the end, almost all the results were 0. There were no times when the random metrics were better than the original. Which is a good thing, cause it means what I'd done had actually improved stuff.

Anyway, that's pretty indicative of what I spent my PhD doing. Read in data files you got from somewhere, and process them to get some number. The result is often another data file that needs more processing.

As a side note, I actually submitted my thesis on Friday. Suck it bitches.