The random rantings of a concerned programmer.
Archive for August 11th, 2008

(Untitled)

August 11th, 2008 | Category: Random

Diablo 2 really is an awesome game. I’m in a party of 3 people (potentially 4-5 soon) who now hang out and play it. Yes, it’s the modern equivalent of LARPing, but whatever, it’s fun as hell. Also, the Diablo 3 gameplay video is fucking amazing; I can’t wait for that game to come out.

Microsoft Excel, on the other hand, is a pile of steaming shit. They sent me a 1200-row spreadsheet to throw into one of the databases. I figured, what the hell, it wouldn’t be a big problem: just export it to a tab-delimited format and it’s a 10-line script.

Hahaha yeah that would be nice.

The data in question is filled with Unicode, which apparently Microsoft Excel doesn’t handle. At least, Excel 2007 doesn’t. There’s a “tab-delimited” export option, but it just converts all of the Unicode characters to ‘?’s. There’s a “Unicode text” export option, but I have no idea what the fuck it’s outputting (it certainly isn’t Unicode). Funnily enough, a quick Google suggests that there was a “tab-delimited Unicode” export option in Excel 2003. Apparently they didn’t think it was important to keep in, though, the fucking bastards.
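For what it’s worth, Excel’s “Unicode text” export is generally described as tab-delimited UTF-16 with a byte-order mark, which looks like garbage to anything expecting ASCII or UTF-8. Here is a minimal sketch of reading such a file, assuming that really is what the export contains; the function name and the sample data are made up for illustration, not taken from the actual spreadsheet.

```python
import csv
import io

def read_unicode_text(raw_bytes):
    """Parse bytes ASSUMED to be BOM-prefixed, tab-delimited UTF-16."""
    # The 'utf-16' codec reads the BOM to pick the byte order, then drops it.
    text = raw_bytes.decode('utf-16')
    # newline='' leaves line endings alone so the csv module can handle them.
    return list(csv.reader(io.StringIO(text, newline=''), delimiter='\t'))

# Hypothetical stand-in for open('export.txt', 'rb').read()
sample = 'id\tname\r\n1\tZoë\r\n2\t東京\r\n'.encode('utf-16')
rows = read_unicode_text(sample)
print(rows)
```

If the real file turns out to be in some other encoding, the decode step is the only part that should need to change.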

So I used Gmail’s ‘View as HTML’ option and wrote a quaint little script to parse the output:

import re

# Each row: a numbered <th> cell, then five <td> cells separated by whitespace.
cell = r'<td[^>]*><font[^>]*>([^<]+)</font></td>'
row = re.compile( r'<tr[^>]*><th[^>]*><b>(\d+)</b></th>\s+'
                  + r'\s+'.join( [cell] * 5 )
                  + r'</tr>' )

# encoding='utf-8' is an assumption about how the page was saved.
rows = row.findall( open( 'gmail_conv.html', encoding='utf-8' ).read() )

(Stand back, I know regular expressions; in b4 optimization)
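To show what the pattern actually captures, here is a rough sketch that runs it over one hand-written row and joins the captured groups into a tab-delimited line. The sample row and the rows_to_tsv helper are hypothetical; the real Gmail markup may differ in attributes and whitespace.

```python
import re

CELL = r'<td[^>]*><font[^>]*>([^<]+)</font></td>'
ROW = re.compile(
    r'<tr[^>]*><th[^>]*><b>(\d+)</b></th>\s+'
    + r'\s+'.join([CELL] * 5)
    + r'</tr>'
)

def rows_to_tsv(html):
    """Return one tab-delimited line per matched table row."""
    # findall yields a 6-tuple per row: the row number plus five cell texts.
    return '\n'.join('\t'.join(groups) for groups in ROW.findall(html))

# Hypothetical row shaped the way the pattern expects:
# whitespace between cells, none before the closing </tr>.
cells = ['Zoë', '東京', 'a', 'b', 'c']
sample = (
    '<tr><th><b>1</b></th>\n'
    + '\n'.join('<td><font>%s</font></td>' % c for c in cells)
    + '</tr>'
)
tsv = rows_to_tsv(sample)
print(tsv)
```

Note that this does no escaping, so a cell that itself contains a tab or newline would corrupt the output, and any HTML entities in the cells come through undecoded.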
