The random rantings of a concerned programmer.

(Untitled)

June 23rd, 2009 | Category: Random

Before the shitstorm hits: 4scrape is dead for a while due to an HDD failure. Yes, there are offsite backups for all the stuff on there, but meh, it’s probably going to be a while before I can be arsed to actually restore it. Kind of want to rewrite the damn thing again. I cbf’d to put the images in a torrent or whatever because they’re all shit anyway. RIP.

And I csup’d a bad kernel last night on my laptop or something, because it panicked a couple of minutes ago. Naturally this means spending the rest of the night dicking around with it to get a crash dump (it didn’t leave one FUUUUUUUUUUU), then dicking around in gdb to find the cause of the crash (or just posting the traces to the mailing list). <3

Pic potentially related to binge drinking for the past week.

16 comments

Fetching and Parsing XML with Haskell

April 01st, 2009 | Category: Random

Okay, I have to stumble through the Haddock for this shit every time I want to get something done with Haskell, so I’m going to type some of this up for the next time I have to deal with it. Hopefully this will help me remember the specifics better.

I want to implement a “feature” on 4scrape where when someone does a search, the JavaScript also does a search on JList and injects some product suggestions into the results. Since JList only provides HTML/XML interfaces (and not JSONP), I have to write a proxy service, and if I’m going to do that, I might as well have it cache, parse and convert the JList product data into JSON so the JavaScript side is really easy to write.

Fetching shit over HTTP

First, we need to fetch the raw data. Before we can do that, we need to parse the target URL into a Network.URI.URI. parseURI returns a Maybe, so the let Just here is just me being lazy:


ghci> import Network.URI
ghci> let url = "http://feeds.jlist.com/SEARCH/evangelion/feed.xml"
ghci> let Just uri = parseURI url
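If the URL ever comes from somewhere less trustworthy than my own fingers, that let Just pattern will blow up on a bad one. A minimal sketch of handling the Nothing case instead; parseTarget and hostOf are names I made up for illustration, they are not part of Network.URI:

```haskell
import Network.URI (URI, parseURI, uriAuthority, uriRegName)

-- Hypothetical helper: turn a bad URL into a Left with a message
-- instead of a pattern-match failure.
parseTarget :: String -> Either String URI
parseTarget s = maybe (Left ("not an absolute URI: " ++ s)) Right (parseURI s)

-- Hypothetical helper: the host part of a URI, if it has an
-- authority component at all.
hostOf :: URI -> Maybe String
hostOf = fmap uriRegName . uriAuthority
```

Load it in ghci and call parseTarget url; you get either a Right with the URI or a Left you can print.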

There are a lot of ways to fetch shit over HTTP, but I prefer Network.HTTP.simpleHTTP because it’s simple. We just have to construct a Request object from our URI (you can put headers and shit in there too, but for posting stuff there are better interfaces, I think):


ghci> import Network.HTTP
ghci> :t Request
Request :: URI -> RequestMethod -> [Header] -> String -> Request
ghci> let req = Request uri GET [] ""

And then perform the simpleHTTP call, which returns an Either we have to deal with (I just let the shit error out):


ghci> res <- simpleHTTP req
Right HTTP/1.1 200 OK 
Date: Wed, 01 Apr 2009 22:47:42 GMT
Server: Apache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/xml
Content-Length: 65190


ghci> let Right rsp = res

There are various accessor functions for grabbing all of the pieces of the response object, but the only thing I’m interested in is the body:


ghci> let xml = rspBody rsp
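For anything that isn't a throwaway ghci session, letting the Either error out is a bit much. Here's a rough sketch of dealing with it properly. bodyOf and fetch are hypothetical names of mine, and I'm assuming a version of the HTTP package that ships getRequest (which builds a GET Request from a String, and calls error on a malformed URL):

```haskell
import Network.HTTP (Response(..), getRequest, simpleHTTP)
import Network.Stream (ConnError(..))

-- Pure part: decide what to do with whatever simpleHTTP returned.
-- Only 2xx responses count as success here; that is my choice,
-- not something the library enforces.
bodyOf :: Either ConnError (Response String) -> Either String String
bodyOf (Left err) = Left ("connection failed: " ++ show err)
bodyOf (Right rsp) =
  case rspCode rsp of
    (2, _, _) -> Right (rspBody rsp)
    code      -> Left ("unexpected status " ++ show code)

-- IO part: fetch a URL and hand back either an error message or the body.
fetch :: String -> IO (Either String String)
fetch url = fmap bodyOf (simpleHTTP (getRequest url))
```

Splitting the pure bit out means the status handling can be poked at without touching the network.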

Parsing with Text.XML.Light

Now that we have the data, we need to parse it. There’s a bajillion Haskell libraries to parse and manipulate XML documents (HXT, HaXml, libxml bindings, etc), but for lightweight work I find Text.XML.Light to be simple enough. Arguably, this would be faster to deal with using some XPath, but ugh, that shit (and XSLT) makes me kind of queasy (for no good reason). Fucking XML.

Anyway, we have to parse the XML document:


ghci> import Text.XML.Light
ghci> let Just doc = parseXMLDoc xml

In this case, I’m dealing with RSS, which is fairly straightforward: I just want to grab all the “item” elements, then from each of those, grab the title, link (I pull it from the guid element) and description. The only muddy thing about traversal with XML.Light is QName (a qualified name: the local name plus an optional namespace URI and prefix), which really is just an extra step in the process. Fucking XML.


ghci> let items = findElements (QName "item" Nothing Nothing) doc
ghci> length items
58
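Text.XML.Light also has unqual, which builds a QName with no namespace URI or prefix, so the Nothing Nothing noise can go away. A small sketch, run against a made-up feed rather than real JList data; sampleFeed and itemsOf are names I invented for the example:

```haskell
import Text.XML.Light (Element, findElements, parseXMLDoc, unqual)

-- Made-up feed for illustration; not real JList data.
sampleFeed :: String
sampleFeed = concat
  [ "<rss version=\"2.0\"><channel><title>demo</title>"
  , "<item><title>first</title></item>"
  , "<item><title>second</title></item>"
  , "</channel></rss>"
  ]

-- All <item> elements, or [] if the input does not contain an
-- XML element at all.
itemsOf :: String -> [Element]
itemsOf = maybe [] (findElements (unqual "item")) . parseXMLDoc
```

In ghci, length (itemsOf sampleFeed) counts the two items in the fake feed, and unparseable input just gives an empty list instead of a failed pattern match.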

And this is where I introduce totally unmaintainable code:


ghci> let stuff = map (\e -> map (\n -> maybe "" strContent (findElement (
      QName n Nothing Nothing) e)) ["title", "guid", "description"]) items

ghci> stuff !! 0 !! 0
"Evangelion Head Interface ~ Rei Ayanami ver. "

Could probably clean that up a lot by sticking the lambdas and the ["title", ...] into where clauses but I am too lazy. Anyway, that’s all our stuff and we can just throw that into JSON with Text.JSON.encode and be on our way. Hooray.
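For future me, here is roughly what the cleaned-up version with where clauses plus the Text.JSON step might look like. This is a sketch, not what 4scrape actually runs: fields, itemToObject and feedToJSON are hypothetical names, and turning each item into a JSON object keyed by element name is my own choice of output shape.

```haskell
import Text.JSON (JSObject, encode, toJSObject)
import Text.XML.Light
  (Element, findElement, findElements, parseXMLDoc, strContent, unqual)

-- The child elements pulled out of every <item>.
fields :: [String]
fields = ["title", "guid", "description"]

-- One RSS <item> as a JSON object; a missing child becomes "".
itemToObject :: Element -> JSObject String
itemToObject item = toJSObject [ (name, field name) | name <- fields ]
  where
    field name = maybe "" strContent (findElement (unqual name) item)

-- Raw feed XML in, JSON array of item objects out.
-- Unparseable input yields an empty array.
feedToJSON :: String -> String
feedToJSON xml = encode (map itemToObject items)
  where
    items = maybe [] (findElements (unqual "item")) (parseXMLDoc xml)
```

With the xml from the fetch step, putStrLn (feedToJSON xml) should dump something the JavaScript side can eat directly.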

6 comments

wild asuka has been caught

February 15th, 2009 | Category: Random

asuka

Not only did I spend way too much fucking money at Katsucon, but the Korean mafia is after my head. Goddammit.

11 comments