The random rantings of a concerned programmer.

varnishlog filter by client IP address

February 15th, 2011 | Category: Random

Couldn’t figure this one out myself (it didn’t help that the Varnish 2.1.4 man pages aren’t in sync with the code), so I asked around on the Varnish IRC channel and learned that you can filter varnishlog’s output by remote IP address using

$ varnishlog -c -o SessionOpen $IP

Really bloody useful when troubleshooting on a production machine >:(


Fetching and Parsing XML with Haskell

April 01st, 2009 | Category: Random

Okay, I have to stumble through the Haddock for this shit every time I want to get something done with Haskell, so I’m going to type some of this up for the next time I have to deal with it. Hopefully this will help me remember the specifics better.

I want to implement a “feature” on 4scrape: when someone does a search, the JavaScript also does a search on JList and injects some product suggestions into the results. Since JList only provides HTML/XML interfaces (and not JSONP), I have to write a proxy service, and if I’m going to do that, I might as well have it cache, parse and convert the JList product data into JSON so the JavaScript side is really easy to write.

Fetching shit over HTTP

First, we need to fetch the raw data. Before we can do that, we need to parse the target URL into a Network.URI.URI –

ghci> import Network.URI
ghci> let url = ""
ghci> let Just uri = parseURI url
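
The let Just uri pattern above blows up at the point of use if the URL doesn't parse (and parseURI returns Nothing for anything that isn't an absolute URI). Here's a rough sketch of handling that Maybe explicitly instead. The names parseFeedURI and uriHost are made up for this example, and it assumes the network-uri package is around.

```haskell
import Network.URI (URI, parseURI, uriAuthority, uriRegName)

-- | Parse a URL, reporting failure as a Left instead of a pattern-match crash.
-- (parseFeedURI is a hypothetical helper, not part of Network.URI.)
parseFeedURI :: String -> Either String URI
parseFeedURI s = maybe (Left ("not an absolute URI: " ++ show s)) Right (parseURI s)

-- | Host name of a parsed URI, if it has an authority part.
-- (uriHost is also a hypothetical helper.)
uriHost :: URI -> Maybe String
uriHost = fmap uriRegName . uriAuthority
```

Whether this is worth it in a ghci session is debatable, but in the proxy service itself a Left is a lot easier to turn into an error response than an exception.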

There are a lot of ways to fetch shit over HTTP, but I prefer Network.HTTP.simpleHTTP because it’s simple. We just have to construct a Request object from our URI (you can put headers and shit in there too, though for posting stuff there are better interfaces, I think) –

ghci> import Network.HTTP
ghci> :t Request
Request :: URI -> RequestMethod -> [Header] -> String -> Request
ghci> let req = Request uri GET [] ""

And then perform the simpleHTTP call, which returns an Either we have to deal with (I just let the shit error out) –

ghci> res <- simpleHTTP req
Right HTTP/1.1 200 OK
Date: Wed, 01 Apr 2009 22:47:42 GMT
Server: Apache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/xml
Content-Length: 65190

ghci> let Right rsp = res

There are various accessor functions for grabbing all of the pieces of the response object, but the only thing I’m interested in is the body –

ghci> let xml = rspBody rsp
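
For the real service, letting the Either error out is probably not what you want. Below is a minimal sketch of dealing with both the connection error and non-2xx responses. bodyOrError and fetch are hypothetical names for this example; the Request is built the same way as above, and this assumes a version of the HTTP package where Request and Response are parameterised over the body type.

```haskell
import Network.HTTP (Request(..), RequestMethod(GET), Response(..), simpleHTTP)
import Network.Stream (ConnError(..))
import Network.URI (URI)

-- | Turn simpleHTTP's result into either an error message or the body.
-- (bodyOrError is a hypothetical helper, not part of Network.HTTP.)
bodyOrError :: Either ConnError (Response String) -> Either String String
bodyOrError (Left err) = Left ("connection failed: " ++ show err)
bodyOrError (Right rsp) =
  case rspCode rsp of
    (2, _, _) -> Right (rspBody rsp)  -- any 2xx status counts as success
    (a, b, c) -> Left ("HTTP " ++ concatMap show [a, b, c] ++ " " ++ rspReason rsp)

-- | Fetch a URI with a plain GET, using the same Request as in the session above.
fetch :: URI -> IO (Either String String)
fetch uri = fmap bodyOrError (simpleHTTP (Request uri GET [] ""))
```

Keeping bodyOrError pure means the status handling can be poked at in ghci without touching the network.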

Parsing with Text.XML.Light

Now that we have the data, we need to parse it. There are a bajillion Haskell libraries to parse and manipulate XML documents (HXT, HaXml, libxml bindings, etc.), but for lightweight work I find Text.XML.Light to be simple enough. Arguably, this would be faster to deal with using some XPath but ugh that shit (and XSLT) makes me kind of queasy (for no good reason). Fucking XML.

Anyway, have to parse the XML document –

ghci> import Text.XML.Light
ghci> let Just doc = parseXMLDoc xml

In this case, I’m dealing with RSS, which is fairly straightforward. I just want to grab all the “item” elements, then from each of those, grab the title, guid and description. The only muddy thing about traversal with XML.Light is using QName (qualified names?) which really is just an extra step in the process. Fucking XML.

ghci> let items = findElements (QName "item" Nothing Nothing) doc
ghci> length items
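
The QName step can be shortened a bit: Text.XML.Light exports unqual, which builds a QName with no URI and no prefix. A small sketch, where rssItems and countItems are names I made up for this example:

```haskell
import Text.XML.Light (Element, findElements, parseXMLDoc, unqual)

-- | <item> elements found by searching down from the given element.
-- unqual "item" is the same as QName "item" Nothing Nothing.
rssItems :: Element -> [Element]
rssItems = findElements (unqual "item")

-- | Count the items in a raw XML string; Nothing if there is no root element.
countItems :: String -> Maybe Int
countItems = fmap (length . rssItems) . parseXMLDoc
```

This only matches elements without a namespace, which is fine for plain RSS items.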

And this is where I introduce totally unmaintainable code

ghci> let stuff = map (\e -> map (\n -> maybe "" strContent (findElement (
      QName n Nothing Nothing) e)) ["title", "guid", "description"]) items

ghci> stuff !! 0 !! 0
"Evangelion Head Interface ~ Rei Ayanami ver. "

Could probably clean that up a lot by sticking the lambdas and the ["title", ...] into where clauses but I am too lazy. Anyway, that’s all our stuff and we can just throw that into JSON with Text.JSON.encode and be on our way. Hooray.
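
For the record, here's roughly what that cleanup could look like, with the lambdas moved into where clauses and the Text.JSON step bolted on. This is a sketch, not what 4scrape actually runs: fieldNames, itemFields and feedToJSON are hypothetical names, and emitting one JSON object per item (rather than a list of lists) is my own choice.

```haskell
import Text.JSON (encode, toJSObject)
import Text.XML.Light (Element, findElement, findElements, parseXMLDoc, strContent, unqual)

-- | The child elements pulled out of each <item>.
fieldNames :: [String]
fieldNames = ["title", "guid", "description"]

-- | (name, text) pairs for one <item>; a missing child becomes "".
itemFields :: Element -> [(String, String)]
itemFields item = [(name, field name) | name <- fieldNames]
  where
    field name = maybe "" strContent (findElement (unqual name) item)

-- | Parse an RSS document and encode its items as a JSON array of objects.
-- Returns Nothing if the input has no root element.
feedToJSON :: String -> Maybe String
feedToJSON xml = fmap (encode . map (toJSObject . itemFields) . items) (parseXMLDoc xml)
  where
    items = findElements (unqual "item")
```

Having the field names as keys means the JavaScript side can say item.title instead of remembering that index 0 is the title.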