Hurr, so I actually spent the night going through the Parsec documentation and implemented the first half of a shitty YAML parser (doesn’t meet spec but who the fuck cares; it’s 100x better than the shit regexes I was using before). Still need to write proper Read and Show instances (and then Data as well, and toYml/fromYml to marshal values around). But it seems to work so far.
Honestly, I couldn’t figure out most of Parsec’s combinators (specifically, the Token parsers) because they need a tokenizer definition or something and I was like “cbf’d”, so I only used the primitive constructs. Additionally, for some reason, I couldn’t get between to work, though I can’t fucking figure out what I couldn’t understand with it because it’s working now. But my re-implementation was kind of amusing --
pbetween o c p = do
    o
    v <- p
    c
    return v
I also ended up implementing a combinator to concatenate Parser results (inb4 someone optimizes these with Arrows) --
pjoin f p1 p2 = do
    v1 <- p1
    v2 <- p2
    return $ f v1 v2
(<++>) = pjoin (++)
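For what it's worth, both helpers have standard equivalents: pjoin has the same shape as liftM2 from Control.Monad, and pbetween is an "open, body, close" sequence that can be written with the Applicative operators. Here is a minimal sketch, assuming the Text.ParserCombinators.Parsec module and its Parser type. The names pbetween' and wrapped are made up for illustration and are not part of the original parser.

```haskell
import Control.Monad (liftM2)
import Text.ParserCombinators.Parsec

-- The hand-rolled version from the post, with a type signature added.
pbetween :: Parser open -> Parser close -> Parser a -> Parser a
pbetween o c p = do
    _ <- o
    v <- p
    _ <- c
    return v

-- Same behaviour using Applicative sequencing: run o, keep p's result, run c.
-- (pbetween' is a hypothetical name used only for this comparison.)
pbetween' :: Parser open -> Parser close -> Parser a -> Parser a
pbetween' o c p = o *> p <* c

-- pjoin runs two parsers in order and combines their results,
-- which is exactly what liftM2 does in any monad.
pjoin :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
pjoin = liftM2

-- Concatenate the list results of two parsers.
(<++>) :: Parser [a] -> Parser [a] -> Parser [a]
(<++>) = pjoin (++)

-- Hypothetical demo: letters followed by optional digits, inside parentheses.
wrapped :: Parser String
wrapped = pbetween (char '(') (char ')') (many1 letter <++> many digit)
```

In GHCi, `parse wrapped "" "(abc123)"` gives `Right "abc123"`, and swapping pbetween for pbetween' or Parsec's own between should not change the result.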
I guess I should stop ranting and be a bit more concrete about why Parsec is cool. Parsec is a combinator-based parsing framework, so instead of mucking around with specially-formatted lists of regular expressions (lex, I'm looking at you), you bind combinators together to describe the parser.
Parsec comes with a bunch of commonly used combinators like string, char and digit which are pretty bread-and-butter (though you can use satisfy, which takes an arbitrary Char -> Bool to do pretty much any low-level bullshit).
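As a rough illustration of satisfy, here is a sketch that builds a character parser from a plain Char -> Bool predicate and then repeats it with many1. The names keyChar and key, and the choice of "alphanumeric or underscore" as the rule, are assumptions for the example and not taken from the actual YAML parser.

```haskell
import Data.Char (isAlphaNum)
import Text.ParserCombinators.Parsec

-- Hypothetical rule: a key character is alphanumeric or an underscore.
keyChar :: Parser Char
keyChar = satisfy (\c -> isAlphaNum c || c == '_')

-- One or more key characters in a row.
key :: Parser String
key = many1 keyChar
```

Running `parse key "" "foo_bar: 1"` in GHCi yields `Right "foo_bar"`; the parser stops at the colon and leaves the rest of the input unconsumed.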
So let's say you wanted to parse an integer --
{- Helper function which returns "-" if it's there -}
pneg = option "" (string "-")
{- Parses an integer out of the stream -}
yvInteger :: Parser YamlValue
yvInteger = do
    {- bind i to the negative sign (if it exists) concatenated with [0-9]+ -}
    i <- pneg <++> many1 digit
    return $ YInteger $ read i
Excessively verbose (noting discreetly that this performs both lexing and parsing -- Parsec operates simply on streams of characters), but awesome.
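To try the integer parser on its own you need a YamlValue type and the imports, neither of which is shown here. The sketch below fills those in with guesses: the single-constructor YamlValue (holding an Integer) is an assumption, and <++> is defined via liftM2 so the block stands alone.

```haskell
import Control.Monad (liftM2)
import Text.ParserCombinators.Parsec

-- Assumed shape of the value type; the real one has more constructors.
data YamlValue = YInteger Integer
    deriving (Show, Eq)

-- Concatenate the list results of two parsers.
(<++>) :: Parser [a] -> Parser [a] -> Parser [a]
(<++>) = liftM2 (++)

-- Returns "-" if a minus sign is present, "" otherwise.
pneg :: Parser String
pneg = option "" (string "-")

-- Optional sign followed by one or more digits, converted with read.
yvInteger :: Parser YamlValue
yvInteger = do
    i <- pneg <++> many1 digit
    return $ YInteger $ read i
```

With this loaded, `parse yvInteger "" "-42"` gives `Right (YInteger (-42))`, while input with no leading digits such as "abc" gives a `Left` with a parse error. The call to read is safe here only because the parser has already guaranteed the string is an optional "-" plus digits.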
Taking it a step further, the parser for lists isn't that bad either --
{- parse out lists like [1,"hello",1.45] -}
yvList :: Parser YamlValue
yvList =
    {- make a list of the things between '[' and ']' which are separated by ',' -}
    liftM YList $ btw $ sepBy yvValue sep
    where
        {- takes a parser and wraps '[' ']' around its expected input -}
        btw = between (char '[') (char ']')
        {- just a separator -}
        sep = char ','
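The list parser leans on yvValue, which isn't shown. A minimal sketch of how the pieces could fit together, assuming a cut-down YamlValue with only integers and lists and a yvValue that just tries each alternative in turn (both are guesses, not the real definitions):

```haskell
import Control.Monad (liftM)
import Text.ParserCombinators.Parsec

-- Assumed, cut-down value type: integers and nested lists only.
data YamlValue
    = YInteger Integer
    | YList [YamlValue]
    deriving (Show, Eq)

-- Optional minus sign followed by one or more digits.
yvInteger :: Parser YamlValue
yvInteger = do
    sign <- option "" (string "-")
    ds <- many1 digit
    return $ YInteger $ read (sign ++ ds)

-- Zero or more comma-separated values between '[' and ']'.
yvList :: Parser YamlValue
yvList = liftM YList $ btw $ sepBy yvValue sep
    where
        btw = between (char '[') (char ']')
        sep = char ','

-- Hypothetical dispatcher: try a list first, then an integer.
-- Neither branch consumes input before failing on its first
-- character, so no 'try' is needed here.
yvValue :: Parser YamlValue
yvValue = yvList <|> yvInteger
```

In GHCi, `parse yvValue "" "[1,[2,-3],4]"` gives `Right (YList [YInteger 1,YList [YInteger 2,YInteger (-3)],YInteger 4])`. Note that nothing here skips whitespace, so "[1, 2]" fails as written; that would need something like spaces threaded around the separator.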
And then I was going to show the yvObject function which keeps track of state (indentation for nested objects etc) but I'm too lazy to type any more; would be better for me to just type up some Haddock to refer to later. The full source is on GitHub somewhere but they're taking forever to process my push so dicks etc.