Table of Contents
- Parsing – the Ugly
- Regexes
- Parser Combinators
- ReadP
- metar
- Inspection
- This or That Parsers
- Maaaybeee…
- Tying Together
Parsing is something every programmer does, all the time. Often, you are lucky, and the data you receive is structured according to some standard like JSON, XML, you name it. When it is, you just download a library for converting that format into native data types, and call it a day.
Sometimes, you are not quite so lucky. Sometimes you get data in an unstructured, badly documented “miniformat”, such as the various ways in which people write phone numbers, license plates and social security numbers, the output of command line interfaces or systematically named files on the file system. Sometimes you’re actually dealing with a standard format, but you have no parser for it because it’s not very popular or well-known. Or maybe you’re reading input from the user and you want a relatively user-friendly format for it.
This is when you need to write a parsing routine of some sort, and there are a few ways of doing it. In Haskell, we prefer using parser combinators. I’ll take a couple of minutes to show you why. If you already know why it’s important to learn parser combinators, feel free to skip down to the heading ReadP.
Parsing – the Ugly
When I had just started out with programming, I tended to roll my own parsing routines by splitting strings on keywords and comparing to known values. For example, metar reports (an international semi-standard format for reporting conditions at airports, such as weather, cloud layers and humidity) can look like
BIRK 281500Z 09014KT CAVOK M03/M06 Q0980 R13/910195
Here, the third “word”, i.e. 09014KT, contains information about the wind. If I want to extract the wind speed (14, ignoring the unit, knots, for now), I might do something like
import Text.Read (readMaybe)

windSpeed :: String -> Maybe Int
windSpeed windInfo =
  let
    -- remove the wind direction
    speed = drop 3 windInfo
  in
    -- remove the "knots" unit and read number
    readMaybe (take (length speed - 2) speed)
Ignoring the fact that this is pretty hard to read after a few months (why does this number 2 pop up? What is the significance of the length of the speed value?), this is also not very stable code. Some metar reports specify wind speed in m/s, and look what happens when we feed that to our function:
λ> windSpeed "09007MPS"
Nothing
Why does it return Nothing? Well, one of our magic constants, the 2 in the code, does not apply to wind speeds stated in m/s: the unit MPS is three characters long, so a stray M is left in front of the number parser. Our code was specific to knots. Whoops. You can of course work around this by checking what the first letter of the wind speed string is, but at this point it’s getting fairly complicated already.
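To give a feel for where that road leads, here is a minimal sketch of one such workaround. It is only an illustration: the name windSpeedAnyUnit is made up for this example, it checks the unit suffix rather than a single letter, and it assumes that the direction is always three characters and that KT and MPS are the only units we will ever see.

```haskell
import Data.List (isSuffixOf)
import Text.Read (readMaybe)

-- Hypothetical variant of windSpeed that picks how many characters to
-- strip based on the unit suffix. Assumes a three-character direction
-- and only the units KT and MPS.
windSpeedAnyUnit :: String -> Maybe Int
windSpeedAnyUnit windInfo
  | "KT"  `isSuffixOf` speed = readDropping 2
  | "MPS" `isSuffixOf` speed = readDropping 3
  | otherwise                = Nothing
  where
    -- remove the wind direction
    speed = drop 3 windInfo
    -- remove a unit of the given length and read the number
    readDropping n = readMaybe (take (length speed - n) speed)
```

This handles both "09014KT" and "09007MPS", but we now have three magic constants instead of two, and anything that deviates from the assumed layout still silently turns into Nothing.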