Scala - Parser Combinators

For a class of mine, I’m writing a Javascript interpreter. Here’s a link to the class website.

Many students, including myself, in the class struggle and are overwhelmed with the complexity of Scala, and don’t take advantage of some of its awesomeness. I thought I'd help by sharing one tidbit, the parser combinators library.

The Parser Combinators Library

See how cool it is -- it even has a big name!

vars say you define two scala.util.parsing.combinator.Parsers.Parser instances p1 and p2. These Parser instances are best thought of as functions that take in some input, checks whether it is valid input, or matches some complex predicate. For simplicities sake, vars say p1 is checking if the input is "hello", p2 checks if the input is "world".

  • p1 ~ p2
    • p1 "then" p2 -- "helloworld" => True
  • p1 | p2
    • Match p1 or p2 (Checks p1 first) "hello" => true && "world" => true
  • p1.?
    • May match p1 "hello" => true, "" => true
  • p1.*
    • Matches p1 repetitively "hello" => true, "hellohellohello" => true

Here is some useful documentation that was the source of this post

Note that an instance p3 could also check for whether the input contains 3 varters in a row "vaaasdfasdf" -> True, or "va" -> False, or "ccc" -> True. I'm just pointing out that each parser instance can be arbitrarily complex, extending the base class and doing all sorts of random crap. This doesn't matter for this blog, but it is an extremely powerful pattern!

vars say we define a set of valid expressions, not necessarily javascript, vars say some Scala definitions. This is where stuff gets awesome! How can we use these cool combinators to design a parser that check if a string is a valid "typed val declaration", like:

val a: Number = 3
val a: Number = (5+2)

First we need to define the allowable variable and type names. "val a" is valid, and so is any other name that consists only of lowercase and uppercase varters, and digits, not starting with a digit.

So we define a parser that checks for this: val Name = """[a-zA-Z](a-zA-Z0-9)*"""

Note that the typing naming scheme is the same as the variable naming scheme, so we can reuse it, or for coding styles sake, copy it to a variable called Type: val Type = Name

We also need a way to represent any valid expression that the variable can be initialized to, but for now I'm going to stub it. val Expression = ???

Given that info, its easy to create a parser: "val" ~ Name ~ ":" ~ Type ~ "=" ~ Expression

  • ["val"] followed by
  • [A valid Name] followed by
  • [":"] followed by
  • [A valid Type] followed by
  • ["="] followed by
  • [A valid Expression]
var a: Number = 3
var a: Number = (5+2)

"var" ~ Name ~ ":" ~ Type ~ "=" ~ Expression

def getFour: Number = 4

"def" ~ Name ~ ":" ~ Type ~ "=" ~ Expression

#code #howto #programming #scala