Archive

Archive for February, 2012

Parser combinators in Scala

February 23rd, 2012 1 comment

Lately, I’ve been writing some matrix factorization code for a video recommender system. I’m a big fan of Test Driven Development but it is a technique that is difficult for me to apply to the numerical or algorithmic domain. In order to make sure that the code I write is correct what I’m doing is the Oracle Test Pattern (I’m sure there is a correct name for this but I’m not able to find it right now). The idea is to have a reference implementation that solves the exact problem you are trying to solve. In my case, that reference implementation does not exists but I’m writing a straightforward unoptimized version of the script in Python using the Numpy libraries. That will be my ground truth when comparing results with the highly optimized parallel Scala version of the algorithm.

The problem

So the problem is that I’m working interactively coding the algorithm in an iPython session and I’m getting results of this kind

In [4]: x
Out[4]:
matrix([[ 0.03893565, -0.35836827, -2.06492572,  1.49773613, -1.01988835,
         -0.20590096, -0.19658741],
        [ 0.43055155,  1.07532444,  0.89299596, -1.070371  , -0.24015718,
          0.04521229, -1.39209522],
        [-0.4482701 ,  0.15201451, -1.42824771,  1.13859559,  0.66432642,
          0.51184435,  0.52637519],
        [-0.26518471, -1.14331753, -1.15492029, -0.27501194,  1.73750282,
         -1.4118682 ,  0.14701005],
        [-1.6577536 , -0.0781593 , -0.01558478,  0.67277257, -0.07249647,
          0.70946581, -0.82349608]])

and I would like to just copy and paste this string and use it as the expected value in my Scala tests. That would involve to remove “matrix”, all the “[" and "]“, substitute them for their Array() equivalents and put an F at the end of each number denoting that it is a Float. Too much work.

The solution

If there is an area where functional languages shine is in creating DSLs. More specifically creating an internal DSL that looks more like an external DSL. In Scala you can use the Parser Combinator libraries that are part of the Scala’s core libraries. Such parser will be something like

class PyMatrixParser extends JavaTokenParsers {
  def matrix:Parser[Array[Array[Float]]] = "matrix([" ~> repsep(row,",") <~ "])" ^^ (_.toArray)
  def row:Parser[Array[Float]] = "["~> repsep(floatingPointNumber,",") <~ "]" ^^ (_.map(_.toFloat).toArray)
}

using this parser is then just a matter of

val parser = new PyMatrixParser
val scalaMatrix = parser.parseAll(parser.matrix, theStringMatrix).get

quite good result for just 2 lines of code defining the parser.

This is the quick overview for those not familiar with Scala’s syntax

  • Every String that is found where a Scala Parser was meant to be, is automatically transformed into a constant parser that matches that exact string. So for instance,“matrix([“gets converted into a parser by one of the implicits in the parser combinators libraries.
  • rep(row,”,”) takes two parsers as parameters and means that parser #1 will be repeated 0 or more times interleaved by parser #2
  • The parser combinators “~>” and “<~" denote that the parser pointed by the arrow must be keep as the result of the parsing while the parser pointed by the tail must be discarded. This is helpful for combining two parser where one of them is just ornamental.
  • floatingPointNumber is a parser provided by the library that parses float point string representations
  • Each parser returns either the parsed string or a more complex Scala structure, like for instance, a list of strings in the case of rep(parser1,parser2). Those results are sent to a parser transformator (the ^^ operator) that works on the parser results and generates the true parsing result. In the example, first we create an array of Floats, and then an Array of Arrays of Floats that represent my matrix.

Really cool feature that has spared me a lot of grunt work by just writing two lines of code.

Categories: Programming Tags: ,

Scala project template (SBT 0.11.2)

February 15th, 2012 No comments

Some months ago I wrote a post about setting up a Scala project with SBT. SBT has evolved and the new version differs from the 0.7.7 instructions. There is a Github repository with the updated version for SBT 0.11.2.

  1. Download the sbt jar and create the script as it is explained here
  2. Create the project folder
    mkdir sbt-project-template
    cd sbt-project-template
    
  3. Create the project definition with the build.sbt file that must reside in the project’s root directory
    name := "TestProject"
    
    version := "1.0-SNAPSHOT"
    
    organization := "com.example"
    
    scalaVersion := "2.9.1"
    
    libraryDependencies += "org.scalatest" %% "scalatest" % "1.7.1" % "test"
    

    Blank lines in between are important. Don’t remove them.

  4. Let’s write some classes and tests in order to check that everything is OK. First create the class under test in src/main/scala/Calc.scala
    package com.example
    
    object Calc {
        def add(x:Int, y:Int) : Int = x + y
    }
    
  5. Now, create a test in src/test/scala/CaltTest.scala that checks that the class is working as expected
    package com.example.test
    
    import com.example.Calc
    import org.scalatest.Suite
    
    class CalcTest extends Suite {
    
        def testAddition() {
            assert(4 === Calc.add(2,2))
        }
    }
    
  6. Now just check that tests pass by doing
    sbt clean test
    

    you should see something like this:

    [info] Loading project definition from /home/cebrian/GIT/sbt-project-template/project
    [info] Set current project to TestProject (in build file:/home/cebrian/GIT/sbt-project-template/)
    [success] Total time: 0 s, completed 14-feb-2012 23:29:30
    [info] Updating {file:/home/cebrian/GIT/sbt-project-template/}default-4639fd...
    [info] Resolving org.scala-lang#scala-library;2.9.1 ...
    [info] Resolving org.scalatest#scalatest_2.9.1;1.7.1 ...
    [info] Done updating.
    [info] Compiling 1 Scala source to /home/cebrian/GIT/sbt-project-template/target/scala-2.9.1/classes...
    [info] Compiling 1 Scala source to /home/cebrian/GIT/sbt-project-template/target/scala-2.9.1/test-classes...
    [info] CalcTest:
    [info] - testAddition
    [info] Passed: : Total 1, Failed 0, Errors 0, Passed 1, Skipped 0
    [success] Total time: 7 s, completed 14-feb-2012 23:29:37
    

Simpler than before.

Categories: Programming Tags: , ,