Why I am learning Haskell

I cannot remember exactly when or why I first heard of Haskell. What I can recall vividly is the moment I decided to commit to learning the language. It happened while reading Paul Graham's article “Beating the Averages”. In it, Paul Graham describes how he wrote the software of his startup, an online store called Viaweb that was later acquired by Yahoo and became Yahoo Store, in Lisp. He says that coding in Lisp was their main advantage over potential competitors. He even jokes about not worrying when competitors were hiring Oracle or Java specialists; he knew they were doomed from then on. The article resonated with me because at the time I was toying with the idea of starting a micro-ISV as a side project. Haskell was going to be for me what Lisp was for Paul Graham: the high-level language able to multiply the productivity of a solo programmer severalfold.

On the other hand, learning Haskell is a stimulating intellectual task. Pure functional programming is worth learning in its own right: it will make you a better programmer even if you never write a single functional line of code, and your imperative style will improve too. Like quitting smoking, Haskell is very easy to learn; I’ve learned it a lot of times. I started with a Spanish book, Razonando con Haskell (“Reasoning with Haskell”), which does a very good job of introducing the language but fails badly, pedagogically speaking, once you get to monads and the design of programs. But then Real World Haskell came out, and IMHO it has helped push Haskell toward being a real mainstream language. If you are planning to learn the language, don’t forget to also read Learn You a Haskell for Great Good!, a high-quality, freely available Haskell book.

More recently, and related to my current work, Haskell seems the perfect tool for writing parallel programs; it is even recommended by data scientists for doing real programming. Due to its pure functional nature, bugs are easier to spot in Haskell: the language design makes it hard to introduce the side effects that are so common in other languages. Haskell also supports esoteric concurrency constructs you won’t have seen in most other programming languages, like Software Transactional Memory. But above all, pure code can easily be manipulated and reasoned about by other programs. That means that you, as Joe the programmer, can rely on the compiler for much of the work of parallelization: you can parallelize an already working algorithm with just a few code annotations, and the compiler and runtime do all the spawning and synchronization under the hood.
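To make that concrete, here is a minimal sketch of annotation-style parallelism using `parMap` from `Control.Parallel.Strategies` (the `parallel` package); `fib` and the input range are just placeholders for any expensive pure computation:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- A deliberately expensive pure function standing in for real work.
fib :: Integer -> Integer
fib n | n < 2     = n
      | otherwise = fib (n - 1) + fib (n - 2)

main :: IO ()
main =
  -- parMap sparks the evaluation of each element in parallel;
  -- replacing it with plain map changes the speed, not the result.
  print (sum (parMap rdeepseq fib [28 .. 36]))
```

Compile with `ghc -threaded -O2` and run with `+RTS -N` to use all available cores. And since Software Transactional Memory came up, here is an equally small sketch using the `stm` package; the `transfer` function and its account `TVar`s are hypothetical names of my own:

```haskell
import Control.Concurrent.STM

-- Move money between two accounts atomically: no locks, and no
-- partially updated state is ever visible to other threads.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  check (balance >= amount)   -- blocks and retries until funds suffice
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)
```

A caller runs the whole thing with `atomically (transfer alice bob 10)`, and the runtime guarantees that both writes commit together or not at all.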

In this vein, there is an underexplored area that I’d like to learn more about: Nested Data Parallelism. Unless you have been living in a cave during the past few years, you know that the MapReduce programming paradigm and its reference implementation, Hadoop, have taken the parallel world by storm. This paradigm makes it possible to process data volumes that were unthinkable a few years ago. MapReduce is an instance of a parallelization pattern called Flat Data Parallelism: in these problems, data is more important than computation, in contrast to the usual scientific calculations where operations are the dominant factor. The word flat means that the problems best suited for this paradigm are those whose data can easily be bucketed into equal-sized bins and processed in parallel. MapReduce is the best tool for performing ETL in a data warehouse, indexing the web like Google does, mining logs, or transforming images: huge amounts of independent data, processed in parallel tasks.

But what happens when you have irregular data like trees or graphs? You cannot partition the data into equal-sized bins because you don’t know the size of each bin in advance. For instance, imagine that you want to multiply a big sparse matrix by a vector (an operation that happens a lot in recommender systems using collaborative filtering). Your first thought would be to split the rows of the matrix into equal groups and send each group to a thread or computing node, but because of the sparsity of the matrix, some rows are short while others are long. Some bins therefore take longer to compute than others, introducing delays when the main program has to wait to gather the results. Data Parallel Haskell is an attempt to implement Nested Data Parallelism in the Haskell language itself, and Simon Peyton Jones and other Haskell gurus are working hard to explore this new kind of parallelism using the facilities of the language.
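The sparse matrix-vector product is in fact the running example in the Data Parallel Haskell papers. In DPH’s experimental parallel-array notation, `[: :]` delimits a parallel array, and the nested comprehension below expresses exactly the irregular nesting that flat data parallelism cannot; this is a sketch of the published example, not code you should expect to run unchanged today:

```haskell
-- Each row keeps only its non-zero entries as (column, value) pairs,
-- so rows naturally have different lengths: irregular, nested data.
type SparseVector = [:(Int, Float):]
type SparseMatrix = [:SparseVector:]

-- The outer comprehension runs over the rows in parallel, and each
-- dot product is itself a parallel computation over a single row.
smvm :: SparseMatrix -> [:Float:] -> [:Float:]
smvm m v = [: sumP [: x * (v !: i) | (i, x) <- row :] | row <- m :]
```

The compiler’s vectorisation pass flattens this nested parallelism into balanced flat operations, which is precisely what the manual split-the-rows-into-equal-bins approach fails to achieve.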