Home > Data Mining > Getting started with JAGS for Bayesian Modelling

Getting started with JAGS for Bayesian Modelling

In the past if you wanted to do some Bayesian Modelling using a Gibbs sampler you had to rely on Winbugs but it isn’t open source and it only runs in Windows. The JAGS project started as a full feature alternative for the Linux platform. Here are some instructions for getting started

First install the required dependencies. In my Ubuntu 11.04, is something like:

sudo apt-get install gfortran liblapack-dev libltdl-dev r-base-core

For Ubuntu 11.04 you have to install JAGS from sources, but it seems that this version will be packaged in the next Ubuntu release. Download the software from here and install.

tar xvfz JAGS-3.1.0.tar.gz
cd JAGS-3.1.0
sudo make install

Now fire R and install the interface package rjags

$ R
> install.packages("rjags",dependencies=TRUE)

Now let's do something interesting (although pretty simple). Let's assume we have a stream of 1s and 0s with an unknown proportion of each one. From R we can generate such distribution with the command

> points <- floor(runif(1000)+.4)

that generates a distribution with roughly 40% of 1s and 60% of 0s. So, our stream consists of a sequence of 0s and 1s generated using the uniform(phi) distribution where the phi parameter equals 0.4.

If we don't know this parameter and we try to learn it, we can assume that this parameter has prior uniform in the range [0,1] and thus the model that describes this scenario in the Winbugs language is

    phi ~ dunif(0,1);
    y ~ dbin(phi,N);

In this model N and y are known, so we provide this information in order to estimate our unknown parameter phi. We create the model and query the resulting parameter distribution:

> result <- list(y=sum(points), N=1000)
> result
[1] 393

[1] 1000

> library(rjags)
Loading required package: coda
Loading required package: lattice
linking to JAGS 3.1.0
module basemod loaded
module bugs loaded
> myModel <- jags.model("model.dat", result)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 5

Initializing model

> update(myModel, 1000)
  |**************************************************| 100%
> x <- jags.samples(myModel, c('phi'), 1000)
  |**************************************************| 100%
> x
[1] 0.3932681

Marginalizing over: iteration(1000),chain(1) 


So the inferred value of phi is 0.3932. One interesting thing in Bayesian statistics is that it does not estimate points, but probabilistic distributions over the parameters. We can see how the phi parameter was estimated by examining the Monte Carlo Chain and the distribution of the generated values during the simulation

> chain <- as.mcmc.list(x$phi)
> plot(chain)

Where we can see that the values for phi in the chain were centered around the 0.4, the true parameter value.

Categories: Data Mining Tags: , , ,
  1. mrskills
    April 24th, 2012 at 13:51 | #1

    I don’t understand why you go to all that trouble to call JAGS when you’ve already got the estimate for phi after summing all the 1s? It comes out as 393/1000 for your example which is exactly what your parameter is estimated at the end. Can you explain?

  2. tonicebrian
    April 24th, 2012 at 14:01 | #2

    We you are doing bayesian statistics your parameters are random variables not point estimations. A single number, 0.393, gives you no information about how the parameter under study is distributed. I need all the other stuff for knowing how sure I am about picking the median (0.3932681) as the true parameter value (0.4)

  3. mrskills
    April 24th, 2012 at 14:17 | #3

    I’m trying to learn Bayesian analysis so please bear with me :) The information you have acquired from the data is merely N=1000 and y=393, nothing more is known from the data (ie points). We aren’t looking at the distribution of the points after plucking the two summaries (N,y). That means in those two values you have all the information available to you at the beginning of the inference.

    So my question is how are we getting more information out of the JAGS model when it doesn’t even look at all the data we know?

  4. tonicebrian
    April 24th, 2012 at 14:50 | #4

    First, keep in mind that the motivation of the post was to show how to install JAGS, build a model and query it from R, so here we are dealing with the simplest model it came to my mind. And, given the model and the data, what JAGS does is to run lots of random samples conditioned to the evidence (y=393) and from those points I can infer different moments of the random variable for instance.

  5. mrskills
    April 24th, 2012 at 15:31 | #5

    Understood. I guess I need to do more reading on Bayesian Analysis.

    Btw yours was the best introduction i’ve come across for getting started with Bayesian software. I’ve read quite a few and this was the most accessible. The first one that I actually got to work. The fact that i’ve used it as a complete beginner to try out a whole analysis is amazing. Thanks for writing it up.

  6. tonicebrian
    April 25th, 2012 at 12:34 | #6

    Thanks a lot!! it helps to know that your rants are useful to someone ;)

  7. December 14th, 2012 at 08:02 | #7

    Nice post, thanks for writing it Toni!
    ( one really minor point; it’s install.packages(“rjags”,dependencies=TRUE) ie packages is plural )

  8. Toni Cebrián
    December 14th, 2012 at 13:05 | #8

    Done!! Thanks for pointing it out

  1. No trackbacks yet.