2014-04-20

Cassandra with Node.js and Arduino

Intro

This post continues where an earlier post stopped. The Cassandra setup used here is more or less the same, so if you are interested in the Cassandra setup please read that post before continuing with the rest of this one.

Arduino

Learning big data stuff is most exciting when the data represents something from the real world rather than something generated by a big loop filled with random values. To create the data for this example I've used the following components:

  1. Arduino Uno
  2. Photoresistor GL5528 LDR
  3. 10k ohm NTC thermistor, 5mm
  4. 2x 10k ohm resistor
  5. Protoboard
  6. Wires

A couple of these inexpensive components combined with an Arduino give us a nice big data sensor / generator. It might not seem that complicated, but sampling any data at one-second intervals will run into Cassandra's limitations after about a month of sampling if the data model is not done right, so a simple Arduino setup is a fun and motivating way to tackle learning Cassandra. For now let's concentrate on the Arduino part. The wiring is shown here:


The Arduino sketch is on GitHub, so we'll concentrate on the important parts. The light level in this example is read at analog pin 0. Reading analog values on the Arduino results in values ranging from 0 to 1023. We'll define the light level as a mapping from 0-1023 into 0-100. Arduino already has a built-in function for this called map.

I also had some trouble in my initial experiments with Arduino serial communication and reading pin values. The data written to the serial port simply got corrupted after a while. I've read a couple of forums on this subject and found out that it actually helps to delay execution for 1 ms after reading a pin value. To keep things as stable as possible we'll also pause execution for 1 second after writing to the serial port, as shown here:

  int light = map(analogRead(0), 0, 1023, 0, 100);
  delay(1);

    ....

  sprintf(sOut, "%d,%s", light, deblank(sTemp));

  Serial.println(sOut);
  delay(1000);
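
To make the scaling concrete, here is a rough JavaScript re-implementation of the integer arithmetic that Arduino's map performs. The name arduinoMap is made up for this illustration and is not part of the sketch on GitHub; it only mirrors the documented formula, including the truncating integer division.

```javascript
// Illustration only: mirrors the integer arithmetic of Arduino's map().
// arduinoMap is a hypothetical name, not part of the original sketch.
function arduinoMap(x, inMin, inMax, outMin, outMax) {
  // Arduino computes this with long integers, so the division truncates.
  const scaled = (x - inMin) * (outMax - outMin) / (inMax - inMin);
  return Math.trunc(scaled) + outMin;
}

console.log(arduinoMap(0, 0, 1023, 0, 100));    // 0
console.log(arduinoMap(512, 0, 1023, 0, 100));  // 50
console.log(arduinoMap(1023, 0, 1023, 0, 100)); // 100
```

Because of the truncation, only a raw reading of 1023 maps to 100; a reading of 1022 already maps to 99.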
 


Node.js and Cassandra

Parsing the messages that come from measuring devices is pretty repetitive stuff that tends to produce pretty ugly code. I've learned that the hard way. To make parsing these messages as easy as possible I've written a small utility package for it, and it's available on npm.
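
To give an idea of what such parsing involves, here is a minimal hand-rolled sketch for the "light,temperature" lines that the Arduino code above prints. This is not the npm package mentioned above; parseReading is a hypothetical helper, and it assumes the temperature arrives as a plain decimal number.

```javascript
// Hypothetical helper, NOT the npm package mentioned in the post.
// Assumes one line in the form "light,temperature", e.g. "42,21.50".
function parseReading(line) {
  // Serial.println ends lines with "\r\n"; trim removes the leftover "\r".
  const parts = String(line).trim().split(",");
  if (parts.length !== 2) {
    return null; // corrupted or incomplete line
  }
  const light = Number.parseInt(parts[0], 10);
  const temperature = Number.parseFloat(parts[1]);
  if (Number.isNaN(light) || Number.isNaN(temperature)) {
    return null; // non-numeric fields
  }
  return { light: light, temperature: temperature };
}

console.log(parseReading("42,21.50\r"));
console.log(parseReading("garbage"));
```

Returning null for anything that doesn't match lets the caller simply skip corrupted lines instead of writing them to the database.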

Using serial ports in Node.js doesn't take a lot of steps to set up:

  var serial = require( "serialport" );
  var SerialPort = serial.SerialPort;

  var portName = "/dev/tty.usb-something";

  var sp = new SerialPort(portName, {
      baudrate:9600,
      parser:serial.parsers.readline("\n")
  });

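  // Called once per line received from the Arduino.
  sp.on("data", function ( data ) {
      // translator is the message parsing utility mentioned above
      // (its require is omitted here).
      var arduinoData = translator.parse(data);
      //...
  });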
 

To make data handling easier and more in line with Cassandra best practices, the readings are partitioned by the date on which they were recorded.

  CREATE TABLE room_data (
    day text,
    measurementtime timestamp,
    light int,
    temperature float,
    PRIMARY KEY (day, measurementtime)
  ) WITH CLUSTERING ORDER BY (measurementtime DESC);
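
As a rough sketch of what the day partition key means in practice: every reading taken on the same calendar day ends up in the same partition, so at one reading per second a partition holds at most about 86,400 rows. The helper below is hypothetical (the insert code in this post builds the key with moment.js instead) and assumes local time is the one that matters.

```javascript
// Hypothetical helper: formats a Date as the 'YYYY-MM-DD' string
// used for the day partition key. Uses local time, as an assumption.
function dayBucket(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return date.getFullYear() + "-" +
    pad(date.getMonth() + 1) + "-" + // getMonth() is zero-based
    pad(date.getDate());
}

// Upper bound on rows per day partition at one reading per second.
const SECONDS_PER_DAY = 24 * 60 * 60;

console.log(dayBucket(new Date(2014, 3, 20, 13, 5, 0))); // 2014-04-20
console.log(SECONDS_PER_DAY);                            // 86400
```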
 

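The data will probably be fetched most often for recent timestamps, with queries that have limits set on them. To make this fetching easier we've added the clustering order statement above, which stores the newest reading first within each day. To get the current light and temperature level we just have to run the following query (no WHERE clause combined with the now function is needed). Note that this holds only while the table contains a single day of data; once several days are stored, rows come back in partition token order, so the query should be restricted with WHERE day = the current date: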

  SELECT * FROM room_data LIMIT 1;
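
A sketch of pinning that query to a single day's partition, which keeps the result predictable once more than one day of readings is stored (with the default partitioner, partitions are not returned in date order). The helper name latestReadingQuery is made up for this example, and the commented-out call assumes the same client object used for the insert later in this post.

```javascript
// Hypothetical helper: builds the "latest reading for a given day" query.
// Thanks to CLUSTERING ORDER BY (measurementtime DESC), the first row
// of the partition is the most recent reading of that day.
function latestReadingQuery(day) {
  return {
    query: "SELECT * FROM room_data WHERE day = ? LIMIT 1",
    params: [day]
  };
}

const q = latestReadingQuery("2014-04-20");
console.log(q.query, q.params);

// With a connected client (assumption: same client as in the insert code):
// client.execute(q.query, q.params, function (err, result) { /* ... */ });
```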
 

With Cassandra set up and the data from the serial port read and parsed, it's time to write this data into Cassandra. Analyzing the data and doing something useful with it will come in future posts; for now I'll stop at writing the data into Cassandra:

  client.execute('INSERT INTO room_data ' + 
   '(day, measurementtime, light, temperature)' + 
   ' VALUES (?, dateof(now()), ?, ?)',
   [
    moment().format('YYYY-MM-DD'),
    arduinoData.light,
    arduinoData.temperature
   ],
   function(err, result) {
    if (err) {
     console.log('insert failed', err);
    }
   }
  );
 

On the fifth line I've used moment.js to format the current time into a string representation of the current date, which is used for partitioning in Cassandra. The rest of the code is pretty much the usual SQL stuff found in other database environments.

I recorded a couple of hours' worth of data, available here, just in case anybody wants a sneak peek without having to set everything up. I've exported the data from Cassandra through cqlsh using this command:

  COPY room_data (day, measurementtime, light, temperature) 
   TO 'room_data.csv';
 

The rest of the example is located on GitHub.