My first brush with Cairngorm

So after reading “Professional Cairngorm” I decided that I would try using Cairngorm on my next project. Well, that project is in the final stages of completion and I’d like to share my thoughts.

The project was a dynamically driven Flex app embedded in a web browser that could also run on AIR. The application was read-only: it did not collect data from the user but simply presented data one way. The app consisted of your typical main screen, featuring configurable “portlets” that gave teasers into the deeper content contained within. It also had primary top navigation, each item of which led to sub-screens that could render their data sets in various customizable layout views, and each sub-screen had its own second-level navigation for moving around within it. The data was stored in an XML implementation of a “Content Store” interface which could also be swapped out for a SQLite or web-service backed version if necessary. The lower-level service layer that stored and managed all of the content as well as the navigation meta-data was wired together using Spring ActionScript and also leveraged other libraries like AS3 Commons.

The application logic itself (i.e. how the views render, how the user navigates around, what screen renders what data etc.) was all handled using Cairngorm 2.2.

By default, Cairngorm seems to drive the developer to be extremely granular with their Command, CairngormEvent and Delegate declarations. What this means is that if you are not careful, you will end up with an absolute overload of classes for a simple application. Instead I took the approach of consolidating domain-related functionality into larger Delegate classes which could each handle more than one type of small, specific operation. My Delegate classes abstracted my lower-level service layer (where the content and navigation meta-data was stored) away from the application and fulfilled their contracts through the delivery of VOs to the commands, subsequently to the model and finally to the views via Flex data binding.

I also tried to limit the number of events to a total of four: one for bootstrapping Spring, one for handling a screen change, one for rendering the correct body and finally one for launching external resources. The number of VOs (value objects) I ended up with was fairly large (about 10), as I was very careful to create granular VOs so that I had zero coupling with the underlying service layer, which was hidden away by the Delegate classes. The underlying service layer certainly had its own VO-like objects which I could have permitted to bubble up to the model/view, but taking the decoupled VO approach is the way to go in an MVC framework such as Cairngorm. You can end up replicating a lot of properties etc., but it is worth it in the long run as your VOs help you gel your domain model without tying it to the underlying implementation.

In the past I’ve played around with PureMVC, as well as my own custom MVC-like framework for throwing together quick Flex apps. Overall I would say that my experience with Cairngorm in this small app was a positive one. There was a slight learning curve to get going, but it was not difficult at all if you have used other MVC frameworks in any other language. Cairngorm is heavily dependent upon Flex/AS3’s internal data binding, and that can be a positive or a negative depending on your personal views; however, I found it to be quite powerful and convenient.

On the downside, the most awkward thing in Cairngorm for me was the Model (ModelLocator). For a small app like this one it was fine, but it just seems like a giant “global variable” store (not seems, it is in fact one), and if you are doing a complex app, I feel it would be necessary to break this up into different models for different parts of the application. Ideally each view should not be directly bound to your custom ModelLocator, but Cairngorm forces this, which basically couples your view directly to a specific model implementation. Instead a DI approach might be better.

Thankfully this application did not have to use the ServiceLocator, as I was using in-process services (not web services or remoting). Why was I thankful? Well, when I looked at the ServiceLocator the whole thing made me cringe! The documentation on how to use it as well as the source, plus the fact that you declare and wire it up in MXML…. It just seemed like a combination of bad naming and bad implementation. Secondly, unless you are using the hardwired web-service, HTTP object or remoting versions in that class, the ServiceLocator in Cairngorm appears to be useless out of the box without customization. You just can’t easily plop in your own services bound by an arbitrary key. Why don’t they support custom abstractions of a “service” beyond the default three “remote” ones provided in Flex? Why does a “service” have to be remote in Cairngorm? Why should it care?

Overall… like I said, my experience was fairly positive and I would use it again for a similar project. That said, I do have some issues with the framework. I also fundamentally don’t understand why this framework is the “de-facto” standard promoted by Adobe and seemingly viewed as required knowledge for any “real Flex/AS3 developer”. Shouldn’t knowing the principles of MVC be more of a requirement than simply the nuts and bolts of a specific framework implementation?

Next stop… I am going to try out some other AS3/Flex MVC frameworks. I’ll report back on those.

Security Sandbox Violation during run and debug after moving or copying a Flex project in Flex Builder

Ugh, this is one of those classic examples of getting sidetracked in your daily work because of some annoyances with Flex Builder. Today I copied an existing Flex Builder project in Eclipse to another directory (new renamed project). I then edited the project settings (renamed the project) before importing the new copied project into Eclipse. This worked great and Eclipse immediately imported the new project and recognized it as a Flex project.

The problem came up when I tried to debug my app. When I hit debug I got the dreaded *** Security Sandbox Violation *** (Adobe’s equivalent of the blue screen of death) once my app attempted to make a URLLoader request for a sibling resource to the SWF. Now, at this point I was frustrated, as this was an exact copy of the project that still worked under the original project name in Eclipse.

Turns out that during a true “Create new Flex Builder Project” setup in Eclipse, Flex Builder adds your project’s bin-release and bin-debug directories to the local Flash Player’s global security settings as trusted locations. This is necessary so you can debug/run your Flex projects in Eclipse without getting Security Sandbox Violations when accessing local resources.

Now how to fix this?

Simply go here: http://www.macromedia.com/support/documentation/en/flashplayer/help/settings_manager04.html

and proceed to add your copied/new project’s bin-debug and bin-release dirs. Then next time you run, the security sandbox exceptions will go away, just as if you originally created the project in Flex Builder’s “wizards” of smart.

Book Review: Professional Cairngorm

This is a review of the book Professional Cairngorm by Jeremy Wischusen. When describing his motivations for writing this book, the author tells a story about having Flex/AS3 development experience, but none of it using Cairngorm, and then he had a job interview…. And in typical fashion, when he answered no to the “Cairngorm experience” question, the interviewer basically said that if you have not used “Cairngorm” you don’t know anything about Flex/AS3. What a joke! This is a side rant, but I can relate to this guy as I have done about a year and a half of pretty serious AS3 development without Cairngorm, and after having read this book, I really don’t see why Cairngorm is such a major “must know” AS3/Flex MVC framework. The author does a good job covering the history of this framework, and all said and done, it sounds like some Flash developers originally threw the framework together, then Adobe Consulting threw its official “stamp of approval” on it, and there you go. Everyone must learn Cairngorm…..

So that said, just take a look at Chapter 3, which describes Cairngorm’s “ServiceLocator”. Wow, if that does not seem like an overly convoluted set of classes and interfaces just to look up HTTP/WS/RMI services, then I am not sure what it is.

Moving on….. to the book. This is a decent book, and the author is correct that there is not a ton of very good documentation out there for newcomers to Cairngorm. The author does a fairly good job of presenting the framework and how it works in a simple, straightforward approach. Roughly the first half of the book is dedicated to discussing the inner workings of the major Cairngorm players. The last half of the book covers a detailed step-by-step case study (a blog app) followed by good coverage of the community’s Cairngorm complaints, expert tips and 3rd party extensions.

Complaints about this book: First off I felt like the author may have been rehashing a lot of the material that is already out there, given the many references to official docs and other resources that were cited. The second complaint is the horrible formatting of the code samples. The wrapping of code lines is awful and so is the spacing between methods etc. Very hard to read and they could have done a better job with that.

Overall: Do you need to learn Cairngorm in about a day? Get this book.

Skill levels/Audience: Targeted towards people with zero Cairngorm experience, but intermediate to advanced AS3/Flex experience. A background with other MVC frameworks is helpful in understanding the material. This book is NOT for people brand new to Flex/AS3.

FYI, last year I played around with PureMVC, and this year (2010) after reading this book, I am going to make an effort to do at least one or two projects using Cairngorm. Let’s see how it goes and I will report back!

Book Review: Solr 1.4 Enterprise Search Server

This is a book review of Solr 1.4 Enterprise Search Server, by David Smiley and Eric Pugh

I picked up this book after hearing about Solr. I was looking into Drupal and trying to see what indexing engines were available for it, and the only option that seemed to fit the bill was Solr. It became quite apparent to me that Solr, being built on top of Lucene, was quickly becoming a favorite of developers out there. Having a fair amount of previous experience directly with Lucene, I figured this would be a good book to get an introduction to this package.

Overall this book is pretty good. I’ve never read a book by Packt Publishing before, and compared to the Wroxes, O’Reillys and Mannings of the world, the publication does seem a bit rougher, with some minor grammatical and editing errors… but overall those things don’t bother me.

This book pretty much appears to cover all the guts to get you up and running with Solr. Chapter 1 gives a solid overview of the platform, while Chapter 2 dives right into one of the most important items for anyone working with an indexing engine: text analysis (stemming, tokenization, index vs. query time analysis etc.). Having dived into the guts of writing my own search engine using Lucene, I felt the authors did a pretty good job covering this important topic in the 2nd chapter.

Chapters 3 and 4 cover the basics of indexing and basic searching, while chapters 5 and 6 jump into the higher-level components that Solr provides and which lots of people are interested in nowadays: faceting, term highlighting, suggestions, spell checking etc.

Chapters 7 through 9 cover more systems-administration-related topics, such as deployment options, logging, monitoring, non-Java clients/langs (PHP, JavaScript, JSON) and finally how to scale Solr, both with tips for scaling vertically and solutions for scaling horizontally (master/slave scenarios).

Overall I would highly recommend this book for anyone looking at Solr as a solution to add an indexing engine to their application. Having written a Lucene implementation in the past, I can appreciate a lot of the features that Solr appears to bring to the table so you don’t have to write them from scratch. The book presents much of the material in a straightforward manner targeted towards intermediate to advanced readers. Solr’s scaling capabilities look very attractive as well. Either way, I hope to get an opportunity to try this project out in the near future.

Gotcha! Tomcat strict quote escaping

So I was recently upgrading an older application from Tomcat 6.0.16 to 6.0.20 and started to see a JasperException appear on several JSP pages I was testing. The exception stated something similar to the following:

[insert your var here] is quoted with " which must be escaped when used within the value..

So I looked at my code and saw the exception was complaining about a line of code similar to the following which worked for years with no problems:

<jsp:include page="<%= "/dir/" +someVariable + "/page.jsp" %>" />

Huh? I had hundreds of JSP pages with syntax similar to the above which had been working for years. I thought to myself what a nightmare this would be to go find and fix all of these! However, after some searching around online I discovered this bug issue which talks about how Tomcat was not totally JSP 2.0 compliant and was handling such attribute values incorrectly. Anyways, they “fixed” this major issue (following my sarcasm here?) in Tomcat 6.0.17, and luckily they anticipated what a huge issue this might cause for folks out here, so they introduced a property that you can specify in your TOMCAT_HOME/conf/catalina.properties.

Just add the following config to your catalina.properties file and restart Tomcat:

# TURN OFF STRICT PARSING EXCEPTIONS
org.apache.jasper.compiler.Parser.STRICT_QUOTE_ESCAPING=false

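Hope that helps! Keep in mind that your code, if you don’t eventually “fix” it, may not work on other app servers. The actual fix is to either delimit the attribute value with single quotes instead of double quotes, or to escape the double quotes inside the expression as \".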

Book Review: Java SOA Cookbook

This is a review of the book Java SOA Cookbook by Eben Hewitt

Not having much experience with “SOA”, other than a horrible experience with Apache Axis years ago, I figured it would be good to pick up a book that covers the latest and greatest in this field. “Java SOA Cookbook” is quite the publication. My impression was that this book covers just about everything someone would need to be exposed to, whether you are just getting started in the SOA world or are already up to your neck in it and want a good desk reference. This book appears to be it. From SAAJ to JAX-WS, BPEL, ESBs, etc., this covers it all. From the very high-level 50,000-foot architecture view (i.e. modeling, governance), all the way down to the minutiae of various XML Schema design patterns, vendor-specific HOWTOs, and details of using TCPMon to monitor SOAP traffic, this book literally appears to cover all the bases.

In the traditional O’Reilly cookbook model, the book takes a problem/solution approach. One of my favorite “problems” presented in the book, which is quite representative of what developers face when approaching the giant world of SOA, is worded as follows, and is VERY relatable!

“You need to start developing the web services for your SOA solution, but there are so many different pieces to put in place that you are not sure where to begin. You need to determine what has to be written by hand and what doesn’t.”

No **it… I loved this book!

Weighing in at ~700 pages, after reading this book I felt like I walked away with two things. First an overwhelming impression that the world of SOA is one of enormous complexity; and secondly, as with anything, when broken down to its smallest parts, anything can be accomplished and understood. This book does just that, and I highly recommend it to anyone who wants to get a solid understanding of SOA and secondly it would likely serve as a great desk reference for anyone who works in the SOA universe on a daily basis.

Side note, this book is heavily oriented to the traditional “big web-services” world (SOAP, WSDL, JAX-WS etc), however the author does give a good treatment to REST with a full chapter on it.

Recommended: YES

Skill level: Intermediate to advanced Java developers. I think this book would be beneficial to BOTH sides of the fence in the SOA world, those with little to no experience, all the way to those who build services on a daily basis.

Reading fixed length/width input records with Hadoop mapreduce

While working on a project where I needed to quickly import 50-100 million records I ended up using Hadoop for the job. Unfortunately the input files I was dealing with were fixed width/length records, hence they had no delimiters which separated records, nor did they have any CR/LFs to separate records. Each record was exactly 502 bytes in size. Hadoop provides a TextInputFormat out of the box for reading input files, however it requires that your files contain CR/LFs or some combination thereof.

So…. I went ahead and wrote a couple of classes to support fixed length, fixed width (same thing) records in input files. These classes were inspired by Hadoop’s TextInputFormat and LineRecordReader. The two classes are FixedLengthInputFormat and FixedLengthRecordReader; they are presented below. I have also created a Hadoop JIRA issue to contribute these classes to the Hadoop project.

This input format overrides computeSplitSize() in order to ensure that InputSplits do not contain any partial records since with fixed records there is no way to determine where a record begins if that were to occur. Each InputSplit passed to the FixedLengthRecordReader will start at the beginning of a record, and the last byte in the InputSplit will be the last byte of a record. The override of computeSplitSize() delegates to FileInputFormat’s compute method, and then adjusts the returned split size by doing the following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength)
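To make that arithmetic concrete, here is a minimal standalone sketch of the same rounding, outside of Hadoop. The class and method names are made up for illustration, and the 64 MB starting size is only an assumed example of what FileInputFormat might compute. For positive longs, integer division already floors, so the sketch skips Math.floor.

```java
// Illustration only: mirrors the split-size rounding described above, without Hadoop.
final class SplitSizeSketch {

    // Rounds a candidate split size down to a whole number of fixed-length records,
    // but never returns less than one full record.
    static long alignedSplitSize(long defaultSplitSize, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be > 0");
        }
        if (defaultSplitSize <= recordLength) {
            return recordLength;
        }
        // integer division floors for positive values
        return (defaultSplitSize / recordLength) * recordLength;
    }

    public static void main(String[] args) {
        long assumedDefault = 64L * 1024 * 1024; // 67108864 bytes, an assumed default
        // 67108864 / 502 = 133682 whole records, so this prints 67108364
        System.out.println(alignedSplitSize(assumedDefault, 502));
    }
}
```

With 502 byte records, every split then holds exactly 133682 records, and the leftover 500 bytes per default-sized chunk simply roll into the next split.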

FixedLengthInputFormat does NOT support compressed files. Use this input format as follows:

// setup your job configuration etc
...

// be sure to set the length of your fixed length records, so the
// FixedLengthRecordReader can extract the records correctly.
myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 502);

// OR alternatively you can set it this way, the name of the property is
// "mapreduce.input.fixedlengthinputformat.record.length"
myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",502);

// create your job
Job job = new Job(myJobConf);
job.setInputFormatClass(FixedLengthInputFormat.class);

// do the rest of your job setup, specifying input locations etc
...

job.submit();

Below are the two classes which you are free to use. Hope this helps you out if you have a need to read fixed width/length records out of input files using Hadoop MapReduce! Enjoy.

FixedLengthInputFormat.java

package org.bitsofinfo.hadoop.mapreduce.lib.input;

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

/**
 * FixedLengthInputFormat is an input format which can be used
 * for input files which contain fixed length records with NO
 * delimiters and NO carriage returns (CR, LF, CRLF) etc. Such
 * files typically only have one gigantic line and each "record"
 * is of a fixed length, and padded with spaces if the record's actual
 * value is shorter than the fixed length.<BR><BR>
 *
 * Users must configure the record length property before submitting
 * any jobs which use FixedLengthInputFormat.<BR><BR>
 *
 * myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);<BR><BR>
 *
 * This input format overrides <code>computeSplitSize()</code> in order to ensure
 * that InputSplits do not contain any partial records since with fixed records
 * there is no way to determine where a record begins if that were to occur.
 * Each InputSplit passed to the FixedLengthRecordReader will start at the beginning
 * of a record, and the last byte in the InputSplit will be the last byte of a record.
 * The override of <code>computeSplitSize()</code> delegates to FileInputFormat's
 * compute method, and then adjusts the returned split size by doing the following:
 * <code>(Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength)</code>
 *
 * <BR><BR>
 * This InputFormat returns a FixedLengthRecordReader. <BR><BR>
 *
 * Compressed files currently are not supported.
 *
 * @see	FixedLengthRecordReader
 *
 * @author bitsofinfo.g (AT) gmail.com
 *
 */
public class FixedLengthInputFormat extends FileInputFormat<LongWritable, Text> {

	/**
	 * When using FixedLengthInputFormat you MUST set this
	 * property in your job configuration to specify the fixed
	 * record length.
	 * <BR><BR>
	 *
	 * i.e. myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);
	 */
	public static final String FIXED_RECORD_LENGTH = "mapreduce.input.fixedlengthinputformat.record.length"; 

	// our logger reference
	private static final Log LOG = LogFactory.getLog(FixedLengthInputFormat.class);

	// the default fixed record length (-1), error if this does not change
	private int recordLength = -1;

	/**
	 * Return the int value from the given Configuration found
	 * by the FIXED_RECORD_LENGTH property.
	 *
	 * @param config
	 * @return	int record length value
	 * @throws IOException if the record length found is not greater than 0 (non-existent, not set etc)
	 */
	public static int getRecordLength(Configuration config) throws IOException {
		int recordLength = config.getInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 0); 

		// zero (not set) or a negative value would be an error
		if (recordLength <= 0) {
			throw new IOException("FixedLengthInputFormat requires the Configuration property:" + FIXED_RECORD_LENGTH + " to" +
					" be set to something > 0. Currently the value is " + recordLength);
		}

		return recordLength;
	}

	/**
	 * This input format overrides <code>computeSplitSize()</code> in order to ensure
	 * that InputSplits do not contain any partial records since with fixed records
	 * there is no way to determine where a record begins if that were to occur.
	 * Each InputSplit passed to the FixedLengthRecordReader will start at the beginning
	 * of a record, and the last byte in the InputSplit will be the last byte of a record.
	 * The override of <code>computeSplitSize()</code> delegates to FileInputFormat's
	 * compute method, and then adjusts the returned split size by doing the following:
	 * <code>(Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength)</code>
	 *
	 * @inheritDoc
	 */
	@Override
	protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
		long defaultSize = super.computeSplitSize(blockSize, minSize, maxSize);

		// 1st, if the default size is less than the length of a
		// raw record, lets bump it up to a minimum of at least ONE record length
		if (defaultSize <= recordLength) {
			return recordLength;
		}

		// determine the split size, it should be as close as possible to the
		// default size, but should NOT split within a record... each split
		// should contain a complete set of records with the first record
		// starting at the first byte in the split and the last record ending
		// with the last byte in the split.

		long splitSize = ((long)(Math.floor((double)defaultSize / (double)recordLength))) * recordLength;
		LOG.info("FixedLengthInputFormat: calculated split size: " + splitSize);

		return splitSize;

	}

	/**
	 * Returns a FixedLengthRecordReader instance
	 *
	 * @inheritDoc
	 */
	@Override
	public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
			TaskAttemptContext context) throws IOException, InterruptedException {
		return new FixedLengthRecordReader();
	}

	/**
	 * @inheritDoc
	 */
 	@Override
 	protected boolean isSplitable(JobContext context, Path file) {

 		try {
			if (this.recordLength == -1) {
				this.recordLength = getRecordLength(context.getConfiguration());
			}
			LOG.info("FixedLengthInputFormat: my fixed record length is: " + recordLength);

 		} catch(Exception e) {
 			LOG.error("Error in FixedLengthInputFormat.isSplitable() when trying to determine the fixed record length, returning false, input files will NOT be split!",e);
 			return false;
 		}

 		CompressionCodec codec = new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
	 	if (codec != null) {
	 		return false;
	 	}

	 	return true;
	 } 

}

FixedLengthRecordReader.java

package org.bitsofinfo.hadoop.mapreduce.lib.input;

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Seekable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.MapContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/**
 *
 * FixedLengthRecordReader is returned by FixedLengthInputFormat. This reader
 * uses the record length property set within the FixedLengthInputFormat to
 * read one record at a time from the given InputSplit. This record reader
 * does not support compressed files.<BR><BR>
 *
 * Each call to nextKeyValue() updates the LongWritable KEY and Text VALUE.<BR><BR>
 *
 * KEY = byte position in the file the record started at<BR>
 * VALUE = the record itself (Text)
 *
 *
 * @author bitsofinfo.g (AT) gmail.com
 *
 */
public class FixedLengthRecordReader extends RecordReader<LongWritable, Text> {

	// reference to the logger
	private static final Log LOG = LogFactory.getLog(FixedLengthRecordReader.class);

	// the start point of our split
	private long splitStart;

	// the end point in our split
	private long splitEnd; 

	// our current position in the split
	private long currentPosition;

	// the length of a record
	private int recordLength; 

	// reference to the input stream
	private FSDataInputStream fileInputStream;

	// the input byte counter
	private Counter inputByteCounter; 

	// reference to our FileSplit
	private FileSplit fileSplit;

	// our record key (byte position)
	private LongWritable recordKey = null;

	// the record value
	private Text recordValue = null; 

	@Override
	public void close() throws IOException {
		if (fileInputStream != null) {
			fileInputStream.close();
		}
	}

	@Override
	public LongWritable getCurrentKey() throws IOException,
			InterruptedException {
		return recordKey;
	}

	@Override
	public Text getCurrentValue() throws IOException, InterruptedException {
		return recordValue;
	}

	@Override
	public float getProgress() throws IOException, InterruptedException {
		if (splitStart == splitEnd) {
			return (float)0;
		} else {
			return Math.min((float)1.0, (currentPosition - splitStart) / (float)(splitEnd - splitStart));
		}
	}

	@Override
	public void initialize(InputSplit inputSplit, TaskAttemptContext context)
			throws IOException, InterruptedException {

		// the file input fileSplit
		this.fileSplit = (FileSplit)inputSplit;

		// the byte position this fileSplit starts at within the splitEnd file
		splitStart = fileSplit.getStart();

		// splitEnd byte marker that the fileSplit ends at within the splitEnd file
		splitEnd = splitStart + fileSplit.getLength();

		// log some debug info
		LOG.info("FixedLengthRecordReader: SPLIT START="+splitStart + " SPLIT END=" +splitEnd + " SPLIT LENGTH="+fileSplit.getLength() );

		// the actual file we will be reading from
		Path file = fileSplit.getPath(); 

		// job configuration
		Configuration job = context.getConfiguration(); 

		// check to see if compressed....
		CompressionCodec codec = new CompressionCodecFactory(job).getCodec(file);
	 	if (codec != null) {
	 		throw new IOException("FixedLengthRecordReader does not support reading compressed files");
	 	}

		// for updating the total bytes read in
	 	inputByteCounter = ((MapContext)context).getCounter("FileInputFormatCounters", "BYTES_READ"); 

	 	// THE JAR COMPILED AGAINST 0.20.1 does not contain a version of FileInputFormat with these constants (but they exist in trunk)
	 	// uncomment the below, then comment or discard the line above
	 	//inputByteCounter = ((MapContext)context).getCounter(FileInputFormat.COUNTER_GROUP, FileInputFormat.BYTES_READ); 

		// the size of each fixed length record
		this.recordLength = FixedLengthInputFormat.getRecordLength(job);

		// get the filesystem
		final FileSystem fs = file.getFileSystem(job); 

		// open the File
		fileInputStream = fs.open(file,(64 * 1024)); 

		// seek to the splitStart position
		fileInputStream.seek(splitStart);

		// set our current position
	 	this.currentPosition = splitStart;
	}

	@Override
	public boolean nextKeyValue() throws IOException, InterruptedException {
		if (recordKey == null) {
		 	recordKey = new LongWritable();
	 	}

		// the Key is always the position the record starts at
	 	recordKey.set(currentPosition);

	 	// the recordValue to place the record text in
	 	if (recordValue == null) {
	 		recordValue = new Text();
	 	} else {
	 		recordValue.clear();
	 	}

	 	// if the currentPosition is less than the split end..
	 	if (currentPosition < splitEnd) {

	 		// setup a buffer to store the record
	 		byte[] buffer = new byte[this.recordLength];
	 		int totalRead = 0; // total bytes read
	 		int totalToRead = recordLength; // total bytes we need to read

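	 		// while we still have record bytes to read
	 		while(totalRead != recordLength) {
	 			// read in what we need
	 			int read = this.fileInputStream.read(buffer, 0, totalToRead);

	 			// the stream ended before a full record was read, fail
	 			// instead of appending a negative length or looping forever
	 			if (read == -1) {
	 				throw new IOException("FixedLengthRecordReader: unexpected end of file, partial record found at position " + (currentPosition + totalRead));
	 			}

	 			// append to the buffer
	 			recordValue.append(buffer,0,read);

	 			// update our markers
	 			totalRead += read;
	 			totalToRead -= read;
	 			//LOG.info("READ: just read=" + read +" totalRead=" + totalRead + " totalToRead="+totalToRead);
	 		}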

	 		// update our current position and log the input bytes
	 		currentPosition = currentPosition +recordLength;
	 		inputByteCounter.increment(recordLength);

	 		//LOG.info("VALUE=|"+fileInputStream.getPos()+"|"+currentPosition+"|"+splitEnd+"|" + recordLength + "|"+recordValue.toString());

	 		// return true
	 		return true;
	 	}

	 	// nothing more to read....
		return false;
	}

}
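If you want to play with the read loop from nextKeyValue() without a Hadoop cluster, the sketch below reproduces the same idea using only java.io. It is not part of the classes above: the class name, method names and the 5 byte sample records are all made up for illustration. It loops because a single read() call may return fewer bytes than requested, and it treats a stream that ends in the middle of a record as an error.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a plain java.io version of the fixed-length read loop.
final class FixedLengthReadSketch {

    // Reads exactly recordLength bytes. Returns null on a clean end of stream
    // (no bytes left) and throws if the stream ends part way through a record.
    static byte[] readRecord(InputStream in, int recordLength) throws IOException {
        byte[] buffer = new byte[recordLength];
        int totalRead = 0;
        while (totalRead < recordLength) {
            int read = in.read(buffer, totalRead, recordLength - totalRead);
            if (read == -1) {
                if (totalRead == 0) {
                    return null;
                }
                throw new EOFException("stream ended after " + totalRead
                        + " of " + recordLength + " bytes");
            }
            totalRead += read;
        }
        return buffer;
    }

    // Reads every record in the stream and decodes each one as ASCII text.
    static List<String> readAll(InputStream in, int recordLength) throws IOException {
        List<String> records = new ArrayList<>();
        byte[] record;
        while ((record = readRecord(in, recordLength)) != null) {
            records.add(new String(record, StandardCharsets.US_ASCII));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // three space-padded records of 5 bytes each, no delimiters or line breaks
        byte[] data = "AAAA BBBB CCCC ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAll(new ByteArrayInputStream(data), 5));
    }
}
```

The record reader above differs in that it appends each chunk to a Text value and stops at the split end rather than at the end of the stream, but the partial-read handling is the same concern.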

HBase examples on OS-X and Maven

Ok, so today I needed to get HBase 0.20.0 running on my local OS X box, simply in standalone mode. I am starting a project where I need to manage 50-100 million records and I wanted to try out HBase.

Here are the steps I took. They are a consolidation of some pointers found in the HBase and Hadoop quick start guides.

A) Download HBase 0.20.X (currently 0.20.0), extract and install to /my/dir/hbase

B) Make sure your shell environment is set up to point to your Java 1.6 home and your PATH is set up correctly, which should be something like:

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export HBASE_HOME=/my/dir/hbase
PATH=$PATH:$HBASE_HOME/bin:$JAVA_HOME/bin
export PATH

C) Even though we are running in standalone mode, HBase is built on top of Hadoop, and Hadoop uses SSH to communicate with masters/slaves. So we need to make sure the process can ssh to localhost without a passphrase. (My standalone setup of HBase would not start properly without this.)

Let's check whether you can SSH locally without a password. Type ssh localhost. If this fails or prompts for a password, run the following two commands to permit it:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh localhost   # you should be able to connect now

D) Ok, at this point you should be able to fire up HBase, so let's do the following:

/my/dir/hbase/bin/start-hbase.sh

Once it has started, type /my/dir/hbase/bin/hbase shell. This brings up the interactive console, sort of like the mysql console, where you can interact directly with the database (or in this case not a database, but the HBase KV store). While on the console type status 'detailed'. If you get successful output we are good to go! HBase is running; type exit to get back to the bash shell. Let's leave HBase running.

MAVEN INTEGRATION

Ok, now we need to set up your Java classpath to include the HBase jars. They are all located at /my/dir/hbase/lib. If you are using Maven and want to get HBase configured in your project, you can use the following UN-OFFICIAL HBase Maven POM and deploy script listed below. These files were originally provided in a Fiveclouds post (the URL is in the script header) and I upgraded them for HBase 0.20.0.

Deploy Script For Maven Dependencies

#! /bin/sh
#
# Deploy all HBase dependencies which are not available via the official
#	 maven repository at http://repo1.maven.org.
#
#
# This is for HBase 0.20.0
#
# Modified for HBase 0.20.0 from the original located at
# http://www.fiveclouds.com/2009/04/13/deploying-hbase-to-your-local-maven-repo/
#
# The maven repository to deploy to.
#

REPOSITORY_URL=file:///$HOME/.m2/repository

if [ -z "$HBASE_HOME" ]; then
	echo "Error: HBASE_HOME is not set." >&2
	exit 1
fi

HBASE_LIBDIR=$HBASE_HOME/lib

# HBase
#
mvn deploy:deploy-file -Dfile=$HBASE_HOME/hbase-0.20.0.jar \
	-DpomFile=hbase.pom -Durl=$REPOSITORY_URL

#Hadoop
mvn deploy:deploy-file -DgroupId=org.apache -DartifactId=hadoop \
	-Dversion=0.20.0 -Dpackaging=jar -Durl=$REPOSITORY_URL \
	-Dfile=$HBASE_LIBDIR/hadoop-0.20.0-plus4681-core.jar

#thrift
mvn deploy:deploy-file -DgroupId=com.facebook -DartifactId=thrift \
	-Dversion=r771587 -Dpackaging=jar -Durl=$REPOSITORY_URL \
	-Dfile=$HBASE_LIBDIR/libthrift-r771587.jar

#apache commons cli
mvn deploy:deploy-file -DgroupId=commons-cli -DartifactId=commons-cli \
	-Dversion=2.0-SNAPSHOT -Dpackaging=jar -Durl=$REPOSITORY_URL \
	-Dfile=$HBASE_LIBDIR/commons-cli-2.0-SNAPSHOT.jar

#zookeeper
mvn deploy:deploy-file -DgroupId=org.apache.hadoop -DartifactId=zookeeper \
	-Dversion=r785019-hbase-1329 -Dpackaging=jar -Durl=$REPOSITORY_URL \
	-Dfile=$HBASE_LIBDIR/zookeeper-r785019-hbase-1329.jar

# EOF

Unofficial "hbase.pom"

<?xml version="1.0" encoding="UTF-8"?>

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hbase</artifactId>
  <packaging>jar</packaging>
  <version>0.20.0</version>

  <name>Hadoop HBase</name>

  <dependencies>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>zookeeper</artifactId>
      <version>r785019-hbase-1329</version>
    </dependency>

    <dependency>
      <groupId>commons-cli</groupId>
      <artifactId>commons-cli</artifactId>
      <version>2.0-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>commons-collections</groupId>
      <artifactId>commons-collections</artifactId>
      <version>3.2</version>
    </dependency>
    <dependency>
      <groupId>commons-httpclient</groupId>
      <artifactId>commons-httpclient</artifactId>
      <version>3.0.1</version>
    </dependency>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.1.1</version>
    </dependency>
    <dependency>
      <groupId>commons-math</groupId>
      <artifactId>commons-math</artifactId>
      <version>1.1</version>
    </dependency>

    <dependency>
      <groupId>org.apache</groupId>
      <artifactId>hadoop</artifactId>
      <version>0.20.0</version>
    </dependency>

    <dependency>
      <groupId>jetty</groupId>
      <artifactId>org.mortbay.jetty</artifactId>
      <version>5.1.4</version>
    </dependency>
    <dependency>
      <groupId>jline</groupId>
      <artifactId>jline</artifactId>
      <version>0.9.91</version>
    </dependency>
    <dependency>
      <groupId>com.facebook</groupId>
      <artifactId>thrift</artifactId>
      <version>r771587</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>2.2.0</version>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.15</version>
    </dependency>
    <dependency>
      <groupId>xmlenc</groupId>
      <artifactId>xmlenc</artifactId>
      <version>0.52</version>
    </dependency>
    <dependency>
	    <groupId>org.apache.geronimo.specs</groupId>
	    <artifactId>geronimo-j2ee_1.4_spec</artifactId>
	    <version>1.0</version>
	    <scope>provided</scope>
    </dependency>

  </dependencies>

	<repositories>
		<repository>
			<id>virolab.cyfronet.pl</id>
			<name>virolab.cyfronet.pl (used for commons-cli-2.0)</name>
			<url>http://virolab.cyfronet.pl/maven2</url>
		</repository>
	</repositories>

</project>

LET'S ACTUALLY USE HBASE...

E) Now it's time to fire up a Java app to do some basic HBase operations. The examples below are simple tests that are set up to run in JUnit within Spring on my box, so you can ignore the method names and the @Test annotations, as the meat of the examples is in the method bodies.

CREATE A TABLE EXAMPLE

Ok, HBase is NOT an RDBMS but a Bigtable implementation. Think of it as a giant Hashtable with more advanced features. However, for the purposes of this example I will speak in terms of rows, columns etc., which are similar in concept to those of a database and are what most folks are familiar with.

The MOST important thing is that you COPY the /my/dir/hbase/conf/*.xml default HBase configuration files to someplace on your classpath. These files can be customized, but for straight out of the box testing they work as-is. Just MAKE SURE they are on your classpath before starting as HBaseConfiguration instances look for them there.

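// imports used by the examples in this post (HBase 0.20 API)
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// the statements below go inside a method that declares "throws Exception"
HBaseConfiguration config = new HBaseConfiguration();
HBaseAdmin admin = null;
try {
	// HBaseAdmin is where all the "DDL" like operations take place in HBase
	admin = new HBaseAdmin(config);
} catch(MasterNotRunningException e) {
	// keep the original exception as the cause so the stack trace is not lost
	throw new Exception("Could not setup HBaseAdmin as no master is running, did you start HBase?...", e);
}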

if (!admin.tableExists("testTable")) {
	admin.createTable(new HTableDescriptor("testTable"));

	// disable so we can make changes to it
	admin.disableTable("testTable");

	// lets add 2 columns
	admin.addColumn("testTable", new HColumnDescriptor("firstName"));
	admin.addColumn("testTable", new HColumnDescriptor("lastName"));

	// enable the table for use
	admin.enableTable("testTable");

}

// get the table so we can use it in the next set of examples
HTable table = new HTable(config, "testTable");

After running the above code fire up the HBase shell /my/dir/hbase/bin/hbase shell and once up, type "list" at the hbase shell prompt, you should see your testTable listed! Yeah!
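
If the table does not show up, or HBaseAdmin complains that no master is running even though HBase is up, one thing worth checking is whether the configuration files really are visible on the classpath. Below is a minimal sketch that uses only the JDK. The class name HBaseConfigCheck is my own, and the two file names assume the stock conf directory of HBase 0.20; adjust them if yours differ.

```java
import java.net.URL;

// Hypothetical diagnostic helper; it does not touch HBase itself.
final class HBaseConfigCheck {

    /** Returns the classpath location of a resource, or null when it is not visible. */
    static URL locate(String resourceName) {
        return HBaseConfigCheck.class.getClassLoader().getResource(resourceName);
    }

    public static void main(String[] args) {
        // File names are an assumption based on the default HBase 0.20 conf directory
        String[] names = {"hbase-default.xml", "hbase-site.xml"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url.toString()));
        }
    }
}
```

Run it with the same classpath as your tests; any file reported as missing needs to be copied to a classpath location before the examples here will find your local HBase.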

ADD A ROW TO THE TABLE

// lets put a new object with a unique "row" identifier, this is the key
// HBase stores everything in bytes so you need to convert string to bytes
Put row = new Put(Bytes.toBytes("myID"));

/* lets start adding data to this row. The first parameter
is the "familyName" which essentially is the column name, the second
parameter is the qualifier, think of it as a way to subqualify values
within a particular column. For now we won't so we just make the qualifier
name the same as the column name. The last parameter is the actual value
to store */

row.add(Bytes.toBytes("firstName"),Bytes.toBytes("firstName"),Bytes.toBytes("joe"));
row.add(Bytes.toBytes("lastName"),Bytes.toBytes("lastName"),Bytes.toBytes("smith"));

try {
	// add it!
	table.put(row);
} catch(Exception e) {
    // handle me!
}

Ok, now go back to the HBase shell and type count 'testTable' and you should get one record accounted for. Good to go!
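
To make the "giant Hashtable" idea and the family/qualifier parameters a bit more concrete, here is a toy model of the layout as nested sorted maps in plain Java. This is only a mental model under my own simplifications (String keys, no timestamps or versions, everything in memory); ToyTable and its methods are invented for illustration and are not HBase classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only: row key -> column family -> qualifier -> value.
final class ToyTable {
    private final Map<String, Map<String, Map<String, byte[]>>> rows = new TreeMap<>();

    /** Stores a value under row / family / qualifier, creating the nested maps as needed. */
    void put(String row, String family, String qualifier, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    /** Returns the stored value, or null when the row, family or qualifier is absent. */
    byte[] get(String row, String family, String qualifier) {
        Map<String, Map<String, byte[]>> families = rows.get(row);
        if (families == null) {
            return null;
        }
        Map<String, byte[]> qualifiers = families.get(family);
        return qualifiers == null ? null : qualifiers.get(qualifier);
    }

    /** Removes the whole row, like the Delete example does for a row key. */
    void delete(String row) {
        rows.remove(row);
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.put("myID", "firstName", "firstName", "joe".getBytes(StandardCharsets.UTF_8));
        table.put("myID", "lastName", "lastName", "smith".getBytes(StandardCharsets.UTF_8));
        System.out.println("rows: " + table.rowCount());
    }
}
```

Seen this way, using the family name as the qualifier (as the Put example does) just means each family holds a single entry.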

GET A ROW FROM THE TABLE

// a GET fetches a row by its identifier key
Get get = new Get(Bytes.toBytes("myID"));

Result result = null;
try {
	// exec the get
	 result = table.get(get);
} catch(Exception e) {
	// handle me
}

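// not found?? a missing row typically comes back as an empty Result
// rather than null, so check both
if (result == null || result.isEmpty()) {
	// NOT FOUND!
}

// Again the result speaks in terms of a familyName (column)
// and a qualifier; since ours are both the same, we pass the same
// value for both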
byte[] firstName = Bytes.toBytes("firstName");
byte[] lastName = Bytes.toBytes("lastName");
byte[] fnameVal = result.getValue(firstName,firstName);
byte[] lnameVal = result.getValue(lastName,lastName);

// Bytes.toString decodes as UTF-8, matching what Bytes.toBytes wrote
System.out.println(Bytes.toString(fnameVal) + " " + Bytes.toString(lnameVal));
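
One gotcha with the snippet above: as far as I can tell, getValue hands back null when the row has no cell for that family and qualifier, so decoding it blindly can blow up with a NullPointerException. Here is a small null-safe sketch in plain Java; CellText, encode and decode are hypothetical helper names of mine, and the sketch assumes your values were written as UTF-8 text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for turning cell bytes into text; not part of HBase.
final class CellText {

    /** Encodes text as UTF-8 bytes, the form in which it would be stored. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes, returning the fallback when the cell was absent (null). */
    static String decode(byte[] value, String fallback) {
        return value == null ? fallback : new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = encode("joe");
        byte[] missing = null; // stands in for a cell that does not exist
        System.out.println(decode(stored, "(none)") + " " + decode(missing, "(none)"));
    }
}
```

Naming the charset explicitly also avoids surprises on machines whose default encoding is not UTF-8.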

DELETE A ROW FROM THE TABLE

Delete d = new Delete(Bytes.toBytes("myID"));

try {
	table.delete(d);
} catch(Exception e) {
	// handle me
}

Now fire up the hbase shell again and type "count 'testTable'", you should now get zero rows.

OK, well I hope that helped you get up and running with some HBase basics!

Integrating Restlet with Spring

For those of you out there who would like to get Restlet 2.0 (currently the M5 release) integrated with your existing Spring application, hopefully this post will be of some help. I recently had to do this, and unfortunately the documentation related to Spring integration on the Restlet site is scattered across various docs and some of it appears out of date. What I am describing below worked with source code straight from the Restlet SVN trunk just before the M5 release, so you should be good to go if you use the M5 release (JEE edition).

First off, I am assuming you have an existing web application with a web.xml file and are using Spring. Secondly I am just trying to give you some working web.xml and the corresponding Spring configuration to get up and running. I am not explaining the details of how Restlet works as you can find that on the Restlet site.

First you will want to make sure you have the Restlet JEE 2.0 M5 edition. Make sure you grab the JEE version and not the JSE version, as the latter does not include the Spring integration extension. Once downloaded, extract the ZIP to a location on your drive. The JEE zip package contains a ton of Restlet jar files. The three we care about are org.restlet.jar, org.restlet.ext.spring.jar and org.restlet.ext.servlet.jar.

If you are using Maven you can add the following repository and dependencies to your POM by using the repository instructions on the Restlet site. NOTE as of today: the Restlet repository does NOT have the M5 release up there yet, so you are going to have to add the M5 jars to your repository manually by running the following for each of the 3 jars.

mvn install:install-file -Dfile=/PATH/TO/RESTLET-m5-ZIP-EXTRACT-DIR/lib/org.restlet.jar -DgroupId=org.restlet -DartifactId=org.restlet -Dversion=2.0-SNAPSHOT-M5 -Dpackaging=jar

mvn install:install-file -Dfile=/PATH/TO/RESTLET-m5-ZIP-EXTRACT-DIR/lib/org.restlet.ext.spring.jar -DgroupId=org.restlet -DartifactId=org.restlet.ext.spring -Dversion=2.0-SNAPSHOT-M5 -Dpackaging=jar

mvn install:install-file -Dfile=/PATH/TO/RESTLET-m5-ZIP-EXTRACT-DIR/lib/org.restlet.ext.servlet.jar -DgroupId=org.restlet -DartifactId=org.restlet.ext.servlet -Dversion=2.0-SNAPSHOT-M5 -Dpackaging=jar

The above 3 commands will manually install the three Jars into your Maven repository. Next you can configure your POM to add the official Maven repository plus the dependencies to the 3 Jars you installed above. Note that the repository entry is sort of meaningless at this point because you manually installed the jars above. It is IMPORTANT that the version elements in your dependencies below MATCH exactly the versions you specified in the commands above!

	<repository>
    	<id>maven-restlet</id>
    	<name>Public online Restlet repository</name>
    	<url>http://maven.restlet.org</url>
	</repository>

	<dependency>
    	<groupId>org.restlet</groupId>
    	<artifactId>org.restlet</artifactId>
    	<version>2.0-SNAPSHOT-M5</version>
	</dependency>

	<dependency>
    	<groupId>org.restlet</groupId>
    	<artifactId>org.restlet.ext.spring</artifactId>
    	<version>2.0-SNAPSHOT-M5</version>
	</dependency>

	<dependency>
    	<groupId>org.restlet</groupId>
    	<artifactId>org.restlet.ext.servlet</artifactId>
    	<version>2.0-SNAPSHOT-M5</version>
	</dependency>

Ok, great. Next we need to configure your web.xml, open it up and add the following entries in the appropriate spots:

  	<servlet>
      	<servlet-name>myRESTApi</servlet-name>
      	<servlet-class>org.restlet.ext.spring.SpringServerServlet</servlet-class>
      	 <init-param>
                <param-name>org.restlet.component</param-name>
                 <!-- this value must match the bean id of the Restlet component you will configure in Spring (below) -->
                <param-value>restletComponent</param-value>
         </init-param>
  	</servlet>

  	<servlet-mapping>
        <servlet-name>myRESTApi</servlet-name>
        <url-pattern>/my/REST/api/*</url-pattern>
  	</servlet-mapping>

Now your web.xml is configured to take all requests to /my/REST/api/* and send those to a Restlet Component which you will wire up in your Spring configuration. So... bring up your applicationContext.xml or whatever you have it named and add the following entries:


<!-- our SpringComponent which binds us to the Restlet servlet configured above -->
<bean id="restletComponent" class="org.restlet.ext.spring.SpringComponent">
         <!-- the defaultTarget for this component is our Restlet Application -->
	<property name="defaultTarget" ref="myRestletApplication" />
</bean>

<!-- your Restlet application. This class extends "org.restlet.Application" -->
<bean id="myRestletApplication" class="my.restlet.MyRestletApplication">
         <!-- all requests to this Application will be sent to myPath2BeanRouter -->
	<property name="root" ref="myPath2BeanRouter"/>
</bean>

<!-- This router automagically routes requests to beans that extend org.restlet.resource.ServerResource or org.restlet.Restlet and whose name starts with a "/" slash and matches the request -->
<bean name="myPath2BeanRouter" class="org.restlet.ext.spring.SpringBeanRouter"/>

<!-- This bean will handle all requests made to /my/REST/api/myResource (GET/POST/PUT etc).
Its class extends "org.restlet.Restlet" or "org.restlet.resource.ServerResource" -->
 <bean name="/myResource" autowire="byName" scope="prototype"
    		class="my.restlet.package.resources.MyResourceBean">

    	<property name="somePropertyOfMine" ref="someOtherSpringBean"/>
 </bean>
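
To picture how the pieces above fit together, here is a deliberately simplified sketch in plain Java of the routing idea: the servlet mapping strips the /my/REST/api prefix, and whatever path remains is matched against bean names that start with a slash. This is NOT Restlet's or Spring's implementation (the real router supports URI templates, among other things); PathToBeanSketch and its methods are names I made up, and the Supplier stands in for the prototype scope, handing out a fresh handler per request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified illustration of path-to-bean routing; not the Restlet API.
final class PathToBeanSketch {
    private final String servletPrefix;
    private final Map<String, Supplier<String>> beansByName = new LinkedHashMap<>();

    PathToBeanSketch(String servletPrefix) {
        this.servletPrefix = servletPrefix;
    }

    /** Only beans whose name starts with "/" take part in routing. */
    void register(String beanName, Supplier<String> handlerFactory) {
        if (beanName.startsWith("/")) {
            beansByName.put(beanName, handlerFactory);
        }
    }

    /** Strips the servlet prefix and looks up a bean by the remaining path. */
    String handle(String requestPath) {
        if (!requestPath.startsWith(servletPrefix)) {
            return "404";
        }
        String remaining = requestPath.substring(servletPrefix.length());
        Supplier<String> factory = beansByName.get(remaining);
        return factory == null ? "404" : factory.get();
    }

    public static void main(String[] args) {
        PathToBeanSketch router = new PathToBeanSketch("/my/REST/api");
        router.register("/myResource", () -> "handled by MyResourceBean");
        router.register("someOtherSpringBean", () -> "never routed");
        System.out.println(router.handle("/my/REST/api/myResource"));
        System.out.println(router.handle("/my/REST/api/unknown"));
    }
}
```

The takeaway for the real configuration is that the bean name is the route: rename the bean to "/customers" and the resource answers at /my/REST/api/customers.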

Ok, well if you were having trouble trying to get Spring working with Restlet, I hope this helped get you rolling. Restlet is a cool project that works great and can get a REST API up and running pretty quickly (provided you are good at crawling through somewhat scattered documentation). Here are a few other links which you may want to reference:

Restlet 2.0 Extensions API
Restlet 2.0 JEE API

Also, I am posting the following error messages that troubled me when trying to get this to work. The configuration I show above was the result of getting beyond the below errors by using the correct fixed releases. To AVOID the errors below, ENSURE you are using Restlet 2.0 M5 or a custom build from the trunk. Prior to 9/25/09 people were getting the errors below.

Message ID
No target class was defined for this finder
Complete Message org.restlet.ext.spring.SpringFinder $$ EnhancerByCGLIB

Review: Cloud Application Architectures

This is a review of the book “Cloud Application Architectures” by George Reese.

At about 200 pages, this book packs a lot of solid recommendations on deploying and managing an application within the cloud. The book has an admitted AWS slant; however, the author covers two other providers, GoGrid and Rackspace, in the appendix. That said, the book does not treat the cloud computing topic with low-level command references or specific examples using AWS. You will not find those here (except for a handy AWS command reference in the appendix). Instead, this book covers the higher level application architectural issues within an AWS framework. The book does this nicely, as the author covers many issues that face architects who need to deploy to the cloud, from both the technical and business perspectives. This includes such things as application design issues, machine images, performance and disaster recovery, but also security, regulatory compliance and cost issues from the business side of things.

The author gives good coverage to the various issues you will need to keep in mind when it comes to using cloud services, specifically backup strategies, security strategies, database performance and capacity planning. However, when reading this book, one cannot help but think… “How are these issues that much different than a non-cloud deployment? Why are they more important in the cloud than outside of it?” Well, the answer is that they are not; they apply to both worlds. When it comes to application design, database strategies, backups, security and capacity planning, all of the details and strategies laid out in this book are great advice for operating outside of the cloud as well. But what you will find in this book are some of those AWS nuances that the author has encountered, which are very important to be aware of and will change the way you approach different problems when using such a service.

That said, I really recommend this book for any architect who wants to learn more about some of the issues you will face when deploying in the cloud. It is also simply a great book on the general architectural and business issues that any application will face, whether it is deployed within or outside of a cloud service.

Recommended: Yes
Skill Level: Intermediate to advanced system architects, CTOs etc.