Monthly Archives: October 2009

HBase examples on OS-X and Maven

Ok, so today I needed to get HBase 0.20.0 running on my local os-x box, simply in standalone mode. I am starting a project where I need to manage 50-100 million records and I wanted to try out HBase.

Here are the steps I took, the steps below are a consolidation of some pointers found in the HBase and Hadoop quick start guides.

A) Download HBase 0.20.X (currently 0.20.0), extract and install to /my/dir/hbase

B) Make sure your shell environment is setup to point to your Java 1.6 Home and your PATH is setup correctly which should be something like:

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export HBASE_HOME=/my/dir/hbase
PATH=$PATH:$HBASE_HOME/bin:$JAVA_HOME:bin
export PATH

C) Even though we are running in standalone mode. HBase is built on top of Hadoop and Hadoop uses SSH to communicate with masters/slaves. So we need to make sure the process can ssh to the localhost without a passphrase. (My standalone setup of HBase would not start properly without this).

Lets check to see if you can SSH locally without a password. Type ssh localhost. If this fails, we need to permit this so run the following two commands:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh localhost (you should be able to connect now)

D) Ok, at this point you should be able to fire up HBase, lets do the following:

/my/dir/hbase/bin/start-hbase.sh

Once started up type: /my/dir/hbase/bin/hbase shell this brings up the interactive console, sort of like the mysql console where you can directly interact with the database or in this case, not a database, but the HBase KV store. While on the console type status 'detailed'. If you get successful output we are good to go! Hbase is running, type exit to get back to the bash shell. Lets leave HBase running

MAVEN INTEGRATION

Ok, now we need to setup your Java classpath to include the HBase jars. They are all located at /my/dir/hbase/lib. If you are using Maven and want to get HBase configured in your project. You can use the following UN-OFFICIAL HBase Maven POM and deploy script listed below. These files were originally provided by Fivecloud's post located here and I upgraded them for HBase 0.20.0.

Deploy Script For Maven Dependancies

#! /bin/sh
#
# Deploy all HBase dependencies which are not available via the official
#	 maven repository at http://repo1.maven.org.
#
#
# This is for HBase 0.20.0
#
# Modified for HBase 0.20.0 from the original located at
# http://www.fiveclouds.com/2009/04/13/deploying-hbase-to-your-local-maven-repo/
#
# The maven repository to deploy to.
#

REPOSITORY_URL=file:///$HOME/.m2/repository

if [ -z $HBASE_HOME ]; then
	echo "Error: HBASE_HOME is not set." 2>&1
	exit 1
fi

HBASE_LIBDIR=$HBASE_HOME/lib


# HBase
#
mvn deploy:deploy-file -Dfile=$HBASE_HOME/hbase-0.20.0.jar \
	-DpomFile=hbase.pom -Durl=$REPOSITORY_URL

#Hadoop
mvn deploy:deploy-file -DgroupId=org.apache -DartifactId=hadoop \
	-Dversion=0.20.0 -Dpackaging=jar -Durl=$REPOSITORY_URL \
	-Dfile=$HBASE_LIBDIR/hadoop-0.20.0-plus4681-core.jar

#thrift
mvn deploy:deploy-file -DgroupId=com.facebook -DartifactId=thrift \
	-Dversion=r771587 -Dpackaging=jar -Durl=$REPOSITORY_URL \
	-Dfile=$HBASE_LIBDIR/libthrift-r771587.jar

#apache commons cli
mvn deploy:deploy-file -DgroupId=commons-cli -DartifactId=commons-cli \
	-Dversion=2.0-SNAPSHOT -Dpackaging=jar -Durl=$REPOSITORY_URL \
	-Dfile=$HBASE_LIBDIR/commons-cli-2.0-SNAPSHOT.jar
	
#zookeeper
mvn deploy:deploy-file -DgroupId=org.apache.hadoop -DartifactId=zookeeper \
	-Dversion=r785019-hbase-1329 -Dpackaging=jar -Durl=$REPOSITORY_URL \
	-Dfile=$HBASE_LIBDIR/zookeeper-r785019-hbase-1329.jar


# EOF

Unofficial "hbase.pom"

<?xml version="1.0" encoding="UTF-8"?>

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hbase</artifactId>
  <packaging>jar</packaging>
  <version>0.20.0</version>

  <name>Hadoop HBase</name>

  <dependencies>
  
    <dependency> 
      <groupId>org.apache.hadoop</groupId>
      <artifactId>zookeeper</artifactId>
      <version>r785019-hbase-1329</version>
    </dependency>
  
    <dependency>
      <groupId>commons-cli</groupId>
      <artifactId>commons-cli</artifactId>
      <version>2.0-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>commons-collections</groupId>
      <artifactId>commons-collections</artifactId>
      <version>3.2</version>
    </dependency>
    <dependency>
      <groupId>commons-httpclient</groupId>
      <artifactId>commons-httpclient</artifactId>
      <version>3.0.1</version>
    </dependency>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.1.1</version>
    </dependency>
    <dependency>
      <groupId>commons-math</groupId>
      <artifactId>commons-math</artifactId>
      <version>1.1</version>
    </dependency>

    <dependency>
      <groupId>org.apache</groupId>
      <artifactId>hadoop</artifactId>
      <version>0.20.0</version>
    </dependency>

    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.13</version>
    </dependency>

    <dependency>
      <groupId>jetty</groupId>
      <artifactId>org.mortbay.jetty</artifactId>
      <version>5.1.4</version>
    </dependency>
    <dependency>
      <groupId>jline</groupId>
      <artifactId>jline</artifactId>
      <version>0.9.91</version>
    </dependency>
    <dependency>
      <groupId>com.facebook</groupId>
      <artifactId>thrift</artifactId>
      <version>r771587</version>
    </dependency>
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>2.2.0</version>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.15</version>
    </dependency>
    <dependency>
      <groupId>xmlenc</groupId>
      <artifactId>xmlenc</artifactId>
      <version>0.52</version>
    </dependency>
    <dependency>
	    <groupId>org.apache.geronimo.specs</groupId>
	    <artifactId>geronimo-j2ee_1.4_spec</artifactId>
	    <version>1.0</version>
	    <scope>provided</scope>
    </dependency>

  </dependencies>

	<repositories>
		<repository>
			<id>virolab.cyfronet.pl</id>
			<name>virolab.cyfronet.pl (used for commons-cli-2.0)</name>
			<url>http://virolab.cyfronet.pl/maven2</url>
		</repository>
	</repositories>

</project>

LETS ACTUALLY USE HBASE...

F) Now it's time to fire up a Java app to do some basic HBase operations. The examples below are simple tests that are setup to run in JUnit within Spring on my box, so you can ignore the method names and the @Test annotations, as the meat of the examples are in the method bodies.

CREATE A TABLE EXAMPLE

Ok, HBase is NOT an RDBMS but a Big Table implementation. Think of it as a giant Hashtable with more advanced features. However for the purposes of this example I will speak in terms of Rows/Columns etc which are similar in concept to that of a database and are what most folks are familiar with.

The MOST important thing is that you COPY the /my/dir/hbase/conf/*.xml default HBase configuration files to someplace on your classpath. These files can be customized, but for straight out of the box testing they work as-is. Just MAKE SURE they are on your classpath before starting as HBaseConfiguration instances look for them there.

HBaseConfiguration config = new HBaseConfiguration(); 
HBaseAdmin admin = null;
try {
	// HBaseAdmin is where all the "DDL" like operations take place in HBase
	admin = new HBaseAdmin(config);
} catch(MasterNotRunningException e) {
	throw new Exception("Could not setup HBaseAdmin as no master is running, did you start HBase?...");
}

if (!admin.tableExists("testTable")) {
	admin.createTable(new HTableDescriptor("testTable"));
	
	// disable so we can make changes to it
	admin.disableTable("testTable");
	
	// lets add 2 columns
	admin.addColumn("testTable", new HColumnDescriptor("firstName"));
	admin.addColumn("testTable", new HColumnDescriptor("lastName"));
	
	// enable the table for use
	admin.enableTable("testTable");

}

	
// get the table so we can use it in the next set of examples
HTable table = new HTable(config, "testTable");

After running the above code fire up the HBase shell /my/dir/hbase/bin/hbase shell and once up, type "list" at the hbase shell prompt, you should see your testTable listed! Yeah!

ADD A ROW TO THE TABLE

// lets put a new object with a unique "row" identifier, this is the key
// HBase stores everything in bytes so you need to convert string to bytes
Put row = new Put(Bytes.toBytes("myID"));

/* lets start adding data to this row. The first parameter
is the "familyName" which essentially is the column name, the second
parameter is the qualifier, think of it as a way to subqualify values
within a particular column. For now we won't so we just make the qualifier
name the same as the column name. The last parameter is the actual value
to store */

row.add(Bytes.toBytes("firstName"),Bytes.toBytes("firstName"),Bytes.toBytes("joe"));
row.add(Bytes.toBytes("lastName"),Bytes.toBytes("lastName"),Bytes.toBytes("smith"));

try {
	// add it!
	table.put(row);
} catch(Exception e) {
    // handle me!
}

Ok, now go back to the HBase shell and type count 'testTable' and you should get one record accounted for. Good to go!

GET A ROW FROM THE TABLE

// a GET fetches a row by it's identifier key
Get get = new Get(Bytes.toBytes("myID"));

Result result = null;
try {
	// exec the get
	 result = table.get(get);
} catch(Exception e) {
	// handle me
}

// not found??
if (result == null) {
	// NOT FOUND!
}

// Again the result speaks in terms of a familyName(column)
// and a qualifier, since ours our both the same, we pass the same
// value for both
byte[] firstName = Bytes.toBytes("firstName");
byte[] lastName = Bytes.toBytes("lastName");
byte[] fnameVal = result.getValue(firstName,firstName);
byte[] lnameVal = result.getValue(lastName,lastName);

System.out.println(new String(fnameVal) + " " + new String(lnameVal));

DELETE A ROW FROM THE TABLE

Delete d = new Delete(Bytes.toBytes("myID"));

try {
	table.delete(d);
} catch(Exception e) {
	// handle me
}

Now fire up the hbase shell again and type "count 'testTable'", you should now get zero rows.

OK, well I hope that helped you get up and running with some HBase basics!

Tagged , , ,

Integrating Restlet with Spring

For those of you out there who would like to get Restlet 2.0 (currently the M5) release up integrated with your existing Spring application, hopefully this post will be of some help. I recently had to do this and unfortunately the documentation related to Spring integration on the Restlet site is scattered across various docs and some of it appears out of date. What I am describing below worked with source code straight from the Restlet SVN trunk just before the M5 release so you should be good to go if you use the M5 release (JEE edition)

First off, I am assuming you have an existing web application with a web.xml file and are using Spring. Secondly I am just trying to give you some working web.xml and the corresponding Spring configuration to get up and running. I am not explaining the details of how Restlet works as you can find that on the Restlet site.

First you will want to make sure you have the Restlet JEE 2.0 M5 edition. Make sure you grab the JEE version and not the JSE version as the latter does not include the Spring integration extension. Once downloaded extract the ZIP to a location on your drive. The JEE zip package contains a ton of Restlet Jar files. The three we care about are org.restlet.jar, org.restlet.ext.spring.jar, org.restlet.ext.servlet.jar

If you are using Maven you can add the following repository and dependencies to your POM by using the repository instructions on the Restlet site. NOTE as of today: the Restlet repository currently does NOT have the M5 release up there so you are going to manually have to add the M5 jar to your repository by doing the following for each of the 3 jars.

mvn install:install-file -Dfile=/PATH/TO/RESTLET-m5-ZIP-EXTRACT-DIR/lib/org.restlet.jar -DgroupId=org.restlet -DartifactId=org.restlet -Dversion=2.0-SNAPSHOT-M5 -Dpackaging=jar

mvn install:install-file -Dfile=/PATH/TO/RESTLET-m5-ZIP-EXTRACT-DIR/lib/org.restlet.ext.spring.jar -DgroupId=org.restlet -DartifactId=org.restlet.ext.spring -Dversion=2.0-SNAPSHOT-M5 -Dpackaging=jar

mvn install:install-file -Dfile=/PATH/TO/RESTLET-m5-ZIP-EXTRACT-DIR/lib/org.restlet.ext.servlet.jar -DgroupId=org.restlet -DartifactId=org.restlet.ext.servlet -Dversion=2.0-SNAPSHOT-M5 -Dpackaging=jar

The above 3 commands will manually install the three Jars into your Maven repository. Next you can configure your POM to add the official Maven repository plus the dependencies to the 3 Jars you installed above. Note that the repository entry is sort of meaningless at this point because you manually installed the jars above. It is IMPORTANT that the version elements in your dependencies below MATCH exactly the versions you specified in the commands above!

	<repository>  
    	<id>maven-restlet</id>  
    	<name>Public online Restlet repository</name>  
    	<url>http://maven.restlet.org</url>  
	</repository>

	<dependency>
    	<groupId>org.restlet</groupId>
    	<artifactId>org.restlet</artifactId>
    	<version>2.0-SNAPSHOT-M5</version>
	</dependency>

	<dependency>
    	<groupId>org.restlet</groupId>
    	<artifactId>org.restlet.ext.spring</artifactId>
    	<version>2.0-SNAPSHOT-M5</version>
	</dependency>

	<dependency>
    	<groupId>org.restlet</groupId>
    	<artifactId>org.restlet.ext.servlet</artifactId>
    	<version>2.0-SNAPSHOT-M5</version>
	</dependency>

Ok, great. Next we need to configure your web.xml, open it up and add the following entries in the appropriate spots:

  	<servlet>
      	<servlet-name>myRESTApi</servlet-name>
      	<servlet-class>org.restlet.ext.spring.SpringServerServlet</servlet-class>
      	 <init-param>
                <param-name>org.restlet.component</param-name>
                 <!-- this value must match the bean id of the Restlet component you will configure in Spring (below) -->
                <param-value>restletComponent</param-value>
         </init-param>
  	</servlet>

  	<servlet-mapping>
        <servlet-name>myRESTApi</servlet-name>
        <url-pattern>/my/REST/api/*</url-pattern>
  	</servlet-mapping>

Now your web.xml is configured to take all requests to /my/REST/api/* and send those to a Restlet Component which you will wire up in your Spring configuration. So... bring up your applicationContext.xml or whatever you have it named and add the following entries:


<!-- our SpringComponent which binds us to the Restlet servlet configured above -->
<bean id="restletComponent" class="org.restlet.ext.spring.SpringComponent">
         <!-- the defaultTarget for this component is our Restlet Application -->
	<property name="defaultTarget" ref="myRestletApplication" />
</bean>

<!-- your Restlet application. This class extends "org.restlet.Application" -->
<bean id="myRestletApplication" class="my.restlet.MyRestletApplication">
         <!-- all requests to this Application will be sent to myPath2BeanRouter -->
	<property name="root" ref="myPath2BeanRouter"/>
</bean>

<!-- This router automagically routes requests to beans that extend org.restlet.resource.ServerResource or org.restlet.Restlet who's name starts with a "/" slash which matches the request-->
<bean name="myPath2BeanRouter" class="org.restlet.ext.spring.SpringBeanRouter"/>
 
<!-- This extension of org.restlet.resource.ServerResource bean will handle all requests to made to /my/REST/api/myResource (GET/POST/PUT etc) 
This class extends "org.restlet.Restlet" or "org.restlet.resource.ServerResource" -->
 <bean name="/myResource" autowire="byName" scope="prototype" 
    		class="my.restlet.package.resources.MyResourceBean">
    		
    	<property name="somePropertyOfMine" ref="someOtherSpringBean"/>
 </bean>

Ok, well if you were having trouble trying to get Spring working with Restlet I hope this helped get you rolling. Restlet is a cool project that works great and can get a REST API up and running pretty quickly (granted you are good at crawling through somewhat scattered documentation) Here are a few other links which you may want to reference:

Restlet 2.0 Extensions API
Restlet 2.0 JEE API

Also, I am posting the following error messages that troubled me when trying to get this to work. The configuration I show above was the result of getting beyond the below errors by using the correct fixed releases. To AVOID the errors below, ENSURE you are using Restlet 2.0 M5 or a custom build from the trunk. Prior to 9/25/09 people were getting the errors below.

Message ID
No target class was defined for this finder
Complete Message org.restlet.ext.spring.SpringFinder $$ EnhancerByCGLIB

Tagged , , ,
Follow

Get every new post delivered to your Inbox.