Came across this creative presentation on Redis: http://www.slideshare.net/JustinCarmony/blazing-data-with-redis-20
Recently I was working on implementing a custom IAuthenticator and IAuthority for Cassandra 1.1.1, because there really is not much (if any) security out of the box. For those of you familiar with Cassandra, its distribution used to include a simple property-file-based implementation of IAuthenticator and IAuthority that you could reference in your cassandra.yaml file. However, due to concerns about weak security, they were removed from the main distribution and placed under the examples/ directory. They are a decent starting point to reference when building your own implementations, but they are not recommended for real production use, which is why I started to implement my own.
While doing this, I ran into a problem using the Netflix Astyanax client API to talk to Cassandra while Cassandra was running with the custom IAuthenticator and IAuthority that I made. When testing connection initialization against Cassandra with (intentionally) invalid credentials, instead of seeing some sort of AuthenticationException in my client's Astyanax log file, I was getting PoolTimeoutExceptions, which was odd. I scratched my head on this for a while as I cloned Astyanax from GitHub and began digging into the source. I suspected that the Thrift AuthenticationException might be suppressed somewhere. After reading the source, I realized it wasn't being suppressed per se, but rather sent to Astyanax's ConnectionPoolMonitor, which you can configure programmatically when you define your client code's AstyanaxContext object (the object that manages all connectivity to Cassandra). Out of the box, Astyanax ships with a few ConnectionPoolMonitor implementations: one is the CountingConnectionPoolMonitor (does no logging, just collects stats) and another is the Slf4jConnectionPoolMonitorImpl (logs to SLF4J). Depending on which one you specify in your context's configuration, you may or may not see AuthenticationException information in your client's logs/console.
In my case, I was specifying the CountingConnectionPoolMonitor, which was receiving the AuthenticationException but not doing anything with it other than incrementing a counter, effectively hiding it from me. The pool could not create any connections, so the code waiting to obtain a connection just threw a PoolTimeoutException, adding to my confusion.
To correct this, since I was using Log4j, I just created a custom ConnectionPoolMonitor that logs everything to Log4j instead (see Astyanax's SLF4J monitor implementation as an example of how to create one for Log4j). See below for how to specify the monitor. Creating your own ConnectionPoolMonitor implementation is easy and pretty self-explanatory.
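To make the failure mode concrete, here is a minimal, self-contained sketch in plain Java. It does not use Astyanax: PoolMonitor, CountingMonitor, LoggingMonitor and ToyPool are hypothetical, heavily simplified stand-ins (the real ConnectionPoolMonitor interface has many more callbacks), and java.util.logging takes the place of Log4j so it runs without extra dependencies. It shows how a connection-creation error that is handed only to a monitor surfaces to the caller as a generic pool timeout, and how a monitor that logs the cause makes it visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

public class PoolMonitorSketch {

    /** Hypothetical, much-reduced stand-in for a pool monitor callback interface. */
    interface PoolMonitor {
        void connectionCreateFailed(String host, Exception reason);
    }

    /** Counting-style monitor: records how many failures happened, drops the cause. */
    static class CountingMonitor implements PoolMonitor {
        final AtomicInteger createFailures = new AtomicInteger();

        @Override
        public void connectionCreateFailed(String host, Exception reason) {
            createFailures.incrementAndGet();
        }
    }

    /** Counts failures and also logs the underlying cause (java.util.logging instead of Log4j). */
    static class LoggingMonitor extends CountingMonitor {
        private static final Logger LOG = Logger.getLogger(LoggingMonitor.class.getName());
        final List<String> logged = new ArrayList<>();

        @Override
        public void connectionCreateFailed(String host, Exception reason) {
            super.connectionCreateFailed(host, reason);
            String msg = "Failed to create connection to " + host + ": " + reason;
            logged.add(msg);
            LOG.warning(msg);
        }
    }

    static class AuthenticationException extends Exception {
        AuthenticationException(String message) { super(message); }
    }

    static class PoolTimeoutException extends RuntimeException {
        PoolTimeoutException(String message) { super(message); }
    }

    /** Toy pool whose server always rejects the supplied credentials. */
    static class ToyPool {
        private final PoolMonitor monitor;

        ToyPool(PoolMonitor monitor) { this.monitor = monitor; }

        private String connect(String host) throws AuthenticationException {
            throw new AuthenticationException("invalid credentials");
        }

        String borrowConnection(String host, int attempts) {
            for (int i = 0; i < attempts; i++) {
                try {
                    return connect(host);
                } catch (AuthenticationException e) {
                    monitor.connectionCreateFailed(host, e); // the cause goes ONLY to the monitor
                }
            }
            // The caller just sees a generic timeout, not the authentication failure.
            throw new PoolTimeoutException("Timed out waiting for a connection to " + host);
        }
    }

    public static void main(String[] args) {
        LoggingMonitor monitor = new LoggingMonitor();
        try {
            new ToyPool(monitor).borrowConnection("cassandra-host:9160", 3);
        } catch (PoolTimeoutException e) {
            System.out.println("Caller saw: " + e.getMessage());
        }
        System.out.println("Failures counted: " + monitor.createFailures.get());
    }
}
```

With CountingMonitor the caller only ever learns "timed out"; swapping in LoggingMonitor leaves the pool behaviour unchanged but puts "invalid credentials" in the log.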
Below is an example of setting up an AstyanaxContext and specifying the ConnectionPoolMonitor that should be used. Once I used the correct monitor for my needs, I was able to see the true source of the PoolTimeoutExceptions (i.e. the AuthenticationExceptions), because my monitor was now logging them. (NOTE: the example below is just a test context, not a robust production setup.)
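import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.SimpleAuthenticationCredentials;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
    .forCluster(clusterName)
    .forKeyspace(keyspaceName)
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setDiscoveryType(NodeDiscoveryType.NONE))
    .withConnectionPoolConfiguration(
        new ConnectionPoolConfigurationImpl(clusterName + "-" + keyspaceName + "_CONN_POOL")
            .setPort(defaultConnectionPoolHostPort)
            .setInitConnsPerHost(1)
            .setMaxConnsPerHost(2)
            .setSeeds(connectionPoolSeedHosts)
            .setAuthenticationCredentials(
                new SimpleAuthenticationCredentials(new String(principal), new String(credentials))))
    // Log4jConnPoolMonitor is the custom Log4j-backed monitor described above
    .withConnectionPoolMonitor(new Log4jConnPoolMonitor())
    .buildKeyspace(ThriftFamilyFactory.getInstance());
context.start();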
Today I pushed some source to GitHub for a utility I was previously working on to load data from USPS AIS data files into HBase/MySQL, using Hadoop MapReduce as well as simpler data loaders. Source @ https://github.com/bitsofinfo/bitsofinfo.usps.ais
This project was originally started to create a framework for loading data files from the USPS AIS suite of data products (zipPlus4, cityState). The project has not been worked on in a while, but I figured I'd open-source it, and maybe some folks would like to team up to work on it further; if so, let me know! I'm throwing it out there under the Apache 2.0 license. Some of the libs need updating as well; for instance, it was originally developed with Spring 2.5.
USPS AIS data files consist of fixed-length records. This framework was created to handle bulk loading/updating of this data into a structured/semi-structured store of address data (i.e. MySQL or HBase). It is wired together using Spring and built with Maven. A key package is "org.bitsofinfo.util.address.usps.ais", which defines the POJOs for the records and leverages a custom annotation that binds each record property to the position within the fixed-length record where its data lives.
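The annotation-driven binding described above can be sketched roughly as follows. This is not the project's actual code: the @Column annotation, the ZipRecord POJO and its column offsets are hypothetical and do not reflect the real USPS AIS record layouts. The point is only to show how a runtime-retained field annotation plus reflection can slice a fixed-length line into POJO properties.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class FixedLengthBindingSketch {

    /** Hypothetical annotation: 1-based start position and width of a field within a record. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        int start();
        int length();
    }

    /** Hypothetical record; these offsets are illustrative, NOT the real AIS layout. */
    public static class ZipRecord {
        @Column(start = 1, length = 5) String zip;
        @Column(start = 6, length = 2) String state;
        @Column(start = 8, length = 20) String city;
    }

    /** Creates an instance of type and fills every @Column field from its slice of line. */
    public static <T> T bind(String line, Class<T> type) {
        try {
            T record = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Column col = field.getAnnotation(Column.class);
                if (col == null) {
                    continue; // field is not mapped to a position in the record
                }
                // Clamp to the line length so short (truncated) lines yield empty values.
                int begin = Math.min(col.start() - 1, line.length());
                int end = Math.min(begin + col.length(), line.length());
                field.setAccessible(true);
                field.set(record, line.substring(begin, end).trim()); // trim() drops space padding
            }
            return record;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot bind record to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        String line = "02134MA" + String.format("%-20s", "BOSTON");
        ZipRecord r = bind(line, ZipRecord.class);
        System.out.println(r.zip + " | " + r.state + " | " + r.city);
    }
}
```

Keeping the layout in annotations means a new record type only needs a new POJO; the binding code stays generic.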
The initial loader implementations include a single-JVM multi-threaded version, as well as a second one that leverages Hadoop MapReduce to split the AIS files up across HDFS and process them in parallel on Hadoop MapReduce nodes, ingesting the data much faster than on just one box. Both operate asynchronously once a load job is submitted. Ingestion times are significantly faster using Hadoop.
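As a rough illustration of a single-JVM, multi-threaded loader with asynchronous job submission, here is a minimal sketch. The class and method names are hypothetical, not the project's actual API: submitting a job splits the records into chunks, hands each chunk to a fixed thread pool, and immediately returns a CompletableFuture that completes with the total number of records processed. The sink parameter stands in for whatever writes to MySQL or HBase, and it must be thread-safe because chunks run concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class AsyncLoaderSketch {

    private final ExecutorService workers;

    public AsyncLoaderSketch(int threads) {
        this.workers = Executors.newFixedThreadPool(threads);
    }

    /** Submits a load job and returns at once; the future yields the number of records loaded. */
    public CompletableFuture<Integer> submit(List<String> records, int chunkSize, Consumer<String> sink) {
        List<CompletableFuture<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < records.size(); i += chunkSize) {
            List<String> chunk = records.subList(i, Math.min(i + chunkSize, records.size()));
            parts.add(CompletableFuture.supplyAsync(() -> {
                chunk.forEach(sink); // sink must be thread-safe: chunks run concurrently
                return chunk.size();
            }, workers));
        }
        // Completes when every chunk is done, then sums the per-chunk counts.
        return CompletableFuture.allOf(parts.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> parts.stream().mapToInt(f -> f.join()).sum());
    }

    public void shutdown() {
        workers.shutdown();
    }
}
```

A caller would typically do something like loader.submit(records, 1000, sink) and either keep working or call join() on the returned future, then shutdown() when no more jobs are coming.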
This project also needed a Hadoop InputFormat/RecordReader that could read fixed-length data files (none existed), so I created one for this project (FixedLengthInputFormat). It was also contributed as a patch to the Hadoop project. The source included here has been updated for Hadoop 0.23.1 (not yet tested); however, the patch submitted to the Hadoop project is still pending and was compiled under 0.20.x. The 0.20.x version in the patch files was tested and ran correctly on a 4-node Hadoop and HBase cluster.
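The core problem a fixed-length record reader has to solve is deciding which records belong to which input split, since split boundaries rarely fall on record boundaries. Below is a minimal sketch of one common approach in plain Java without Hadoop; the class and method names are hypothetical and this is not the code from the MAPREDUCE-1176 patch. Each split skips forward to the first record boundary at or after its start offset and reads every record that begins before its end offset, even if that last record extends past the end, so every complete record is read by exactly one split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedLengthSplitSketch {

    /** First offset at or after splitStart that is a multiple of recordLength. */
    static long firstRecordOffset(long splitStart, int recordLength) {
        long partial = splitStart % recordLength;
        return partial == 0 ? splitStart : splitStart + (recordLength - partial);
    }

    /**
     * Returns the records that belong to the split [splitStart, splitEnd): those whose first byte
     * falls inside the split. A trailing partial record at the end of the data is ignored here.
     */
    static List<String> readSplit(byte[] data, long splitStart, long splitEnd, int recordLength) {
        List<String> records = new ArrayList<>();
        for (long pos = firstRecordOffset(splitStart, recordLength);
             pos < splitEnd && pos + recordLength <= data.length;
             pos += recordLength) {
            records.add(new String(data, (int) pos, recordLength, StandardCharsets.US_ASCII));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "AAAABBBBCCCCDDDD".getBytes(StandardCharsets.US_ASCII);
        // Two splits whose boundary (offset 6) falls in the middle of record "BBBB".
        System.out.println(readSplit(data, 0, 6, 4));   // [AAAA, BBBB]
        System.out.println(readSplit(data, 6, 16, 4));  // [CCCC, DDDD]
    }
}
```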
You can read more about the fixed length record reader patch @
https://issues.apache.org/jira/browse/MAPREDUCE-1176
The USPS AIS products have some sample data sets available online at the USPS website; however, for the full set of data files you need to pay for the data and/or a subscription for delta updates. Some of the unit tests reference files from the real data sets; these have been omitted, so you will have to replace them with the real ones. Other unit tests reference the sample files freely available from USPS or other providers.
Links where USPS data files can be purchased:
https://www.usps.com/business/address-information-systems.htm
I recently started playing around with Red Hat's OpenShift PaaS and installed the MongoDB and RockMongo cartridges on my application. My use case was just to leverage the OpenShift platform to run my MongoDB instance for me; I wasn't ready to (nor did I need to) push an actual application out to the app running on OpenShift. Instead, I just wanted my local Java program to use the remote MongoDB instance, pump some data into it, and then view it in RockMongo (also running on the app at OpenShift).
It turns out you can do this by enabling port forwarding locally on the computer you want to connect from. This is on OS X:
1. Run "ssh-add /path/to/your/openshift/ssh.key". This should be the key that you created when you initially signed up for OpenShift.
2. Run "rhc port-forward -a [yourAppName] -l [yourOpenShiftLoginId]". It will prompt you for your OpenShift credentials, then will likely output something like the following:
Checking available ports...
Binding httpd -> 127.5.198.2:8080…
Binding mongod -> 127.5.198.1:27017…
Use ctl + c to stop
bind: Can’t assign requested address
channel_setup_fwd_listener: cannot listen to port: 27017
bind: Can’t assign requested address
channel_setup_fwd_listener: cannot listen to port: 8080
Could not request local forwarding.