Reactive re-engineering with Akka

Every once in a while during the life cycle of any given piece of software there comes a time when you have the opportunity to improve it in a major way, if, that is, it’s lucky enough to still be in production.

One particular system I’ve been involved with is responsible for processing a lot of data and keeping that data in sync across many systems. For purposes of this little case study I’ve dumbed down the overall use-case, concept, architecture and implementation details to this simple idea: we need to synchronize data.


Something in the environment (i.e. a user or other process) makes a request for some operation to be done that generates a change operation against a “DataEntry”. This DataEntry is manipulated in the primary database and then the change needs to be synchronized to other systems too numerous to count. The changes could be described as “create DataEntry item number XYZ”, “mutate DataEntry XYZ in fashion Z” or simply “delete DataEntry item XYZ”.

Each target system where a DataEntry is to be synchronized is called a DataStore. Each one involves its own complicated process of mutating our representation of a DataEntry into the target DataStore’s representation, and the means to do it can vary wildly: web-service calls, RDBMS DML, NoSQL operations, etc. Not to mention, as with any integration, each of these DataStore sync calls has the possibility of being fast, very slow, not working at all, or experiencing random transient failures.
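
A minimal Java sketch of what this model could look like. Only DataEntry and DataStore are names from the system described here; ChangeType, SyncOutcome, DataEntryChange and RecordingDataStore are hypothetical placeholders for illustration, not the production code.

```java
import java.util.ArrayList;
import java.util.List;

final class DataEntryModelSketch {

    // The three kinds of change described above (enum name is an assumption).
    enum ChangeType { CREATE, MUTATE, DELETE }

    // Hypothetical outcome categories a DataStore sync call might report.
    enum SyncOutcome { OK, TRANSIENT_FAILURE, PERMANENT_FAILURE }

    // One change operation against a DataEntry, e.g. "delete DataEntry item XYZ".
    static final class DataEntryChange {
        final ChangeType type;
        final String entryId;

        DataEntryChange(ChangeType type, String entryId) {
            this.type = type;
            this.entryId = entryId;
        }
    }

    // Each target system translates the change into its own representation.
    interface DataStore {
        SyncOutcome apply(DataEntryChange change);
    }

    // Toy in-memory DataStore that only records what it was asked to do.
    static final class RecordingDataStore implements DataStore {
        final List<String> log = new ArrayList<>();

        @Override
        public SyncOutcome apply(DataEntryChange change) {
            log.add(change.type + ":" + change.entryId);
            return SyncOutcome.OK;
        }
    }

    public static void main(String[] args) {
        RecordingDataStore store = new RecordingDataStore();
        store.apply(new DataEntryChange(ChangeType.CREATE, "XYZ"));
        store.apply(new DataEntryChange(ChangeType.DELETE, "XYZ"));
        System.out.println(store.log);
    }
}
```

A real DataStore would replace the log line with a web-service call, a DML statement or a NoSQL operation, and would map its own failures onto the outcome type.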

Version 1

For most of its life the system functioned as follows: each DataEntry mutated in the system was placed in a queue, and then processed by a consumer node’s DataSyncProvider, whose responsibility is to determine all the DataStores the DataEntry must be processed in, by interrogating a DataStoreLocator, and then make sure it happens. It worked similar to the diagrams below (highly simplified!), and note the bottleneck.

Figure: Version 1. Synchronization flow within one node

Figure: Version 1. Overall cluster view

Version 1 issues

Version 1 functioned fine for most of its life; however, the biggest issues with it were simply its lack of efficiency and speed in synchronizing any given DataEntry across all of the DataStores it was applicable for. More often than not, any given DataEntry mutation would result in dozens of target DataStores that it needed to be synchronized against. Because each DataStore was processed sequentially, retries included, and the overall result had to come back before moving on to the next one, there was a sizable delay until the mutation materialized in all target DataStores (not to mention the lack of good core utilization across the cluster). What did this mean? Well, an opportunity for improvement.
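
The sequential flow can be sketched in a few lines of plain Java. This is a simplified illustration under my own assumptions, not the original code: the boolean-returning sync method, the maxAttempts parameter and the method name syncSequentially are all hypothetical. The point it shows is that the total time is the sum of every DataStore call plus every retry, since nothing starts until the previous DataStore is finished.

```java
import java.util.Arrays;
import java.util.List;

final class SequentialSyncSketch {

    // Hypothetical minimal contract: true means the sync succeeded.
    interface DataStore {
        boolean sync(String dataEntryId);
    }

    // Resolves the DataStores a given DataEntry applies to.
    interface DataStoreLocator {
        List<DataStore> locate(String dataEntryId);
    }

    // Visits each DataStore one after another, retrying failures in place.
    // Returns how many DataStores were synchronized successfully.
    static int syncSequentially(String dataEntryId, DataStoreLocator locator, int maxAttempts) {
        int succeeded = 0;
        for (DataStore store : locator.locate(dataEntryId)) {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                if (store.sync(dataEntryId)) {
                    succeeded++;
                    break; // only now does the loop move on to the next DataStore
                }
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        DataStore flaky = id -> ++calls[0] >= 2; // fails once, then succeeds on the retry
        DataStore healthy = id -> true;
        DataStoreLocator locator = id -> Arrays.asList(flaky, healthy);
        System.out.println("synced: " + syncSequentially("XYZ", locator, 3));
    }
}
```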

Version 2

Obviously, the choice here was to move to asynchronous parallel DataStore execution, decoupled from the main DataEntry mutation consumer thread(s), and there are many ways you could go about doing that. Fortunately the overall modeling of the synchronization engine enabled considerable flexibility in swapping out the implementation with a little refactoring. The key points were introducing the concept of a DataEntry logic execution engine, aptly named LogicExecutionEngine, and adding a new implementation of our DataStoreLocator that could decouple any given DataStore’s location from any dependency on its actual residency within the local JVM.
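
As a rough sketch of that modeling, assuming Java and a CompletableFuture-based contract (the method signatures and the InlineEngine class are my own hypothetical choices; only the names DataSyncProvider, DataStoreLocator, LogicExecutionEngine and DataStore come from the system):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class PluggableEngineSketch {

    interface DataStore {
        boolean sync(String dataEntryId);
    }

    // Hands out DataStore handles; callers never learn whether a handle is local or remote.
    interface DataStoreLocator {
        List<DataStore> locate(String dataEntryId);
    }

    // Executes the sync logic for one DataEntry and reports the success count asynchronously.
    interface LogicExecutionEngine {
        CompletableFuture<Integer> execute(String dataEntryId, List<DataStore> stores);
    }

    // Same-thread engine: roughly the Version 1 behaviour, now hidden behind the interface.
    static final class InlineEngine implements LogicExecutionEngine {
        @Override
        public CompletableFuture<Integer> execute(String dataEntryId, List<DataStore> stores) {
            int ok = 0;
            for (DataStore store : stores) {
                if (store.sync(dataEntryId)) ok++;
            }
            return CompletableFuture.completedFuture(ok);
        }
    }

    // Depends only on the two interfaces, so either implementation can be swapped out.
    static final class DataSyncProvider {
        private final DataStoreLocator locator;
        private final LogicExecutionEngine engine;

        DataSyncProvider(DataStoreLocator locator, LogicExecutionEngine engine) {
            this.locator = locator;
            this.engine = engine;
        }

        CompletableFuture<Integer> process(String dataEntryId) {
            return engine.execute(dataEntryId, locator.locate(dataEntryId));
        }
    }

    public static void main(String[] args) {
        DataStore up = id -> true;
        DataStore down = id -> false;
        DataSyncProvider provider =
                new DataSyncProvider(id -> Arrays.asList(up, down), new InlineEngine());
        System.out.println("synced: " + provider.process("XYZ").join());
    }
}
```

With this shape, moving to a parallel or actor-based engine means writing another LogicExecutionEngine, while the DataSyncProvider stays untouched.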

Great. Now that the modeling is adjusted, what about implementation? For one, there was no interest in writing a multi-threaded execution engine from scratch, even though one could with the modeling in place; any implementation could have been developed and plugged in. That said, after looking around for a good framework that provided location transparency, parallel execution management, clustering and good resiliency, it was decided that Akka, and moving to an Actor model for the new engine, would be a good fit.

Figure: Version 2. Actor-based DataStores and LogicExecutionEngine

As shown above, the DataStores are now implemented via an ActorRef version, which is passed to the LogicExecutionEngine, whose new Actor-based implementation injects them into yet another Actor for the DataEntry logic processing and awaits a Future<Result>. This model cut overall execution time to completion by roughly 80%, as everything now executed in parallel.
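
The fan-out idea can be illustrated without Akka at all. The sketch below is not the Actor-based implementation; it swaps in java.util.concurrent (CompletableFuture and a thread pool) as a stand-in, and the names syncInParallel and pause are hypothetical. It dispatches one DataEntry to every DataStore at once and completes a single future when all of them have answered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelSyncSketch {

    interface DataStore {
        boolean sync(String dataEntryId);
    }

    // Starts every DataStore call immediately; the returned future yields the success count.
    // If any call throws, the returned future completes exceptionally instead.
    static CompletableFuture<Integer> syncInParallel(
            String dataEntryId, List<DataStore> stores, ExecutorService pool) {
        List<CompletableFuture<Boolean>> pending = new ArrayList<>();
        for (DataStore store : stores) {
            pending.add(CompletableFuture.supplyAsync(() -> store.sync(dataEntryId), pool));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    int ok = 0;
                    for (CompletableFuture<Boolean> f : pending) {
                        if (f.join()) ok++; // already complete, so join() does not block
                    }
                    return ok;
                });
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            DataStore slow = id -> { pause(200); return true; }; // stand-in for a slow target
            long start = System.nanoTime();
            int ok = syncInParallel("XYZ", Arrays.asList(slow, slow, slow, slow), pool).join();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(ok + " stores synced in about " + elapsedMs + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

With four workers and four equally slow stores, the elapsed time should be close to one call rather than the sum of four, which is the same effect the Actor-based engine gets from processing DataStores concurrently.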

Another benefit was additional resiliency and distribution of load due to the location transparency of the actual DataStore itself. Utilizing Akka’s various Routers, such as in this case the ClusterRouterGroup Actor, we were able to further redistribute the processing of any given DataStore workload across the cluster and react appropriately as nodes came online and went offline. See exploded view below.

Figure: Version 2. Exploded view of DataStore location transparency

Lastly, the diagram below shows how execution of these DataEntry tasks is now more evenly distributed across the entire set of available nodes in the cluster. All nodes can now potentially be involved in processing any DataEntry workload. Also, by feeding dynamic configuration into the construction of each ClusterRouterGroup Actor, the system could fine-tune the distribution and number of Actors in the cluster that are available to process entries targeted at any given DataStore. This permits custom down-scaling based on the limitations or load ceilings that any given downstream target DataStore may present. In other words, it permits throttling of loads.


Figure: Version 2. Better utilization of core resources across cluster
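
The throttling effect of capping the number of workers per DataStore can be shown with a small plain-Java analogy. This is not how ClusterRouterGroup is configured; it is a hypothetical sketch that uses a fixed-size thread pool in place of a bounded set of Actors, with maxWorkers playing the role of the configured limit for one DataStore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledDataStoreSketch {

    // Runs `tasks` simulated sync calls against one DataStore with at most
    // `maxWorkers` in flight, and returns the peak concurrency actually observed.
    static int runThrottled(int tasks, int maxWorkers) {
        ExecutorService workers = Executors.newFixedThreadPool(maxWorkers);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                pending.add(workers.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // stand-in for the downstream DataStore call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for every simulated call to finish
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return peak.get();
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        // A fragile downstream target gets 2 workers, however much work is queued.
        System.out.println("peak concurrency: " + runThrottled(10, 2));
    }
}
```

However many entries are queued, the downstream target never sees more simultaneous calls than the configured worker count, which is the load ceiling idea described above.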

Overall my experience with Akka was positive. After working some of the bugs out, this solution has so far been quite stable in production, and so has Akka’s clustering protocol. If you are considering moving to a more reactive design approach for the back end of a system, I highly recommend giving Akka some consideration.

Lastly, as always, I highly recommend going with a pure interface-oriented design in any system you build. In this use-case, the system’s entire platform was designed from the ground up using interfaces extensively, with different “providers” (i.e. things like Camel or Akka) plugged in for each aspect of implementation, and that has proved to be very important as the system has evolved. It gives the system tremendous flexibility and additional longevity as it matures.

Review: 97 Things Every Software Architect Should Know

This is a book review for “97 Things Every Software Architect Should Know” from O’Reilly, with dozens of contributors.

This book is a quick read at roughly 200 pages and is targeted towards those folks who find themselves in the role of the “software architect”. Many of the contributors will be names you recognize, such as Michael Nygard, Neal Ford and many others, some of whom get more “booktime” than others with multiple blurbs each.

Each small contribution is no more than a page and a half, as each author shares some personal experience, lesson learned or rule of thumb when it comes to being an architect. This may not be a book you sit down and read all at once; you might instead read a few articles a day and move through it that way. I also think I might open it up again right before I start a brand new large project, as it would be a good refresher on many concepts and methodologies which can help you get your mind in the right place before starting something new.

This book is not super technical in nature and is recommended for advanced developers or anyone who considers themselves to be an architect.

Recommended? Yes!

Review: Release it!

This is a review of the book “Release it!” by Michael T. Nygard.

Go get this book. That is the short of this review. This is an excellent book, mainly targeted towards architects; however, I think developers should read it too. The author does a great job throughout the book showing how small issues can grow into gigantic problems and cost your project/business some big $$$, when in reality, had the developer or architect paid a bit more attention to certain pieces of the application, it all could have been avoided. The author shares many of his personal experiences related to this subject.

The two best parts of the book were the coverage of stability issues (patterns and anti-patterns) and the capacity planning section. The final two main sections of the book cover general design issues and operations (monitoring, logging, metrics, etc.).

Recommended: Yes
Skill levels: Any