Category: Design Patterns

Reactive re-engineering with Akka

Every once in a while during the life cycle of any given piece of software comes a time when you have the opportunity to improve it in a major way... if, that is, it’s lucky enough to still be in production.

One particular system I’ve been involved with is responsible for processing a lot of data and keeping that data in sync across many systems. For purposes of this little case study I’ve dumbed down the overall use-case, concept, architecture and implementation details to this simple idea. We need to synchronize data.

Use-Case

Something in the environment (e.g. a user or another process) requests an operation that generates a change against a “DataEntry”. This DataEntry is manipulated in the primary database, and the change then needs to be synchronized to numerous other systems (too many to count). The changes could be described as “Create DataEntry item number XYZ”, “Mutate DataEntry XYZ in fashion Z” or simply “Delete DataEntry item XYZ”.

Each target system where a DataEntry is to be synchronized is called a DataStore. Each one involves its own complicated process of mutating our representation of a DataEntry into the target DataStore’s representation, and the means of doing so can vary wildly: web-service calls, RDBMS DML, NoSQL operations and so on. Not to mention that, as with any integration, each of these DataStore sync calls has the possibility of being fast, very slow, not working at all, or experiencing random transient failures.

Version 1

For most of its life the system functioned as follows: each DataEntry mutated in the system was placed in a queue and then processed by a consumer node’s DataSyncProvider, whose responsibility is to determine all the DataStores the DataEntry must be processed in, by interrogating a DataStoreLocator, and then make sure it happens. It worked similarly to the diagrams below (highly simplified!); note the bottleneck.

Version 1, synchronization flow, within one node

Version 1. Overall cluster view

Version 1 issues

Version 1 functioned fine for most of its life; however, the biggest issue with it was simply its lack of efficiency and speed in synchronizing any given DataEntry across all of the DataStores it applied to. More often than not, a single DataEntry mutation needed to be synchronized against dozens of target DataStores. Because each DataStore was processed sequentially, with retries, and the consumer waited for the overall result before moving on to the next one, there was a sizable delay until the mutation materialized in all target DataStores (not to mention poor core utilization across the cluster). What did this mean? Well, an opportunity for improvement.
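To make the bottleneck concrete, here is a minimal sketch of what sequential processing with retries looks like. It is not the production code: `DataStore`, `syncAll` and `flakyStore` are hypothetical names invented for illustration, and the fake stores simply fail a fixed number of times before succeeding.

```java
import java.util.ArrayList;
import java.util.List;

final class SequentialSyncSketch {

    /** Hypothetical stand-in for one target system. */
    interface DataStore {
        String name();
        void sync(String dataEntryId) throws Exception;
    }

    /** Syncs one DataEntry to each store in turn, retrying failures, and returns one outcome per store. */
    static List<String> syncAll(String dataEntryId, List<DataStore> stores, int maxAttempts) {
        List<String> outcomes = new ArrayList<>();
        for (DataStore store : stores) {              // one store at a time: this is the bottleneck
            String outcome = store.name() + ":FAILED";
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    store.sync(dataEntryId);
                    outcome = store.name() + ":OK after " + attempt + " attempt(s)";
                    break;
                } catch (Exception transientFailure) {
                    // swallow and retry; a real system would log and back off here
                }
            }
            outcomes.add(outcome);                    // the next store waits until this one is settled
        }
        return outcomes;
    }

    /** Builds a fake store that fails its first `failures` calls and succeeds afterwards. */
    static DataStore flakyStore(String name, int failures) {
        return new DataStore() {
            private int calls = 0;
            public String name() { return name; }
            public void sync(String dataEntryId) throws Exception {
                if (++calls <= failures) {
                    throw new Exception("transient failure in " + name);
                }
            }
        };
    }

    public static void main(String[] args) {
        List<DataStore> stores = List.of(
                flakyStore("crm", 0), flakyStore("search", 2), flakyStore("billing", 5));
        syncAll("XYZ", stores, 3).forEach(System.out::println);
    }
}
```

Every retry against one slow or flaky store pushes back all the stores queued behind it, so the total delay grows with the sum of the individual sync times.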

Version 2

Obviously, the choice here was to move to asynchronous, parallel DataStore execution, decoupled from the main DataEntry mutation consumer thread(s)... and there are many ways you could go about doing that. Fortunately, the overall modeling of the synchronization engine allowed considerable flexibility in swapping out the implementation with a little refactoring. The key points were introducing the concept of a DataEntry logic execution engine, aptly named LogicExecutionEngine, and adding a new implementation of our DataStoreLocator that could decouple any given DataStore’s location from any dependency on its actual residency within the local JVM.

Great. Now that the modeling is adjusted, what about implementation? For one, there was no interest in writing a multi-threaded execution engine from scratch, even though one could with the modeling in place; any implementation could have been developed and plugged in. That said, after looking around for a good framework that provided location transparency, parallel execution management, clustering and good resiliency, it was decided that Akka, and a move to an Actor model for the new engine, would be a good fit.
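The modeling can be pictured roughly as follows. This is only a sketch under assumptions: the type names come from the text, but every method signature here is a guess, and the "local" implementations exist purely to show where a different engine or locator would plug in.

```java
import java.util.List;
import java.util.stream.Collectors;

final class SyncModelSketch {

    /** One target system; the signature is hypothetical. */
    interface DataStore {
        String apply(String dataEntryId);
    }

    /** Resolves which DataStores a DataEntry must reach, without saying where they live. */
    interface DataStoreLocator {
        List<DataStore> locate(String dataEntryId);
    }

    /** Runs the sync logic for one DataEntry against the located stores. */
    interface LogicExecutionEngine {
        List<String> execute(String dataEntryId, List<DataStore> stores);
    }

    /** Depends only on the two interfaces, so either side can be swapped independently. */
    static final class DataSyncProvider {
        private final DataStoreLocator locator;
        private final LogicExecutionEngine engine;

        DataSyncProvider(DataStoreLocator locator, LogicExecutionEngine engine) {
            this.locator = locator;
            this.engine = engine;
        }

        List<String> process(String dataEntryId) {
            return engine.execute(dataEntryId, locator.locate(dataEntryId));
        }
    }

    /** Wires the provider with the simplest possible pieces: in-JVM stores, in-line execution. */
    static DataSyncProvider localProvider(List<String> storeNames) {
        DataStoreLocator locator = id -> storeNames.stream()
                .map(name -> (DataStore) entry -> name + " <- " + entry)
                .collect(Collectors.toList());
        LogicExecutionEngine engine = (id, stores) -> stores.stream()
                .map(store -> store.apply(id))
                .collect(Collectors.toList());
        return new DataSyncProvider(locator, engine);
    }

    public static void main(String[] args) {
        System.out.println(localProvider(List.of("crm", "search")).process("XYZ"));
    }
}
```

Because `DataSyncProvider` never names a concrete engine or locator, replacing the in-line engine with a framework-backed one is a wiring change rather than a rewrite.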

Version 2. Actor-based DataStores and LogicExecutionEngine

As shown above, each DataStore is now implemented as an ActorRef-backed version, which is passed to the LogicExecutionEngine, whose new Actor-based implementation injects them into yet another Actor that performs the DataEntry logic processing while the caller awaits a Future<Result>. This model reduced overall time to completion by roughly 80%, as everything now executed in parallel.
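The fan-out idea can be sketched without Akka at all. The example below swaps in the JDK’s `CompletableFuture` as a stand-in for actors and `ActorRef`s, so it shows the shape of the parallel execution rather than the actual actor-based implementation; `AsyncDataStore`, `execute` and `slowStore` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class ParallelSyncSketch {

    /** Hypothetical asynchronous DataStore handle; in the real system an ActorRef plays this role. */
    interface AsyncDataStore {
        CompletableFuture<String> sync(String dataEntryId);
    }

    /** Sends one DataEntry to every store at once and completes when all stores have answered. */
    static CompletableFuture<List<String>> execute(String dataEntryId, List<AsyncDataStore> stores) {
        List<CompletableFuture<String>> pending = stores.stream()
                .map(store -> store.sync(dataEntryId)
                        // a failing store becomes a result instead of failing the whole batch
                        .exceptionally(error -> "FAILED: " + error.getMessage()))
                .collect(Collectors.toList());
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(done -> pending.stream()
                        .map(CompletableFuture::join)   // already complete, so this does not block
                        .collect(Collectors.toList()));
    }

    /** Builds a fake store that answers on the given pool after a simulated delay. */
    static AsyncDataStore slowStore(String name, long delayMillis, ExecutorService pool) {
        return id -> CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(name + " interrupted");
            }
            return name + " synced " + id;
        }, pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<AsyncDataStore> stores = List.of(
                    slowStore("crm", 200, pool), slowStore("search", 150, pool), slowStore("billing", 100, pool));
            // Wall-clock time is roughly that of the slowest store, not the sum of all three.
            execute("XYZ", stores).join().forEach(System.out::println);
        } finally {
            pool.shutdown();
        }
    }
}
```

What this sketch leaves out is exactly what Akka brought to the table: location transparency, supervision and cluster-aware routing.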

Another benefit was additional resiliency and distribution of load, thanks to the location transparency of the actual DataStore itself. Using Akka’s various Routers, in this case the ClusterRouterGroup, we were able to further redistribute the processing of any given DataStore workload across the cluster and react appropriately as nodes came online and went offline. See the exploded view below.

Version 2. Exploded view of DataStore location transparency

Lastly, the diagram below shows how execution of these DataEntry tasks is now more evenly distributed across the entire set of available nodes in the cluster. Any node can now potentially be involved in processing any DataEntry workload. Also, by feeding dynamic configuration into the construction of each ClusterRouterGroup, the system can fine-tune the distribution and number of Actors in the cluster that are available to process entries targeted at any given DataStore. This permits custom down-scaling based on the limitations or load ceilings that any given downstream DataStore may present. In other words, it permits throttling of loads.
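The throttling idea boils down to capping how many workers may hit a given DataStore at once. The sketch below shows that with plain JDK thread pools rather than Akka routers, so treat it as an analogy: the pool size plays the part of the configured number of Actors, and `ThrottledDispatchSketch`, `observePeakConcurrency` and the `legacy-crm` store name are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledDispatchSketch {

    // One bounded worker pool per DataStore name; the pool size is the throttle.
    private final Map<String, ExecutorService> workersByStore = new ConcurrentHashMap<>();

    ThrottledDispatchSketch(Map<String, Integer> workerLimits) {
        workerLimits.forEach((store, limit) ->
                workersByStore.put(store, Executors.newFixedThreadPool(limit)));
    }

    /** Queues a task for the named store; at most `limit` tasks for that store run at once. */
    <T> Future<T> submit(String store, Callable<T> task) {
        ExecutorService workers = workersByStore.get(store);
        if (workers == null) {
            throw new IllegalArgumentException("No workers configured for " + store);
        }
        return workers.submit(task);
    }

    void shutdown() {
        workersByStore.values().forEach(ExecutorService::shutdown);
    }

    /** Runs `tasks` short jobs against one store and reports the highest concurrency observed. */
    static int observePeakConcurrency(int limit, int tasks) {
        ThrottledDispatchSketch dispatch = new ThrottledDispatchSketch(Map.of("legacy-crm", limit));
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                pending.add(dispatch.submit("legacy-crm", () -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(20);                 // simulated slow downstream call
                    running.decrementAndGet();
                    return now;
                }));
            }
            for (Future<Integer> result : pending) {
                result.get();                         // wait for every job to finish
            }
            return peak.get();
        } catch (Exception e) {
            throw new IllegalStateException("Demo run failed", e);
        } finally {
            dispatch.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + observePeakConcurrency(2, 8));
    }
}
```

Raising or lowering the per-store limit in configuration is then all it takes to protect a fragile downstream system or to let a robust one take more load.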

 

Version 2. Better utilization of core resources across cluster

Overall, my experience with Akka was positive. After working some of the bugs out, this solution has so far been quite stable in production, as has Akka’s clustering protocol. If you are considering moving to a more reactive design approach for the back end of a system, I highly recommend giving Akka some consideration.

Lastly, as always, I highly recommend going with a pure interface-oriented design in any system you build. In this use-case, the entire platform was designed from the ground up using interfaces extensively, with different “providers” (e.g. things like Camel or Akka) plugged in for each aspect of implementation, and that has proved to be very important as the system has evolved. It gives the system tremendous flexibility as it matures, and additional longevity.

Book review: Building Microservices

Recently I read Sam Newman’s “Building Microservices”; at ~280 pages it’s a fairly quick read. The reviews on this book overall are mixed, and I can see where readers are coming from. By the title of this book one might expect some coverage of some of the microservices frameworks out there, concrete examples, maybe some actual code… but you won’t really find that here. Instead you will find a pretty good overview of various architectural approaches to modern application design in today’s world, covering general topics such as proper separation of concerns, unit-testing, continuous integration, automation, infrastructure management, service discovery, fault tolerance, high-availability, security and so on.

In reality, none of the principles covered in this book are the exclusive domain of “microservice” application architectures; rather, they can (and should) be applied to any application you are considering deploying, whether it’s a “monolithic” application or a suite of microservices interacting as parts of a larger functioning application.

In that light I think this book is definitely a good read and worth a look, if for nothing more than to ensure your team gets a refresher on good design principles and how they can be realized with some of the newer frameworks and tool sets that have come out of our community in recent years. The material presented is sound.

Review: Real World Java EE Patterns

I picked this book up a while back while looking for a book on Java EE patterns. The mainstream standard seemed to be the Core J2EE Patterns book, but the more I looked at it, the more outdated it seemed. So when I found Adam’s book, the subtitle alone, “Rethinking Best Practices”, looked practical. In short, I’ve finally read it and was quite impressed. This is a great book.

The author basically goes through the standard patterns and tackles each one by explaining its overall objective, the forces at play and a bit about how it works, THEN… the big part: how to rethink it in the context of Java EE with EJB 3/3.1. For example, the author does a good job explaining how DAOs are now largely replaced by JPA’s EntityManager, and why everyone should really reconsider whether they still need this abstraction layer in greenfield projects. He does admit, though, that DAOs can still serve a purpose as a place to consolidate boilerplate and common EntityManager-related code. Regardless, pattern by pattern, the author does a great job giving real-world examples of how these standard patterns can be modified or adapted to the EJB 3 realm. (He also covers which J2EE patterns can now be retired.)

One of my favorite parts of the book, which gave me a great glimpse of how to tackle an immediate problem I am faced with, was the “Dependency Injection Extender” pattern. This will come in handy for me, as we have a Spring codebase which we will want to use in a JBoss environment. I’d like to use @Inject and JSR 330, but our container (JBoss 5) does not yet support that. By using this pattern with interceptors, I think we will be able to annotate our beans, test them outside of JBoss, and still wire everything up properly using the idea presented in the book.

Downsides to the book? Some of the headings and subsection intros are not bolded in the text when it appears they should have been.

Overall, 5 out of 5 for this book. I recommend it! You can tell this was written by someone with a ton of experience under their belt.

Review: 97 Things Every Software Architect Should Know

This is a book review for “97 Things Every Software Architect Should Know”, published by O’Reilly, with dozens of contributors.

This book is a quick read at roughly 200 pages and is targeted at those folks who find themselves in the role of the “software architect”. Many of the contributors will be names you recognize, such as Mike Nygard, Neal Ford and many others, some of whom get more “booktime” than others with multiple blurbs each.

Each small contribution is no more than a page and a half, with each author sharing some personal experience, lesson learned or rule of thumb about being an architect. This may not be a book you sit down and read all at once; you might instead read a few articles a day and move through it that way. I also think I might open it up again right before I start a brand new large project, as it would be a good refresher on many concepts and methodologies that can help you get your mind in the right place before starting something new.

This book is not super technical in nature and is recommended for advanced developers or anyone who considers themselves to be an architect.

Recommended? Yes!

Review: Interface Oriented Design

This is a review of “Interface Oriented Design” by Ken Pugh.

I don’t have a ton to say about this book; it was a fairly quick read at about 200 pages. If you have a lot of experience with OO, the concepts presented within it are nothing that startling; however, they may serve the reader as a nice refresher or presentation of important principles in an “interface”-oriented and focused approach. If anything, one handy thing the author presented was the set of “laws” of any interface:

a) “Do what its methods say it does”: this is obviously important; how many times have you seen methods in a program that do what their name implies, but… also add on a bit of undocumented functionality?

b) “Do no harm”: the implementation should avoid affecting other modules or consuming resources beyond what it explicitly declares it will.

c) “Provide meaningful errors”: I think this is often missing. Too many times I’ve called some method only to get a horrible low-level exception thrown back at me, or a custom one with some unintelligible error code. Rarely will you get back something meaningful like the author suggests, which contains not only the problem, but why it occurred and how you can possibly avoid it on the next call.
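As a small illustration of the third law, here is one way an exception could carry the problem, the reason and a remedy. This is my own sketch, not an example from the book; `BatchTooLargeException`, `upload` and the limit of 100 are invented for the purpose.

```java
final class MeaningfulErrorSketch {

    /** Carries what went wrong, why it happened, and what the caller can do about it. */
    static final class BatchTooLargeException extends Exception {
        private final String remedy;

        BatchTooLargeException(String problem, String reason, String remedy) {
            super(problem + " Reason: " + reason + " Remedy: " + remedy);
            this.remedy = remedy;
        }

        /** The suggested fix on its own, for callers that want to act on it programmatically. */
        String remedy() {
            return remedy;
        }
    }

    static final int MAX_BATCH_SIZE = 100;   // hypothetical limit, for illustration only

    /** Pretends to upload a batch and returns the number of items accepted. */
    static int upload(int batchSize) throws BatchTooLargeException {
        if (batchSize > MAX_BATCH_SIZE) {
            throw new BatchTooLargeException(
                    "Upload of " + batchSize + " items was rejected.",
                    "The batch exceeds the limit of " + MAX_BATCH_SIZE + " items per call.",
                    "Split the batch into chunks of at most " + MAX_BATCH_SIZE + " items.");
        }
        return batchSize;
    }

    public static void main(String[] args) {
        try {
            upload(250);
        } catch (BatchTooLargeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Compare that with a bare error code or a low-level exception bubbling up from three layers down: here the caller learns what to change on the next call.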

Overall I found chapters 1-7 to be the most informative, while chapters 8-10 apply the principles covered in 1-7 to working examples. Chapter 11 covers the standard GoF patterns, but from an “interface”-based perspective.

Overall, I would say that if you are pretty well versed in OO you could get away with skipping this book. Otherwise, if you are intermediate or new to OO and to the proper use of interfaces as opposed to constantly using inheritance, this might be a good book for you.

Lastly I will say that after reading this book, when faced with new design decisions, I found myself focusing much, much more on pure interface based designs and minimizing inheritance except where absolutely necessary.