Category: Java

Docker container peer IP discovery with Swarm services


When running applications deployed as a docker “service” within Docker Swarm (post 1.12 swarm mode), often those application’s have the need to discover their peer containers for purposes such as forming a cluster. Similar to “service discovery”, I’ll just use the term “peer discovery” for this use-case.

The simple example is as follows: You create a swarm service bound to a docker network “N” and launch/scale X containers over it. How do those container instances find one another on the docker network?

As it stands today, simple dynamic discovery of peer container instances within a named swarm “service” residing on a particular docker network is not trivial. The Docker engine’s service API, can provide the information you need to figure this out, however this still puts a lot of work on you, the container developer who needs to collect this info and then act upon it in order to form a higher level cluster between your containers.

I had this specific challenge for a Java based service that needed to form a peer cluster dynamically. Out of that use case I came up with a generic library that you can drop into your Java container application called docker-discovery-swarm-service which is available at:

The purpose of this library is for “self-discovery” from within your JVM based Docker service application where you need to discover what your accessible docker service network bound IP is, as well as that of your peers within the same named docker service. As noted above this is critical if your container has to do further peer discovery for other services it provides or clustering groups it must form.

You can read all the details of how it works and how to use it here:

Hopefully it will be of use to you as well.


Microservices with Spring Cloud & Docker

In the recent past, a team I was working with was facing an architectural decision regarding what technology and deployment footprint to go with for a greenfield project.

Its been about five months now since this application has been in production.

Use case:

The use-case in question was to present a suite of REST services to front a large set of “master data” dimensions for a data warehouse as well securing that data (record level ACLs). In addition to this, the security ACLs it would manage needed to be transformed and pushed downstream to various legacy systems.

With that general use-case in mind, some other items of note that were to be considered:

  • The specific use case of “REST services facade for master data” was generic in nature, so the security model was to be agnostic of the specific data set being secured and have the capability of being applied across different data sets for different clients.
  •  Changes for a given service should be easy to fix and deploy and independent of one another with minimal interruption.
  • The services need to scale easily and be deployed across several different data centers which are a mix of traditional bare-metal/ESX vm’s as well as in the cloud (azure/aws). Tight coupling to each DC should be minimized when possible.
  • The services stack would potentially serve as the hub for orchestrating various other ETL related processes for the warehouse, so adding new “services” should be easy to integrate into the larger application.
  • Given the sensitivity of the data, all traffic should be secured w/ TLS and REST apis locked down w/ OAuth2 client credentials based access.

Given the above requirements and much discussion we decided to go with a container based microservices architecture.


First off, this team already had significant experience w/ the traditional monolithic approach to applications and had already run into the many shortcomings of this architecture over the long term. As new features needed to be deployed, it was becoming more of a pain to add new “services” to the monolith as it required the entire stack to be redeployed which is disruptive. Given this new application would have a similar lifecycle (new services needing to be added over time) we wanted to try a different approach…. and who was the new kid on the block? “microservices”; and it was time to get one’s feet wet.

This shop was primarily focused on NodeJS, LAMP and Java stacks so after doing some research the decision was made to go with Spring Cloud as the base framework to build this new suite of services. If one does any reading on the topic of microservices, you will quickly see such architectures involve many moving parts: service discovery, configuration, calling tracing (i.e. think google dapper), load balancing etc.

Do you want to write these pattern implementations this all yourself? Probably not; I sure didn’t. So after evaluating the space at the time, Spring Cloud was the most robust solution for this and one of its biggest selling points is that it was based on many of the great frameworks that have come out of Netflix’s OSS project (Eureka, Hystrix and more..)

Lastly the decision to go w/ Docker was really a no brainer. The services would potentially need to be deployed and moved across various data centers. By using Docker DevOps would be able to have a footprint and deployment process that would be consistent regardless of what data center we would be pushing to. The only data center specific particulars our DevOps guys had to care about was, setting up the Docker infrastructure (i.e. think Docker hosts on VMs via Ansible coupling to DC specific host provisioning APIs) and the DC specific load balancers, who’s coupling to the application was just a few IP’s and ports (i.e. the IPs of the swarm nodes with exposed ports of our Zuul containers). Everything downstream from that was handled by Docker Swarm and the microservices framework itself (discovery, routing etc)


The acronym for this services backend ended up being CELL which stands for… well whatever you want it to stand for…. I guess think of it (the app) as an organism made up of various cells (services). CELL’s services are consumed by various applications that present nice user interfaces to end users.

Screen Shot 2017-05-16 at 1.34.04 PM

The above diagram gives a high level breakdown of its footprint. Its broken up into several services:

Core services that all other app services utilize:

  • cell-discovery: Netflix Eureka: Participating services both register on startup and use this to discover the cell-config service (to bootstrap themselves) plus discover any other peer level services they need to talk to.
  • cell-config: spring-cloud-config: Git sourced application configuration (w/ encryption support). Each application connects to this on startup to configure itself.
  • oauth2-provider: All services are configured w/ an OAuth2 client credentials compliant token generation endpoint to authenticate and get tokens that all peer services validate (acting as resource servers)
  • tracing-service: zipkin: All services are instrumented w/ hooks that decorate all outbound http requests (and interpret them upon reception) with zipkin compliant tracing headers to collect call tracing metrics etc. Background threads send this data periodically to the tracing service.
  • cell-event-bus: kafka and spring-cloud-stream: Certain services publish events that other services subscribe to to maintain local caches or react to logic events. This provides a bit looser coupling than direct service to service communication; leveraging Kafka gives us the ability to take advantage of such concepts of consumer groups for different processing requirements. (i.e. all or one)
  • cell-router: Netflix zuul: Router instances provide a single point of access to all application services under a https://router/service-name/ facade (discovered via the discovery service). Upstream data center specific FQDN bound load balancers only need to know about the published ports for the Zuul routers on the swarm cluster to be able to access any application service that is available in CELL.
  • cell-service-1-N: These represent domain specific application services that contain the actual business logic implementation invoked via external callers. Over time, more of these will be added to CELL and this is where the real modularity comes into play. We try to stick to the principle of one specific service per specific business logic use-case.

CELL Security

As noted above, one of the requirements for CELL was that participating services could have data they manage, gated by a generic security ACL system. To fulfill this requirement, one of those domain specific apps is the cell-security service.

Screen Shot 2017-05-16 at 10.00.34 PM

The cell-security service leverages a common library that both cell-security servers and clients can leverage to fulfill both ends of the contract. The contract being defined via some general modeling (below) and standard server/client REST contracts that can easily be exposed in any new “service” via including the library and adding some spring @[secConfig] annotations in an app’s configuration classes.

  • Securable: a securable is something that can have access to it gated by a SecurityService. Securables can be part of a chain to implement inheritance or any strategy one needs.
  • Accessor: is something that can potentially access a securable
  • ACL: Binds an Accessor to a Securable with a set of Permissions for a given context and optional expression to evaluate against the Securable
  • SecurableLocator: given a securable‘s guid, can retrieve a Securable or a chain of Securables
  • AccessorLocator: given a accessor‘s guid, can retrieve the Accessor
  • AccessorLocatorRegistry: manages information about available AccessorLocators
  • SecurableLocatorRegistry: manages information about available SecurableLocators
  • ACLService: provides access to manage ACLs
  • PrincipalService: provides access to manage Principals
  • LocatorMetadataService: provides access to manage meta-data about Securable|Accessor Locators
  • ACLExpressionEvaluator: evaluates ACL expressions against a Securable
  • SecurityService:  Checks access to a Securable for a requesting Accessor

The model above is expressed via standard REST contracts and interfaces in code, that are to be fulfilled by a combination of default implementations and those customized by individual application CELL services who wish to leverage the security framework. There are also a few re-usable cell-security persistence libraries we created to let services that leverage this to their persist security data (both authoritative and local consumer caches) across various databases (Mongo DB and or JPA etc). As well a another library to hook into streams of security events that flow through CELL’s Kakfa event bus.

Spring Cloud impressions

When I started using Spring Cloud (in the early days of the Brixton release), I developed a love – hate relationship with it. After a few initial early successes with a few simple prototypes I was extremely impressed with the discovery, configuration and abstract “service name” based way of access peer services (via feign clients bound to the discovery services)…. you could quickly see the advantageous to using these libraries to really build a true platform that could scale to N in several different ways and take care of a lot of the boilerplate “microservices” stuff for you.

That said, once we really got into the developing CELL we ended up having two development paths.

The first being one team working on creating a set of re-usable libraries for CELL applications to leverage and integrate into the CELL microservice ecosystem. This consisted of creating several abstractions that would bring together some of the required spring cloud libraries, pre-integrated via base configuration for CELL, and just make it easier to “drop-in” to a new CELL app without having to wade into the details of spring cloud too much and just let the service developer focus on their service. The amount of time on this part was about 70% of the development effort, heavily front loaded in the start of the project.

The second being the other team using the latter to actually build the business logic services, which was the whole point of this thing in the first place. This accounted for about 30% of the work in the beginning and today… about 80-90% of the work now that the base framework of CELL is established.

The hate part (well not true hate, but you know what I mean… friendly frustration) of this ended up being the amount of man hours spent in the start of the project dealing/learning spring-cloud. There is a tangible learning curve to be aware of. Working around bugs, finding issues in spring-cloud, both real ones or just working through perceived ones via misunderstandings due to the complexity of spring-cloud itself.

I’m not going to go into each specific issue here, however there were simply a lot of issues and time spent debugging spring cloud code trying to figure out why certain things failed or to learn how they behaved so we could customize and properly configure things. In the end most of the issues could be worked around or were not that hard to fix…. its just the time it took to figure out the underlying causation’s, produce a reproducible sample and then convey it to the spring-cloud developers to get help with. (The spring-cloud developers BTW are excellent and VERY responsive) kudos to them for that.

Lastly, taking each CELL artifact (jar) and getting it wrapped up in a Docker container was not an huge ordeal. In the deployed footprint, each CELL artifact is a separate Docker Swarm Service that is deployed on its own overlay network (separate one per CELL version). As stated previously, the CELL router (Zuul) is the only service necessary to be exposed on a published swarm port and then upstream datacenter load balancers can just point to that.

So would I recommend Spring-Cloud?

Yes. Spring Cloud at its heart is really an pretty impressive wrapper framework around a lot of other tools that are out there for microservices. It has a responsive and helpful community. (definitely leverage if you need help!) The project has matured considerably since I first used it and many of the issues I was dealing with are now fixed. Compared to writing all the necessary things to have a robust microservices ecosystem yourself….. I’ll take this framework any day.

Final note. I would NOT recommend using spring-data-rest. We used that on a few of the CELL application logic services and its main benefit of providing you a lot of CRUD REST services in a HATE-OS fashion…. its just not that easy to customize the behavior of, has a lot of bugs and just generally was a pain to work with. At the end of the day it would have just been easier to code our own suite of CRUD services instead of relying on it.



Docker container IP and port discovery with Consul and Registrator

Do you use Docker?

Does your containerized app have the need to discover both its own IP and one or more mapped ports?

How can another container access my exposed ports and how can I do the same of my peers?

As it stands today, simple self discovery of your container’s accessible IP and one or more of its mapped ports is not exposed to your Docker container process as a native feature of the engine itself.

If you’ve attempted to containerize an app that attempts to discover its peers in order to form its own peer-level cluster etc, you’ve likely run into this challenge.

That said there are several tools out there with can help you with this issue. One of which is Registrator which is a special container that listens for events from a Docker host and acts as service discovery bridge that relays this info into other tooling such as Consul and etcd etc.  In short, when your container is launched, the Registrator container collects all the info about the docker host it is running on and its exposed ports and registers this under a named service in one of the aforementioned backends.

This is all fine and great, however this still puts a lot of work on you, the container developer who needs to collect this info and then act upon it in order to form a higher level cluster between your containers.

I had this exact same problem for a Java based service that needed to form a Hazelcast cluster dynamically. Out of that use case I came up with a generic library that you can drop into your Java container application called docker-discovery-registrator-consul which is available at:

The purpose of this library is for “self-discovery” from within your JVM based Docker application where you need to discover what your accessible docker-host bound IP and mapped port(s) are, as well as your peers within the same service. As noted above this is critical if your container has to do further peer discovery for other services it provides or clustering groups it must form.

You can read all the details of how it works and how to use it here:

Hopefully it will be of use to you as well.

Reactive re-engineering with Akka

Everyone once in a while during the life cycle of any given piece of software comes that time where you have the opportunity to improve it in a major way….if that is, its lucky enough to still be in production.

One particular system I’ve been involved with is responsible for processing a lot of data and keeping that data in sync across many systems. For purposes of this little case study I’ve dumbed down the overall use-case, concept, architecture and implementation details to this simple idea. We need to synchronize data.


Something in the environment (i.e. a user or other process) makes a request for some operation to be done that generates a change operation against a “DataEntry”. This DataEntry is manipulated in the primary database and then the change needs to be synchronized numerous other systems to count. The changes could be described as “create DataEntry item number XYZ”, “Mutate DataEntry XYZ in fashion Z” or simply “Delete DataEntry item XYZ”.

Each target system where a DataEntry is to be synchronized is called a DataStore and involves its own complicated process of mutating our representation of a DataEntry into the target DataStore’s representation and the means to do it can vary wildly; i.e. web-service calls, RDBMS dml, nosql operations etc etc. Not to mention, as with any integration, each of these DataStore sync calls has the possibility being fast, very slow, not working at all, or experiencing random transient failures.

Version 1

For most of its life the system functioned as follows, each DataEntry mutated in the system was placed in a queue, and then processed by a consumer node’s DataSyncProvider who’s responsibility is to determine all the DataStores to process the DataEntry in via interrogating a DataStoreLocator and then make sure it happens.  It worked similar to the diagrams below (highly simplified!), and note the bottleneck.

Screen Shot 2016-03-19 at 5.37.13 PM

Version 1, synchronization flow, within one node

Screen Shot 2016-03-19 at 5.38.55 PM

Version 1. Overall cluster view

Version 1 issues

Version 1 functioned fine for most of its life, however the biggest issues with is were simply its lack of efficiency and speed in synchronizing any given DataEntry across all of the DataStores it was applicable for. More often than not any given DataEntry mutation would result in dozens of target DataStores that it needed to be synchronized against. Due to the sequential processing of each DataStore, accommodating for retries, and waiting for the overall result….before moving on to the next one, this would result in a sizable delay until the mutation materialized in all target DataStores. (not to mention lack of good core utilization across the cluster). What did this mean? Well an opportunity for improvement.

Version 2

Obviously, the choice here was to move to asynchronous parallel DataStore execution and decoupling from the main DataEntry mutation consumer thread(s)….. and there are many ways you could go about doing that. Fortunately the overall modeling of the synchronization engine enabled considerably flexibility in swapping out the implementation with a little refactoring. The key points being introducing the concept of a DataEntry logic execution engine; aptly named LogicExecutionEngine and adding a new implementation of our DataStoreLocator that could decouple any given DataStore’s location from any dependency on its actual residency within the local JVM.

Great. Now that the modeling is adjusted, what about implementation? For one, there was no interest it writing a multi-threaded execution engine, even though one could with the modeling in place; any implementation could have been be developed and plugged in. That said, after looking around for a good framework that provided location transparency, parallel execution management, clustering and good resiliency, it was decided that Akka, and moving to an Actor model for the new engine would be a good fit.

Screen Shot 2016-03-20 at 11.44.57 AM.png

Version 2. Actor based DataStore’s and LogicExecutionEngine

As shown above, the DataStores actually are now implemented via an ActorRef version which is then passed to the LogicExectionEngine who’s new Actor based implementation injects them into yet another Actor for the DataEntry logic processing awaiting a Future<Result>. This model increased overall execution time to completion by roughly 80% as everything now executed in parallel.

Another benefit was additional resiliency and distribution of load due to the location transparency of the actual DataStore itself. Utilizing Akka’s various Routers, such as in this case the ClusterRouterGroup Actor, we were able to further redistribute the processing of any given DataStore workload across the cluster and appropriately react as nodes came on and offline. See exploded view below.

Screen Shot 2016-03-20 at 11.22.38 AM

Version 2. Exploded view of DataStore location transparency

Lastly, the diagram below shows how execution of these DataEntry tasks is now more evenly distributed across the entire set of available nodes in the cluster. All nodes can now be potentially involved in processing any DataEntry workload. Also by feeding dynamic configuration into the construction of each ClusterRouterGroup Actor the system could also fine tune the distribution and amount of Actors in the cluster that are available to process entries targeted at any given DataStore. This permits for custom down-scaling based on the limitations or load ceilings that any given downstream target DataStore may present. In other words it permits throttling of loads.


Screen Shot 2016-03-19 at 6.02.24 PM.png

Version 2. Better utilization of core resources across cluster

Overall my experience with Akka was positive. After working some of the bugs out, so far in production this solution has been quite stable and Akka’s clustering protocol quite stable. If you are considering moving to a more reactive design approach for the back end of a system, I highly recommend giving Akka a consideration.

Lastly, as always I highly recommend going with a pure interface oriented design in any system you build. In this use-case, this system’s entire platform itself, having been designed from the ground up using interfaces extensively and then plugging in different “providers” (i.e. things like Camel or Akka) for each aspect of implementation has proved out to be very important as it has evolved over time. This gives the system tremendous flexibility as it matures over time and additional longevity.

Hazelcast discovery with Etcd

I’ve used Hazelcast for years and have generally relied upon the availability of multicast for Hazelcast cluster discovery and formation (within a single data-center). Recently was faced with two things, expand the footprint into a non-multicast enabled data-center and secondly pre-prep the service for containerization where nodes will come and go as scaling policies dictate it…. hardwired Hazelcast clustering via an XML configuration and/or reliance on multicast is a no-go.

With Hazelcast 3.6, they now support a pluggable implementation for a cluster discovery mechanism called the Discovery SPI. (Discovery Strategy) Perfect timing, given we are already playing with Etcd as part of our Docker container strategy, this was an opportunity to let our application’s native clustering mechanism (coded on top of Hazelcast) to leverage Etcd as well as discover/remove peers both within, and potentially across data-centers.

So I coded up hazelcast-etcd-discovery-spi available on GitHub.


This works with Hazelcast 3.6-EA+ and Etcd to provide (optional) automatic registration of your hazelcast nodes as Etcd services and automatic peer discovery of the Hazelcast cluster.

Note that the automatic registration of each hazelcast instance as a Etcd service is OPTIONAL if you want to manually maintain these key-paths in etcd. I added that in simply because I think it will be convenient for folks, especially when containerizing a Hazelcast enabled app (such as via Docker) where the less “dependencies” and manual things to do (i.e. register your hazelcast nodes manually).. the better. You can totally embedded this functionality with this discovery strategy SPI.

I hope others find this helpful, and please leave your feedback, pull-requests or issues on the project!

NOTE, if you are running your app in Docker you have a separate issue where you need to determine your own externally accessible IP/PORT that the docker host has mapped for you on 5701… well how can you determine that so that you can publish the correct IP/PORT info to Etcd? Check out:

NOTE! Interested in consul? There is a separate project which is built around Consul for your discovery strategy located here:


Hazelcast discovery with Consul

I’ve used Hazelcast for years and have generally relied upon the availability of multicast for Hazelcast cluster discovery and formation (within a single data-center). Recently was faced with two things, expand the footprint into a non-multicast enabled data-center and secondly pre-prep the service for containerization where nodes will come and go as scaling policies dictate it…. hardwired Hazelcast clustering via an XML configuration and/or reliance on multicast is a no-go.

With Hazelcast 3.6, they now support a pluggable implementation for a cluster discovery mechanism called the Discovery SPI. (Discovery Strategy) Perfect timing, given we are already playing with Consul as part of our Docker container strategy, this was an opportunity to let our application’s native clustering mechanism (coded on top of Hazelcast) to leverage Consul as well as discover/remove peers both within, and potentially across data-centers.

So I coded up hazelcast-consul-discovery-spi available on GitHub.


This works with Hazelcast 3.6-EA+ and Consul to provide automatic registration of your hazelcast nodes as Consul services (without having to run a local Consul agent) and automatic peer discovery of the Hazelcast cluster.

Note that the automatic registration of each hazelcast instance as a Consul service is OPTIONAL if you already have Consul agents running that define your Hazelcast service nodes. I added that in simply because I think it will be convenient for folks, especially when containerizing a Hazelcast enabled app (such as via Docker) where the less “dependencies” like a Consul agent available on the host, or in the container (or another container).. the better. You can totally embedded this functionality with this discovery strategy SPI.

I hope others find this helpful, and please leave your feedback, pull-requests or issues on the project!

NOTE, if you are running your app in Docker you have a separate issue where you need to determine your own externally accessible IP/PORT that the docker host has mapped for you on 5701… well how can you determine that so that you can publish the correct IP/PORT info to Consul? Check out:

NOTE! Interested in etcd? There is a separate project which is built around etcd for your discovery strategy located here:


Is the Jasypt project dead?

I’m posting this because I don’t know where else to ask…. is the Jasypt project dead? I ask this because I have no idea where to ask questions or get community support for this Java JCE encryption library. The mailing lists seem dormant, the “forums” do not load and the issue tracker seems pretty idle….

Is anyone using this library anymore, does a support community still exist?

UPDATE: (from the project maintainer)

Jasypt is DEFINITELY alive! Last version was released on February 25th, 2014.

Both the mailing lists and the forums were removed because of very low traffic, in favour of the SourceForge tracking system you are now using.

As for users… figures indicate almost 60,000 downloads in July 2014.

Generating Java classes for the Azure AD Graph API

NOTE: I’ve since abandoned this avenue to generate pojos for the GraphAPI service. The Restlet Generator simply has too many issues in the resulting output (i.e. not handling package names properly, generics issues, not dealing with Edm.[types] etc). However this still may be of use to someone who wants to explore it further

Recently had to write some code to talk to the Azure AD Graph API. This is a REST based API that exchanges data via typical JSON payloads. For those having to a Java client to talk to this, a good starting point is taking a look at this sample API application to get your feet wet. However to those familiar in Java, this code is less that desirable;  however it has no dependencies which is nice.

When coding something against a REST service its often nice to have a set of classes that you can marshal to/from the JSON payloads to you are interacting with.  Behind the scenes it appears that this Azure Graph API is an OData app, which does present “$metadata” about itself… cool! So now we can generate some classes…..$metadata


So what can we use to generate some Java classes against this? Lets use the Restlet OData Extension. This Restlet extension can generate code off OData schema documents.  You will want to follow these instructions as well for the code generation.

IMPORTANT: You will also need a fork/version of Restlet that incorporates this pull request fix for a NullPointer that you will encounter with the code generation. (The bug exists in Restlet 2.2.1)

The command I ran to generate the Java classes for all the Graph API entities was as follows (run this from WITHIN the lib/ directory in the extracted Restlet zip/tarball you downloaded). In the command below, do NOT specify “$metadata” in the Generator URI as the tool appends that automatically.

java -cp org.restlet.jar:org.restlet.ext.xml.jar:org.restlet.ext.atom.jar:org.restlet.ext.freemarker.jar:org.restlet.ext.odata.jar:org.freemarker_2.3/org.freemarker.jar org.restlet.ext.odata.Generator ~/path/to/your/target/output/dir


  •  If you run the command and get the error “Content is not allowed in prolog.” (which I did). There might be some extra characters that are being prepended to the starting “<edmx:Edmx”  in the “$metadata” schema document that this endpoint is returning. If this is the case do the following:
    • Download the source XML $metadata document to your hard-drive
    • Open it up in a editor and REMOVE the “<?xml version=”1.0″ encoding=”utf-8″?>” that precedes the first “<edmx…” element
    • Next, just to ensure there are no more hidden chars at the start of the document, open up a hex editor on the document to get rid of any other hidden chars that precede the first “<edmx…”  element.
    • Save the changes locally, and fire-up a local webserver (or a webserver that lives anywhere) and set it up so that http://yourwebserver/whatever/$metadata will serve up that XML file.
    • Then proceed to alter the Restlet Generator command above to reference this modified URI as appropriate. Remember that you do NOT specify “$metadata” in the Generator URI as the tool appends that automatically.

Liferay clustering internals

Ever wondered how Liferay’s internal clustering works? I had to dig into it in context of my other article on globally load balancing Liferay across separate data-centers. This blog post merely serves as a place for my research notes and might be helpful for someone else who is trying to follow what is going on under the covers in Liferay in lieu of any real documentation, design document or code comments (which Liferay seems to have none of)

Current set of mostly unanswered questions at the Liferay community forums:

Liferay Cluster events, message types, join/remove

  • Anytime ClusterExecutorImpl.memberJoined/memberRemoved() is invoked all implementations of ClusterEventListener have processClusterEvent() invoked
    • MemorySchedulerClusterEventListener: calls ClusterSchedulerEngine.getMasterAddressString() which can trigger slaveToMaster or masterToSlave @see EXECUTE notes below
    • LuceneHelperImpl.LoadIndexClusterEventListener: @see NOTIFY/JOIN section below
    • LiveUsersClusterEventListenerImpl: @see DEPART/JOIN section below
    • DebuggingClusterEventListenerImpl: logs the event to the log
    • ClusterMasterTokenClusterEventListener: on any event, invokes getMasterAddressString() to determine who the master is, or become so itself if nobody is (controlled via entry in Lock_ table) After this runs it invokes notifyMasterTokenTransitionListeners() which calls ClusterMasterTokenTransitionListener.masterTokenAcquired() or masterTokenReleased()
      • Implementors of ClusterMasterTokenTransitionListener:
        • BackgroundTaskClusterMasterTokenTransitionListener: calls BackgroundTaskLocalServiceUtil.cleanUpBackgroundTasks() which in turn ends up invoking BackgroundTaskLocalServiceImpl -> cleanUpBackgroundTasks() which is annotated with @Clusterable(onMaster=true) gets all tasks with STATUS_IN_PROGRESS and changes their status to FAILED. Then gets the BackgroundTaskExecutor LOCK and UNLOCKs it if this server is the current owner/master for that lock…
    • JGroups
      • Member “removed”: triggers ClusterRequestReceiver.viewAccepted() that in turn invokes ClusterExecutorImpl.memberRemoved() which fires ClusterEventType.DEPART (see below)
    • ClusterMessageType.NOTIFY/UPDATE: These message types are requested from other nodes when a new node comes up. When received they are in-turn translated and re-broadcast locally within a individual node as a ClusterEvent of type JOIN via ClusterExecutorImpl.memberJoined() method. Note this latter class is different in the EE jar, the implementation of this method additionally invokes the EE LicenseManager.checkClusterLicense() method
      • “Live Users” message destination:
        • triggers actions done by LiveUsersMessageListener
        • invokes cluster RPC invocation of LiveUsers.getLocalClusterUsers()
        • response is processed by LiveUsers.addClusterNode
        • proceeds to “sign in” each user that is signed in on the other joined node, appears to do session tracking etc
      • LoadIndexClusterEventListener:
        • Loads a full stream of the lucene index per company id from any node that responds who has a local lucene index last generation long value that is > this node’s index’s value. This is only triggered on a JOIN event when the “control” JChannel’s total number of peer nodes minus one is <= 1, so that this only runs on a node boot and not at other times.
      • EE LicenseManager.checkClusterLicense()

        • Please see the notes below on licensing and verification in regards to clustering on bootup
    • ClusterEventType.DEPART: This event is broadcast locally on a node when it is shutdown
      • LiveUsersClusterEventListenerImpl: listens to local DEPART event and sends message over MessageBus to other cluster members letting them know this node is gone
    • ClusterMessageType.EXECUTE: These cluster messages contain serialized ClusterRequest objects that contain RPC like information on a class, parameters and a method to invoke on other node(s). Generated by ClusterRequest.createMulticastRequest and createUnicastRequest messages. The result of these are typically passed off to ClusterExecutorUtil.execute() methods (which end up calling ClusterExecutorImpl.execute())
    • ClusterRequest.createUnicastRequest() invokers:
      • LuceneHelperImpl: _getBootupClusterNodeObjectValuePair() called by getLoadIndexesInputStreamFromCluster when it needs to connect to another node only in order to load indexes on bootup from another node who’s index is up to date
      • ClusterSchedulerEngine: callMaster() called by getScheduledJob(s)(), initialize() and masterToSlave() only if MEMORY_CLUSTERED is enabled for the job engine. Initialize() calls initMemoryClusteredJobs() if the local node is a slave to get a list of of all MEMORY_CLUSTERED jobs. MEMORY_CLUSTERED jobs mean they only run on whoever the master is which is designated by which node owns the lock named “com.liferay.portal.scheduler.SchedulerEngine” in the Lock_ table in the database. masterToSlave() demotes the current node to a slave. slaveToMaster() gives current node opportunity to acquire lock above to become the scheduler master node
      • ClusterMasterExecutorImpl: used for executing EXECUTE requests against the master only. Leverages method getMasterAddressString() to determine who the master is (via which node owns the Lock_ table entry named “ClusterMasterExecutorImpl”.
    • ClusterRequest.createMulticastRequest() invokers:
      • JarUtil: Instructs other peer nodes to JarUtil.installJar(). Invoked by DataSourceFactoryUtil.initDataSource. Also by DataSourceSwapper.swap*DataSource(). SetupWizardUtil.updateSetup()
      • ClusterLoadingSyncJob: Invoked by EditServerAction.reindex(). This sends a multicast ClusterRequest of EXECUTE for LuceneClusterUtil.loadIndexesFromCluster(nodeThatWasJustReindexedAddress) to all nodes. So all nodes, even those on the other side of the RELAY2 bridge will get this, and then will attempt to connect over the bridge back to stream the “latest” index from the node that was just re-indexed. Note however that nodes across the bridge will output this stack trace in their logs when attempting to connect back to get a token on the re-indexed node (this routine is in _getBootupClusterNodeObjectValuePair() in LuceneHelperImpl and tries to do a unicast ClusterRequest to invoke TransientTokenUtil.createToken(). My guess on this error has something to do with the “source address” which is a jgroups local UUID, being sent to the remote bridge, and on receipt the local list in a different DC does not recognize that multicast address. This “source address” that is sent ClusterExecutorImpl.getLocalClusterNodeAddress(). Also see ClusterRequestReceiver *does* get invoked w/ a property scoped SiteUUID in the format of hostname:sitename, however ClusterRequestReceiver.processClusterRequest() defers to ClusterRequest.getMethodHandler().invoke() which has no knowledge of this originating SiteUUID and only the internal “local” jgroups UUID address that is embedded as an “argument” to loadIndexesFromCluster() in the ClusterRequest, hence this can’t effectively/properly call TransientTokenUtil.createToken() on the sender (and the error below). When it attempts the call via ClusterExecutorImpl (jgroups control channel .send()) using the “address” argument it received in the original inbound payload. We see the message is dropped cause the target address is invalid/unknown..
        • 21:10:30,485 WARN [TransferQueueBundler,LIFERAY-CONTROL-CHANNEL,MyHost-34776][UDP:1380] MyHost-34776: no physical address for 27270628-816b-ddba-872f-1f2d27327f2f, dropping message
        • 18:37:40,169 INFO [Incoming-1,shared=liferay-control][LuceneClusterUtil:47] Start loading Lucene index files from cluster node 9c4870e7-17c2-18fb-01c4-5817bb5d1290
          18:37:46,538 WARN [TransferQueueBundler,shared=liferay-control][TCP:1380] null: no physical address for 9c4870e7-17c2-18fb-01c4-5817bb5d1290, dropping message
        • 18:01:56,302 ERROR [Incoming-1,LIFERAY-CONTROL-CHANNEL,mm2][ClusterRequestReceiver:243] Unable to invoke method {arguments=[[J@431bd9ee, 5e794a8e-0b7f-c10d-be17-47ce43b66517],[J,com.liferay.portal.kernel.cluster.Address)}
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(
          at java.lang.reflect.Method.invoke(
          at com.liferay.portal.kernel.util.MethodHandler.invoke(
          at com.liferay.portal.cluster.ClusterRequestReceiver.processClusterRequest(
          at com.liferay.portal.cluster.ClusterRequestReceiver.receive(
          at org.jgroups.JChannel.invokeCallback(
          at org.jgroups.JChannel.up(
          at org.jgroups.stack.ProtocolStack.up(
          at org.jgroups.protocols.relay.RELAY2.deliver(
          at org.jgroups.protocols.relay.RELAY2.up(
          at org.jgroups.protocols.FORWARD_TO_COORD.up(
          at org.jgroups.protocols.RSVP.up(
          at org.jgroups.protocols.FRAG2.up(
          at org.jgroups.protocols.FlowControl.up(
          at org.jgroups.protocols.FlowControl.up(
          at org.jgroups.protocols.pbcast.GMS.up(
          at org.jgroups.protocols.pbcast.STABLE.up(
          at org.jgroups.protocols.UNICAST2.up(
          at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(
          at org.jgroups.protocols.pbcast.NAKACK2.up(
          at org.jgroups.protocols.VERIFY_SUSPECT.up(
          at org.jgroups.protocols.FD_ALL.up(
          at org.jgroups.protocols.FD_SOCK.up(
          at org.jgroups.protocols.MERGE3.up(
          at org.jgroups.protocols.Discovery.up(
          at org.jgroups.protocols.TP.passMessageUp(
          at org.jgroups.protocols.TP$
          at java.util.concurrent.ThreadPoolExecutor.runWorker(
          at java.util.concurrent.ThreadPoolExecutor$
          Caused by: com.liferay.portal.kernel.exception.SystemException: java.lang.NullPointerException
          ... 32 more
          Caused by: java.lang.NullPointerException
          ... 35 more
      • PortalImpl: resetCDNHosts() does a ClusterRequest.EXECUTE for the same PortalUtil.resetCDNHosts() method on all peer nodes. To avoid recursive loop uses ThreadLocal to enter the EXECUTE block (relevant for remote nodes)
      • LuceneHelperImpl: _loadIndexFromCluster() (invoked via JOIN event above). Invoked via EXECUTE remote method calls against LuceneHelperUtil.getLastGeneration() against all peers to see who’s lucene indexes generation time is newer up earler than ME, picks that one and then initiate a direct http stream connection to it to fetch the entire index from it …. Who is a “peer” candidate is determined by invoking ClusterExecutorUtil.getClusterNodeAddresses() which is ultimately realized by calling the “control” channel’s getView() method
      • MessageSenderJob: notifyClusterMember() invoked by MessageSenderJob.doExecute() when a job execution context’s next execution time is null. The notifyClusterMember() method invokes SchedulerEngineHelperUtil.delete() via a ClusterRequest.EXECUTE
      • ClusterableAdvice: AOP advice class that intercepts any Liferay method annotatted with @Clusterable, AFTER returning, will then create a ClusterRequest.EXECUTE for the same method passed to all nodes in the cluster. Used by ClusterSchedulerEngine (delete, pause, resume, schedule, suppressError, unschedule, update) BackgroundTaskLocalServiceImpl, and PortletLocalServiceImpl. Has an ‘onMaster’ property that awkwardly controls whether or not the advice will actually be applied or not. If onMaster=false, then the advice fast returns like a no-op. But if onMaster=true then the advice runs (locally on the node if THIS node is the master, or via a ClusterRequest.EXECUTE). WHO is the master is dictated by which nodes owns the LOCK in the database named “com.liferay.portal.scheduler.SchedulerEngine”.
      • EhcacheStreamBootstrapCacheLoader: start() and doLoad() invokes EhcacheStreamBootstrapHelpUtil -> loadCachesFromCluster() broadcasts a ClusterRequest for EXECUTE of the method createServerSocketFromCluster() on all peer nodes, and then attempts to connect to a peer node to load all named caches from the peer node…
  • ClusterExecutorImpl notes:
    • This class has several methods relating to figuring out “who” is in the “cluster” and due to the implementation of these methods you will get inconsistent “views” of who are the local cluster members (if using RELAY2 in JGroups)
      • getClusterNodeAddresses() = consults the “control” JChannel’s receiver’s “view” of the cluster. This will only return local nodes subscribed to this JChannel. These are fetched real-time every time its invoked.
      • getClusterNodes() = consults a local map called “_liveInstances”. Live instances is populated when the “memberJoined()” method is invoked, in response to ClusterRequestReceiver.processClusterRequest() being invoked when a new node comes up. Note that because this is in response to ClusterRequests…. which WILL be transmitted over RELAY2 in JGroups, this will reflect MORE nodes than what getClusterNodeAddresses() returns…..
      • There is also another local member called “_clusterNodeAddresses” which also is populated w/ addresses of other nodes when “memberJoined()” is invoked…. I really don’t understand why this exists, AND getClusterNodeAddresses() exists, AND getClusterNodes() exists when they could potentially return inconsistent data.
  • Quartz/Jobs scheduling
    •  ClusterSchedulerEngine proxies all requests for scheudling jobs through SchedulerEngingProxyBean (BaseProxyBean) which just relays schedule() invocations through Messaging which ends up invoking QuartzSchedulerEngine.schedule()
    • ClusterSchedulerEngine on initialization calls getMasterAddressString() which gets the Lock from the _Lock table named “com.liferay.portal.scheduler.SchedulerEngine” via LockLocalServiceImpl.lock() which is a row based locking in the database this is based on “owner” which is based on the local IP?
    • _doMasterToSlave() – queries current master for list of all MEMORY_CLUSTERED jobs (does this over RMI or Jgroups method invoke of getScheduledJobs), gets this list of job names and unschedules them locally
    • slaveToMaster() – takes everything in _memoryClusteredJobs and schedules it locally. _memoryClusteredJobs seems to just be a global list of the MEMORY_CLUSTERED jobs that are kept track of on each node…
  • Licensing: If running Liferay EE the LicenseManager.checkClusterLicense() -> checkBinaryLicense() is invoked via ClusterExecutorImpl.memberJoined() (note that this is only invoked in the EE version as the class is different in the portal-impl.jar contained in the EE distribution). This particular invocation chain appears to respond to the JOIN event for the peer node by checking if the “total nodes” the node knows about is greater than the total number of nodes the license permits, if it is, the local node gets a System.exit() invoked on it (killing itself). For learning purposes if you are so interested to learn more about this obfuscated process, you can do so by investigating further on your own, its all in the EE portal-impl.jar. Here are some other notes on the licensing and how it relates to clustering and issues one might encounter.
    • LicenseManager. getClusterLicense() invokes ClusterExecutorUtil.getClusterNodes() and does validation to check if you have more nodes in the cluster than your installed license supports (amongst other things).
    • Another thing that happens when ClusterExecutorImpl.memberJoined() is invoked is that ClusterExecutorImpl.getClusterNodes() appears to be invoked (which @see above, in addition to local peer nodes in the same DC, will return nodes that exist across a JGroups RELAY2 bridge). It picks one of these peer nodes and remotely invokes LicenseManager.getLicenseProperties() against it, this is done via a ClusterExecutorUtil.execute(). The return is a Future with a timeout, upon reception the Future has a Map in it containing the license properties, these are then compared agains the local node’s license, if they match we are good to go, otherwise….. read below (a mixing licenses error will occurr) Why is this relvent? Well despite having legit licenses, when using RELAY2, you can come across two erronious errors that will prevent your cluster from booting, as described below.
      • The first error you might see is a timeout error, when it reaches out to a peer node to verify a license in response to a new member joining the cluster (likely a member over a RELAY2 bridge). It will look like the below. Typically just rebooting Liferay takes care of the issue, as explained in the followup bullets:
        • java.util.concurrent.TimeoutException
          at com.liferay.portal.kernel.cluster.FutureClusterResponses.get(
          at com.liferay.portal.license.a.a.f(Unknown Source)
          at com.liferay.portal.license.a.a.b(Unknown Source)
          at com.liferay.portal.license.a.f.d(Unknown Source)
          at com.liferay.portal.license.a.f.d(Unknown Source)
          at com.liferay.portal.license.LicenseManager.c(Unknown Source)
          at com.liferay.portal.license.LicenseManager.checkBinaryLicense(Unknown Source)
          at com.liferay.portal.license.LicenseManager.checkClusterLicense(Unknown Source)
          at com.liferay.portal.cluster.ClusterExecutorImpl.memberJoined(
          at com.liferay.portal.cluster.ClusterRequestReceiver.processClusterRequest(
          at com.liferay.portal.cluster.ClusterRequestReceiver.receive(
          at org.jgroups.JChannel.invokeCallback(
          at org.jgroups.JChannel.up(
          at org.jgroups.stack.ProtocolStack.up(
          at org.jgroups.protocols.relay.RELAY2.up(
          at org.jgroups.protocols.FORWARD_TO_COORD.up(
          at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(
          at org.jgroups.protocols.FRAG2.up(
          at org.jgroups.protocols.FlowControl.up(
          at org.jgroups.protocols.FlowControl.up(
          at org.jgroups.protocols.pbcast.GMS.up(
          at org.jgroups.protocols.pbcast.STABLE.up(
          at org.jgroups.protocols.UNICAST.up(
          at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(
          at org.jgroups.protocols.pbcast.NAKACK2.up(
          at org.jgroups.protocols.BARRIER.up(
          at org.jgroups.protocols.VERIFY_SUSPECT.up(
          at org.jgroups.protocols.FD.up(
          at org.jgroups.protocols.FD_SOCK.up(
          at org.jgroups.protocols.MERGE2.up(
          at org.jgroups.protocols.Discovery.up(
          at org.jgroups.protocols.TP$ProtocolAdapter.up(
          at org.jgroups.protocols.TP.passMessageUp(
          at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(
          at org.jgroups.protocols.TP$
          at java.util.concurrent.ThreadPoolExecutor.runWorker(
          at java.util.concurrent.ThreadPoolExecutor$
    • When booting up several Liferay EE nodes (configured for clustering) simultaneously it appears you can run into a situation where all the nodes fail to boot and say they have “invalid licenses” (despite the fact you have valid cluster licenses installed under data/license)…and then your container processes are automatically shut down… nice
    • This appears to be due to the verification process that LicenseManager does on bootup where it receives license information from peer nodes, however if all nodes are started up at the same time it appears there is a condition of timing, when none of the nodes are yet in a “ready” state, so when peers connect to them for their license properties, they respond w/ nothing…. which is then compared against the local node’s license and so since they don’t “equals()” it decides the local license (despite being legit) is invalid..
    • Here is example output from NODE1 (booted up at same time as NODE2, note the blank remote license):
        • 13:55:20,381 ERROR [Incoming-5,shared=liferay-control][a:?] Remote license does not match local license. Local license: {productEntryName=Liferay Portal, startDate=11, expirationDate=22, description=Liferay Portal, owner=Liferay Portal, maxServers=4, licenseEntryName=Portal6 Non-Production (Developer Cluster), productVersion=6.1, type=developer-cluster, accountEntryName=Liferay Trial, version=2}
          Remote node: {clusterNodeId=11111111111, inetAddress=/, port=-1}
          Remote license: {}
          Mixing licenses is not allowed. Local server is shutting down.
      • Here is example output from NODE2 (booted up at same time as NODE1):
        • 13:55:20,388 ERROR [Incoming-1,shared=liferay-control][a:?] Remote license does not match local license. Local license: {productEntryName=Liferay Portal, startDate=11, expirationDate=22, description=Liferay Portal, owner=Liferay Portal, maxServers=4, licenseEntryName=Portal6 Non-Production (Developer Cluster), productVersion=6.1, type=developer-cluster, accountEntryName=Liferay Trial, version=2}
          Remote node: {clusterNodeId=222222222222, inetAddress=/, port=-1}
          Remote license: {}
          Mixing licenses is not allowed. Local server is shutting down.
      • How to solve this? The way I did it was to always boot one node first, with NO peers… wait for it to complete booting before bringing up all other nodes, then these errors seem to go away.


 JGroups RELAY2 notes

Liferay does NOT use RpcDispatcher but has its own “rpc” abstraction via “ClusterRequest” which is Serializable. (contains target addresses, MethodHandler, has a UUID, and type of EXECUTE) MethodHandler is just a serializable bag of arguments and a MethodKey which stores the target class, methodName and paramTypes. MethodHandler uses these deserialized parts on the receiving side to invoke the method locally via reflection

On the RECEIVING side is ClusterRequestReceiver which is the JGroups receiver that processes all inbound messages from peers. It takes the ClusterRequest and gets the MethodHandler (if type is EXECUTE) and invokes it, then returns the response as a ClusterNodeResponse



Clustering Liferay globally across data centers (GSLB) with JGroups and RELAY2

Recently I’ve have been looking into options to solve the problem of GSLB’ing (global server load balancing) a Liferay Portal instance.

This article is a work in progress… and a long one. Jan Eerdekens states it correctly in his article, “Configuring a Liferay cluster is part experience and part black magic” …. however doing it across data-centers however is like wielding black magic across N black holes….

Footnotes for this article are here:

The objective is a typical one.

  • You have a large Liferay portal with users accessing the portal from lots of different location across the world
  • The Liferay “cluster” is hosted in a single data-center in a particular region
  • The users outside of that region complain that using Liferay is slow for them
  • Your goal: extend the Liferay cluster across multiple regions while keeping all the functionality intact and boosting response times for users across the world by globally load balancing (GLSB) the application so users in Asia hit a local data-center (DC) while users in South America hit a DC closer to them in their respective region
  • Solution: Good luck finding publicly available documentation on this for Liferay

Hopefully this article will help others out there, point them in a new direction and give them some ideas on how to put something like this together.

I’d like to note that this is not necessarily the ideal way to do this…. but just one way you can do it “out of the box” without extending Liferay in other ways. (I say this because there are many optimizations one could do for slimming down what Liferay sends around a cluster by default… i.e. lots of heavier object serializations as events happen) I’d also like to note that I am not a Liferay expert and I’m sure some things I am describing here are not 100% accurate. What is described below is the result of a lot of analysis of the uncommented, undocumented Liferay source code. I’m sure the developers at Liferay could provide additional insight, corrections and clarifications to what is stated below, but in lieu of design documentation, code-comments and the alike this is all that we in the community have to go on.

For my use case I tested this with two data-centers. One home-grown bare-metal DC in “regionA” and the other in AWS “regionB”. Point being we have two DC’s that are geographically separated over a WAN; could be dedicated line, VPN tunnel; whatever; point being the bandwidth available between DC’s is nothing compared to within DC.


Little bit of background:

Liferay has some light documentation for clustering, however this documentation is focused on a Liferay cluster within a single DC. I say “light documentation” because that is the nicest way I can state it. The Liferay project in general is quite void of any real documentation when it comes to how things work internally “under the hood” in the Liferay core codebase (design documents, code comments etc). If you want to know how things work, you have to crawl through tons of un-commented source code and figure it out for yourself.

First off Liferay uses JGroups under the hood (specifically JGroups version 3.2.10 in Liferay 6.2 as of this post). JGroups is pretty much one of the de-facto “go-to’s” for building clusters in Java applications and has been around a very long time; many Java stacks use this. If you want to see a good example of a open-source project with good design documents that explain the internals, see JGroups (hint, hint Liferay guys) I’m not going to go much further into describing JGroups, you can do that on your own; as I’ll be using some JGroups terminology below.

Liferay & JGroups basics:

Liferay defines two primary JGroups channels for what Liferay calls “cluster link”.  You enable this in your by setting By default all channels in Liferay are UDP (multicast); if you are trying to cluster Liferay in a DC that does not support multicast (like AWS) you will want to configure it to use unicast (see this article)

  • Channel “control” entry = 
  • Channel “transport” entry =

Both of these channels are used for various things such as notifying other cluster members of a give nodes state (i.e. notifying they are up/down etc) and sending ClusterRequests to other nodes for invoking operations across the cluster. There does not seem to be any consistency as to why one is used over the other. For example streaming a Lucene index over from another node uses the control channel, which reindex() requests use the transport channel.

Out of the box, if you bring up more than one node in a local DC (configured w/ UDP multicast), the Liferay nodes will automatically discover each other, peer up in to a cluster and begin to send data to each other when appropriate. For unicast, again you have you make some changes to your to use unicast, but effectively the result is the same.

Great! Now what, we have have one DC with N local nodes that are all peered up with each other….. but how can this DC exchange state with DC2? Good question, when attempting to GSLB an application there are many considerations, specifically for Liferay the big ones that I noted that need to be addressed are below; note there are some more hidden in Liferay’s internals, but for the big picture lets just focus on these 🙂

Cluster “master” determination:

Liferay has the concept of a logical “master” and who is the “master” is determined by ownership of a named lock called “com.liferay.portal.cluster.ClusterMasterExecutorImpl” that resides in the “Lock_” table in the database. @see ClusterMasterTokenClusterEventListener. Note that the database is shared by all nodes across all DC’s (see database section below), and this presents a huge problem if clusters in separate DC’s (which are only locally aware of peer-DC-nodes by default w/ Jgroups) can’t talk to each other across DC’s; which is what this article is about. i.e. node1 in DC1 might acquire this Lock_ first, but the nodes in DC2 cannot communicate w/ the “master” because it is in an unreachable DC.


This is an enormous topic on its own, but for this article lets keep it dumb simple. Leverage Liferay’s reader-writer database configuration. For example in DC1 setup your master instance for your database and designate as your “write” database, then configure two read-slave instances; one slave in DC1 and another in DC2. In your files for nodes in both DC’s configure “jdbc.write.url” to hit the master instance in DC1 and “” to hit whichever read-slave instance is local within each DC.

Cached data:

Liferay leverages Ehcache for caching within the application. Clustering for Ehcache can be enabled by reading this article. The configuration relies on a separate JGroups channel within the Liferay cluster and you need to properly configure it for unicast if your DC does not support multicast just like previously described.

Fire up two separate DC’s pointing to a database setup as previously described, go change some data in the app via a node in DC1, and because it is likely cached, when you view that same page/data in DC2 you won’t see the change visible in the UI. Why? Well because the clustering is only local to each DC. When model objects are updated in Liferay they are saved in the database, and then an event occurs (distributed via JGroups) that tells peer nodes to dump the cache entry…. but only local to that DC. So you say, “well why not just enable unicast for all nodes in every DC so they are aware of all other nodes in all other DCs?” You could, but imagine the cross-talk; no thanks. There are a few solutions to this, one will be described below (via RELAY2) and another could be provided by the ehcache-jms-wan-replicator project.

Indexed data:

Liferay’s default search index is implemented using Lucene. There are several ways to configure Lucene for a Liferay cluster... but for this article lets keep it simple and not setup Solr to avoid the complexity of having to GSLB that as well… and just enable “lucene.replicate.writes=true”. So each peer node within a local DC has its own Lucene index copy, and once again Liferay leverages JGroups (via ClusterExecutorImpl) triggered by all sorts of fancy AOP (see, @Indexable annotation, IndexWriterProxyBean + review search-spring.xml in the codebase) to essentially intercept a index write and broadcast a”reindex this thing” message to peer nodes in the local DC cluster. Note that Liferay sends the entire object w/ data to be reindexed to all peer nodes over the JGroups channel (which is not necessarily efficient over a WAN). Secondly, when you go to Server Administration in the GUI and hit the “reindex all index data” button, the server you invoke this on also invokes that operation against all peer nodes. Lastly, another hidden thing is that peer nodes will suck over the entire Lucene index via an HTTP call on startup from a donor node…..again we’ll touch on this later and the considerations to think about.

Again, fire up two separate DC’s pointing to a database/caching setup as previously described. Go to control panel and add a new user in DC1. Great you see the new user in the user’s screen when accessing through a DC1 node. Go view the users from DC2’s perspective. You won’t see the user, nor can you find them in a search despite them being in the database. Why? Well two things, first that Lucene “reindex this thing” message did not make it to DC2, and secondly (unless at this point you have either RELAY2 setup OR ehcache-jms-wan-replicator configured) these screens are also reliant on what is in Ehcache and a combination of what is in the local Lucene index.

Document library files:

This is most definitely a consideration for GSLB’ing Liferay across DC’s, however it really does not have anything to do w/ JGroups and RELAY2 in particular so I’m not going to discuss it here. I’ll point you in a direction… consider putting Liferay’s files in S3 and abstracting that with something like YAS3FS which is quite promising for building a distributed file store with local node CDN read performance. Much faster than Liferay’s S3 implementation and globally aware of changes to the FS.

Job scheduling:

Liferay has two classes of “jobs” (implemented via Quartz); master jobs which run only on ONE node, the “master” node and “slave” jobs which can run on any node. Again who is the “master” job runner is determined by an entry in the Lock_ table named “com.liferay.portal.kernel.scheduler.SchedulerEngine”. Who gets the lock is essentially which server boots up and acquires it first. The same problem that exists as noted above with regards to slave->master communication exists here in separated DC to DC environments where the nodes in separate DC’s cannot talk to one another over a WAN. Liferay has a few job types denoted by com.liferay.portal.kernel.scheduler.StorageType of:

  • MEMORY: run on any node and transient in memory
  • MEMORY_CLUSTERED: run only on the “master” node but transient in memory
  • PERSISTED: runs on any node but state persisted in database

Point being is that the job engine in Liferay has a dependency to be ably to communicate to all other nodes.

“Live Users”:

If you have the “live.users.enabled=true” option set in your you can do user monitoring in the cluster and this as well, if clustering is enabled needs to see all nodes in the cluster. When a cluster node comes up it sends a ClusterMessageType.NOTIFY/UPDATE which goes to all nodes which in turn broadcast a local ClusterEvent.JOIN which is reacted to by sending a unicast ClusterRequest.EXECUTE for LiveUsers.getLocalClusterUsers() on the remote node to effectively sign-in the users on the other node locally on the requesting node. This appears to be there so that each node will locally reflect all logged in users across the cluster. Again this will be incomplete if node1 on DC1 cannot talk to node2 @ DC2.


There are a few other functions in Liferay that appear to leverage the underlying Jgroups channels (i.e. EE Licensing, JarUtil, DataSourceFactoryUtil, SetupWizard, PortalManagerUtil.manage() (related to JMX?), PortalImpl.resetCDNHosts() and EhcacheStreamBootstrapCacheLoader. You can see my other notes article for these tidbits.


It might be helpful to diagnose whats going on by tweaking the logging settings for Liferay. To do this you should read this article to permanently change your log settings (which will dump more info on bootup which is important). Don’t rely on changing your logging settings via the GUI screen as those are not permanently saved and are transient. Below is an example webapps/ROOT/WEB-INF/classes/META-INF/portal-log4j-ext.xml file I used to triage various issues on bootup related to clustering.

<?xml version="1.0"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">

<log4j:configuration xmlns:log4j="">

<category name="com.liferay.portal.cluster">
<priority value="TRACE" />

<category name="com.liferay.portal.license">
<priority value="TRACE" />

<category name="org.jgroups">
<priority value="DEBUG" />

<category name="com.liferay.portal.kernel.cluster">
<priority value="TRACE" />



How JGroups RELAY2 can bridge the gap.

So at this point it should be clear to anyone reading that we need a way to get these separated clusters in DC1 and DC2 to talk to one another. First off we could just change the control/transport channels in Liferay to force use TCP UNICAST and specifically list all nodes globally across DC1 and DC2. This would let every node know about every other node globally, however this won’t scale well as each node would talk to every other node. The other option is the RELAY2 functionality available in JGroups.

Essentially what RELAY2 provides is a “bridge” between N different JGroups clusters that are physically separated by a WAN or other network arrangement. By adding RELAY2 into your JGroups stack you can somewhat easily ensure that ALL messages that are passed through a JGroups channel will be distributed over the bridge to all other JGroups clusters in other DCs. Think of RELAY2 as secondary “cluster” in addition to your local cluster, however only ONE node in your local JGroups cluster (the coordinator) is responsible for “relaying” all messages over the bridge to the other “coordinators” in the relay cluster, for distribution to the other coordinators local JGroups cluster. So “out of the box” this can let you ensure that all Liferay cluster operations that occur in DC1 get transmitted to DC2 transparently. So with this enabled, when we add that user to our local Lucene index across all local nodes via UDP in the local cluster, the coordinator node in DC2 will also receive that event and transmit it locally to all nodes in the DC2 cluster. Now when you view the “users” on node2 in DC2 you will see the data that was added by node1 in DC1

One IMPORTANT caveat is that using the RPC building blocks in JGroups/RELAY2 does not effectively cascacde RPC calls over the bridge. See here and this thread. To Liferay’s credit they did not implement the “RPC” like method invocations across Liferay clusters’s using RPC in JGroups (i’m not sure why) but rather they serialize ClusterRequests which encapsulate what is to be invoked on the other side via method reflection, and just send these messages as serialized objects over the wire. Had they used RPC one would have to modify Liferay’s code to get these RPC invocations across the RELAY2 bridge.

Screen Shot 2014-05-30 at 3.44.38 PM

How to configure in Liferay:

What I am describing below assumes you are just using the Liferay cluster defaults of a local UDP multicast cluster, again if you are in AWS you will just need to adjust your unicast TCP JGroups stack accordingly, the configuration is pretty much the same w/ regards to where RELAY2 in configured the stack

IMPORTANT: this only covers the transport and control channels. If you want to enable this kind of relay bridge for the separate Ehcache channels that Liferay uses, you will repeat the process (described in the steps below) for the Ehcache JGroups channel definitions in Liferay as well… summary high-level steps below (note alternatively you could leave the ehcache jgroups configuration alone in liferay and just leverage the ehcache jms wan replicator.)

  • UNCOMMENT: ehcache.cache.manager.peer.provider.factory = net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory
  • MODIFY: “ehcache.multi.vm.config.location.peerProviderProperties” and add a “connect” property to manually define a JGroups stack that incorporates RELAY2 similar in the fashion to how we do it below for the “control” and “transport” channels. @see the ehcache documentation here.


Ok, first lets define our “relay clusters” for both the control and transport channels in Liferay. Note all the steps below need to be done for all nodes across all DC’s and you need to adjust certain things relative to what DC you are running in, particularly the “site” names in the RELAY2 configurations

1. Create a file in your WEB-INF/classes dir called “relay2-transport.xml”.

<RelayConfiguration xmlns="urn:jgroups:relay:1.0">
        <site name="dc1" id="0">
                <bridge config="relay2_global_transport_tcp.xml" name="global_transport" />

        <site name="dc2" id="1">
                <bridge config="relay2_global_transport_tcp.xml" name="global_transport" />

2. Create a file in your WEB-INF/classes dir called “relay2-control.xml”

<RelayConfiguration xmlns="urn:jgroups:relay:1.0">
        <site name="dc1" id="0">
                <bridge config="relay2_global_control_tcp.xml" name="global_control" />

        <site name="dc2" id="1">
                <bridge config="relay2_global_control_tcp.xml" name="global_control" />

3. Next we need to configure the actual TCP relay clusters for both the control/transport relay configurations. The sample below is a TEMPLATE you can use and copy twice, once for “relay2_global_control_tcp.xml” and another for “relay2_global_transport_tcp.xml”. Change the TCP.bind_port and TCPPING.initialhosts appropriately for each file. For TCPPING.inititalhosts, you will want to list ONE known node that lives in each DC. These will be the initial relay coordinator nodes.

    <config xmlns:xsi=""





    <TCPPING timeout="3000"
    <MERGE2  min_interval="10000"
    <FD timeout="3000" max_tries="3" />
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK2 use_mcast_xmit="false"
    <UNICAST2 />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
    <pbcast.GMS print_local_addr="true" join_timeout="3000"

    <MFC max_credits="2M"
    <FRAG2 frag_size="60K"  />
    <!--RSVP resend_interval="2000" timeout="10000"/-->


4. At this point we have the relay cluster configuration defined. Now we need to adjust Liferay’s “control” and “transport” channel JGroups stack to utilize them. Open up and add the following to the control and transport channels. Note these are long lines, copied from default to and then appending to the stack:

:FORWARD_TO_COORD:relay.RELAY2(site=[DC1 | DC2];config=${configRootDir}/relay2-[transport|control].xml;relay_multicasts=true)

IMPORTANT: be sure to set the “site=” property of RELAY2 to match the DC you are configuring this for and must match the “site” name in the relay2-transport.xml and relay2-control.xml files accordingly.${["cluster-link-control"]};mcast_group_addr=${["cluster-link-control"]};mcast_port=${["cluster-link-control"]}):PING(timeout=2000;num_initial_members=20;break_on_coord_rsp=true):MERGE3(min_interval=10000;max_interval=30000):FD_SOCK:FD_ALL:VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK2(xmit_interval=1000;xmit_table_num_rows=100;xmit_table_msgs_per_row=2000;xmit_table_max_compaction_time=30000;max_msg_batch_size=500;use_mcast_xmit=false;discard_delivered_msgs=true):UNICAST2(max_bytes=10M;xmit_table_num_rows=100;xmit_table_msgs_per_row=2000;xmit_table_max_compaction_time=60000;max_msg_batch_size=500):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=50000;max_bytes=4M):pbcast.GMS(join_timeout=3000;print_local_addr=true;view_bundling=true):UFC(max_credits=2M;min_threshold=0.4):MFC(max_credits=2M;min_threshold=0.4):FRAG2(frag_size=61440):RSVP(resend_interval=2000;timeout=10000):FORWARD_TO_COORD:relay.RELAY2(site=[DC1 | DC2];config=${configRootDir}/relay2-control.xml;relay_multicasts=true)${["cluster-link-udp"]};mcast_group_addr=${["cluster-link-udp"]};mcast_port=${["cluster-link-udp"]}):PING(timeout=2000;num_initial_members=20;break_on_coord_rsp=true):MERGE3(min_interval=10000;max_interval=30000):FD_SOCK:FD_ALL:VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK2(xmit_interval=1000;xmit_table_num_rows=100;xmit_table_msgs_per_row=2000;xmit_table_max_compaction_time=30000;max_msg_batch_size=500;use_mcast_xmit=false;discard_delivered_msgs=true):UNICAST2(max_bytes=10M;xmit_table_num_rows=100;xmit_table_msgs_per_row=2000;xmit_table_max_compaction_time=60000;max_msg_batch_size=500):pbcast.STABLE(stability_delay=1000;desired_avg_gossip=50000;max_bytes=4M):pbcast.GMS(join_timeout=3000;print_local_addr=true;view_bundling=true):UFC(max_credits=2M;min_threshold=0.4):MFC(max_credits=2M;min_threshold=0.4):FRAG2(frag_size=61440):RSVP(resend_interval=2000;timeout=10000):FORWARD_TO_COORD:relay.RELAY2(site=[DC1 | DC2];config=${configRootDir}/relay2-transport.xml;relay_multicasts=true)


 5. Ok, now startup Liferay and adjust its startup java options to add the following. This is required so that the JGroups stacks can find the relay2-transport.xml and relay2-control.xml relay configuration files.


 6. In Liferay’s logs you should see some additional JGroups control/transport channels showing up that represent the bridge between the report sites.


At this point you should now have two separate Liferay clusters in two separate data-centers “bridged” using JGroups RELAY2 so that many of the issues described in this article are resolved and Liferay cluster events/messages are received across both data-centers. Note that unless you also tweaked the Ehcache JGroups configuration (as noted earlier) or are using the ehcache jms wan replicator, the Ehcache clustered cache replication events will not be sent to the other DC’s

That said, again this is not necessarily the best way to do this. What is the best way is to be determined as this is an experiment in progress. There are many things that may or may not really be necessary to warrant utilizing RELAY2 as the means to bridge multiple separated Liferay clusters. Liferay can generate a lot of cluster traffic and if you are bridging over a WAN this may not be efficient, or could potentially block operations in DC1 while waiting on a coordinator or timeout in DC2, resulting in perceived slowness on the sender side depending on if Liferay is invoking things remotely via ClusterRequests synchronously or asynchronously.

The latter could potentially be alleviated by tweaking your TCP bridge configuration to optimize the TCP “max_bundle_size” and “max_bundle_timeout” parameters. Doing so you could reduce the sheer amount of messages sent back and forth over the bridge. I.E. let the bridge queue up N messages or until the total amount of data to be sent is >= N(size); effectively “batching” the data to be sent around the WAN. Note when tweaking these configuration settings you may encounter this kind of warning, that you will need to adjust your OS settings accordingly :

“WARN  [localhost-startStop-1][UDP:547] [JGRP00014] the receive buffer of socket MulticastSocket was set to 500KB, but the OS only allocated 124.93KB. This might lead to performance problems. Please set your max receive buffer in the OS correctly (e.g. net.core.rmem_max on Linux)”

Potential alternatives to RELAY2 are implementing point-specific solutions for transmitting the most visible/critical information to other DC’s such as cache events in a batched optimized fashion by using something like the Ehcache JMS WAN replicator. Another example is writing an extension for Lifferay which would do something similar for batching “reindex()” requests across DC’s, so rather than relying on RELAY2 which would transmit full Lucene documents over the wire… one-by-one… and extension could be developed to batch these in an optimized fashion where only model ids, type and operation request are conveyed rather than the entire Lucene document.

Also note that because Liferay streams the entire contents of its Lucene indexes to peer nodes…  it is important to understand “who” Liferay considers a peer node. This is determined by calling its “control” JChannel’s, “receiver” who is Liferay’s BaseReceiver’s getView() method which should only return local DC peers, and not those across a RELAY2 bridge which should avoid it connecting to a node in a WAN separated DC to stream the index (note according to these docs, a JGroups channel stack with RELAY2 enabled does NOT return views that span bridges).  If you didn’t use RELAY2 but instead manually configured a giant UNICAST cluster across WANs, one would have to consider how clusters boot up, because if you start one node in DC1, then node2 in DC1 would get its index update from node1(DC1) (which is fine because they would be on the same local network). However when node3 comes up in DC2, because it recognizes the “master” node (via the shared database Lock_ table entry) as living in DC1 it would have to stream its index over the WAN. I’d also like to note that Liferay WILL attempt to stream an index from a node in another DC if you do a manual “reindex” that is initiated by a remote DC, however this will fail due to the way the ClusterRequest to reindex() in other DC’s sends along the jgroups “address” (see the footnotes article section on “ClusterLoadingSyncJob” for more details.)

There is also one other oddity I noted in “who” Liferay considers are “peers” when RELAY2 is used. See the section on ClusterMasterExecutorImpl in the footnotes for more information.

Regardless, doing any of the latter customizations requires analysis of Liferays behavior/code to determine the impact on what “events” in Liferay could be missed by not using RELAY2 (which will catch everything). You also would then be responsible for keeping tabs on what changes as Liferay does new releases, modifies the way one particular “clusterable” action behaves or adds totally new features that send messages over the cluster.

Hopefully this document will help others out there trying to solve this kind of problem and save others some valuable time!