Hazelcast discovery with Etcd

I’ve used Hazelcast for years and have generally relied upon the availability of multicast for Hazelcast cluster discovery and formation (within a single data-center). Recently I was faced with two things: expanding the footprint into a data-center where multicast is not enabled, and prepping the service for containerization, where nodes will come and go as scaling policies dictate. Hard-wired Hazelcast clustering via an XML configuration and/or reliance on multicast is a no-go in that world.

With Hazelcast 3.6, there is now support for a pluggable cluster discovery mechanism called the Discovery SPI (also referred to as a Discovery Strategy). Perfect timing: given that we are already playing with Etcd as part of our Docker container strategy, this was an opportunity to let our application’s native clustering mechanism (coded on top of Hazelcast) leverage Etcd to discover and remove peers both within, and potentially across, data-centers.

So I coded up hazelcast-etcd-discovery-spi available on GitHub.


This works with Hazelcast 3.6-EA+ and Etcd to provide (optional) automatic registration of your Hazelcast nodes as Etcd services (key-paths) and automatic peer discovery of the Hazelcast cluster.

Note that the automatic registration of each Hazelcast instance as an Etcd service is OPTIONAL; if you prefer, you can maintain those key-paths in etcd manually. I added it simply because I think it will be convenient for folks, especially when containerizing a Hazelcast-enabled app (such as via Docker), where the fewer dependencies and manual steps (i.e. registering your Hazelcast nodes by hand), the better. That functionality is embedded right in the discovery strategy.
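To give you an idea of the wiring, a Hazelcast 3.6 XML configuration using the strategy looks roughly like the sketch below. Note that the Discovery SPI itself has to be switched on via the hazelcast.discovery.enabled property; the strategy class name and the property names/values shown here are illustrative only (based on the project’s package naming), so check the project README for the exact set it supports:

<hazelcast>
  <properties>
    <property name="hazelcast.discovery.enabled">true</property>
  </properties>
  <network>
    <join>
      <multicast enabled="false"/>
      <tcp-ip enabled="false"/>
      <discovery-strategies>
        <discovery-strategy enabled="true"
            class="org.bitsofinfo.hazelcast.discovery.etcd.EtcdDiscoveryStrategy">
          <properties>
            <!-- illustrative: where etcd lives and the key-path/service to register under -->
            <property name="etcd-uris">http://etcd1:2379,http://etcd2:2379</property>
            <property name="service-name">my-hazelcast-service</property>
          </properties>
        </discovery-strategy>
      </discovery-strategies>
    </join>
  </network>
</hazelcast>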

I hope others find this helpful, and please leave your feedback, pull-requests or issues on the project!

NOTE! Interested in Consul? There is a separate project built around Consul as the discovery strategy, located here: https://github.com/bitsofinfo/hazelcast-consul-discovery-spi

 

Hazelcast discovery with Consul

I’ve used Hazelcast for years and have generally relied upon the availability of multicast for Hazelcast cluster discovery and formation (within a single data-center). Recently I was faced with two things: expanding the footprint into a data-center where multicast is not enabled, and prepping the service for containerization, where nodes will come and go as scaling policies dictate. Hard-wired Hazelcast clustering via an XML configuration and/or reliance on multicast is a no-go in that world.

With Hazelcast 3.6, there is now support for a pluggable cluster discovery mechanism called the Discovery SPI (also referred to as a Discovery Strategy). Perfect timing: given that we are already playing with Consul as part of our Docker container strategy, this was an opportunity to let our application’s native clustering mechanism (coded on top of Hazelcast) leverage Consul to discover and remove peers both within, and potentially across, data-centers.

So I coded up hazelcast-consul-discovery-spi available on GitHub.


This works with Hazelcast 3.6-EA+ and Consul to provide automatic registration of your Hazelcast nodes as Consul services (without having to run a local Consul agent) and automatic peer discovery of the Hazelcast cluster.

Note that the automatic registration of each Hazelcast instance as a Consul service is OPTIONAL if you already have Consul agents running that define your Hazelcast service nodes. I added it simply because I think it will be convenient for folks, especially when containerizing a Hazelcast-enabled app (such as via Docker), where the fewer dependencies, like a Consul agent available on the host or in the container (or another container), the better. That functionality is embedded right in the discovery strategy.
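The wiring is analogous to the etcd variant above: disable multicast/tcp-ip in the <join> section, switch on the Discovery SPI via hazelcast.discovery.enabled, and point a <discovery-strategy> element at the Consul strategy class. The property names below are illustrative only; the project README documents the real ones (Consul host/port, service name, tags, the optional registrator configuration, etc.):

<discovery-strategies>
  <discovery-strategy enabled="true"
      class="org.bitsofinfo.hazelcast.discovery.consul.ConsulDiscoveryStrategy">
    <properties>
      <!-- illustrative: the Consul agent/server to talk to and the service to register/discover -->
      <property name="consul-host">consul.mydomain.internal</property>
      <property name="consul-port">8500</property>
      <property name="consul-service-name">my-hazelcast-service</property>
    </properties>
  </discovery-strategy>
</discovery-strategies>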

I hope others find this helpful, and please leave your feedback, pull-requests or issues on the project!

NOTE! Interested in etcd? There is a separate project built around etcd as the discovery strategy, located here: https://github.com/bitsofinfo/hazelcast-etcd-discovery-spi

 

Aggregate, backup elasticsearch fs snapshots across a widely distributed cluster

One of the Elasticsearch clusters I’ve worked on spans multiple data-centers around the world and stores some very large indexes. Sometimes, though not often, we need to get a backup of one of these indexes off of the cluster for restoration onto another cluster. Due to the sheer size of these indexes, it’s not practical for us to snapshot them to S3 or even a shared NFS mount (as the cluster spans multiple data-centers). Therefore the local file-system “fs” snapshot type is the only one really usable for us in this scenario… but what you end up with is pieces of the snapshot distributed across individual nodes all over the world.

So there was a need for a tool to automate the task of collecting all of the individual snapshot “parts” and downloading them to a central machine. If you’ve ever looked into the actual format of an Elasticsearch snapshot, it’s a little tedious… i.e. you can’t just blindly copy over the contents of a snapshot shard directory, because ES smartly does snapshots via diffs and keeps track of which files are relevant for each snapshot in metadata files; see here for an excellent overview: https://www.found.no/foundation/elasticsearch-snapshot-and-restore/.

So in the end I came up with elasticsearch-snapshot-manager (Scala) as a tool for handling all of this (analyzing, aggregating, downloading).

This tool is intended to aid with the following scenario:

  1. You have a large elasticsearch cluster that spans multiple data-centers
  2. You have a “shared filesystem snapshot repository” whose physical location is local to each node and actually NOT on a shared device or logical mountpoint (due to (1) above); the snapshots reside on local disk only (see the sketch after this list).
  3. You need a way to execute the snapshot, then easily collect all the different parts of that snapshot, which are spread across N nodes in your cluster.
  4. This tool is intended to automate that process…
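For context, the repository in (2) is just a standard “fs” snapshot repository whose location happens to resolve to local disk on each node. Registering it and cutting a snapshot via the normal Elasticsearch snapshot API looks roughly like this (the repository name, path and snapshot name are made up for illustration):

curl -XPUT 'http://localhost:9200/_snapshot/my_fs_repo' -d '{
  "type": "fs",
  "settings": { "location": "/data/es-snapshots/my_fs_repo", "compress": true }
}'

curl -XPUT 'http://localhost:9200/_snapshot/my_fs_repo/snapshot_1?wait_for_completion=true'

After that completes, each node holds only the snapshot files for the shards it hosts under /data/es-snapshots/my_fs_repo; elasticsearch-snapshot-manager’s job is to analyze that layout and pull all of the pieces together.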

Please see the GitHub project for all the details at https://github.com/bitsofinfo/elasticsearch-snapshot-manager; feedback appreciated.

Book review: Building Microservices

Recently I read Sam Newman’s “Building Microservices”; at ~280 pages it’s a fairly quick read. The reviews on this book are mixed overall, and I can see where readers are coming from. From the title one might expect coverage of some of the microservices frameworks out there, concrete examples, maybe some actual code… but you won’t really find that here. Instead you will find a pretty good overview of various architectural approaches to modern application design in today’s world, covering general topics such as proper separation of concerns, unit testing, continuous integration, automation, infrastructure management, service discovery, fault tolerance, high availability, security, etc.

In reality, none of the principles covered in this book are the exclusive domain of “microservice” application architectures; rather, they can (and should) be applied to any application you are considering deploying, whether it’s a “monolithic” application or a suite of microservices interacting as parts of a larger functioning application.

In that light I think this book is definitely a good read and worth a look, if for nothing more than to ensure your team gets a refresher on good design principles and how they can be realized with some of the newer frameworks and tool sets that have come out of our community in recent years. The material presented is sound.

Fix: HDP, YARN, Spark “check your cluster UI to ensure that workers are registered and have sufficient resources”

Are you trying to submit a Spark job over YARN on an HDP Hadoop cluster and encountering the kinds of errors shown below?

If so, just add the following two lines to your [spark-home]/conf/spark-defaults.conf file:

# customize for your HDP version...

spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
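
Alternatively, the same settings can be passed per job on the spark-submit command line via --conf rather than editing spark-defaults.conf (the class and jar names below are placeholders):

spark-submit --master yarn-cluster \
  --conf "spark.driver.extraJavaOptions=-Dhdp.version=2.2.0.0-2041" \
  --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=2.2.0.0-2041" \
  --class com.example.MySparkJob my-spark-job.jar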

 

ERRORS

The errors below all stem from a root issue on the Spark executor nodes: the generated launch script tries to substitute ${hdp.version}, but no variable with that name is defined. The configuration above fixes that.

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

15/04/03 13:40:30 INFO yarn.YarnAllocator: Completed container container_1428072484378_0004_01_000003 (state: COMPLETE, exit status: 1)
15/04/03 13:40:30 INFO yarn.YarnAllocator: Container marked as failed: container_1428072484378_0004_01_000003. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1428072484378_0004_01_000003
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/admin/appcache/application_1428072484378_0004/container_1428072484378_0004_01_000003/launch_container.sh: line 26: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/admin/appcache/application_1428072484378_0004/container_1428072484378_0004_01_000003/launch_container.sh: line 26: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
	at org.apache.hadoop.util.Shell.run(Shell.java:455)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1


2015-04-03 13:41:27,055 INFO  container.Container (ContainerImpl.java:handle(999)) - Container container_1428072484378_0004_02_000013 transitioned from LOCALIZED to RUNNING
2015-04-03 13:41:27,068 INFO  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:buildCommandExecutor(267)) - launchContainer: [bash, /hadoop/yarn/local/usercache/admin/appcache/application_1428072484378_0004/container_
1428072484378_0004_02_000013/default_container_executor.sh]
2015-04-03 13:41:27,614 WARN  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1428072484378_0004_02_000013 is : 1
2015-04-03 13:41:27,614 WARN  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(229)) - Exception from container-launch with container ID: container_1428072484378_0004_02_000013 and exit code: 1
ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/admin/appcache/application_1428072484378_0004/container_1428072484378_0004_02_000013/launch_container.sh: line 26: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hado
op-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/sh
are/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framewor
k/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitut
ion

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-04-03 13:41:27,614 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(283)) - Exception from container-launch.
2015-04-03 13:41:27,615 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(283)) - Container id: container_1428072484378_0004_02_000013
2015-04-03 13:41:27,615 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(283)) - Exit code: 1

Fix: HDP “Unauthorized connection for super-user: oozie from IP 127.0.0.1”

Recently I have been playing with HortonWorks HDP 2.2. I was starting to configure some oozie workflows, and when submitting the job the first step’s Hive script failed with this error and stack trace.


JA002: Unauthorized connection for super-user: oozie from IP 127.0.0.1

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unauthorized connection for super-user: oozie from IP 127.0.0.1
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy39.getDelegationToken(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:306)
... 30 more

To fix this, SSH into your HDP instance VM, edit /etc/hadoop/conf/core-site.xml, and change the following config to add “localhost”. Save and restart the relevant services, or just reboot your HDP VM instance.


<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>sandbox.hortonworks.com,127.0.0.1,localhost</value>
</property>

Execute Powershell commands via Node.js, REST, AngularJS

Building on my last post on stateful-process-command-proxy, this post will cover how you can leverage that node.js module to expose the capabilities of Powershell cmdlets over a REST API presented through an AngularJS interface. Why would one want to do this, you ask? I covered that in my last post, but I will briefly explain it here.

(Note: what is described below could just as easily be built for Bash processes, as the underlying module supports them as well.)

The use case came out of the need to automate certain calls to manage various objects within Microsoft o365’s environment. Unfortunately Microsoft’s GraphAPI does not expose all of the functionality that is available via its suite of various Powershell cmdlets for o365 services. Secondly, when you need to do these operations via Powershell, it requires a pre-established remote PSSession to o365… and establishing (and tearing down) a new remote PSSession whenever you need to invoke a cmdlet against a remote resource (a remote server, or an o365 endpoint) is expensive. Lastly, who wants to sit there and manually run these commands when they could be automated and invoked on demand via other means… such as via a web service? Hence, stateful-process-command-proxy came to be… it provides the building-block bridge between node.js and a pool of pre-established Powershell consoles. Once you have node.js talking to stateful-process-command-proxy, you can build whatever you want on top of that in node.js to mediate the calls.

Layer one

The first higher-level NPM module that builds on stateful-process-command-proxy is powershell-command-executor.

What this adds on top of stateful-process-command-proxy is probably best described by the diagram on the project page.


The main thing to understand is that the module provides the PSCommandService class, which takes a registry of pre-defined “named” commands and their respective permissible arguments. The registry is nothing more than an object full of configuration and is easy to define. You can see an example in the project which defines a bunch of named “commands” and their arguments, usable against o365 to manipulate users, groups etc. PSCommandService is intended to serve as a decoupling point between the caller and the StatefulProcessCommandProxy… in other words, a place where you can restrict and limit the types of commands, and the (sanitized) arguments, that can ever reach the Powershell processes pooled within StatefulProcessCommandProxy.

It is PSCommandService’s responsibility to look up the named command you want to execute, sanitize the arguments, and generate a literal Powershell command string that is then sent to the StatefulProcessCommandProxy to be executed. Once it receives the command, StatefulProcessCommandProxy is responsible for checking that the command passes its command whitelist and blacklist before executing it. The sample o365Utils.js config file provides a set of pre-canned (usable) examples of init/destroy commands, auto-invalidation commands and whitelist/blacklist configs that you can use when constructing the StatefulProcessCommandProxy that the PSCommandService will use internally.
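To make that a little more concrete, here is a rough sketch of a registry entry and a PSCommandService call. The export, option and method names here may not match the module’s actual API exactly; treat everything as illustrative and check the powershell-command-executor README:

var PSCommandService = require('powershell-command-executor').PSCommandService; // assumed export name

// a registry of "named" commands; {{placeholders}} are filled in from sanitized caller arguments
var commandRegistry = {
  'getUser': {
    'command': 'Get-MsolUser -UserPrincipalName {{username}} | ConvertTo-Json',
    'arguments': {
      'username': { 'quoted': true }  // illustrative per-argument config
    }
  }
};

// statefulProcessCommandProxy = a pre-constructed StatefulProcessCommandProxy
// (see the stateful-process-command-proxy post below for a construction sketch)
var psCommandService = new PSCommandService(statefulProcessCommandProxy, commandRegistry);

// look up 'getUser', sanitize/merge the arguments, generate the literal Powershell
// command string and hand it to the proxy for execution
psCommandService.execute('getUser', { 'username': 'someone@example.com' })
  .then(function(result) { console.log(result); })
  .catch(function(error) { console.error(error); });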

Layer two

The next logical step is to expose these pre-canned “commands” to callers, and one way to do that is via a web service.

WARNING: doing such a thing without much thought can expose you to serious security risks. You need to really think about how you will secure access to this layer, the types of commands you expose, your argument sanitization, and the filtering of permissible commands via whitelists and blacklists for injection protection. Another precaution you may want to take is running it only on localhost, for experimental purposes only. READ OWASP’s article on command injection.

OK, with that obvious warning out of the way, here is the next little example project, which provides this kind of layer on top of powershell-command-executor: powershell-command-executor-ui

This project is a small Node.js/ExpressJS app that provides a simple set of REST services allowing the caller to:

  • get all available named commands in the PSCommandService registry
  • get an individual command configuration from the registry
  • generate a command from a set of arguments
  • execute the command via a set of arguments and get the result
  • obtain the “status” of the underlying StatefulProcessCommandProxy and its history of commands

Given the above set of services, one can easily build a user interface which dynamically lets the user invoke any command in the registry and see the results… and this is exactly what this project does via an AngularJS interface (albeit a bit crude); see the screenshots on the project page.
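
As a minimal sketch of what that REST layer could look like (the route paths here are hypothetical, not necessarily the ones powershell-command-executor-ui actually uses):

var express = require('express');
var app = express();

// commandRegistry and psCommandService as in the earlier PSCommandService sketch

// list all named commands available in the registry
app.get('/commands', function(req, res) {
  res.json(commandRegistry);
});

// execute a named command, treating the query parameters as its arguments
app.get('/commands/:name/execute', function(req, res) {
  psCommandService.execute(req.params.name, req.query)
    .then(function(result) { res.json(result); })
    .catch(function(error) { res.status(500).json({ error: String(error) }); });
});

app.listen(3000);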

Hopefully this will be useful to others out there, enjoy.


Encrypting and storing powershell credentials

Please see: https://github.com/bitsofinfo/powershell-credential-encryption-tools

Recently I had the need to store some credentials for a Powershell script (i.e. credentials that I ultimately needed in a PSCredential object). The other requirement was that these credentials be portable and “user” independent, meaning that they could not be encrypted using the DPAPI (the Windows Data Protection API), as that binds the “secret” used for the encryption to the currently-logged-in user (which limits the portability and usage of the encrypted credentials). The way to avoid this is to specify the secret key parameters on the ConvertTo-SecureString and ConvertFrom-SecureString commands, which forces them to use AES (with strength determined by your key size).

I ended up coding a few powershell scripts that assist with the creation of a JSON AES-256 encrypted credentials file + secret key, as well as functions you can include in other powershell scripts to load these credentials into usable formats such as PSCredentials, SecureStrings etc.

Please see: https://github.com/bitsofinfo/powershell-credential-encryption-tools

NOTE! The most important thing about using the output from this tool is properly locking down the secret key (i.e. with file permissions)!

The format of the resulting file looks something like this:

{ "username" : "AESEncryptedValue", "password": "AESEncryptedValue" }
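To illustrate the mechanics the tool builds on (this is not the tool’s actual code, just a minimal sketch of the underlying cmdlets), encrypting and later decrypting a value with an explicit 256-bit AES key looks roughly like this:

# generate a random 256-bit (32 byte) AES key; the tool writes this to a separate, locked-down key file
$key = New-Object byte[] 32
[System.Security.Cryptography.RNGCryptoServiceProvider]::Create().GetBytes($key)

# encrypt: plaintext -> SecureString -> AES-encrypted string (the kind of value stored in the JSON file)
$encrypted = ConvertTo-SecureString 'sup3rS3cret!' -AsPlainText -Force | ConvertFrom-SecureString -Key $key

# decrypt: AES-encrypted string -> SecureString -> PSCredential usable by other scripts
$secure = ConvertTo-SecureString $encrypted -Key $key
$credential = New-Object System.Management.Automation.PSCredential('someUser', $secure)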

Executing stateful shell commands with Node.js – powershell, bash etc

Hoping this will be useful for others out there, I’ve posted some code that can serve as a lower-level component/building block in a node.js application that needs to mediate interaction with command-line programs on the back-end (i.e. bash shells, powershell, etc.).

The project is on GitHub @ stateful-process-command-proxy and is also available as an NPM module.

This is a node.js module for executing OS commands against a pool of stateful child processes, such as bash or powershell, via their stdin/stdout/stderr streams. It is important to note that, despite the use case described below for this project’s origination, this node module can be used for proxying long-lived bash processes (or any shell really) in addition to powershell etc. It works and has been tested on *nix, OS X and Windows hosts running the latest version of node.

This project originated out of the need to execute various Powershell commands (at fairly high volume and frequency) against services within Office365/Azure, bridged via a custom node.js-implemented REST API; this was due to the lack of certain features in the REST GraphAPI for Azure/o365 that are available only via Powershell (which can maintain persistent connections over remote sessions).

If you have done any work with Powershell and o365, then you know that there is considerable overhead in both establishing a remote session and importing and downloading various needed cmdlets. This is an expensive operation and there is a lot of value in being able to keep this remote session open for longer periods of time rather than repeating this entire process for every single command that needs to be executed and then tearing everything down.

Simply doing a child_process.exec per command to launch an external process, run the command, and then kill the process is not really an option in such scenarios, as it is expensive and very singular in nature; no state can be maintained if need be. We also tried using edge.js with powershell and this simply would not work with o365 exchange commands and heavy session cmdlet imports (the entire node.js process would crash). Using this module gives you full, unfettered access to the externally connected child_process, with no restrictions other than the uid/gid (permissions) the spawned process is running under (which you really have to consider from a security standpoint!).

The diagram on the project page should conceptually give you an idea of what this module does: process pooling, custom init/destroy commands, process auto-invalidation configuration, command history retention, etc. See here for full details: https://github.com/bitsofinfo/stateful-process-command-proxy
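
As a rough illustration, constructing a proxy around a small pool of bash processes looks something like the sketch below. The option names are simplified from memory and may not match the module’s actual configuration keys exactly, so treat the README as the authority:

var StatefulProcessCommandProxy = require('stateful-process-command-proxy');

var proxy = new StatefulProcessCommandProxy({
  name: 'bashPool',                  // illustrative configuration
  max: 2,                            // pool bounds: up to 2 long-lived shells
  min: 1,
  idleTimeoutMS: 30000,
  processCommand: '/bin/bash',       // the long-lived shell to spawn (could be powershell.exe)
  processArgs: ['-s'],
  initCommands: ['cd /tmp'],         // run once when a pooled process is created
  preDestroyCommands: ['echo bye'],  // run just before a pooled process is destroyed
  processRetainMaxCmdHistory: 10     // retain a small command history for status/inspection
});

// commands run against an already-running shell, so state (cwd, sessions, imports) is retained
proxy.executeCommand('pwd')
  .then(function(result) {
    console.log(result.stdout);      // reflects the 'cd /tmp' from initCommands
    return proxy.shutdown();
  });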

Obviously this module can expose you to some insecure situations depending on how you use it… you are providing a gateway to an external process via Node on your host OS (likely a shell in most use cases). Here are some tips; ultimately it’s your responsibility to secure your system.

  • Ensure that the node process is running as a user with very limited rights
  • Make use of the uid/gid configuration appropriately to further limit the processes
  • Never expose calls to this module directly; instead, write a wrapper layer around StatefulProcessCommandProxy that protects, analyzes and sanitizes any external input that can materialize in a command statement.
  • All commands you pass to execute should be sanitized to protect from injection attacks
  • Make use of the whitelist and blacklist command features of this module
  • WRAP this service via additional code that sanitizes all arguments to protect from command injection

Hopefully this will help others out there who have a similar need: https://github.com/bitsofinfo/stateful-process-command-proxy