elasticsearch, Lucene, Search

Aggregate, backup elasticsearch fs snapshots across a widely distributed cluster

One of the Elasticsearch clusters I’ve worked on spans multiple data-centers around the world and stores some very large indexes. Occasionally we need to get a backup of one of these indexes off the cluster for restoration onto another cluster, but due to the sheer size of these indexes, it’s not practical for us to snapshot them to S3 or even to a shared NFS mount (as the cluster spans multiple data-centers). The local file-system “fs” snapshot type is therefore the only one really usable for us in this scenario, but what you end up with is parts of the snapshot distributed across individual nodes all over the world.

So there was a need for a tool to automate the task of collecting all of the individual snapshot “parts” and downloading them to a central machine. If you’ve ever looked into the actual format of an Elasticsearch snapshot, it’s a little tedious: you can’t just blindly copy over the contents of the snapshot shard directories, as ES smartly takes snapshots as diffs and keeps track of which files are relevant for each snapshot in metadata files. See here for an excellent overview: https://www.found.no/foundation/elasticsearch-snapshot-and-restore/.

So in the end I came up with elasticsearch-snapshot-manager (Scala) as a tool for handling all of this (analyzing, aggregating, downloading).

This tool is intended to aid with the following scenario:

  1. You have a large elasticsearch cluster that spans multiple data-centers
  2. You have a “shared filesystem snapshot repository” whose physical location is local to each node and actually NOT on a “shared device” or logical mountpoint (i.e. due to (1) above, the snapshots reside on local disk only).
  3. You need a way to execute the snapshot, then easily collect all the different parts of that snapshot, which are spread over N nodes across your cluster
  4. This tool is intended to automate that process…
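
The snapshot half of step 3 comes down to two calls against the standard Elasticsearch snapshot REST API: register an “fs” repository, then create a snapshot in it. Below is a minimal Python sketch of those two requests. It is not code from elasticsearch-snapshot-manager; the base URL, repository name, location and index name used in the usage note are made-up placeholders, and the functions only build the requests rather than send them.

```python
import json
from urllib.parse import quote


def fs_repository_request(base_url, repo, location):
    """Build the request that registers a local "fs" snapshot repository."""
    body = {"type": "fs", "settings": {"location": location, "compress": True}}
    url = f"{base_url}/_snapshot/{quote(repo, safe='')}"
    return ("PUT", url, json.dumps(body))


def snapshot_request(base_url, repo, snapshot, indices):
    """Build the request that snapshots the given indices into the repository."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    url = (
        f"{base_url}/_snapshot/{quote(repo, safe='')}/{quote(snapshot, safe='')}"
        "?wait_for_completion=true"
    )
    return ("PUT", url, json.dumps(body))
```

For example, snapshot_request("http://localhost:9200", "local_fs", "snap_1", ["big_index"]) yields a PUT to http://localhost:9200/_snapshot/local_fs/snap_1?wait_for_completion=true. Once that call returns, each node holds only its own part of the snapshot under the repository location, which is the point where the collection problem described above begins.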

Please see the GitHub project for all the details: https://github.com/bitsofinfo/elasticsearch-snapshot-manager (feedback appreciated).

AWS, cloud computing, Design Patterns, HTTP, PaaS, REST

Book review: Building Microservices

Recently I read Sam Newman’s “Building Microservices”; at ~280 pages it’s a fairly quick read. The reviews of this book are mixed overall, and I can see where readers are coming from. From the title one might expect some coverage of the microservices frameworks out there, concrete examples, maybe some actual code… but you won’t really find that here. Instead you will find a pretty good overview of various architectural approaches to modern application design, covering general topics such as proper separation of concerns, unit-testing, continuous integration, automation, infrastructure management, service discovery, fault tolerance, high-availability and security.

In reality, none of the principles covered in this book are the exclusive domain of “microservice” application architectures; they can (and should) be applied to any application you are considering deploying, whether it’s a “monolithic” application or a suite of microservices interacting as parts of a larger functioning application.

In that light I think this book is definitely a good read and worth a look, if for nothing more than to ensure your team gets a refresher on good design principles and how they can be realized with some of the newer frameworks and tool sets that have come out of our community in recent years. The material presented is sound.

hadoop, Spark, YARN

Fix: HDP, YARN, Spark “check your cluster UI to ensure that workers are registered and have sufficient resources”

Are you trying to submit a Spark job over YARN on an HDP Hadoop cluster and encountering these kinds of errors (shown below)?

If so, just add the following two lines to your [spark-home]/conf/spark-defaults.conf file:

# customize for your HDP version...

spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
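
If you manage several clusters, adding these lines can be scripted. The following is a minimal Python sketch, assuming the file uses the usual "key value" (or "key=value") layout; the function name is made up for illustration. It only appends a property when the key is missing, and deliberately does not try to merge the flag into an existing extraJavaOptions value.

```python
import re

HDP_KEYS = ("spark.driver.extraJavaOptions", "spark.yarn.am.extraJavaOptions")


def with_hdp_version(conf_text, hdp_version):
    """Return conf_text with the two -Dhdp.version properties appended if absent."""
    lines = conf_text.splitlines()
    present = set()
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # The key is everything before the first whitespace or '='.
            present.add(re.split(r"[=\s]", stripped, maxsplit=1)[0])
    for key in HDP_KEYS:
        if key not in present:
            lines.append(f"{key} -Dhdp.version={hdp_version}")
    return "\n".join(lines) + "\n"
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to re-apply.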

 

ERRORS

The errors you will see below all stem from one root issue on the Spark executor node: the container launch script tries to perform a substitution for ${hdp.version}, for which no variable definition exists. The settings above fix that.
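
To make the root cause concrete: the classpath entry is a template containing ${hdp.version}. When no such property is defined, the placeholder reaches launch_container.sh untouched, and bash rejects it with “bad substitution” because a dot is not valid in a shell variable name. The following Python sketch only imitates that placeholder expansion for illustration; it is not Hadoop’s actual implementation.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(value, properties):
    """Replace ${name} with properties[name]; unknown names are left as-is."""
    return PLACEHOLDER.sub(
        lambda m: properties.get(m.group(1), m.group(0)), value
    )


def unresolved(value):
    """List the placeholder names still present in value."""
    return sorted(set(PLACEHOLDER.findall(value)))


entry = "/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar"

# Without the property, the literal ${hdp.version} survives and later breaks bash.
broken = expand(entry, {})

# With -Dhdp.version=... supplied, the path resolves to a real location.
fixed = expand(entry, {"hdp.version": "2.2.0.0-2041"})
```

Here unresolved(broken) still reports hdp.version, while unresolved(fixed) is empty.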

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

15/04/03 13:40:30 INFO yarn.YarnAllocator: Completed container container_1428072484378_0004_01_000003 (state: COMPLETE, exit status: 1)
15/04/03 13:40:30 INFO yarn.YarnAllocator: Container marked as failed: container_1428072484378_0004_01_000003. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1428072484378_0004_01_000003
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/admin/appcache/application_1428072484378_0004/container_1428072484378_0004_01_000003/launch_container.sh: line 26: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/admin/appcache/application_1428072484378_0004/container_1428072484378_0004_01_000003/launch_container.sh: line 26: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
	at org.apache.hadoop.util.Shell.run(Shell.java:455)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1


2015-04-03 13:41:27,055 INFO  container.Container (ContainerImpl.java:handle(999)) - Container container_1428072484378_0004_02_000013 transitioned from LOCALIZED to RUNNING
2015-04-03 13:41:27,068 INFO  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:buildCommandExecutor(267)) - launchContainer: [bash, /hadoop/yarn/local/usercache/admin/appcache/application_1428072484378_0004/container_1428072484378_0004_02_000013/default_container_executor.sh]
2015-04-03 13:41:27,614 WARN  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container container_1428072484378_0004_02_000013 is : 1
2015-04-03 13:41:27,614 WARN  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(229)) - Exception from container-launch with container ID: container_1428072484378_0004_02_000013 and exit code: 1
ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/admin/appcache/application_1428072484378_0004/container_1428072484378_0004_02_000013/launch_container.sh: line 26: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2015-04-03 13:41:27,614 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(283)) - Exception from container-launch.
2015-04-03 13:41:27,615 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(283)) - Container id: container_1428072484378_0004_02_000013
2015-04-03 13:41:27,615 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(283)) - Exit code: 1

hadoop, NOSQL

Fix: HDP “Unauthorized connection for super-user: oozie from IP 127.0.0.1”

Recently I have been playing with Hortonworks HDP 2.2. I was starting to configure some Oozie workflows, and when submitting a job the first step’s Hive script failed with this error and stack trace:


JA002: Unauthorized connection for super-user: oozie from IP 127.0.0.1

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unauthorized connection for super-user: oozie from IP 127.0.0.1
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy39.getDelegationToken(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:306)
... 30 more

To fix this, SSH into your HDP instance VM, edit /etc/hadoop/conf/core-site.xml, and change the following config to add “localhost”. Save and restart the relevant services, or just reboot your HDP VM instances.


<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>sandbox.hortonworks.com,127.0.0.1,localhost</value>
</property>
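
If you would rather script this edit than make it by hand, the following is a minimal Python sketch of the same change using the standard library’s ElementTree. The function name is made up for illustration, and it assumes the usual <configuration>/<property> layout. Note that ElementTree drops XML comments when it rewrites a document, so treat this as an illustration rather than a drop-in tool for a commented config file.

```python
import xml.etree.ElementTree as ET


def add_proxy_host(core_site_xml, user, host):
    """Return core-site XML text with host added to hadoop.proxyuser.<user>.hosts."""
    root = ET.fromstring(core_site_xml)
    target = f"hadoop.proxyuser.{user}.hosts"
    for prop in root.findall("property"):
        if prop.findtext("name") == target:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            hosts = [h.strip() for h in (value.text or "").split(",") if h.strip()]
            # "*" already allows every host, so nothing needs to be added.
            if host not in hosts and "*" not in hosts:
                hosts.append(host)
            value.text = ",".join(hosts)
            break
    else:
        # The property does not exist yet: create it with the single host.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = target
        ET.SubElement(prop, "value").text = host
    return ET.tostring(root, encoding="unicode")
```

Calling add_proxy_host(xml_text, "oozie", "localhost") on a file whose value is "sandbox.hortonworks.com,127.0.0.1" produces the value shown in the property above.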

angularjs, HTTP, Javascript, node.js, o365, powershell, windows, WS

Execute Powershell commands via Node.js, REST, AngularJS

Building on my last post on stateful-process-command-proxy, this post will cover how you can leverage that Node.js module to expose the capabilities of Powershell cmdlets over a REST API presented through an AngularJS interface. Why would one want to do this, you ask? I covered this in my last post, but I will briefly explain it here.

(Note: what is described below could just as easily be built for Bash processes, as the underlying module supports it.)

The use case came out of the need to automate certain calls to manage various objects within Microsoft’s o365 environment. Unfortunately Microsoft’s GraphAPI does not expose all of the functionality that is available via its suite of Powershell cmdlets for o365 services. Secondly, when you need to do these operations via Powershell, it requires a pre-established remote PSSession to o365, and establishing (and tearing down) a new remote PSSession whenever you need to invoke a cmdlet against a remote resource (remote server, or o365 endpoint) is expensive. Lastly, who wants to actually sit there and manually run these commands when they could be automated and invoked on demand via other means, such as a web-service? This is how stateful-process-command-proxy came to be: it provides the building-block bridge between Node.js and a pool of pre-established Powershell consoles. Once you have Node.js talking to stateful-process-command-proxy, you can build whatever you want on top of that in Node.js to mediate the calls.

Layer one

The first higher-level NPM module that builds on stateful-process-command-proxy is powershell-command-executor.

What this adds on top of stateful-process-command-proxy is probably best described by this diagram:

 

So the main thing to understand is that the module provides the PSCommandService class, which takes a registry of pre-defined “named” commands and their respective permissible arguments. The registry is nothing more than an object full of configuration and is easy to define. You can see an example here in the project, which defines a bunch of named “commands” and their arguments usable for o365 to manipulate users, groups etc. PSCommandService is intended to serve as a decoupling point between the caller and the StatefulProcessCommandProxy; in other words, a place where you can restrict and limit the types of commands and (sanitized) arguments that can ever reach the Powershell processes that are pooled within StatefulProcessCommandProxy.

It is PSCommandService’s responsibility to look up the named command you want to execute, sanitize the arguments and generate a literal Powershell command string that is then sent to the StatefulProcessCommandProxy to be executed. Once the command is received, StatefulProcessCommandProxy is responsible for checking that the command passes its command whitelist and blacklist before executing it. The sample o365Utils.js config file provides a set of pre-canned (usable) examples of init/destroy commands, auto-invalidation commands and whitelist/blacklist configs that you can use when constructing the StatefulProcessCommandProxy that the PSCommandService will use internally.
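
The division of labour described above (look up a named command, reject unknown arguments, quote the values, then check the generated string against a whitelist and blacklist) can be sketched in a few lines. The Python below is only a conceptual illustration: it is not the API of powershell-command-executor or stateful-process-command-proxy, which are Node.js modules, and the registry entry, patterns and function names are all made up. The quoting shown is also the bare minimum; real sanitization needs to be stricter.

```python
import re

# Hypothetical registry: named command -> cmdlet and its permitted arguments.
REGISTRY = {
    "getMailbox": {"cmdlet": "Get-Mailbox", "arguments": ("Identity",)},
}

# Hypothetical filters applied to the final command string.
WHITELIST = [re.compile(r"^Get-Mailbox\b")]
BLACKLIST = [re.compile(r"Invoke-Expression", re.IGNORECASE)]


def quote_ps(value):
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def generate_command(name, args):
    """Turn a named command plus arguments into a literal command string."""
    entry = REGISTRY[name]  # raises KeyError for commands that are not registered
    unknown = set(args) - set(entry["arguments"])
    if unknown:
        raise ValueError(f"arguments not permitted for {name}: {sorted(unknown)}")
    parts = [entry["cmdlet"]]
    for arg in entry["arguments"]:
        if arg in args:
            parts.append(f"-{arg} {quote_ps(args[arg])}")
    return " ".join(parts)


def is_permitted(command):
    """A command must match the whitelist and must not match the blacklist."""
    if any(p.search(command) for p in BLACKLIST):
        return False
    return any(p.search(command) for p in WHITELIST)
```

The point of keeping the two checks separate is the same as in the modules described above: the registry limits what callers can ask for, while the whitelist and blacklist are a last line of defence on whatever string actually reaches the shell.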

Layer two

The next logical step is to expose some way of invoking these pre-canned “commands” to callers. One way to do this is via a web-service.

WARNING: doing such a thing without much thought can expose you to serious security risks. You need to really think about how you will secure access to this layer, the types of commands you expose, your argument sanitization, and the filtering of permissible commands via whitelists and blacklists for injection protection. Another precaution you may want to take is running it only on localhost, for experimental purposes only. READ OWASP’s article on command injection.

OK, with that obvious warning out of the way, here is the next little example project, which provides this kind of layer on top of powershell-command-executor: powershell-command-executor-ui

This project is a simple Node.js ExpressJS app that provides a simple set of REST services that allow the caller to:

  • get all available named commands in the PSCommandService registry
  • get an individual command configuration from the registry
  • generate a command from a set of arguments
  • execute the command via a set of arguments and get the result
  • obtain the “status” of the underlying StatefulProcessCommandProxy and its history of commands

Given the above set of services, one can easily build a user-interface which dynamically lets the user invoke any command in the registry and see the results… and this is exactly what this project does via an AngularJS interface (albeit a bit crude…). See diagrams below.

Hopefully this will be useful to others out there, enjoy.

azure, o365, powershell, security, windows

Encrypting and storing powershell credentials

Please see: https://github.com/bitsofinfo/powershell-credential-encryption-tools

Recently I had the need to store some credentials for a Powershell script (i.e. credentials that I ultimately needed in a PSCredential object). The other requirement was that these credentials be portable and “user” independent, meaning that they could not be encrypted using the DPAPI (Windows Data Protection API), as that binds the “secret” used for the encryption to the currently logged-in user (which reduces the portability and usability of the encrypted credentials). The way to avoid this is to specify the secret key parameters of the ConvertTo-SecureString and ConvertFrom-SecureString commands, which forces them to use AES (strength determined by your key size).
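
On key size: the key passed to these commands must be 128, 192 or 256 bits long (16, 24 or 32 bytes). As a rough illustration, here is a minimal Python sketch of generating such a key. It is not part of the scripts described in this post, and the base64 text encoding is just an assumption about how one might store the key.

```python
import base64
import secrets

VALID_KEY_BITS = (128, 192, 256)


def new_secret_key(bits=256):
    """Return a random key of a length AES accepts (defaults to 256 bits)."""
    if bits not in VALID_KEY_BITS:
        raise ValueError(f"key size must be one of {VALID_KEY_BITS}")
    return secrets.token_bytes(bits // 8)


def encode_key(key):
    """Encode the raw key bytes as base64 text so they can be written to a file."""
    return base64.b64encode(key).decode("ascii")
```

A 256-bit key from new_secret_key() is 32 bytes long; whatever file it ends up in is the thing that has to be protected.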

I ended up coding a few powershell scripts that assist with the creation of a JSON AES-256 encrypted credentials file + secret key, as well as functions you can include in other powershell scripts to load these credentials into usable formats such as PSCredentials, SecureStrings etc.

Please see: https://github.com/bitsofinfo/powershell-credential-encryption-tools

NOTE! The most important thing about using the output from this tool, is properly locking down (i.e. file permissions) the secret key!

The format of the resulting file looks something like this:

{ "username" : "AESEncryptedValue", "password": "AESEncryptedValue" }