Category: Search

Aggregate, backup elasticsearch fs snapshots across a widely distributed cluster

One of the Elasticsearch clusters I’ve worked on is spanned across multiple data-centers around the world and stores some very large indexes. Sometimes, but not often we have the need to get a backup of one of these indexes off of the cluster for restoration onto another cluster, but due to the sheer size of these indexes, its not practical for us to snapshot it to S3 or even a shared NFS mount (as the cluster spans multiple data-centers). Therefore the local file-system “fs” snapshot type is the only one really usable for us in this scenario.. but what you end up with is parts of the snapshot distributed across individual nodes all over the world.

So there was a need for a tool to automate the task of collecting all of the individual snapshot “parts” and downloading them to a central machine. If you’ve ever looked into the actual format of an elasticsearch snapshot its a little tedious… i.e. you just can’t blindly copy over the contents of snapshot shard directory contents as ES smartly does snapshots via diffs and keeping track of what files are relevant for each snapshot in metadata files; see here for an excellent overview:

So in the end I came up with elasticsearch-snapshot-manager (Scala) as a tool for handling all of this (analyzing, aggregating, downloading).

This tool is intended to aid with the following scenario:

  1. You have a large elasticsearch cluster that spans multiple data-centers
  2. You have a “shared filesystem snapshot repository” who’s physical location is local to each node and actually NOT on a “shared device” or logical mountpoint (i.e due to (1) above), the snapshots reside on local-disk only.
  3. You need a way to execute the snapshot, then easily collect all the different parts of that snapshot which are located across N nodes across your cluster
  4. This tool is intended to automate that process…

Please see the github project for all the details @ , feedback appreciated.


Logstash for ModSecurity audit logs

Recently had a need to take tons of raw ModSecurity audit logs and make use of them. Ended up using Logstash as a first stab attempt to get them from their raw format into something that could be stored in something more useful like a database or search engine. Nicely enough, out of the box, Logstash has an embeddable ElasticSearch instance that Kibana can hook up to.  After configuring your Logstash inputs, filters and outputs, you can be querying your log data in no time…. that is assuming writing the filters for your log data takes you close to “no time” which is not the case with modsec’s more challenging log format.

After searching around for some ModSecurity/Logstash examples, and finding only this one (for modsec entries in the apache error log), I was facing the task of having to write my own to deal w/ the modsecurity audit log format…..arrggg!

So after a fair amount of work, I ended up having a Logstash configuration that works for me… hopefully this will be of use to others out there as well. Note, this is certainly not perfect but is intended to serve as an example and starting point for anyone who is looking for that.

The Modsecurity Logstash configuration file (and tiny pattern file) is located here on Github:

  1. Get some audit logs generated from modsecurity and throw them into a directory
  2. Edit the logstash modsecurity config file ( and customize its file input path to point to your logs from step (1)
  3. Customize the output(s) and review the various filters
  4. On the command line:  java -jar logstash-[version]-flatjar.jar agent -v -f  logstash_modsecurity.conf

This was tested against Logstash v1.2.1 through 1.4.2 and relies heavily on Logstash’s “ruby” filter capability which really was a lifesaver to be able to workaround some bugs and lack of certain capabilities Logstash’s in growing set of filters. I’m sure as Logstash grows, much of what the custom ruby filters do can be changed over time.

The end result of it is that with this configuration, your raw Modsec audit log entries, will end up looking something like this JSON example below. Again this is just how I ended up structuring the fields via the filters. You can take the above configuration example and change your output to your needs.

Also note that ModSecurity Audit logs can definitely contains some very sensitive data (like user passwords etc). So you might want to also take a look at using Logstash’s Cipher filter to encrypt certain message fields in transit if you are sending these processed logs somewhere else:

EXAMPLE JSON OUTPUT, using this Logstash configuration

  "@timestamp": "2013-09-17T09:46:16.088Z",
  "@version": "1",
  "host": "razzle2",
  "path": "\/Users\/bof\/who2\/zip4n\/logstash\/modseclogs\/proxy9\/modsec_audit.log.1",
  "tags": [
  "rawSectionA": "[17\/Sep\/2013:05:46:16 --0400] MSZkdwoB9ogAAHlNTXUAAAAD 65183 80",
  "rawSectionB": "POST \/xml\/rpc\/soapservice-v2 HTTP\/1.1\nContent-Type: application\/xml\nspecialcookie: tb034=\nCache-Control: no-cache\nPragma: no-cache\nUser-Agent: Java\/1.5.0_15\nHost:\nAccept: text\/html, image\/gif, image\/jpeg, *; q=.2, *\/*; q=.2\nConnection: keep-alive\nContent-Length: 93\nIncoming-Protocol: HTTPS\nab0044: 0\nX-Forwarded-For:",
  "rawSectionC": "{\"id\":2,\"method\":\"report\",\"stuff\":[\"\",\"X22322mkf3\"],\"xmlrpm\":\"0.1a\"}",
  "rawSectionF": "HTTP\/1.1 200 OK\nX-SESSTID: 009nUn4493\nContent-Type: application\/xml;charset=UTF-8\nContent-Length: 76\nConnection: close",
  "rawSectionH": "Message: Warning. Match of \"rx (?:^(?:application\\\\\/x-www-form-urlencoded(?:;(?:\\\\s?charset\\\\s?=\\\\s?[\\\\w\\\\d\\\\-]{1,18})?)??$|multipart\/form-data;)|text\/xml)\" against \"REQUEST_HEADERS:Content-Type\" required. [file \"\/opt\/niner\/modsec2\/pp7.conf\"] [line \"69\"] [id \"960010\"] [msg \"Request content type is not allowed by policy\"] [severity \"WARNING\"] [tag \"POLICY\/ENCODING_NOT_ALLOWED\"]\nApache-Handler: party-server-time2\nStopwatch: 1379411176088695 48158 (1771* 3714 -)\nProducer: ModSecurity for Apache\/2.7 (http:\/\/\/); core ruleset\/1.9.2.\nServer: Whoisthat\/v1 (Osprey)",
  "modsec_timestamp": "17\/Sep\/2013:05:46:16 --0400",
  "uniqueId": "MSZkdwoB9ogAAHlNTXUAAAAD",
  "sourceIp": "",
  "sourcePort": "65183",
  "destIp": "",
  "destPort": "80",
  "httpMethod": "POST",
  "requestedUri": "\/xml\/rpc\/soapservice-v2",
  "incomingProtocol": "HTTP\/1.1",
  "requestBody": "{\"id\":2,\"method\":\"report\",\"stuff\":[\"\",\"X22322mkf3\"],\"xmlrpm\":\"0.1a\"}",
  "serverProtocol": "HTTP\/1.1",
  "responseStatus": "200 OK",
  "requestHeaders": {
    "Content-Type": "application\/xml",
    "specialcookie": "8jj220021kl==j2899IuU",
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
    "User-Agent": "Java\/1.5.1_15",
    "Host": "",
    "Accept": "text\/html, image\/gif, image\/jpeg, *; q=.2, *\/*; q=.2",
    "Connection": "keep-alive",
    "Content-Length": "93",
    "Incoming-Protocol": "HTTPS",
    "ab0044": "0",
    "X-Forwarded-For": ""
  "responseHeaders": {
    "X-SESSTID": "009nUn4493",
    "Content-Type": "application\/xml;charset=UTF-8",
    "Content-Length": "76",
    "Connection": "close"
  "auditLogTrailer": {
    "Apache-Handler": "party-server-time2",
    "Stopwatch": "1379411176088695 48158 (1771* 3714 -)",
    "Producer": "ModSecurity for Apache\/2.7 (http:\/\/\/); core ruleset\/1.9.2.",
    "Server": "Whoisthat\/v1 (Osprey)",</pre>
"messages": [ { "info": "Warning. Match of \"rx (?:^(?:application\\\\\/x-www-form-urlencoded(?:;(?:\\\\s?charset\\\\s?=\\\\s?[\\\\w\\\\d\\\\-]{1,18})?)??$|multipart\/form-data;)|text\/xml)\" against \"REQUEST_HEADERS:Content-Type\" required.", "file": "\/opt\/niner\/modsec2\/pp7.conf", "line": "69", "id": "960010", "msg": "Request content type is not allowed by policy", "severity": "WARNING", "tag": "POLICY\/ENCODING_NOT_ALLOWED" } ] }, "event_date_microseconds": 1.3794111760887e+15, "event_date_milliseconds": 1379411176088.7, "event_date_seconds": 1379411176.0887, "event_timestamp": "2013-09-17T09:46:16.088Z", "XForwardedFor-GEOIP": { "ip": "", "country_code2": "XZ", "country_code3": "BRZ", "country_name": "Brazil", "continent_code": "SA", "region_name": "12", "city_name": "Vesper", "postal_code": "", "timezone": "Brazil\/Continental", "real_region_name": "Region Metropolitana" }, "matchedRules": [ "SecRule \"REQUEST_METHOD\" \"@rx ^POST$\" \"phase:2,status:400,t:lowercase,t:replaceNulls,t:compressWhitespace,chain,t:none,deny,log,auditlog,msg:'POST request must have a Content-Length header',id:960022,tag:PROTOCOL_VIOLATION\/EVASION,severity:4\"", "SecRule \"REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:\/*|!REQUEST_HEADERS:Referer\" \"@pm jscript onsubmit onchange onkeyup activexobject vbscript: <![cdata[ http: settimeout onabort shell: .innerhtml onmousedown onkeypress asfunction: onclick .fromcharcode background-image: .cookie onunload createtextrange onload <input\" \"phase:2,status:406,t:lowercase,t:replaceNulls,t:compressWhitespace,t:none,t:urlDecodeUni,t:htmlEntityDecode,t:compressWhiteSpace,t:lowercase,nolog,skip:1\"", "SecAction \"phase:2,status:406,t:lowercase,t:replaceNulls,t:compressWhitespace,nolog,skipAfter:950003\"", "SecRule \"REQUEST_HEADERS|XML:\/*|!REQUEST_HEADERS:'\/^(Cookie|Referer|X-OS-Prefs)$\/'|REQUEST_COOKIES|REQUEST_COOKIES_NAMES\" \"@pm gcc g++\" \"phase:2,status:406,t:lowercase,t:replaceNulls,t:compressWhitespace,t:none,t:urlDecodeUni,t:htmlEntityDecode,t:lowercase,nolog,skip:1\"", ], "secRuleIds": [ "960022", "960050" ] }