
Logstash: Failed to flush outgoing items UndefinedConversionError workaround

If you have ever seen an error like the one below in Logstash, it can be frustrating: it blocks and can take your whole pipeline down. There are some outstanding tickets on this, one of which is here. The error can occur when an upstream input's charset is defined as US-ASCII (such as a file input reading an ASCII file), that file contains extended characters, and the events are then sent to an output that needs to convert them to UTF-8 (in my case it was elasticsearch_http). This was with Logstash 1.4.1.

:message=>"Failed to flush outgoing items", :outgoing_count=>1,
:exception=>#<Encoding::UndefinedConversionError: "\xC2" from ASCII-8BIT to UTF-8>

For anyone else out there, here is a simple workaround I put in my filters to pre-process messages from “known” upstream ASCII inputs. This fixed it for me (using the ruby code filter):

ruby {
  code => "begin; if !event['message'].nil?; event['message'] = event['message'].force_encoding('ASCII-8BIT').encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?'); end; rescue; end;"
}

To create a little log file to reproduce this try:

perl -e 'print "\xc2\xa0\n"' > test.log
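Outside of Logstash, the core of the workaround is plain Ruby: reinterpret the bytes as ASCII-8BIT and transcode to UTF-8, replacing anything that cannot be converted. A minimal standalone sketch (the `to_utf8` helper name is mine, not part of Logstash):

```ruby
# Transcode a byte string to valid UTF-8, replacing undefined/invalid
# sequences with '?' -- the same approach used in the filter above.
def to_utf8(message)
  message.force_encoding('ASCII-8BIT')
         .encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?')
end

# "\xC2\xA0" is the same byte pair from the test.log repro; read as raw
# bytes, each byte >0x7F has no direct UTF-8 conversion and is replaced.
puts to_utf8("\xC2\xA0test")  # => ??test
```

Lossy, but it beats a blocked pipeline; if you need to preserve the characters, fix the charset on the input instead.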

Processing ModSecurity audit logs with Fluentd

Recently I had a need to take tons of raw ModSecurity audit logs and make use of them. I first used Logstash and then attempted it with Apache Flume (see previous articles). Next in line was Fluentd, which is what this article is about. Long story short, I ended up writing a Fluentd output plugin that takes the output of the tail multiline plugin and formats it into a more structured, first-class object like the example below.

The Modsecurity Fluentd plugin is located here on Github: https://github.com/bitsofinfo/fluentd-modsecurity

  1. Get some audit logs generated from modsecurity and throw them into a directory
  2. Edit your fluentd config file and customize its input to use the tail multiline plugin and then the modsecurity plugin, an example is here.
  3. Customize your output(s)
  4. On the command line: "fluentd ./fluent.conf -vv"

This was tested against the latest version of Fluentd available at the time of this article.

The end result is that with this configuration, your raw ModSecurity audit log entries end up looking something like the JSON example below. Again, this is just how I chose to structure the fields via the filters; you can fork and modify the plugin to output a different format, or even make it more configurable.
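To give a feel for what the plugin's filters are doing, here is a simplified sketch (not the actual plugin code; `parse_section_a` is a name I made up) of extracting the header fields from the "A" section line of a ModSecurity audit entry:

```ruby
# ModSecurity audit "A" section lines look like:
#   [08/Nov/2013:06:22:59 --0400] C5g8kkk0002012221222 192.168.1.22 34156 192.168.0.2 80
# i.e. [timestamp] uniqueId sourceIp sourcePort destIp destPort
SECTION_A = /\A\[([^\]]+)\]\s+(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\d+)/

def parse_section_a(line)
  m = SECTION_A.match(line)
  return nil unless m
  {
    'modsec_timestamp' => m[1],
    'uniqueId'         => m[2],
    'sourceIp'         => m[3],
    'sourcePort'       => m[4],
    'destIp'           => m[5],
    'destPort'         => m[6]
  }
end

event = parse_section_a('[08/Nov/2013:06:22:59 --0400] C5g8kkk0002012221222 192.168.1.22 34156 192.168.0.2 80')
puts event['uniqueId']  # => C5g8kkk0002012221222
```

The request/response header sections (B and F) are handled similarly, splitting each "Name: value" line into the nested maps you see below.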

EXAMPLE JSON OUTPUT, using https://github.com/bitsofinfo/fluentd-modsecurity

{
"modsec_timestamp":"08/Nov/2013:06:22:59 --0400",
"uniqueId":"C5g8kkk0002012221222",
"sourceIp":"192.168.1.22",
"sourcePort":"34156",
"destIp":"192.168.0.2",
"destPort":"80",
"httpMethod":"GET",
"requestedUri":"/myuri/x",
"incomingProtocol":"HTTP/1.1",
"myCookie":"myCookie=testValue",
"requestHeaders":{
"Host":"accts.x4.bitsofinfo2.com",
"Connection":"keep-alive",
"Accept":"*/*",
"User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) Safari/537.36",
"Referer":"https",
"Accept-Encoding":"gzip,deflate,sdch",
"Accept-Language":"en-US,en;q=0.8",
"Cookie":"myCookie=testValue; myapp_sec_7861ac9196050da; special=ddd",
"Incoming-Protocol":"HTTPS",
"X-Forwarded-For":"192.1.33.22"
},
"XForwardedFor":"192.1.33.22",
"XForwardedFor-GEOIP":{
      "country_code":"TW",
      "country_code3":"TWN",
      "country_name":"Taiwan",
      "region":"03",
      "region_name":"T'ai-pei",
      "city":"Taipei",
      "latitude":25.039199829101562,
      "longitude":121.5250015258789
   },
"serverProtocol":"HTTP/1.1",
"responseStatus":"200 OK",
"responseHeaders":{
"Vary":"Accept-Encoding",
"Expires":"Fri, 08 Aug 2014 10",
"Cache-Control":"public, max-age=31536000",
"Content-Encoding":"deflate",
"Content-Type":"application/x-javascript; charset=UTF-8",
"Set-Cookie":"zippy=65.sss31; path=/; domain=accts.x4.bitsofinfo2.com",
"Connection":"close",
"Transfer-Encoding":"chunked"
},
"auditLogTrailer":{
"Apache-Handler":"proxy-server",
"Stopwatch":"1375957379601874 39178 (989 4992 -)",
"Producer":"ModSecurity for Apache (http://www.modsecurity.org/); core ruleset",
"Server":"Apache (party6)",
"messages":[
{
"info":"Warning 1. Operator EQ matched 0 at GLOBAL.",
"file":"/etc/d4/modsechttp_policy.conf",
"line":"120",
"id":"960903",
"msg":"ModSecurity does not support content encodings",
"severity":"WARNING"
},
{
"info":"Warning 2. Operator EQ matched 0 at GLOBAL.",
"file":"/etc/d4/modsechttp_policy.conf",
"line":"120",
"id":"960903",
"msg":"ModSecurity does not support content encodings",
"severity":"WARNING"
}
]
},
"event_date_microseconds":1.375957379601874e+15,
"event_date_milliseconds":1375957379601.874,
"event_date_seconds":1375957379.601874,
"event_timestamp":"2013-08-08T10:22:59.601Z",
"secRuleIds":[
"960011",
"960904",
"960903"
],
"matchedRules":[
"SecRule \"REQUEST_METHOD\" \"@rx ^(?:GET|HEAD)$\" ",
"SecRule \"&REQUEST_HEADERS:Content-Type\" \"@eq 0\" \"phase:2,deny,status:406,t:lo",
"SecRule \"REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Ref",
"SecAction \"phase:2,status:406,t:lowercase,t:replaceNulls,t:compres",
"SecRule \"&GLOBAL:alerted_960903_compression\" \"@eq 0\" \"phase:2,log,deny,status:406,t:lower"
]
}
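The various `event_date_*` fields in the example are simple derivations of the first token of the `Stopwatch` trailer value (an epoch timestamp in microseconds). A quick sketch of the arithmetic:

```ruby
micros = 1375957379601874   # first token of the Stopwatch value above

# Time.at(seconds, microseconds) keeps full precision
t = Time.at(micros / 1_000_000, micros % 1_000_000).utc
puts t.strftime('%Y-%m-%dT%H:%M:%S.%LZ')  # => 2013-08-08T10:22:59.601Z

puts micros / 1_000_000.0   # event_date_seconds
puts micros / 1_000.0       # event_date_milliseconds
```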

Deserializing Modsecurity Audit logs with Apache Flume

This post will be updated in the coming days/weeks. When looking at using Apache Flume to ingest some ModSecurity audit logs, it quickly became apparent that Flume's SpoolingDirectorySource lacked the ability to de-serialize "events" from a file when each event spans many newlines (\n). Lacking this support, and seeing that an outstanding ticket already existed on a related subject at https://issues.apache.org/jira/browse/FLUME-1988, I went ahead and coded one up.

Please see RegexDelimiterDeSerializer and its corresponding unit test attached to FLUME-1988. Hopefully this can be included in an actual Flume release; in the meantime you should be able to include this and the related classes in a local copy of the Flume source and do your own build to get this functionality. The net result of this regex patch is that each ModSecurity audit log entry (which spans many lines) is collapsed into *one* Flume message. What you do next is up to you, but the natural next step is to pump this into the Flume Morphline Interceptor and begin grokking and parsing the raw multi-line ModSecurity event. Note there are some possible synergies and re-use of regexes once you start using Morphlines together with the Grok patterns we came up with for my Logstash based solution.
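The idea behind the deserializer is easy to illustrate (this is a simplified Ruby sketch; the actual patch attached to FLUME-1988 is Java): accumulate lines until the event-end regex matches, optionally keeping the matching line in the event.

```ruby
# Simplified sketch of regex-delimited event splitting: group lines into
# events, closing the current event whenever end_regex matches a line.
def split_events(lines, end_regex, include_end_line: true)
  events, current = [], []
  lines.each do |line|
    if line =~ end_regex
      current << line if include_end_line
      events << current.join("\n")
      current = []
    else
      current << line
    end
  end
  events << current.join("\n") unless current.empty?
  events
end

# ModSecurity terminates each audit entry with a --xxxxxxxx-Z-- marker
lines = ['--a1b2c3d4-A--', 'entry one body', '--a1b2c3d4-Z--',
         '--deadbeef-A--', 'entry two body', '--deadbeef-Z--']
events = split_events(lines, /--[a-fA-F0-9]{8}-Z--/)
puts events.length  # => 2
```

This mirrors the `eventEndRegex` / `includeEventEndRegex` properties in the Flume config snippet below.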

a) clone the official Flume source code

b) Drop in the files attached to FLUME-1988 into your cloned source of Flume

c) Follow the instructions located here to modify the source so that you can build a Flume snapshot distro that contains all the dependencies for Morphline (https://groups.google.com/a/cloudera.org/d/msg/cdk-dev/7T4pTebdWN4/sBHGkoS70LkJ)

d) From the root of the Flume project run "mvn install -DskipTests=true", then take the tarball generated in "flume-ng-dist/target" and copy it somewhere else (this is the freshly built Flume dist with the regex deserializer support)

e) Go to where you extracted the distro, whittle up your own flume config file and morphline config using the snippets below, and then run "bin/flume-ng agent --conf conf --conf-file conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent"

Here is a sample flume config snippet that uses this:


agent.sources = src1
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined
agent.sources.src1.type = spooldir
agent.sources.src1.channels = memoryChannel
agent.sources.src1.spoolDir = /path/to/my_modsec_logs
agent.sources.src1.deserializer = REGEX
agent.sources.src1.deserializer.outputCharset = UTF-8
agent.sources.src1.deserializer.eventEndRegex = --[a-fA-F0-9]{8}-Z--
agent.sources.src1.deserializer.includeEventEndRegex = true

agent.sources.src1.interceptors = morphlineinterceptor
agent.sources.src1.interceptors.morphlineinterceptor.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
agent.sources.src1.interceptors.morphlineinterceptor.morphlineFile = /path/to/conf/morphline.conf
agent.sources.src1.interceptors.morphlineinterceptor.morphlineId = morphline1

 

Next is a sample "morphline.conf" configuration which will simply emit each ModSecurity message from the audit log to standard out when running Flume. You can do the rest from there (have fun parsing); please refer to the Morphlines reference documentation for the available commands.

morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**"]

    commands : [
      {
        readMultiLine {
          regex: ".*"
          charset : UTF-8
        }
      }

      # log the record at DEBUG level to SLF4J
      { logDebug { format : "output record: {}", args : ["@{}"] } }

    ]
  }
]