Deserializing Modsecurity Audit logs with Apache Flume

This post will be updated in the coming days/weeks, however when looking at using Apache Flume to ingest some ModSecurity Audit logs, it quickly became apparent that Flume’s SpoolingDirectorySource lacked the ability to de-serialized “events” from a file that spanned many “new lines” (\n). Lacking this support, and seeing that an outstanding ticket already existed on a related subject at I went ahead and coded one up.

Please see RegexDelimiterDeSerializer and its corresponding unit test attached to FLUME-1988. Hopefully this can be included in an actual Flume release. In the meantime you should be able to include this and the related classes in a local copy of the Flume source and do your own build to get this functionality. The net result of using this regex patch is that each ModSecurity audit log entry (that spans many lines) will be summarized into *one* flume message. What you do next is up to you, however the next best thing is to pump this into the Flume Morphline Interceptor to then begin grokking and parsing the raw multi-lined modsec event. Note there are some possible synergies and re-use of regexes once you start using Morphlines and the Grok patterns we came up with for use with my Logstash based solution.

a) clone the official Flume source code

b) Drop in the files attached to FLUME-1988 into your cloned source of Flume

c) Follow the instructions located here to modify the source so that you can have a flume snapshot distro, that contains all the dependencies for Morphline (

d) From the root of the flume project run “mvn install -DskipTests=true” and take the tarball generated in “flume-ng-dist/target” and copy it somewhere else. (this is the freshly built Flume dist w/ the regex deserializer support)

e) Go to where you extracted the distro, widdle up your own flume config file and morphline config using the snippets below and then run “bin/flume-ng agent –conf conf –conf-file conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent”

Here is a sample flume config snippet that uses this:

agent.sources = src1
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined
agent.sources.src1.type = spooldir
agent.sources.src1.channels = memoryChannel
agent.sources.src1.spoolDir = /path/to/my_modsec_logs
agent.sources.src1.deserializer = REGEX
agent.sources.src1.deserializer.outputCharset = UTF-8
agent.sources.src1.deserializer.eventEndRegex = --[a-fA-F0-9]{8}-Z--
agent.sources.src1.deserializer.includeEventEndRegex = true

agent.sources.src1.interceptors = morphlineinterceptor
agent.sources.src1.interceptors.morphlineinterceptor.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
agent.sources.src1.interceptors.morphlineinterceptor.morphlineFile = /path/to/conf/morphline.conf
agent.sources.src1.interceptors.morphlineinterceptor.morphlineId = morphline1


Next is a sample “morphline.conf” configuration which will just emit each ModSecurity message from the audit log to standard out when running Flume. You can do the rest from there (have fun parsing). Please refer to the following morphlines documentation:

morphlines : [
    id : morphline1
    importCommands : ["com.cloudera.**"]

    commands : [
        readMultiLine {
          regex: ".*"
          charset : UTF-8

      # log the record at DEBUG level to SLF4J
      { logDebug { format : "output record: {}", args : ["@{}"] } }


One thought on “Deserializing Modsecurity Audit logs with Apache Flume

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s