Collecting Logs In Elasticsearch With Filebeat and Logstash

You are lucky if you’ve never been caught, on either side, in a confrontation between devops and developers during your career. In this post I’ll show a solution to an issue that is often under dispute: access to application logs in production.

The issue at hand

Imagine you are a devops engineer responsible for running company applications in production. The applications are supported by developers who obviously don’t have access to the production environment and, therefore, to production logs.

Imagine that each server runs multiple applications, and the applications store their logs in /var/log/apps. A server with two running applications will have the following log layout:

$ tree /var/log/apps
/var/log/apps
├── alice.log
└── bob.log

The problem: How to let developers access their production logs efficiently?

A solution

Feeling the developers’ pain (or getting pissed off by regular “favours”), you decide to collect all application logs in Elasticsearch, where every developer can search them. The simplest implementation would be to set up Elasticsearch and configure Filebeat to forward application logs directly to Elasticsearch.

Elasticsearch

I gave a quick intro to Elasticsearch and described how to install it in my previous post, so have a look there if you don’t know how to do it.

Filebeat

Filebeat, which replaced Logstash-Forwarder some time ago, is installed on your servers as an agent. It monitors log files and can forward them directly to Elasticsearch for indexing.

A Filebeat configuration that solves the problem by forwarding logs directly to Elasticsearch could be as simple as:

filebeat:
  prospectors:
    -
      paths:
        - /var/log/apps/*.log
      input_type: log

output:
  elasticsearch:
    hosts: ["localhost:9200"]

It’ll work. Developers will be able to search for logs using the source field, which is added by Filebeat and contains the log file path.

Note that I used localhost with the default port and the bare minimum of settings.

If you’re paranoid about security, you have probably raised your eyebrows already: developers shouldn’t know where the logs live. But that is a different story.

I bet developers will get pissed off with this solution very soon. They have to search by the full log file path, or they risk getting unrelated records from logs with similar names. The problem is aggravated if you run applications inside Docker containers managed by Mesos or Kubernetes.
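For example, a query pinned to the full path could look like the sketch below; the index pattern, the exact path and the use of a match_phrase query are my assumptions rather than something the setup above dictates:

$ curl -s 'http://localhost:9200/_all/_search?pretty' \
    -H 'Content-Type: application/json' \
    -d '{ "query": { "match_phrase": { "source": "/var/log/apps/bob.log" } } }'

Not too bad with two applications, but it quickly becomes painful once the paths are generated by a container runtime.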

A better solution

A better solution would be to introduce one more step. Instead of sending logs directly to Elasticsearch, Filebeat should send them to Logstash first. Logstash will enrich the logs with metadata to enable simple and precise search, and then forward the enriched logs to Elasticsearch for indexing.

Logstash

Logstash is the best open source data collection engine with real-time pipelining capabilities. Logstash can cleanse logs, create new fields by extracting values from the log message and other fields using a very powerful, extensible expression language, and a lot more.

Introducing a new app field that carries the application name extracted from the source field would be enough to solve the problem.

Final configuration

The Filebeat configuration will change to

filebeat:
  prospectors:
    -
      paths:
        - /var/log/apps/*.log
      input_type: log

output:
  logstash:
    hosts: ["localhost:5044"]

and the Logstash configuration will look like

input {
    beats {
        port => "5044"
    }
}

filter {
    grok {
        match => { "source" => "%{GREEDYDATA}/%{GREEDYDATA:app}.log" }
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
    }
}
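Assuming the Logstash pipeline above is saved as apps.conf and the Filebeat settings live in /etc/filebeat/filebeat.yml (both file names are my assumptions), the two agents can be started roughly like this; exact paths depend on how you installed them:

# run Filebeat with the configuration above, logging to stderr
$ filebeat -e -c /etc/filebeat/filebeat.yml

# run Logstash with the pipeline above
$ bin/logstash -f apps.conf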

Both configuration files are self-explanatory. The only snippet deserving explanation is:

grok {
    match => { "source" => "%{GREEDYDATA}/%{GREEDYDATA:app}.log" }
}

If the source field has the value “/var/log/apps/alice.log”, the match will extract the word alice and set it as the value of the newly created app field.
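A quick way to check that the enrichment works is to pull back a single indexed event and look at its fields; the query below is only a sketch and assumes the default index naming:

$ curl 'http://localhost:9200/_all/_search?q=app:alice&size=1&pretty'

The _source of the returned hit should now contain the original message together with the extracted app field.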

The bottom line

The final solution is way better. Developers can run exact term queries on the app field, e.g.:

$ curl 'http://localhost:9200/_all/_search?q=app:bob&sort=@timestamp:asc&sort=offset:asc&fields=message&pretty' | grep message

with output

        "message" : [ "Bob message 1" ]
        "message" : [ "Bob message 2" ]
        "message" : [ "Bob message 3" ]
        "message" : [ "Bob message 4" ]
        "message" : [ "Bob message 5" ]
        "message" : [ "Bob message 6" ]
        "message" : [ "Bob message 7" ]
        "message" : [ "Bob message 8" ]
        "message" : [ "Bob message 9" ]
        "message" : [ "Bob message 10" ]

I hope you guessed it was a joke. Install Kibana for log browsing to make developers ecstatic.