Collecting Logs In Elasticsearch With Filebeat and Logstash
You are lucky if you’ve never been involved, on either side, in a confrontation between devops and developers. In this post I’ll show a solution to an issue which is often under dispute: access to application logs in production.
The issue at hand
Imagine you are a devops engineer responsible for running company applications in production. The applications are supported by developers who obviously don’t have access to the production environment and, therefore, to the production logs.
Imagine that each server runs multiple applications, and the applications store logs in /var/log/apps. A server with two running applications will have the following log layout:
$ tree /var/log/apps
/var/log/apps
├── alice.log
└── bob.log
The problem: How to let developers access their production logs efficiently?
A solution
Feeling the developers’ pain (or getting pissed off by regular “favours”), you decided to collect all application logs in Elasticsearch, where every developer can search them. The simplest implementation would be to set up Elasticsearch and configure Filebeat to forward application logs directly to Elasticsearch.
Elasticsearch
I’ve given a quick intro to Elasticsearch, including how to install it, in my previous post, so have a look there if you don’t know how to do it.
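Once Elasticsearch is up, a quick sanity check (assuming the default host and port, which I use throughout this post) is to hit its root endpoint, which returns a small JSON document with the node name and version:
$ curl http://localhost:9200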
Filebeat
Filebeat, which replaced Logstash-Forwarder some time ago, is installed on your servers as an agent. It monitors log files and can forward them directly to Elasticsearch for indexing.
A Filebeat configuration which solves the problem by forwarding logs directly to Elasticsearch could be as simple as:
filebeat:
  prospectors:
    -
      paths:
        - /var/log/apps/*.log
      input_type: log
output:
  elasticsearch:
    hosts: ["localhost:9200"]
It’ll work. Developers will be able to search for logs using the source field, which is added by Filebeat and contains the log file path.
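For example, pulling Bob’s records would require a phrase query on the full file path (a sketch, assuming the layout above):
$ curl 'http://localhost:9200/_all/_search?pretty' -d '
{
  "query": { "match_phrase": { "source": "/var/log/apps/bob.log" } }
}'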
Note that I used localhost with the default port and a bare minimum of settings.
If you’re paranoid about security, you have probably raised your eyebrows already: developers shouldn’t know where the logs live. But that is a different story.
I bet developers will get pissed off with this solution very soon. They have to run term searches with the full log file path, or they risk receiving unrelated records from logs with similar partial names. The problem is aggravated if you run applications inside Docker containers managed by Mesos or Kubernetes.
A better solution
A better solution would be to introduce one more step. Instead of sending logs directly to Elasticsearch, Filebeat should send them to Logstash first. Logstash will enrich the logs with metadata to enable simple, precise search, and then forward the enriched logs to Elasticsearch for indexing.
Logstash
Logstash is the best open source data collection engine with real-time pipelining capabilities. It can cleanse logs, create new fields by extracting values from the log message and other fields using a powerful, extensible expression language, and a lot more.
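For instance, a minimal cleansing filter that drops noisy debug lines could look like this (an illustrative sketch, not part of the final pipeline):
filter {
  if [message] =~ /DEBUG/ {
    drop { }
  }
}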
Introducing a new app field, bearing the application name extracted from the source field, would be enough to solve our problem.
Final configuration
The Filebeat configuration will change to
filebeat:
  prospectors:
    -
      paths:
        - /var/log/apps/*.log
      input_type: log
output:
  logstash:
    hosts: ["localhost:5044"]
and the Logstash configuration will look like
input {
  beats {
    port => "5044"
  }
}

filter {
  grok {
    match => { "source" => "%{GREEDYDATA}/%{GREEDYDATA:app}.log" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
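To try the whole chain locally, start Logstash first and then Filebeat, pointing each at its configuration file (the names logstash.conf and filebeat.yml are my assumption):
$ bin/logstash -f logstash.conf
$ filebeat -e -c filebeat.yml
The -e flag makes Filebeat log to stderr, so you can watch it pick up the files.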
Both configuration files are self-explanatory. The only snippet that deserves an explanation is:
grok {
  match => { "source" => "%{GREEDYDATA}/%{GREEDYDATA:app}.log" }
}
If the source field has the value “/var/log/apps/alice.log”, the match will extract the word alice and set it as the value of the newly created app field.
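A quick way to verify the pattern in isolation is to run Logstash with a throwaway config that reads paths from stdin (a sketch; it matches against the message field, since the stdin input produces no source field):
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{GREEDYDATA}/%{GREEDYDATA:app}.log" }
  }
}
output { stdout { codec => rubydebug } }
Piping echo "/var/log/apps/alice.log" into bin/logstash -f test.conf should print an event with app set to alice.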
The bottom line
The final solution is way better. Developers can run exact term queries on the app field, e.g.:
$ curl 'http://localhost:9200/_all/_search?q=app:bob&sort=@timestamp:asc,offset:asc&fields=message&pretty' | grep message
with output
"message" : [ "Bob message 1" ]
"message" : [ "Bob message 2" ]
"message" : [ "Bob message 3" ]
"message" : [ "Bob message 4" ]
"message" : [ "Bob message 5" ]
"message" : [ "Bob message 6" ]
"message" : [ "Bob message 7" ]
"message" : [ "Bob message 8" ]
"message" : [ "Bob message 9" ]
"message" : [ "Bob message 10" ]
I hope you guessed it was a joke. Install Kibana for log browsing to make the developers ecstatic.