Make sense of your logs with Elastic Search - part 3: reading log lines with Filebeat

2019-04-06

In the earlier parts of this series, we've discovered what Elastic Search is and how it can help us analyze our logs.

In the previous part, we downloaded and installed the necessary software. Now the time has come to configure the components, starting with Filebeat.


What is Filebeat, again?

Filebeat is a component that watches your log files for new lines. When it finds new lines, it sends them on to the next component, which in our case is Logstash.

Configuring Filebeat

Filebeat can be configured with a single file: filebeat.yml.

This file can be found in my example Git repo in the filebeat folder.

An example:

    filebeat: 
      registry_file: /usr/share/filebeat/data/.registry 
      inputs: 
        - type: log 
          enabled: true 
          paths: 
            - /app/logs/*/*.log 
            - /opt/jboss/wildfly/standalone/log/*.log 
          multiline.pattern: "^([0-9]{4}-[0-9]{2}-[0-9]{2})" 
          multiline.negate: true
          multiline.match: after 
          fields: 
            application: UnionVMS 
            environment: production 
    output: 
      logstash: 
        hosts: ["logstash:5044"] 

filebeat.registry_file

The registry file keeps a record of which log lines have already been sent to Logstash, so that Filebeat can determine which lines are new. We'll get back to this later in this blog post.

You can define the location and file name of the registry file with this property.

filebeat.inputs

An input is a definition of which files to keep track of, and what to do with them.

Paths

In the “paths” section, you define the list of log files to track. You can use * as a wildcard.

Example:

    ...     
      inputs: 
          ... 
          paths: 
            - /app/logs/*/*.log 
            - /opt/jboss/wildfly/standalone/log/*.log 
    ...
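To see what such a pattern matches, you can try the same wildcard expansion yourself. Python's glob module follows the same convention; this is just a quick sketch, and the directory layout is only an example:

```python
import glob

# Expand the same wildcard patterns Filebeat uses; each * matches one path segment.
# The directories below are examples; adjust them to your own layout.
for path in sorted(glob.glob("/app/logs/*/*.log")):
    print(path)
```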

Multiline logs

By default, Filebeat sends every line to Logstash as a separate event.

2018-04-13 19:14:19 INFO e.e.e.f.u.e.s.bean.PluginServiceBean - ConfigModule updated parameter table with settings of plugins
2018-04-13 19:14:19 INFO e.e.e.f.u.e.s.bean.ExchangeServiceBean - Upsert settings in service layer: eu.europa.ec.fisheries.uvms.plugins.inmarsat SettingListType[setting=[SettingType[key=eu.europa.ec.fisheries.uvms.plugins.inmarsat.USERNAME,value=USERNAME]]] UVMS
2018-04-13 19:14:19 INFO e.e.e.f.u.e.b.ServiceRegistryModelBean - Update plugin settings for eu.europa.ec.fisheries.uvms.plugins.inmarsat
2018-04-13 19:14:19 INFO e.e.e.f.u.e.m.c.bean.MessageConsumerBean - ExchangeBaseRequest received in Exchange Message MDB:null

But... what happens in case of an exception?

2018-04-13 19:11:01 ERROR eu.europa.ec.mare.usm.jwt.jndiUtil - Could not bind value to key USM/secretKey
java.lang.RuntimeException: javax.naming.NamingException: Failed to bind [secret] at location [service jboss.naming.context.java.USM.secretKey] [Root exception is org.jboss.msc.service.DuplicateServiceException: Service jboss.naming.context.java.USM.secretKey is already registered]
at eu.europa.ec.mare.usm.jwt.jndiUtil.bindParam(jndiUtil.java:108)
at eu.europa.ec.mare.usm.jwt.jndiUtil.createJNDI(jndiUtil.java:23)
at eu.europa.ec.mare.usm.jwt.DefaultJwtTokenHandler.initKey(DefaultJwtTokenHandler.java:264)
at eu.europa.ec.mare.usm.jwt.DefaultJwtTokenHandler.init(DefaultJwtTokenHandler.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

So, in the front-end, a line like "at eu.europa.ec.mare.usm.jwt.DefaultJwtTokenHandler.initKey(DefaultJwtTokenHandler.java:264)" will pop up on its own, without any context. What? What does that mean?

Long story short: we want to see the complete stack trace in the front-end, not just a part of it.

2018-04-13 19:11:01 ERROR eu.europa.ec.mare.usm.jwt.jndiUtil - Could not bind value to key USM/secretKey java.lang.RuntimeException: javax.naming.NamingException: Failed to bind [secret] at location [service jboss.naming.context.java.USM.secretKey] [Root exception is org.jboss.msc.service.DuplicateServiceException: Service jboss.naming.context.java.USM.secretKey is already registered] at eu.europa.ec.mare.usm.jwt.jndiUtil.bindParam(jndiUtil.java:108) at eu.europa.ec.mare.usm.jwt.jndiUtil.createJNDI(jndiUtil.java:23) at eu.europa.ec.mare.usm.jwt.DefaultJwtTokenHandler.initKey(DefaultJwtTokenHandler.java:264) at eu.europa.ec.mare.usm.jwt.DefaultJwtTokenHandler.init(DefaultJwtTokenHandler.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

How can we tell Filebeat that it should consider a stack trace as part of one log line? What can Filebeat use to find the end of a log entry?

We can do that with regular expressions.

Example

Every log entry starts with a date. If a line does not start with a date, consider it to be part of the previous log entry.

    inputs: 
    ...
          multiline.pattern: "^([0-9]{4}-[0-9]{2}-[0-9]{2})" 
          multiline.negate: true 
          multiline.match: after
    ...

Wait, what do “multiline.negate” and “multiline.match” do? A very good explanation, with examples, can be found in this guide.
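The combination of pattern, negate and match can be mimicked in a few lines of Python. This is a simplified sketch of the idea, not Filebeat's actual implementation: with negate: true and match: after, every line that does not match the date pattern is appended to the event started by the last matching line.

```python
import re

# Same pattern as in the Filebeat configuration above
PATTERN = re.compile(r"^([0-9]{4}-[0-9]{2}-[0-9]{2})")

def group_multiline(lines):
    events = []
    for line in lines:
        if PATTERN.match(line) or not events:
            events.append(line)           # a line starting with a date opens a new event
        else:
            events[-1] += "\n" + line     # anything else continues the previous event
    return events

lines = [
    "2018-04-13 19:11:01 ERROR jndiUtil - Could not bind value to key USM/secretKey",
    "java.lang.RuntimeException: javax.naming.NamingException: Failed to bind [secret]",
    "at eu.europa.ec.mare.usm.jwt.jndiUtil.bindParam(jndiUtil.java:108)",
    "2018-04-13 19:14:19 INFO PluginServiceBean - ConfigModule updated parameter table",
]
print(len(group_multiline(lines)))  # 2: the error with its stack trace, and the INFO line
```

The stack trace lines end up inside the first event, which is exactly what we want Logstash to receive.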

Fields

Let’s say you've pumped the logs of multiple environments (test, acceptance and production) into Elastic Search.

A customer sends you an email: “I’ve had a problem yesterday”. Ok, let’s see what the logs say. You open Kibana (the front-end of Elastic Search, we’ll get to that part later) and filter on yesterday's logs. You get the logs of all environments, all mixed up. Oops.

We need to be able to filter on the environment name. Each environment has its own log directory. Based on the log directory, we can say: these are the logs for application x in environment y.

In order to make this distinction, we can add an extra field for each input:

      inputs:
        - type: log
          paths:
            - /logs/unionvms/production/*
          fields:
            application: UnionVMS
            environment: production
        - type: log
          paths:
            - /logs/unionvms/acceptance/*
          fields:
            application: UnionVMS
            environment: acceptance

So there we have it. In the front-end, we can now filter on “fields.environment”!

output.logstash

Here you define where Filebeat should send the log lines to.

Since I use Docker, Filebeat and Logstash run in the same Docker network, where the host name is equal to the container name.

Not using Docker? Fill in the name of the actual host that Logstash runs on. Are Filebeat and Logstash on the same machine? Use localhost.
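If you're unsure whether Filebeat can reach Logstash at all, a quick connectivity check helps. This is a hypothetical helper, not part of Filebeat; it simply tries to open a TCP connection to the host and port you configured under output.logstash:

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. can_reach("logstash", 5044) from inside the Docker network,
# or can_reach("localhost", 5044) when Filebeat and Logstash share a machine
```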

How does Filebeat know which logs are new?

Filebeat remembers which lines were sent to Logstash. This information is saved in the registry file.

When a file grows, Filebeat detects this and resumes reading at the offset stored in the registry, so only new log lines are sent to Logstash.
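The idea behind the registry can be sketched in a few lines of Python. This is a simplification, and Filebeat's actual registry format differs; the point is that a byte offset is remembered per file, and reading resumes from there. The registry path below is just an example:

```python
import json

REGISTRY = "/tmp/.registry"  # example location, not Filebeat's default

def load_registry():
    """Load the saved per-file offsets, or start fresh if none exist yet."""
    try:
        with open(REGISTRY) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def read_new_lines(path, registry):
    """Read only the lines added since the last run, then persist the new offset."""
    offset = registry.get(path, 0)
    with open(path) as f:
        f.seek(offset)                  # skip everything already shipped
        new_lines = f.readlines()
        registry[path] = f.tell()       # remember how far we got
    with open(REGISTRY, "w") as f:
        json.dump(registry, f)
    return new_lines
```

Calling read_new_lines twice on the same file returns each line exactly once, even across restarts, because the offset survives in the registry file.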

Where can I find this registry file?

It depends on which environment Filebeat runs. The good news: you can define the location and name of the registry file yourself.

    filebeat: 
      registry_file: /usr/share/filebeat/data/.registry 

In the Docker example, I’ve configured Filebeat to save the registry file in a folder that is mapped by a Docker volume, so that it doesn’t get lost when the container restarts.

What if my log file gets rotated?

When a log file reaches a certain size, or when 24 hours have passed, most logging frameworks start a new log file. The old file gets renamed, often by appending a timestamp to the name. This is called log rotation.

For example:

    logs.txt (current log file) 
    logs_2019-02-27_01.txt (first 10 MB of the logs at 27 February 2019) 
    logs_2019-02-27_02.txt (next bunch of logs at 27 February 2019)

If your file gets rotated, it gets a new name but its contents don’t change. Filebeat identifies files by their inode rather than their name, so it detects that the file was already harvested and logs don’t get sent twice.
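You can see why renaming doesn't fool Filebeat by looking at the file's inode: the name changes, the inode doesn't. A small sketch, using file names borrowed from the rotation example above:

```python
import os

def file_id(path):
    # Filebeat identifies a file by device + inode, not by its name
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

with open("/tmp/logs.txt", "w") as f:
    f.write("old entries\n")
before = file_id("/tmp/logs.txt")

# simulate log rotation: the file is renamed, the contents stay the same
os.rename("/tmp/logs.txt", "/tmp/logs_2019-02-27_01.txt")
after = file_id("/tmp/logs_2019-02-27_01.txt")

print(before == after)  # True: same inode, so the file was already harvested
```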

You can find more info here.

Now it's time for Logstash to interpret the log lines

We've configured Filebeat to send all the log lines, including stack traces and extra fields, to Logstash. The next step is to configure Logstash to interpret the log lines, which we'll do in part 4 of this series.

Me and my son

Stijn Hooft

Software engineer, specialized in Java and web technologies

Mail LinkedIn Github