Happy New Lunar Year 2019

February 5, 2019

This blog post is a little treat for all geeks. In short, we will introduce Logstash to ingest data into Elasticsearch. The occasion is today's celebration of the Lunar New Year, or Chinese New Year. 2019 is the Year of the Pig. In Chinese culture, pigs are a symbol of wealth, and their round faces and big ears are signs of fortune as well. The common belief is that pigs are blessed with luck and have a beautiful personality. See below the fortune output for the Year of the Pig.

Fortune with pig

Real and digital fortune cookies

When it comes to luck, there is a great misconception about fortune cookies in Asian culture. Fortune cookies are often served as a dessert in Chinese restaurants in the United States and other Western countries, but they are not a tradition in Asia. A served cookie contains a fortune with some wisdom or a vague prophecy. On Unix systems, the fortune application does the same on the terminal: it displays a random message from a database of quotations. I have created a database set for geeks, which holds some favourite tweets from Programming Wisdom. See below a quote.

Fortune with wisdom
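If you want to try the database on your own terminal, a minimal sketch looks like this (it assumes fortune and strfile are installed and that the database file is named programming-wisdom):

$ strfile programming-wisdom    # builds the programming-wisdom.dat index that fortune needs
$ fortune ./programming-wisdom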

Example Data

The database for fortune is a simple text file, which has these format rules: each cookie is plain text and may span several lines, and individual entries are separated by a line containing a single % character.

We take a small data excerpt for our Logstash demo:
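A couple of entries might look like this sketch (the first quote matches the sample event shown further below; the second is simply another well-known programming quote added for illustration):

"A program is like a poem: you cannot write a poem without writing it."
		-- E.W Dijkstra
%
"Talk is cheap. Show me the code."
		-- Linus Torvalds
%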

Elasticsearch Storage

Logstash is like a Swiss Army knife: you can pre-process nearly any kind of data for Elasticsearch. In Elasticsearch, we want to store the quote content and the person. Once the data is in Elasticsearch, we can search for wisdom whenever we need it, for example in a presentation. We could tweak the text analysis, but we are going to deal with that in a separate future blog post.

To store the data, you can provide Logstash with an index template.
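A template along the following lines does the job. It is a minimal sketch assuming Elasticsearch 6.x; the field names come from the event shown later, while the index pattern fortune* is an assumption:

{
  "index_patterns": ["fortune*"],
  "mappings": {
    "_doc": {
      "properties": {
        "quote": {
          "type": "text"
        },
        "person": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}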

For the quotes, we choose to store analysed text only; for the person, we keep the default multi-field approach: we may search for names, but we might also want to run aggregations on persons.

Logstash configuration

A Logstash configuration consists of three sections:

input

As input, we take the file. We apply the multiline codec and configure it so that each group of lines belonging to an entry beginning with % is treated as one event.
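A sketch of the input section; the path is taken from the sample event below, while start_position and sincedb_path are assumptions so that the whole file is read on every run:

input {
  file {
    path => "/home/vinh/development/projects/geek-fortune-cookies/databases/programming-wisdom"
    # read the whole file from the top instead of tailing it
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # every line that does not start with % belongs to the previous event
      pattern => "^%"
      negate => true
      what => "previous"
    }
  }
}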

filter

The filter section contains the commands to clean up the data and split each event into the fields quote and person.
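One possible sketch of this section; the gsub clean-up and the grok pattern, which splits on the -- attribution marker, are assumptions about how the entries are structured:

filter {
  # drop the leading % separator line from the multiline event
  mutate {
    gsub => [ "message", "^%\n", "" ]
  }
  # everything before the -- marker is the quote, everything after it the person
  grok {
    match => { "message" => "(?m)(?<quote>.*?)\s*--\s*(?<person>.*)" }
  }
  mutate {
    remove_field => [ "message" ]
  }
}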

output

After the filter section, Logstash emits, for instance, this event:

{
          "tags" => [
        [0] "multiline"
    ],
          "path" => "/home/vinh/development/projects/geek-fortune-cookies/databases/programming-wisdom",
      "@version" => "1",
    "@timestamp" => 2019-02-04T21:47:54.641Z,
         "quote" => "\"A program is like a poem: you cannot write a poem without writing it.\" ",
          "host" => "docker-logstash",
        "person" => "E.W Dijkstra"
}

We use Elasticsearch for storage, so the output uses the elasticsearch output plugin.
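A sketch of the output section; the host, the index name, and the template path are assumptions:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "fortune"
    # install the index template so Elasticsearch maps the fields as intended
    template => "/etc/logstash/templates/fortune-template.json"
    template_name => "fortune"
    template_overwrite => true
  }
}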

The noteworthy part is that we provide the template options to tell Elasticsearch how to store the data.

In the end, you will see the ingested data displayed in Kibana.

Data in Kibana

Summary

As demonstrated, Logstash is a powerful tool for dealing with data in a challenging format, and the fortune database for geeks was one such example. The configuration itself is challenging, but with proper help, guidance and patience you can get almost any data into Elasticsearch. We wish you a happy and fortunate Year of the Pig!

About the author: Vinh Nguyên

Loves to code, hike and mostly drink black coffee. Favors Apache Kafka, Elasticsearch, Java Development and 80's music.
