In the previous post we discussed how to read data in chunks from a relational database with Spring Batch, using Spring Boot to run the job. This post explains how to import that data into Elasticsearch.
Spring Boot provides out-of-the-box support for Elasticsearch that saves a lot of configuration time, so we are going to take full advantage of it. However, in a real-world application you will need the Elasticsearch Java client, which has to be instantiated and configured separately. When no configuration is provided, Spring Boot's Elasticsearch integration automatically creates an embedded Elasticsearch instance, which is very useful for testing.
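To give an idea of what that extra configuration looks like, here is a minimal sketch of a Spring @Configuration class that builds the Java client; it assumes the Elasticsearch 5.x transport client, a node reachable on localhost:9300, and a hypothetical cluster name, so adjust all of those for your own setup.

```java
import java.net.InetAddress;

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticsearchClientConfig {

    @Bean
    public Client elasticsearchClient() throws Exception {
        // "my-cluster" is a placeholder; it must match the
        // cluster.name of the Elasticsearch cluster you connect to.
        Settings settings = Settings.builder()
                .put("cluster.name", "my-cluster")
                .build();
        // The transport client talks to the cluster over port 9300
        // (the transport port), not the HTTP port 9200.
        return new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("localhost"), 9300));
    }
}
```

Declaring the client as a bean lets the rest of the application, including the batch writer shown later, simply inject it instead of managing its lifecycle by hand.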
Recently I had a customer project that required transferring large amounts of data from a relational database to Elasticsearch, a NoSQL document store, in order to take advantage of its fast full-text search capabilities. Elasticsearch is part of the ELK stack, which is released and maintained by Elastic.co. The abbreviation ELK stands for Elasticsearch, Logstash, and Kibana.
The easiest way to transfer data from a traditional relational database into Elasticsearch is to use the “L” in the “ELK” stack: Logstash. Unfortunately, Logstash has some limitations, and one of them directly affects reading records from a relational database: although a database row and an Elasticsearch document may look very similar, a single row cannot simply be mapped to a single document. The difference stems from the fact that Elasticsearch has no notion of “relations” between its “records”; it stores flat documents, and flat documents have no relations to each other. A customer row and its related address and order rows, for example, have to be denormalized into one self-contained document. In addition, the customer’s database had several million entries, which complicated the situation even further. So, instead of using Logstash, we decided to write our own importer that would use batch processing and bulk writes into Elasticsearch.
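As a preview of the writing side, here is a minimal sketch of what such a bulk-writing Spring Batch ItemWriter could look like. The index name, type, and document shape (a flat Map produced by an upstream processor) are hypothetical, and it assumes the pre-5.0 Spring Batch write(List) signature together with the transport client configured above.

```java
import java.util.List;
import java.util.Map;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.springframework.batch.item.ItemWriter;

public class ElasticsearchBulkItemWriter implements ItemWriter<Map<String, Object>> {

    private final Client client;

    public ElasticsearchBulkItemWriter(Client client) {
        this.client = client;
    }

    @Override
    public void write(List<? extends Map<String, Object>> items) throws Exception {
        // One bulk request per Spring Batch chunk: a single network
        // round trip instead of one index call per document.
        BulkRequestBuilder bulk = client.prepareBulk();
        for (Map<String, Object> item : items) {
            // "customers"/"customer" are placeholder index and type names.
            bulk.add(client.prepareIndex("customers", "customer")
                    .setSource(item));
        }
        BulkResponse response = bulk.get();
        if (response.hasFailures()) {
            // Fail the chunk so Spring Batch can retry or abort the step.
            throw new IllegalStateException(response.buildFailureMessage());
        }
    }
}
```

Because the writer receives one chunk at a time, the chunk size configured on the step directly controls the size of each bulk request, which is the main knob for tuning throughput against millions of rows.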