Recently I had a customer project that required transferring large amounts of data from a relational database to the NoSQL database that is Elasticsearch in order to take advantage of its famous fast searching capabilities. Elasticsearch is part of the ELK stack that is released and maintained by Elastic.co. The abbreviation ELK stand for Elasticsearch, Logstash, and Kibana.
The easiest way to transfer data from a traditional relational database into Elasticsearch is by using the “L” in the “ELK” stack: Logstash. Unfortunately, Logstash has some limitations, and one of those limitations is directly related to reading records from a relational database because, although database entry and Elasticsearch entry may seem very similar, it’s not possible to match single database entry to a single Elasticsearch document. This difference originates from the fact that Elasticsearch doesn’t use the notion of “relations” between its “records”, but instead it uses flat documents structure to store its data and flat documents have no relations between each other. In addition, the customer database have several millions entries which made the situation even more complicated. So, instead of using Logstash, a decision was made to write our own importer that was going to use batch processing and bulk writing into Elasticsearch.