Today we are looking into Elasticsearch's language support for Chinese. Chinese is spoken by the ethnic Chinese majority and many minority ethnic groups in China. About 1.2 billion people (around 16% of the world's population) speak some form of Chinese as their first language. We are an international company, so having customers in Singapore or Hong Kong makes this topic especially interesting. Chinese consists of many dialects and mainly two written forms. In the first section, I will clarify which region uses which dialect and written form. After that, we look at what is supported by…
Reading the title of this blog post, you will likely associate it with the fairy tale Snow White and the Seven Dwarfs. An association is a mental connection between two related terms. It is a creative process that the human brain is very good at. Another creative process is using synonyms.
Since Elasticsearch 5, the default similarity algorithm for Elasticsearch is Okapi BM25. A similarity (scoring/ranking model) defines how matching documents are scored. Performing a search against a set of documents gives you results sorted by relevance. Rocco Schulz already mentioned BM25 in one of our previous blog posts. In this blog article, we are going to look into the inner workings of the Okapi BM25 algorithm.
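To give a rough idea of what such a similarity computes, here is a minimal sketch of the textbook Okapi BM25 score for a single query term. It is an illustration only, not Elasticsearch's actual implementation: the function name `bm25_term_score` and its parameter names are hypothetical, and the values `k1=1.2` and `b=0.75` are the commonly cited defaults, assumed here.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term for one document (illustrative sketch).

    tf          -- how often the term occurs in the document
    doc_len     -- length of the document (in terms)
    avg_doc_len -- average document length in the collection
    n_docs      -- number of documents in the collection
    doc_freq    -- number of documents containing the term
    k1, b       -- tuning parameters (assumed defaults, not read from a cluster)
    """
    # Inverse document frequency: rare terms weigh more than common ones.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: documents longer than average are penalised.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term-frequency saturation: extra occurrences add less and less.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Same term frequency, but the shorter document scores higher.
short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=2, doc_len=200, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

The score of a multi-term query would be the sum of these per-term scores over all query terms that match the document.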
This article is the beginning of our Elastic Stack article series, which explains the Elastic Stack for beginners and curious people. I use football data as the basis for our demonstrations and examples for three reasons. The first is that the football (soccer) season starts again this weekend. The second is that I live in the city of the current Swiss football champions (BSC Young Boys), and I want them to succeed again this season. The final reason is my little nephew. During my summer vacation visit, he asked me what I do for a living in Switzerland: Software Engineering and Architecture for Distributed…
As everybody knows, Elastic releases new versions of its products fairly regularly, so some of us are always waiting for a new version to analyse and test new features.