Backup Elasticsearch Data with Microsoft Azure

June 15, 2020

Your data in Elasticsearch is valuable, so you need backups. Elasticsearch covers this need with its snapshot and restore mechanism.

A snapshot is a backup taken from a running Elasticsearch cluster. You can take snapshots of individual indices or of the entire cluster. Snapshots can be stored in either local or remote repositories. Remote repositories can reside on S3, HDFS, Azure, Google Cloud Storage, and other platforms supported by a repository plugin. (Source: Elasticsearch Reference)

In this article, we walk through an end-to-end example of taking a snapshot and storing it on Microsoft Azure. Microsoft Azure is a cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through Microsoft-managed data centers.

  1. In the first part, we create an Azure account and a storage account for the demonstration.
  2. In the second part, we describe how to install the Elasticsearch Azure plugin and configure it with the demo account.
  3. In the third and final part, we create a snapshot of the Kibana index.

In this article, we use Elastic Stack version 7.7.1. We recommend using a single-node Elasticsearch cluster if you want to follow this example.

Create an Azure Storage Account

Go to the Microsoft Azure Portal and create an account. New accounts come with a trial period and a fixed credit that you can spend on experiments.

Once you have completed the account creation, go to Storage and create a storage account.

Create Storage Account

The storage account name must be unique across Azure. I chose cinhtau, which was still available 😉.

Next comes networking. Since we want to write backups to it from our cluster, the storage account is publicly accessible. If your Elasticsearch cluster runs in a virtual network on Azure (the Azure counterpart to a Virtual Private Cloud), you can also choose one of the other options.

Setup Networking

In the Advanced Settings, choose secure transfer, i.e., traffic is encrypted with HTTPS.

Advanced Settings

Tags are a taxonomy for billing and statistics. For instance, you can use customer, project, or, more meaningfully, purpose as a tag.

Setup Tags

Once all settings are complete, you see a detailed summary of the storage account before it is deployed.

Storage Summary

If you want special encryption, you can set it up in Encryption.

Encryption

Elasticsearch needs a container to store its snapshots. By default, the Elasticsearch Azure plugin expects a container named elasticsearch-snapshots, so create a container with this name.

Backup Container

Setup Elasticsearch Azure Plugin

In most cases we work behind a corporate proxy, so we perform an offline installation.

First, we download the Azure repository plugin:

wget https://artifacts.elastic.co/downloads/elasticsearch-plugins/repository-azure/repository-azure-7.7.1.zip \
 -O /tmp/repository-azure-7.7.1.zip

After the plugin is available, we install it.

[vinh@rhel ~]$ sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install file:///tmp/repository-azure-7.7.1.zip
-> Installing file:///tmp/repository-azure-7.7.1.zip
-> Downloading file:///tmp/repository-azure-7.7.1.zip
[=================================================] 100%  
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission setFactory
* java.net.SocketPermission * connect,resolve
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed repository-azure
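To confirm the installation on each node, the plugin list can be checked from the shell. The following is a minimal sketch: the helper name plugin_installed is hypothetical, and the sketch assumes that elasticsearch-plugin list prints one plugin name per line and that Elasticsearch is installed under /usr/share/elasticsearch as in this article.

```shell
# Hypothetical helper: succeeds if the plugin name given as $1
# appears as a complete line in the text read from stdin.
plugin_installed() {
  grep -qxF -- "$1"
}

# Usage (install path as used in this article):
#   sudo /usr/share/elasticsearch/bin/elasticsearch-plugin list \
#     | plugin_installed repository-azure && echo "repository-azure is installed"
```

Reading the list from stdin keeps the check independent of where the plugin tool lives on a given node.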

Plugin Configuration

The tricky part is the elasticsearch.yml configuration. The Elasticsearch documentation is complete but, in my opinion, often lacks good examples for a quick setup. With the plugin installed, we can now configure Elasticsearch to use the new storage. Some of the settings below are already the defaults, but we list them anyway for completeness and clarity.

# --------------------------------- Azure Storage ------------------------------
azure.client:
  default:
    timeout: 10s
    max_retries: 7
    endpoint_suffix: core.windows.net
    # account and key must be added to the keystore, not to this file
    # account:
    # key:
    proxy:
      host: proxy-vinh.ist-drin.com
      port: 4711
      type: http

In corporate environments with secured networks, the Azure plugin also has to go through the proxy to reach the cloud storage.
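Before restarting Elasticsearch, it can help to check that the storage endpoint is reachable through the proxy at all. The sketch below only builds the blob service URL from the account name and endpoint suffix; the helper name blob_endpoint is hypothetical, and the curl call in the comment reuses the proxy host and port from the configuration above as an assumption about your network.

```shell
# Hypothetical helper: prints the blob service URL for the storage
# account $1 and the endpoint suffix $2 (default: core.windows.net).
blob_endpoint() {
  printf 'https://%s.blob.%s\n' "$1" "${2:-core.windows.net}"
}

# Usage: print the HTTP status code of a request sent through the proxy.
#   curl -sS -o /dev/null -w '%{http_code}\n' \
#     -x http://proxy-vinh.ist-drin.com:4711 "$(blob_endpoint cinhtau)"
```

Any HTTP status code in the output, even an error status, indicates that the request reached Azure through the proxy. A connection error from curl points to the proxy or the firewall instead.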

The account and key settings are mandatory, but they are only permitted in the keystore. In your Azure storage account, go to Keys and take a look at the connection string. This connection string provides all the details needed for the configuration.

If you dissect this one-liner, you get the following information:

DefaultEndpointsProtocol=https;
AccountName=cinhtau;
AccountKey=super-secret-key-replaced;
EndpointSuffix=core.windows.net
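The dissection can also be scripted. The following is a minimal sketch with a hypothetical helper named conn_field; it assumes the connection string has the semicolon-separated name=value form shown above. It splits each pair at the first = only, because account keys are base64 encoded and may end in = padding.

```shell
# Hypothetical helper: prints the value of field $1 from the
# connection string $2. Pairs are separated by ';' and each pair
# is split at its first '=' only.
conn_field() {
  printf '%s\n' "$2" | tr ';' '\n' | while IFS= read -r pair; do
    case "$pair" in
      "$1"=*) printf '%s\n' "${pair#*=}" ;;
    esac
  done
}

CONN='DefaultEndpointsProtocol=https;AccountName=cinhtau;AccountKey=super-secret-key-replaced;EndpointSuffix=core.windows.net'
conn_field AccountName "$CONN"     # prints: cinhtau
conn_field EndpointSuffix "$CONN"  # prints: core.windows.net
```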

The plugin needs the account name and key to perform the snapshot. We add both settings to the Elasticsearch keystore.

# Store the storage account name and key in the Elasticsearch keystore
export ACCOUNT_KEY='super-secret-key-replaced'
echo cinhtau | sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin azure.client.default.account
echo "$ACCOUNT_KEY" | sudo /usr/share/elasticsearch/bin/elasticsearch-keystore add --stdin azure.client.default.key

Repeat these steps on every node in the Elasticsearch cluster. Elasticsearch must be restarted after the configuration changes.
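Since the keystore has to be filled on every node, a quick check per node can catch a forgotten entry. This is a minimal sketch: the helper name missing_azure_settings is hypothetical, and it assumes that elasticsearch-keystore list prints one setting name per line.

```shell
# Hypothetical helper: reads the output of `elasticsearch-keystore list`
# from stdin and prints each required Azure setting that is absent.
# The exit status is 0 only if both settings are present.
missing_azure_settings() (
  entries=$(cat)
  status=0
  for name in azure.client.default.account azure.client.default.key; do
    if ! printf '%s\n' "$entries" | grep -qxF -- "$name"; then
      printf 'missing: %s\n' "$name"
      status=1
    fi
  done
  exit "$status"
)

# Usage:
#   sudo /usr/share/elasticsearch/bin/elasticsearch-keystore list | missing_azure_settings
```

The function body runs in a subshell, so its variables do not leak into the calling shell.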

Register Azure Repository

With the settings in place, you can register a repository in the Kibana console. For clarity, we state the container name explicitly, although it is the default. You can override it with any container name that Azure allows you to create.

PUT _snapshot/azure-kibana
{
  "type": "azure",
  "settings": {
    "container": "elasticsearch-snapshots",
    "chunk_size": "32MB",
    "compress": true
  }
}
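If Kibana is not at hand, the same registration can be sent with curl. The following is a sketch under assumptions: the function names repo_body and register_repo are hypothetical, and the sketch assumes Elasticsearch answers on http://localhost:9200 without authentication. Adjust ES_URL and add credentials for a secured cluster.

```shell
# Assumption: Elasticsearch is reachable here without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Prints the repository definition as JSON for the container named in $1.
repo_body() {
  printf '{"type":"azure","settings":{"container":"%s","chunk_size":"32MB","compress":true}}\n' "$1"
}

# Registers the repository $1, backed by the container $2.
register_repo() {
  repo_body "$2" | curl -sS -X PUT "$ES_URL/_snapshot/$1" \
    -H 'Content-Type: application/json' --data-binary @-
}

# Usage:
#   register_repo azure-kibana elasticsearch-snapshots
```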

The repository verification creates some test files in the Azure Storage container. You can see these files in the storage explorer.

Repository Validation

If you have no errors in the Elasticsearch logs, you can query all repositories.

GET /_snapshot/_all

In the Kibana console, you should see this JSON response.

{
  "azure-kibana" : {
    "type" : "azure",
    "settings" : {
      "container" : "elasticsearch-snapshots",
      "compress" : "true",
      "chunk_size" : "32MB"
    }
  }
}

Take a Snapshot

In the Kibana console, we create a snapshot named snapshot_2. The indices parameter also accepts an alias, so we simply use the .kibana alias.

PUT /_snapshot/azure-kibana/snapshot_2?wait_for_completion=true
{
  "indices": ".kibana",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "cinhtau",
    "taken_because": "POC for customer"
  }
}

After only a few seconds, you can see this response.

{
  "snapshot" : {
    "snapshot" : "snapshot_2",
    "uuid" : "iCL1jnQmTrOLUCKVE69P2Q",
    "version_id" : 7070199,
    "version" : "7.7.1",
    "indices" : [
      ".kibana_3"
    ],
    "include_global_state" : false,
    "metadata" : {
      "taken_by" : "cinhtau",
      "taken_because" : "POC for customer"
    },
    "state" : "SUCCESS",
    "start_time" : "2020-06-13T12:44:56.904Z",
    "start_time_in_millis" : 1592052296904,
    "end_time" : "2020-06-13T12:44:59.505Z",
    "end_time_in_millis" : 1592052299505,
    "duration_in_millis" : 2601,
    "failures" : [ ],
    "shards" : {
      "total" : 1,
      "failed" : 0,
      "successful" : 1
    }
  }
}
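For recurring backups, a fixed name such as snapshot_2 does not scale, because each snapshot in a repository needs its own name. The sketch below derives the name from the current UTC date instead. The function names snapshot_name and take_snapshot are hypothetical, and the sketch again assumes an unauthenticated cluster on http://localhost:9200.

```shell
# Assumption: Elasticsearch is reachable here without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Prints a date-based snapshot name such as snapshot-2020.06.13 (UTC).
snapshot_name() {
  date -u +snapshot-%Y.%m.%d
}

# Takes a snapshot of the .kibana alias in the repository $1
# and waits until the snapshot has finished.
take_snapshot() {
  curl -sS -X PUT "$ES_URL/_snapshot/$1/$(snapshot_name)?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d '{"indices":".kibana","ignore_unavailable":true,"include_global_state":false}'
}

# Usage:
#   take_snapshot azure-kibana
```

With a daily name, a second run on the same day fails because the name is already taken. Add the time to the date format if you need more than one snapshot per day.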

Verify the snapshot by querying all snapshots in the repository.

GET /_snapshot/azure-kibana/_all

The response:

{
  "snapshots" : [
    {
      "snapshot" : "snapshot_2",
      "uuid" : "iCL1jnQmTrOLUCKVE69P2Q",
      "version_id" : 7070199,
      "version" : "7.7.1",
      "indices" : [
        ".kibana_3"
      ],
      "include_global_state" : false,
      "metadata" : {
        "taken_by" : "cinhtau",
        "taken_because" : "POC for customer"
      },
      "state" : "SUCCESS",
      "start_time" : "2020-06-13T12:44:56.904Z",
      "start_time_in_millis" : 1592052296904,
      "end_time" : "2020-06-13T12:44:59.505Z",
      "end_time_in_millis" : 1592052299505,
      "duration_in_millis" : 2601,
      "failures" : [ ],
      "shards" : {
        "total" : 1,
        "failed" : 0,
        "successful" : 1
      }
    }
  ]
}
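In a script, the state field of such a response can serve as a simple success check. This is a minimal sketch with a hypothetical helper named snapshot_succeeded. It matches the text of the response rather than parsing JSON, so it is meant for the response of a single snapshot, not for a list of several snapshots.

```shell
# Hypothetical helper: succeeds if the snapshot response read from
# stdin contains "state" : "SUCCESS" (whitespace around ':' is optional).
snapshot_succeeded() {
  grep -Eq '"state"[[:space:]]*:[[:space:]]*"SUCCESS"'
}

# Usage (assumes an unauthenticated cluster on localhost:9200):
#   curl -sS http://localhost:9200/_snapshot/azure-kibana/snapshot_2 \
#     | snapshot_succeeded && echo "snapshot_2 completed successfully"
```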

Our double-check in Azure:

Snapshot Data

The snapshot mechanism creates the indices folder and the index.latest file.

Summary

Using Microsoft Azure Storage for Elasticsearch backups is quick and easy if you follow this example. Also keep in mind that you can set up retention policies for the backups on Azure.

About the author: Vinh Nguyên

Loves to code, hike and mostly drink black coffee. Favors Apache Kafka, Elasticsearch, Java Development and 80's music.
