Backup and Restore Elasticsearch Index Data

In this blog post, you will learn how to back up and restore Elasticsearch index data. There are various reasons for taking data backups. One of the main ones is to protect the primary data against unforeseen damage caused by system hardware or software failure. In the case of Elasticsearch, you might also want to migrate the data to a new Elasticsearch cluster.

So, how can one back up and restore Elasticsearch index data?

In the ELK/Elastic Stack, an Elasticsearch backup is called a snapshot. A snapshot can be taken of an entire running Elasticsearch cluster (including all its data streams and indices), of specific data streams, or of specific Elasticsearch indices.

In this tutorial, we will be using a single node Elasticsearch cluster.

Register a snapshot repository

Before you can take a snapshot of an Elasticsearch index or cluster, you must first register a snapshot repository. Elasticsearch supports several repository types.

In this setup, we will use a shared file system repository.

Register Snapshot Repository

To register a file system repository, you need to define the file system location in the Elasticsearch configuration file on all master and data nodes. This is the path in which you want to store your backups/snapshots.

In our setup, we have mounted our backup disk on /mnt/es_backup.

df -hT -P /mnt/es_backup/
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/sdb1      ext4  3.9G   16M  3.7G   1% /mnt/es_backup

To define the path to the backup location in the Elasticsearch configuration file, use the path.repo option.

path.repo: ["/mnt/es_backup"]

You can simply echo this line to the configuration file;

echo 'path.repo: ["/mnt/es_backup"]' >> /etc/elasticsearch/elasticsearch.yml
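Running the echo command twice would leave two path.repo lines in the file. As a minimal sketch, assuming a helper function named ensure_path_repo (a hypothetical name, not part of Elasticsearch), you could append the setting only when it is not already present;

```shell
# Append the path.repo setting only if the file does not define one yet.
# $1 = path to elasticsearch.yml, $2 = repository directory.
# ensure_path_repo is a hypothetical helper name used for illustration.
ensure_path_repo() {
  conf_file=$1
  repo_dir=$2
  if grep -q '^path\.repo:' "$conf_file"; then
    echo "path.repo already set in $conf_file, leaving it unchanged"
  else
    printf 'path.repo: ["%s"]\n' "$repo_dir" >> "$conf_file"
  fi
}

# Usage (as root, on each master and data node):
# ensure_path_repo /etc/elasticsearch/elasticsearch.yml /mnt/es_backup
```

The check only looks for a line starting with path.repo:, so it will not merge a new directory into an existing list; in that case edit the file by hand.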

Set the ownership of the repository path to the elasticsearch user.

chown -R elasticsearch: /mnt/es_backup/

If you have a multinode cluster, set the same configuration on all master and data nodes.

Once that is done, restart Elasticsearch.

systemctl restart elasticsearch

Once you have defined the backup/snapshot location, you can now register it by running the command below. Remember in this setup, we are using a file system repository.

curl -X PUT "192.168.57.20:9200/_snapshot/es_backup?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backup"
  }
}
'

When you run the command, you should get the output;

{
  "acknowledged" : true
}
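If you register repositories on more than one cluster, the same request can be wrapped in small shell functions. The sketch below is only illustrative: ES_HOST and the function names are assumptions made for this example, while the _verify endpoint is the Elasticsearch API that checks whether the nodes can write to the repository location;

```shell
# Assumption: ES_HOST is a convenience variable for this sketch only.
ES_HOST=${ES_HOST:-192.168.57.20:9200}

# Print the JSON body for an "fs" repository stored at location $1.
fs_repo_body() {
  printf '{"type": "fs", "settings": {"location": "%s"}}' "$1"
}

# Register repository $1 at location $2 (same request as the curl command above).
register_fs_repo() {
  curl -s -X PUT "$ES_HOST/_snapshot/$1?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(fs_repo_body "$2")"
}

# Ask the cluster to confirm that its nodes can access repository $1.
verify_repo() {
  curl -s -X POST "$ES_HOST/_snapshot/$1/_verify?pretty"
}

# Usage:
# register_fs_repo es_backup /mnt/es_backup
# verify_repo es_backup
```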

Get Information about Snapshot Repository

To retrieve information about a registered repository, run the command below;

curl -X GET "192.168.57.20:9200/_snapshot/es_backup?pretty"

Sample output;

{
  "es_backup" : {
    "type" : "fs",
    "settings" : {
      "location" : "/mnt/es_backup"
    }
  }
}

To view all repositories;

curl -X GET "192.168.57.20:9200/_snapshot/_all?pretty"

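If you want to delete (unregister) a snapshot repository, run the command below. This only removes the reference to the repository; the snapshot files already stored under its location are left in place.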

curl -X DELETE "192.168.57.20:9200/_snapshot/es_backup/?pretty"

Create Elasticsearch Snapshot/Backup

Create Snapshot of Entire Elasticsearch Cluster

Once you have registered a snapshot repository, you can now create a snapshot as shown below. “A repository can contain multiple snapshots of the same cluster. Snapshots are identified by unique names within the cluster”.

For example, to create a snapshot called es_backup_202104192200, you would run the command below;

curl -X PUT "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200?pretty"
{
  "accepted" : true
}

By default, a snapshot backs up all data streams and open indices in the cluster.

By default, the request returns as soon as the snapshot has been initialized. To make it wait until the snapshot completes instead, add the wait_for_completion=true parameter;

curl -X PUT "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200?wait_for_completion=true"

See sample contents of the backup/snapshot directory after the command completes running;

ls -1 /mnt/es_backup/
index-0
index.latest
indices
meta-33qzhT82QTmvH4GkWn-vhw.dat
snap-33qzhT82QTmvH4GkWn-vhw.dat
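Timestamped names like es_backup_202104192200 do not have to be typed by hand. A minimal sketch, assuming the hypothetical helpers snapshot_name and snapshot_cluster and the convenience variable ES_HOST, could generate the name from the current time and wait for the snapshot to finish;

```shell
# Assumption: ES_HOST is a convenience variable for this sketch only.
ES_HOST=${ES_HOST:-192.168.57.20:9200}

# Print a name such as es_backup_202104192200: prefix $1 followed by the
# current local time as YYYYmmddHHMM.
snapshot_name() {
  printf '%s_%s\n' "$1" "$(date +%Y%m%d%H%M)"
}

# Snapshot the whole cluster into repository $1, naming the snapshot with
# prefix $2, and wait until the snapshot completes.
snapshot_cluster() {
  curl -s -X PUT "$ES_HOST/_snapshot/$1/$(snapshot_name "$2")?wait_for_completion=true&pretty"
}

# Usage:
# snapshot_cluster es_backup es_backup
```

Because the name has minute resolution, two runs within the same minute would produce the same name and the second request would be rejected.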

Create Snapshot of Specific Elasticsearch Index

In my current setup, I have just a few indices for demo purposes only;

curl "192.168.57.20:9200/_cat/indices?pretty"
yellow open filebeat-7.10.1-2021.04.16-000001 XWQ7QQ_9Tpar_rPE5dn0Sw 1 1    24  0  146kb  146kb
yellow open filebeat-7.12.0-2021.04.19-000001 0sQCK1OTRWiosULRHKQMpw 1 1 66423  0 15.5mb 15.5mb
...

So let’s say I want to back up a specific index, filebeat-7.12.0-2021.04.19-000001;

curl -X PUT "192.168.57.20:9200/_snapshot/es_backup/filebeat_202104192200?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "filebeat-7.12.0-2021.04.19-000001",
  "ignore_unavailable": true,
  "include_global_state": false
}
'

While taking a snapshot, you can include other options such as;

  • ignore_unavailable: takes the value true or false.
    • if set to true, indices/data streams that do not exist are ignored while taking the snapshot.
    • if set to false or not defined, the snapshot fails if a data stream or index is missing.
  • include_global_state: can be set to true or false.
    • if set to true (default), the current cluster state is saved as part of the snapshot.
    • if set to false, the cluster global state is not stored as part of the snapshot.
  • partial:
    • if set to false (default), the snapshot will fail if one or more indices in the snapshot do not have all primary shards available.
    • if set to true, the snapshot will take place even if one or more indices in the snapshot do not have all primary shards available.
  • expand_wildcards:
    • controls whether hidden and closed indices matched by wildcard patterns are included in the snapshot; defaults to all.
  • metadata:
    • adds free-form information to the snapshot, such as who took it, why it was taken, or any other data that might be useful.

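See the example below. Snapshot names must be unique within a repository, so if you already created filebeat_202104192200 with the previous command, delete that snapshot first or use a different name;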

curl -X PUT "192.168.57.20:9200/_snapshot/es_backup/filebeat_202104192200?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "filebeat-7.12.0-2021.04.19-000001",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "kifarunix",
    "taken_because": "test backup"
  }
}
'
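The indices field also accepts a comma-separated list or a wildcard pattern, so several indices can go into one snapshot. The sketch below builds such a request body; the function names, ES_HOST and the snapshot name filebeat_all_202104192200 are made up for this example, and the values are inserted as-is without JSON escaping, so keep them free of double quotes;

```shell
# Assumption: ES_HOST is a convenience variable for this sketch only.
ES_HOST=${ES_HOST:-192.168.57.20:9200}

# Print a snapshot request body for the indices in $1 (a single name, a
# comma-separated list, or a wildcard pattern such as filebeat-*).
# $2 and $3 fill the free-form metadata fields used in the example above.
snapshot_body() {
  printf '{"indices": "%s", "ignore_unavailable": true, "include_global_state": false, "metadata": {"taken_by": "%s", "taken_because": "%s"}}' "$1" "$2" "$3"
}

# Create snapshot $2 in repository $1; the remaining arguments are passed
# to snapshot_body.
snapshot_indices() {
  repo=$1
  name=$2
  shift 2
  curl -s -X PUT "$ES_HOST/_snapshot/$repo/$name?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(snapshot_body "$@")"
}

# Usage:
# snapshot_indices es_backup filebeat_all_202104192200 'filebeat-*' kifarunix 'test backup'
```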

View Snapshot Information

To view information about created snapshots within a specific repository, run the example commands below.

For example, to view information about es_backup_202104192200 snapshot;

curl -X GET "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200?pretty"
{
  "snapshots" : [
    {
      "snapshot" : "es_backup_202104192200",
      "uuid" : "33qzhT82QTmvH4GkWn-vhw",
      "version_id" : 7100099,
      "version" : "7.10.0",
      "indices" : [
        ".kibana_task_manager_1",
        "filebeat-7.12.0-2021.04.19-000001",
        "filebeat-7.10.1-2021.04.16-000001",
        ".kibana-event-log-7.10.0-000001",
        ".async-search",
        ".apm-agent-configuration",
        "ilm-history-3-000001",
        ".kibana_1",
        ".apm-custom-link"
      ],
      "data_streams" : [ ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2021-04-19T19:57:08.912Z",
      "start_time_in_millis" : 1618862228912,
      "end_time" : "2021-04-19T19:57:56.691Z",
      "end_time_in_millis" : 1618862276691,
      "duration_in_millis" : 47779,
      "failures" : [ ],
      "shards" : {
        "total" : 9,
        "failed" : 0,
        "successful" : 9
      }
    }
  ]
}

You can see indices and data streams in the backup snapshot.

To view all snapshots within a repository;

curl -X GET "192.168.57.20:9200/_snapshot/es_backup/_all?pretty"
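When a snapshot was started without wait_for_completion=true, its state field can be polled until it is no longer IN_PROGRESS. The following is a rough sketch: snapshot_state, wait_for_snapshot and ES_HOST are hypothetical names for this example, and the state is extracted with sed rather than a JSON parser, which is good enough for the response shape shown above;

```shell
# Assumption: ES_HOST is a convenience variable for this sketch only.
ES_HOST=${ES_HOST:-192.168.57.20:9200}

# Read a snapshot info response on stdin and print the first "state" value
# found, for example SUCCESS or IN_PROGRESS.
snapshot_state() {
  sed -n 's/.*"state" *: *"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Poll snapshot $2 in repository $1 every 5 seconds until it leaves the
# IN_PROGRESS state, then print the final state.
wait_for_snapshot() {
  while :; do
    state=$(curl -s "$ES_HOST/_snapshot/$1/$2?pretty" | snapshot_state)
    [ "$state" = "IN_PROGRESS" ] || break
    sleep 5
  done
  echo "$state"
}

# Usage:
# wait_for_snapshot es_backup es_backup_202104192200
```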

Restore Elasticsearch Snapshot/backup

Now, let us say you accidentally deleted an index that you already had a backup for. It is easy to restore it from the Elasticsearch snapshot.

According to the Elasticsearch snapshot and restore documentation;

  • You cannot restore snapshots from later Elasticsearch versions into a cluster running an earlier Elasticsearch version. For example, you cannot restore a snapshot taken in 7.6.0 to a cluster running 7.5.0.
  • You cannot restore indices into a cluster running a version of Elasticsearch that is more than one major version newer than the version of Elasticsearch used to snapshot the indices. For example, you cannot restore indices from a snapshot taken in 5.0 to a cluster running 7.0.

The following table summarizes the snapshot compatibility between cluster versions;

                   Cluster version
Snapshot version   2.x   5.x   6.x   7.x   8.x
1.x →              Yes   No    No    No    No
2.x →              Yes   Yes   No    No    No
5.x →              No    Yes   Yes   No    No
6.x →              No    No    Yes   Yes   No
7.x →              No    No    No    Yes   Yes
Above, we learnt how to take a snapshot of the entire cluster as well as of an individual Elasticsearch index.

So, for demo purposes, let us delete all the indices on our demo Elasticsearch node;

curl -X DELETE "192.168.57.20:9200/_all?pretty"
{
  "acknowledged" : true
}

If you try to list the available indices, you will find that one of the Kibana indices has been recreated automatically;

curl "192.168.57.20:9200/_cat/indices?pretty"
yellow open .kibana -CjWP5YlSdi5eqt1VpLXng 1 1 1 0 5kb 5kb

Next, to restore our entire cluster snapshot, es_backup_202104192200, run the command below;

curl -X POST "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200/_restore?pretty"

If some of the indices in the snapshot already exist and are open in the cluster, the restore fails. You can either close or delete the existing index, or restore the index under a different name by providing a rename pattern and replacement name.

Sample error;

{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_restore_exception",
        "reason" : "[es_backup:es_backup_202104192200/33qzhT82QTmvH4GkWn-vhw] cannot restore index [ilm-history-3-000001] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
      }
    ],
    "type" : "snapshot_restore_exception",
    "reason" : "[es_backup:es_backup_202104192200/33qzhT82QTmvH4GkWn-vhw] cannot restore index [ilm-history-3-000001] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
  },
  "status" : 500
}

So let us close the index named in the error;

curl -X POST "192.168.57.20:9200/ilm-history-3-000001/_close?pretty"

Sample command output.

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : {
    "ilm-history-3-000001" : {
      "closed" : true
    }
  }
}
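Instead of closing the index, you could take the other route named in the error message and restore it under a different name. The sketch below uses the restore API's indices, rename_pattern and rename_replacement fields; the function names, ES_HOST and the restored- prefix are assumptions made for this example;

```shell
# Assumption: ES_HOST is a convenience variable for this sketch only.
ES_HOST=${ES_HOST:-192.168.57.20:9200}

# Print a restore request body that restores the indices in $1 and renames
# each one by putting prefix $2 in front of its original name.
restore_renamed_body() {
  # \$1 below yields a literal "$1": the back-reference that Elasticsearch
  # replaces with the text captured by (.+), not a shell parameter.
  printf '{"indices": "%s", "rename_pattern": "(.+)", "rename_replacement": "%s"}' "$1" "$2\$1"
}

# Restore indices $3 from snapshot $2 in repository $1 under prefix $4.
restore_renamed() {
  curl -s -X POST "$ES_HOST/_snapshot/$1/$2/_restore?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(restore_renamed_body "$3" "$4")"
}

# Usage (would create restored-ilm-history-3-000001):
# restore_renamed es_backup es_backup_202104192200 ilm-history-3-000001 restored-
```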

If the restore fails with an error like the one below;

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "index, alias, and data stream names need to be unique, but the following duplicates were found [.kibana (alias of [.kibana_1/vSrhd_CyTva5oI1ggwnCuQ]) conflicts with index]"
      }
    ],
    "type" : "illegal_state_exception",
    "reason" : "index, alias, and data stream names need to be unique, but the following duplicates were found [.kibana (alias of [.kibana_1/vSrhd_CyTva5oI1ggwnCuQ]) conflicts with index]"
  },
  "status" : 500
}

Delete the indices again;

curl -X DELETE "192.168.57.20:9200/_all?pretty"

And immediately run the snapshot restore before the .kibana index is auto-created. Stopping Kibana for the duration of the restore avoids this race.

Now when you run a snapshot restore;

curl -X POST "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200/_restore?pretty"

You should get;

{
  "accepted" : true
}

Listing the indices again should now show the same indices as before;

curl "192.168.57.20:9200/_cat/indices?pretty"
yellow open filebeat-7.10.1-2021.04.16-000001 QImIEVM9SOKvtDnO1WUyNw 1 1    24 0   146kb   146kb
yellow open filebeat-7.12.0-2021.04.19-000001 -rYD-nUNR9m10x2W21uAAg 1 1 66423 0  15.5mb  15.5mb
green  open .apm-custom-link                  b6b_dTNPQHOLatvVyw6fUg 1 0     0 0    208b    208b
green  open .kibana_task_manager_1            f2Eg4u8yRvSEk47QU-wwbg 1 0     5 3 132.9kb 132.9kb
green  open .apm-agent-configuration          kMWsZ9kBTW6xeYoe3J4sIA 1 0     0 0    208b    208b
green  open .kibana-event-log-7.10.0-000001   -ZTzLi9zTuOnjcsm2wOhAw 1 0     2 0    11kb    11kb
green  open .async-search                     la4iO9BFTd6qSUzrw7JKNw 1 0     2 2 924.5kb 924.5kb
green  open .kibana_1                         gK9b55LTRCiuwm9sFRTuaQ 1 0  1558 7  10.7mb  10.7mb
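Rather than comparing the listings by eye, the document counts can be compared before and after the restore. This is a small sketch that assumes the default _cat/indices column layout shown above (index name in column 3, docs.count in column 7); index_doc_counts, ES_HOST and the file names are hypothetical;

```shell
# Assumption: ES_HOST is a convenience variable for this sketch only.
ES_HOST=${ES_HOST:-192.168.57.20:9200}

# Read _cat/indices output on stdin and print "index-name doc-count" lines,
# sorted by index name.
index_doc_counts() {
  awk '{print $3, $7}' | sort
}

# Usage:
# curl -s "$ES_HOST/_cat/indices" | index_doc_counts > before.txt   # before deleting
# curl -s "$ES_HOST/_cat/indices" | index_doc_counts > after.txt    # after restoring
# diff before.txt after.txt && echo "document counts match"
```

The index UUIDs are left out of the comparison on purpose, since restored indices get new UUIDs as the listing above shows.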

And that is how easy it is to back up and restore Elasticsearch index data.

To delete a snapshot;

curl -X DELETE "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200?pretty"

That marks the end of our tutorial on how to backup and restore Elasticsearch Index data.

In our next guide, we will learn how to back up Elasticsearch index data and restore it to a different Elasticsearch cluster. The link is provided below;

Restore Elasticsearch Data to another Cluster

Reference

Elasticsearch Snapshot and Restore
