In this blog post, you will learn how to back up and restore Elasticsearch index data. There are various reasons for taking data backups, one of the main ones being to protect the primary data against unforeseen damage resulting from hardware or software failure. In the case of Elasticsearch, you might also want to migrate the data to a new Elastic cluster.
Backing up and Restoring Elasticsearch Index Data
In the ELK/Elastic stack, an Elasticsearch backup is called a snapshot. A snapshot can be taken of an entire running Elasticsearch cluster (including all its data streams and indices), specific data streams, or specific Elasticsearch indices.
In this tutorial, we will be using a single node Elasticsearch cluster.
Register a snapshot repository
Before you can take a snapshot of an Elasticsearch index or cluster, you must first register a snapshot repository. There are different types of Elasticsearch repositories; in this setup, we will use a shared file system repository.
To register a file system repository, you need to define the file system location in the Elasticsearch configuration file on all master and data nodes. This is the path/location in which you want to store your backups/snapshots.
In our setup, we have mounted our backup disk on /mnt/es_backup.
df -hT -P /mnt/es_backup/
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdb1 ext4 3.9G 16M 3.7G 1% /mnt/es_backup
To define the path to the backup location in the Elasticsearch configuration file, use the path.repo option.
path.repo: ["/mnt/es_backup"]
You can simply echo this line to the configuration file;
echo 'path.repo: ["/mnt/es_backup"]' >> /etc/elasticsearch/elasticsearch.yml
Set the ownership of the repository path to the elasticsearch user.
chown -R elasticsearch: /mnt/es_backup/
If you have a multinode cluster, set the same configuration on all master and data nodes.
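Note that for a multinode cluster, the repository location must be a shared file system (an NFS export, for example) mounted at the same path on every master and data node. If you have SSH access to the nodes, you can push the setting with a small loop like the sketch below; the hostnames es-node1 to es-node3 are hypothetical, so adjust them to your environment.
# Hypothetical node names; replace with your actual master/data nodes
for node in es-node1 es-node2 es-node3; do
  ssh root@"$node" "echo 'path.repo: [\"/mnt/es_backup\"]' >> /etc/elasticsearch/elasticsearch.yml"
done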
Once that is done, restart elasticsearch.
systemctl restart elasticsearch
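To confirm that the setting has been picked up, you can query the node settings; the filter_path parameter below simply trims the response to the relevant part.
curl "192.168.57.20:9200/_nodes/settings?filter_path=nodes.*.settings.path.repo&pretty"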
Once you have defined the backup/snapshot location, you can now register it by running the command below. Remember in this setup, we are using a file system repository.
curl -X PUT "192.168.57.20:9200/_snapshot/es_backup?pretty" -H 'Content-Type: application/json' -d'
{
"type": "fs",
"settings": {
"location": "/mnt/es_backup"
}
}
'
When you run the command, you should get the output;
{
"acknowledged" : true
}
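You can also ask Elasticsearch to verify that all nodes in the cluster can correctly access and write to the repository;
curl -X POST "192.168.57.20:9200/_snapshot/es_backup/_verify?pretty"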
Get Information about Snapshot Repository
To retrieve information about a registered repository, run the command below;
curl -X GET "192.168.57.20:9200/_snapshot/es_backup?pretty"
Sample output;
{
"es_backup" : {
"type" : "fs",
"settings" : {
"location" : "/mnt/es_backup"
}
}
}
To view all repositories;
curl -X GET "192.168.57.20:9200/_snapshot/_all?pretty"
If you want to delete a snapshot repository;
curl -X DELETE "192.168.57.20:9200/_snapshot/es_backup?pretty"
Create Elasticsearch Snapshot/Backup
Create Snapshot of Entire Elasticsearch Cluster
Once you have registered a snapshot repository, you can now create a snapshot as shown below. “A repository can contain multiple snapshots of the same cluster. Snapshots are identified by unique names within the cluster”.
Take for example, to create a snapshot called es_backup_202104192200, you would run the following command;
curl -X PUT "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200?pretty"
{
"accepted" : true
}
By default, a snapshot backs up all data streams and open indices in the cluster.
You can also use the wait_for_completion=true parameter to specify whether the request should return immediately after snapshot initialization (the default) or wait for snapshot completion, like;
curl -X PUT "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200?wait_for_completion=true"
See sample contents of the backup/snapshot directory after the command completes running;
ls -1 /mnt/es_backup/
index-0
index.latest
indices
meta-33qzhT82QTmvH4GkWn-vhw.dat
snap-33qzhT82QTmvH4GkWn-vhw.dat
Create Snapshot of Specific Elasticsearch Index
In my current setup, I have just a few indices, for demo purposes only;
curl 192.168.57.20:9200/_cat/indices?pretty
yellow open filebeat-7.10.1-2021.04.16-000001 XWQ7QQ_9Tpar_rPE5dn0Sw 1 1 24 0 146kb 146kb
yellow open filebeat-7.12.0-2021.04.19-000001 0sQCK1OTRWiosULRHKQMpw 1 1 66423 0 15.5mb 15.5mb
...
So let's say I want to back up a specific index, filebeat-7.12.0-2021.04.19-000001;
curl -X PUT "192.168.57.20:9200/_snapshot/es_backup/filebeat_202104192200?pretty" -H 'Content-Type: application/json' -d'
{
"indices": "filebeat-7.12.0-2021.04.19-000001",
"ignore_unavailable": true,
"include_global_state": false
}
'
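The indices parameter also accepts a comma-separated list as well as wildcard patterns. For example, to snapshot all filebeat indices in one go (the snapshot name here is just illustrative);
curl -X PUT "192.168.57.20:9200/_snapshot/es_backup/filebeat_all_202104192200?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "filebeat-*",
  "ignore_unavailable": true,
  "include_global_state": false
}
'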
While taking a snapshot, you can include other options such as;
ignore_unavailable: takes true or false.
- When set to true, it causes indices/data streams that do not exist to be ignored while taking the snapshot.
- If not defined, the snapshot will fail if a data stream or index is missing while taking the snapshot.
include_global_state: can be set to true or false.
- If set to true, it causes the snapshot to save the current cluster state as part of the snapshot.
- If set to false, it prevents the cluster global state from being stored as part of the snapshot.
partial: takes true or false.
- If set to false (default), the snapshot will fail if one or more indices in the snapshot do not have all primary shards available.
- If set to true, the snapshot will take place even if one or more indices in the snapshot do not have all primary shards available.
expand_wildcards:
- Used to control whether hidden and closed indices will be included in the snapshot; defaults to all.
metadata:
- Adds information such as who took the snapshot, why it was taken, or any other data that might be useful to the snapshot.
See example below;
curl -X PUT "192.168.57.20:9200/_snapshot/es_backup/filebeat_202104192200?pretty" -H 'Content-Type: application/json' -d'
{
"indices": "filebeat-7.12.0-2021.04.19-000001",
"ignore_unavailable": true,
"include_global_state": false,
"metadata": {
"taken_by": "kifarunix",
"taken_because": "test backup"
}
}
'
View Snapshot Information
To view information about created snapshots within a specific repository, run the example commands below.
For example, to view information about the es_backup_202104192200 snapshot;
curl -X GET "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200?pretty"
{
"snapshots" : [
{
"snapshot" : "es_backup_202104192200",
"uuid" : "33qzhT82QTmvH4GkWn-vhw",
"version_id" : 7100099,
"version" : "7.10.0",
"indices" : [
".kibana_task_manager_1",
"filebeat-7.12.0-2021.04.19-000001",
"filebeat-7.10.1-2021.04.16-000001",
".kibana-event-log-7.10.0-000001",
".async-search",
".apm-agent-configuration",
"ilm-history-3-000001",
".kibana_1",
".apm-custom-link"
],
"data_streams" : [ ],
"include_global_state" : true,
"state" : "SUCCESS",
"start_time" : "2021-04-19T19:57:08.912Z",
"start_time_in_millis" : 1618862228912,
"end_time" : "2021-04-19T19:57:56.691Z",
"end_time_in_millis" : 1618862276691,
"duration_in_millis" : 47779,
"failures" : [ ],
"shards" : {
"total" : 9,
"failed" : 0,
"successful" : 9
}
}
]
}
You can see indices and data streams in the backup snapshot.
To view all snapshots within a repository;
curl -X GET "192.168.57.20:9200/_snapshot/es_backup/_all?pretty"
Restore Elasticsearch Snapshot/Backup
Now, let's say you accidentally deleted an index that you already had a backup for. It is easy to restore it from the Elasticsearch snapshot.
According to the Elasticsearch snapshot restore documentation;
- You cannot restore snapshots from later Elasticsearch versions into a cluster running an earlier Elasticsearch version. For example, you cannot restore a snapshot taken in 7.6.0 to a cluster running 7.5.0.
- You cannot restore indices into a cluster running a version of Elasticsearch that is more than one major version newer than the version of Elasticsearch used to snapshot the indices. For example, you cannot restore indices from a snapshot taken in 5.0 to a cluster running 7.0.
The following table summarizes snapshot compatibility between cluster versions (✓ = can be restored, ✗ = cannot be restored);
| Snapshot version | Cluster 2.x | Cluster 5.x | Cluster 6.x | Cluster 7.x | Cluster 8.x |
| 1.x → | ✓ | ✗ | ✗ | ✗ | ✗ |
| 2.x → | ✓ | ✓ | ✗ | ✗ | ✗ |
| 5.x → | ✗ | ✓ | ✓ | ✗ | ✗ |
| 6.x → | ✗ | ✗ | ✓ | ✓ | ✗ |
| 7.x → | ✗ | ✗ | ✗ | ✓ | ✓ |
Above, we learnt how to take a snapshot of the entire cluster as well as of an individual Elasticsearch index.
So, for demo purposes, let us delete the indices on our current Elasticsearch;
curl -X DELETE "192.168.57.20:9200/_all?pretty"
{
"acknowledged" : true
}
If you try to list the available indices, you will find that one of the Kibana indices has been created automatically;
curl 192.168.57.20:9200/_cat/indices?pretty
yellow open .kibana -CjWP5YlSdi5eqt1VpLXng 1 1 1 0 5kb 5kb
Next, to try and restore our entire cluster snapshot, es_backup_202104192200, you can run the command below;
curl -X POST "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200/_restore?pretty"
If open indices that match indices in the snapshot already exist within the cluster, the restore will fail. You can either close or delete the existing indices, or restore the indices under different names by providing a rename pattern and replacement name.
Sample error;
{
"error" : {
"root_cause" : [
{
"type" : "snapshot_restore_exception",
"reason" : "[es_backup:es_backup_202104192200/33qzhT82QTmvH4GkWn-vhw] cannot restore index [ilm-history-3-000001] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
}
],
"type" : "snapshot_restore_exception",
"reason" : "[es_backup:es_backup_202104192200/33qzhT82QTmvH4GkWn-vhw] cannot restore index [ilm-history-3-000001] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
},
"status" : 500
}
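Alternatively, instead of closing or deleting the existing index, you can restore it under a different name using a rename pattern. Below is a minimal sketch; the restored_ prefix is just an illustrative choice.
curl -X POST "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "ilm-history-3-000001",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}
'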
In this guide, let us simply close the conflicting index;
curl -X POST "192.168.57.20:9200/ilm-history-3-000001/_close?pretty"
Sample command output.
{
"acknowledged" : true,
"shards_acknowledged" : true,
"indices" : {
"ilm-history-3-000001" : {
"closed" : true
}
}
}
If, when you retry the restore, you get an error like the one below;
{
"error" : {
"root_cause" : [
{
"type" : "illegal_state_exception",
"reason" : "index, alias, and data stream names need to be unique, but the following duplicates were found [.kibana (alias of [.kibana_1/vSrhd_CyTva5oI1ggwnCuQ]) conflicts with index]"
}
],
"type" : "illegal_state_exception",
"reason" : "index, alias, and data stream names need to be unique, but the following duplicates were found [.kibana (alias of [.kibana_1/vSrhd_CyTva5oI1ggwnCuQ]) conflicts with index]"
},
"status" : 500
}
Delete the indices again;
curl -X DELETE "192.168.57.20:9200/_all?pretty"
And immediately run the snapshot restore, before the .kibana index is auto-created.
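To avoid racing against Kibana altogether, you can optionally stop it for the duration of the restore; this sketch assumes Kibana runs as a systemd service on the same host, so adjust to your setup.
# Stop Kibana so it does not recreate its indices mid-restore (assumes a local systemd service)
systemctl stop kibana
# ... run the delete and restore commands ...
systemctl start kibana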
Now when you run a snapshot restore;
curl -X POST "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200/_restore?pretty"
You should get;
{
"accepted" : true
}
Listing the indices again should now show the same indices as before;
curl 192.168.57.20:9200/_cat/indices?pretty
yellow open filebeat-7.10.1-2021.04.16-000001 QImIEVM9SOKvtDnO1WUyNw 1 1 24 0 146kb 146kb
yellow open filebeat-7.12.0-2021.04.19-000001 -rYD-nUNR9m10x2W21uAAg 1 1 66423 0 15.5mb 15.5mb
green open .apm-custom-link b6b_dTNPQHOLatvVyw6fUg 1 0 0 0 208b 208b
green open .kibana_task_manager_1 f2Eg4u8yRvSEk47QU-wwbg 1 0 5 3 132.9kb 132.9kb
green open .apm-agent-configuration kMWsZ9kBTW6xeYoe3J4sIA 1 0 0 0 208b 208b
green open .kibana-event-log-7.10.0-000001 -ZTzLi9zTuOnjcsm2wOhAw 1 0 2 0 11kb 11kb
green open .async-search la4iO9BFTd6qSUzrw7JKNw 1 0 2 2 924.5kb 924.5kb
green open .kibana_1 gK9b55LTRCiuwm9sFRTuaQ 1 0 1558 7 10.7mb 10.7mb
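Similarly, you can restore just a single index rather than the whole cluster snapshot. For example, to restore only the filebeat index from the index-level snapshot we created earlier, without the cluster state;
curl -X POST "192.168.57.20:9200/_snapshot/es_backup/filebeat_202104192200/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "filebeat-7.12.0-2021.04.19-000001",
  "include_global_state": false
}
'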
And that is how easy it is to back up and restore Elasticsearch index data.
To delete a snapshot;
curl -X DELETE "192.168.57.20:9200/_snapshot/es_backup/es_backup_202104192200?pretty"
You can also restore Elasticsearch index data to a different Elasticsearch cluster; the link to our guide is provided below;
Restore Elasticsearch Data to another Cluster
Reference
Elasticsearch Snapshot and Restore
Other Tutorials
Setup Kibana Elasticsearch and Fluentd on CentOS 8
Setup Multi-node Elasticsearch 7.x Cluster on Fedora 30/Fedora 29/CentOS 7