Elasticsearch: Snapshot Backups on a Shared NFS

Disasters can happen. We experienced data loss on our Elasticsearch cluster a few weeks ago after a failed upgrade. That's why data redundancy isn't enough: even when your data is replicated across multiple nodes, it isn't safe!

Backing up your Elasticsearch cluster adds another layer of security in case things go wrong:

  • Failed upgrade: that's what happened in our case. The data was upgraded but Elasticsearch was unable to read it, leaving several nodes with corrupted data,
  • Intrusion: what if an attacker gains access to your database,
  • Multiple node failures: data is usually replicated across one or more nodes, but what if several nodes fail simultaneously? Improbable, but not impossible.

This tutorial explains how we use a shared Network File System (NFS) mounted on all our Elasticsearch nodes to save incremental snapshots of the database every night.

NFS Setup

Prerequisites

We are going to use two servers:

  • NFS Server: this server shares a folder on its local disk (IP 10.0.0.1),
  • NFS Client: an Elasticsearch node acting as an NFS client connected to the NFS server (IP 10.0.0.2).

Both are assumed to run Ubuntu Linux, with a non-root user that has sudo privileges.

NFS Server

The NFS server is responsible for providing a shared folder accessible from all Elasticsearch nodes. Why? Because that's how Elasticsearch snapshots work: all nodes must have access to shared storage to write the snapshot data.

First, let's install the NFS server packages:

server@ubuntu:~$ sudo apt update
server@ubuntu:~$ sudo apt install nfs-kernel-server

It's now time to create and share an NFS folder on this machine. Let's suppose we want to share /var/nfs/elasticsearch:

  • Create the shared elasticsearch folder:
server@ubuntu:~$ sudo mkdir /var/nfs/elasticsearch -p
  • Now let's configure the NFS Server to share the folder with our NFS client:
server@ubuntu:~$ sudo nano /etc/exports
  • Here is an example showing how to share the folder with our client (its IP is 10.0.0.2):
/var/nfs/elasticsearch 10.0.0.2(rw,sync,no_root_squash,no_subtree_check)

Of course, replace 10.0.0.2 with the public IP of your NFS client.
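
All Elasticsearch nodes need to reach the share, so /etc/exports should contain one line per node. A minimal sketch, assuming a 3-node cluster whose IPs are 10.0.0.2, 10.0.0.3 and 10.0.0.4 (placeholders, use your own node addresses):

/var/nfs/elasticsearch 10.0.0.2(rw,sync,no_root_squash,no_subtree_check)
/var/nfs/elasticsearch 10.0.0.3(rw,sync,no_root_squash,no_subtree_check)
/var/nfs/elasticsearch 10.0.0.4(rw,sync,no_root_squash,no_subtree_check)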

Let's take a look at what each of these options means:

  • rw: client can both read and write files,
  • sync: This option forces NFS to write changes to disk before replying. It improves consistency but reduces transfer speed,
  • no_subtree_check: disables checking whether a requested file is actually still located in the exported tree. Subtree checking can cause problems when a file is renamed while the client has it open; in almost all cases, it is better to disable it,
  • no_root_squash: by default, NFS translates requests from a remote root user into a non-privileged user on the server. no_root_squash disables this behavior for the share.

Once the configuration is done, the NFS server must be restarted:

server@ubuntu:~$ sudo systemctl restart nfs-kernel-server
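
Optionally, exportfs lets you verify what is currently exported, and re-export the list after future edits to /etc/exports without restarting the service:

server@ubuntu:~$ sudo exportfs -v
server@ubuntu:~$ sudo exportfs -ra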

If you want to dig further (especially regarding security via firewalls like ufw), it's worth reading the How to Setup NFS mount on Ubuntu tutorial.

NFS Client

Each Elasticsearch node acts as an NFS client. First, we need to install the Ubuntu packages:

client@ubuntu:~$ sudo apt update
client@ubuntu:~$ sudo apt install nfs-common

Then, we're going to create and mount the shared NFS folder:

client@ubuntu:~$ sudo mkdir -p /var/nfs/elasticsearch
client@ubuntu:~$ sudo mount 10.0.0.1:/var/nfs/elasticsearch /var/nfs/elasticsearch

These commands mount the NFS share at /var/nfs/elasticsearch on the client side. We can check whether the mount succeeded by running:

client@ubuntu:~$ df -h
Filesystem                Size  Used Avail Use% Mounted on
udev                      238M     0  238M   0% /dev
tmpfs                      49M  628K   49M   2% /run
/dev/vda1                  20G  1.2G   18G   7% /
tmpfs                     245M     0  245M   0% /dev/shm
tmpfs                     5.0M     0  5.0M   0% /run/lock
tmpfs                     245M     0  245M   0% /sys/fs/cgroup
tmpfs                      49M     0   49M   0% /run/user/0
10.0.0.1:/var/nfs/elasticsearch   20G  1.2G   18G   7% /var/nfs/elasticsearch

As the folder has been mounted manually, the mount will disappear on the next reboot. To fix this, we need to add the NFS share to /etc/fstab. Your fstab file should look like this:

client@ubuntu:~$ cat /etc/fstab 
# <file system> <mount point>   <type>  <options>   <dump>  <pass>
/dev/md2    /   ext4    errors=remount-ro,discard   0   1
/dev/md3    /home   ext4    defaults,discard    1   2
proc        /proc   proc    defaults        0   0
sysfs       /sys    sysfs   defaults        0   0
/dev/sda1   /boot/efi   vfat    defaults    0   0

# NFS OVH
10.0.0.1:/var/nfs/elasticsearch /var/nfs/elasticsearch nfs4 rw,_netdev,tcp 0 0
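
To validate the fstab entry without waiting for the next reboot, you can unmount the share and let mount re-read fstab (assuming nothing is currently using the mount):

client@ubuntu:~$ sudo umount /var/nfs/elasticsearch
client@ubuntu:~$ sudo mount -a
client@ubuntu:~$ df -h | grep elasticsearch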

Fantastic! Now we have a shared NFS folder mounted on each Elasticsearch node in /var/nfs/elasticsearch. We can test if it's working by creating a file:

client@ubuntu:~$ sudo touch /var/nfs/elasticsearch/test.txt

This should create a file named test.txt which can be seen from any other NFS client. Now, we're going to see how we can use this folder to store Elasticsearch snapshots.
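
You can confirm it from the NFS server side as well; test.txt should appear in the listing of the exported folder:

server@ubuntu:~$ ls -l /var/nfs/elasticsearch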

Elasticsearch Setup

In this section, we're going to use Kibana to administer the Elasticsearch cluster. We suppose Elasticsearch is installed directly on the server. For further information about Elasticsearch snapshots, refer to their documentation.

We're now going to configure and create a snapshot repository mapped on the /var/nfs/elasticsearch folder.

elasticsearch.yml

We need to declare the path.repo configuration to allow fs snapshot repositories to access the /var/nfs folder:

path.repo: ["/var/nfs"]

This is mandatory, otherwise the repository creation will fail. The setting must be declared on every node, and Elasticsearch must be restarted to apply it.
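
With the Debian/Ubuntu package, restarting a node typically looks like this (adapt to the way Elasticsearch is installed on your machines):

client@ubuntu:~$ sudo systemctl restart elasticsearch

Once the cluster is back up, you can check from the Kibana dev tools that every node picked up the setting; the filter_path parameter simply trims the response to the relevant part:

GET /_nodes/settings?filter_path=nodes.*.settings.path.repo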

Snapshot Repository

Next, let's register the snapshot repository:

  • Elasticsearch must be up and running,
  • Start Kibana and open the dev tools,
  • Create the snapshot repository:
PUT /_snapshot/nfs-server
{
    "type": "fs",
    "settings": {
        "location": "/var/nfs/elasticsearch",
        "compress": true
    }
}

The server should answer:

{"acknowledged": true}

Once created, let's verify the snapshot repository is working properly:

POST /_snapshot/nfs-server/_verify

The server should answer with something similar to:

{
  "nodes" : {
    "xxx" : {
      "name" : "node-3"
    },
    "xxx" : {
      "name" : "node-4"
    },
    "xxx" : {
      "name" : "node-1"
    },
    "xxx" : {
      "name" : "node-2"
    },
    "xxx" : {
      "name" : "node-5"
    }
  }
}

If you encounter any exception while verifying the snapshot repository:

  • Check mounts: double-check that all Elasticsearch nodes have the NFS folder mounted at the same location,
  • Check elasticsearch.yml: make sure the path.repo setting is declared and properly set,
  • Restart cluster: restart the Elasticsearch cluster to make sure the settings declared in elasticsearch.yml are applied,
  • Check Rights: it might be necessary to loosen rights on the NFS mount:
client@ubuntu:~$ sudo chmod 777 -R /var/nfs/elasticsearch

If this fixes the issue, then chown the folder to the elasticsearch user instead, from the client machine:

client@ubuntu:~$ sudo chown -R elasticsearch:elasticsearch /var/nfs/elasticsearch

To list all the snapshots currently stored on the repository:

GET _snapshot/nfs-server/_all
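
The repository is ready, but nothing creates snapshots yet. A snapshot can be taken manually from the dev tools, for instance triggered by a nightly cron job (the snapshot name below is only an example):

PUT /_snapshot/nfs-server/nightly-2021.01.01?wait_for_completion=false

If you run Elasticsearch 7.4 or later, a snapshot lifecycle management policy is another option for the nightly backups mentioned in the introduction. A minimal sketch, with an arbitrary policy name, schedule and retention:

PUT /_slm/policy/nightly-snapshots
{
    "schedule": "0 30 1 * * ?",
    "name": "<nightly-snap-{now/d}>",
    "repository": "nfs-server",
    "config": {
        "indices": ["*"]
    },
    "retention": {
        "expire_after": "30d",
        "min_count": 5,
        "max_count": 50
    }
}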

Conclusion

As we have seen, it's pretty easy to set up a shared NFS mount to save and restore an Elasticsearch database. I would definitely suggest using an NFS server with:

  • 1 Gbps+ network bandwidth: depending on the amount of data you have, make sure there is at least a Gigabit connection between the nodes and the NFS server to speed up the process,
  • RAID1 HDDs or SSDs: the backup server should use RAID1 (mirroring). Suppose your cluster data is corrupted and you need to restore a snapshot: you don't want all your data sitting on a non-redundant disk whose failure would be catastrophic,
  • CPU: 4 cores or more. Better to have an over-sized CPU to sustain high IOPS without any issue,
  • RAM: 16 GB+ is fine. The more RAM you have, the more the system can use for filesystem caching.

In our experience, restoring 300 GB of data takes about an hour to transfer from our NFS server to our 5-node Elasticsearch cluster, each machine having a 1 Gbps network connection.
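
For the record, a restore is triggered from the dev tools as well; the snapshot name is an example, and the indices being restored must first be closed or deleted (or renamed on restore):

POST /_snapshot/nfs-server/nightly-2021.01.01/_restore
{
    "indices": "*"
}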

Once the snapshot has been restored, it still takes some time for Elasticsearch to replicate the primary shards. Make sure to check the progress using Kibana:

GET /_recovery?pretty&active_only

I would advise restarting document indexing only once the recovery process has completed (all indices shown as green by GET _cat/indices).
