Guide to Elasticdump - Moving and Saving Elasticsearch Indices - Stack Abuse

Introduction

Elasticsearch began as a custom search engine. These days, it goes well beyond that single role: it's part of log aggregation stacks and security monitoring setups, and it even serves as a datastore for exploratory analysis.

Indices are where Elasticsearch stores its data, and these indices often need to be moved from one cluster to another. Sometimes we just need a safe backup before migrating data to another cluster or upgrading versions. Elasticdump is a tool that facilitates these backup and restore operations.

Elasticdump is a tool made to move and save Elasticsearch indices.

In this guide, we'll take a look at how to use Elasticdump and how to create and move indices between clusters on your local computer.

Note: To follow this tutorial, you need to have Docker and NPM installed.

What Is Elasticdump, and Why Use It?

Those familiar with Elasticsearch will know its "Snapshot and Restore" feature, which facilitates these operations. It is indeed a smart feature that backs up individual indices or an entire cluster. For a small number of indices or clusters, though, "Snapshot and Restore" can be overkill, all the more so when there's no system in place to take snapshots incrementally.

This is where Elasticdump can help - it's a lightweight tool that can move and save indices without the remote repository required by Elasticsearch's "Snapshot and Restore".

Here's what Elasticdump can do:

  • Copy indices from one cluster to the other
  • Dump indices to flat files
  • Back up indices & mappings
  • Restore indices across different versions of Elasticsearch
  • Export multiple indices in parallel with Multielasticdump, which ships with Elasticdump
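
To make the last capability a little more concrete, here is a minimal sketch that only prints a multielasticdump command instead of running it, so it is safe to try without a cluster. The cluster URL, the ^log-.*$ pattern, and the ./backup directory are placeholders picked for illustration, and multidump_cmd is a helper name made up for this example. The --direction, --match, --input, and --output flags are taken from the Elasticdump README, so it's worth confirming them against multielasticdump --help for your installed version:

```shell
#!/bin/sh
# Sketch: print (not run) a multielasticdump command that dumps every index
# whose name matches a regular expression into a directory of flat files.
# The host, pattern, and directory used below are illustrative placeholders.

multidump_cmd() {
  # $1 = cluster URL, $2 = regex matching index names, $3 = output directory
  printf "multielasticdump --direction=dump --match='%s' --input=%s --output=%s\n" \
    "$2" "$1" "$3"
}

multidump_cmd "http://localhost:9200" '^log-.*$' "./backup"
```

Running the printed command against a live cluster should write the matching indices as files into the output directory, which is assumed to exist already.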

Setting up Elasticdump

Installation

Elasticdump is a Node package and it can be directly downloaded from NPM. We'll need to have Node.js installed, alongside the Node Package Manager (NPM).

Let's go ahead and install them, before downloading Elasticdump:

$ sudo apt install nodejs npm # Install Node.js + NPM
$ sudo npm install n -g # Install helper package to get latest Node.js + NPM versions
/usr/local/bin/n -> /usr/local/lib/node_modules/n/bin/n
/usr/local/lib
└── n@<version>
$ sudo n latest # Get the latest version of Node.js (which bundles NPM)

  installing : node-v16.2.0
       mkdir : /usr/local/n/versions/node/16.2.0
       fetch : https://nodejs.org/dist/v16.2.0/node-v16.2.0-linux-x64.tar.xz
   installed : v16.2.0 (with npm 7.13.0)

Note: the node command changed location and the old location may be remembered in your current shell.
         old : /usr/bin/node
         new : /usr/local/bin/node
To reset the command location hash either start a new shell or execute PATH="$PATH"

$ sudo npm install elasticdump -g # Install Elasticdump globally on your local machine

This installs Elasticdump globally and the installation can be verified using the following command:

$ elasticdump --version
6.71.0
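
If you'd like to confirm the prerequisites in one go, a small sketch like the following reports which of the tools used in this guide are on your PATH. The have_cmd helper is a name made up for this example:

```shell
#!/bin/sh
# Sketch: report which of the tools this guide relies on can be found.

have_cmd() {
  # Succeeds when the shell can locate the named command.
  command -v "$1" >/dev/null 2>&1
}

for tool in node npm docker elasticdump; do
  if have_cmd "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

Anything listed as missing needs to be installed before continuing.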

Setting Up an Elasticsearch Cluster (Optional)

To test Elasticdump out, you need at least one Elasticsearch cluster, and a single-node setup is enough. If you already have an Elasticsearch cluster running, you may skip this step.

For everyone else, the following commands will spin up an Elasticsearch container. Make sure you have Docker or Docker Desktop installed and running on your machine. You can download a suitable installer from the Docker website.

Once the Docker daemon is up and running, let's create the directories that will hold Elasticsearch's data as volumes. Without them, Elasticsearch's storage would be ephemeral and your data would be lost if the container is removed:

$ mkdir -p data/es9200
$ mkdir -p data/es9400
$ vol_location=$(pwd)

You can now spin up your Elasticsearch container by keying in the following command:

$ docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -v ${vol_location}/data/es9200:/usr/share/elasticsearch/data --name=es_source -d docker.elastic.co/elasticsearch/elasticsearch:7.13.0

Let's break this command down a bit.

We've told Docker to publish (-p flag) port 9200 of the container on port 9200 of your host machine, and likewise for port 9300. We have also set an environment variable (-e flag) declaring that this is a single-node setup.

Then, we attach a volume (-v flag) that maps the directory created in the previous step to the data directory inside the container, which is where the data will be stored.

Finally, the --name flag gives the container a readable name, the -d flag runs the container in the background, and docker.elastic.co/elasticsearch/elasticsearch:7.13.0 is the name of the image.

We'll copy data from this Elasticsearch cluster to another one using Elasticdump. To make this happen, create a second cluster named es_target by issuing the following command:

$ docker run -p 9400:9200 -p 9500:9300 -e "discovery.type=single-node" -v ${vol_location}/data/es9400:/usr/share/elasticsearch/data --name=es_target -d docker.elastic.co/elasticsearch/elasticsearch:7.13.0

Note: A service can't bind to a host port that's already in use by another service, which is why es_target is published on host ports 9400 and 9500 rather than 9200 and 9300.

You can check if your Elasticsearch cluster containers are running fine by issuing the following command:

$ docker ps
CONTAINER ID        IMAGE                                                  COMMAND                  CREATED             STATUS              PORTS                                                                                  NAMES
c0aa983abb21        docker.elastic.co/elasticsearch/elasticsearch:7.13.0   "/bin/tini -- /usr/l…"   9 hours ago         Up 2 hours          0.0.0.0:9400->9200/tcp, :::9400->9200/tcp, 0.0.0.0:9500->9300/tcp, :::9500->9300/tcp   es_target
eb30714b5302        docker.elastic.co/elasticsearch/elasticsearch:7.13.0   "/bin/tini -- /usr/l…"   10 hours ago        Up 3 hours          0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 0.0.0.0:9300->9300/tcp, :::9300->9300/tcp   es_source
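
docker ps only tells us that the containers are up, and Elasticsearch itself can take a little while before it answers requests. As a sketch, assuming curl is installed and the clusters run without authentication, a hypothetical wait_for_es helper could poll each node until it responds:

```shell
#!/bin/sh
# Sketch: poll a cluster's root endpoint until it answers or we give up.

wait_for_es() {
  # $1 = base URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 2)
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress output, -f treats HTTP errors as failures,
    # --max-time keeps a single attempt from hanging indefinitely
    if curl -sf --max-time 5 "$url" >/dev/null 2>&1; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}

# Usage (requires the two containers from this guide to be running):
#   wait_for_es http://localhost:9200 && wait_for_es http://localhost:9400
```

The attempt count and delay are arbitrary defaults, so feel free to tune them for slower machines.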

The containers that you have created look good! Let's dive into the usage of Elasticdump.

Working with Elasticdump

With the containers ready, we can start working with Elasticdump.

Restoring the Index

The clusters are ready, but they look empty. Let's create an index and load data to one of the clusters. The es_source cluster runs on port 9200 and can be accessed via http://localhost:9200. Run the following commands to download the mapping and the data to be loaded:

$ wget https://raw.githubusercontent.com/StackAbuse/moving-elasticsearch-indices-with-elasticdump/main/logs_mapping.json
$ wget https://raw.githubusercontent.com/StackAbuse/moving-elasticsearch-indices-with-elasticdump/main/logs_data.json

Let's try loading these files to the es_source cluster under the index log-2021-06-01 using Elasticdump:

$ elasticdump --input=logs_mapping.json --output=http://localhost:9200/log-2021-06-01 --type=mapping
$ elasticdump --input=logs_data.json --output=http://localhost:9200/log-2021-06-01 --type=data

There are three flags here: --input points to the JSON file that we have downloaded, --output points to the index endpoint on the cluster, and --type, set to either mapping or data here, defines what is being transferred.
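
The same three flags work in the opposite direction, too. As a rough sketch, the hypothetical backup_index helper below swaps the roles of --input and --output to export an index's mapping and data into two flat files. The file naming scheme and the ELASTICDUMP override (handy for previewing the calls with echo) are assumptions of this example, not Elasticdump features:

```shell
#!/bin/sh
# Sketch: export an index's mapping and data to two JSON files.
# ELASTICDUMP can be overridden (e.g. ELASTICDUMP=echo) to preview the calls.

backup_index() {
  # $1 = cluster URL, $2 = index name, $3 = existing directory for the files
  for kind in mapping data; do
    "${ELASTICDUMP:-elasticdump}" \
      --input="$1/$2" \
      --output="$3/${2}_${kind}.json" \
      --type="$kind" || return 1
  done
}

# Preview:  ELASTICDUMP=echo backup_index http://localhost:9200 log-2021-06-01 ./backup
# Run:      backup_index http://localhost:9200 log-2021-06-01 ./backup
```

The preview form prints the two invocations, while the plain form runs them for real, assuming the index and the ./backup directory both exist.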

Elasticdump reads the two downloaded files as input and writes them to the log-2021-06-01 index:

$ elasticdump --input=logs_mapping.json --output=http://localhost:9200/log-2021-06-01 --type=mapping
Sat, 05 Jun 2021 12:30:52 GMT | starting dump
Sat, 05 Jun 2021 12:30:52 GMT | got 1 objects from source file (offset: 0)
Sat, 05 Jun 2021 12:30:59 GMT | sent 1 objects to destination elasticsearch, wrote 1
Sat, 05 Jun 2021 12:30:59 GMT | got 0 objects from source file (offset: 1)
Sat, 05 Jun 2021 12:30:59 GMT | Total Writes: 1
Sat, 05 Jun 2021 12:30:59 GMT | dump complete

$ elasticdump --input=logs_data.json --output=http://localhost:9200/log-2021-06-01 --type=data
Sat, 05 Jun 2021 12:31:23 GMT | starting dump
Sat, 05 Jun 2021 12:31:24 GMT | got 100 objects from source file (offset: 0)
Sat, 05 Jun 2021 12:31:26 GMT | sent 100 objects to destination elasticsearch, wrote 100
Sat, 05 Jun 2021 12:31:26 GMT | got 100 objects from source file (offset: 100)
Sat, 05 Jun 2021 12:31:27 GMT | sent 100 objects to destination elasticsearch, wrote 100
Sat, 05 Jun 2021 12:31:27 GMT | got 100 objects from source file (offset: 200)
Sat, 05 Jun 2021 12:31:28 GMT | sent 100 objects to destination elasticsearch, wrote 100
Sat, 05 Jun 2021 12:31:28 GMT | got 0 objects from source file (offset: 300)
Sat, 05 Jun 2021 12:31:28 GMT | got 0 objects from source file (offset: 300)
Sat, 05 Jun 2021 12:31:28 GMT | Total Writes: 300
Sat, 05 Jun 2021 12:31:28 GMT | dump complete

Let's check that the restore completed successfully:

$ curl localhost:9200/_cat/indices
yellow open log-2021-06-01 UD4SRzu-TjCNdVZatdOQsA 1 1 300 0 305.5kb 305.5kb

The index holds 300 documents with a total size of 305.5kb.
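
If you want to check the count from a script rather than by eye, one option is to pick the docs.count column out of the _cat/indices output. This sketch assumes the default column layout shown above, where the index name is the third field and the document count is the seventh; docs_count is a helper name invented for the example:

```shell
#!/bin/sh
# Sketch: pull the document count for one index out of `_cat/indices` output.
# Assumes the default columns: index name in field 3, docs.count in field 7.

docs_count() {
  # Reads `_cat/indices` lines on stdin; $1 = index name to look up
  awk -v idx="$1" '$3 == idx { print $7 }'
}

# Feed it the sample line from the output above.
echo "yellow open log-2021-06-01 UD4SRzu-TjCNdVZatdOQsA 1 1 300 0 305.5kb 305.5kb" \
  | docs_count log-2021-06-01
```

For the sample line this prints 300. Against a live cluster, the same helper can be fed from curl: curl -s localhost:9200/_cat/indices | docs_count log-2021-06-01.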

Copying the Index Across Clusters

The syntax shown above stays much the same across Elasticdump's different use cases. Let's now copy the index we created previously into a new index on the other cluster:

$ elasticdump --input=http://localhost:9200/log-2021-06-01 --output=http://localhost:9400/log-2021-06-01
Sat, 05 Jun 2021 10:05:24 GMT | starting dump
Sat, 05 Jun 2021 10:05:24 GMT | got 100 objects from source elasticsearch (offset: 0)
Sat, 05 Jun 2021 10:05:28 GMT | sent 100 objects to destination elasticsearch, wrote 100
Sat, 05 Jun 2021 10:05:28 GMT | got 100 objects from source elasticsearch (offset: 100)
Sat, 05 Jun 2021 10:05:28 GMT | sent 100 objects to destination elasticsearch, wrote 100
Sat, 05 Jun 2021 10:05:28 GMT | got 100 objects from source elasticsearch (offset: 200)
Sat, 05 Jun 2021 10:05:30 GMT | sent 100 objects to destination elasticsearch, wrote 100
Sat, 05 Jun 2021 10:05:30 GMT | got 0 objects from source elasticsearch (offset: 300)
Sat, 05 Jun 2021 10:05:30 GMT | Total Writes: 300
Sat, 05 Jun 2021 10:05:30 GMT | dump complete

If the output index doesn't exist, a new one is created for you. Let's verify once more that the index has been created properly:

$ curl localhost:9400/_cat/indices
yellow open log-2021-06-01 UD4SRzu-TjCNdVZatdOQsA 1 1 300 0 305.5kb 305.5kb
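
Note that the copy above doesn't pass --type, and Elasticdump's documented default for it is data, so only the documents are transferred and the target index gets whatever mapping Elasticsearch infers. If the mapping should match the source, one approach is to copy it first. Here's a minimal sketch, where copy_index and the ELASTICDUMP override are hypothetical conveniences for this example:

```shell
#!/bin/sh
# Sketch: copy an index between clusters, mapping first and documents second.
# ELASTICDUMP can be overridden (e.g. ELASTICDUMP=echo) to preview the calls.

copy_index() {
  # $1 = source cluster URL, $2 = target cluster URL, $3 = index name
  "${ELASTICDUMP:-elasticdump}" --input="$1/$3" --output="$2/$3" --type=mapping &&
  "${ELASTICDUMP:-elasticdump}" --input="$1/$3" --output="$2/$3" --type=data
}

# Preview:  ELASTICDUMP=echo copy_index http://localhost:9200 http://localhost:9400 log-2021-06-01
# Run:      copy_index http://localhost:9200 http://localhost:9400 log-2021-06-01
```

The mapping step is best run before any documents reach the target, since an index that already holds data may reject a conflicting mapping.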

Conclusion

Simple as it is, Elasticdump is a must-have tool for anyone dealing with Elasticsearch.

Last Updated: June 6th, 2021

Author: Sathiya Sarathi Gunasekaran
