What's the easiest way of moving Elastic Search data between servers

Question

I have Elasticsearch v6.1.0 installed on a Windows machine and a CentOS 7 machine. The goal is to migrate data from the Windows machine to the CentOS 7 one.

Since both run the same ES version, I simply copied the "data" folder from machine A to machine B. When I checked the cluster health, the status was red and active_primary_shards was 0, so I reverted the change.

What other methods are there? Can the snapshot/restore method be used for this purpose? I thought it was meant for migrating between different versions.

So the question is: what's the best/easiest method for moving data between two servers running the same ES version?

Copying data folders should work as long as the config (cluster name) is also the same. Is there any difference in the configuration of both clusters?
– Val, Commented Jan 28, 2018 at 5:12
Possible duplicate of How to copy some ElasticSearch data to a new index
– Jim G., Commented Jun 11, 2019 at 20:10

Nikolay Vasiliev · Accepted Answer · 2018-01-28 15:11:30Z

Using snapshot/restore

Snapshot/restore works well for this task as long as you have a shared file system or a single-node cluster. The shared FS must meet the following requirement:

In order to register the shared file system repository it is necessary to mount the same shared filesystem to the same location on all master and data nodes.

So this is not a problem if you have a single-node cluster. In that case, just take a snapshot and copy it over to the other machine.

It can be more challenging if you have many nodes running. In that case you can use one of the supported repository plugins for S3, HDFS, and other cloud storage.

The advantage of this approach is that the data and the indices are snapshotted entirely.
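As a rough sketch of the single-node case, the snapshot side boils down to registering an `fs` repository and taking a snapshot, and the restore side to registering the same repository and calling `_restore`. The repository name `my_backup`, the snapshot name `snapshot_1`, the path and the helper names below are placeholders I made up; the repository location also has to be listed under `path.repo` in elasticsearch.yml on each node.

```python
import json
import urllib.request


def snapshot_requests(repo, snapshot, location):
    """Return (method, path, body) tuples for the source cluster."""
    return [
        # Register a shared-filesystem repository. `location` must be
        # whitelisted via path.repo in elasticsearch.yml.
        ("PUT", f"/_snapshot/{repo}",
         {"type": "fs", "settings": {"location": location}}),
        # Snapshot all indices and wait until the snapshot has finished.
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", None),
    ]


def restore_request(repo, snapshot):
    """Return the (method, path, body) tuple for the target cluster."""
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", None)


def send(base_url, method, path, body=None):
    """Send one request to Elasticsearch and return the parsed JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host and path; adjust to your setup.
    for method, path, body in snapshot_requests(
            "my_backup", "snapshot_1", "/mnt/es_backup"):
        print(send("http://localhost:9200", method, path, body))
```

After copying the repository directory to the other machine, you would register the same repository there (the first request again) and then send `restore_request(...)`. As far as I know, the restore is rejected if an open index with the same name already exists on the target.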

Using `_reindex` API

It might be easier to use the `_reindex` API to transfer data from one ES cluster to another. It has a special Reindex from Remote mode that covers exactly this use case.

What reindex actually does is scroll over the source index and issue a lot of bulk inserts into the target index. With Reindex from Remote, the request runs on the target cluster and the source index is the remote one.
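To make this more concrete, here is a minimal sketch of the request body for Reindex from Remote. It is sent as `POST /_reindex` to the target cluster, which pulls the documents from the source. The host and index names and the helper function are placeholders of mine, and the remote host also has to be listed in `reindex.remote.whitelist` in elasticsearch.yml on the target.

```python
import json


def reindex_from_remote_body(remote_host, source_index, dest_index,
                             batch_size=1000):
    """Build the JSON body for POST /_reindex on the target cluster."""
    return {
        "source": {
            # The cluster to read from, e.g. "http://otherhost:9200".
            "remote": {"host": remote_host},
            "index": source_index,
            # Scroll batch size; lower it if documents are very large.
            "size": batch_size,
        },
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    body = reindex_from_remote_body(
        "http://otherhost:9200", "my_index", "my_index")
    print(json.dumps(body, indent=2))
```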

There are a couple of issues you should take care of:

setting up the target index (reindex does not copy mappings or settings)
if some fields on the source index are excluded from _source then their contents won't be copied to the target index
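For the first point, one possible approach is to read the index definition from the source with `GET /<index>` and reuse its settings and mappings when creating the target index with `PUT /<index>`. The sketch below only builds the creation body from such a response. The helper name is mine, and the list of generated settings to strip is an assumption based on what a 6.x response typically contains, so check it against your own output.

```python
import copy

# Settings that Elasticsearch generates itself and does not accept on index
# creation. Assumption: this list matches a typical 6.x GET /<index> reply.
GENERATED_SETTINGS = ("creation_date", "uuid", "version", "provided_name")


def create_index_body(get_index_response, index_name):
    """Turn a GET /<index> response into a body for PUT /<index>."""
    # Work on a copy so the caller's response dict is left untouched.
    index_info = copy.deepcopy(get_index_response[index_name])
    index_settings = index_info.get("settings", {}).get("index", {})
    for key in GENERATED_SETTINGS:
        index_settings.pop(key, None)
    return {
        "settings": {"index": index_settings},
        "mappings": index_info.get("mappings", {}),
    }
```

The target index should be created with this body before the `_reindex` call is issued, otherwise Elasticsearch falls back to dynamic mapping and default settings.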

Summing up

For snapshot/restore

Pros:

all data and the indices are saved/restored as they are
2 calls to the ES API are needed

Cons:

if the cluster has more than 1 node, you need to set up a shared FS or use some cloud storage

For `_reindex`

Pros:

Works for a cluster of any size
Data is copied directly (no intermediate storage required)
1 call to the ES API is needed

Cons:

Data excluded from _source will be lost

Here's also a similar SO question from some three years ago.

Hope that helps!

Thanks for your comment, Nikolay. I ended up using snapshot/restore method. Took only 15 minutes!
