
I have Elasticsearch v6.1.0 installed on a Windows machine and on a CentOS 7 machine. The goal is to migrate the data from the Windows machine to the CentOS 7 machine.

Since both run the same ES version, I simply copied the "data" folder from machine A to machine B. When I checked the cluster health, the status was red and active_primary_shards was 0, so I reverted the change.

What other methods are there? Can the snapshot/restore method be used for this purpose? I thought it was meant for migrating between different versions.

So the question is: what is the best/easiest way to move data between two servers running the same ES version?

  • Copying data folders should work as long as the config (cluster name) is also the same. Is there any difference in the configuration of the two clusters? Commented Jan 28, 2018 at 5:12
  • Possible duplicate of How to copy some ElasticSearch data to a new index Commented Jun 11, 2019 at 20:10

1 Answer

Using snapshot/restore

You can use snapshot/restore for this task, as long as you have either a shared file system or a single-node cluster. The documentation states the following requirement for a shared file system repository:

In order to register the shared file system repository it is necessary to mount the same shared filesystem to the same location on all master and data nodes.

So this is not a problem for a single-node cluster: just make a snapshot and copy it over to the other machine.
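A minimal sketch of the single-node workflow in Python, using only the standard library. It builds the HTTP requests rather than sending them. The repository name, snapshot name, hosts and directory paths are hypothetical, and it assumes path.repo in elasticsearch.yml on each machine lists that machine's repository directory (the fs repository type requires this). Between taking the snapshot and restoring it, you copy the repository directory from the Windows machine to the target location on the CentOS machine.

```python
import json
import urllib.request

# Hypothetical names; pick your own.
REPO = "migration_repo"
SNAPSHOT = "snapshot_1"


def es_request(host, method, path, body=None):
    """Build (but do not send) a JSON request for an Elasticsearch node."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(host + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    return req


def register_repo(host, location):
    # Registers a shared file system ("fs") repository. Assumes location is
    # listed under path.repo in this node's elasticsearch.yml.
    return es_request(host, "PUT", "/_snapshot/" + REPO,
                      {"type": "fs", "settings": {"location": location}})


def create_snapshot(host):
    # Snapshots all indices and blocks until the snapshot has finished.
    return es_request(host, "PUT",
                      "/_snapshot/%s/%s?wait_for_completion=true" % (REPO, SNAPSHOT),
                      {"indices": "*", "include_global_state": False})


def restore_snapshot(host):
    # Restores all indices from the snapshot, without cluster-wide state.
    return es_request(host, "POST",
                      "/_snapshot/%s/%s/_restore" % (REPO, SNAPSHOT),
                      {"indices": "*", "include_global_state": False})


def migration_plan(source_host, source_location, target_host, target_location):
    """Requests in execution order. Copy source_location to target_location
    on the target machine after step 2 and before step 3."""
    return [
        register_repo(source_host, source_location),
        create_snapshot(source_host),
        register_repo(target_host, target_location),
        restore_snapshot(target_host),
    ]
```

To execute the plan, pass each request in order to urllib.request.urlopen. Keep in mind that the restore fails for any index that already exists and is open on the target.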

It can be more challenging if you have many nodes running. In that case you can use one of the supported repository plugins for S3, HDFS and other cloud storage.

The advantage of this approach is that the data and the indices (including their settings and mappings) are snapshotted in full.

Using _reindex API

It might be easier to use the _reindex API to transfer data from one ES cluster to another. Its special Reindex from Remote mode covers exactly this use case.

What reindex actually does is scroll over the source index (which can be on a remote cluster) and issue many bulk inserts into the target index.
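A sketch of a Reindex from Remote request, again only building the request. It is sent to the target (CentOS) cluster, while source.remote.host points at the old (Windows) cluster; the target must allow that host via reindex.remote.whitelist in its elasticsearch.yml (for example reindex.remote.whitelist: "windows-host:9200", a hypothetical host name). Host and index names, and the helper itself, are illustrative; source.size sets the scroll batch size.

```python
import json
import urllib.request


def reindex_from_remote(target_host, remote_host, index, dest_index=None,
                        username=None, password=None, batch_size=1000):
    """Build (but do not send) a Reindex from Remote request.

    The request goes to the target cluster, which pulls documents from
    remote_host with a scroll and writes them locally with bulk requests.
    """
    remote = {"host": remote_host}
    if username is not None:
        # Only needed if the remote cluster requires basic authentication.
        remote["username"] = username
        remote["password"] = password
    body = {
        "source": {"remote": remote, "index": index, "size": batch_size},
        "dest": {"index": dest_index or index},
    }
    req = urllib.request.Request(target_host + "/_reindex",
                                 data=json.dumps(body).encode("utf-8"),
                                 method="POST")
    req.add_header("Content-Type", "application/json")
    return req
```

Reindexing from remote buffers each batch on the heap, so for very large documents a smaller batch_size may be necessary.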

There are a couple of issues you should take care of:

  1. set up the target index yourself (reindex does not copy mappings or settings)
  2. if some fields in the source index are excluded from _source, their contents won't be copied to the target index
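Because reindex does not create the target index for you, one option is to fetch the source definition with GET /<index> on the old cluster and create the index on the new one with PUT /<index> before reindexing. Below is a hypothetical helper that turns the former response into a body for the latter. It strips settings that Elasticsearch generates itself and rejects on index creation; the exact list (uuid, creation_date, version, provided_name) is an assumption based on what a 6.x GET response typically contains.

```python
import copy

# Assumed list of auto-generated index settings that cannot be set on creation.
GENERATED_INDEX_SETTINGS = ("uuid", "creation_date", "version", "provided_name")


def create_index_body(get_index_response, index):
    """Convert a GET /<index> response into a PUT /<index> request body."""
    source = get_index_response[index]
    settings = copy.deepcopy(source.get("settings", {}).get("index", {}))
    for key in GENERATED_INDEX_SETTINGS:
        settings.pop(key, None)  # drop settings the new cluster generates itself
    return {
        "settings": {"index": settings},
        "mappings": copy.deepcopy(source.get("mappings", {})),
        "aliases": copy.deepcopy(source.get("aliases", {})),
    }
```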

Summing up

For snapshot/restore

Pros:

  • all data and the indices are saved/restored as they are
  • only 2 calls to the ES API are needed (snapshot and restore, once the repository is registered)

Cons:

  • if the cluster has more than 1 node, you need to set up a shared FS or use some cloud storage

For _reindex

Pros:

  • Works for clusters of any size
  • Data is copied directly (no intermediate storage required)
  • 1 call to the ES API is needed

Cons:

  • Data excluded from _source will be lost

Here's also a similar SO question from about three years ago.

Hope that helps!

1 Comment

Thanks for your comment, Nikolay. I ended up using snapshot/restore method. Took only 15 minutes!