
Platform: Ubuntu 10.04 x86.

We have an HTTP server (nginx, but that is not relevant) which serves some static content. Content is (rarely) uploaded by content managers via SFTP, but may also be changed or added by other means (like a cat done directly on the server).

Now we want to add a second, identical HTTP server — a slave mirror in another data-center on another continent. (And set up DNS round-robin.)

What is the best way to set up synchronization between the master server and the slave mirror, so that the delay between a modification and its re-synchronization is minimal (a few seconds should be bearable, though)?

The solution must cope with large changesets and race conditions. That is, if I change 1000 files, it should not spawn 1000 synchronization processes. And if I change something while a synchronization is in progress, my new change must eventually make it to the mirror as well... And so on.

Rejected solutions:

  • CDN — not worth the money for our particular usage scenario.
  • NFS — not over the global Internet.
  • dumb cron + rsync — latency and/or system load would be too high.
  • manual rsync — not reliable, since content is changed by non-IT users.

I would say that we need something based on inotify. Is there a ready-made solution?
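
For illustration, here is a naive sketch of the kind of inotify-based loop I mean (inotifywait comes from the inotify-tools package; the paths and mirror hostname are placeholders). Note that it still loses events that arrive while rsync is running, which is exactly the kind of race condition a ready-made solution should handle properly:

    # Naive inotify + rsync loop: inotifywait blocks until something
    # changes anywhere under /var/www, then a single rsync pushes the
    # whole batch (so 1000 changed files do not spawn 1000 syncs).
    while inotifywait -r -e modify,create,delete,move /var/www; do
        rsync -az --delete /var/www/ mirror.example.com:/var/www/
    done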

Update: two extra (rather obvious) requirements that I forgot to mention:

  • If data is somehow changed on the slave mirror (say, a superuser accidentally deletes a file), the sync solution must restore the data to the master state on the next sync.

  • When idle, the solution must not consume traffic or system resources (other than some memory etc. for the sleeping daemon process, of course).

Update 2: one more requirement:

  • The solution must work with UTF-8 file names.
  • It looks similar to serverfault.com/questions/157901/… Commented Jun 7, 2011 at 19:59
  • lsyncd — see: serverfault.com/questions/7969/… Commented Jun 7, 2011 at 20:01
  • @Mircea: please add lsyncd as a regular answer, so it can be upvoted / discussed properly. ;-) Commented Jun 7, 2011 at 20:03
  • Wait, seriously, how did you get the cat to make server content? Do you work for WikiHow or some other content farm? Commented Jun 7, 2011 at 23:36
  • Well, something like cat >crossdomain.xml and type a bit (or just paste into the terminal). It is a rare event, but it can happen, and the sync solution must be ready for it. ;-) The point is that I can't use an SFTP hook or anything like that — there are multiple potential sources of changes. Commented Jun 8, 2011 at 6:22

4 Answers


What about pirsyncd, a Python daemon that watches directories with inotify and mirrors changes with rsync? I think it could be a good fit for you. ;)


Have you considered Unison as a means of keeping files in sync? Using it, you'd be able to do the one-way sync you're requesting. It seems like a reasonable fit for this application.
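
For example (the hostname and paths here are my own placeholders), forcing the master replica makes Unison behave as a one-way push, so stray changes on the mirror get rolled back:

    # One-way push: -force makes the master replica authoritative,
    # -batch suppresses interactive prompts, -times preserves mtimes.
    unison /var/www ssh://mirror.example.com//var/www \
        -batch -force /var/www -times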

  • It does not work with UTF-8 file names. I've added this requirement to the question. Commented Jun 7, 2011 at 22:34

You could use lsyncd; see: Is there a working Linux backup solution that uses inotify?
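
A minimal invocation, for the record (hostname and paths are placeholders): lsyncd watches the tree with inotify, collects events for a short delay, and then runs rsync over SSH only for what changed:

    # Watch /var/www recursively via inotify and push batched changes
    # to the mirror over SSH; rsync only runs when events arrive.
    lsyncd -rsyncssh /var/www mirror.example.com /var/www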

  • It does not restore files deleted on the mirror. (I've updated the question to reflect this requirement.) Commented Jun 7, 2011 at 22:33
  • (Or maybe I'm missing something... Will try again.) Commented Jun 7, 2011 at 23:07

It seems like this is a case where you might want to write a script that checks file timestamps: if a file's timestamp is later than the script's last run, assume it needs to be pushed, and trigger rsync (or some other tool) to synchronize it. Likewise, on the other side, check whether a file has changed and, if so, trigger a pull. Fabric might actually be a good tool for this; if you are familiar with Python, Fabric combined with timestamp checking may be the way to go.
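
A rough sketch of the push half of that idea in shell (the stamp file location, paths, and hostname are made up), using a stamp file instead of parsing timestamps by hand:

    #!/bin/sh
    # Push the tree if anything changed since the last successful run.
    # The new stamp is touched *before* rsync starts, so files modified
    # during a running sync are still newer and get picked up next time.
    STAMP=/var/run/mirror-sync.stamp

    if [ ! -f "$STAMP" ] || [ -n "$(find /var/www -newer "$STAMP" -print -quit)" ]; then
        touch "$STAMP.new"
        rsync -az --delete /var/www/ mirror.example.com:/var/www/ \
            && mv "$STAMP.new" "$STAMP"
    fi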

  • Sorry, but (1) I explicitly said that the solution should be invoked automatically and (2) this task is too full of potential pitfalls (again, see the question for an incomplete list) to write a script by hand without trying existing solutions. Commented Jun 8, 2011 at 19:24
  • I personally do not believe this would take a lot of work to write, and without cron it could run as a simple daemon, which is completely hands-off. This is a lightweight solution in general, and I would argue that it has advantages over other possible solutions. In fact, I had something similar to these requirements and implemented something much like this. The process basically ran as a daemon and, after each pass, went to sleep for a preset amount of time. A checker script run by Puppet made sure the process would be restarted if it ever died. Commented Jun 9, 2011 at 2:52
