Delta

What you want to do is transfer the delta, that is, the difference between your local copy and the remote copy.

One way to do this is via an Event Log. The log contains each operation/data change applied, in order. The client keeps the id of the last event it received; when syncing, it retrieves only the events since that one.
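
A minimal sketch of that event-log sync, assuming events are simple key/value changes (the event shape here is an illustration, not your actual schema):

```python
def events_since(event_log: list[dict], last_event_id: int) -> list[dict]:
    """Server side: return only the events the client has not seen yet."""
    return [e for e in event_log if e["id"] > last_event_id]

def apply_events(db: dict, new_events: list[dict]) -> int:
    """Client side: replay each change in order and return the new high-water mark."""
    last_id = 0
    for event in new_events:
        db[event["key"]] = event["value"]   # each event sets one key in this toy model
        last_id = event["id"]
    return last_id

# Usage: only the tail of the log crosses the wire, never the whole db.
log = [
    {"id": 1, "key": "a", "value": 10},
    {"id": 2, "key": "b", "value": 20},
    {"id": 3, "key": "a", "value": 30},
]
local_db, last_seen = {}, 0
new = events_since(log, last_seen)
if new:
    last_seen = apply_events(local_db, new)   # persist last_seen for the next sync
```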

Another way to do this is to create a snapshot and determine the difference between it and the previous snapshot. Transmit those differences.
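
A sketch of the snapshot-diff approach, assuming each snapshot can be flattened to key/value rows:

```python
def snapshot_diff(previous: dict, current: dict) -> dict:
    """Compute what changed between two snapshots; only this gets transmitted."""
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    deleted = [k for k in previous if k not in current]
    return {"changed": changed, "deleted": deleted}

def apply_diff(db: dict, diff: dict) -> None:
    """Remote side: replay the differences onto its own copy."""
    db.update(diff["changed"])
    for k in diff["deleted"]:
        db.pop(k, None)
```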

Of course the amount of change might be so high that your current solution of compressing the full db is actually smaller. So it does make sense to have some smarts to discard redundant events, or to choose between the delta stream and a compressed copy.


Checksums

The first issue is detecting that the copy is bad. Checksums work a treat here. How you apply them will depend on what you are transferring.

If you are transferring the full db, then a single checksum is enough. Trying to checksum each block or smaller unit isn't helpful because there is no meaningful recovery method. You have to recopy the file regardless.

If you chunk the db into segments (split into separate files) then it makes sense to checksum each segment. If one segment is corrupted you need only recopy that segment.
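
A sketch of per-segment checksumming with SHA-256 (the segment directory layout and the published checksum map are assumptions for illustration):

```python
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 digest of one segment file, read in blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def corrupted_segments(segment_dir: Path, expected: dict[str, str]) -> list[str]:
    """Compare each local segment against the checksum the sender published.

    Only the segments returned here need to be copied again.
    """
    return [
        name for name, digest in expected.items()
        if checksum(segment_dir / name) != digest
    ]
```

The same idea applies to an event stream: publish a checksum per event (or per block of events) and re-request only the pieces that fail verification.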

If you are doing this on an event stream, you can checksum each event (or block of events). This is even better, because you can at least apply the first X uncorrupted events while you retry the corrupted one. It's also helpful in that you know at what point in time your db matched the remote (thanks to the event timestamps).

Self-Healing

These take checksums to the next level. They add a lot more data, but if the connection regularly flips bits, there is a chance of restoring those bits to their original state, avoiding the need to retransmit.

Some of the simpler schemes add about 75% more data (3 parity bits for every 4 data bits), so it can still be more efficient than transmitting the same copy twice and hoping one copy gets through error free. But it will depend on the kinds of errors you are getting over the connection.
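
For illustration, a Hamming(7,4) style code is one of those simpler schemes: 3 parity bits per 4 data bits (the ~75% overhead above), able to correct any single flipped bit per 7-bit block. This is a toy encoder/decoder, not a production FEC library:

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit block (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(block: list[int]) -> list[int]:
    """Correct a single flipped bit (if any) and return the 4 data bits."""
    b = block[:]
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]   # re-check positions 1, 3, 5, 7
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]   # re-check positions 2, 3, 6, 7
    s4 = b[3] ^ b[4] ^ b[5] ^ b[6]   # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 means clean; otherwise the bad bit position
    if syndrome:
        b[syndrome - 1] ^= 1         # flip the corrupted bit back
    return [b[2], b[4], b[5], b[6]]

# A single bit flip in transit is repaired without any retransmission.
sent = hamming74_encode([1, 0, 1, 1])
received = sent[:]
received[5] ^= 1                      # simulate the link flipping one bit
assert hamming74_decode(received) == [1, 0, 1, 1]
```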
