I want to implement tracking of changes in plain-text documents, in a way similar to how it works in MS Word or Apple Pages. What I am unsure of is the data model and how to store it.
Goal
The expected properties:
1. disk space frugality
2. it should be possible to revert all the changes and get the original document
3. multiple users can suggest changes
4. the suggested changes can stack (it should be possible to do an edit of an edit)
Source-code version control systems like Git diff line by line. A plain-text document can contain a page-long paragraph without a single newline character, so line-based diffs would be unnecessarily big. That rules Git out. Because of (1.), I want to avoid storing full copies of the document and calculating deltas between them. Word processors like MS Word or LibreOffice use an abstract tree of document nodes instead of plain text, so I can’t use their approach. I really like Critic Markup (found it mentioned in this discussion).
Suggested solution
My plan is to use a modified version of Critic Markup where:
I. nesting would be allowed
II. each suggested change would also include an id pointing to a DB entry with change metadata: author_id, created, modified, accepted, accepted_by, rejected, rejected_by etc.
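To make (II.) concrete, here is a minimal sketch of the metadata record that an id in the markup could point to. The field names come from the list above; the class name `ChangeMeta`, the field types (integer user ids, timestamps for `accepted`/`rejected`) and the derived `status` property are my assumptions, not a settled design.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from uuid import UUID


@dataclass
class ChangeMeta:
    """One DB row per suggested change (hypothetical schema)."""
    id: UUID                              # the id embedded in the markup
    author_id: int                        # assumed: users have integer ids
    created: datetime
    modified: Optional[datetime] = None
    accepted: Optional[datetime] = None   # assumed: timestamp, None if not accepted
    accepted_by: Optional[int] = None
    rejected: Optional[datetime] = None
    rejected_by: Optional[int] = None

    @property
    def status(self) -> str:
        # Derived rather than stored, so it cannot drift from the timestamps.
        if self.accepted is not None:
            return "accepted"
        if self.rejected is not None:
            return "rejected"
        return "pending"
```

Keeping only the id in the document text means the markup stays short, and the metadata can change without rewriting the document.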
Because of (I.), a parser will be required, but that’s OK. An example of such a document:
Don't go around saying{-- to people that--|d76b979c-d7a8-11e8-9f8b-f2801f1b9fd1} the world owes you a living. The world owes you nothing. It was here first. {~~One~>Only one~~|d21ef228-9cdb-447c-97b9-cccdee58e36c} thing is impossible for God: To find sense in any copyright law on the planet. {++ Truth is stranger {++(at least that’s what they say)++|cc6e3998-d649-4b85-8ba1-25c1aaeb1d91}than fiction ++|f41939f6-f203-43a4-9ac4-97690ebf1c8f}, but it is because Fiction is obliged to stick to possibilities; Truth isn’t.
Notice the nesting in the addition {++ … ++} block.
Nesting would be allowed only in accepted changes. Those would render as normal text (maybe with some additional info on mouse hover), whereas rejected edits, although still part of the document source, would not be rendered at all. When such a document is loaded, the parser would first build a tree of text and change nodes. The metadata for all the changes would then be fetched from the DB. With all this information, the document would subsequently be rendered.
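The loading step could look roughly like the sketch below: a small recursive-descent parser for the modified syntax, plus a function that flattens the tree to plain text. The names `Change`, `parse` and `resolve` are hypothetical. It assumes ids never contain `}`, that the closing tokens do not occur literally in the text, and that a pending change is treated like a rejected one when flattening. The short ids stand in for real UUIDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

OPEN = {"{++": "add", "{--": "del", "{~~": "sub"}
CLOSE = {"add": "++|", "del": "--|", "sub": "~~|"}


@dataclass
class Change:
    kind: str                      # "add", "del" or "sub"
    change_id: str                 # id that points to the DB metadata
    body: List["Node"]             # added / deleted text, or the old text of a "sub"
    new: List["Node"] = field(default_factory=list)  # replacement text ("sub" only)


Node = Union[str, Change]


def parse(src: str, pos: int = 0,
          stops: Tuple[str, ...] = ()) -> Tuple[List[Node], int, Optional[str]]:
    """Parse from pos until a stop token; return (nodes, position, stop found)."""
    nodes: List[Node] = []
    buf: List[str] = []
    while pos < len(src):
        for stop in stops:
            if src.startswith(stop, pos):
                if buf:
                    nodes.append("".join(buf))
                return nodes, pos, stop
        kind = OPEN.get(src[pos:pos + 3])
        if kind is None:           # ordinary character
            buf.append(src[pos])
            pos += 1
            continue
        if buf:                    # flush plain text before the change node
            nodes.append("".join(buf))
            buf = []
        pos += 3                   # skip the opening token
        new: List[Node] = []
        if kind == "sub":
            body, pos, _ = parse(src, pos, ("~>",))
            pos += 2               # skip "~>"
            new, pos, _ = parse(src, pos, (CLOSE[kind],))
        else:
            body, pos, _ = parse(src, pos, (CLOSE[kind],))
        pos += 3                   # skip the closing token, e.g. "++|"
        end = src.index("}", pos)  # the id runs up to the closing brace
        nodes.append(Change(kind, src[pos:end], body, new))
        pos = end + 1
    if stops:
        raise ValueError("unclosed change block")
    if buf:
        nodes.append("".join(buf))
    return nodes, pos, None


def resolve(nodes: List[Node], status: Dict[str, str]) -> str:
    """Flatten the tree to plain text; only "accepted" changes are applied."""
    out: List[str] = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
            continue
        accepted = status.get(node.change_id) == "accepted"
        if node.kind == "add":
            kept = node.body if accepted else []
        elif node.kind == "del":
            kept = [] if accepted else node.body
        else:
            kept = node.new if accepted else node.body
        out.append(resolve(kept, status))
    return "".join(out)


src = "Say{-- to people--|a1} hi. {~~One~>Only one~~|b2} thing. {++Truth {++(they say) ++|c3}is odd++|d4}."
tree, _, _ = parse(src)
original = resolve(tree, {})  # no change applied: the original document
final = resolve(tree, {"a1": "accepted", "b2": "accepted", "c3": "accepted", "d4": "accepted"})
```

Calling `resolve` with an empty status map is what covers the "revert everything" requirement: every deletion is kept, every addition is dropped, and every substitution falls back to its old text.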
Problems
A change may span two "change nodes", or lie partly inside a "change node" and partly in normal text. For example, consider a new change that should cover the sub-text "plums. And she":
Then she ate the apple{++, pear and some plums++|cc6e3998-d649-4b85-8ba1-25c1aaeb1d91}. And she liked it.
Possible solutions:
- Split the change into two "change nodes"
- Give up (2.) and (4.): after the user accepts or rejects all the suggested changes and marks the document as resolved, simply turn the document back into a plain-text document.
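The first option (splitting) could be sketched as follows. It assumes the parsed document has already been flattened into a list of segments, each carrying the id of its enclosing change node or `None` for normal text. The `Segment` type and the `split_range` function are hypothetical names for illustration.

```python
from typing import List, Optional, Tuple

# (text, id of the enclosing change node, or None for normal text)
Segment = Tuple[str, Optional[str]]


def split_range(segments: List[Segment], start: int, end: int) -> List[Tuple[int, int, int]]:
    """Cut the rendered range [start, end) at segment boundaries.

    Returns one (segment index, local start, local end) triple per segment
    that the range touches; each triple would become its own change node.
    """
    pieces = []
    offset = 0
    for index, (text, _) in enumerate(segments):
        lo = max(start, offset)
        hi = min(end, offset + len(text))
        if lo < hi:  # the range overlaps this segment
            pieces.append((index, lo - offset, hi - offset))
        offset += len(text)
    return pieces


segments: List[Segment] = [
    ("Then she ate the apple", None),
    (", pear and some plums", "cc6e3998-d649-4b85-8ba1-25c1aaeb1d91"),
    (". And she liked it.", None),
]
rendered = "".join(text for text, _ in segments)
target = "plums. And she"
start = rendered.index(target)
pieces = split_range(segments, start, start + len(target))
parts = [segments[i][0][a:b] for i, a, b in pieces]
```

For the example above this gives two pieces, "plums" inside the existing addition and ". And she" in the normal text. The two resulting change nodes would probably need a shared group id in the metadata so that they are accepted or rejected together.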
What are your thoughts on this? Are there any better ways of achieving the goal from the title? Any complications that I don’t see? Thanks!