How to encode text highlights from JavaScript to an API

Question

I'm working on an app that enables users to collaborate (typically by highlighting/noting specific spans of text) on text articles.

I'll have an API that serves up the documents in some form (they're in .doc format right now, but I'd like to deliver them in something like Markdown). I can safely assume the articles' content will not change.

I'm currently stuck on the encoding format of these highlights. The problem is that these articles have some formatting on them (e.g., blockquotes where the author cites another external article, as well as the typical line breaks and paragraph spacing), so the client would interpret that formatting differently than the server would.

For example, Markdown uses > characters to denote content in a blockquote, while HTML uses <blockquote>. So when a user highlights text that lives inside a <blockquote>, my JavaScript code would need to do some messy calculations to get the correct character offsets.

Ultimately, I'd like to always be working with character offsets on the server as follows:

// e.g.
// from the 55th character to the 58th character
// offset = [55, 3]
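To make the [start, length] encoding concrete, here is a minimal sketch of how such a pair could be applied to the article's plain text. The function name `highlightedText` is hypothetical, and the sketch assumes a 0-based start counted over the text with all markup removed.

```javascript
// Hypothetical helper: a highlight is [start, length] over the article's
// plain text (markup stripped). Assumes `start` is 0-based.
function highlightedText(plainText, [start, length]) {
  if (start < 0 || length < 0 || start + length > plainText.length) {
    throw new RangeError("highlight falls outside the text");
  }
  // slice() takes an exclusive end index, so start + length is the end.
  return plainText.slice(start, start + length);
}
```

Note that JavaScript string indices count UTF-16 code units, so a server written in another language would need to count characters the same way for the offsets to line up.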

I've briefly considered a couple of other ways:

- Send the article to the client in HTML, although this would yield the same problem, as I'd need to add CSS classes and such to the HTML markup.
- Send the article content as an array of strings (split on each line break in the article) and give a type to each item (e.g., 'normal' or 'blockquote'), though this seems like a naive way to approach this problem.
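As a rough illustration of the second option, the sketch below assumes a hypothetical block shape of `{ type, text }` and a single newline between blocks. It shows how a position inside one block could be turned back into a server-side character offset. The names `plainText` and `toGlobalOffset` are made up for this example.

```javascript
// Hypothetical block shape: { type: "normal" | "blockquote", text: string }.
// Assumes the server's plain text is the blocks joined by `separator`.
function plainText(blocks, separator = "\n") {
  return blocks.map((block) => block.text).join(separator);
}

// Converts (blockIndex, offsetInBlock) into an offset into plainText(blocks).
function toGlobalOffset(blocks, blockIndex, offsetInBlock, separator = "\n") {
  if (blockIndex < 0 || blockIndex >= blocks.length) {
    throw new RangeError("no such block");
  }
  if (offsetInBlock < 0 || offsetInBlock > blocks[blockIndex].text.length) {
    throw new RangeError("offset falls outside the block");
  }
  let offset = 0;
  // Add the length of every earlier block plus the separator that follows it.
  for (let i = 0; i < blockIndex; i++) {
    offset += blocks[i].text.length + separator.length;
  }
  return offset + offsetInBlock;
}
```

With this layout the client never has to account for > markers or <blockquote> tags, since the block type travels separately from the text.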

Is there some other cleaner way of encoding these highlights from the client that I'm missing?

EDIT: For clarity, this would be a client-side app (requiring a modern browser).

Mike Sokolov · Accepted Answer · 2013-03-05 04:32:20Z

This is a challenging problem, and there's no simple answer. You will need to come up with some invariant that is easy to work with, ideally one that lets you work with different markup languages. I'd recommend storing offsets into the document text, ignoring markup.

Of course, it may not be easy to get text offsets from your editor. If you are working in a browser, the content will presumably be HTML, and you will be getting offsets from the browser's selection object, which reports positions relative to DOM nodes rather than the plain-text offset you would ideally want. You can compute it, though, with some fancy XPath. Even then, you may still have problems if you convert the document into another format, since newlines are likely to be converted or removed.
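One way to compute a markup-free offset is sketched below. It uses a plain tree walk over text nodes rather than XPath, so it is a substitute technique, not the approach named above. The function names are hypothetical. The node properties it reads (`nodeType`, `nodeValue`, `childNodes`) and the range properties (`startContainer`, `startOffset`, `endContainer`, `endOffset`) are standard DOM.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM

// Total number of text characters beneath a node, ignoring markup.
function textLength(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue.length;
  let total = 0;
  for (const child of Array.from(node.childNodes || [])) {
    total += textLength(child);
  }
  return total;
}

// Number of text characters that precede (container, offset) inside root.
// Returns -1 if container is not found under root.
function textOffset(root, container, offset) {
  let count = 0;
  let found = false;

  function visit(node) {
    if (node === container) {
      if (node.nodeType === TEXT_NODE) {
        count += offset; // offset counts characters in a text node
      } else {
        // In an element container, offset counts child nodes instead.
        const children = Array.from(node.childNodes || []);
        for (let i = 0; i < offset && i < children.length; i++) {
          count += textLength(children[i]);
        }
      }
      found = true;
      return;
    }
    if (node.nodeType === TEXT_NODE) {
      count += node.nodeValue.length;
      return;
    }
    for (const child of Array.from(node.childNodes || [])) {
      visit(child);
      if (found) return;
    }
  }

  visit(root);
  return found ? count : -1;
}

// Converts a DOM Range (or any object with the same four properties)
// into a [start, length] pair over the text content of root.
function rangeToHighlight(root, range) {
  const start = textOffset(root, range.startContainer, range.startOffset);
  const end = textOffset(root, range.endContainer, range.endOffset);
  return [start, end - start];
}
```

In a browser this could be called as `rangeToHighlight(articleElement, window.getSelection().getRangeAt(0))`, where `articleElement` stands for whatever element wraps the article. The resulting offsets index into that element's text content, so the server would need to derive its plain text the same way, including how newlines and other whitespace are handled.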

So I think the answer is: no, there is no magic bullet, and you are on the right track.
