1

Here is a quote from the Elisp documentation about markers:

A “marker” is a Lisp object used to specify a position in a buffer relative to the surrounding text. A marker changes its offset from the beginning of the buffer automatically whenever text is inserted or deleted, so that it stays with the two characters on either side of it.

I am curious how they implement this. It's not obvious how to do this efficiently.

Let's imagine a buffer with 1000 characters in it, so it has 1001 possible positions for point. And there is a marker m1 right after the 500th character:

 v---- marker here, at 501st position. [ 1 2 3 4 5 ... 499 500 [m1] 501 .... 998 999 1000 ] 

Now let's say we insert 10 new characters at position 200, before the marker. If the marker were stored as an integer (501), it would now be invalid. Its value should be 511. One implementation strategy could be to store all the markers in an array, sorted by their position, when text is inserted, you walk through all the markers greater than the insert position and increment their integers. This would be O(N) in the number of buffer markers, so not great if you had a lot of markers in the buffer.

What algorithm and data structure do they use to implement these "moving markers" efficiently?

6
  • Overview of markers says: " A marker has three attributes: the marker position, the marker buffer, and the insertion type. The marker position is an integer that is equivalent (at a given time) to the marker as a position in that buffer. But the marker's position value can change during the life of the marker, and often does." Commented Jul 14, 2024 at 19:36
  • In the same section, you will find later on: " Insertion and deletion in a buffer must check all the markers and relocate them if necessary. This slows processing in a buffer with a large number of markers. For this reason, it is a good idea to make a marker point nowhere if you are sure you don't need it any more." Commented Jul 14, 2024 at 19:37
  • 1
    AFAIK, that's the real algorithm: there is a linked list of markers for each buffer and any insertion/deletion traverses the list and resets positions (see e.g. insert in src/insdel:674, insert_1_both in src/insdel.c:884 and adjust_markers_for_insert in src/insdel.c:288, for one possible path - all line numbers are for upstream sources from May 16 (head commit 650ee9e071eea2ff8504e056131efa9f6ac894e9) - your copy might be slightly different). Commented Jul 15, 2024 at 4:43
  • 1
    @NickD: Was this not sufficiently a question "about the language itself" for the purposes of the "elisp" tag? (If the tag is not about the usage of the language, and it's not about the implementation of the language, then when is it relevant?) Commented Oct 4 at 4:38
  • 1
    You are right: this is one instance where the tag is appropriate. I restored it. Commented Oct 4 at 17:04

1 Answer 1

2

buffer.h defines markers in struct buffer_text like so:

/* The markers that refer to this buffer. This is actually a single marker --- successive elements in its marker `chain' are the other markers referring to this buffer. This is a singly linked unordered list, which means that it's very cheap to add a marker to the list and it's also very cheap to move a marker within a buffer. */ struct Lisp_Marker *markers; 

This would be O(N) in the number of buffer markers, so not great if you had a lot of markers in the buffer.

I think you'd be trading off performance somewhere else if you want it to be better than O(N). In any case, I expect it's always been "fine in practice", in which case there'd be little reason to complicate the situation.

Considering that a lot of other code (including a lot of interpreted elisp) is likely being executed along with any text change, you'd probably need some pretty extraordinary number of markers before the C code managing them was having any noticeable effect on performance. Although, as NickD comments, the manual does recommend that code dealing with markers should explicitly inhibit them when they are finished with in order to avoid the unnecessary upkeep. GC will also eliminate garbage markers when it runs.

You could always set up some artificial test cases and find out what it takes. No doubt it's possible (and certainly system dependent), but if you do have any practical code which is exceeding this threshold on your system, there might be a Better Way of writing that code.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.