Tracking column (offset) in presence of tab characters

Question

Imagine that you first write a compiler for your language, which necessarily reports errors to the user. The compiler also collects location information for back-end tools, which must know where the program elements are located. Later, when you are done with the compiler, you decide to provide IDE support as well. The editor is really just one more back-end tool. Having correct locations for program components helps a lot with syntax highlighting and error reporting. At this point, you suddenly realize that the locations reported by the compiler are questionable.

The EOL definition is more or less specified by the language, so lines can be reported correctly: there is always agreement between compiler and editor. But what about the column? If the compiler reports an error for an identifier located at line:col, the editor may end up highlighting something different, depending on its tab settings. It seems impossible to have an exact line:col location, no matter how useful it is, if the tab width is editor-specific. Nevertheless, I see that JavaCC provides getLine together with a getBeginColumn method. I wonder how this is implemented. How is it possible, in principle, to track the offset? How does the lexer match your editor's tab width?

It's really hard for me to tell what you're asking for here. Customizing IDE settings? Implementing your own IDE? Telling JavaCC to assume tabs are X spaces? Upon closer reading my best guess is that you're doing the second and asking how to accomplish the third. But that would be off-topic here since Programmers.SE is for conceptual questions about software design and development, not how to implement specific behavior in a given language or programming tool.
– Ixrec, Commented Jan 3, 2016 at 13:05
I would implement it like this: count a histogram of the preceding spaces in front of each code line after the last tab (over the whole file). E.g. if the space counts are 2, 4, 6, the tab width is 8; if there are 4 spaces, the tab width is 8; if 2 spaces, the tab width is 4.
– thepacker, Commented Jan 3, 2016 at 18:25
If your editor is smart enough to expand a \t character into 4 or 8 or whatever space characters, it can be smart enough to take this into account when dealing with column positions specified by an external tool. In fact, "where do I render this letter" and "where do I highlight that error" are largely the same question, and should be handled by the same code.
– Kilian Foth, Commented Feb 2, 2016 at 9:35

eyeezzi · Accepted Answer · 2020-08-24 20:11:11Z

Sometimes the simplest solution is the best. Simply initialize your lexer with a tab width. To be more dynamic, your IDE can expose an environment variable which the lexer reads.

There's no agreement on what a tab width should be; it's an entirely user-configurable value, so your lexer must also be user-configurable.

The current lexer I'm writing in Go has the following definition.

    type Lexer struct {
        input string

        // for error reporting
        curLine  int
        curCol   int
        tabWidth int
    }

    func New(input string, tabWidth int) *Lexer {
        l := &Lexer{input: input, tabWidth: tabWidth}

        // initializes lexer's state
        l.curLine = 1
        l.curCol = 1

        return l
    }
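The definition above stores a tab width but does not show how it affects the column counter. A minimal sketch of one way to do it, assuming tab stops fall at columns 1, 1+tabWidth, 1+2*tabWidth and so on, and that columns count runes rather than bytes. The names Position, advance and positionAt are hypothetical and not part of the lexer above.

```go
package main

import "fmt"

// Position is a 1-based line/column pair (hypothetical helper type).
type Position struct {
	Line, Col int
}

// advance returns the position reached after consuming r, assuming tab
// stops every tabWidth columns, starting at column 1.
func advance(p Position, r rune, tabWidth int) Position {
	if tabWidth < 1 {
		tabWidth = 1 // guard against division by zero
	}
	switch r {
	case '\n':
		return Position{Line: p.Line + 1, Col: 1}
	case '\t':
		// Jump to the next tab stop to the right of the current column.
		next := ((p.Col-1)/tabWidth+1)*tabWidth + 1
		return Position{Line: p.Line, Col: next}
	default:
		return Position{Line: p.Line, Col: p.Col + 1}
	}
}

// positionAt returns the position of the rune that starts at byte
// offset `offset` in input.
func positionAt(input string, offset, tabWidth int) Position {
	p := Position{Line: 1, Col: 1}
	for i, r := range input {
		if i >= offset {
			break
		}
		p = advance(p, r, tabWidth)
	}
	return p
}

func main() {
	src := "\tx = 1"
	fmt.Println(positionAt(src, 1, 4)) // {1 5}
	fmt.Println(positionAt(src, 1, 8)) // {1 9}
}
```

The same identifier lands in column 5 or column 9 depending only on the configured width, which is why the value has to come from whoever renders the text. With a width of 1 the tab branch reduces to a plain increment.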

Valentin Tihomirov · Accepted Answer · 2016-01-04 06:45:16Z

JavaCC answers it this way:

The generated parser has a JavaCharStream named jj_input_stream that has setTabSize(int) and getTabSize() methods. Take the tab width value from your IDE and pass it to the setTabSize method; this will make the token locations more accurate.

My proposal is better. The compiler should stick to a constant tab width, say 1, so that a tab counts as any other character. The editor, which is aware of both the current tab width and the number of tabs on the current line (it can easily compute this, since it has the file loaded in memory and already displays the line of code), adjusts the column information from the compiler accordingly. The only problem would be the programmer's surprise when they compare the compiler-reported locations with the objects selected in the editor.
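The editor-side adjustment could look roughly like the following sketch. It assumes the compiler reports a 1-based column in which every character, tabs included, counts as one, and that the editor places tab stops every tabWidth columns. The function name visualColumn is hypothetical.

```go
package main

import "fmt"

// visualColumn converts a 1-based compiler column (every character,
// including a tab, has width 1) into the 1-based column at which an
// editor with the given tab width draws that character.
func visualColumn(line string, compilerCol, tabWidth int) int {
	if tabWidth < 1 {
		tabWidth = 1 // guard against division by zero
	}
	visual := 1
	n := 1 // 1-based index of the current character in the line
	for _, r := range line {
		if n >= compilerCol {
			break
		}
		if r == '\t' {
			// Jump to the next tab stop.
			visual = ((visual-1)/tabWidth+1)*tabWidth + 1
		} else {
			visual++
		}
		n++
	}
	return visual
}

func main() {
	line := "\t\tfoo"
	// The compiler reports "foo" at column 3 (two tabs precede it).
	fmt.Println(visualColumn(line, 3, 4)) // 9
	fmt.Println(visualColumn(line, 3, 8)) // 17
}
```

The compiler output stays independent of any tab setting, and only the tool that draws the text has to know the width.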
