vi Internals

PavelChekov · **Posted:** Thu Apr 07, 2022 6:04 pm

Hey,

I was curious about how vi works internally, and started looking around, but short of browsing the source code, I could not find anything on how vi works internally. All that popped up was vi tutorials. Does anyone know of any descriptions of vi's internals (not necessarily vim, by the way)?

Thanks

This could probably go under General Programming, but I saw editor-related discussion here.

Live Long and Prosper

klange · **Joined:** Wed Mar 30, 2011 12:31 am **Posts:** 676

There are many different topics to cover in editor dev:
- The actual editing and how it interfaces with files (quite a lot of the complexity in older editors is around editing files that can't fit in memory)
- Managing the "screen"
- Syntax highlighting
... and so on.

I wrote my own editor that works similarly to vim - both from a user perspective, and sort of how it's built, so I might be able to explain some aspects in more detail.

PavelChekov · **Posted:** Thu Apr 07, 2022 6:52 pm

Have you written anything on this yourself, or do you know of a particular reference that you would recommend I consult?

Thanks

klange · **Joined:** Wed Mar 30, 2011 12:31 am **Posts:** 676

While I didn't follow it myself, kilo is a wonderful step-by-step guide on building an editor that explains things pretty well, and it also works very similarly to my own.

PavelChekov · **Posted:** Thu Apr 07, 2022 7:09 pm

The kilo tutorial was indeed very interesting and intuitive; however, it
differed in many ways from the vi family, most notably in editing files
in memory instead of in a temporary file. I am more interested specifically
in the vi family of editors and their architecture.

I would also like to make it clear that I am not in any way dissing the
kilo tutorial; it seems a very good start for editor dev in general.

Thanks

klange · **Joined:** Wed Mar 30, 2011 12:31 am **Posts:** 676

Old-school editors always had to operate on the assumption files wouldn't fit into memory - after all, when you have mere kilobytes available and even common source files run in the multiples of that, only the shortest of files are going to fit.

The main difficulty in editing files without reading them into memory is that we like to think of files as a series of lines when they are actually a series of bytes with no concept of lines. We like to be able to insert text, but if we were doing this directly on a file it would mean constantly rewriting the file from the insertion point onward to move the contents forward. Because of this, these editors don't actually edit files on disk directly.

A Vim swap file, as you may have noticed by looking at one, does not contain the text of your file - it's not just a temporary working copy. Not directly, at least. Instead, swap files are a serialization of operations you have performed on the file. They work in combination with the original file, and the editor is able to seek around and collect the data it needs to display the lines that are actually visible in the window. Since the changes you would make to a file could potentially be orders of magnitude larger than the original file, this serialization to disk was necessary whenever working from the original file on disk was.

These days, though, it's not actually necessitated by memory constraints - indeed, you can turn off swap files entirely in Vim if you want, and your changes will live only in memory, including all of the kilobytes of new text you write. Swap files persist in Vim for another reason, though: as a backup. If your editor crashes, the swap file remains on disk and can be used to recover your editing session and all of those changes you hadn't saved. You might realize from this that if you tell Vim not to use a swap file, and you open a new file that doesn't exist yet, then everything will be in memory anyway.
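To illustrate the point that a file is a series of bytes rather than a series of lines, here is a minimal Python sketch of a line-offset index: one pass records where each line starts, and afterwards only the lines needed for the visible window are fetched by seeking. This is not how vi or Vim do it; `build_line_index`, `read_lines` and `demo` are hypothetical names made up for this example.

```python
import os
import tempfile


def build_line_index(path, chunk_size=4096):
    """Return the byte offset at which each line starts, reading in small chunks."""
    offsets = [0]
    position = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                newline = chunk.find(b"\n", start)
                if newline == -1:
                    break
                offsets.append(position + newline + 1)
                start = newline + 1
            position += len(chunk)
    # If the file ends with a newline, no line starts at end-of-file.
    if len(offsets) > 1 and offsets[-1] == position:
        offsets.pop()
    return offsets


def read_lines(path, offsets, first, count):
    """Fetch `count` lines starting at line index `first` by seeking."""
    lines = []
    with open(path, "rb") as handle:
        for offset in offsets[first:first + count]:
            handle.seek(offset)
            lines.append(handle.readline().rstrip(b"\n"))
    return lines


def demo():
    """Write a small temporary file, index it, and read two lines from the middle."""
    with tempfile.NamedTemporaryFile("wb", delete=False) as handle:
        handle.write(b"alpha\nbeta\ngamma\ndelta\n")
        path = handle.name
    try:
        offsets = build_line_index(path)
        return offsets, read_lines(path, offsets, 1, 2)
    finally:
        os.remove(path)
```

The index costs one integer per line instead of the whole text, which is the kind of trade-off that matters when memory is measured in kilobytes. Inserting text is still the hard part: the index alone does not avoid rewriting the file from the insertion point onward.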

Solar · **Posted:** Fri Apr 08, 2022 12:04 am

Both vi and vim are ancient code bases. Bill Joy (author of vi) was rather infamous for... how to put this... putting results over how he got there. So vi might not be the best place to start.

Vim started as a one-man effort to write a vi clone that would work on AmigaOS, and then grew into what it is today over... jeeeeez, has it really been three decades already? Well, Vim too attracted some cruft over all that time, and Bram has been somewhat hard-handed in what kind of rework he would accept into the code base.

At least that is what the people at neovim thought. Their project is the one that had code structure and maintainability in mind right from the get-go. So as much as it pains me as a long-time Vim user, if it is internals you want to look at, neovim is your best bet. (They actually have a "developer" section in their documentation, so...)

vvaltchev · **Joined:** Fri May 11, 2018 6:51 am **Posts:** 274

PavelChekov wrote:

Hey,

I was curious about how vi works internally, and started looking around, but short of browsing the source code, I could not find anything on how vi works internally. All that popped up was vi tutorials. Does anyone know of any descriptions of vi's internals (not necessarily vim, by the way)?

Thanks

This could probably go under General Programming, but I saw editor-related discussion here.

Live Long and Prosper

I agree with Solar that neovim has the most readable code base. But, on the other hand, it's a fairly big project. If you want to take a look at a minimal implementation of vi, you could look at BusyBox's implementation: https://github.com/mirror/busybox/blob/master/editors/vi.c. One file, 5k lines of code.

Other than that, my personal advice would be to study a bit how consoles work (escape sequences, modes etc.). That will help you better understand the source code of such an editor.
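As a small taste of what "escape sequences" means in practice, the sketch below builds a few standard ANSI/VT100 control sequences as strings. The sequences themselves are standard; the helper names (`clear_screen`, `move_cursor`, `reverse_video`, `draw_status_line`) are invented for this example and are not taken from any editor's source.

```python
ESC = "\x1b"


def clear_screen():
    """ED (Erase in Display) with parameter 2: clear the whole screen."""
    return f"{ESC}[2J"


def move_cursor(row, col):
    """CUP (Cursor Position): rows and columns are 1-based."""
    return f"{ESC}[{row};{col}H"


def reverse_video(text):
    """SGR 7 turns on reverse video; SGR 0 resets all attributes."""
    return f"{ESC}[7m{text}{ESC}[0m"


def draw_status_line(rows, message):
    """Build the string that paints a status bar on the last screen row.

    ESC[K (Erase in Line) clears from the cursor to the end of the row
    before the message is written.
    """
    return move_cursor(rows, 1) + f"{ESC}[K" + reverse_video(message)
```

Writing `clear_screen() + draw_status_line(24, "-- INSERT --")` to a terminal that understands these sequences should clear it and show a highlighted bar on row 24. A full-screen editor additionally has to switch the terminal out of line-buffered ("cooked") mode, which is what the "modes" part of the advice refers to.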

klange · **Joined:** Wed Mar 30, 2011 12:31 am **Posts:** 676

vvaltchev wrote:

I agree with Solar that neovim has the most readable code base. But, on the other hand, it's a fairly big project. If you want to take a look at a minimal implementation of vi, you could look at BusyBox's implementation: https://github.com/mirror/busybox/blob/master/editors/vi.c. One file, 5k lines of code.

I think Busybox's vi actually loads entire files into memory, and it doesn't have functionality like syntax highlighting, so it's quite far separated from how classic vi or modern Vim work - but it might be a good reference for how the command language and navigation are implemented.

PavelChekov · **Posted:** Fri Apr 08, 2022 5:03 pm

klange wrote:

… A Vim swap file, as you may have noticed by looking at one, does not
contain the text of your file - it's not just a temporary working copy.
Not directly, at least. Instead, swap files are a serialization of
operations you have performed on the file. They work in combination with
the original file, and the editor is able to seek around and collect the
data it needs to display the lines that are actually visible in the window. …

As you say, swap files are a "serialization of operations". When I cat
small files' swap files, they simply contain the text of the file as well
as a bit of header; larger ones, however, contain unintelligible gibberish
that screws up the output of my terminal (changes characters to weird
alternate characters). How are these stored? Is there some standard
for this?

Thanks

Solar · **Posted:** Wed Apr 13, 2022 6:18 am

The format is not formally documented (since it was only ever intended for internal use), and not guaranteed to be portable.

There is a Perl script to partially decode a swapfile, which might provide some insight. For the real deal, you will have to look into Vim itself.

Schol-R-LEA · **Posted:** Wed Apr 13, 2022 8:05 am

This topic has come up before, here and here. I will confess that in the former instance, I made several comments which were factually incorrect and somewhat wrong-headed.

While it focuses on EMACS-style editors rather than vi, and is rather showing its age as well, the book The Craft of Text Editing is AFAICT the standard textbook on editor internals - and better still, it is now available on the author's web site for free. The book covers various editor designs, and does discuss vi-style editors, though not in detail IIRC.

klange · **Joined:** Wed Mar 30, 2011 12:31 am **Posts:** 676

PavelChekov wrote:

As you say, swap files are a "serialization of operations". When I cat
small files' swap files, they simply contain the text of the file as well
as a bit of header; larger ones, however, contain unintelligible gibberish
that screws up the output of my terminal (changes characters to weird
alternate characters). How are these stored? Is there some standard
for this?

Thanks

A Vim swap file, as its name correctly suggests, is essentially the same as a swap partition or page file. Behind an interface that allows other code to get and modify individual lines, Vim implements a virtual memory system. There are two layers to this, one that deals in blocks (pretty much the same way any other virtual memory system would deal with paging to disk) and another that deals with lines and presents the interface through which other code can deal with files. Vim essentially turns every file it reads into a series of lines, and can either keep those all in memory if you have the space for them, or semi-lazily push them out to the paging system. Vim also, under default configurations, will periodically ensure everything in memory is flushed to the swap file.

The reason why you can open some swap files and essentially see your raw text is because that's what's in a line structure in Vim - alongside information about where the line is, how much data it has in it, and a few other things. There's also a header for the whole swap file that says what file it is, who was running Vim to produce the swap, what process it was made by, and so on. Vim generally treats files as a big array (or rather, a big tree-index) of lines, which are then raw blobs of bytes. Other parts of Vim are expected to deal with one line at a time, and there are some optimizations in the methods to retrieve lines if you're going through sequentially, but the API is presented such that you ask for a line, it is retrieved from disk if needed, and remains in memory accessible as a direct blob of bytes until you ask for another one. If you make modifications, you mark the line dirty, and it will get rewritten to the disk as part of a flush.
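The two layers described above can be sketched roughly as follows. This is a toy model and not Vim's actual code or on-disk format: `BlockStore`, `LineStore` and the two constants are hypothetical names, a plain dict stands in for the swap file, and lines map to blocks by simple division rather than through a tree index, so inserting a line in the middle is not supported.

```python
import pickle
from collections import OrderedDict

LINES_PER_BLOCK = 4   # tiny on purpose so paging is easy to observe
MAX_RESIDENT = 2      # how many blocks may stay in memory at once


class BlockStore:
    """Lower layer: pages blocks between memory and a backing store.

    ASSUMPTION: `self.disk` is a dict standing in for the swap file; a real
    editor would seek within a file and write fixed-size blocks instead.
    """

    def __init__(self):
        self.disk = {}                 # block number -> serialized bytes
        self.resident = OrderedDict()  # block number -> list of lines (LRU order)
        self.dirty = set()             # resident blocks that differ from disk

    def get(self, number):
        if number in self.resident:
            self.resident.move_to_end(number)   # mark as most recently used
        else:
            raw = self.disk.get(number)
            self.resident[number] = pickle.loads(raw) if raw else []
            self._evict()
        return self.resident[number]

    def mark_dirty(self, number):
        self.dirty.add(number)

    def _write_back(self, number):
        self.disk[number] = pickle.dumps(self.resident[number])
        self.dirty.discard(number)

    def _evict(self):
        # Push the least recently used blocks out, writing dirty ones first.
        while len(self.resident) > MAX_RESIDENT:
            number = next(iter(self.resident))
            if number in self.dirty:
                self._write_back(number)
            del self.resident[number]

    def flush(self):
        for number in list(self.dirty):
            self._write_back(number)


class LineStore:
    """Upper layer: the line-oriented interface the rest of the editor sees."""

    def __init__(self, lines=()):
        self.blocks = BlockStore()
        self.count = 0
        for line in lines:
            self.append(line)

    def append(self, line):
        number = self.count // LINES_PER_BLOCK
        self.blocks.get(number).append(line)
        self.blocks.mark_dirty(number)
        self.count += 1

    def get_line(self, index):
        number, slot = divmod(index, LINES_PER_BLOCK)
        return self.blocks.get(number)[slot]

    def set_line(self, index, text):
        number, slot = divmod(index, LINES_PER_BLOCK)
        self.blocks.get(number)[slot] = text
        self.blocks.mark_dirty(number)

    def flush(self):
        self.blocks.flush()
```

Callers only ever use `get_line`, `set_line` and `flush`, so they cannot tell whether a line was already in memory or had to be paged back in. Hiding that distinction from the rest of the editor is the whole purpose of the interface.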

Syntax highlighting in Vim is out-of-band, and doesn't get stored with the swap. If you've ever opened a huge file and tried to scroll to the end you've probably noticed Vim has a horrible time of recoloring, in part because it needs to swap in and out the entire file as the highlighter goes through to figure out the state at the bottom, and in part because it doesn't cache that if you then need to go back up to the top.
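The reason the state at the bottom of a file depends on everything above it can be shown with a deliberately tiny highlighter that only tracks whether it is inside a C-style `/* ... */` comment. This is a generic sketch, not Vim's highlighter; `scan_line`, `state_before` and `CachedStates` are hypothetical names, and string literals and `//` comments are ignored to keep it short.

```python
def scan_line(line, in_comment):
    """Return True if a /* ... */ comment is still open after `line`."""
    position = 0
    while position < len(line):
        if in_comment:
            end = line.find("*/", position)
            if end == -1:
                return True
            in_comment = False
            position = end + 2
        else:
            start = line.find("/*", position)
            if start == -1:
                return False
            in_comment = True
            position = start + 2
    return in_comment


def state_before(lines, target):
    """State at the start of line `target`, recomputed from the top each time."""
    in_comment = False
    for line in lines[:target]:
        in_comment = scan_line(line, in_comment)
    return in_comment


class CachedStates:
    """Remember the state at the start of every line already visited."""

    def __init__(self, lines):
        self.lines = lines
        self.states = [False]  # states[i] is the state at the start of line i

    def state_before(self, target):
        # Extend the cache only as far as needed; earlier answers are reused.
        while len(self.states) <= target:
            index = len(self.states) - 1
            self.states.append(scan_line(self.lines[index], self.states[index]))
        return self.states[target]
```

With `state_before`, jumping to the last line touches every line above it, and jumping back up and down repeats that work. `CachedStates` is one generic way to avoid the repetition, at the cost of having to invalidate cached entries below any line that is edited.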

Beyond the swap mechanism, Kilo treats text very similarly to Vim, keeping the actual text of a line in an untouched buffer and holding an array for color information for syntax highlighting separately. If you wrapped all of the line array accesses in Kilo in a function call, you could implement a swap mechanism like Vim pretty easily.
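A minimal sketch of the "text in one buffer, colors in a separate array" layout described here might look like the following. It is only loosely modelled on the idea and is not Kilo's actual structure; `Row`, `highlight_digits` and the two constants are names assumed for this example.

```python
from dataclasses import dataclass, field

HL_NORMAL = 0
HL_NUMBER = 1


@dataclass
class Row:
    chars: str                               # the text, untouched by highlighting
    hl: list = field(default_factory=list)   # one highlight class per character


def highlight_digits(row):
    """Fill the parallel array without modifying the text itself."""
    row.hl = [HL_NUMBER if ch.isdigit() else HL_NORMAL for ch in row.chars]
    return row
```

Because the text is never altered, saving a file only has to write out `chars`, and the highlight array can be thrown away and rebuilt at any time.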

I do something very different from either of those in Bim, as I treat lines as a series of Unicode codepoints, similar to how Python and Kuroko deal with strings. I store those codepoints as an array of unsigned 32-bit integers and use the upper 11 bits (you only need 21 bits to represent a codepoint) for coloring. Bim, too, could have a swap mechanism implemented if I hid all of the line array accesses behind function calls. Not a bad idea for a future improvement, really...

PavelChekov · **Posted:** Tue May 03, 2022 5:48 pm

klange wrote:

Behind an interface that allows other code to get and modify individual lines, Vim implements a virtual memory system. There are two layers to this, one that deals in blocks (pretty much the same way any other virtual memory system would deal with paging to disk) and another that deals with lines and presents the interface through which other code can deal with files. Vim essentially turns every file it reads into a series of lines, and can either keep those all in memory if you have the space for them, or semi-lazily push them out to the paging system. Vim also, under default configurations, will periodically ensure everything in memory is flushed to the swap file.

@klange, what do you mean by "semi-lazily push them out to the paging system"?

EDIT: I read through the elvis source code, which appears to be the most readable codebase of the vi family, and I think I understand the paging system.

PavelChekov · **Posted:** Tue May 10, 2022 6:58 pm

I now understand how the actual paging system works, but how much of the actual file is loaded into the buffer?

OSDev.org
