vi Internals
Posted: Thu Apr 07, 2022 6:04 pm
by PavelChekov
Hey,
I was curious about how vi works internally, and started looking around, but short of browsing the source code, I could not find anything on how vi works internally. All that popped up was vi tutorials. Does anyone know of any descriptions of vi's internals (not necessarily vim, by the way)?
Thanks
This could probably go under General Programming, but I saw editor-related discussion here.
Live Long and Prosper
Re: vi Internals
Posted: Thu Apr 07, 2022 6:28 pm
by klange
There are many different topics to cover in editor dev:
- The actual editing and how it interfaces with files (quite a lot of the complexity in older editors is around editing files that can't fit in memory)
- Managing the "screen"
- Syntax highlighting
... and so on.
I wrote my own editor that works similarly to Vim - both from a user perspective, and sort of how it's built, so I might be able to explain some aspects in more detail.
Re: vi Internals
Posted: Thu Apr 07, 2022 6:52 pm
by PavelChekov
Have you written a particular work or know of a particular reference that you recommend I consult?
Thanks
Re: vi Internals
Posted: Thu Apr 07, 2022 6:55 pm
by klange
While I didn't follow it myself, kilo is a wonderful step-by-step guide on building an editor that explains things pretty well, and it also works very similarly to my own.
Re: vi Internals
Posted: Thu Apr 07, 2022 7:09 pm
by PavelChekov
The kilo tutorial was indeed very interesting and intuitive; however, it
differed in many ways from the vi family, most notably in editing the files
in memory instead of a temporary file. I am more interested specifically in
the vi family of editors and their specific architecture.
I would also like to make it clear that I am not in any way dissing the
kilo tutorial; it seems a very good start for editor dev in general.
Thanks
Re: vi Internals
Posted: Thu Apr 07, 2022 7:56 pm
by klange
Old-school editors always had to operate on the assumption files wouldn't fit into memory - after all, when you have mere kilobytes available and even common source files run in the multiples of that, only the shortest of files are going to fit.
The main difficulty in editing files without reading them into memory is that we like to think of files as a series of lines when they are actually a series of bytes with no concept of lines. We like to be able to insert text, but if we were doing this directly on a file it would mean constantly rewriting the file from the insertion point onward to move the contents forward. Because of this, these editors don't actually edit files on disk directly.
A Vim swap file, as you may have noticed by looking at one, does not contain the text of your file - it's not just a temporary working copy. Not directly, at least. Instead, swap files are a serialization of operations you have performed on the file. They work in combination with the original file, and the editor is able to seek around and collect the data it needs to display the lines that are actually visible in the window. Since the changes you would make to a file could potentially be orders of magnitude larger than the original file, obviously this serialization to disk was necessary if dealing with the original file on disk was necessary.
These days, though, it's not actually necessitated by memory constraints - indeed, you can turn off swap files entirely in Vim if you want, and your changes will live only in memory, including all of the kilobytes of new text you write. Swap files persist in Vim for another reason, though: as a backup. If your editor crashes, the swap file remains on disk and can be used to recover your editing session and all of those changes you hadn't saved. You might realize from this that if you can tell Vim to not use a swap file, and if you open a new file that doesn't exist yet, then everything will be in memory anyway.
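To make the "series of bytes, not lines" point concrete, here is a minimal Python sketch of the seek-and-collect idea: scan the file once in small chunks to record where each line starts, then read only the lines that fall inside the window. The names build_line_index, read_lines, and write_demo_file are made up for illustration, and this is not how vi or Vim actually store their data.

```python
import os
import tempfile


def build_line_index(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = [0]
    pos = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(4096)  # scan in small chunks, never the whole file
            if not chunk:
                break
            start = 0
            while True:
                nl = chunk.find(b"\n", start)
                if nl == -1:
                    break
                offsets.append(pos + nl + 1)  # next line starts after the newline
                start = nl + 1
            pos += len(chunk)
    # A trailing newline does not start another line, so drop that entry.
    if len(offsets) > 1 and offsets[-1] == pos:
        offsets.pop()
    return offsets


def read_lines(path, offsets, first, count):
    """Seek to line `first` and read at most `count` lines from there."""
    lines = []
    with open(path, "rb") as f:
        f.seek(offsets[first])
        for _ in range(min(count, len(offsets) - first)):
            lines.append(f.readline().rstrip(b"\n"))
    return lines


def write_demo_file(n):
    """Hypothetical helper: write n numbered lines to a temporary file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(n):
            f.write(b"line %d\n" % i)
    return path
```

Only the index (one integer per line) has to stay in memory; the text itself is fetched on demand. Inserting text is the hard part this sketch leaves out, since every offset after the insertion point would shift.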
Re: vi Internals
Posted: Fri Apr 08, 2022 12:04 am
by Solar
Both vi and vim are ancient code bases. Bill Joy (author of vi) was rather infamous for... how to put this... putting results over how he got there. So vi might not be the best place to start.
Vim started as a one-man effort to write a vi clone that would work on AmigaOS, and then grew into what it is today over... jeeeeez, has it really been three decades already? Well, Vim too attracted some cruft over all that time, and Bram has been somewhat hard-handed in what kind of rework he would accept into the code base.
At least that is what the people at neovim thought.
Their project is the one that had code structure and maintainability in mind right from the get-go. So as much as it pains me as a long-time Vim user, if it is internals you want to look at, neovim is your best bet. (They actually have a "developer" section in their documentation, so...)
Re: vi Internals
Posted: Fri Apr 08, 2022 8:04 am
by vvaltchev
PavelChekov wrote:Hey,
I was curious about how vi works internally, and started looking around, but short of browsing the source code, I could not find anything on how vi works internally. All that popped up was vi tutorials. Does anyone know of any descriptions of vi's internals (not necessarily vim, by the way)?
Thanks
This could probably go under General Programming, but I saw editor-related discussion here.
Live Long and Prosper
I agree with Solar that neovim has the most readable code base. But, on the other side, it's a fairly big project. If you wanna take a look at a minimal implementation of VI, you could take a look at BusyBox's implementation:
https://github.com/mirror/busybox/blob/ ... itors/vi.c. One file, 5k lines of code.
Other than that, my personal advice would be to study a bit how consoles work (escape sequences, modes etc.). That will help you better understand the source code of such an editor.
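As a small illustration of what "escape sequences" means here, the sketch below builds one full-screen redraw out of three common ANSI/VT100 control sequences (clear screen, cursor position, reverse video). The function names are invented for this example, the "~" filler for empty rows just mimics what vi shows, and switching the terminal into raw mode is not covered.

```python
ESC = "\x1b"


def clear_screen():
    # CSI 2 J: erase the entire display
    return ESC + "[2J"


def move_cursor(row, col):
    # CSI row ; col H: move the cursor, both coordinates 1-based
    return f"{ESC}[{row};{col}H"


def reverse_video(text):
    # CSI 7 m turns reverse video on, CSI 0 m resets all attributes
    return f"{ESC}[7m{text}{ESC}[0m"


def draw_frame(lines, status, rows):
    """Build a redraw for a terminal with `rows` rows as a single string."""
    out = [clear_screen()]
    for i in range(rows - 1):
        out.append(move_cursor(i + 1, 1))
        # Rows past the end of the buffer get a "~" marker
        out.append(lines[i] if i < len(lines) else "~")
    out.append(move_cursor(rows, 1))
    out.append(reverse_video(status))  # status bar on the last row
    return "".join(out)
```

Writing the result of draw_frame(["hello"], "demo.txt", 24) to standard output in one call should repaint a 24-row terminal; building the whole frame first and writing it once is a common way to avoid flicker.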
Re: vi Internals
Posted: Fri Apr 08, 2022 3:05 pm
by klange
vvaltchev wrote:I agree with Solar that neovim has the most readable code base. But, on the other side, it's a fairly big project. If you wanna take a look at a minimal implementation of VI, you could take a look at BusyBox's implementation:
https://github.com/mirror/busybox/blob/ ... itors/vi.c. One file, 5k lines of code.
I think Busybox's vi actually loads entire files into memory, and it doesn't have functionality like syntax highlighting, so it's quite far separated from how classic vi or modern Vim work - but it might be a good reference for how the command language and navigation is implemented.
Re: vi Internals
Posted: Fri Apr 08, 2022 5:03 pm
by PavelChekov
klange wrote: … A Vim swap file, as you may have noticed by looking at one, does not
contain the text of your file - it's not just a temporary working copy.
Not directly, at least. Instead, swap files are a serialization of
operations you have performed on the file. They work in combination with
the original file, and the editor is able to seek around and collect the
data it needs to display the lines that are actually visible in the window. …
As you say, swap files are a "serialization of operations". When I cat
small files' swap files, they simply contain the text of the file as well
as a bit of header, however larger ones contain unintelligible gibberish
that screws up the output of my terminal (changes characters to weird
alternate characters). How are these stored? Is there some standard
for this?
Thanks
Re: vi Internals
Posted: Wed Apr 13, 2022 6:18 am
by Solar
The format is not formally documented (since it was only ever intended for internal use), and not guaranteed to be portable.
There is a Perl script to partially decode a swapfile, which might provide some insight. For the real deal, you will have to look into Vim itself.
Re: vi Internals
Posted: Wed Apr 13, 2022 8:05 am
by Schol-R-LEA
This topic has come up before, here and here. I will confess that in the former instance, I made several comments which were factually incorrect and somewhat wrong-headed.
While it focuses on EMACS-style editors rather than vi, and is rather showing its age as well, the book The Craft of Text Editing is AFAICT the standard textbook on editor internals - and better still, it is now available on the author's web site for free. The book covers various editor designs, and does discuss vi-style editors, though not in detail IIRC.
Re: vi Internals
Posted: Wed Apr 13, 2022 12:49 pm
by klange
PavelChekov wrote:As you say, swap files are a "serialization of operations". When I cat
small files' swap files, they simply contain the text of the file as well
as a bit of header, however larger ones contain unintelligible gibberish
that screws up the output of my terminal (changes characters to weird
alternate characters). How are these stored? Is there some standard
for this?
Thanks
A Vim swap file, as its name correctly suggests, is essentially the same as a swap partition or page file. Behind an interface that allows other code to get and modify individual lines, Vim implements a virtual memory system. There are two layers to this, one that deals in blocks (pretty much the same way any other virtual memory system would deal with paging to disk) and another that deals with lines and presents the interface through which other code can deal with files. Vim essentially turns every file it reads into a series of lines, and can either keep those all in memory if you have the space for them, or semi-lazily push them out to the paging system. Vim also, under default configurations, will periodically ensure everything in memory is flushed to the swap file.
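The block layer described above can be sketched roughly as follows. This is a toy in Python, not Vim's code: the class name BlockFile, the 4096-byte block size, the two-block cache limit, and the least-recently-used eviction are all assumptions chosen to keep the example short.

```python
import tempfile

BLOCK_SIZE = 4096  # assumed block size for this sketch


class BlockFile:
    """Fixed-size blocks: a few cached in memory, the rest in a swap file."""

    def __init__(self, max_cached=2):
        self.swap = tempfile.TemporaryFile()  # stand-in for the swap file
        self.max_cached = max_cached
        self.cache = {}     # block number -> bytearray, oldest entry first
        self.dirty = set()  # blocks changed since they were last written
        self.count = 0

    def new_block(self):
        num = self.count
        self.count += 1
        self._admit(num, bytearray(BLOCK_SIZE))
        self.dirty.add(num)  # a new block has never been written out
        return num

    def get(self, num):
        if num in self.cache:
            data = self.cache.pop(num)  # re-insert: now most recently used
            self.cache[num] = data
            return data
        self.swap.seek(num * BLOCK_SIZE)  # page the block back in
        data = bytearray(self.swap.read(BLOCK_SIZE))
        self._admit(num, data)
        return data

    def mark_dirty(self, num):
        self.dirty.add(num)

    def _admit(self, num, data):
        while len(self.cache) >= self.max_cached:
            victim = next(iter(self.cache))  # least recently used block
            self._write_back(victim)
            del self.cache[victim]
        self.cache[num] = data

    def _write_back(self, num):
        if num in self.dirty:
            self.swap.seek(num * BLOCK_SIZE)
            self.swap.write(self.cache[num])
            self.dirty.discard(num)

    def flush(self):
        """Write every dirty cached block out, like a periodic sync."""
        for num in list(self.cache):
            self._write_back(num)
        self.swap.flush()
```

Callers must keep each block at exactly BLOCK_SIZE bytes (same-length slice assignment only) and call mark_dirty after changing one. A line layer would sit on top of this and decide which block holds which line.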
The reason why you can open some swap files and essentially see your raw text is because that's what's in a line structure in Vim - alongside information about where the line is, how much data it has in it, and a few other things. There's also a header for the whole swap file that says what file it is, who was running Vim to produce the swap, what process it was made by, and so on. Vim generally treats files as a big array (or rather, a big tree-index) of lines, which are then raw blobs of bytes. Other parts of Vim are expected to deal with one line at a time, and there's some optimizations in the methods to retrieve lines if you're going through sequentially, but the API is presented such that you ask for a line, it is retrieved from disk if needed, and remains in memory accessible as a direct blob of bytes until you ask for another one. If you make modifications, you mark the line dirty, and it will get rewritten to the disk as part of a flush.
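The "ask for a line, mark it dirty, flush later" interface might look something like the Python sketch below. The class LineStore and its methods are hypothetical names, a plain list stands in for the on-disk storage, and writing the held line back when another line is requested is a simplification made here so that no edit is lost.

```python
class LineStore:
    """Holds one line in memory at a time; the rest stay in `backing`."""

    def __init__(self, backing):
        self.backing = backing  # list of bytes, stand-in for disk storage
        self.cur_num = None     # number of the line currently held
        self.cur = None         # its contents as a mutable bytearray
        self.cur_dirty = False
        self.loads = 0          # how often a line had to be fetched

    def get_line(self, num):
        """Return line `num`; valid until another line is requested."""
        if num != self.cur_num:
            self.flush()        # do not lose edits to the previous line
            self.cur = bytearray(self.backing[num])
            self.cur_num = num
            self.loads += 1
        return self.cur

    def mark_dirty(self):
        """Tell the store the held line was modified."""
        self.cur_dirty = True

    def flush(self):
        """Write the held line back if it was changed."""
        if self.cur_dirty:
            self.backing[self.cur_num] = bytes(self.cur)
            self.cur_dirty = False
```

Asking for the same line twice costs only one fetch, which is the kind of shortcut that helps sequential access; a real implementation would track many lines and blocks rather than a single slot.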
Syntax highlighting in Vim is out-of-band, and doesn't get stored with the swap. If you've ever opened a huge file and tried to scroll to the end you've probably noticed Vim has a horrible time of recoloring, in part because it needs to swap in and out the entire file as the highlighter goes through to figure out the state at the bottom, and in part because it doesn't cache that if you then need to go back up to the top.
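To see why the highlighter has to walk the file from the top, consider a state as simple as "are we inside a /* ... */ comment?". The Python sketch below carries that one bit from line to line; scan_line and StateCache are made-up names, strings and // comments are ignored, and the cache is a hypothetical addition to show the trade-off, not a description of what Vim does.

```python
def scan_line(line, in_comment):
    """Return True if a /* ... */ comment is still open at the end of `line`."""
    i = 0
    while i < len(line):
        if in_comment:
            end = line.find("*/", i)
            if end == -1:
                return True       # comment runs past the end of this line
            in_comment = False
            i = end + 2
        else:
            start = line.find("/*", i)
            if start == -1:
                return False      # no further comment opens on this line
            in_comment = True
            i = start + 2
    return in_comment


class StateCache:
    """Remembers the end-of-line state so lines are not rescanned."""

    def __init__(self, lines):
        self.lines = lines
        self.end_state = []  # end_state[i]: comment still open after line i
        self.scanned = 0     # number of lines scanned, for illustration

    def state_before(self, n):
        """State at the start of line n; scans only lines not yet cached."""
        while len(self.end_state) < n:
            i = len(self.end_state)
            prev = self.end_state[i - 1] if i else False
            self.end_state.append(scan_line(self.lines[i], prev))
            self.scanned += 1
        return self.end_state[n - 1] if n else False

    def invalidate_from(self, n):
        """After an edit to line n, forget the states from that line on."""
        del self.end_state[n:]
```

Without the cache, jumping to the last line means scanning every earlier line each time, and with a paged buffer that also means pulling every line through the swap mechanism.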
Beyond the swap mechanism, Kilo treats text very similarly to Vim, keeping the actual text of a line in an untouched buffer and holding an array for color information for syntax highlighting separately. If you wrapped all of the line array accesses in Kilo in a function call, you could implement a swap mechanism like Vim pretty easily.
I do something very different from either of those in Bim, as I treat lines as a series of Unicode codepoints, similar to how Python and Kuroko deal with strings. I store those codepoints as an array of unsigned 32-bit integers and use the upper 11 bits (you only need 21 bits to represent a codepoint) for coloring. Bim, too, could have a swap mechanism implemented if I hid all of the line array accesses behind function calls. Not a bad idea for a future improvement, really...
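The packing described above works out like this: a Unicode codepoint is at most 0x10FFFF, which fits in 21 bits, so a 32-bit cell has 11 bits to spare. The Python sketch below shows only the arithmetic; the function names are invented, and what Bim actually stores in those upper bits is not spelled out here, so "flags" is just a placeholder for the coloring value.

```python
CODEPOINT_BITS = 21
CODEPOINT_MASK = (1 << CODEPOINT_BITS) - 1  # 0x1FFFFF, covers U+10FFFF
FLAGS_MAX = (1 << 11) - 1                   # 11 spare bits -> 0..2047


def pack(ch, flags):
    """Combine one character and a small color/flag value into one cell."""
    if not 0 <= flags <= FLAGS_MAX:
        raise ValueError("flags must fit in 11 bits")
    return (flags << CODEPOINT_BITS) | ord(ch)


def codepoint(cell):
    return cell & CODEPOINT_MASK


def flags(cell):
    return cell >> CODEPOINT_BITS


def encode_line(text, flag=0):
    """Turn a string into a list of packed cells, all with the same flag."""
    return [pack(ch, flag) for ch in text]


def decode_line(cells):
    """Recover the plain text, discarding the flag bits."""
    return "".join(chr(codepoint(c)) for c in cells)
```

Compared with Kilo's separate color array, this keeps text and color in one array at the cost of four bytes per character, and every consumer has to mask off the upper bits before treating a cell as a codepoint.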
Re: vi Internals
Posted: Tue May 03, 2022 5:48 pm
by PavelChekov
klange wrote:Behind an interface that allows other code to get and modify individual lines, Vim implements a virtual memory system. There are two layers to this, one that deals in blocks (pretty much the same way any other virtual memory system would deal with paging to disk) and another that deals with lines and presents the interface through which other code can deal with files. Vim essentially turns every file it reads into a series of lines, and can either keep those all in memory if you have the space for them, or semi-lazily push them out to the paging system. Vim also, under default configurations, will periodically ensure everything in memory is flushed to the swap file.
@klange, what do you mean by "semi-lazily push them out to the paging system"?
EDIT: I read through the elvis source code, which appears to be the most readable codebase of the vi family, and I think I understand the paging system.
Re: vi Internals
Posted: Tue May 10, 2022 6:58 pm
by PavelChekov
I now understand how the actual paging system works, but how much of the actual file is loaded into the buffer?