PavelCheckov wrote:
As you say, swap files are a "serialization of operations". When I cat
small files' swap files, they simply contain the text of the file as well
as a bit of header, however larger ones contain unintelligible gibberish
that screws up the output of my terminal (changes characters to weird
alternate characters). How are these stored? Is there some standard
for this?
Thanks
A Vim swap file, as its name correctly suggests, is essentially the same as a swap partition or page file. Behind an interface that allows other code to get and modify individual lines, Vim implements a virtual memory system. There are two layers to this, one that deals in blocks (pretty much the same way any other virtual memory system would deal with paging to disk) and another that deals with lines and presents the interface through which other code can deal with files. Vim essentially turns every file it reads into a series of lines, and can either keep those all in memory if you have the space for them, or semi-lazily push them out to the paging system. Vim also, under default configurations, will periodically ensure everything in memory is flushed to the swap file.
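To make the block layer a bit more concrete, here is a toy sketch in Python. This is not Vim's code or its on-disk format; `BlockStore`, `BLOCK_SIZE` and `max_cached` are names I made up for illustration. It only shows the general idea of keeping a few blocks in memory and paging the rest out to a file.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


class BlockStore:
    """Toy block layer: a few blocks stay in memory, the rest live in a file."""

    def __init__(self, path, max_cached=2):
        self.file = open(path, "w+b")
        self.max_cached = max_cached
        self.cache = {}     # block number -> bytearray, oldest entry first
        self.dirty = set()  # block numbers modified since they were last written

    def get(self, n):
        if n in self.cache:
            block = self.cache.pop(n)  # re-inserted below as most recently used
        else:
            self.file.seek(n * BLOCK_SIZE)
            data = self.file.read(BLOCK_SIZE)
            block = bytearray(data.ljust(BLOCK_SIZE, b"\0"))
        self.cache[n] = block
        # Evict least recently used blocks once the cache is over its limit.
        while len(self.cache) > self.max_cached:
            oldest = next(iter(self.cache))
            self._write(oldest)
            del self.cache[oldest]
        return block

    def mark_dirty(self, n):
        self.dirty.add(n)

    def _write(self, n):
        # Only blocks that were marked dirty need to go back to the file.
        if n in self.dirty:
            self.file.seek(n * BLOCK_SIZE)
            self.file.write(self.cache[n])
            self.dirty.discard(n)

    def flush(self):
        for n in list(self.cache):
            self._write(n)
        self.file.flush()


path = os.path.join(tempfile.mkdtemp(), "toy.swp")
blocks = BlockStore(path, max_cached=2)
b0 = blocks.get(0)
b0[:5] = b"hello"
blocks.mark_dirty(0)
blocks.get(1)
blocks.get(2)             # cache is over its limit, so block 0 is written out and dropped
restored = blocks.get(0)  # read back from the file
```

A periodic call to `flush()` would play the role of the regular flush to the swap file mentioned above.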
You can open some swap files and essentially see your raw text because that's what's in a line structure in Vim, alongside information about where the line is, how much data it has in it, and a few other things. There's also a header for the whole swap file that says what file it is, who was running Vim to produce the swap, what process it was made by, and so on. Vim generally treats files as a big array (or rather, a big tree-index) of lines, which are then raw blobs of bytes. Other parts of Vim are expected to deal with one line at a time, and there are some optimizations in the methods to retrieve lines if you're going through sequentially, but the API is presented such that you ask for a line, it is retrieved from disk if needed, and remains in memory accessible as a direct blob of bytes until you ask for another one. If you make modifications, you mark the line dirty, and it will get rewritten to the disk as part of a flush.
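The shape of that line interface can be sketched roughly like this. Again, this is a toy in Python with invented names (`LineStore`, `get_line`, `mark_dirty`), not Vim's actual functions, and a plain dict stands in for the swap file and its tree index. For simplicity it also writes a dirty line back as soon as you ask for a different one.

```python
class LineStore:
    """Toy line layer: one line is 'current' in memory, the rest sit in a backing store."""

    def __init__(self, lines):
        # Stand-in for the swap file: line number -> bytes.
        self.backing = dict(enumerate(lines))
        self.cur_no = None
        self.cur = None
        self.cur_dirty = False
        self.loads = 0  # how many times a line had to be fetched

    def get_line(self, n):
        # The returned blob is only valid until another line is requested.
        if n != self.cur_no:
            self.flush()
            self.cur = bytearray(self.backing[n])
            self.cur_no = n
            self.loads += 1
        return self.cur

    def mark_dirty(self):
        self.cur_dirty = True

    def flush(self):
        # Write the current line back only if it was marked dirty.
        if self.cur_dirty:
            self.backing[self.cur_no] = bytes(self.cur)
            self.cur_dirty = False


line_store = LineStore([b"first line", b"second line"])
line = line_store.get_line(0)
line[:5] = b"FIRST"
line_store.mark_dirty()
line_store.get_line(1)  # asking for another line writes the dirty one back
```

Callers never touch the backing store directly, which is what lets the storage underneath be memory, disk, or a mix of both.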
Syntax highlighting in Vim is out-of-band, and doesn't get stored with the swap. If you've ever opened a huge file and tried to scroll to the end you've probably noticed Vim has a horrible time of recoloring, in part because it needs to swap in and out the entire file as the highlighter goes through to figure out the state at the bottom, and in part because it doesn't cache that state, so going back up to the top means doing the work again.
Beyond the swap mechanism, Kilo treats text very similarly to Vim, keeping the actual text of a line in an untouched buffer and holding an array for color information for syntax highlighting separately. If you wrapped all of the line array accesses in Kilo in a function call, you could implement a swap mechanism like Vim pretty easily.
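As a rough illustration of that layout (Kilo itself is written in C; this is a Python sketch with made-up names like `Row`, `Buffer` and `HL_NUMBER`, and a deliberately trivial highlighter), the text and the colors live in parallel arrays, and all row access goes through one function:

```python
from dataclasses import dataclass, field

HL_NORMAL, HL_NUMBER = 0, 1  # hypothetical highlight classes


@dataclass
class Row:
    chars: bytes                            # the line's text, never altered by highlighting
    hl: list = field(default_factory=list)  # one highlight class per byte of chars


def highlight(row):
    # Trivial stand-in highlighter: ASCII digits are numbers, everything else is normal.
    row.hl = [HL_NUMBER if b in b"0123456789" else HL_NORMAL for b in row.chars]


class Buffer:
    def __init__(self, lines):
        self._rows = [Row(text) for text in lines]

    def row(self, i):
        # Single choke point for row access: a swap layer could load row i here
        # instead of indexing an in-memory list.
        return self._rows[i]


buf = Buffer([b"x = 42"])
r = buf.row(0)
highlight(r)
```

Because nothing outside `Buffer` indexes the row list directly, swapping could be added behind `row()` without touching the callers.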
I do something very different from either of those in Bim, as I treat lines as a series of Unicode codepoints, similar to how Python and Kuroko deal with strings. I store those codepoints as an array of unsigned 32-bit integers and use the upper 11 bits (you only need 21 bits to represent a codepoint) for coloring. Bim, too, could have a swap mechanism implemented if I hid all of the line array accesses behind function calls. Not a bad idea for a future improvement, really...
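The packing works out roughly like this. This is a Python sketch rather than Bim's actual source: the names and the choice to treat the upper 11 bits as a single attribute value are assumptions for illustration.

```python
CODEPOINT_BITS = 21
CODEPOINT_MASK = (1 << CODEPOINT_BITS) - 1  # low 21 bits hold the codepoint
ATTR_MASK = (1 << 11) - 1                   # upper 11 bits hold coloring data


def pack_cell(ch, attr):
    """Combine one character and its attribute value into a 32-bit integer."""
    cp = ord(ch)
    if cp > 0x10FFFF or not 0 <= attr <= ATTR_MASK:
        raise ValueError("codepoint or attribute out of range")
    return (attr << CODEPOINT_BITS) | cp


def unpack_cell(cell):
    """Split a packed cell back into (character, attribute)."""
    return chr(cell & CODEPOINT_MASK), cell >> CODEPOINT_BITS


FLAG_KEYWORD = 1  # hypothetical attribute value
packed_line = [pack_cell(c, FLAG_KEYWORD if i < 2 else 0) for i, c in enumerate("if x")]
text = "".join(unpack_cell(cell)[0] for cell in packed_line)
```

The highest Unicode codepoint, U+10FFFF, fits in 21 bits, which is why the remaining 11 bits of each 32-bit cell are free for other uses.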