How would you like to access .tar.* files?
Posted: Sat Apr 15, 2006 9:29 am
While considering how to implement userlevel file handling, I considered transparent decompression of compressed files. I also considered transparent viewing of archive files as directories. Now, there are two ways of combining those two.
First, there are the compress-then-archive files, where each file is compressed on its own and then collected into the archive. These include zip files and so forth. They are quick for random access to the files in them and for browsing around, but decompress slower and give a lower compression ratio. They are more common in the Windows world among ordinary users. These are not what my question is about.
Second, there are the archive-then-compress files. The most prevalent of these are the .tar.bz2 and .tar.gz archive files. They compress better and decompress faster, but cannot support random access properly.
My question is mainly about this random access. To show an archive as a directory, you must be able to read the archive's contents. If the archive itself is compressed, you need to decompress the entire (!) archive to show the directory tree, because a tar file has no central index: each file's header sits directly in front of its data. This can be done, in theory, on a one-off basis with the results cached. However, when you want to view a single file, you must still decompress the archive at least up to the point of the requested file.
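To make the cost concrete, here is a rough sketch in Python using the standard tarfile module. The function names build_index and read_member are made up for illustration; this is not how any particular OS does it. The first function is the one-off pass that could be cached, the second shows that reading one file still means decompressing everything in front of it.

```python
import tarfile

def build_index(archive_path):
    """One streaming pass over a compressed tar; returns name -> (data offset, size, is_dir)."""
    index = {}
    # "r|*" means: read as a pure forward stream, detect gzip/bz2/xz automatically.
    with tarfile.open(archive_path, mode="r|*") as tar:
        for member in tar:
            # offset_data is a position in the *uncompressed* tar stream.
            index[member.name] = (member.offset_data, member.size, member.isdir())
    return index

def read_member(archive_path, name):
    """Return the bytes of one file; decompresses everything stored before it."""
    with tarfile.open(archive_path, mode="r|*") as tar:
        for member in tar:
            if member.name == name:
                fileobj = tar.extractfile(member)
                if fileobj is None:
                    raise IsADirectoryError(name)
                return fileobj.read()
    raise KeyError(name)
```

Note that the cached index only helps with showing the tree. The offsets refer to the uncompressed stream, so they do not let you jump into the middle of a .gz or .bz2 file.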
If you allow random access to these files, the UI can become very slow (given a few gigabytes of .tar.bz2, unresponsive for minutes) and the user could consider it a big flaw. If you allow browsing around but no read/write, the user would consider it inconsistent. If you treat them as closed files that you must uncompress to get the uncompressed view, the user loses part of the "view inside compressed archives" function. As a fourth option, the archive could be decompressed when the user tries to open it, so that its contents become actual files and directories. This would delay the response but allow the rest without user intervention. It could cause unexpected behaviour though, where the user is waiting for "nothing".
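The fourth option could look roughly like this, again as a Python sketch. The name open_as_directory and the cache layout are my own assumptions. The archive is unpacked once into a cache directory keyed by its path, modification time and size, so only the first open pays the delay.

```python
import hashlib
import os
import tarfile

def open_as_directory(archive_path, cache_root):
    """Return a directory holding the unpacked archive, extracting only on first use."""
    stat = os.stat(archive_path)
    key = f"{os.path.abspath(archive_path)}:{stat.st_mtime_ns}:{stat.st_size}"
    target = os.path.join(cache_root, hashlib.sha256(key.encode()).hexdigest())
    if not os.path.isdir(target):
        # Unpack into a temporary name first so a half-finished extraction is never used.
        partial = target + ".partial"
        os.makedirs(partial, exist_ok=True)
        with tarfile.open(archive_path, mode="r:*") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: refuse absolute paths, links escaping the target, etc.
                tar.extractall(partial, filter="data")
            else:
                tar.extractall(partial)
        os.rename(partial, target)
    return target
```

In a real UI the extraction would have to run in the background with some progress indication, otherwise this is exactly the "waiting for nothing" case.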
Which would you prefer, putting yourself in the position of a user?