
How would you like to access .tar.* files?

Posted: Sat Apr 15, 2006 9:29 am
by Candy
While considering how to implement user-level file handling, I considered transparent decompression of compressed files. I also considered transparent viewing of archive files as directories. Now, there are two ways of combining those two.

First, there are the compress-then-archive files. These include zip files and so forth. They are quick for random access to the files inside them and for browsing around, but they decompress more slowly and give a lower compression ratio. They are more common among ordinary users in the Windows world. These are not what my question is about.

Second, there are the archive-then-compress files. The most prevalent of these are the .tar.bz2 and .tar.gz archive files. They compress better and are quicker at decompressing, but cannot support random access properly.

My question is mainly about this random access. When you view an archive as a directory, you must be able to read the archive's file listing. If the archive itself is compressed, you need to decompress the entire (!) archive to show the directory tree. This can be done, in theory, on a one-off basis by caching the results. However, when you want to view a single file, you must also decompress the archive at least up to the point of the requested file.
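To make the cost concrete, here is a minimal Python sketch using the standard `tarfile` module. The helper names and the sample members are made up for illustration. Opening with mode `"r|gz"` treats the input as a forward-only stream, which matches the situation described above: the header of the last member is only reached after everything before it has been decompressed.

```python
import io
import tarfile


def build_sample_tgz() -> bytes:
    """Create a tiny in-memory .tar.gz with three members (illustrative data)."""
    buf = io.BytesIO()
    members = [
        ("a.txt", b"alpha"),
        ("dir/b.txt", b"bravo" * 1000),
        ("dir/c.txt", b"charlie"),
    ]
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_members_streaming(data: bytes) -> list:
    """Return (name, data offset in the uncompressed stream) for every member.

    The loop walks the archive front to back; there is no index to jump to,
    so producing the full listing decompresses the whole archive once.
    """
    listing = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r|gz") as tar:
        for member in tar:
            listing.append((member.name, member.offset_data))
    return listing


if __name__ == "__main__":
    for name, offset in list_members_streaming(build_sample_tgz()):
        print(offset, name)
```

The collected offsets are exactly the kind of result that could be cached after the first pass, so that a later listing does not need to touch the archive again.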

If you allow random access to these files, the UI can become very slow (given a few gigabytes of .tar.bz2, unresponsive for minutes) and the user could consider it a big flaw. If you allow browsing around but no read/write, the user would consider it inconsistent. If you present them as closed files that you can uncompress to create the uncompressed view, the user loses part of the "view inside compressed archives" function. As a fourth option, the archive could be decompressed when the user tries to open it, so that its contents become actual files and directories. This would delay the response but allow the rest without user intervention. It could cause unexpected behaviour though, where the user is waiting for "nothing".

Which would you prefer, putting yourself in the position of a user?

Re:How would you like to access .tar.* files?

Posted: Sat Apr 15, 2006 12:36 pm
by Kemp
Slightly off-topic I know, but I'm not sure whether the whole idea is a good thing. With zip files under XP you can view them as if they were a folder, but try it with a 200MB zip containing mainly files of a few KB (like the leaked sections of Win2K source for instance) and you have to sit there waiting for a hell of a long time before you can view anything. This would occur with the decompress-before-viewing angle. You could get around it slightly with some sort of hybrid approach where you construct the view from the file index and only touch the actual files in the archive when the user does something to one, though for archives where random access is not possible this would still introduce huge delays for each file operation. Overall, I think pretending things are something other than what they really are is only going to lead to annoyance in the end, just like pretending peripherals and suchlike are files (mentioning no OS names here).

Re:How would you like to access .tar.* files?

Posted: Sat Apr 15, 2006 1:35 pm
by Solar
Kemp wrote: Overall, I think pretending things are something other than what they really are is only going to lead to annoyance in the end, just like pretending peripherals and suchlike are files (mentioning no OS names here).
My thoughts exactly.

If you want to support compressed folders in some transparent way, choose a format that is actually suited to the task - in this case, having each file compressed individually, probably with a cached "directory" that relieves you from scanning the whole archive contents when you have to display it.

ZIPs and .tgz / .tbz2 are not directories. It's nice if you can access them as easily as one, but they don't behave like directories, so don't fool the user into thinking they are, and he won't be surprised by "weird" behaviour.

Re:How would you like to access .tar.* files?

Posted: Sat Apr 15, 2006 2:22 pm
by Candy
Solar wrote:
Kemp wrote: Overall, I think pretending things are something other than what they really are is only going to lead to annoyance in the end, just like pretending peripherals and suchlike are files (mentioning no OS names here).
My thoughts exactly.

If you want to support compressed folders in some transparent way, choose a format that is actually suited to the task - in this case, having each file compressed individually, probably with a cached "directory" that relieves you from scanning the whole archive contents when you have to display it.

ZIPs and .tgz / .tbz2 are not directories. It's nice if you can access them as easily as one, but they don't behave like directories, so don't fool the user into thinking they are, and he won't be surprised by "weird" behaviour.
The idea was to show formats that fit the behaviour as folders anyway. My question was, should this be extended to stuff that actually isn't representable as a directory?

Internally, a zip file contains a normal directory tree. I have no clue yet why Microsoft doesn't display it faster; it could well be because they haven't noticed yet.
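To illustrate the zip case: the format keeps a central directory at the end of the file, so a listing does not require decompressing any member, and a single member can be read without touching the others. A minimal Python sketch using the standard `zipfile` module follows. The helper names and sample file names are made up for illustration, and it says nothing about how the XP shell actually does it.

```python
import io
import zipfile


def build_sample_zip() -> bytes:
    """Create a small in-memory zip whose members are compressed individually."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("readme.txt", "hello")
        zf.writestr("src/main.c", "int main(void) { return 0; }\n")
    return buf.getvalue()


def list_and_read_one(data: bytes, wanted: str):
    """Return (all member names, contents of one member)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # The names come from the central directory read when the file is opened.
        names = zf.namelist()
        # Only the requested member is located and decompressed.
        content = zf.read(wanted)
    return names, content


if __name__ == "__main__":
    names, content = list_and_read_one(build_sample_zip(), "readme.txt")
    print(names, content)
```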

Re:How would you like to access .tar.* files?

Posted: Sat Apr 15, 2006 4:04 pm
by Kemp
I think in the case of my example, it was purely the amount of time taken to scan through a listing of several hundred thousand small files. While I will admit that I find the representation of zip files as directories useful, it is something I can easily live without (and did for quite a while). There are two basic types of archives you've described:

* Those that lend themselves to browsing, in which case the client app will show it like that anyway, thus making directory representation unnecessary.
* Those that require ugly hacks and inconsistent behaviour to look like directories, in which case it's probably not worth it.

Re:How would you like to access .tar.* files?

Posted: Thu Apr 20, 2006 9:24 am
by Pype.Clicker
What would be good to have is some kind of "indexing" of the decompression by the shell, so that if you later want to retrieve another file, you don't need to decompress everything again but can just 'resume' decompression from the closest point.
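One way such an index could look, as a rough Python sketch and not a description of any existing shell: while decompressing once, save a snapshot of the decompressor state every so often, then resume from the nearest earlier snapshot on later reads. The sketch assumes a plain zlib stream and relies on `zlib.decompressobj().copy()`. The names `build_checkpoints`, `read_range` and the `CHUNK` size are hypothetical.

```python
import zlib

CHUNK = 16384  # compressed bytes consumed between snapshots (arbitrary choice)


def build_checkpoints(compressed: bytes, chunk: int = CHUNK) -> list:
    """Decompress once, saving (uncompressed_pos, compressed_pos, state) snapshots."""
    d = zlib.decompressobj()
    checkpoints = [(0, 0, d.copy())]
    out_pos = 0
    for in_pos in range(0, len(compressed), chunk):
        out_pos += len(d.decompress(compressed[in_pos:in_pos + chunk]))
        if d.eof:
            break  # end of stream reached, nothing left to resume
        checkpoints.append((out_pos, in_pos + chunk, d.copy()))
    return checkpoints


def read_range(compressed: bytes, checkpoints: list, start: int, length: int,
               chunk: int = CHUNK) -> bytes:
    """Return uncompressed[start:start+length], resuming from the closest snapshot."""
    # Closest snapshot at or before the requested uncompressed position.
    out_pos, in_pos, state = max(
        (c for c in checkpoints if c[0] <= start), key=lambda c: c[0]
    )
    d = state.copy()  # copy again so the stored snapshot stays reusable
    skip = start - out_pos
    collected = bytearray()
    while len(collected) < skip + length and in_pos < len(compressed):
        collected += d.decompress(compressed[in_pos:in_pos + chunk])
        in_pos += chunk
    return bytes(collected[skip:skip + length])


if __name__ == "__main__":
    data = b"".join(str(i).encode() + b"\n" for i in range(200000))
    packed = zlib.compress(data)
    index = build_checkpoints(packed)
    print(len(index), read_range(packed, index, 1_000_000, 20))
```

The trade-off is memory: every snapshot carries the decompressor's internal state, including its sliding window, so a real index would have to space the snapshots out. Combined with a table of member offsets inside the tar stream, this would let a shell jump close to a requested file instead of starting from the beginning.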

IMHO, .tar.xxx files should remain unmodifiable, as they actually are (I mean, I wouldn't promote attempts to add or remove files from a .tar.xx as if it were a .zip or a directory).