Questions on Long file names FAT32

Klakap
Member
Posts: 298
Joined: Sat Mar 10, 2018 10:16 am

Questions on Long file names FAT32

Post by Klakap »

Good day!

I have some questions about long file names (LFN).

If a directory entry has an LFN, is it 64 bytes long (short file name + LFN)?
Is a file name with an LFN limited to 21 characters plus a 3-character extension?
Are short names still used on any machine today?

Thank you for any answers.
Pebblerubble
Posts: 3
Joined: Mon May 20, 2019 1:03 pm

Re: Questions on Long file names FAT32

Post by Pebblerubble »

This link probably answers all of your questions (correct me if I misunderstood you):

https://wiki.osdev.org/FAT#Long_File_Names

Please note that in FAT a long file name is actually built from an 8+3 entry plus one or more long-file-name entries.

There is NO 3-character limit on the suffix. So .html, for example, is a valid suffix in an LFN.

I think only DOS machines still use 8+3 filenames nowadays. In my humble opinion they are really outdated and obsolete.

[EDIT: I could imagine a bootloader in 16-bit mode or a FAT12 floppy driver might have some use for 8+3. But I still think it's outdated.]

Greetings
Peter
Octocontrabass
Member
Posts: 5584
Joined: Mon Mar 25, 2013 7:01 pm

Re: Questions on Long file names FAT32

Post by Octocontrabass »

Klakap wrote:I have some questions about long file names (LFN).
Make sure you read Microsoft's FAT filesystem specification. It should have answers to many of your questions.
Klakap wrote:If a directory entry has an LFN, is it 64 bytes long (short file name + LFN)?
Long file names are variable-length. If you include the short file name, they can take between 64 and 672 bytes.
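To show where the 64 and 672 come from, here is a minimal Python sketch. It assumes 32-byte directory entries, 13 UTF-16 code units per long-name entry, and the 255-code-unit limit; the function name `lfn_dir_bytes` and the constant names are made up for illustration.

```python
import math

DIR_ENTRY_SIZE = 32        # every FAT directory entry is 32 bytes
UNITS_PER_LFN_ENTRY = 13   # UTF-16 code units stored in one long-name entry
MAX_LFN_UNITS = 255        # limit on long-name length, in code units


def lfn_dir_bytes(name):
    """Directory bytes used by a long name plus its mandatory short entry."""
    # Count UTF-16 code units, not Python characters (non-BMP chars take two).
    units = len(name.encode("utf-16-le")) // 2
    if not 1 <= units <= MAX_LFN_UNITS:
        raise ValueError("long name must be 1..255 UTF-16 code units")
    lfn_entries = math.ceil(units / UNITS_PER_LFN_ENTRY)
    # One extra entry for the 8+3 short name that always accompanies the LFN.
    return (lfn_entries + 1) * DIR_ENTRY_SIZE


print(lfn_dir_bytes("a"))        # shortest case: 1 LFN entry + 1 short entry
print(lfn_dir_bytes("a" * 255))  # longest case: 20 LFN entries + 1 short entry
```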
Klakap wrote:Is a file name with an LFN limited to 21 characters plus a 3-character extension?
Long file names can have up to 255 UTF-16 code units. That includes the extension, if the file has one, and the period (.) separating the name and the extension. The extension can be more than 3 characters long.
Klakap wrote:Are short names still used on any machine today?
If the file name can be accurately stored as a short name, Windows will not write a long file name. Linux can be configured to behave this way too.
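As a rough illustration of "can be accurately stored as a short name", here is a simplified Python sketch. It only accepts upper-case ASCII letters, digits, and the special characters the FAT specification lists for short names; it ignores code-page characters above 127 and any case-flag tricks an OS may use for all-lowercase names, so treat it as an approximation, not as what Windows or Linux actually run. The name `fits_short_name` is hypothetical.

```python
import string

# Characters allowed in a short name besides upper-case letters and digits
# (simplification: code-page characters above 127 are not considered here).
SFN_SPECIALS = "$%'-_@~`!(){}^#&"
SFN_ALLOWED = set(string.ascii_uppercase + string.digits + SFN_SPECIALS)


def fits_short_name(name):
    """Return True if `name` can be stored unchanged as an 8+3 short name."""
    base, dot, ext = name.rpartition(".")
    if not dot:       # no period at all: the whole name is the base part
        base, ext = name, ""
    elif not ext:     # a trailing period cannot be represented in 8+3
        return False
    if not (1 <= len(base) <= 8 and len(ext) <= 3):
        return False
    # A second period left in `base` fails here, since "." is not allowed.
    return all(c in SFN_ALLOWED for c in base + ext)


print(fits_short_name("IMG00000.JPG"))  # fits 8+3
print(fits_short_name("index.html"))    # lower case and a 4-character extension
```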
Klakap
Member
Posts: 298
Joined: Sat Mar 10, 2018 10:16 am

Re: Questions on Long file names FAT32

Post by Klakap »

Thank you for the answers.
bzt
Member
Posts: 1584
Joined: Thu Oct 13, 2016 4:55 pm

Re: Questions on Long file names FAT32

Post by bzt »

Hi,
Pebblerubble wrote:I think only DOS machines still use 8+3 filenames nowadays. In my humble opinion they are really outdated and obsolete.
This is not exactly the case. 8+3 is quite widespread in the embedded world. Many cameras, for example, record images with names like 'IMG00000.JPG' or 'DSC00000.RAW', which are 8+3. This is partly for simplicity; the other reason is that M$ has patented LFN.

I'm not sure about the state of the LFN patent fee these days; does it still apply? If so, there's a simple workaround: the patent only applies to entries where the 8+3 entry is generated in a particular way from the LFN entry. Linux circumvents this by not generating two entries (see lkml):
The claims of both of the VFAT patents involve the creation (or storing) of both a long filename and a short filename for a file. The 2nd patch only creates/stores either a short filename or a long filename for a file, but never both.
Cheers,
bzt
nullplan
Member
Posts: 1801
Joined: Wed Aug 30, 2017 8:24 am

Re: Questions on Long file names FAT32

Post by nullplan »

Octocontrabass wrote:Long file names are variable-length. If you include the short file name, they can take between 64 and 672 bytes.
Where did those numbers come from? There are 12 UCS-2 characters in an LFN entry. The shortest name is one ASCII character, taking up two directory entries. Since I am a proponent of UTF-8 everywhere, I suggest re-encoding an LFN as UTF-8, so that means the shortest LFN is 1 byte. Adding the SFN into this doesn't really make sense, as the file can be named by either the LFN or the SFN, but not a combination of them.

To my knowledge, MS imposes a limit of 128 codepoints, which weighs in at something between 64 and 192 bytes of UTF-8. But the format will in principle allow for 63 LFN entries, each with 12 codepoints, which is 1512 bytes of UCS-2, or between 756 and 2268 bytes of UTF-8. (Opening this up to UTF-16 does not worsen the prospects of UTF-8, since a non-BMP-codepoint will be encoded in 4 bytes of UTF-16 as well as 4 bytes of UTF-8, so nothing is gained. The worst case for UTF-16 --> UTF-8 conversion is a string of high BMP characters, which are 2 bytes of UTF-16, but 3 bytes of UTF-8, leading to a size increase of 3/2)
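The encoding argument in the parentheses can be checked with a few lines of Python. This is only a sketch using the standard codecs; the helper name `widths` is made up here.

```python
def widths(ch):
    """Return (bytes as UTF-16, bytes as UTF-8) for a single character."""
    return len(ch.encode("utf-16-le")), len(ch.encode("utf-8"))


# ASCII: UTF-8 is half the size of UTF-16.
print("A", widths("A"))
# High BMP character (here the euro sign, U+20AC): 2 bytes grow to 3.
print("U+20AC", widths("\u20ac"))
# Non-BMP character (U+1F600): a surrogate pair in UTF-16, 4 bytes either way.
print("U+1F600", widths("\U0001F600"))
```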
bzt wrote:I'm not sure about the state of the LFN patent fee these days; does it still apply?
Disclaimer: IANAL. None of this is legal advice. Consult a professional if you require aid in this matter.

To my knowledge, all patents regarding LFN are expired, except for the one for generating LFNs and short file names in the same namespace (which is what the Linux quote was about), which is due to expire in 2020 or 2021 (don't quote me on that). Thankfully, patents aren't copyright, and you are allowed to ship an infringing algorithm so long as you ensure it isn't actually used until a licence is procured or the patent has expired. Also, as we saw, it is really easy to get around this one by just always generating LFNs or SFNs, and never a mixture of the two. Also also, it is exceedingly unlikely Microsoft would ever know or care about an infringement from the likes of us before the patent expires. None of us is in this for money (well, except for rdos, and good on them), so M$ are gladly invited to any share of my profits they like. :D
Carpe diem!
Octocontrabass
Member
Posts: 5584
Joined: Mon Mar 25, 2013 7:01 pm

Re: Questions on Long file names FAT32

Post by Octocontrabass »

nullplan wrote:
Octocontrabass wrote:Long file names are variable-length. If you include the short file name, they can take between 64 and 672 bytes.
Where did those numbers come from?
It's the total size, in bytes, of the directory entries that store a long file name. (The question, not quoted here, was about the size of the directory entry, not the size of the file name itself.)
nullplan wrote:There are 12 UCS-2 characters in an LFN entry.
There are 13 UTF-16 code units in each LFN entry.
nullplan wrote:To my knowledge, MS imposes a limit of 128 codepoints, which weighs in at something between 64 and 192 bytes of UTF-8.
It's 255 UTF-16 code units, or up to 765 bytes when taken as UTF-8. If you ignore that limit and use all 63 possible long entries, it's 819 UTF-16 code units, or 2457 bytes as UTF-8.
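To make the counts concrete, here is a minimal Python sketch of how the 13 code units sit inside one 32-byte long-name entry, plus the arithmetic behind the byte figures. The offsets (5 units at byte 1, 6 at byte 14, 2 at byte 28, attribute 0x0F at byte 11) follow the layout in Microsoft's FAT specification as I understand it; the function and constant names are my own, for illustration only.

```python
import struct

LFN_ATTR = 0x0F            # attribute byte that marks a long-name entry
UNITS_PER_LFN_ENTRY = 13   # 5 + 6 + 2 UTF-16 code units
MAX_LFN_ENTRIES = 63       # highest sequence number the format can express


def lfn_entry_units(entry):
    """Return the 13 UTF-16 code units stored in one 32-byte LFN entry."""
    if len(entry) != 32 or entry[11] != LFN_ATTR:
        raise ValueError("not a long-file-name directory entry")
    # The name is split into three little-endian fields around other metadata.
    return list(struct.unpack_from("<5H", entry, 1)
                + struct.unpack_from("<6H", entry, 14)
                + struct.unpack_from("<2H", entry, 28))


def worst_case_utf8_bytes(units):
    """Upper bound on UTF-8 size: a BMP code unit needs at most 3 bytes."""
    return units * 3


print(worst_case_utf8_bytes(255))                    # spec limit as UTF-8
print(MAX_LFN_ENTRIES * UNITS_PER_LFN_ENTRY)         # units if all entries used
print(worst_case_utf8_bytes(MAX_LFN_ENTRIES * UNITS_PER_LFN_ENTRY))
```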
nullplan wrote:Also, as we saw, it is really easy to get around this one by just always generating LFNs or SFNs, and never a mixture of the two.
This is accomplished by generating an invalid SFN when you have a LFN, since the directory entry format won't allow you to not have the SFN.