About SFS (to Brendan)

Candy · Post by **Candy** » Fri Dec 08, 2006 11:46 am

Brendan wrote: Simply getting correct behaviour is hard enough. For an example, try "make foo", then reboot your computer and use the BIOS to set the time to last week, then do "touch somefile" and then try "make foo" again. What if you're doing incremental backups? For some OSs, you get the same problems even when you use the OS to change the current time.

Use Gnu make on DOS. That alone gives enough cutoff since it rounds dates to the nearest 2 second boundary, which make can trivially outpace in using the file again.

Brendan wrote:Unfortunately, (according to Wikipedia) systems with strict POSIX compliance do have the same times twice when leap seconds are added. This means you can get the time and 500 ms later get an earlier time.

Thought I'd heard it somewhere but I wasn't sure.

One problem with this is accurately calculating times 6 months or more ahead without knowing if a leap second will be added to UTC in advance, which could lead to calculations being incorrect by a maximum of one second per year. IMHO this isn't a huge problem as these long range calculations rarely need so much accuracy, and it's possible to postpone the calculation until just before it's needed to avoid the inaccuracy

The other problem is informing the OS when a leap second will be added - I assumed there'd be some sort of software updates happening every few months (which may be a bad assumption for some people).

The other main hurdle for accurate time keeping is handling large corrections. If the user notices that the time is wrong and wants to increase or decrease it by 10 minutes, then do you allow a sudden change (and mess up running software), or do you gradually compensate (e.g. increase the time counter at a rate of 1.01 Hz or 0.99 Hz instead of 1 Hz until the time is correct again) to avoid the sudden change. What if the user reboots and changes the RTC time using the BIOS? A file saved 5 minutes ago could look like it was saved a long time in the past or a long time in the future. For me, this becomes much more difficult because there's also the added problem of keeping all computers in the cluster synchronized. To be honest, I haven't really found a solution for these problems that I like...

I'm not going for accurate time (as in second-accurate, subsecond-accurate or better. I'm going for a time that
- Only increments
- Is unique everytime you request it
- Is meaningful to the user (within bounds)

The first requires a few things. One, you can't adjust the time to before the current time - ever. Adjusting in the BIOS means that the computer should internally ignore the BIOS time and rather count for itself using its own last timestamp without overwriting the RTC values. After a few times turning off the computer the time will gradually shift closer to the "real" time. The actual time that's world-wide valid is irrelevant - the user defines the world of the computer.

The second means that the time kept is actually a number that is incremented regularly. The times given out (as opposed to the time kept) are the time kept plus a counter value that is reduced by as much as possible (preferably to 0 every time). This within the bounds of the first, so you can't decrement the counter by more than the time interval that's added to the time that's kept.

The third just means that the user must be able to set the time initially (and check whether it's valid) and indicate the relevant timezone / DST settings. The times given out should match the set value where not in conflict with the 1st or 2nd rule.

As far as I can see, this gives a good time clock for desktop systems that fulfills a number of cryptographically and system-integrity wise important characteristics. I can not find a flaw except that the time could drift a minute or more in 30 years (UTC allows 2 leap second per year). That would be compensated by the lack of quality in RTC's (I expect) and the user's inability to expect the time to remain correct for a longer period of time. Also, when the time values are retrieved at an astronomical speed, the time values given out might run past the actual time. This is not much of a problem and mainly the reason why I'm thinking of a 64-bit subsecond counter (try getting 18 billion billion values in a second - good luck).

Brendan · Post by **Brendan** » Fri Dec 08, 2006 1:58 pm

Hi,

Candy wrote:I'm not going for accurate time (as in second-accurate, subsecond-accurate or better. I'm going for a time that
- Only increments
- Is unique everytime you request it
- Is meaningful to the user (within bounds)

Candy wrote:The second means that the time kept is actually a number that is incremented regularly. The times given out (as opposed to the time kept) are the time kept plus a counter value that is reduced by as much as possible (preferably to 0 every time). This within the bounds of the first, so you can't decrement the counter by more than the time interval that's added to the time that's kept.

That is a very interesting idea...

Have you considered the code to do it (specifically, the code to ensure atomicity in "many-CPU" systems)? To be honest, I tried to figure out a way, but everything I thought of left a re-entrancy lock in the timer IRQ handler or race conditions.

BTW I'm assuming a re-entrancy lock in the timer IRQ handler isn't desirable - without due care it creates further problems, like deadlocks and starvation (e.g. a multi-threaded process using several CPUs to repeatedly request the time, causing the IRQ handler to repeatedly fail to acquire the lock).

I think something similar can be done without relatively complex locking, but there's problems...

Code: Select all

    BITS 64

timerStructure:
.accessCount:   dd 0
.timeCount:     dd 0

timePeriodStructure:
.unused:	dd 0
.timePeriod:	dd 1000 * 1/TIMER_FREQUENCY


timerIRQ:
   push rax

   mov rax,[timePeriodStructure]
   lock add [timerStructure], rax

   pop rax
   iretd


getTime:
   mov rax,1
   lock xadd [timerStructure],rax
   ret

This gives a 64-bit value, where the highest 32 bits are the time (e.g. milli-seconds since foo) and the lower 32-bits are the number of times the time has been accessed.

The main problem here is that "timerStructure.accessCount" isn't reset to zero by the timer IRQ, so it'd overflow and effect "timerStructure.timeCount" (causing an error that depends on how often "getTime" is called).

This could be converted to 32-bit protected mode, but then you've only got a 16-bit access count (and a much larger error caused by the overflow problem) and a 16-bit time component (which would suck).

Candy wrote:The third just means that the user must be able to set the time initially (and check whether it's valid) and indicate the relevant timezone / DST settings. The times given out should match the set value where not in conflict with the 1st or 2nd rule.

I wouldn't worry much about the timezone and daylight savings - they are used for display purposes only...

Cheers,

Brendan

Candy · Post by **Candy** » Fri Dec 08, 2006 3:47 pm

I was thinking of soft time critical parts, where the software sets the VIF to indicate that it would strongly prefer not to be interrupted right now. I recall that there's some mechanism for it to give an interrupt if the VIP bit is set, allowing you to use that as a free indicator to the kernel that the user space is busy with some mutex-sensitive code that's best not interrupted. In my design it'd get a grace of one timer period (a millisecond) for it to complete the whatever operation it was, after which it was forcibly removed from the cpu. This should mostly reduce the overhead of using mutexes.

I can't find a proper way to make this code lock-free. The best thing I can find is using a small counter that might be atomically updated, but it's not perfect. I recall a compare-and-swap operation being mentioned somewhere, that would fit the bill if it compares mem to A, and if equal swaps mem with B. Seeing current processors (mainly AMD's initial AMD64 processors) that lack CMPXCHG16B you can't do that with a 128-bit counter. Otherwise, you can use:

Code: Select all

load_time:
   mov xmm0, [time]
   mov xmm1, xmm0
   <process xmm1 as if you had the mutex>
   cmpxchg16b [time], xmm0, xmm1
   jne load_time
   ret

for any generic operation on the time that would be nonatomical.

I've just checked cmpxchg8b and it works like this. That means that my idea is workable!

smiddy · Post by **smiddy** » Fri Dec 08, 2006 11:40 pm

Brendan wrote:Hi,

Hola,

As articulate as ever, you are!

Brendan wrote:For me, there's a single time counter (and single time format) that is used for everything, including measuring how much CPU time each thread has consumed, setting time delays (including very small delays for device drivers), the basis for "cron", etc.

This presupposes that you time share tasks. You don't have to, you can let each application control its own destiny and the application can reliquish itself when it is ready to.

Though a time sharing environment would certainly need something to divy up from, a metronome of sorts.

Brendan wrote:Simply getting correct behaviour is hard enough. For an example, try "make foo", then reboot your computer and use the BIOS to set the time to last week, then do "touch somefile" and then try "make foo" again. What if you're doing incremental backups? For some OSs, you get the same problems even when you use the OS to change the current time.

I agree with this assessment, but you can't fix all the aults of the user, it would have to be defined up front, that monkeying with the time will monkey with your system, that is if you strictly went with a RTC only approach. Do-able still, even in this scenario.

Brendan wrote:Now consider what happens if a thread does "sleep(DELAY_30MILLISECONDS)" and the user changes the time from "10:00 AM" to "5:00 AM" before the thread wakes up. Will the kernel handle it correctly, or will the thread sleep for 5 hours too long? What if the hard disk driver does "sleep(DELAY_30MILLISECONDS)"?

See answer above, user screws it up, they have themselves to thank.

Brendan wrote:For accuracy; the accuracy for short time periods is important (especially for things like hardware delays, measuring used CPU time, etc). Lately I've been porting/rewriting some compression/decompression code which reports how long it took to compress or decompress. On Linux, often my code defies all the laws of physics and completes instantly (in reality it takes less than "10-ish-milliseconds", and Linux is too crappy to measure it properly).

The accuracy for long time periods isn't that important - if you want to wait for 3 years and you actually wait for 3 years and one second, I doubt it's going to matter.

I don't disagree, but there is more than one way to skin a cat...(terpentine works, but it is messy) I think each application can do this, it places the burden of application development back into the hands of the developer, they have to learn and understand what they're programming on, from a hardware and software perspective.

Bredan wrote:Of course there is another important factor - to be able to say that my OS is better than <insert_any_other_OS_here> because <insert_some_reason_here>....

Cheers,

Brendan

Well, there ya go! I knew all came down to something so prophetic and profound that the boundaries of life would be impinged on, and there you have it.

Seriosuly, I think the paradigm is OS control, versus no OS control or very little. In a cooperative environment each application controls their own destiny. It would force better planning and testing of the applications to ensure they don't burst under the pressure, so to speak. But then again, what do I know...I've only implemented a CLI n my own OS, which doesn't have but one task, the entrance of commands...

Brendan · Post by **Brendan** » Sat Dec 09, 2006 3:36 am

Hi,

smiddy wrote:
Brendan wrote:Simply getting correct behaviour is hard enough. For an example, try "make foo", then reboot your computer and use the BIOS to set the time to last week, then do "touch somefile" and then try "make foo" again. What if you're doing incremental backups? For some OSs, you get the same problems even when you use the OS to change the current time.
I agree with this assessment, but you can't fix all the aults of the user, it would have to be defined up front, that monkeying with the time will monkey with your system, that is if you strictly went with a RTC only approach. Do-able still, even in this scenario.

If the OS keeps track of milli-seconds since boot, and the current time is derived from "time at boot + milli-seconds since boot", then the user can be allowed to change the "time at boot" part without effecting the "milli-seconds since boot" part. In this case, things like "sleep()" won't be effected but timestamps in the file system will.

It would also be possible to keep track of the current system time and the desired system time, where hopefully they are both the same. If the user changes the time, then they only change the desired system time, and the OS tries to make the current system time match without causing problems for software. For example, if the user adds an hour to the desired system time, then the OS could add 1.5 seconds to the current system time each second, and if the user removes an hour from the desired system time the OS could add 0.5 seconds to the current system time each second. This could also be done more aggressively (e.g. if the user increases the desired system time then increase the current system time by the same amount, and if the user decreases the desired system time then only increase the current system time when the time is read).

If you combine all of this (e.g. keep track of milli-seconds since boot, the current time offset and the desired time offset), then you can have a system that can cope with the time being changed without major problems (as long as the user uses the OS to change the time, not the BIOS).

To avoid problems caused by the user changing the time by using the BIOS, it'd be possible to store a "time at shutdown" in the file system, and use that during boot to initialise the current time offset and the desired time offset (where the current time offset is set to the time at shutdown or the RTC time depending on which is later, and the desired time offset is always set to the RTC time).

smiddy wrote:Seriosuly, I think the paradigm is OS control, versus no OS control or very little. In a cooperative environment each application controls their own destiny. It would force better planning and testing of the applications to ensure they don't burst under the pressure, so to speak. But then again, what do I know...I've only implemented a CLI n my own OS, which doesn't have but one task, the entrance of commands...

As far as my kernel is concerned, there's very little difference between a device driver, a file system and a text editor - they are all just processes with one or more threads. In this case a co-operative environment isn't really suitable (a badly written text editor could hog the CPU and prevent device drivers from getting CPU time for IRQ handling), and the priority of each thread must be respected (e.g. a high priority device driver thread must be able to pre-empt lower priority threads, including when it wakes up after doing something like "sleep()").

Cheers,

Brendan

Brendan · Post by **Brendan** » Sat Dec 09, 2006 4:41 am

Hi,

Now (for my purposes) I've been trying to think of ways to combine the advantages of Candy's suggestions, with the comments I made in my reply to Smiddy above.

My new problem is that I'm talking of maintaining several different items atomicaly - the time count, the access count, the current time offset and the desired time offset. Obviously both of my previous block-free methods are no longer suitable.

I should also point out there's 3 different classifications for re-entrancy protection - "locking", "lock-free" and "block-free". Traditional spinloops, semephores and mutexes are "locking". A "lock-free" algrorithm doesn't use explicit locks (like a locking algorithm), but isn't that much different to a spinloop in that it can involve repeated retries. A "block-free" algorithm guarantees forward progress in all cases (i.e. no locking and no retries).

With this in mind, it's time to bring out the big guns...

Code: Select all

%define TOTAL_TIME_DATA_ENTRIES

struc TIME_DATA
 .accessCount:          resd 1
 .timeLow:              resd 1
 .timeHigh:             resd 1
 .currentOffsetLow:     resd 1
 .currentOffsetHigh:    resd 1
 .desiredOffsetLow:     resd 1
 .desiredOffsetHigh:    resd 1
endstruc

                        section .data
                        align CACHE_LINE_SIZE
timeDataReference:      dd timeDataEntries
timeCountFractions:     dd 0

                        align CACHE_LINE_SIZE
timeDataEntryTable:     times (TIME_DATA_size * TOTAL_TIME_DATA_ENTRIES) db 0
timerIRQperiodHigh:     dd 1000 * 1/TIMER_FREQUENCY
timerIRQperiodLow:      dd 0

                        align CACHE_LINE_SIZE
newDesiredOffsetLow:    dd 0
newDesiredOffsetHigh:   dd 0
newDesiredOffsetFlag:   dd 0

                        align CACHE_LINE_SIZE
newDesiredOffsetLock:   dd 0

                        align CACHE_LINE_SIZE
                        section .text



;Timer IRQ Handler

timerIRQ:
    pushad

    mov esi,[timeDataReference]
    lea edi,[esi+TIME_DATA_size]
    cmp edi,timeDataEntryTable + TIME_DATA_size * TOTAL_TIME_DATA_ENTRIES
    jb .gotDest
    mov edi,timeDataEntryTable
.gotDest:

    mov edx,[timerIRQperiodLow]
    add [timeCountFractions],edx
    mov eax,0
    adc eax,[timerIRQperiodHigh]
    xor edx,edx
    add eax,[esi+TIME_DATA.timeLow]
    adc edx,[esi+TIME_DATA.timeHigh]
    mov dword [edi+TIME_DATA.accessCount],0
    mov [edi+TIME_DATA.timeLow],eax
    mov [edi+TIME_DATA.timeHigh],edx

    mov eax,[esi+TIME_DATA.currentOffsetLow]
    mov ebx,[esi+TIME_DATA.currentOffsetHigh]

    mov ecx,[esi+TIME_DATA.desiredOffsetLow]
    mov edx,[esi+TIME_DATA.desiredOffsetHigh]
    test dword [newDesiredOffsetFlag],1
    je .gotDesiredOffset
    mov ecx,[newDesiredOffsetLow]
    mov edx,[newDesiredOffsetHigh]
    mov dword [newDesiredOffsetFlag],0
.gotDesiredOffset:

    cmp edx,ebx
    ja .timeShiftForward
    jb .timeShiftBack
    cmp ecx,eax
    ja .timeShiftForward
    je .gotTimeShift

.timeShiftBack:
    mov ebp,[timerIRQperiod]
    dec ebp
    sub eax,ebp
    sbb ebx,0
    cmp edx,ebx
    jb .gotTimeShift
    ja .timeShiftForward
    cmp ecx,eax
    ja .timeShiftForward
    jmp .gotTimeShift

.timeShiftForward:
    mov eax,ecx
    mov ebx,edx
    jmp .gotTimeShift

.gotTimeShift:
    mov [edi+TIME_DATA.currentOffsetLow],eax
    mov [edi+TIME_DATA.currentOffsetHigh],ebx

    mov [timeDataReference],edi
    popad
    ret



;Get the current time
;
;Output
; ecx:ebx:eax  Current time

getCurrentTime:
    push esi
    pushfd
    cli
    mov esi,[timeDataReference]
    mov eax,1
    lock xadd [esi+TIME_DATA.accessCount],eax
    mov ebx,[esi+TIME_DATA.timeLow]
    mov ecx,[esi+TIME_DATA.timeHigh]
    add ebx,[esi+TIME_DATA.currentOffsetLow]
    adc ecx,[esi+TIME_DATA.currentOffsetHigh]
    popfd
    pop esi
    ret


;Change the current time
;
;Input
; ecx:ebx  new desired time

setDesiredTime:
    pushad
    pushfd
    cli
    mov esi,[timeDataReference]
    mov eax,1
    lock xadd [esi+TIME_DATA.accessCount],eax
    mov eax,[esi+TIME_DATA.timeLow]
    mov edx,[esi+TIME_DATA.timeHigh]
    popfd

    sub ebx,eax
    sbb ecx,edx

    pushfd
    cli
    test dword [newDesiredOffsetFlag],1
    jne .wait
    bts [newDesiredOffsetLock],1
    jnc .gotLock
.wait:
    popfd
    pause
    pushfd
    cli
    test dword [newDesiredOffsetFlag],1
    jne .wait
    bts [newDesiredOffsetLock],1
    jc .wait

.gotLock:
    mov [newDesiredOffsetLow],ebx
    mov [newDesiredOffsetHigh],ecx
    mov dword [newDesiredOffsetFlag],1

    btr [newDesiredOffsetLock],1
    popfd

    popad
    ret

OK, it's not simple, but it should do everything and it is entirely "block-free", except for the "setDesiredTime" code. The resulting time counter is a 64-bit timer (plus a 32-bit access count), and it allows for extremely fine-grained compensation for drift.

For the "setDesiredTime" there's some major limitations, in that it does acquire a lock and must wait for the timer IRQ to update the "desired time offset" that's currently being used. This means that if you call "setDesiredTime" 20 times (using any number of CPUs) then it'll take 19 timer IRQs before all calls return. IMHO this isn't really a problem because the time shouldn't be changed often anyway (and it will handle a single change without needing to wait for anything).

Of course none of this has been tested...

Cheers,

Brendan

JJeronimo · Post by **JJeronimo** » Sun Dec 10, 2006 2:50 pm

Well, i've terminated reading the specification... now I have some questions to ask...

(paragraph numbers include all the entries of unordered/ordered lists)

Brendan (in the Simple File System Specification) wrote:== Index Area Format ==
* 14§ Any index entry type not listed above must be treated as an unused entry (0x10), and may be replaced with the entry type value 0x10 by an SFS 1.0 compliant code at any time.

I don't agree with this... I think this will make the filesystem extension impossible...
For example, what will you do if latter you want to express an entry for a compressed file?

I think you should replace that by "Any index entry type not listed above must be treated as an unuseable entry (0x18)."

This way, old software will never overwrite the new feature (unless you use some special option to remove unknown entries, parhaps)... This could also provide a standardised way to revert the filesystem to a lower version, however loosing new features, and would allow old software to safely write to the filesystem, too...
Though, I may not be previewing every case...

Brendan (in the Simple File System Specification) wrote:=== Index Area - File Entry ===
* 4§ The ending block number must be higher than the starting block number, unless...

I think that, if the file is less that one block, you should allow this two fields to be equal...
"The ending block number must be the same or higher than the starting block number, unless...", in place of this...

Brendan (in the Simple File System Specification) wrote:== Name Strings ==
* 13§ The forward slash character ('/', code 0x002F) must be used as a directory separator only, and is not allowed within a volume label. It is expected that some systems will silently replace these characters with a backward slash character ('\', code 0x005C) to maintain consistency with the intended environment.

So, just say that the forward slash and the backslash can be both (and equivalently) used to separate filesystem nodes...

About the no-fragmentation restriction... I think you should get rid of that! I'm not saying that you don't know how to design filesystems (not me, who have never designed one!), but only that you are taking the "Simple" to an unnecessary extreme...
Why don't you make use of continuation entries to store extra block strings (separated from the first block string)?

Also, how will the driver fastly find an unused block? Let's scan the whole index area?!

JJ

Combuster · Post by **Combuster** » Sun Dec 10, 2006 3:59 pm

JJeronimo wrote:
Brendan (in the Simple File System Specification) wrote:== Index Area Format ==
* 14§ Any index entry type not listed above must be treated as an unused entry (0x10), and may be replaced with the entry type value 0x10 by an SFS 1.0 compliant code at any time.
I don't agree with this... I think this will make the filesystem extension impossible...
For example, what will you do if latter you want to express an entry for a compressed file?

I think you should replace that by "Any index entry type not listed above must be treated as an unuseable entry (0x18)."

What you missed is that SFS stores a version number, and with it lets the fs driver know to what specification it adheres. Any SFS 1.0 driver should know that modifying an SFS > 1.0 system may lead to inconsistencies, due to the driver not knowing what is used or not. Hence, the problem with extensions is not based on this level.

On the other side, any failure (system crash) might change indexes to an unspecified value, after which 1: the index becomes unavailable, 2: any area this index might be pointing at becomes reserved, which may lead to unnecessary loss of disk space.

Brendan (in the Simple File System Specification) wrote:=== Index Area - File Entry ===
* 4§ The ending block number must be higher than the starting block number, unless...
I think that, if the file is less that one block, you should allow this two fields to be equal...
"The ending block number must be the same or higher than the starting block number, unless...", in place of this...

I have interpreted the ending block as exclusive: i.e. the file extends from the start block up to but not including the end block.

Brendan (in the Simple File System Specification) wrote:== Name Strings ==
* 13§ The forward slash character ('/', code 0x002F) must be used as a directory separator only, and is not allowed within a volume label. It is expected that some systems will silently replace these characters with a backward slash character ('\', code 0x005C) to maintain consistency with the intended environment.
So, just say that the forward slash and the backslash can be both (and equivalently) used to separate filesystem nodes...

This requires drivers to interpret / and \ as the same character, unneedingly complicating the process. Besides, the backslash \ is rather ambiguous in interpretation - try using it on an *nix system. And if you do want to use the \ instead of the /, just add an strreplace(filename,'\\','/')

About the no-fragmentation restriction... I think you should get rid of that! I'm not saying that you don't know how to design filesystems (not me, who have never designed one!), but only that you are taking the "Simple" to an unnecessary extreme...
Why don't you make use of continuation entries to store extra block strings (separated from the first block string)?

Also, how will the driver fastly find an unused block? Let's scan the whole index area?!

JJ

As you noted yourself, SFS thanks its simplicity due to the no fragmentation. This makes reads far easier, as well as writes. You can determine free disk space very easily by checking the difference between the data area end and index area start. If there's not enough, defragment and try again (in this case, you'll be defragging free disk space rather than files).
Also, SFS is designed for flash card (think mp3 player, digital camera) usage: append, append, append, append, read, read, read, delete all, in which case the fragmentation issue can almost be waived.

My implementation does not even have compacting code, for the sole reason that it would probably not get used.

JJeronimo · Post by **JJeronimo** » Mon Dec 11, 2006 2:04 pm

Combuster wrote:
JJeronimo wrote: I don't agree with this... I think this will make the filesystem extension impossible...
For example, what will you do if latter you want to express an entry for a compressed file?

I think you should replace that by "Any index entry type not listed above must be treated as an unuseable entry (0x18)."
What you missed is that SFS stores a version number, and with it lets the fs driver know to what specification it adheres. Any SFS 1.0 driver should know that modifying an SFS > 1.0 system may lead to inconsistencies, due to the driver not knowing what is used or not. Hence, the problem with extensions is not based on this level.

No... I know that SFS stores a version number... the point is that some carefull design can eliminate the need for that type of version checking...

On the other side, any failure (system crash) might change indexes to an unspecified value, after which 1: the index becomes unavailable, 2: any area this index might be pointing at becomes reserved, which may lead to unnecessary loss of disk space.

Is this sooooooo frequent?

Brendan (in the Simple File System Specification) wrote:=== Index Area - File Entry ===
* 4§ The ending block number must be higher than the starting block number, unless...
I think that, if the file is less that one block, you should allow this two fields to be equal...
"The ending block number must be the same or higher than the starting block number, unless...", in place of this...
I have interpreted the ending block as exclusive: i.e. the file extends from the start block up to but not including the end block.

Right... So, add something to the spec stating that... it's not very clear, indeed...

Brendan (in the Simple File System Specification) wrote:== Name Strings ==
* 13§ The forward slash character ('/', code 0x002F) must be used as a directory separator only, and is not allowed within a volume label. It is expected that some systems will silently replace these characters with a backward slash character ('\', code 0x005C) to maintain consistency with the intended environment.
So, just say that the forward6 slash and the backslash can be both (and equivalently) used to separate filesystem nodes...

This requires drivers to interpret / and \ as the same character, unneedingly complicating the process. Besides, the backslash \ is rather ambiguous in interpretation - try using it on an *nix system. And if you do want to use the \ instead of the /, just add an strreplace(filename,'\\','/')

Argh! No! Forget what I said! I hadn't understand, so I thought you were saying that the driver will silently replace / by \ *in* the filesystem structure when it finds one, if it likes the paths that way...

So... you are talking about the OS abstractions level, right?

About the no-fragmentation restriction... I think you should get rid of that! I'm not saying that you don't know how to design filesystems (not me, who have never designed one!), but only that you are taking the "Simple" to an unnecessary extreme...
Why don't you make use of continuation entries to store extra block strings (separated from the first block string)?

Also, how will the driver fastly find an unused block? Let's scan the whole index area?!
As you noted yourself, SFS thanks its simplicity due to the no fragmentation. This makes reads far easier, as well as writes. You can determine free disk space very easily by checking the difference between the data area end and index area start. If there's not enough, defragment and try again (in this case, you'll be defragging free disk space rather than files).

Ok, ok, ok... depending on the purposes of your SimpleFS, parhaps this is a good choice...
Due to the simplicity of the filesystem, I gess that the defragmentation code can be included in the driver itself, and called every time the FS goes out of space...

Parhaps you should add a free space fragmentation count to the superblock (which would contain the number of blocks "used" by deleted files before the end of the data area), to make it easier to determine the aditional free space that can be obtained by defragmenting...

Also, SFS is designed for flash card (think mp3 player, digital camera) usage: append, append, append, append, read, read, read, delete all, in which case the fragmentation issue can almost be waived.

Right... I wasn't taking into account that kind of usage... I was thinking about that guys who have a special partition to transfer files across operating systems... but your argument still applies to that case...

My implementation does not even have compacting code, for the sole reason that it would probably not get used.

But if the spec doesn't even provide support for marking files as compressed, your implementation couldn't implement that! Unless you were about to disrespect your own specification!

Well, what OS is your implementation for?

JJ

Combuster · Post by **Combuster** » Mon Dec 11, 2006 3:36 pm

JJeronimo wrote: Well, what OS is your implementation for?

Hit [ VDisk ] in my signature

Actually, its derived from a test implementation written for the purpose of checking the specification. And AFAIK its the only usable implementation as of yet. (there exists a simplefstools for *nix, but so far its only a mkfs_sfs)

Also, a lot of your questions have been discussed over time: http://www.osdev.org/phpBB2/viewtopic.php?t=11833

But if the spec doesn't even provide support for marking files as compressed

Its not GZIP...
SFS compacting = shifting files to create free space to the end of the data area. Basically the technically correct term for defragmenting.

The only thing "off specs" i intend to do with it is to keep the index in partial order, resulting in an amortized log n operation time for all index accesses. But that's for when i do SFS for my own os.

JJeronimo · Post by **JJeronimo** » Mon Dec 11, 2006 4:55 pm

Combuster wrote:
But if the spec doesn't even provide support for marking files as compressed
Its not GZIP...
SFS compacting = shifting files to create free space to the end of the data area. Basically the technically correct term for defragmenting.

Ok, ok, ok... I didn't know...

JJ

JJeronimo · Post by **JJeronimo** » Mon Dec 11, 2006 4:57 pm

I've found another error:

In section Index Area Format, look for the "Continuation entry for file entry 4" in the graphic representing the structure of the fs... it's just below the "File entry 3", but shouldn't...

JJ