A simple question

Programming, for all ages and all languages.
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

A simple question

Post by Jeko »

I'm writing the syscalls for my operating system, and right now I'm working on the ones that handle the virtual file system. I have a small problem:
File names can be 256 bytes long, but also 12 bytes, or 14, or 15; that is, any length.
The problem occurs when a user process calls the readdir syscall.
This syscall opens a directory and reads a specific entry from it. The problem is how to return the name of the file to the user process.
If the function returns a string allocated in kernel memory, the user process can't access it. If the user process allocates some memory and passes the address to readdir, readdir can only return a filename up to a specific length, because the user process can't know how long the filename is when it allocates the memory.

How can I solve this problem? I've seen that other operating systems impose a maximum of, for example, 256 characters, but I want filenames to be of any length.
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling

http://sourceforge.net/projects/jeko - Jeko Operating System
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: A simple question

Post by Solar »

Jeko wrote:In fact, if the function returns a string allocated in the kernel memory, the user process can't access the string.
Map the result into user address space as read-only?

Make the user provide the memory, return zero if successful, and the amount of memory required if what the user provided isn't long enough?
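
A minimal sketch of that second contract, for illustration only. The names sys_readdir_name and read_name are hypothetical, and the entry name is passed in directly as a stand-in for whatever the kernel would look up in the directory; a real syscall would also have to validate the user pointer before copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel side of readdir: copies the entry
 * name into the caller's buffer. Returns 0 on success, or the number of
 * bytes required (including the terminating NUL) if the buffer is too small. */
static size_t sys_readdir_name(const char *entry_name, char *buf, size_t buf_size)
{
    size_t needed = strlen(entry_name) + 1;

    assert(buf != NULL || buf_size == 0);
    if (buf_size < needed)
        return needed;              /* nothing copied; report the size to provide */

    memcpy(buf, entry_name, needed);
    return 0;
}

/* Caller side: try a small guess first and grow once if asked to.
 * Returns a malloc'ed string the caller must free, or NULL if out of memory. */
static char *read_name(const char *entry_name)
{
    size_t size = 16;               /* arbitrary first guess */
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    size_t needed = sys_readdir_name(entry_name, buf, size);
    if (needed != 0) {
        char *bigger = realloc(buf, needed);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        needed = sys_readdir_name(entry_name, buf, needed);
        assert(needed == 0);        /* the name cannot change in this sketch */
    }
    return buf;
}
```

The second call only happens when the first guess was too small, so short names cost one call and one allocation.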
Every good solution is obvious once you've found it.
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Re: A simple question

Post by Jeko »

Solar wrote:
Jeko wrote:In fact, if the function returns a string allocated in the kernel memory, the user process can't access the string.
Map the result into user address space as read-only?
With this method I must allocate memory for each readdir call. Isn't this a waste of time? But I think it's the only valid method...
However, if I allocate, for example, 8 bytes from the user heap, do I use an entire page for an 8-byte filename?
Solar wrote:Make the user provide the memory, return zero if successful, and the amount of memory required if what the user provided isn't long enough?
With this method the user must allocate memory, call the function, allocate more memory, and call the function again. The other method is better.
AndrewAPrice
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: A simple question

Post by AndrewAPrice »

Look how Microsoft do it. :)

e.g.
ReturnSomeKernelString(char *buffer, uint sizeOfBuffer);
My OS is Perception.
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: A simple question

Post by Solar »

Jeko wrote:
Solar wrote:Make the user provide the memory, return zero if successful, and the amount of memory required if what the user provided isn't long enough?
With this method the user must allocate memory, call the function, allocate more memory, and call the function again.
If he did this the first time around, he'd be able to catch 99% of all filename cases:

Code:

#include <stdio.h>   /* FILENAME_MAX */
#include <stdlib.h>  /* malloc */

...
char * buffer = malloc( FILENAME_MAX );
...
Oh, while we're at it - are your filenames in ASCII (if yes, which codepage?), or Unicode (if yes, which encoding?)?
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Re: A simple question

Post by Jeko »

Solar wrote:
Jeko wrote:With this method the user must allocate memory, call the function, allocate more memory, and call the function again.
If he did this the first time around, he'd be able to catch 99% of all filename cases:

Code:

#include <stdio.h>   /* FILENAME_MAX */
#include <stdlib.h>  /* malloc */

...
char * buffer = malloc( FILENAME_MAX );
...
Oh, while we're at it - are your filenames in ASCII (if yes, which codepage?), or Unicode (if yes, which encoding?)?
They are ASCII, but I will support Unicode.
(What are codepages, and what are encodings?)
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Re: A simple question

Post by Jeko »

MessiahAndrw wrote:Look how Microsoft do it. :)

e.g.
ReturnSomeKernelString(char *buffer, uint sizeOfBuffer);
So the filename may be returned only partially: if sizeOfBuffer is smaller than the length of the filename, only part of the filename is put in the buffer.
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: A simple question

Post by Solar »

Jeko wrote:(What are codepages, and what are encodings?)
"Codepage" is DOS vocabulary; more generally speaking, it's about character encodings: ISO 646, ISO 8859, EBCDIC, Windows-125*, KOI-8, ISO 2022...

With "encoding" I meant UTF-8, UTF-16, UTF-32...

I know you usually don't want to bother with these things in the beginning, you just "want it to work". But they are a royal PITA when you attempt to retrofit this kind of international support, because it affects virtually everything down to the simplest of functions.
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Re: A simple question

Post by Jeko »

Solar wrote:
Jeko wrote:(What are codepages, and what are encodings?)
"Codepage" is DOS vocabulary; more generally speaking, it's about character encodings: ISO 646, ISO 8859, EBCDIC, Windows-125*, KOI-8, ISO 2022...

With "encoding" I meant UTF-8, UTF-16, UTF-32...

I know you usually don't want to bother with these things in the beginning, you just "want it to work". But they are a royal PITA when you attempt to retrofit this kind of international support, because it affects virtually everything down to the simplest of functions.
I think I'll use Unicode, either UTF-8 or UTF-16. I must study the differences between them, but I read that UTF-32 isn't good because it wastes space.
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: A simple question

Post by Solar »

UTF-8 is space-efficient, at least as long as most of your characters are ASCII, or at least BMP (Basic Multilingual Plane). However, it's a multibyte encoding, i.e. you can't know how many bytes you have to skip if you want to skip n characters without scanning the string, as one character can take 1 to 4 bytes.
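
To illustrate the skipping problem, here is a small sketch of what "skip n characters" has to do in UTF-8. The function name utf8_skip is made up for this example, and it assumes the input is already valid UTF-8 (no validation is done).

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer to the start of the n-th character (code point) of a
 * NUL-terminated UTF-8 string, or to the terminator if the string is shorter.
 * Assumes valid UTF-8 input. */
static const char *utf8_skip(const char *s, size_t n)
{
    assert(s != NULL);
    while (n > 0 && *s != '\0') {
        s++;                                        /* step over the lead byte */
        while (((unsigned char)*s & 0xC0) == 0x80)  /* then over continuation bytes (10xxxxxx) */
            s++;
        n--;
    }
    return s;
}
```

Every byte up to the target has to be inspected, whereas with a fixed-width encoding the same operation is plain pointer arithmetic.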

UTF-32 is not space-efficient, as every character takes 32 bits of space (wide encoding). However, skipping characters, concatenating and many other string operations are more efficient because of this.

ISO/IEC 9899:1999 (C language standard) more or less assumes that files are stored as multibytes, while in-memory-operations are usually done in wide encoding.
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Re: A simple question

Post by Jeko »

Solar wrote:UTF-8 is space-efficient, at least as long as most of your characters are ASCII, or at least BMP (Basic Multilingual Plane). However, it's a multibyte encoding, i.e. you can't know how many bytes you have to skip if you want to skip n characters without scanning the string, as one character can take 1 to 4 bytes.

UTF-32 is not space-efficient, as every character takes 32 bits of space (wide encoding). However, skipping characters, concatenating and many other string operations are more efficient because of this.

ISO/IEC 9899:1999 (C language standard) more or less assumes that files are stored as multibytes, while in-memory-operations are usually done in wide encoding.
Which encoding is the best, in your opinion? I read that UTF-32 isn't good... But what do you think?
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: A simple question

Post by Solar »

Personally I don't think a newly-written operating system should bother with anything but UTF-8 or UTF-32 natively. Which one you choose is really up to you, but it should be consistent all across the API. (I'd opt for the more comfortable but memory-inefficient UTF-32, but that's because any OS I'd write would be aimed at the desktop / server range. An embedded system certainly would go for UTF-8.)

With regards to the subject "readdir", I'd probably toy around with the kernel's return value. A syscall is kernel-space code, but usually provides a user-space wrapper so you don't have to fiddle with registers and the like but can do a convenient C-syntax function call. I see you want your function to return one file name at a time, which is in line with the way "readdir" as user-space coders know it works.

But who says that you need to call kernel space for every invocation of the syscall, having the memory problem with every invocation? You could, for example, have the kernel return a whole block of information the first time around, which gets stored in user space by the syscall wrapper, and only the first filename from that block actually gets returned to the caller. Subsequent calls to "readdir" are satisfied from the buffer, until that is exhausted and another call to kernel space is made.

Two advantages here: You get fewer context switches, and all memory management remains in the hands of the OS (as the syscall wrapper can return const pointers to its buffer, which is already user-space).
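
A rough sketch of such a buffering wrapper, under stated assumptions: sys_readdir_block, dir_stream and buffered_readdir are hypothetical names, the "kernel" is faked with a static array, and the block format (names packed back to back, each NUL-terminated) is just one possible choice. The block is kept tiny on purpose so that refills actually happen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory contents standing in for the kernel's view. */
static const char *const fake_dir[] = { "boot", "kernel.bin", "readme.txt", "usr" };
#define FAKE_DIR_COUNT (sizeof fake_dir / sizeof fake_dir[0])

/* Hypothetical syscall: packs as many whole NUL-terminated names as fit into
 * buf, starting at entry *index, and advances *index. Returns the bytes
 * written, 0 when the directory is exhausted. Limitation of this sketch: a
 * name longer than the whole buffer is indistinguishable from exhaustion. */
static size_t sys_readdir_block(size_t *index, char *buf, size_t buf_size)
{
    size_t used = 0;
    while (*index < FAKE_DIR_COUNT) {
        size_t len = strlen(fake_dir[*index]) + 1;
        if (len > buf_size - used)
            break;                  /* does not fit; leave it for the next call */
        memcpy(buf + used, fake_dir[*index], len);
        used += len;
        (*index)++;
    }
    return used;
}

/* User-space state kept by the syscall wrapper. */
typedef struct {
    size_t next_entry;  /* position to resume from on the kernel side */
    size_t filled;      /* valid bytes in block */
    size_t pos;         /* read offset inside block */
    char   block[24];   /* deliberately small to force refills */
} dir_stream;

/* Returns the next name, or NULL at the end of the directory. The pointer
 * refers to the wrapper's own buffer and is valid until the next refill. */
static const char *buffered_readdir(dir_stream *d)
{
    assert(d != NULL);
    if (d->pos >= d->filled) {      /* buffer exhausted: one trip into the kernel */
        d->filled = sys_readdir_block(&d->next_entry, d->block, sizeof d->block);
        d->pos = 0;
        if (d->filled == 0)
            return NULL;
    }
    const char *name = d->block + d->pos;
    d->pos += strlen(name) + 1;
    return name;
}
```

With a zero-initialised dir_stream, the four names above come back with only three kernel calls (two refills plus the one that reports the end); a realistically sized block would cut that further.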
AndrewAPrice
Member
Posts: 2299
Joined: Mon Jun 05, 2006 11:00 pm
Location: USA (and Australia)

Re: A simple question

Post by AndrewAPrice »

Jeko wrote:
MessiahAndrw wrote:Look how Microsoft do it. :)

e.g.
ReturnSomeKernelString(char *buffer, uint sizeOfBuffer);
So the filename may be returned only partially: if sizeOfBuffer is smaller than the length of the filename, only part of the filename is put in the buffer.

Hide it behind a nice interface. E.g. in your library have a function that is:

std::string ReturnSomeKernelString();

Internally it is:

Code:

#include <string>

typedef unsigned int uint;

// Scratch buffer shared between all wrapper functions like this one (not thread safe):
#define CHUNKS_TO_DO_AT_ONCE 1024
char buffer[CHUNKS_TO_DO_AT_ONCE];

// The system call: copies up to charsThatFitInBuffer characters of the string,
// starting at offsetInName, and reports whether more characters remain.
// The final chunk is assumed to be null-terminated.
void sysReturnSomeKernelString(char *bufferToStoreChars, uint offsetInName, uint charsThatFitInBuffer, bool &stillMoreCharactersRemaining);

// The library function:
std::string ReturnSomeKernelString()
{
    std::string str;
    bool stillMore = true;
    uint offset = 0;

    while (stillMore)
    {
        sysReturnSomeKernelString(buffer, offset, CHUNKS_TO_DO_AT_ONCE, stillMore);
        offset += CHUNKS_TO_DO_AT_ONCE;
        if (stillMore)
            str.append(buffer, CHUNKS_TO_DO_AT_ONCE); // full chunk, no null terminator, so we specify the size
        else
            str.append(buffer); // last chunk: stops at the null terminator
    }

    return str;
}
Some could say this is inefficient because it does multiple allocations, though. A simpler way would be to have two system calls: GetSizeOfSomeKernelString() first, then GetSomeKernelString(). Make sure you also pass a maximum buffer size to the latter system call, in case the contents grow between the two calls.
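
A sketch of the two-call variant in plain C, assuming hypothetical syscalls sys_get_string_size and sys_get_string. The kernel string is faked with a global pointer, and the convention that the second call returns the size now required when the buffer is too small is an assumption of this example, not something defined above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical kernel string; a real kernel could change it between calls. */
static const char *kernel_string = "short";

/* Hypothetical syscall 1: bytes needed, including the terminating NUL. */
static size_t sys_get_string_size(void)
{
    return strlen(kernel_string) + 1;
}

/* Hypothetical syscall 2: copies the string if it fits in max_size bytes.
 * Returns 0 on success, or the size now required if the buffer is too small. */
static size_t sys_get_string(char *buf, size_t max_size)
{
    size_t needed = strlen(kernel_string) + 1;

    assert(buf != NULL);
    if (needed > max_size)
        return needed;              /* never write past what the caller allocated */
    memcpy(buf, kernel_string, needed);
    return 0;
}

/* Library wrapper: query the size, allocate, fetch, and retry if the string
 * grew in the meantime. Returns a malloc'ed string, or NULL if out of memory. */
static char *get_kernel_string(void)
{
    size_t size = sys_get_string_size();
    char *buf = NULL;

    for (;;) {
        char *grown = realloc(buf, size);
        if (grown == NULL) {
            free(buf);
            return NULL;
        }
        buf = grown;

        size_t needed = sys_get_string(buf, size);
        if (needed == 0)
            return buf;
        size = needed;              /* contents grew between the calls; try again */
    }
}
```

In the common case this is two kernel calls and one allocation; the loop only repeats when the string grows between the size query and the copy.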