A simple question
I'm writing syscalls for my operating system, and right now I'm working on the ones that handle the virtual file system. I have a small problem:
File names can be 256 bytes long, but also 12, 14, or 15 bytes; that is, any length.
The problem occurs when a user process calls the readdir syscall.
This syscall opens a directory and reads a specific entry from it. The problem is how to return the name of the file to the user process.
If the function returns a string allocated in kernel memory, the user process can't access it. If the user process allocates some memory and passes its address to readdir, readdir can only return a filename up to a fixed length, because the user process can't know how long the filename is when it allocates the memory.
How can I solve this problem? I've seen that other operating systems impose a maximum of, for example, 256 characters, but I want filenames to be of any length.
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling
http://sourceforge.net/projects/jeko - Jeko Operating System
Re: A simple question
Jeko wrote: In fact, if the function returns a string allocated in the kernel memory, the user process can't access to the string.
Map the result into user address space as read-only?
Make the user provide the memory, return zero if successful, and the amount of memory required if what the user provided isn't long enough?
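The second suggestion can be sketched roughly as follows. This is only an illustration: sys_readdir_name, fake_dir and read_entry_name are hypothetical names, and the "kernel" side is simulated with an ordinary function so the example can run in user space.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the directory contents held by the kernel. */
static const char *fake_dir[] = { "a.txt", "a_rather_long_file_name.tar.gz" };
enum { FAKE_DIR_COUNT = 2 };

/* Hypothetical syscall: copies the name of entry `index` (including the
 * terminating NUL) into `buf` if it fits. Returns 0 on success, or the
 * number of bytes required if `buf` is NULL or `size` is too small.
 * The caller must pass a valid index. */
size_t sys_readdir_name(unsigned index, char *buf, size_t size)
{
    size_t needed = strlen(fake_dir[index]) + 1;
    if (buf == NULL || size < needed)
        return needed;
    memcpy(buf, fake_dir[index], needed);
    return 0;
}

/* User-space helper: try a small buffer first and grow it once if the
 * syscall reports that more room is required. The caller frees the result. */
char *read_entry_name(unsigned index)
{
    if (index >= FAKE_DIR_COUNT)
        return NULL;

    size_t size = 16;
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    size_t needed = sys_readdir_name(index, buf, size);
    if (needed != 0) {
        char *bigger = realloc(buf, needed);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        if (sys_readdir_name(index, buf, needed) != 0) {
            free(buf);
            return NULL;
        }
    }
    return buf;
}
```

Short names cost one call; only names longer than the initial guess need a second one.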
Every good solution is obvious once you've found it.
Re: A simple question
Solar wrote: Map the result into user address space as read-only?
With this method I must allocate memory for each readdir. Isn't this a waste of time? But I think it's the only valid method...
However, if I allocate, for example, 8 bytes from the user heap, do I use an entire page for a filename of 8 bytes?
Solar wrote: Make the user provide the memory, return zero if successful, and the amount of memory required if what the user provided isn't long enough?
With this method the user must allocate memory, call the function, allocate more memory, and call the function a second time. The other method is better.
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling
http://sourceforge.net/projects/jeko - Jeko Operating System
- AndrewAPrice
- Member
- Posts: 2299
- Joined: Mon Jun 05, 2006 11:00 pm
- Location: USA (and Australia)
Re: A simple question
Look how Microsoft do it.
e.g.
ReturnSomeKernelString(char *buffer, uint sizeOfBuffer);
My OS is Perception.
Re: A simple question
Jeko wrote: With this method the user must allocate memory, call the function, allocate more memory, call another time the function.
If he did this the first time around, he'd be able to catch 99% of all filename cases:
Code: Select all
#include <stdio.h>   /* FILENAME_MAX */
#include <stdlib.h>  /* malloc */
...
char * buffer = malloc( FILENAME_MAX );
...
Every good solution is obvious once you've found it.
Re: A simple question
Solar wrote: Oh, while we're at it - are your filenames in ASCII (if yes, which codepage?), or Unicode (if yes, which encoding?)?
They are ASCII, but I will support Unicode.
(What are codepages, and what are encodings?)
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling
http://sourceforge.net/projects/jeko - Jeko Operating System
Re: A simple question
MessiahAndrw wrote: Look how Microsoft do it.
e.g.
ReturnSomeKernelString(char *buffer, uint sizeOfBuffer);
So a filename is returned partially. If sizeOfBuffer is smaller than the length of the filename, only a part of the filename will be put in the buffer.
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling
http://sourceforge.net/projects/jeko - Jeko Operating System
Re: A simple question
Jeko wrote: (What are codepages, and what are encodings?)
"Codepage" is DOS vocabulary; more generally speaking, it's about character encodings. ISO 646, ISO 8859, EBCDIC, Windows-125*, KOI-8, ISO 2022...
With "encoding" I meant UTF-8, UTF-16, UTF-32...
I know you usually don't want to bother with these things in the beginning, you just "want it to work". But they are a royal PITA when you attempt to retrofit this kind of international support, because it affects virtually everything down to the simplest of functions.
Every good solution is obvious once you've found it.
Re: A simple question
Solar wrote: With "encoding" I meant UTF-8, UTF-16, UTF-32...
I think I'll use Unicode, either UTF-8 or UTF-16. I must study the differences between them, but I read that UTF-32 isn't good because it wastes space.
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling
http://sourceforge.net/projects/jeko - Jeko Operating System
Re: A simple question
UTF-8 is space-efficient, at least as long as most of your characters are ASCII, or at least BMP (Basic Multilingual Plane). However, it's a multibyte encoding - i.e. you can't know how many bytes you have to skip if you want to skip n characters without scanning the string, as one character can take 1 to 4 bytes.
UTF-32 is not space-efficient, as every character takes 32 bits of space (wide encoding). However, skipping characters, concatenating and many other string operations are more efficient because of this.
ISO/IEC 9899:1999 (C language standard) more or less assumes that files are stored as multibytes, while in-memory-operations are usually done in wide encoding.
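To illustrate the "skip n characters" point for UTF-8, here is a minimal sketch. utf8_skip is a hypothetical helper, it counts Unicode code points rather than user-perceived characters, and it assumes the input is valid, NUL-terminated UTF-8. With UTF-32 the same operation would be plain pointer arithmetic.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the n-th code point (0-based)
 * of the NUL-terminated UTF-8 string `s`, or to the terminating NUL if the
 * string holds fewer than n code points. Assumes valid UTF-8. */
const char *utf8_skip(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;

    while (n > 0 && *p != '\0') {
        p++;                            /* step over the lead byte        */
        while ((*p & 0xC0) == 0x80)     /* then over continuation bytes,  */
            p++;                        /* which all look like 10xxxxxx   */
        n--;
    }
    return (const char *)p;
}
```

The loop has to look at every byte it passes, which is the cost being described: the byte offset of the n-th character is not known up front.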
Every good solution is obvious once you've found it.
Re: A simple question
Solar wrote: UTF-32 is not space-efficient, as every character takes 32 bits of space (wide encoding). However, skipping characters, concatenating and many other string operations are more efficient because of this.
Which is, in your opinion, the best encoding? I read that UTF-32 isn't good... But what do you think?
Rewriting virtual memory manager - Working on ELF support - Working on Device Drivers Handling
http://sourceforge.net/projects/jeko - Jeko Operating System
Re: A simple question
Personally I don't think a newly-written operating system should bother with anything but UTF-8 or UTF-32 natively. Which one you choose is really up to you, but it should be consistent all across the API. (I'd opt for the more comfortable but memory-inefficient UTF-32, but that's because any OS I'd write would be aimed at the desktop / server range. An embedded system certainly would go for UTF-8.)
With regards to the subject "readdir", I'd probably toy around with the kernel's return value. A syscall is kernel-space code, but usually provides a user-space wrapper so you don't have to fiddle with registers and the like but can do a convenient C-syntax function call. I see you want your function to return one file name at a time, which is in line with the way "readdir" as user-space coders know it works.
But who says that you need to call kernel space for every invocation of the syscall, having the memory problem with every invocation? You could, for example, have the kernel return a whole block of information the first time around, which gets stored in user-space by the syscall wrapper, and only the first filename from that block actually gets returned to the caller. Subsequent calls to "readdir" are satisfied from the buffer, until that is exhausted and another call to kernel space is made.
Two advantages here: You get fewer context switches, and all memory management remains in the hands of the OS (as the syscall wrapper can return const pointers to its buffer, which is already user-space).
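A rough sketch of such a buffering wrapper, under stated assumptions: sys_read_names, struct dir_stream and my_readdir are hypothetical names, and the kernel side is simulated by a plain function that packs NUL-terminated names into the caller's block.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the directory contents held by the kernel. */
static const char *fake_dir[] = { "boot", "kernel.bin", "readme.txt" };
enum { FAKE_DIR_COUNT = 3 };

/* Hypothetical syscall: packs as many NUL-terminated names as fit into
 * `buf`, starting at entry *cursor, and advances *cursor past them.
 * Returns the number of bytes written, 0 once the directory is exhausted. */
static size_t sys_read_names(unsigned *cursor, char *buf, size_t size)
{
    size_t used = 0;
    while (*cursor < FAKE_DIR_COUNT) {
        size_t len = strlen(fake_dir[*cursor]) + 1;
        if (len > size - used)
            break;
        memcpy(buf + used, fake_dir[*cursor], len);
        used += len;
        (*cursor)++;
    }
    return used;
}

/* User-space directory stream owned by the syscall wrapper. */
struct dir_stream {
    unsigned cursor;     /* next entry the kernel should deliver        */
    size_t   filled;     /* bytes of `block` currently holding names    */
    size_t   pos;        /* offset of the next name to hand out         */
    char     block[16];  /* deliberately tiny so that refills happen    */
};

/* Returns the next name, or NULL at the end of the directory. The pointer
 * refers to the wrapper's own buffer and is only valid until the next
 * refill. A name longer than `block` would look like end-of-directory
 * here; a real implementation has to handle that case separately. */
const char *my_readdir(struct dir_stream *d)
{
    if (d->pos == d->filled) {
        d->filled = sys_read_names(&d->cursor, d->block, sizeof d->block);
        d->pos = 0;
        if (d->filled == 0)
            return NULL;
    }
    const char *name = d->block + d->pos;
    d->pos += strlen(name) + 1;
    return name;
}
```

Starting from a zero-initialized struct dir_stream, repeated my_readdir calls walk the directory, and the simulated kernel is entered only when the block runs dry.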
Every good solution is obvious once you've found it.
- AndrewAPrice
Re: A simple question
Jeko wrote: So a filename is returned partially. If sizeOfBuffer < filename, only a part of the filename will be put in the buffer.
Hide it behind a nice interface. E.g. in your library have a function that is:
std::string ReturnSomeKernelString();
Internally it is:
Code: Select all
#include <string>
typedef unsigned int uint;
// somewhere that can be shared between all common functions like this (not thread safe though):
#define CHUNKS_TO_DO_AT_ONCE 1024
char buffer[CHUNKS_TO_DO_AT_ONCE + 1]; // one spare byte so the last chunk is always null-terminated
// the system call (assumed to copy up to charsThatFitInBuffer chars of the name, starting at
// offsetInName, and to null-terminate the last chunk when there is room):
void sysReturnSomeKernelString(char *bufferToStoreChars, uint offsetInName, uint charsThatFitInBuffer, bool &stillMoreCharactersRemaining);
// the func:
std::string ReturnSomeKernelString()
{
    std::string str;
    bool stillMore = true;
    uint offset = 0;
    buffer[CHUNKS_TO_DO_AT_ONCE] = '\0'; // guards the case where the last chunk fills the buffer exactly
    while (stillMore)
    {
        sysReturnSomeKernelString(buffer, offset, CHUNKS_TO_DO_AT_ONCE, stillMore);
        offset += CHUNKS_TO_DO_AT_ONCE;
        if (stillMore)
            str += std::string(buffer, CHUNKS_TO_DO_AT_ONCE); // no null-terminator so we specify size
        else
            str += std::string(buffer); // will stop at null-terminator
    }
    return str;
}
My OS is Perception.