Finding (free) PIDs
Hello all. Honestly, I'm not sure whether this belongs in this category or in General Programming, and I'm also not sure how terribly n00bish this is, but nonetheless, I need advice.
I've gotten most of my multitasking code written, and it works, loading tasks as flat-binary images through GRUB, then constructing contexts and initial task states for them. My problem is one of efficiency. My task structures are stored as a linked list so I can easily use my kernel malloc()/free(), but while gaining that freedom, I've lost some efficiency in looking tasks up.
Currently, my function for finding a task by PID is something on the order of:
struct task *getTask(unsigned long pid)
{
    struct task *temp = firstTask;

    while (temp != lastTask) {
        if (temp->pid == pid)
            return temp;
        temp = temp->next;    /* advance, or the loop never terminates */
    }
    return NULL;
}
And my function for finding a free PID is something on the order of:
unsigned long getFreePid()
{
    unsigned long pid;

    for (pid = 1; pid < 0xFFFFFFFF; pid++) {
        if (getTask(pid) == NULL)
            return pid;
    }
    return 0;    /* no free PID; 0 is reserved as invalid */
}
Mostly, I'm just asking for opinions: is this grossly inefficient, or will it not really make much difference in the long run? And if it is grossly inefficient, do you have any suggestions for speeding it up without just doing a Linux-0.11-style fixed-size task array? Should I just drop the linked list and use an array of pointers to task structs, or would that not really gain me much other than offloading storage overhead from the task struct itself to a lookup table?
Thanks in advance.
EDIT: :-S I do have my code indented, it just stripped it during display. Guess it doesn't matter much for such a short code snippet, though.
EDIT2: Moved post to where I originally meant it to be, rather than Design & Theory. Feeling very n00bish today. :(
Finding free PIDs is done easily if you just use the pointer to the process structure as the PID.
Looking up a process from the PID shouldn't be done straight from the pointer, though, as that's a security risk. Either walk the list as before, or add a sorted, dynamically malloc'ed array that keeps track of valid PIDs. Only if the PID is in that array (use a binary search) can you safely follow it as if it were a pointer.
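A minimal sketch of that check in C; the names here (validPids, nValid, MAX_TASKS) are made up, and it assumes the array is kept sorted as tasks are created and destroyed:
Code:
#define MAX_TASKS 1024                      /* illustrative capacity */

struct task;                                /* defined elsewhere in the kernel */

static unsigned long validPids[MAX_TASKS];  /* kept sorted on insert/remove */
static int nValid;

/* Binary search: nonzero iff pid names a live task. */
static int pidIsValid(unsigned long pid)
{
    int lo = 0, hi = nValid - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (validPids[mid] == pid)
            return 1;
        if (validPids[mid] < pid)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return 0;
}

struct task *getTask(unsigned long pid)
{
    if (!pidIsValid(pid))
        return NULL;                /* reject forged or stale PIDs */
    return (struct task *)pid;      /* the PID is the pointer itself */
}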
Conway's Law: If you have four groups working on a compiler, you'll get a 4-pass compiler.
Melvin Conway
Well, a linear search is not very efficient when you have a lot of entries. A binary search is quite a bit more efficient, but why not use a hashtable to find the task struct? If the PIDs are not sparsely distributed, you only need an array of task struct pointers.
To find a free one, you only have to put recycled PIDs in a list; if there's no recycled PID, you just take the next unused PID (new_pid = ++max_pid).
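Something like this, as a sketch (the names recycledPids and maxPid are made up, and the recycled list is just a small stack here):
Code:
#define RECYCLE_CAP 1024                    /* illustrative stack capacity */

static unsigned long recycledPids[RECYCLE_CAP];
static int nRecycled;                       /* PIDs handed back by exited tasks */
static unsigned long maxPid;                /* highest PID handed out so far */

unsigned long getFreePid(void)
{
    if (nRecycled > 0)
        return recycledPids[--nRecycled];   /* reuse a recycled PID first */
    return ++maxPid;                        /* otherwise take the next unused one */
}

void releasePid(unsigned long pid)
{
    if (nRecycled < RECYCLE_CAP)
        recycledPids[nRecycled++] = pid;
    /* if the stack is full, the PID simply stays retired */
}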
Hi,
Did you know that the POSIX spec specifies that no PID should be recycled at all (until the PID wraps past 32 bits)? If you're following POSIX (not sure whether you are or not), then your recycling code is mostly redundant. And I would recommend a balanced binary search tree for your PID lookups: log2(n) performance, guaranteed.
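The lookup half would look something like this in C; the rebalancing (AVL or red-black rotations), which is what actually guarantees the log2(n) bound, is omitted, so treat this as a sketch with invented names:
Code:
struct task;                            /* your existing task struct */

/* One tree node per task, keyed on pid. */
struct tnode {
    unsigned long pid;
    struct task *task;
    struct tnode *left, *right;         /* left: smaller PIDs; right: larger */
};

struct task *treeLookup(struct tnode *root, unsigned long pid)
{
    while (root != NULL) {
        if (pid == root->pid)
            return root->task;          /* found the task for this PID */
        root = (pid < root->pid) ? root->left : root->right;
    }
    return NULL;                        /* no such PID */
}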
So far I've been avoiding any redundancy in data like the plague, because of my inherent overzealousness about memory usage. The way I've been rationalizing it is to tell myself that I'm trading memory usage for processor cycles so it'll be easier to jam more into a microcontroller some time in the future, but it's mostly habit. =p I think I will spring for a hash table or something similar, though. I know my malloc needs to be rewritten; it's using a linear-search, first-fit algorithm that's horribly inefficient (when I was testing the malloc code by allocating 16^n-sized blocks until I ran out of RAM, it took about 10 minutes for Bochs running on an Athlon 700 to finish the task on a 16MB-RAM VM). But that was just a couple-hour hack to get past that part, and it will be revisited later.
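What I have in mind for the hash table is something like this (names made up, with a chain link added to the task struct and a power-of-two bucket count):
Code:
#define HASH_BUCKETS 64                 /* power of two, so masking works as modulo */

struct task {
    unsigned long pid;
    struct task *hashNext;              /* chain link within one bucket */
    /* ...the rest of the task state... */
};

static struct task *buckets[HASH_BUCKETS];

static unsigned hashPid(unsigned long pid)
{
    return (unsigned)(pid & (HASH_BUCKETS - 1));
}

struct task *getTask(unsigned long pid)
{
    struct task *t;
    for (t = buckets[hashPid(pid)]; t != NULL; t = t->hashNext)
        if (t->pid == pid)
            return t;                   /* average O(1) with a decent spread */
    return NULL;
}

void hashInsert(struct task *t)
{
    unsigned h = hashPid(t->pid);
    t->hashNext = buckets[h];           /* push onto the front of the chain */
    buckets[h] = t;
}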
JamesM: I'm still debating that with myself. It's going to be an object-oriented microkernel with connection-based IPC using Unix-permissioned "ports". I am going to try to at least resemble POSIX with the servers I'll be writing on top of it, although a large part of that will be emulated in the libc, and I am going to try to follow POSIX process management in the kernel. Is there any particular reason you would not want to reuse PIDs? How much will it break in porting apps to the system if I do?
JamesM wrote: Hi,
Did you know that the POSIX spec specifies that no PID should be recycled at all? (until the pid wraps over 32bits).
Says where? Sounds broken, since you have to recycle them anyway when you run out of IDs. Does it even mandate 32-bit PIDs?
bewing wrote: From a really basic efficiency standpoint, you should really try to always count loops down to 0, rather than up to N.
Except in extreme cases, this difference is negligible. Besides, it's not always true; e.g. on MIPS, given that you have registers to spare, you gain nothing by counting downwards. And if you write your code in C (in which case you probably don't care much about a one-cycle saving anyway), you don't save that clock cycle over a modern compiler either: given that the index is not used otherwise, the compiler will transform the loop into the more efficient form itself.
The unique-filenames problem is not a big one either, since you already have to handle the possibility that the target namespace contains existing filenames, and any other process creating files in that namespace is certainly not constructing its names from its PID in the same way (since you don't recycle a PID before it's released).
Says where? Sounds broken since you have to be recycling them anyway when you run out of ids. Does it even mandate 32-bit pids?
Strange, I thought it said so on the fork() page, but it seems it doesn't. I definitely saw it somewhere: that a new process should get the next available PID, sequentially. If I find it again, I'll post it. No, it doesn't mandate 32-bit PIDs; it uses pid_t.
Yes, technically it is broken (from a "prove this process mathematically" point of view), but in real life, how often does a 32-bit PID wrap around? Imagine this scenario: a process finds another process (could be a daemon) and stores its PID. There could be several of these daemons active. That daemon goes down and another process is spawned, but the PID is reused for the new process. Now the original process is holding a pointer (the PID) to a different process than it originally thought, and no IPC-style calls will fail, because that process does actually exist!
This is why pids are never reused until wraparound (in UNIX boxen)
Add to that the side effect that a POSIX system doesn't have to check for PID availability for the first 2^32 processes at all: it simply takes (last PID) + 1 as the new PID until it wraps around. Now make the PID a 64-bit field, and you have quite some time until you have to come up with a solution for that wrap-around.
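A rough sketch of that scheme in C (the counter name is invented, and the back-of-the-envelope numbers are mine): with a 64-bit counter, even a million fork()s per second would take roughly 584,000 years to exhaust the space, so sequential assignment defers the wrap-around problem almost indefinitely.
Code:
#include <stdint.h>

static uint64_t nextPid = 1;    /* PID 0 reserved to mean "no process" */

uint64_t allocPid(void)
{
    /* 2^64 PIDs at 10^6 allocations per second lasts ~584,000 years,
       so no availability check is needed until wrap-around. */
    return nextPid++;
}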
Sorry. Too early in the morning. Brain hasn't yet engaged fully, so I don't have an elegant solution for the wrap-around at hand.
Every good solution is obvious once you've found it.
bewing wrote: From a really basic efficiency standpoint, you should really try to always count loops down to 0, rather than up to N.
Can you explain to me why that is? I always thought counting up would be faster because add is faster than sub, but the x86 has the loop instruction, which subtracts automatically. It's slower than a sub/cmp/jmp sequence anyway, though.
bewing wrote: And some OSes use PIDs to create unique temp filenames. So the PIDs need to be unique. It's not a big deal.
It will be unique if he uses a pointer, unless his malloc function really messes up big time, in which case he's got bigger problems.
Conway's Law: If you have four groups working on a compiler, you'll get a 4-pass compiler.
Melvin Conway
Wave wrote: Can you explain to me why that is? I always thought counting up would be faster because add is faster than sub, but the x86 has the loop instruction that subs automatically.
All CPUs have the same basic set of flags to test on when it comes to arithmetic operations: Carry, Zero, Sign, Overflow. Immediately after an arithmetic opcode, you can sometimes test them directly and get a useful result.
For addition/subtraction, all these flags flip when the result crosses the boundary between 1, 0, and -1. The Sign flag also flips when the most significant byte crosses from 0x7f to 0x80. These are the only values that are efficient to test against. If you test against any other value, you need an extra CMP opcode, which performs an extra subtraction.
Example: you have a loop that stores a value N times, in every 5th dword.
The example uses a one-trip do-while construction. Results for a for or while loop would be similar, but the loop setup is a little more complex.
Code:
;counting up:
        xor ecx, ecx            ; set ecx to 0 the fast way
        mov eax, VALUE
        mov esi, startptr
lp1:
        mov [esi], eax          ; store VALUE in ptr
        lea esi, [esi + 20]     ; an easy way to add 20
        inc ecx
        cmp ecx, N              ; HAVE TO SEE IF WE REACHED N!
        jb short lp1

;counting down:
        mov ecx, N
        mov eax, VALUE
        mov esi, lastptr + 20   ; set pointer for predecrementing
lp2:                            ; second label, so both loops can coexist
        lea esi, [esi - 20]     ; predecrement by 20
        mov [esi], eax          ; store VALUE
        dec ecx
        jg short lp2            ; freebie test on 0 with no CMP!
skyking wrote: you don't save that clock cycle on a modern compiler
It is not always wise to count on the compiler to save your @$$ from crappy coding. And in any case, programmers often DO use the index inside the loop, and then the compiler won't touch the code. But you can still code the loop to count up or down.
It is also not wise to look at "how fast some current CPU performs this opcode vs. that one" -- all opcodes are tending toward the limit of 1 opcode per calculation cycle. It is wisest to code for that.