Finding (free) PIDs

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
inx
Member
Posts: 142
Joined: Wed Mar 05, 2008 12:52 am

Finding (free) PIDs

Post by inx »

Hello all. Honestly, I'm not sure whether this belongs in this category or in General Programming, and I'm also not sure how terribly n00bish this is, but nonetheless, I need advice.

I've gotten most of my multitasking code written, and it works, loading tasks as flat-binary images through GRUB, then constructing contexts and initial task states for them. My problem is one of efficiency. My task structures are stored as a linked list so I can easily use my kernel malloc()/free(), but while gaining that freedom, I've lost some efficiency in looking tasks up.

Currently, my function for finding a task by PID is something on the order of:
struct task *getTask(unsigned long pid)
{
	struct task *temp;

	for (temp = firstTask; temp != lastTask; temp = temp->next)
		if (temp->pid == pid)
			return temp;

	return NULL;
}

And my function for finding a free PID is something on the order of:

unsigned long getFreePid()
{
	unsigned long pid;
	struct task *temp;

	for (pid = 1; pid < 0xFFFFFFFF; pid++) {
		temp = getTask(pid);
		if (temp == NULL)
			return pid;
	}

	return 0;	/* no free PID found */
}

Mostly, I'm just asking for opinions on whether this is grossly inefficient or whether it won't really make much difference in the long run. If it is grossly inefficient, do you have any suggestions for speeding it up without just doing a Linux-0.11-style fixed-size task array? Should I drop the linked list and use an array of pointers to task structs, or would that not really gain me much other than offloading storage overhead from the task struct itself onto a lookup table?

Thanks in advance.

EDIT: :-S I do have my code indented, it just stripped it during display. Guess it doesn't matter much for such a short code snippet, though.

EDIT2: Moved post to where I originally meant it to be, rather than Design & Theory. Feeling very n00bish today. :(
Wave
Member
Posts: 50
Joined: Sun Jan 20, 2008 5:51 am

Post by Wave »

Finding free PIDs is done easily if you just use the pointer to the process structure as the PID.

Looking up a process from the PID shouldn't be done by blindly following the pointer, though, as that's a security risk. Either look through the list as before, or add a sorted (dynamically malloc'd) array that keeps track of valid PIDs. Only if the PID is found in that array (use a binary search) can you then follow it as if it were a pointer.
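In C, the validation step might look something like this (valid_pids, num_valid and taskFromPid are made-up names for this sketch; the code that keeps the array sorted as tasks are created and destroyed is left out):

Code:

#include <stddef.h>

struct task;	/* the real task struct, defined elsewhere */

/* Sorted, dynamically allocated array of PIDs currently in use. */
static unsigned long *valid_pids;
static size_t num_valid;

/* Binary search: is this PID currently in the valid list? */
static int pid_is_valid(unsigned long pid)
{
	size_t lo = 0, hi = num_valid;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (valid_pids[mid] == pid)
			return 1;
		if (valid_pids[mid] < pid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}

/* Only after validation is the PID followed as a pointer. */
struct task *taskFromPid(unsigned long pid)
{
	if (!pid_is_valid(pid))
		return NULL;
	return (struct task *)pid;
}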
Conway's Law: If you have four groups working on a compiler, you'll get a 4-pass compiler.
Melvin Conway
inx
Member
Posts: 142
Joined: Wed Mar 05, 2008 12:52 am

Post by inx »

Ah, didn't think of that. Thanks for the very useful reply. *off to finish that code*
skyking
Member
Posts: 174
Joined: Sun Jan 06, 2008 8:41 am

Post by skyking »

Well, a linear search is not very efficient when you have a lot of entries. A binary search is considerably more efficient, but why not use a hash table to find the task struct? And if the PIDs are not sparsely distributed, a plain array of task struct pointers is all you need.
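A minimal sketch of the hash-table version, assuming the task struct grows a hash_next field and a small power-of-two bucket array (both invented for this example):

Code:

#include <stddef.h>

#define PID_HASH_SIZE	256			/* power of two, so the hash is just a mask */
#define PID_HASH(pid)	((pid) & (PID_HASH_SIZE - 1))

struct task {
	unsigned long pid;
	struct task *hash_next;			/* chains tasks whose PIDs share a bucket */
	/* ...the rest of the real task struct... */
};

static struct task *pid_hash[PID_HASH_SIZE];

/* Call once when a task is created. */
void hash_task(struct task *t)
{
	unsigned long bucket = PID_HASH(t->pid);

	t->hash_next = pid_hash[bucket];
	pid_hash[bucket] = t;
}

/* Near-constant-time replacement for a list-walking lookup. */
struct task *hash_lookup(unsigned long pid)
{
	struct task *t;

	for (t = pid_hash[PID_HASH(pid)]; t != NULL; t = t->hash_next)
		if (t->pid == pid)
			return t;
	return NULL;
}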

To find a free one, you only have to put recycled PIDs in a list; if there's no recycled PID, you just take the next unused one (new_pid = ++max_pid).
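A rough sketch of that allocation scheme, with a singly linked stack of recycled PIDs plus a high-water mark (all names made up here, and it leans on the kernel's own malloc()/free()):

Code:

#include <stdlib.h>

struct free_pid {
	unsigned long pid;
	struct free_pid *next;
};

static struct free_pid *recycled = NULL;	/* stack of recycled PIDs */
static unsigned long max_pid = 0;		/* highest PID handed out so far */

unsigned long alloc_pid(void)
{
	if (recycled != NULL) {
		struct free_pid *node = recycled;
		unsigned long pid = node->pid;

		recycled = node->next;
		free(node);
		return pid;
	}
	return ++max_pid;			/* never 0, so 0 can mean "no PID" */
}

void release_pid(unsigned long pid)
{
	struct free_pid *node = malloc(sizeof(*node));

	if (node == NULL)
		return;				/* out of memory: leak the PID rather than crash */
	node->pid = pid;
	node->next = recycled;
	recycled = node;
}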
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Post by JamesM »

Hi,

Did you know that the POSIX spec specifies that no PID should be recycled at all (until the PID counter wraps past 32 bits)? If you're following POSIX (not sure whether you are or not), then your recycling code is mostly redundant. And I would recommend a balanced binary search tree for your PID lookups: log2(n) lookup performance guaranteed.
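For what it's worth, the lookup half of such a tree is tiny; all the real work is in the insert/delete code that keeps it balanced (AVL or red-black rotations, not shown). A sketch with invented names:

Code:

#include <stddef.h>

struct task;	/* the real task struct, defined elsewhere */

struct task_node {
	unsigned long pid;
	struct task *task;
	struct task_node *left, *right;	/* kept balanced by the insert/delete code */
};

/* O(log n) as long as the tree stays balanced. */
struct task *tree_lookup(struct task_node *root, unsigned long pid)
{
	while (root != NULL) {
		if (pid == root->pid)
			return root->task;
		root = (pid < root->pid) ? root->left : root->right;
	}
	return NULL;
}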
inx
Member
Posts: 142
Joined: Wed Mar 05, 2008 12:52 am

Post by inx »

So far I've been avoiding any redundancy in data like the plague because of my inherent overzealousness about memory usage. The way I've been rationalizing it is to tell myself that I'm trading memory usage for processor cycles so it'll be easier to jam more into a microcontroller some time in the future, but it's mostly habit. =p I think I will spring for a hash table or something similar, though. I know my malloc needs to be rewritten; it uses a linear-search, first-fit algorithm that's horribly inefficient (when I was testing it by allocating 16^n-sized blocks until I ran out of RAM, it took about 10 minutes for Bochs running on an Athlon 700 to finish on a 16MB VM). But that was just a couple-hour hack to get past that part, and it will be revisited later.

JamesM: I'm still debating that with myself. It's going to be an object-oriented microkernel with connection-based IPC using Unix-permissioned "ports". I am going to try to at least resemble POSIX with the servers I'll be writing on top of it, although a large part of that will be emulated in the libc, and I am going to try to follow POSIX process management in the kernel. Is there any particular reason you would not want to reuse PIDs? How much will it break when porting apps to the system if I do?
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Post by bewing »

From a really basic efficiency standpoint, you should really try to always count loops down to 0, rather than up to N.

And some OSes use PIDs to create unique temp filenames. So the PIDs need to be unique. It's not a big deal.
skyking
Member
Posts: 174
Joined: Sun Jan 06, 2008 8:41 am

Post by skyking »

JamesM wrote:Hi,

Did you know that the POSIX spec specifies that no PID should be recycled at all? (until the pid wraps over 32bits).
Says where? Sounds broken since you have to be recycling them anyway when you run out of ids. Does it even mandate 32-bit pids?
bewing wrote: From a really basic efficiency standpoint, you should really try to always count loops down to 0, rather than up to N.
Except in extreme cases this difference would be negligible. Besides, it's not always true; e.g. on MIPS, given that you have registers to spare, you gain nothing by counting downwards. And if you write your code in C (in which case you probably don't care much about that one-clock-cycle saving anyway), you don't save that clock cycle on a modern compiler either, since the compiler will, as long as the index is not used for anything else, transform the loop into the more efficient form.

The unique-filename problem isn't a big deal either, since you already have to handle the eventuality that filenames exist in the namespace where you're trying to create unique ones, and any other live process creating files in that namespace is certainly not using its PID in the same way to construct a filename (since you don't recycle a PID before it's released).
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Post by JamesM »

skyking wrote: Says where? Sounds broken since you have to be recycling them anyway when you run out of ids. Does it even mandate 32-bit pids?
Strange, I thought it said it on the fork() page, but it seems it doesn't. I definitely saw it somewhere: that a new process should get the next available PID, sequentially. If I find it again I'll post it. No, it doesn't mandate 32-bit PIDs; it uses pid_t.

Yes, technically it is broken (from a "prove this process mathematically" point of view), but in real life how often does a 32-bit PID wrap around? Imagine this scenario: a process finds another process (could be a daemon) and stores its PID. There could be several of these daemons active. That daemon goes down, another process is spawned, and the PID is reused for the new process. Now the original process will be holding a pointer (the PID) to a different process than it thought it had originally, and no IPC-style calls will fail, because that process does actually exist!

This is why PIDs are never reused until wraparound (on UNIX boxen).
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

Add to that the side effect that a POSIX system doesn't have to check for PID availability at all for the first 2^32 processes: it simply takes (last PID) + 1 as the new PID until it wraps around. Now make the PID a 64-bit field, and you have quite some time until you have to come up with a solution for that wrap-around.
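A minimal sketch of that allocator, with uint64_t standing in for a 64-bit pid_t (names invented here):

Code:

#include <stdint.h>

static uint64_t next_pid = 1;	/* 0 stays reserved to mean "no process" */

/* Sequential allocation: no free-PID search at all until wrap-around,
   which a 64-bit counter pushes far into the future. */
uint64_t new_pid(void)
{
	return next_pid++;
}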

Sorry. Too early in the morning. Brain hasn't yet engaged fully, so I don't have an elegant solution for the wrap-around at hand. ;-)
Every good solution is obvious once you've found it.
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Post by Jeko »

Which method is best for keeping track of tasks?
A linked list, a dynamic array, or something else?
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

Insufficient data. Redo from start.

Define "best". (Fastest running? Fastest implemented? Most flexible?)

Define environment. (Desktop? Server? Embedded?)

The answer, of course, is: "It's in the eye of the beholder."
Every good solution is obvious once you've found it.
Wave
Member
Posts: 50
Joined: Sun Jan 20, 2008 5:51 am

Post by Wave »

bewing wrote: From a really basic efficiency standpoint, you should really try to always count loops down to 0, rather than up to N.
Can you explain to me why that is? I always thought counting up would be faster because add is faster than sub, but then the x86 has the loop instruction, which subtracts automatically. It is, however, slower than sub/cmp/jmp anyway.
bewing wrote:And some OSes use PIDs to create unique temp filenames. So the PIDs need to be unique. It's not a big deal.
It will be unique if he uses a pointer unless his malloc function really messes up big time, in which case he's got bigger problems.
Conway's Law: If you have four groups working on a compiler, you'll get a 4-pass compiler.
Melvin Conway
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Post by Jeko »

Solar wrote:Insufficient data. Redo from start.

Define "best". (Fastest running? Fastest implemented? Most flexible?)

Define environment. (Desktop? Server? Embedded?)

The answer, of course, is: "It's in the eye of the beholder."
Fastest running for Desktop
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Post by bewing »

Wave wrote: Can you explain to me why that is? I always thought counting up would be faster because add is faster than sub, but the x86 has the loop instruction that subs automatically. It is however slower than sub cmp jmp anyways.
All CPUs have the same basic set of flags to test on when it comes to arithmetic operations: Carry, Zero, Sign, Overflow. Immediately after an arithmetic opcode you can sometimes test them and get a useful result.
For addition/subtraction, all these flags flip when crossing the boundary between 1, 0 and -1. The Sign flag also flips when the MSB goes from 0x7f to 0x80. These are the only efficient values to test against. If you test against any other value, you need an extra CMP opcode, which calculates an extra subtraction.

Example: you have a loop, storing a value N times, in every 5th dword.
The example uses a one-trip do-while construction. Results for a for or while loop would be similar, but the loop construction is a little more complex.

Code:

;counting up:
	xor ecx, ecx		; set ecx to 0 the fast way
	mov eax, VALUE
	mov esi, startptr
lp1:
	mov [esi], eax		; store VALUE in ptr
	lea esi, [esi + 20]		; an easy way to add 20
	inc ecx
	cmp ecx, N		; HAVE TO SEE IF WE REACHED N!
	jb short lp1

;counting down:
	mov ecx, N
	mov eax, VALUE
	mov esi, lastptr + 20		; set pointer for predecrementing
lp2:
	lea esi, [esi - 20]		; predecrement by 20
	mov [esi], eax		; store VALUE
	dec ecx
	jg short lp2		; freebie test on 0 with no CMP!
The point being that you don't need the CMP opcode, and your loop runs 20% faster because it's 20% shorter. The difference is NOT negligible, contrary to what was said above. The tighter your loop, the bigger the difference. (And yes, I know perfectly well that I didn't have to change the pointer math, but it's just an example to show that the construction is the same.)
skyking wrote: you don't save that clock cycle on a modern compiler
It is not always wise to count on the compiler to save your @$$ from crappy coding. And in any case, programmers often DO use the index inside the loop, and then the compiler won't touch the code. But you can still code the loop to count up or down.
It is also not wise to look at "how fast some current CPU performs this opcode vs. that one" -- all opcodes are tending toward the limit of 1 opcode per calculation cycle. It is wisest to code for that.