Finding (free) PIDs

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
inx
Member
Posts: 142
Joined: Wed Mar 05, 2008 12:52 am

Finding (free) PIDs

Post by inx »

Hello all. Honestly, I'm not sure whether this belongs in this category or in General Programming, and I'm also not sure how terribly n00bish this is, but nonetheless, I need advice.

I've gotten most of my multitasking code written, and it works, loading tasks as flat-binary images through GRUB, then constructing contexts and initial task states for them. My problem is one of efficiency. My task structures are stored as a linked list so I can easily use my kernel malloc()/free(), but while gaining that freedom, I've lost some efficiency in looking tasks up.

Currently, my function for finding a task by PID is something on the order of:
struct task *getTask(unsigned long pid)
{
	struct task *temp;

	for (temp = firstTask; temp != lastTask; temp = temp->next)
		if (temp->pid == pid)
			return temp;

	return NULL;
}

And my function for finding a free PID is something on the order of:

unsigned long getFreePid()
{
	unsigned long pid;
	struct task *temp;

	for (pid = 1; pid < 0xFFFFFFFF; pid++) {
		temp = getTask(pid);
		if (temp == NULL)
			return pid;
	}

	return 0;	/* no free PID found */
}

Mostly, I'm just asking for opinions on whether this is grossly inefficient or whether it won't really make much difference in the long run. If it is grossly inefficient, do you have any suggestions for speeding it up without just doing a Linux-0.11-style fixed-size task array? Should I drop the linked list and use an array of pointers to task structs, or would that not really gain me much other than offloading storage overhead from the task struct itself onto a lookup table?

Thanks in advance.

EDIT: :-S I do have my code indented, it just stripped it during display. Guess it doesn't matter much for such a short code snippet, though.

EDIT2: Moved post to where I originally meant it to be, rather than Design & Theory. Feeling very n00bish today. :(
Wave
Member
Posts: 50
Joined: Sun Jan 20, 2008 5:51 am

Post by Wave »

Finding free PIDs is done easily if you just use the pointer to the process structure as the PID.

Looking up a process from the PID shouldn't be done by blindly following the pointer, though, as that's a security risk. Either look through the list as before, or add a sorted (dynamically malloc'd) array that keeps track of valid PIDs. Only if the PID is found in that array (use a binary search) can you then follow it as if it were a pointer.
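In C, the validation step might look something like this (valid_pids, num_valid and taskFromPid are made-up names for this sketch; the code that keeps the array sorted as tasks are created and destroyed is left out):

Code:

#include <stddef.h>

struct task;	/* the real task struct, defined elsewhere */

/* Sorted, dynamically allocated array of PIDs currently in use. */
static unsigned long *valid_pids;
static size_t num_valid;

/* Binary search: is this PID currently in the valid list? */
static int pid_is_valid(unsigned long pid)
{
	size_t lo = 0, hi = num_valid;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (valid_pids[mid] == pid)
			return 1;
		if (valid_pids[mid] < pid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}

/* Only after validation is the PID followed as a pointer. */
struct task *taskFromPid(unsigned long pid)
{
	if (!pid_is_valid(pid))
		return NULL;
	return (struct task *)pid;
}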
Conway's Law: If you have four groups working on a compiler, you'll get a 4-pass compiler.
Melvin Conway
inx
Member
Posts: 142
Joined: Wed Mar 05, 2008 12:52 am

Post by inx »

Ah, didn't think of that. Thanks for the very useful reply. *off to finish that code*
skyking
Member
Posts: 174
Joined: Sun Jan 06, 2008 8:41 am

Post by skyking »

Well, a linear search is not very efficient when you have a lot of entries. A binary search is considerably more efficient, but why not use a hash table to find the task struct? And if the PIDs are not sparsely distributed, a plain array of task struct pointers is all you need.
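A minimal sketch of the hash-table version, assuming the task struct grows a hash_next field and a small power-of-two bucket array (both invented for this example):

Code:

#include <stddef.h>

#define PID_HASH_SIZE	256			/* power of two, so the hash is just a mask */
#define PID_HASH(pid)	((pid) & (PID_HASH_SIZE - 1))

struct task {
	unsigned long pid;
	struct task *hash_next;			/* chains tasks whose PIDs share a bucket */
	/* ...the rest of the real task struct... */
};

static struct task *pid_hash[PID_HASH_SIZE];

/* Call once when a task is created. */
void hash_task(struct task *t)
{
	unsigned long bucket = PID_HASH(t->pid);

	t->hash_next = pid_hash[bucket];
	pid_hash[bucket] = t;
}

/* Near-constant-time replacement for a list-walking lookup. */
struct task *hash_lookup(unsigned long pid)
{
	struct task *t;

	for (t = pid_hash[PID_HASH(pid)]; t != NULL; t = t->hash_next)
		if (t->pid == pid)
			return t;
	return NULL;
}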

To find a free one, you only have to put recycled PIDs in a list; if there's no recycled PID, you just take the next unused one (new_pid = ++max_pid).
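A rough sketch of that allocation scheme, with a singly linked stack of recycled PIDs plus a high-water mark (all names made up here, and it leans on the kernel's own malloc()/free()):

Code:

#include <stdlib.h>

struct free_pid {
	unsigned long pid;
	struct free_pid *next;
};

static struct free_pid *recycled = NULL;	/* stack of recycled PIDs */
static unsigned long max_pid = 0;		/* highest PID handed out so far */

unsigned long alloc_pid(void)
{
	if (recycled != NULL) {
		struct free_pid *node = recycled;
		unsigned long pid = node->pid;

		recycled = node->next;
		free(node);
		return pid;
	}
	return ++max_pid;			/* never 0, so 0 can mean "no PID" */
}

void release_pid(unsigned long pid)
{
	struct free_pid *node = malloc(sizeof(*node));

	if (node == NULL)
		return;				/* out of memory: leak the PID rather than crash */
	node->pid = pid;
	node->next = recycled;
	recycled = node;
}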
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Post by JamesM »

Hi,

Did you know that the POSIX spec specifies that no PID should be recycled at all (until the PID counter wraps past 32 bits)? If you're following POSIX (not sure whether you are or not), then your recycling code is mostly redundant. And I would recommend a balanced binary search tree for your PID lookups: log2(n) lookup performance guaranteed.
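For what it's worth, the lookup half of such a tree is tiny; all the real work is in the insert/delete code that keeps it balanced (AVL or red-black rotations, not shown). A sketch with invented names:

Code:

#include <stddef.h>

struct task;	/* the real task struct, defined elsewhere */

struct task_node {
	unsigned long pid;
	struct task *task;
	struct task_node *left, *right;	/* kept balanced by the insert/delete code */
};

/* O(log n) as long as the tree stays balanced. */
struct task *tree_lookup(struct task_node *root, unsigned long pid)
{
	while (root != NULL) {
		if (pid == root->pid)
			return root->task;
		root = (pid < root->pid) ? root->left : root->right;
	}
	return NULL;
}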
inx
Member
Posts: 142
Joined: Wed Mar 05, 2008 12:52 am

Post by inx »

So far I've been avoiding any redundancy in data like the plague because of my inherent overzealousness about memory usage. The way I've been rationalizing it is to tell myself that I'm trading memory usage for processor cycles so it'll be easier to jam more into a microcontroller some time in the future, but it's mostly habit. =p I think I will spring for a hash table or something similar, though. I know my malloc needs to be rewritten; it uses a linear-search, first-fit algorithm that's horribly inefficient (when I was testing it by allocating 16^n-sized blocks until I ran out of RAM, it took about 10 minutes for Bochs running on an Athlon 700 to finish on a 16MB VM). But that was just a couple-hour hack to get past that part, and it will be revisited later.

JamesM: I'm still debating that with myself. It's going to be an object-oriented microkernel with connection-based IPC using Unix-permissioned "ports". I am going to try to at least resemble POSIX with the servers I'll be writing on top of it, although a large part of that will be emulated in the libc, and I am going to try to follow POSIX process management in the kernel. Is there any particular reason you would not want to reuse PIDs? How much will it break when porting apps to the system if I do?
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Post by bewing »

From a really basic efficiency standpoint, you should really try to always count loops down to 0, rather than up to N.

And some OSes use PIDs to create unique temp filenames. So the PIDs need to be unique. It's not a big deal.
skyking
Member
Posts: 174
Joined: Sun Jan 06, 2008 8:41 am

Post by skyking »

JamesM wrote:Hi,

Did you know that the POSIX spec specifies that no PID should be recycled at all? (until the pid wraps over 32bits).
Says where? Sounds broken since you have to be recycling them anyway when you run out of ids. Does it even mandate 32-bit pids?
bewing wrote: From a really basic efficiency standpoint, you should really try to always count loops down to 0, rather than up to N.
Except in extreme cases this difference would be negligible. Besides, it's not always true; e.g. on MIPS, given that you have registers to spare, you gain nothing by counting downwards. And if you write your code in C (in which case you probably don't care much about that one-clock-cycle saving anyway), you don't save that clock cycle on a modern compiler either, since the compiler will, as long as the index is not used for anything else, transform the loop into the more efficient form.

The unique-filename problem isn't a big deal either, since you already have to handle the eventuality that filenames exist in the namespace where you're trying to create unique ones, and any other live process creating files in that namespace is certainly not using its PID in the same way to construct a filename (since you don't recycle a PID before it's released).
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Post by JamesM »

skyking wrote: Says where? Sounds broken since you have to be recycling them anyway when you run out of ids. Does it even mandate 32-bit pids?
Strange, I thought it said it on the fork() page, but it seems it doesn't. I definitely saw it somewhere: that a new process should get the next available PID, sequentially. If I find it again I'll post it. No, it doesn't mandate 32-bit PIDs; it uses pid_t.

Yes, technically it is broken (from a "prove this process mathematically" point of view), but in real life how often does a 32-bit PID wrap around? Imagine this scenario: a process finds another process (could be a daemon) and stores its PID. There could be several of these daemons active. That daemon goes down, another process is spawned, and the PID is reused for the new process. Now the original process will be holding a pointer (the PID) to a different process than it thought it had originally, and no IPC-style calls will fail, because that process does actually exist!

This is why PIDs are never reused until wraparound (on UNIX boxen).
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

Add to that the side effect that a POSIX system doesn't have to check for PID availability at all for the first 2^32 processes: it simply takes (last PID) + 1 as the new PID until it wraps around. Now make the PID a 64-bit field, and you have quite some time until you have to come up with a solution for that wrap-around.
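A minimal sketch of that allocator, with uint64_t standing in for a 64-bit pid_t (names invented here):

Code:

#include <stdint.h>

static uint64_t next_pid = 1;	/* 0 stays reserved to mean "no process" */

/* Sequential allocation: no free-PID search at all until wrap-around,
   which a 64-bit counter pushes far into the future. */
uint64_t new_pid(void)
{
	return next_pid++;
}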

Sorry. Too early in the morning. Brain hasn't yet engaged fully, so I don't have an elegant solution for the wrap-around at hand. ;-)
Every good solution is obvious once you've found it.
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Post by Jeko »

Which method is best for keeping track of tasks?
A linked list, a dynamic array, or something else?
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

Insufficient data. Redo from start.

Define "best". (Fastest running? Fastest implemented? Most flexible?)

Define environment. (Desktop? Server? Embedded?)

The answer, of course, is: "It's in the eye of the beholder."
Every good solution is obvious once you've found it.
Wave
Member
Posts: 50
Joined: Sun Jan 20, 2008 5:51 am

Post by Wave »

bewing wrote: From a really basic efficiency standpoint, you should really try to always count loops down to 0, rather than up to N.
Can you explain to me why that is? I always thought counting up would be faster because add is faster than sub, but then the x86 has the loop instruction, which subtracts automatically. It is, however, slower than sub/cmp/jmp anyway.
bewing wrote:And some OSes use PIDs to create unique temp filenames. So the PIDs need to be unique. It's not a big deal.
It will be unique if he uses a pointer unless his malloc function really messes up big time, in which case he's got bigger problems.
Conway's Law: If you have four groups working on a compiler, you'll get a 4-pass compiler.
Melvin Conway
Jeko
Member
Posts: 500
Joined: Fri Mar 17, 2006 12:00 am
Location: Napoli, Italy

Post by Jeko »

Solar wrote:Insufficient data. Redo from start.

Define "best". (Fastest running? Fastest implemented? Most flexible?)

Define environment. (Desktop? Server? Embedded?)

The answer, of course, is: "It's in the eye of the beholder."
Fastest running for Desktop
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Post by bewing »

Wave wrote: Can you explain to me why that is? I always thought counting up would be faster because add is faster than sub, but the x86 has the loop instruction that subs automatically. It is however slower than sub cmp jmp anyways.
All CPUs have the same basic set of flags to test on when it comes to arithmetic operations: Carry, Zero, Sign, Overflow. Immediately after an arithmetic opcode you can sometimes test them and get a useful result.
For addition/subtraction, all these flags flip when crossing the boundary between 1, 0 and -1. The Sign flag also flips when the MSB goes from 0x7f to 0x80. These are the only efficient values to test against. If you test against any other value, you need an extra CMP opcode, which calculates an extra subtraction.

Example: you have a loop, storing a value N times, in every 5th dword.
The example uses a one-trip do-while construction. Results for a for or while loop would be similar, but the loop construction is a little more complex.

Code:

;counting up:
	xor ecx, ecx		; set ecx to 0 the fast way
	mov eax, VALUE
	mov esi, startptr
lp1:
	mov [esi], eax		; store VALUE in ptr
	lea esi, [esi + 20]		; an easy way to add 20
	inc ecx
	cmp ecx, N		; HAVE TO SEE IF WE REACHED N!
	jb short lp1

;counting down:
	mov ecx, N
	mov eax, VALUE
	mov esi, lastptr + 20		; set pointer for predecrementing
lp2:
	lea esi, [esi - 20]		; predecrement by 20
	mov [esi], eax		; store VALUE
	dec ecx
	jg short lp2		; freebie test on 0 with no CMP!
The point being that you don't need the CMP opcode, and your loop runs 20% faster because it's 20% shorter. The difference is NOT negligible, contrary to what was said above. The tighter your loop, the bigger the difference. (And yes, I know perfectly well that I didn't have to change the pointer math, but it's just an example to show that the construction is the same.)
skyking wrote: you don't save that clock cycle on a modern compiler
It is not always wise to count on the compiler to save your @$$ from crappy coding. And in any case, programmers often DO use the index inside the loop, and then the compiler won't touch the code. But you can still code the loop to count up or down.
It is also not wise to look at "how fast some current CPU performs this opcode vs. that one" -- all opcodes are tending toward the limit of 1 opcode per calculation cycle. It is wisest to code for that.