OSDev.org

Posted: **Wed Sep 07, 2016 5:07 pm**

Hey guys

So Im at the point to implement handles to my OS to gain abstraction and security for my syscalls/apis.
I thought a long time about whats actually the best way to create handles.

The best I came up with is a map or an array list which contains the handle information like the handle type, access, refcount and the pointer to the real encapsulated data.

But how does one create handles reliable and fast ( something between O(1) and O(ln(n)) ) ?
I'd have to loop through the map/array to find free slots.

This of course can be speeded up by creating a queue which holds deleted/empty slot numbers, but thats still not perfect in my opinion.
The handle list is then saved for each process/context seperately to avoid bruteforcing the indexes.

Hope someone can enlight me how this is done on Windows or Linux.

(Wasnt sure if this is more OS theory, if this is the wrong forum please move the thread if needed)

Posted: **Wed Sep 07, 2016 7:26 pm**

You could use the queue idea, but store it in-place in the mapping array, as a kind of index-based linked list. Any per-core/thread/etc caching can happen on top of that.

Posted: **Thu Sep 08, 2016 1:57 am**

Supposing you need to map a handle to some memory location (the actual resource), you could have an array of pointers (the amount of positions is the maximum number of handles). If some handle is empty, it points to the next empty handle in that array (a linked list of empty handles). Otherwise, it points to the resource. With an additional pointer to the first empty handle slot available, you could have an efficient (fast, at least) handle mechanism (for 512K handles you would need 2MiB of memory for the array on a 32-bits machine).

Posted: **Thu Sep 08, 2016 2:14 am**

What you really need is a dynamic array that's O(1) for both lookup and insert. Unfortunately there isn't one.

In the end I settled on an multi-tier array class. It has 8 items at the top and when that fills it becomes 2-level with 8 x 64 items. When they are all used a third level is created with 8 x 64 x 64 items. Thus the array can contains a maximum of 32k entries. It's quick in lookup (maximum 3 pointers) and reasonable on insert because each segment contains a num_items field so you don't go looking for space when there isn't any. The pieces are allocated with my slab allocator so that's pretty quick.
The truth is that most arrays of threads, timers, fds only contain a few items so very often the array of 8 items is enough.

I use an entirely different scheme for pids.

Code: Select all

 class Karray
{
public:
    Karray();

    int size();
    int insert(int n, void *elem);
    void *remove(int n);
    void *read(int n);
    void *iterate(int n, int *p_idnum);
    void *clean();
    int walk();

private:
    struct karraypiece
    {
        // Each piece holds 64 pointers
        void        *item[64];
        int         num_items;
    };

    void *ptr(int n, int del);

    void *m_items[8];
    int m_num_items;
    int m_level;
};

I'm not sure whether it was worth it but it was fun to write.

Posted: **Thu Sep 08, 2016 3:19 am**

Thanks alot all, I will think about the methods you suggested

I've got one more question:
What happens if we have a handle to lets say a process, and the process gets closed.
Do we have to enumerate all other processes and loop all handles to find the closed process and remove it?
Because if we dont remove it from the list, then there is a chance the next process gets the same handle and we still have access to it because the data pointers may be the same?

Posted: **Thu Sep 08, 2016 3:32 am**

Ch4ozz, unix avoids this problem (somewhat) by not recycling pids until it loops right through 32000 of them.

Posted: **Thu Sep 08, 2016 5:30 am**

Ch4ozz wrote:Thanks alot all, I will think about the methods you suggested

I've got one more question:
What happens if we have a handle to lets say a process, and the process gets closed.
Do we have to enumerate all other processes and loop all handles to find the closed process and remove it?
Because if we dont remove it from the list, then there is a chance the next process gets the same handle and we still have access to it because the data pointers may be the same?

I suppose the best to do is remove the process handle from the list when the process is closed (e.g. some destructor function could invoke a syscall to do that), since you still have the process ID. Actually, this destructor could close all other resources allocated by that process (e.g. files, etc.).

Posted: **Thu Sep 08, 2016 6:33 am**

Handles are resources that need to be managed, just like memory or threads. If I were you, I'd try to use the same type of mechanism for all of these resources, if not the exact same mechanism.

Although I haven't gotten quite that far, I'm going to end up with the exact same problem, because I'm treating every OS managed resource as objects, which will, at some point need to be garbage collected (or defragmented if they are on disk). My plan is to use a single "object manager" that will be responsible for allocating / deallocating memory, and garbage collecting objects that are no longer needed both in memory and on disk. I may end up having to give every object a handle, so that they can be referenced by other objects, even if they get moved around in memory, which would put me right where you are at, now.

The good news is that allocating and deallocating handles is a fairly simple function, and it doesn't really matter how it works, as long as it works, so you can start off with a function that simply creates a huge array, and later replace it entirely with a function that is more efficient.

EDIT: One thought that just occurred to me is that you could, technically, use a memory address as a handle, as long as that memory address is only accessible by the kernel. It could even be a virtual memory address, so that it could be moved around in physical memory while maintaining the same virtual address. This would let you essentially store your "handles" in your page tables. I'm not sure how well this would work, but I may give it a try when I get to that point.

Posted: **Thu Sep 08, 2016 6:39 am**

gerryg400 wrote:In the end I settled on an multi-tier array class. It has 8 items at the top and when that fills it becomes 2-level with 8 x 64 items. When they are all used a third level is created with 8 x 64 x 64 items. Thus the array can contains a maximum of 32k entries. It's quick in lookup (maximum 3 pointers) and reasonable on insert because each segment contains a num_items field so you don't go looking for space when there isn't any. The pieces are allocated with my slab allocator so that's pretty quick.

In other words, if you would end up iterating through a whole list, you usually use some kind of tree instead.

Posted: **Thu Sep 08, 2016 7:11 am**

SpyderTL wrote:
EDIT: One thought that just occurred to me is that you could, technically, use a memory address as a handle, as long as that memory address is only accessible by the kernel. It could even be a virtual memory address, so that it could be moved around in physical memory while maintaining the same virtual address. This would let you essentially store your "handles" in your page tables. I'm not sure how well this would work, but I may give it a try when I get to that point.

Application Handles in early Windows versions were precisely this: simply the address of the execution instance's stack segment.

Posted: **Thu Sep 08, 2016 8:08 am**

SpyderTL wrote:EDIT: One thought that just occurred to me is that you could, technically, use a memory address as a handle, as long as that memory address is only accessible by the kernel. It could even be a virtual memory address, so that it could be moved around in physical memory while maintaining the same virtual address. This would let you essentially store your "handles" in your page tables. I'm not sure how well this would work, but I may give it a try when I get to that point.

This seems interesting, however I think could be a problem for shared resources: you have just one handle for that resource. All process will have access to that resource with this unique handle (which maps directly to the resource memory) and you would need to implement in the resource itself some kind of control to avoid unauthorized access. Associating handles (e.g. integers) to the resource pointer, you could provide different handles for the same resource for different process and the access control could be implemented in the handle itself (e.g. additional field indicating the process ID associated to that handle). This way could be easier to isolate the resource from the handle details.

Scenario: I may want to share the resource A between process 1 and 2, but not with 3. But if 3 find out the handle, it could represent a security problem?

Posted: **Thu Sep 08, 2016 9:10 am**

After reading all answers carefully I decided to go with the idea SpyderTL had:
I will use a map and the address of the data itself is also the key to the map.
Only one problem exists when using this method: Having more then 1 handle to the same object.

To solve this issue (together with security issues) I will create the map for each process seperately. (This sadly ends up as O(n) for enumerating all processes but I think its fine)
The next step is the "upgrading" of the access of those handles.
If a process opens a window handle to Window A with the access READ_ONLY and then wants a new handle with access READ_WRITE I will upgrade the existing handle.
Sharing handles is easy too then, I can simply copy the handle over to the map of the second process.

Thanks alot all

EDIT: This may be the best approach (in trade of some memory)

Handle map:
- stored: globally once
- key: pointer
- task: keeps the handles itself (type, access, refcount, data) also contains a list with all processes which have access to the handle

Assignment map:
- stored: in each process
- key: pointer
- task: keeps the handle addresses and simply tracks which handle was opened (useful for process destruction)

Posted: **Sat Sep 10, 2016 2:42 am**

There is a huge problem with that. Suppose that there is a service process whose job is to do some lengthy operation and then write the result into a file specified by the caller by a handle. So user A creates a file in a directory that is readable by everyone, and then invokes the service to have it fill it with some contents. Meanwhile, a mischievous user B notices that an empty file has been created, so he opens it and then invokes the service himself with different parameters. Since it's the same file, the service just receives the same handle again, upgraded with read/write access. B is therefore successful in overwriting A's file with something of B's choosing. While A intended to download a set of slides to use in an upcoming important presentation, he now finds himself the proud owner of a video depicting male body parts in motion.

It's better to stick with what every other OS does and just use unique meaningless numbers. Also, with a free handle stack, both insertion and lookup is in fact O(1).

And needless to say, an object must not be deleted before all its references are gone. That does not mean you should enumerate processes and sneakily close all handles pointing to an object without the owning processes' knowledge. That will only lead to confusion when a process reuses the handle and then uses the old handle and accesses something entirely different by surprise. Same as with every other object, a process can't just be deleted just because it has terminated. You can delete its handle table, but the process object itself must stay in memory until everyone is done using it.

Posted: **Sat Sep 10, 2016 6:04 pm**

Linux and windows use trie-like structure for the mapping. They use reference counting for tracking if an object is in use. Also, you can have list of free handles indeed - you don't need to scan the map for "free slots."

The access granted is not part of the handle. It is part of the descriptor (kernel memory object) that the handle points to (indirectly through the map) and which descriptor gets created from an open-like operation (open the file, open processes by id, etc.) Userland gets the handle number, and kernel land stores the permissions in that object. This descriptor may also hold state like read/write pointer, socket attributes for sockets, etc. Multiple handles/fd numbers may refer to the same internal descriptor through the map (if you have handle duplication/forking) which will also be reflected in higher reference count.

The descriptor object may refer to yet another object, which represents the state of the resource independent of the particular open-operation used to get the handle - the file storage metadata cache for example as will be seen by all processes, or the process object execution state. This last object will be referred to by all descriptors for that resource jointly and must have single instance. It must utilize reference counting as well and may require some valid postmortem state. The process may have terminated, but still have handles open for it. Closing the handles after the object ceases to function in the sense of freeing them will crash the application. Just marking them somehow in the descriptor may work, although the process/thread object can keep information such as its exit code for other processes to read.

OSDev.org

Implementing Handles

Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles

Re: Implementing Handles