I've been reading a lot of the docs from the EROS/Coyotos projects about object capabilities, but I haven't been able to find specifically how the starting capabilities are issued.
The papers seem to imply that the capabilities are configured manually at install time, which is obviously not acceptable for a desktop OS (the user is more often than not a "techno-idiot" but is also the administrator at the same time). Has anyone read this more thoroughly and found the specific mechanism?
I was thinking that you could simply not require capabilities to access factories and instead restrict those with Access Control Lists. I think this could work, but I need to think about it some more. Mainly, this doesn't fulfil the "least privilege" rule. That could be overcome with a "declaration of purpose" in the program headers (e.g. "webbrowser", "filebrowser", "calculator", etc.), but this is inflexible (consider a web browser with some new form of interactive content that isn't catered for by "webbrowser", not to mention abuse of the mechanism by lying). Does anyone have a better idea?
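The ACL-protected factory idea could be sketched as a simple lookup: the purpose string declared in a program's header maps to the set of factories it may touch. This is only an illustrative sketch - the purpose names and `MayUseFactory` are hypothetical, not from EROS/Coyotos - and it inherits the weaknesses discussed above (coarse categories, programs lying about their purpose).

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical purpose -> permitted-factory table, configured at install
// time; none of these names come from a real system.
static const std::map<std::string, std::set<std::string>> kPurposeGrants = {
    {"webbrowser", {"NetSocketFactory", "GuiWindowFactory"}},
    {"calculator", {"GuiWindowFactory"}},
};

// ACL check: may a program declaring `purpose` use `factory`?
bool MayUseFactory(const std::string& purpose, const std::string& factory) {
    auto it = kPurposeGrants.find(purpose);
    if (it == kPurposeGrants.end()) return false;  // unknown purpose: deny
    return it->second.count(factory) > 0;
}
```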
Security - Capabilities, ACL
Re:Security - Capabilities, ACL
Ok, I've thought about it some more and the ACL-protected factory scheme appears to work, and it does enforce least privilege to a certain extent when combined with the artificial limitation used in EROS/Coyotos (a maximum of 16/32 capabilities).
This is more flexible in some respects but less flexible in others, although it does greatly encourage the programmer to break their program into logical objects, since each object can then hold more capabilities.
Another problem: object model and object storage. As far as I can see, a straightforward COM would be the easiest way to go for the object model, since that is also inherently distributable (for scaling to a network cluster, for instance); however, storage is more of a concern. "Asymmetric trust" states that the caller trusts the callee but not the other way around (the object is trusted not to stuff up the process, but the object won't trust the process not to stuff up the object), which gets more complex when combined with the "caller/creator pays" rule: the object uses the process's memory for storing itself, which costs the process that created it (to prevent resource denial-of-service attacks - servers only create and perform operations on objects, they don't store them).
To that end I thought of simply throwing in a second address space, dubbed the "object storage space", which the process owns but has no read/write privileges for (so 4GB total address space including objects). This in itself seems alright until I factor in dynamic storage: if the objects want to malloc/free, then that data must be protected in the "object storage space" as well, but a global heap cannot be used (since objects don't trust each other either), meaning that each object will need to be either overallocated or relocated into larger chunks inside the address space (i.e. a sort of realloc is used on the objects within the space). So far I believe a preallocation of 2-4MB and "reallocing" as necessary should hopefully be sufficient.
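A rough sketch of the bookkeeping this implies, assuming the 2MB preallocation and a doubling growth policy (all names here are illustrative, not a real API): each object gets a chunk in the storage space, and outgrowing its reservation relocates it - which is exactly why capabilities cannot hold raw addresses.

```cpp
#include <cstddef>
#include <vector>

// Per-object chunk inside the hypothetical "object storage space".
struct ObjectChunk {
    std::size_t offset;    // position inside the storage space
    std::size_t used;      // bytes the object actually uses
    std::size_t reserved;  // preallocated bytes (over-allocation)
};

constexpr std::size_t kPrealloc = 2 * 1024 * 1024;  // 2 MB per object

class ObjectSpace {
public:
    // Create a new object chunk; the creating process's bank pays for it.
    std::size_t Create() {
        chunks_.push_back({next_, 0, kPrealloc});
        next_ += kPrealloc;
        return chunks_.size() - 1;  // chunk index, stored in the capability
    }

    // Grow an object; if it outgrows its reservation, "realloc" it to the
    // end of the space (a later compaction pass would reclaim the hole).
    void Grow(std::size_t idx, std::size_t bytes) {
        ObjectChunk& c = chunks_[idx];
        if (c.used + bytes > c.reserved) {
            c.reserved *= 2;
            c.offset = next_;  // relocated: any cached address is now stale
            next_ += c.reserved;
        }
        c.used += bytes;
    }

    const ObjectChunk& At(std::size_t idx) const { return chunks_[idx]; }

private:
    std::vector<ObjectChunk> chunks_;
    std::size_t next_ = 0;  // bump pointer for fresh reservations
};
```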
But then we come back to capabilities: the capabilities now need to store an index within the "object storage space", which will require a lookup table over the address space to find the objects (since the realloc can rearrange and compact them). Apart from the multiple lookups, which are certainly going to slow things down, this would seem to work in theory (the lookups can be reduced by a lookaside cache stored in the capability, which can be used directly unless the address space is flagged as having changed since the lookaside was last updated).
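The lookaside idea might look like this in miniature: a generation counter on the storage space stands in for the "changed" flag, and the capability caches the resolved address together with the generation it saw. The names and the 16-entry table are illustrative (16 echoing the capability limit mentioned earlier).

```cpp
#include <cstdint>

// Capability holding an object index plus a lookaside cache; `cached_gen`
// records which "version" of the space the cached address belongs to.
struct Capability {
    uint32_t index;        // slot in the object lookup table
    uint64_t cached_addr;  // last resolved address (the lookaside)
    uint64_t cached_gen;   // generation when the cache was filled
};

// Lookup table over the object storage space; `generation` is bumped
// whenever a realloc/compaction rearranges objects, invalidating caches.
struct ObjectTable {
    uint64_t generation = 0;
    uint64_t addresses[16] = {};  // index -> current address

    uint64_t Resolve(Capability& cap) {
        if (cap.cached_gen == generation)
            return cap.cached_addr;              // fast path: cache valid
        cap.cached_addr = addresses[cap.index];  // slow path: table lookup
        cap.cached_gen = generation;
        return cap.cached_addr;
    }

    void Compact(uint32_t index, uint64_t new_addr) {
        addresses[index] = new_addr;
        ++generation;  // every outstanding lookaside is now stale
    }
};
```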
Now that all of that works in theory (feel free to point out any gaping flaws, I'm confusing myself as I design this), the distribution support has become more complex. A marshalling interface in the local system can "pretend" to be the target of capabilities that refer to external computers across the network/internet and marshal them into a TCP/IP RPC (most likely public-key encrypted with a cluster and/or system private/public key pair), but a problem emerges from the "caller/creator pays" rule... The server should not store the object locally, as that would open a denial-of-resource attack vector, but the integrity of the data needs to be ensured so that it won't be altered by a malicious client and sent back to the server - the use of a public/private internal system key should hopefully prevent this. However, we are now sending the object data back and forth across the network every time an RPC is made; hopefully I should be able to deal with this by caching the object on the server (a global cache - if the cache is assaulted then it won't affect the server's ability to process objects, it will merely take longer).
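The integrity check on round-tripped object state could work along these lines: before handing state to a client, the server tags it with a keyed hash; on return it recomputes the tag and refuses tampered state. This toy uses a keyed FNV-1a mix purely for illustration - a real deployment would use an HMAC keyed with the internal system key - and `Seal`/`Verify` are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// Toy keyed hash (FNV-1a mixed with a key). Illustrative only; not a
// cryptographic MAC.
uint64_t Seal(const std::string& state, uint64_t key) {
    uint64_t h = 1469598103934665603ull ^ key;  // FNV offset basis ^ key
    for (unsigned char c : state) {
        h ^= c;
        h *= 1099511628211ull;  // FNV-1a prime
    }
    return h;
}

// Server-side check when the object comes back from the client.
bool Verify(const std::string& state, uint64_t tag, uint64_t key) {
    return Seal(state, key) == tag;
}
```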
Suggestions, comments, improvements?
- Pype.Clicker
Re:Security - Capabilities, ACL
So basically, you're giving a "small address space" to every object (or group of objects that trust each other) that could modify themselves, and you're then "thread tunnelling" when you have a call that involves several objects.
IIUC, this also means that you will suffer a performance penalty every time you "switch" to another object-chunk-space (e.g. either you have to jump to another (cs,ds) pair, or you have to change the r/w property of your page table so that the outgoing object's chunk can no longer be modified and the ingoing object's chunk can now be modified).
Yep, I'd be tempted to say "we should allow only N 'live' object references" in the object mapping table (N being small enough to fit the cache), and to manage that table as if it was a "register window" (see RISC architectures): each object call will only preserve the 'arguments' mapping and has room for a 'return' mapping. It also has a 'local' mapping that will only be kept while accessing the current object, etc.
Alternatively, you could enforce "checkpoints" where objects can be re-arranged, and forbid any movement of the objects between those checkpoints. Structures that need to keep references to objects for longer than in-between-checkpoints will be required to check the current address against the object mapping table only once, and can then re-use the same reference until the next checkpoint is met.
Re:Security - Capabilities, ACL
Pype.Clicker wrote: IIuc, this also mean that you will suffer a performance penalty every time you "switch" to another object-chunk-space ...

The method I was thinking of, since no object trusts any other object, was that the "address space" would simply be a repository of data - when you trigger a capability to call an object, the context is switched to the object in question and the object's data is then shoved in via shared memory from the relevant section of the "object storage space" (since arbitrary reads/writes can't be permitted - the "no covert channels" rule).
Objects could be executed inside the Object Storage Space, though: since the design exploits a normal address space, the first 1GB can be used for running the object, and the rest of the data can be protected by marking it as supervisor in the page directory.
In my design you can still use libraries, though, so objects will tend to only be used at the lowest level; i.e. you have libgui, which supports all your interface controls and just renders an output bitmap that can then be shared to the window manager via the "window" object - the context switch penalty would cost too much to operate any higher up than this.
So far it seems that the only way to determine the depth at which objects should be used would be to implement the system and benchmark it (e.g. each GUI control as a separate object vs. libgui managing events and controls and transferring bitmaps).
Re:Security - Capabilities, ACL
Ok, I've revised this somewhat.
I was considering the benefits of actually mapping all the code sections into the object storage space and then allowing objects to tunnel to each other through the kernel without needing to actually remap the address space in any way. This sounded reasonably good at first, until I remembered that all the so-called "safe languages" are [semi-]interpreted, meaning that they would require an interpreter to be mapped in, which would require library support in the objects and also requires manipulation of the "code" to JIT/interpret it, which couldn't really be accommodated in this scheme.
I revised and went back to the full process space switch, but I intend to leverage small address spaces if possible (although I don't know how to isolate the two "tasks" from directly accessing each other on architectures without segmentation [i.e. everything except x86]). This requires using semi-shared memory to drop the object being operated on inside the "host server"'s address space before delivering a pseudo-IPC message with a pointer to the object data. The object storage space is now simply a "bank" rather than a real address space; this is merely to impose a maximum total object size restriction per process.
The trouble with this approach is that it isn't thread safe: having the process operate on two objects at once means there is a possibility of cross-sharing and capability escalation. My feelings on this are mixed, in that the program is presumed to have decided the object was trustworthy to begin with, meaning it is trusted not to do this anyway, but I still prefer the idea of enforcement - except that would be prohibitive of JIT anyway, which brings us back to the original problem. (I'm not quite willing to have one instance of the server per process tree, but that may be a plausible compromise.)
However, the main design problem I currently have is that capabilities are stored externally to the process, yet the process is required to know the interface in order to call the functions. The main difficulty here is that the capabilities are given by other processes rather than being created directly as per [MS-]COM, so how could the program know what objects are available without resorting to an interface comparison of each object? Lookups won't work, as there may be objects that are derivatives of a given interface, as well as more than one of the same interface type.
To try and rationalise the problem more: the kernel's InvokeCapability system call will take some sort of identifier (at the moment this is merely an ID number) along with the function name and a bunch of parameters to pass. The client will have a proxy; if it was in C++, the proxy would be a class that looked like the "remote" object, and the proxy would store the identifier internally to conceal the system call complexity from the program. Now this is all well and good, but the problem comes from creating this proxy class to begin with: naturally the program will only recognise certain interfaces and not others, so not all interfaces can be proxied directly - but who instantiates the classes, and how? And where are they put?
This is where I'm quite foggy. My current [extremely vague] idea is that the program will have "hidden" variables of its supported interfaces, e.g.

Code: Select all
IGuiWindow **IGuiWindows;
INetSocket **INetSockets;

The init code before main would determine the capabilities, create the proxies, and store the pointers in these hidden arrays, which the programmer can then look at during the program (obviously in general usage you would receive a GUI factory rather than individual windows...). This, however, would require utility functions for searching for particular port/protocol combinations in the INetSockets array, which would obviously be preferable so as to avoid the additional complexity for the programmer. Also, additional capabilities could be added at runtime; for this, the best I can think of is a sort of special signal that the runtime library handles automatically, adding a new proxy for the object to the arrays before signalling the program that a new capability is available (probably combined with a pointer to the proxy). This approach seems a bit too messy for me, though.
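For what the proxy itself might look like: a C++ class mirroring the "remote" interface, concealing the capability ID and the InvokeCapability system call behind ordinary methods. InvokeCapability's signature is an assumption extrapolated from the description above, so a recording stub stands in for the kernel here to keep the sketch self-contained.

```cpp
#include <cstdint>
#include <string>

// Stub standing in for the kernel's InvokeCapability entry point (the
// real signature is an assumption); it records the last call so the
// sketch can be exercised without a kernel.
static std::string g_last_call;
static int64_t InvokeCapability(uint32_t cap_id, const char* function,
                                const void* /*args*/, uint32_t /*arg_len*/) {
    g_last_call = std::to_string(cap_id) + ":" + function;
    return 0;
}

// An interface the program recognises.
struct IGuiWindow {
    virtual ~IGuiWindow() = default;
    virtual void SetTitle(const std::string& title) = 0;
};

// Proxy: looks like the "remote" object but stores the capability ID
// internally and forwards every method through the system call.
class GuiWindowProxy : public IGuiWindow {
public:
    explicit GuiWindowProxy(uint32_t cap_id) : cap_id_(cap_id) {}
    void SetTitle(const std::string& title) override {
        InvokeCapability(cap_id_, "SetTitle", title.data(), title.size());
    }
private:
    uint32_t cap_id_;  // the identifier concealed from the program
};
```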
Does anyone have a better idea for creation and management of proxies?
Re:Security - Capabilities, ACL
The other possible solution would be to leave the bootstrapping to the programmer. I'm not fond of repetitive bootstrap code, but it would seem to be more efficient in this case.
I've also revised my capability model. I've found some more information on the subject and have determined that the capabilities are elected at install time, but I have an idea for creating a sort of "advisor" which makes broad suggestions about whether a program is relatively safe or not. The designs refer to the shell being in possession of the full capability "portfolio" and allocating individual capabilities to processes based on their preconfigured settings. This is much more elegant and powerful than the ACL-factory scheme I was thinking of earlier.
As far as I can see, though, the "portfolio" is quite similar to the current ACL model used in Windows/*NIX, which is probably a good thing. The OS structuring turns out something like:

Code: Select all
                Kernel
                  |
   -------------------------------------
   |              |                |
 User A         User B      System Services
   |              |
 Shell          Shell
   |
   -------------
   |           |
 Prog 1      Prog 2
   |           |
Sub Prog 1A  Sub Prog 2A
               |
          Sub Prog 2AA

The basic flow is that each program starts subprograms, but each subprogram can only be awarded a subset of the capabilities of its parent, so no program in the tree can get higher than the root process, which obviously prohibits damage to the system even if the user was always logged in as "root". (Also interestingly, the capability research documents have the model include "CPU time" as a capability; therefore the tree is allocated a time slice rather than the subprocess(es), which enforces fair use and prevents a process that spawns thousands of subprocesses from lagging the rest of the processes.)
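The subset rule in the tree above can be stated very compactly: a child's grant is the intersection of what it requests with what its parent holds, so capability sets can only shrink down the tree. `GrantToChild` and the capability names are illustrative.

```cpp
#include <set>
#include <string>

using CapSet = std::set<std::string>;

// The subset rule: a child receives only those requested capabilities its
// parent already holds, so no branch of the tree can exceed the root.
CapSet GrantToChild(const CapSet& parent, const CapSet& requested) {
    CapSet granted;
    for (const auto& cap : requested)
        if (parent.count(cap)) granted.insert(cap);  // clip to parent's set
    return granted;
}
```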
I haven't quite decided how to deal with subprocesses - whether the space bank of the parent should be used to store the subprocesses, or whether to just treat them normally. The former is more secure and simplifies the fair sharing policy; the latter is more flexible but can allow a program to eat all the RAM by spawning a ridiculous number of other processes. Since subprocess spawning should be reasonably limited most of the time, the bank withdrawal seems better (I could permit manual tweaking of the size; IIRC Apache forks for every incoming request, therefore it may need a larger space bank depending on server load).
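The bank-withdrawal option might be sketched like this: spawning a subprocess withdraws its quota from the parent's space bank rather than global RAM, so runaway spawning only starves the spawner, and the quota is refunded when the child exits. All names and sizes are illustrative.

```cpp
#include <cstddef>

// Space bank sketch: subprocess memory comes out of the parent's quota.
class SpaceBank {
public:
    explicit SpaceBank(std::size_t quota) : remaining_(quota) {}

    // Withdraw bytes for a new subprocess; fail rather than overdraw.
    bool Withdraw(std::size_t bytes) {
        if (bytes > remaining_) return false;
        remaining_ -= bytes;
        return true;
    }

    // Return the bytes when the subprocess exits.
    void Refund(std::size_t bytes) { remaining_ += bytes; }

    std::size_t remaining() const { return remaining_; }

private:
    std::size_t remaining_;
};
```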
Again, improvements, suggestions?