How can a secure/stable OS be designed that uses DMA?
I'm facing a seemingly difficult problem in trying to let drivers use DMA in user space. The problem is that DMA transfers involve physical memory which makes it easy for a buggy/malicious driver to damage or exploit the system. I'm wondering if there are any solutions that allow devices to exist without compromising the state of the entire system...
Depends on where you have your DMA code. If it is in kernel space, it is simple enough to have the kernel check the process's permissions for that particular page/region.
If you wish to have your DMA code in user space, which I suspect is the case, it is a little more complex and there are likely many solutions. My OS has a server for DMA transfers. The driver which wants to request a DMA transfer sends the DMA server the region it wishes DMA to operate on via shared memory. The DMA server checks the physical address of the pages sent to it, does its stuff, then signals the driver that the transfer has completed. As the DMA server is the only process with access to the needed I/O ports, the system is secure so long as the server itself is not buggy/malicious.
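To make the idea concrete, here's a minimal sketch in C of the kind of validation such a DMA server might do before touching the controller. The `dma_request` layout and function names are my own invention, not from any real OS; the 16 MiB address limit and the 64 KiB boundary rule, however, are genuine ISA DMA constraints the server can enforce:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request a driver places in shared memory for the DMA server. */
struct dma_request {
    uint32_t phys;     /* physical address of the buffer's first byte */
    uint32_t len;      /* transfer length in bytes */
    int      channel;  /* ISA DMA channel the transfer should use */
};

/* ISA DMA constraints the server checks before programming the hardware:
 * the channel must be valid, the buffer must sit below 16 MiB, and the
 * transfer must not cross a 64 KiB boundary. */
bool dma_server_validate(const struct dma_request *req)
{
    if (req->channel < 0 || req->channel > 7)
        return false;
    if (req->len == 0 || req->phys + req->len > 0x1000000u)       /* above 16 MiB */
        return false;
    if ((req->phys & 0xFFFF0000u) != ((req->phys + req->len - 1) & 0xFFFF0000u))
        return false;                                             /* 64 KiB crossing */
    return true;
}
```

A real server would also verify that the pages actually belong to the requesting driver, as described above; that check depends on how your kernel exposes physical mappings.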
From this I'm sure you can think up your own system best suited to your OS.
Re: How can a secure/stable OS be designed that uses DMA?
Hi,
deadmutex wrote: I'm facing a seemingly difficult problem in trying to let drivers use DMA in user space. The problem is that DMA transfers involve physical memory which makes it easy for a buggy/malicious driver to damage or exploit the system. I'm wondering if there are any solutions that allow devices to exist without compromising the state of the entire system...
For ISA DMA there is...
If the linear memory manager has a special "allocate DMA buffer" function, and if this function marks the resulting page table entries as "suitable for DMA", then when the kernel's "start a DMA transfer" function is called it can check the linear address to find the correct physical address for the transfer, and make sure that all pages involved in the transfer were marked as "suitable for DMA" by the linear memory manager (and that they are contiguous, in case the caller allocated two separate DMA buffers that happen to be adjacent in linear memory).
This means it'd be impossible to use ISA DMA to transfer data to/from pages that weren't allocated specifically for this purpose.
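A toy version of that check might look like this in C. The `PTE_DMA_OK` bit, the function names, and the flat page table are assumptions for illustration, not any real kernel's layout (x86 does leave bits 9-11 of a PTE available for OS use, which is where a flag like this could live):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define PTE_PRESENT 0x001u
#define PTE_DMA_OK  0x200u  /* hypothetical "suitable for DMA" flag in an OS-available PTE bit */

/* Toy page table: index = linear page number, value = PTE (frame | flags). */
static uint32_t page_table[16];

static uint32_t frame_of(uint32_t pte) { return pte & ~0xFFFu; }

/* Validate a proposed ISA DMA transfer: every page must be present and
 * marked DMA-suitable, and the underlying frames must be physically
 * contiguous (the caller may have glued two separate buffers together). */
bool dma_transfer_ok(uint32_t linear, uint32_t len, uint32_t *phys_out)
{
    uint32_t first = linear / PAGE_SIZE;
    uint32_t last  = (linear + len - 1) / PAGE_SIZE;

    for (uint32_t p = first; p <= last; p++) {
        uint32_t pte = page_table[p];
        if (!(pte & PTE_PRESENT) || !(pte & PTE_DMA_OK))
            return false;
        if (p > first && frame_of(pte) != frame_of(page_table[p - 1]) + PAGE_SIZE)
            return false;  /* adjacent in linear memory but not in physical memory */
    }
    *phys_out = frame_of(page_table[first]) + (linear % PAGE_SIZE);
    return true;
}
```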
In addition, the linear memory manager needs to be careful about freeing pages that are marked as "suitable for DMA". For example if a device driver starts a DMA transfer and then frees the pages being used, then you don't want other code to be able to allocate the freed pages while the DMA transfer is still happening. In this case the linear memory manager needs to check to make sure there is no DMA transfer happening, and if there is, either stop the DMA transfer before freeing the pages or put the DMA pages into a "cooling off" queue until the DMA transfer completes.
For PCI bus mastering devices, you're mostly screwed. The only "nice" way to do it is to use the IOMMU that AMD recently introduced (which isn't present on a lot of computers).
There's also a problem with freeing pages while a bus-master transfer is occurring (e.g. if the network card driver crashes in the middle of something and you're adding its pages to the "free page pool"). In this case special "allocate page/s for DMA" function/s and a cooling-off queue for freed DMA pages might help (e.g. any pages that may have been part of a bus-master transfer are left in the cooling-off queue for 30 seconds or so before they're freed). You can force device driver programmers to use your special "allocate page/s for DMA" function/s by making sure that these functions are the only way software can get any physical addresses. Of course this doesn't prevent malicious or buggy PCI device driver code from using bus mastering to stuff the system up. The idea here is to minimize risk when it's impossible to prevent potential problems.
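A cooling-off queue like that can be sketched in a few lines (the names and the fixed-size array are hypothetical; a real kernel would hang these entries off its physical memory manager and drive the tick from its timer):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COOL_OFF_SECONDS 30
#define MAX_COOLING      64

/* One entry per physical page that was freed while it might still be the
 * target of a bus-master transfer. */
struct cooling_page {
    uint32_t phys;        /* physical address of the page */
    uint64_t release_at;  /* earliest time it may re-enter the free pool */
};

static struct cooling_page cool_queue[MAX_COOLING];
static size_t cool_len;

/* Called instead of putting a DMA page straight back in the free pool. */
void dma_page_free(uint32_t phys, uint64_t now)
{
    cool_queue[cool_len++] = (struct cooling_page){ phys, now + COOL_OFF_SECONDS };
}

/* Periodically move expired pages back to the real free pool via the
 * supplied callback. Returns the number of pages released. */
size_t dma_cooling_tick(uint64_t now, void (*release)(uint32_t phys))
{
    size_t released = 0, kept = 0;
    for (size_t i = 0; i < cool_len; i++) {
        if (cool_queue[i].release_at <= now) {
            release(cool_queue[i].phys);
            released++;
        } else {
            cool_queue[kept++] = cool_queue[i];
        }
    }
    cool_len = kept;
    return released;
}
```

The same structure also covers the ISA case above: if a driver frees its buffer mid-transfer, the pages simply sit in the queue until the transfer can no longer be touching them.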
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
- Kevin McGuire
I agree with Brendan's idea:
Brendan wrote: If the linear memory manager has a special "allocate DMA buffer" function ..
Here the DMA driver or provider ensures that transfers are only done to this specially allocated region, instead of allowing the client of the DMA service to specify an arbitrary address.
I'm facing a similar issue, but my current conception of my "solution" goes like this: unknown drivers are always loaded into user space on "probation", with no direct access to I/O ports or DMA. They have to request such services through a trusted service. This makes untrusted drivers much slower, but all malicious/buggy activity is easy to spot and report. If a superuser becomes convinced that the code is not malicious/buggy, then the superuser can accept the driver as "trusted". At that point, the kernel-mode version of the driver is loaded into kernel mode with access to the resources it has registered for. If, after that, the driver exhibits bad behavior, the superuser has only themselves to blame.
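A sketch of the probation check (all names are hypothetical): every I/O request from an untrusted driver passes through the trusted service, which vets it against what the driver registered for and counts violations instead of letting them reach the hardware:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of what one untrusted driver has registered for. */
struct driver_grant {
    uint16_t port_lo, port_hi;   /* I/O port range it may touch */
    unsigned violations;         /* bad requests spotted while on probation */
};

/* Every port access from a probationary driver goes through this check in
 * the trusted service, so bad behaviour is counted and reported rather
 * than reaching the hardware. */
bool probation_port_access(struct driver_grant *g, uint16_t port)
{
    if (port < g->port_lo || port > g->port_hi) {
        g->violations++;   /* log/report instead of performing the access */
        return false;
    }
    return true;           /* trusted service performs the access on its behalf */
}
```

The violation counter is exactly the evidence the superuser would weigh before promoting the driver to "trusted".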
And you cannot protect any OS from the activity of an idiot superuser.
Re: How can a secure/stable OS be designed that uses DMA?
Brendan wrote: There's also problems freeing pages while a bus-master transfer is occurring
I can't thank you enough, Brendan, for bringing this up. I've just noticed a moderately severe exploit in my current implementation. I currently entrust this problem to the stability of the DMA server: as long as it is running there exists at least one reference count for the region being written to, so it will not be free()'d. The problem will come when I implement signals. SIGKILL, which cannot be caught, will remove the reference to the region, possibly freeing the pages.
I'm thinking that I will have to share init's ability to ignore SIGKILLs with the DMA server. Disappointing.
Re: How can a secure/stable OS be designed that uses DMA?
Brendan wrote: For PCI bus mastering devices, you're mostly screwed. The only "nice" way to do it is to use the IOMMU that AMD recently introduced (which isn't present on a lot of computers).
Why an IOMMU and not the regular MMU? I.e., why a different page table and not the main page table? Why do you think IOMMUs are designed like that?
- Combuster
Re: How can a secure/stable OS be designed that uses DMA?
axilmar wrote: Why an IOMMU and not the regular MMU? i.e. why a different page table and not the main page table? why do you think IOMMUs are designed like that?
The MMU only separates the processor from main memory. Hardware devices can access all of memory without any protection, since there is no unit checking that behaviour. An IOMMU sits between devices and main memory, and can therefore check the bus transactions occurring between devices and main memory.
There's a more complete introduction on Wikipedia.
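As a toy model of that distinction (all names invented): the IOMMU keeps a per-device I/O page table, so every device-initiated access is translated and checked independently of the CPU's page tables; an access outside the mappings the OS set up for that device faults instead of hitting memory:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IO_PAGE  4096u
#define IO_PAGES 16

/* Toy per-device I/O page table entry: maps a device's bus address to a
 * physical frame, with a writable bit. The CPU's own page tables never
 * see these entries. */
struct io_pte { uint32_t frame; bool present; bool writable; };

struct io_domain { struct io_pte pt[IO_PAGES]; };

/* What the IOMMU does on every device-initiated memory access: translate
 * and check, or fault. Returning false models an IOMMU fault. */
bool iommu_access(const struct io_domain *dom, uint32_t bus_addr,
                  bool is_write, uint32_t *phys_out)
{
    const struct io_pte *pte = &dom->pt[(bus_addr / IO_PAGE) % IO_PAGES];
    if (!pte->present || (is_write && !pte->writable))
        return false;
    *phys_out = pte->frame + bus_addr % IO_PAGE;
    return true;
}
```

The reason it can't simply reuse the main page table is visible in the model: the device presents bus addresses, not the linear addresses of whichever process happens to be running, so the translation has to be per-device and live on the path between the device and memory.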
Hi,
tom9876543 wrote: There is an easy solution... require open source drivers. This solution is also "portable" - it works on any CPU!
In general there's 3 reasons for security/protection:
- restricting problems caused by malicious code
- restricting problems caused by buggy code
- getting immediate feedback when something goes wrong (including what went wrong and exactly which piece of code the problem is in) to make it extremely easy to debug
Does open source restrict problems caused by buggy code? In theory, thousands of times more people checking the code make it thousands of times more likely that a bug will be spotted. In practice very few people actually do read the code unless they're the ones writing/maintaining it. For example, I'm a programmer who's quite interested in OS programming, I've been using Gentoo for years and I've got (several versions of) the Linux kernel source code sitting on my hard drive. Despite this I've read less than 200 lines of the (over 6 million lines of) source code. Did I do a detailed analysis of the SCSI driver that this kernel uses on my machine to find out if there's a bug somewhere? No. If I did do a detailed analysis would I actually find a bug if there was one? Probably not - I could spend months and still might not find the bug (if there was one). So, does open source help find the bugs? Mostly it's just marketing hype - most bugs are found when users complain that something didn't work (and the authors find and fix the bug). The users don't find the bugs themselves.
Does open source give you immediate feedback when something goes wrong (including what went wrong and exactly which piece of code the problem is in) to make it extremely easy to debug? No. For something like an intermittent bug in a device driver's DMA handling, you might just have random processes crashing occasionally where no-one has any idea which piece of code is causing the problem.
Imagine something like an "off by one" error in a network driver, where it transfers 4097 bytes instead of 4096 bytes, trashing one byte in the next physical page. Sometimes the next physical page will be free and nothing happens, sometimes an unused byte is trashed and nothing happens, sometimes an unimportant byte is trashed (a pixel in some graphics data might change colour, a sound might have an unnoticeable "click", etc), sometimes some random process crashes unexpectedly, and sometimes the kernel behaves erratically. If all you see is the symptoms, how would you find this bug? I'd probably start by doing RAM testing for 2 days, then I guess I'd try removing as many devices and drivers as I could to see if the problem goes away when a specific device is removed. If the OS crashes once per day (on average), it could take a month or more just to find out which device driver has the bug (without actually finding the bug).
Alternatively, imagine a nice dialog box that pops up as soon as the bug occurs, saying "The network driver tried to do a DMA transfer into a page it doesn't have access to and has been terminated." and asking you if you want to send an automated bug report, or view details (register contents, etc), or do a core dump of the process, etc. Even an ugly blue screen of death would be much much more useful than trying to guess what happened with no information to rely on.
tom9876543 wrote: Trying to make an operating system with "untrusted" drivers seems like a waste of time to me.
Trying to trust millions of lines of code seems like a waste of time to me....
Here's an interesting summary I found (part of a University course AFAIK). Some interesting quotes:
• Device drivers are the biggest cause of crashes
– Drivers cause 85% of Windows XP crashes
– Drivers in Linux are 7 times buggier than the kernel
• 10s of thousands of device drivers exist
– Over 35K drivers on Win/XP!
This may or may not be accurate, but think about it. Millions of lines of code (that are "less well tested" because most people don't use most device drivers), written by thousands of different people (with varying skills), running in kernel mode with no protection at all? IMHO this sounds entirely insane, but it also describes most modern OSs fairly well.
Put it like this, if open source does help, then combining open source and protection will help more.
Cheers,
Brendan