Multi-core CPUs

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to see if your question is answered in the wiki first! When in doubt, post here.
jcmatias
Posts: 11
Joined: Mon Nov 08, 2004 12:00 am
Location: Ribeirao Preto SP Brasil

Multi-core CPUs

Post by jcmatias »

A few questions:

1- On multi-core CPUs, is it possible for one core to run in real mode while another runs in long mode or protected mode?
2- Where can I find information about the APIC?
3- Paging is very slow. Is there any way to avoid paging in long mode?
AJ
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK

Re: Multi-core CPUs

Post by AJ »

Hi,
jcmatias wrote:1- On multi-core CPUs, is it possible for one core to run in real mode while another runs in long mode or protected mode?
Yes, but it sounds like a nightmare to keep track of what's going on.
2- Where can I find information about the APIC?
1. APIC
2. http://www.google.com
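For what it's worth, here's a minimal C sketch of waking an application processor through the local APIC (the names, the default 0xFEE00000 base and the identity mapping are assumptions for illustration, not taken from the wiki page). It also answers half of question 1: an AP always starts in real mode, so during SMP startup you inevitably have cores in different modes at the same time.

Code:

#include <stdint.h>

/* Assumed defaults, for illustration only: the local APIC is at its usual
 * physical base of 0xFEE00000 and that range is identity-mapped; the real
 * base should be read from the IA32_APIC_BASE MSR. */
#define LAPIC_ICR_LO 0x300   /* Interrupt Command Register, low dword  */
#define LAPIC_ICR_HI 0x310   /* Interrupt Command Register, high dword */

static volatile uint32_t *lapic = (volatile uint32_t *)(uintptr_t)0xFEE00000u;

static void lapic_write(uint32_t reg, uint32_t val)
{
    lapic[reg / 4] = val;
}

/* Send an INIT IPI followed by a STARTUP IPI to one application processor.
 * The AP begins executing the 16-bit trampoline at physical address
 * (vector << 12) in real mode, even while this core sits in protected or
 * long mode. Real code waits ~10ms after INIT and usually sends the SIPI
 * twice, per the MP specification; delays are omitted here. */
static void start_ap(uint8_t apic_id, uint8_t vector)
{
    lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
    lapic_write(LAPIC_ICR_LO, 0x00004500);            /* INIT, assert   */

    lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
    lapic_write(LAPIC_ICR_LO, 0x00004600 | vector);   /* STARTUP (SIPI) */
}

Something like start_ap(1, 0x08) would then start the CPU with APIC ID 1 at physical address 0x8000.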
3- Paging is very slow. Is there any way to avoid paging in long mode?
No.

Cheers,
Adam
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Re: Multi-core CPUs

Post by Colonel Kernel »

AJ wrote:
3- Paging is very slow. Is there any way to avoid paging in long mode?
No.
Really? I find that surprising... Why wouldn't you be able to disable paging just like in protected mode (clear PG bit in CR0)?
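For reference, the protected-mode operation being described looks roughly like this (a hedged sketch using GCC-style inline assembly, 32-bit code, untested):

Code:

#include <stdint.h>

/* 32-bit protected-mode code: paging is switched off by clearing CR0.PG
 * (bit 31). The code doing this should run from an identity-mapped page,
 * so the instruction stream's addresses don't change underneath it. */
static inline void disable_paging_legacy(void)
{
    uint32_t cr0;
    __asm__ volatile ("mov %%cr0, %0" : "=r"(cr0));
    cr0 &= ~(1u << 31);                               /* clear PG */
    __asm__ volatile ("mov %0, %%cr0" : : "r"(cr0) : "memory");
}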
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
Hyperdrive
Member
Posts: 93
Joined: Mon Nov 24, 2008 9:13 am

Re: Multi-core CPUs

Post by Hyperdrive »

Colonel Kernel wrote:
AJ wrote:
3- Paging is very slow. Is there any way to avoid paging in long mode?
No.
Really? I find that surprising... Why wouldn't you be able to disable paging just like in protected mode (clear PG bit in CR0)?
Because 64 bit mode requires paging. From the Intel manual:
Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1 wrote:8.8.5.4 Switching Out of IA-32e Mode Operation
To return from IA-32e mode to paged-protected mode operation, operating systems must use the following sequence:
1. Switch to compatibility mode.
2. Deactivate IA-32e mode by clearing CR0.PG = 0. This causes the processor to set IA32_EFER.LMA = 0. The MOV CR0 instruction used to disable paging and subsequent instructions must be located in an identity-mapped page.
3. Load CR3 with the physical base address of the legacy page-table-directory base address.
4. Disable IA-32e mode by setting IA32_EFER.LME = 0.
5. Enable legacy paged-protected mode by setting CR0.PG = 1.
6. A branch instruction must follow the MOV CR0 that enables paging. Both the MOV CR0 and the branch instruction must be located in an identity-mapped page.
EDIT: Maybe this is the better quote:

[quote=""Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1, section 8.8.5 'Initializing IA-32e Mode'"]The processor performs 64-bit mode consistency checks whenever software attempts to modify any of the enable bits directly involved in activating IA-32e mode (IA32_EFER.LME, CR0.PG, and CR4.PAE). It will generate a general protection fault (#GP) if consistency checks fail. 64-bit mode consistency checks ensure that the processor does not enter an undefined mode or state with unpredictable behavior.
64-bit mode consistency checks fail in the following circumstances:
  • An attempt is made to enable or disable IA-32e mode while paging is enabled.
  • IA-32e mode is enabled and an attempt is made to enable paging prior to enabling physical-address extensions (PAE).
  • IA-32e mode is active and an attempt is made to disable physical-address extensions (PAE).
  • If the current CS has the L-bit set on an attempt to activate IA-32e mode.
  • If the TR contains a 16-bit TSS.
[/quote]
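To make those rules a little more concrete, here is a small illustrative C predicate. It is entirely hypothetical (not from the manual or any real kernel) and it models "activating IA-32e mode" simply as LMA going from 0 to 1:

Code:

#include <stdbool.h>

/* Hypothetical snapshot of the bits the consistency checks look at. */
struct mode_bits {
    bool lme;        /* IA32_EFER.LME: IA-32e mode enabled   */
    bool lma;        /* IA32_EFER.LMA: IA-32e mode active    */
    bool pg;         /* CR0.PG: paging enabled               */
    bool pae;        /* CR4.PAE: physical-address extensions */
    bool cs_l;       /* L bit of the current code segment    */
    bool tss_16bit;  /* TR references a 16-bit TSS           */
};

/* Returns true if moving from 'cur' to 'next' would trip the 64-bit mode
 * consistency checks quoted above, i.e. the CPU would raise #GP. */
static bool would_fault(struct mode_bits cur, struct mode_bits next)
{
    /* Enabling or disabling IA-32e mode (LME) while paging is on. */
    if (cur.pg && next.lme != cur.lme)
        return true;

    /* IA-32e mode enabled, but paging turned on before PAE. */
    if (next.lme && next.pg && !next.pae)
        return true;

    /* IA-32e mode active and PAE being disabled. */
    if (cur.lma && !next.pae)
        return true;

    /* Activating IA-32e mode while the current CS has the L bit set. */
    if (!cur.lma && next.lma && cur.cs_l)
        return true;

    /* Activating IA-32e mode while TR holds a 16-bit TSS. */
    if (!cur.lma && next.lma && cur.tss_16bit)
        return true;

    return false;
}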
--TS
Last edited by Hyperdrive on Fri Feb 20, 2009 11:16 am, edited 1 time in total.
AJ
Member
Posts: 2646
Joined: Sun Oct 22, 2006 7:01 am
Location: Devon, UK

Re: Multi-core CPUs

Post by AJ »

Hi,

Paging is an integral part of long mode. I guess this may come from the fact that the potential physical and virtual address spaces can be quite different sizes, but I'm not certain of that. Anyway - disable paging and you are not in long mode.
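A small sketch of how you could check this at run time, assuming a GCC-style compiler in a kernel context: the CPU only sets IA32_EFER.LMA ("long mode active") once LME and CR0.PG are both set, so there is simply no encoding for "long mode without paging".

Code:

#include <stdint.h>
#include <stdbool.h>

#define IA32_EFER 0xC0000080u   /* extended feature enable register */

static inline uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}

/* LME (bit 8) means long mode is enabled; LMA (bit 10) means it is
 * active, which only happens once CR0.PG is also set. */
static bool long_mode_active(void)
{
    return (rdmsr(IA32_EFER) >> 10) & 1;   /* IA32_EFER.LMA */
}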

Cheers,
Adam

[edit - too late!]
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Re: Multi-core CPUs

Post by Colonel Kernel »

Bummer. :( I think it's better to not have to pay for what you might not use (e.g. -- OSes that use software isolated processes).
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Multi-core CPUs

Post by Brendan »

Hi,
jcmatias wrote:3- Paging is very slow. Is there any way to avoid paging in long mode?
Paging has a little overhead (RAM used by the paging structures, and time spent on TLB misses), but without paging you end up dealing with fragmentation of free RAM and with major problems supporting things like swap space and memory-mapped files efficiently. You also can't do other tricks, like copying page table entries instead of copying data, or using allocation on demand and/or copy-on-write to reduce RAM usage. In practice, the overhead of not using paging can cost a lot more than the overhead of using paging.
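As a hedged example of the "copy page table entries instead of copying data" trick: a fork-style clone can share writable pages read-only and copy a page only when a write actually faults. Everything named below (pte_for, phys_alloc_page, phys_to_virt, the choice of COW bit) is a made-up placeholder rather than any particular kernel's API:

Code:

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define PTE_PRESENT 0x001ull
#define PTE_WRITE   0x002ull
#define PTE_COW     0x200ull               /* bit 9: available for OS use */
#define PTE_ADDR    0x000FFFFFFFFFF000ull  /* physical address field      */

/* Hypothetical helpers assumed to exist in the surrounding kernel. */
extern uint64_t  phys_alloc_page(void);               /* returns a physical address */
extern void     *phys_to_virt(uint64_t phys);         /* direct-mapped view of RAM  */
extern uint64_t *pte_for(void *address_space, uintptr_t virt);

/* Share one writable page between parent and child: both mappings are
 * downgraded to read-only and tagged copy-on-write, so no data is copied. */
void share_page_cow(void *parent_as, void *child_as, uintptr_t virt)
{
    uint64_t *parent = pte_for(parent_as, virt);
    uint64_t *child  = pte_for(child_as, virt);

    *parent = (*parent & ~PTE_WRITE) | PTE_COW;
    *child  = *parent;            /* copy the entry, not the 4 KiB of data */
}

/* Called from the page-fault handler on a write to a COW page: only now
 * is a private copy of the data actually made. */
void handle_cow_fault(void *as, uintptr_t virt)
{
    uint64_t *pte  = pte_for(as, virt);
    uint64_t  copy = phys_alloc_page();

    memcpy(phys_to_virt(copy), phys_to_virt(*pte & PTE_ADDR), PAGE_SIZE);
    *pte = copy | PTE_PRESENT | PTE_WRITE;   /* private and writable again */
    /* A real handler would also invalidate the TLB entry and track refcounts. */
}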
Colonel Kernel wrote:Bummer. :( I think it's better to not have to pay for what you might not use (e.g. -- OSes that use software isolated processes).
Hehehe. Can you even imagine a research paper that says "Yeah, we spent years developing this technique and optimizing it and found out that it's good in theory but blows chunks in practice"...

It's far too tempting for researchers to end up with biased results; like getting performance statistics from the most IPC intensive benchmark they can find (and not bothering to provide the same statistics from something that isn't IPC intensive); or neglecting to mention how well the technique scales and only testing on single-CPU; or "disabling" software isolation by disabling generation of array bound and other checks in the compiler when they compile one process, without doing anything to remove software isolation overhead from the OS's memory management (garbage collector, exchange heap, etc) and then saying "software isolation only costed 5%"; or by claiming that paging costs 6.3% because of TLB misses, etc without implementing any of the things that use paging to reduce overhead (and without mentioning RAM usage overheads anywhere at all); or by neglecting to mention that they've completely failed to find a sane way to implement swap space.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Re: Multi-core CPUs

Post by Colonel Kernel »

Brendan wrote:Hehehe. Can you even imagine a research paper that says "Yeah, we spent years developing this technique and optimizing it and found out that it's good in theory but blows chunks in practice"...
So when did you develop a sense of sarcasm? :P
Brendan wrote:It's far too tempting for researchers to end up with biased results
That's what peer review is for.
Brendan wrote:like getting performance statistics from the most IPC intensive benchmark they can find (and not bothering to provide the same statistics from something that isn't IPC intensive)
That's L4, and I agree that it's pretty silly to try and extrapolate overall performance from just IPC performance.

From here on out I'm going to assume your comments are about Singularity...
Brendan wrote:or neglecting to mention how well the technique scales and only testing on single-CPU
Performance and scalability are not actually goals of the Singularity project. The primary goal was dependability. AFAIK scalability is not a goal for L4 either (although I'm sure the researchers responsible would like it to scale).

Memory management data structures are a pretty big source of contention in today's commercial OSes, BTW. Getting rid of at least some of them can't hurt.
Brendan wrote:or "disabling" software isolation by disabling generation of array bound and other checks in the compiler when they compile one process, without doing anything to remove software isolation overhead from the OS's memory management (garbage collector, exchange heap, etc) and then saying "software isolation only costed 5%"
Maybe you read something I didn't. The impression I got from the description of the experiment was that they disabled run-time checks for all generated code (including the 95% of the OS itself that is safe code), not just for a single process. The benchmarks they ran include many processes (web server, file system, network stack, etc.). If you have a source, I'd like to see it. Otherwise I call BS on this one.
Brendan wrote:or by claiming that paging costs 6.3% because of TLB misses, etc without implementing any of the things that use paging to reduce overhead
The claim, which is pretty straightforward, is that their own system ran 6.3% slower with paging enabled. That's it. It's interesting, but it's not an attempt to generalize to all OSes. Again, performance was not a goal for the Singularity project, it was just an interesting side-effect that makes people like me think twice. :)
Brendan wrote:(and without mentioning RAM usage overheads anywhere at all)
Maybe because Singularity doesn't use a lot of RAM...? Why would this be worth mentioning if it wasn't a goal, or wasn't dramatically different than the other OSes being compared?
Brendan wrote:or by neglecting to mention that they've completely failed to find a sane way to implement swap space.
Heh... That's the first thing I look for when studying a new OS. :) But if you think about the larger issue, you should realize that virtual memory is just one implementation technique. In the end, what you want is to store more data than will fit in physical RAM. In other words, to treat RAM as a cache. In systems based on managed code, there are ways to achieve this in software (object persistence via serialization and others). I'm not saying it's necessarily practical, but no one will know for sure until a researcher somewhere tries it. ;)

I think you're confusing "research" with "marketing" and ignoring a lot of good ideas as a result.

---edit---

To tie this back to the original topic (sorry :P ) I think it limits the potential uses for a processor when it forces you to use certain features that you may not need. Then again, it will probably be a long time before anybody needs a 64-bit CPU in an embedded system. :)
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Multi-core CPUs

Post by Brendan »

Hi,
Colonel Kernel wrote:
Brendan wrote:It's far too tempting for researchers to end up with biased results
That's what peer review is for.
I'd prefer "non-peer review", where someone who isn't another researcher does the review. For example, did Microsoft's kernel designers look at Singularity's research papers and decide software isolation isn't worth implementing in future versions of Windows?
Colonel Kernel wrote:
Brendan wrote:like getting performance statistics from the most IPC intensive benchmark they can find (and not bothering to provide the same statistics from something that isn't IPC intensive)
That's L4, and I agree that it's pretty silly to try and extrapolate overall performance from just IPC performance.
Take a look at "Singularity: Rethinking the Software Stack" on page 45. See the nice bar graph labelled "Unsafe Code Tax" on the right which at first glance makes it look like hardware isolation has 37.7% more overhead. Now check the fine print and notice how they've gathered these statistics from "the WebFiles benchmark". They describe this benchmark as "The WebFiles benchmark is an I/O intensive benchmarks based on SPECweb99. It consists of three SIPs: a client which issues random file read requests across files with a Zipf distribution of file size, a file system, and a disk device driver.". IMHO in this case "I/O intensive" actually means "ping-pong" messaging and continuous task switching with very little processing.

Now see if you can find a similar bar graph showing overhead for CPU intensive workloads. I couldn't find it, which makes me wonder if they specifically chose an I/O intensive benchmark to skew the results in favour of software isolation. I also doubt it would've been difficult to compare this benchmark on Singularity (with software isolation) to the same benchmark running on something like L4 or even Linux, but then that might not have led to impressive figures like 37.7%.
Colonel Kernel wrote:
Brendan wrote:or neglecting to mention how well the technique scales and only testing on single-CPU
Performance and scalability are not actually goals of the Singularity project. The primary goal was dependability. AFAIK scalability is not a goal for L4 either (although I'm sure the researchers responsible would like it to scale).
I couldn't find anything about software isolation being more or less dependable in any of their research papers (but I did find performance/overhead statistics), and I don't see how using hardware isolation as an additional form of protection (on top of the software isolation, safe programming languages, verification, etc) would reduce dependability (you can't have too much protection if you don't care about performance).

Regardless of what their stated goals are, the principle of "bang per buck" applies - once you've got software isolation you'd use it to find out if it's a viable alternative to hardware isolation; and you can't do that without considering scalability. Ignoring scalability might have been acceptable 10 years ago, but not now.

Here's a true story. Last week I went to a local computer shop and asked them for a quote for a new computer. I gave them a fairly specific list of requirements, including a 2.4 GHz triple-core AMD Phenom CPU (there's something about prime numbers I like, from an OS testing point of view). I went back today and guess what? They can't get any triple-core CPUs anymore - the best they can do is a quad-core (Phenom II X4). Single-CPU systems are dead (unless you want an Intel Atom notebook, but I'm guessing manufacturers are just getting rid of old stock before pushing the newer models with dual-core Atom 330s).
Colonel Kernel wrote:Memory management data structures are a pretty big source of contention in today's commercial OSes, BTW. Getting rid of at least some of them can't hurt.
While I'm being skeptical, let me assume that they didn't include statistics for scalability because they failed to remove contention in the memory management data structures (like their "exchange heap"). ;)

The research paper I linked to above does contain "4.3 Heterogeneous Multiprocessing". It's 4 paragraphs of speculation with no indication of any results (even though they've already implemented some interesting stuff in this area).
Colonel Kernel wrote:
Brendan wrote:or "disabling" software isolation by disabling generation of array bound and other checks in the compiler when they compile one process, without doing anything to remove software isolation overhead from the OS's memory management (garbage collector, exchange heap, etc) and then saying "software isolation only costed 5%"
Maybe you read something I didn't. The impression I got from the description of the experiment was that they disabled run-time checks for all generated code (including the 95% of the OS itself that is safe code), not just for a single process. The benchmarks they ran include many processes (web server, file system, network stack, etc.). If you have a source, I'd like to see it. Otherwise I call BS on this one.
I think we both read the same paper (linked to above), where the only thing they say (on page 46) is "By comparison, the runtime overhead for safe code is under 5% (measured by disabling generation of array bound and other checks in the compiler).". They don't say which pieces of code they disabled generation of array bound and other checks for (and for all I know, they disabled generation of array bound and other checks in the compiler, but didn't actually recompile anything and just used the old binaries ;) ).

Part of my point here is that disabling the array bound and other checks in the compiler won't suddenly convert one OS (where every design decision has been made knowing that software isolation will be used) into a completely different OS (where every design decision has been made knowing that hardware isolation will be used).
Colonel Kernel wrote:
Brendan wrote:or by claiming that paging costs 6.3% because of TLB misses, etc without implementing any of the things that use paging to reduce overhead
The claim, which is pretty straightforward, is that their own system ran 6.3% slower with paging enabled. That's it. It's interesting, but it's not an attempt to generalize to all OSes. Again, performance was not a goal for the Singularity project, it was just an interesting side-effect that makes people like me think twice. :)
["Objection! Assuming facts not in evidence Your Honor!"]

The massive performance loss (that they hide with biased, inconclusive and/or omitted benchmarks) is an interesting side effect? [-X
Colonel Kernel wrote:
Brendan wrote:(and without mentioning RAM usage overheads anywhere at all)
Maybe because Singularity doesn't use a lot of RAM...? Why would this be worth mentioning if it wasn't a goal, or wasn't dramatically different than the other OSes being compared?
It's worth mentioning because there's lots of tricks you can do with paging to reduce RAM usage, that can't be done without paging. Maybe they didn't mention RAM usage because they were embarrassed at just how much RAM Singularity wastes? ;)
Colonel Kernel wrote:
Brendan wrote:or by neglecting to mention that they've completely failed to find a sane way to implement swap space.
Heh... That's the first thing I look for when studying a new OS. :)
SMP support is the first thing I look for, followed by long mode support. Swap space support is probably about fifth on my list - not because swap space is important for me, but because it gives you an indication of how well built the OS is.

It's not just about swap space though. IMHO they're comparing "software isolation with all possible optimizations" to "hardware isolation with no optimizations" and their performance/overhead findings are worthless because of this.
Colonel Kernel wrote:But if you think about the larger issue, you should realize that virtual memory is just one implementation technique. In the end, what you want is to store more data than will fit in physical RAM. In other words, to treat RAM as a cache. In systems based on managed code, there are ways to achieve this in software (object persistence via serialization and others). I'm not saying it's necessarily practical, but no one will know for sure until a researcher somewhere tries it. ;)

I think you're confusing "research" with "marketing" and ignoring a lot of good ideas as a result.
After all the work these researchers have done, don't you think it'd be nice if you could read their research papers and find out the advantages and disadvantages of software isolation? An honest comparison?

As far as I can tell the research papers are marketting - just like marketting they don't mention disadvantages.

Maybe I'll become a researcher. I have a theory that if you restrict each process to 4 bytes of code, then the scheduler could create a single "mega binary" from all the applications by copying all the little applications into a single stream of instructions, and then you could JMP to this stream of instructions. Then I could make up a name for it ("Process splicing"?) and do some benchmarks ("round robin scheduler capable of billions of task switches per second, with zero cycles per task switch!") and ignore any little disadvantages... :D


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Re: Multi-core CPUs

Post by Colonel Kernel »

Maybe you should split this thread...
Brendan wrote:
Colonel Kernel wrote:That's what peer review is for.
I'd prefer "non-peer review", where someone who isn't another researcher does the review. For example, did Microsoft's kernel designers look at Singularity's research papers and decide software isolation isn't worth implementing in future versions of Windows?
Realistically, given the backwards-compatibility requirements for Windows, I think it's safe to say that its underlying architecture will never undergo such a radical change. However, there are people at Microsoft outside MSR who pay a lot of attention to Singularity. Unfortunately I cannot say more.
Brendan wrote:Take a look at "Singularity: Rethinking the Software Stack" on page 45. See the nice bar graph labelled "Unsafe Code Tax" on the right which at first glance makes it look like hardware isolation has 37.7% more overhead. Now check the fine print and notice how they've gathered these statistics from "the WebFiles benchmark". They describe this benchmark as "The WebFiles benchmark is an I/O intensive benchmarks based on SPECweb99. It consists of three SIPs: a client which issues random file read requests across files with a Zipf distribution of file size, a file system, and a disk device driver.". IMHO in this case "I/O intensive" actually means "ping-pong" messaging and continuous task switching with very little processing.

Now see if you can find a similar bar graph showing overhead for CPU intensive workloads. I couldn't find it, which makes me wonder if they specifically chose an I/O intensive benchmark to skew the results in favour of software isolation. I also doubt it would've been difficult to compare this benchmark on Singularity (with software isolation) to the same benchmark running on something like L4 or even Linux, but then that might not have led to impressive figures like 37.7%.
CPU-intensive workloads are not interesting when trying to measure OS overhead, because in such workloads there is very little overhead.

As for comparing with L4 or Linux, that doesn't make sense for this benchmark, given the goals of the experiment (from section 4.2.1 of the doc you linked to, emphasis mine):
Singularity offers a unique opportunity to quantify the costs of hardware and software isolation in an apples-to-apples comparison. Once the costs are understood, individual systems can choose to use hardware isolation when its benefits outweigh the costs.
You are reading way too much into how the results are presented. I think I may be somewhat to blame here, since when I talk about Singularity on these forums I tend to focus a lot on software isolation, when in fact that is only a small part of what Singularity is all about (sealed process architecture, channel-based IPC, capability security model, declarative installation/configuration model, etc.).

Take the results at face value. They tell you exactly what was tested, and what the result was. They are not trying to convince you that software isolation is the Greatest Thing Since Sliced Bread.
Brendan wrote:I couldn't find anything about software isolation being more or less dependable in any of their research papers (but I did find performance/overhead statistics)
The dependability comes from the sealed process architecture, verifiable channel contracts, declarative dependency infrastructure, etc. Software isolation is how they initially chose to implement the system, but it is not a requirement for this architecture (your own OS already follows at least a few of these tenets, IIRC).

The point of research is to try new things and see what happens... to experiment. They had the chance to build an OS using managed code to find out what the consequences would be. They did it, and are learning from it. I happen to think it's a really neat idea. Don't confuse their goals with my interests. :)
Brendan wrote:and I don't see how using hardware isolation as an additional form of protection (on top of the software isolation, safe programming languages, verification, etc) would reduce dependability (you can't have too much protection if you don't care about performance).
Agreed. In fact, bewing pointed out in another thread last year the most compelling reason I've seen so far why hardware isolation is always going to be needed.
Brendan wrote:
Colonel Kernel wrote:Memory management data structures are a pretty big source of contention in today's commercial OSes, BTW. Getting rid of at least some of them can't hurt.
While I'm being skeptical, let me assume that they didn't include statistics for scalability because they failed to remove contention in the memory management data structures (like their "exchange heap"). ;)
Read about the implementation of channels and exchange heap messaging in this paper. AFAIK their implementation of IPC is entirely lock-free and should scale very well. Don't be surprised if concurrency is the next thing they're tackling....
Brendan wrote:
Colonel Kernel wrote:The claim, which is pretty straightforward, is that their own system ran 6.3% slower with paging enabled. That's it. It's interesting, but it's not an attempt to generalize to all OSes. Again, performance was not a goal for the Singularity project, it was just an interesting side-effect that makes people like me think twice. :)
["Objection! Assuming facts not in evidence Your Honor!"]

The massive performance loss (that they hide with biased, inconclusive and/or omitted benchmarks) is an interesting side effect? [-X
You're assuming at least as much as I am. If there is one thing I want you to get out of this conversation, it is this: I am way, waaaaay more biased about this topic than the Singularity researchers are. ;) Save your grain of salt for me, but please give them the benefit of the doubt. They are very smart people. :)
Brendan wrote:It's not just about swap space though. IMHO they're comparing "software isolation with all possible optimizations" to "hardware isolation with no optimizations" and their performance/overhead findings are worthless because of this.
It's not worthless at all, because this is the first time (to my knowledge) such a direct comparison has been done. It is helping decision-makers figure out what engineering trade-offs to make. Sure, the comparison scenario is not perfect, but neither is Singularity... It is a research OS, a prototype. The only way to get the comparison you want is to build a complete, commercial-grade system that can function either way (hw isolation or sw isolation, both fully optimized). That's too expensive just to satisfy your curiosity! ;)
Brendan wrote:After all the work these researchers have done, don't you think it'd be nice if you could read their research papers and find out the advantages and disadvantages of software isolation? An honest comparison?

As far as I can tell the research papers are marketting - just like marketting they don't mention disadvantages.
Sure they do, just not the disadvantages that you mention:
  • Reflection is not allowed, which requires a new compile-time template-like mechanism to be used instead.
  • Dynamic code generation is not allowed due to the sealed process invariant. This makes it difficult to implement dynamic and interpreted programming languages efficiently.
This is just off the top of my head. It is certainly not an exhaustive list, but all the facts are there for you to draw your own conclusions (which you have, although to be honest your choice to speculate on the researchers' motives rather than critique their specific technical claims is not doing much to convince me of your point of view).

Aside: Marketing has only one 't' in it. You've been making this mistake consistently for years. Sorry I didn't point it out sooner. :P

Brendan wrote:Maybe I'll become a researcher. I have a theory that if you restrict each process to 4 bytes of code, then the scheduler could create a single "mega binary" from all the applications by copying all the little applications into a single stream of instructions, and then you could JMP to this stream of instructions. Then I could make up a name for it ("Process splicing"?) and do some benchmarks ("round robin scheduler capable of billions of task switches per second, with zero cycles per task switch!") and ignore any little disadvantages... :D
Without researchers, our modern computing infrastructure (including the Internet that allows us to participate in this forum) wouldn't even exist, and then where would we be?
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: Multi-core CPUs

Post by Solar »

Colonel Kernel wrote:Without researchers, our modern computing infrastructure (including the Internet that allows us to participate in this forum) wouldn't even exist, and then where would we be?
In a better world, I believe. ;-)
Every good solution is obvious once you've found it.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: Multi-core CPUs

Post by Brendan »

Hi,
Colonel Kernel wrote:
Brendan wrote:Take a look at "Singularity: Rethinking the Software Stack" on page 45. See the nice bar graph labelled "Unsafe Code Tax" on the right which at first glance makes it look like hardware isolation has 37.7% more overhead. Now check the fine print and notice how they've gathered these statistics from "the WebFiles benchmark". They describe this benchmark as "The WebFiles benchmark is an I/O intensive benchmarks based on SPECweb99. It consists of three SIPs: a client which issues random file read requests across files with a Zipf distribution of file size, a file system, and a disk device driver.". IMHO in this case "I/O intensive" actually means "ping-pong" messaging and continuous task switching with very little processing.

Now see if you can find a similar bar graph showing overhead for CPU intensive workloads. I couldn't find it, which makes me wonder if they specifically chose an I/O intensive benchmark to skew the results in favour of software isolation. I also doubt it would've been difficult to compare this benchmark on Singularity (with software isolation) to the same benchmark running on something like L4 or even Linux, but then that might not have led to impressive figures like 37.7%.
CPU-intensive workloads are not interesting when trying to measure OS overhead, because in such workloads there is very little overhead.
For an OS that uses hardware isolation there's little overhead for CPU-intensive workloads (a little after a task switch, and a little for any kernel API calls and exceptions). For an OS that uses software isolation there's still array bounds tests, pointer checking, etc - basically all of the overhead is still there. I'd expect that hardware isolation would give better performance than software isolation for CPU-intensive workloads, especially if there's only one "ready to run" thread per CPU or there's some other reason why very little preemption occurs.

If hardware isolation does give better performance for CPU-intensive workloads and software isolation gives better performance for I/O intensive workloads, then which method gives better performance for typical/common workloads that are a mixture of I/O intensive and CPU intensive? At which point do you reach the "break even" point, where both methods give equal performance?

The basic question here is "when is software isolation better?". IMHO this question is the question all sane OS developers would ask, and this question is the question that researchers working with software isolation would be expected to answer.
Colonel Kernel wrote:As for comparing with L4 or Linux, that doesn't make sense for this benchmark, given the goals of the experiment (from section 4.2.1 of the doc you linked to, emphasis mine):
Singularity offers a unique opportunity to quantify the costs of hardware and software isolation in an apples-to-apples comparison. Once the costs are understood, individual systems can choose to use hardware isolation when its benefits outweigh the costs.
But it's not an apples-to-apples comparison. An apples-to-apples comparison would be comparing the performance of software isolation on an OS designed for software isolation to the performance of hardware isolation on an OS designed for hardware isolation. If someone added support for software isolation to an OS that was designed for hardware isolation (like Linux) and then claimed that software isolation sucks completely (because of the hacks, etc involved in making it work), then would you be convinced that software isolation sucks completely?

Once the costs are understood, individual systems can choose to use hardware isolation when its benefits outweigh the costs - it's just a pity that there's so little information to help anyone understand these costs and make an informed decision.
Colonel Kernel wrote:You are reading way too much into how the results are presented. I think I may be somewhat to blame here, since when I talk about Singularity on these forums I tend to focus a lot on software isolation, when in fact that is only a small part of what Singularity is all about (sealed process architecture, channel-based IPC, capability security model, declarative installation/configuration model, etc.).
I'm reading too much into how many results were omitted and how the benchmarks that were included were selected.

Part of the problem is that all of these things can be done (and probably have been done) without software isolation, and software isolation is the largest, most visible difference in the Singularity project; and while the researchers don't say "software isolation gives better performance", people see "37.7% slowdown for hardware isolation" (for a rare case under specific conditions) and tend to assume that software isolation improves performance in general.
Colonel Kernel wrote:
Brendan wrote:It's not just about swap space though. IMHO they're comparing "software isolation with all possible optimizations" to "hardware isolation with no optimizations" and their performance/overhead findings are worthless because of this.
It's not worthless at all, because this is the first time (to my knowledge) such a direct comparison has been done. It is helping decision-makers figure out what engineering trade-offs to make. Sure, the comparison scenario is not perfect, but neither is Singularity... It is a research OS, a prototype. The only way to get the comparison you want is to build a complete, commercial-grade system that can function either way (hw isolation or sw isolation, both fully optimized). That's too expensive just to satisfy your curiosity! ;)
What they've done is provide benchmarks that give little useful information (or even misleading information) to help decision-makers make the wrong decisions.

They don't need to write 2 full-fledged OSs. How hard would it be to take an OS like Linux or Windows and disable all the things where paging has been used to improve performance or reduce RAM usage? If they did this and found that these optimizations reduce application startup time by 50%, make already started processes run 5% faster and make the average process use 15% less RAM, then they'd be able to mention this to give their own "unsafe code tax" statistics some perspective.

How hard would it be for them to provide benchmarks for a CPU intensive workload?

How hard would it be to run their WebFiles benchmark on another OS (Windows, Linux) and compare the performance to the performance they get running the benchmark on Singularity?

All of these things would help decision-makers make better (more informed) decisions.

So, let me ask you the most important (IMHO) questions that the Singularity research should have already answered. Does software isolation provide better performance than hardware isolation in all cases? If the answer is "no", then in which cases does software isolation provide better performance? Why?


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Re: Multi-core CPUs

Post by Colonel Kernel »

Brendan wrote:For an OS that uses hardware isolation there's little overhead for CPU-intensive workloads (a little after a task switch, and a little for any kernel API calls and exceptions). For an OS that uses software isolation there's still array bounds tests, pointer checking, etc - basically all of the overhead is still there. I'd expect that hardware isolation would give better performance than software isolation for CPU-intensive workloads, especially if there's only one "ready to run" thread per CPU or there's some other reason why very little preemption occurs.
Your expectation matches the experimental results from section 4.4 of "Deconstructing Process Isolation":
The run-time overhead of language safety is slightly higher than hardware isolation for the Bartok benchmark
The Bartok benchmark is very CPU-intensive.

This tradeoff exists today even in conventional OSes. If you write code in a managed language and do a lot of array accesses in a tight loop, you're going to get lousy performance. One of the major advances in programming languages that will deal with this problem is dependent types, which allow for invariants that relate types to run-time values (e.g. -- instead of int[] as an array of ints of unknown size, you could have a type like int[](length==n) where n is a run-time parameter, variable, or expression). Forbidding dynamic loading makes this approach much more tractable since all the code is available to the compiler to do inter-procedural optimizations.
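C has no built-in bounds checks, but as a rough stand-in for what a compiler armed with such an invariant can do, compare a per-access check against a single hoisted check (illustrative only; checked_get is a made-up helper standing in for a managed array access):

Code:

#include <stddef.h>
#include <stdlib.h>

/* What naive managed-style code pays: one bounds check per element access. */
static int checked_get(const int *a, size_t len, size_t i)
{
    if (i >= len)
        abort();                         /* stand-in for an out-of-range exception */
    return a[i];
}

long sum_checked(const int *a, size_t len, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += checked_get(a, len, i);   /* n bounds checks */
    return sum;
}

/* What a compiler that can prove "len >= n" is allowed to emit:
 * one check up front, then a check-free loop. */
long sum_hoisted(const int *a, size_t len, size_t n)
{
    if (n > len)
        abort();                         /* single check establishes the invariant */
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];                     /* no per-access checks */
    return sum;
}

The second loop is roughly what a Bartok-style compiler is aiming for once the invariant is expressed in the type.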
Brendan wrote:If hardware isolation does give better performance for CPU-intensive workloads and software isolation gives better performance for I/O intensive workloads, then which method gives better performance for typical/common workloads that are a mixture of I/O intensive and CPU intensive? At which point do you reach the "break even" point, where both methods give equal performance?
The nice thing about building a system that supports both isolation mechanisms is that you could leave it up to the application developers to decide based on their own benchmarks.
Brendan wrote:The basic question here is "when is software isolation better?". IMHO this question is the question all sane OS developers would ask, and this question is the question that researchers working with software isolation would be expected to answer.
It's not up to the researchers to answer this question, because it is an engineering question, not a research question. Ask a researcher what's possible; ask an engineer what's best for a given set of circumstances.
Brendan wrote:But it's not an apples-to-apples comparison. An apples-to-apples comparison would be comparing the performance of software isolation on an OS designed for software isolation to the performance of hardware isolation on an OS designed for hardware isolation.
A good experiment varies as few variables as possible at a time. Comparing two radically different OSes introduces too many variables to reach a meaningful conclusion. I think you're still missing the point of the experiment...
Brendan wrote:If someone added support for software isolation to an OS that was designed for hardware isolation (like Linux) and then claimed that software isolation sucks completely (because of the hacks, etc involved in making it work), then would you be convinced that software isolation sucks completely?
No, but no one is claiming that hardware isolation sucks completely either. :)
Brendan wrote:Once the costs are understood, individual systems can choose to use hardware isolation when its benefits outweigh the costs - it's just a pity that there's so little information to help anyone understand these costs and make an informed decision.
That's because it's early days. This HIP/SIP comparison was only done a little under three years ago.
Brendan wrote:
Colonel Kernel wrote:It's not worthless at all, because this is the first time (to my knowledge) such a direct comparison has been done. It is helping decision-makers figure out what engineering trade-offs to make. Sure, the comparison scenario is not perfect, but neither is Singularity... It is a research OS, a prototype. The only way to get the comparison you want is to build a complete, commercial-grade system that can function either way (hw isolation or sw isolation, both fully optimized). That's too expensive just to satisfy your curiosity! ;)
What they've done is provide benchmarks that give little useful information (or even misleading information) to help decision-makers make the wrong decisions.
So... engineers might make the wrong choice and waste some time and money following a blind alley? Do you think any OS dev organization worth its salt will jump head-first into something like this without doing some experiments of their own first? I think this situation looks a lot different to independent OS developers with small-to-zero budgets, versus Microsoft and its annual R&D budget. ;) Guess who benefits most from MSR...
Brendan wrote:They don't need to write 2 full-fledged OSs. How hard would it be to take an OS like Linux or Windows and disable all the things where paging has been used to improve performance or reduce RAM usage? If they did this and found that these optimizations reduce application startup time by 50%, make already started processes run 5% faster and make the average process use 15% less RAM, then they'd be able to mention this to give their own "unsafe code tax" statistics some perspective.
This would be interesting, but it wouldn't mean much in terms of a comparison to software isolation. There are just too many variables, like the effect of "tree shaking" on RAM usage and app startup time, for example.
Brendan wrote:How hard would it be for them to provide benchmarks for a CPU intensive workload?
They did, it's just in a different paper (see above).
Brendan wrote:How hard would it be to run their WebFiles benchmark on another OS (Windows, Linux) and compare the performance to the performance they get running the benchmark on Singularity?
I don't know, because I'm not sure whether WebFiles was developed exclusively for Singularity or not. It's a question of cost/benefit, and like I said, the intent was to compare different configurations of Singularity, not to compare Singularity to other OSes. Singularity is not a commercial OS, so this would be largely pointless.
Brendan wrote:So, let me ask you the most important (IMHO) questions that the Singularity research should have already answered. Does software isolation provide better performance than hardware isolation in all cases? If the answer is "no", then in which cases does software isolation provide better performance? Why?
The answer is "no", it is already in one of the papers (see above), and I think in general the answer I would give is that SIPs are faster in situations where processes communicate a lot with each other, like for I/O.

Let me give you another example that demonstrates why this research and the HIP/SIP choice is about much more than performance. Consider this comparison: A web browser written in an unsafe language loading plug-ins as dynamic libraries, versus a managed browser that loads its plug-ins in SIPs in the same domain (address space) and communicates with them via zero-copy IPC. That would make for a very interesting experiment. If it's possible to achieve a comparable level of performance with the SIP-based approach (not even necessarily as fast, just close enough to be barely noticeable by end users), then the security benefits of going with separate processes IMO far outweigh the performance advantage of using unsafe code.
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
bewing
Member
Posts: 1401
Joined: Wed Feb 07, 2007 1:45 pm
Location: Eugene, OR, US

Re: Multi-core CPUs

Post by bewing »

Colonel Kernel wrote: ... A web browser written in an unsafe language loading plug-ins as dynamic libraries, versus a managed browser that loads its plug-ins in SIPs in the same domain (address space) and communicates with them via zero-copy IPC. That would make for a very interesting experiment. If it's possible to achieve a comparable level of performance with the SIP-based approach (not even necessarily as fast, just close enough to be barely noticeable by end users), then the security benefits of going with separate processes IMO far outweigh the performance advantage of using unsafe code.
For a theoretical comparison, that's fine ... but in reality that would not work, for exactly the reason of mine that you quoted. You can't eliminate the hardware isolation function for any process by merging the address spaces, or you'll crash.
Colonel Kernel
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada

Re: Multi-core CPUs

Post by Colonel Kernel »

bewing wrote:
Colonel Kernel wrote: ... A web browser written in an unsafe language loading plug-ins as dynamic libraries, versus a managed browser that loads its plug-ins in SIPs in the same domain (address space) and communicates with them via zero-copy IPC. That would make for a very interesting experiment. If it's possible to achieve a comparable level of performance with the SIP-based approach (not even necessarily as fast, just close enough to be barely noticeable by end users), then the security benefits of going with separate processes IMO far outweigh the performance advantage of using unsafe code.
For a theoretical comparison, that's fine ... but in reality that would not work, for exactly the reason of mine that you quoted. You can't eliminate the hardware isolation function for any process by merging the address spaces, or you'll crash.
No, I don't think so. :) Your original argument was that an OS based entirely on SIPs is not actually safe because of the effects of transient hardware failures, and I agree with you. However, using hardware isolation doesn't eliminate these failures, it just helps to contain them to a single process (unless the MMU malfunctions, but that's less likely than a single failure anywhere). Since HIPs turn hardware errors that would otherwise cause random memory corruption into crashes, if such errors were as common as you think, wouldn't our current OSes be crashing all the time?

The scenario I described above doesn't assume that the browser itself runs in ring 0 or is in the same address space as the kernel or the rest of the OS. It assumes that the browser shares an address space with its plug-ins. This is already true for popular browsers running on commercial OSes today.
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!