Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Wed Mar 23, 2011 2:16 pm
by os64dev
berkus wrote:Desktop is a dying breed.
A good OS is not bound to a specific platform; it is only restricted by limited hardware support.
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Wed Mar 23, 2011 2:27 pm
by Brendan
Hi,
lemonyii wrote:At the end of the segment war, why don't we continue discussing the OS performance?
Ok. Here's a list of things that I suspect may be causing some performance loss on modern OSs (Windows, OS X, Linux, etc), in no particular order:
- Anti-virus: Checking files for signatures/patterns, hooking APIs to add extra checking, etc.
- Graphics: Higher resolutions, anti-aliased scaled fonts, alpha blending, desktop effects and animations, etc
- Search: Generating and maintaining indexes to improve file search speeds
- Internationalisation: For example, you can't just print a string (you might have to load and parse a file for the current locale during application startup and find the string you want in it) and you can't just print a number (you have to decide whether to do "123 456.78" or "123,456.78" or "123.456,78"). You also need to support things like right-to-left text, Unicode canonicalisation, conversions between character encodings, etc. (see the first sketch after this list).
- Misplaced Optimisation: There's a difference between throughput and latency. Some OSs (Linux) tend to optimise for throughput at the expense of latency, which can make the OS feel sluggish to the end user (e.g. applications that take longer to respond to user input)
- Ineffective Optimisation: Some of the mechanisms used by some OSs that are intended to improve performance aren't as effective as they should be. This can include "over simplistic" algorithms to determine which page/s to send to swap, which data to pre-fetch from disk, etc.
- Lacking Optimisation: Some OSs are still trying to catch up with changes in modern systems, like multi-core and NUMA; schedulers and memory management still aren't designed with these in mind.
- Serialised startup: During boot, most OSs will start device drivers one at a time, then start services (network stack, file systems, daemons, etc) one at a time. Most device drivers need delays during initialisation, things like starting networking and starting file systems need to wait for IO, and modern hardware (especially multi-core/multi-CPU) isn't being used effectively.
- DLLs/shared libraries: They have advantages (faster software development, easier code maintenance) but they have disadvantages too (slower process startup, run-time overhead)
- Development time: For most modern user-space software, the emphasis is on reducing development time. This includes using less error-prone languages and also spending less time profiling/optimising the code.
- Scalability: It's still a major problem. Lots of software is still "single threaded" (and I suspect that a lot of software that is multi-threaded isn't as well designed as it could be).
- Lack of prioritised asynchronous IO: While most (all?) OSs support asynchronous IO and most OSs support prioritised IO, application programmers rarely use them to improve performance. This is partly because the APIs being used are obsolete (e.g. "read()") and/or too messy (POSIX asynchronous IO) and/or inadequate (still no "aio_open()"?). See the second sketch after this list.
- Lack of thread priorities: Some OSs are stupidly brain-dead when it comes to thread priorities (Linux). POSIX has no clear (portable) definition for thread priorities either, so portable applications designed for (Unix-like) OSs tend not to use thread priorities even when they are run on an OS that does support them properly (e.g. FreeBSD, Solaris).
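To make the internationalisation point concrete, here is a minimal sketch (assuming a glibc-style libc where printf supports the POSIX ' grouping flag) of how even printing a number becomes locale-dependent:
Code:
#include <locale.h>
#include <stdio.h>

int main(void)
{
    double amount = 123456.78;

    /* In the default "C" locale this is always 123456.78 */
    printf("C locale       : %.2f\n", amount);

    /* Switch to the user's locale from the environment; in de_DE.UTF-8
     * the same value comes out as 123.456,78.  The ' flag (a POSIX
     * extension) asks for the locale's thousands separator. */
    setlocale(LC_ALL, "");
    printf("current locale : %'.2f\n", amount);

    return 0;
}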
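And a minimal POSIX AIO sketch for the asynchronous IO point (assuming a POSIX system; link with -lrt on older glibc, and the file name is just an example). It works, but the boilerplate compared to a plain read(), plus the still-missing aio_open(), go some way to explaining why it's rarely used:
Code:
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);  /* open() itself still blocks */
    if (fd < 0) { perror("open"); return 1; }

    static char buf[256];
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes  = fd;
    cb.aio_buf     = buf;
    cb.aio_nbytes  = sizeof buf - 1;
    cb.aio_offset  = 0;
    cb.aio_reqprio = 0;            /* can lower the request's priority */

    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

    /* ... do other useful work here instead of blocking ... */

    while (aio_error(&cb) == EINPROGRESS)
        ;                          /* a real program would suspend, not spin */

    ssize_t n = aio_return(&cb);
    if (n >= 0) {
        buf[n] = '\0';
        printf("read %zd bytes: %s", n, buf);
    }
    close(fd);
    return 0;
}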
I should point out that I haven't done any benchmarking or profiling or other tests to determine if any of the things on this list actually are causing performance loss (and/or to quantify how much performance loss on which OS/s). Also, it should be fairly obvious that a lot of the things on this list don't apply to all OSs; and some are kernel-space issues and some are user-space issues.
Cheers,
Brendan
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Wed Mar 23, 2011 4:17 pm
by JamesM
gravaera wrote:JamesM wrote:* Portable designs cannot use segmentation
* Portable designs are bloated with endian-issues
* Portable designs are always "every possible platform must support this feature".
Segmentation is slow. It's deprecated, and Intel doesn't design its chips to be fast with it any more. It optimises for paging, with a flat memory model. The small exception is TLS and use of swapgs to get into the kernel.
Segmentation exists in x86-32 and is implemented quite nicely: there's an internal cache in the CPU that is reloaded on segment reload, so looking up segment descriptors and calculating offsets isn't slow at all. There's not much more optimization they can do. On-die caching is as close to "the fastest possible lookup" as you can get.
Just because it is on-die doesn't mean it is optimised. Sometimes quite the opposite: chips are optimised for the hot paths - the hot components are designed to be close together to maximise throughput and minimise latency - while components that are used once and never again (such as during bootup) are designed to share resources with other non-critical components.
Intel don't optimise for their segmentation system because they've deprecated it. It is not fast. For all you know they could shift it off into the northbridge next rev...
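To make the TLS exception above concrete, here's a tiny sketch (assuming GCC or Clang on x86-64 Linux): thread-local variables are addressed relative to the %fs segment base, one of the few places segmentation still sits on the fast path:
Code:
#include <stdio.h>

/* Each thread gets its own copy; the compiler typically addresses it
 * as an offset from the %fs segment base rather than a flat address. */
__thread int per_thread_counter = 0;

int main(void)
{
    /* e.g. emitted as something like: addl $1, %fs:per_thread_counter@tpoff */
    per_thread_counter++;
    printf("counter = %d\n", per_thread_counter);
    return 0;
}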
Little-endian is the de facto standard; unless you're writing an RTOS to run in routers, you'll be using little endian.
I'm not so sure about that: at least one thing makes a big-endian architecture probably a good idea for hardcore networking - the fact that the IP stack of protocols is big-endian encoded. At least from a pragmatic point of view, a big-endian processor has the chance to shave a lot of cycles off each network transmission.
As I said - unless you're writing a networking operating system (which by definition will be real-time, otherwise why are you writing a networking operating system?), you'll be using little endian.
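A small sketch of the byte-order point (assuming a POSIX system): IP protocol fields are big-endian on the wire, so a little-endian host pays for a swap on every conversion while a big-endian host gets a no-op:
Code:
#include <stdio.h>
#include <arpa/inet.h>   /* htons()/htonl() */

int main(void)
{
    unsigned short port = 8080;          /* 0x1F90 in host order */

    /* On x86 (little-endian) this compiles to a byte swap;
     * on a big-endian CPU it is simply the identity. */
    unsigned short wire = htons(port);

    printf("host order: 0x%04x  network order: 0x%04x\n", port, wire);
    return 0;
}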
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Wed Mar 23, 2011 4:21 pm
by gravaera
@JamesM: thanks for the info, yo: always useful to have someone in the actual business clarify things
--All the best
gravaera
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Wed Mar 23, 2011 4:56 pm
by JamesM
gravaera wrote:@JamesM: thanks for the info, yo: always useful to have someone in the actual business clarify things
--All the best
gravaera
I'm always a bit wary about saying where I work because of this exactly - I'm not a chip designer or hardware engineer; I'm a compiler engineer so I don't know a *huge* amount more than normal about hardware design. Especially Intel's.
Not that I think I'm wrong; I just don't want you to think I'm using an Industry Hammer to back my own opinions and thoughts up. I'm not.
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Wed Mar 23, 2011 10:07 pm
by xfelix
We are also willing to put up with extra overhead for protection. Safe-guards such as ASLR (Address Space Layout Randomization) randomise the virtual address layout: the code/data/stack regions are placed at randomised addresses when the program is loaded. This greatly deters hackers from exploiting buffer overflows, since they won't know where their smuggled code is sitting anymore. I think there are still ways around this, though (return 2 relative).
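A quick way to see this in action (a sketch assuming Linux; with a non-PIE binary the code address stays put while the stack and heap still move): run it a few times and watch the addresses change between runs when ASLR is enabled:
Code:
#include <stdio.h>
#include <stdlib.h>

int global_data = 42;

int main(void)
{
    int   stack_var = 0;
    void *heap_ptr  = malloc(16);

    printf("code : %p\n", (void *)main);
    printf("data : %p\n", (void *)&global_data);
    printf("stack: %p\n", (void *)&stack_var);
    printf("heap : %p\n", heap_ptr);

    free(heap_ptr);
    return 0;
}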
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Thu Mar 24, 2011 1:43 am
by FlashBurn
That's a good example (ASLR): with segmentation (if you used it right) you wouldn't need something like that, and you also wouldn't need the NX bit. I think it would speed things up a little (but I have no proof, just a feeling).
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Thu Mar 24, 2011 5:30 am
by JamesM
FlashBurn wrote:That's a good example (ASLR): with segmentation (if you used it right) you wouldn't need something like that, and you also wouldn't need the NX bit. I think it would speed things up a little (but I have no proof, just a feeling).
Not quite - address space randomisation is there to remove the ability to predict where a particular piece of code is. Segmentation would actually make that prediction much easier, as a piece of code will always be at the same position relative to a segment selector.
Unless you plan to have some sort of fine-grained access rights controlling which code from one selector can access which data/code from another?
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Thu Mar 24, 2011 5:48 am
by FlashBurn
I would say code segments are read-only and you can't execute data segments. So where is the problem? Another idea would be to make 2 stack segments, one for return addresses (the call chain) and one for data, so you can't manipulate the return address. That should be safe, shouldn't it?
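The second idea is essentially a shadow stack. Here's a rough software sketch of the same principle (using GCC's __builtin_return_address rather than a second stack segment, purely for illustration): keep return addresses in a separate store that a data overflow can't reach, and check them on the way out:
Code:
#include <stdio.h>
#include <stdlib.h>

/* Separate store for return addresses - the "second stack". */
static void *shadow[1024];
static int   shadow_top;

#define PROLOGUE() (shadow[shadow_top++] = __builtin_return_address(0))
#define EPILOGUE()                                                   \
    do {                                                             \
        if (shadow[--shadow_top] != __builtin_return_address(0)) {   \
            fprintf(stderr, "return address tampered with!\n");      \
            abort();                                                 \
        }                                                            \
    } while (0)

static void vulnerable(const char *input)
{
    PROLOGUE();
    char buf[16];
    /* Imagine an overflow of buf corrupting the on-stack return address
     * here; the shadow copy above would still hold the original value. */
    (void)buf; (void)input;
    EPILOGUE();
}

int main(void)
{
    vulnerable("hello");
    puts("shadow stack check passed");
    return 0;
}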
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Thu Mar 24, 2011 11:31 am
by rdos
FlashBurn wrote:I would say code segments are read-only and you can't execute data segments. So where is the problem? Another idea would be to make 2 stack segments, one for return addresses (the call chain) and one for data, so you can't manipulate the return address. That should be safe, shouldn't it?
Exactly. In a flat memory model, the code segment can be written through the DS selector unless the OS maps the pages in the image as read-only, which is not always the case. In a segmented memory model, the application cannot write to its own code segments because there exists no selector with write access (a code selector is at most readable).
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Thu Mar 24, 2011 3:17 pm
by JamesM
FlashBurn wrote:I would say code segments are read-only and you can't execute data segments. So where is the problem? Another idea would be to make 2 stack segments, one for return addresses (the call chain) and one for data, so you can't manipulate the return address. That should be safe, shouldn't it?
Two things:
(1) Malicious code execution isn't the only type of exploit. Data mining can be just as harmful, and you won't stop that with just read/write permissions (these are implemented in paging anyway, as is no-execute, so I don't see what you're gaining here). ASR is implemented on top of permission-based paging systems precisely for this reason. The only advantage you're gaining with segmentation, again, is having protection at sub-page granularity.
(2) Self-modifying code needs to be able to write to its own code.
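A sketch of the paging point in (1), assuming x86-64 Linux: page-level read/write/no-execute already gives you the usual W^X discipline in a flat memory model - map a page writable, fill it, then flip it to read+execute before running it:
Code:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t page = 4096;

    /* Writable (but not executable) while we fill it. */
    unsigned char *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    /* x86-64 machine code for: mov eax, 42 ; ret */
    unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };
    memcpy(buf, code, sizeof code);

    /* Drop write access before allowing execution (W^X). */
    if (mprotect(buf, page, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect"); return 1;
    }

    /* Casting a data pointer to a function pointer is non-standard C,
     * but works on the POSIX systems this sketch assumes. */
    int (*fn)(void) = (int (*)(void))buf;
    printf("returned %d\n", fn());   /* prints: returned 42 */

    munmap(buf, page);
    return 0;
}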
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Thu Mar 24, 2011 3:27 pm
by FlashBurn
What do you mean by data mining? And how would ASR help there?
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Thu Mar 24, 2011 3:33 pm
by JamesM
FlashBurn wrote:What do you mean with data mining? And how would ASR help there?
Reading data you shouldn't be able to read: essentially guessing the location of a sensitive buffer and reading its contents.
ASR helps by randomising the location of that buffer.
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Thu Mar 24, 2011 3:41 pm
by FlashBurn
Yeah, but to read some data you want, you need to execute your code, don't you? And to use this data you would need to send it somewhere. So how will you do that if you can't insert your code and run it?
Re: what's the real SLOW parts in popular OS/OS theories?
Posted: Thu Mar 24, 2011 4:03 pm
by JamesM
FlashBurn wrote:Yeah, but to read some data you want, you need to execute your code, don't you? And to use this data you would need to send it somewhere. So how will you do that if you can't insert your code and run it?
This assumes you've already found a runnable-code exploit. Or you could rewrite a pointer so the subject program does the read for you.
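A deliberately contrived sketch of that last point (hypothetical struct layout, standard C): overflowing a buffer that sits next to a pointer lets the attacker redirect a read the program was going to do anyway - no injected code, and the only address that must be guessed is exactly the one ASR would have randomised:
Code:
#include <stdio.h>
#include <string.h>

static const char secret[] = "top-secret key material";

struct session {
    char        name[16];
    const char *msg;              /* happens to sit right after the buffer */
};

int main(void)
{
    struct session s;
    s.msg = "hello";

    /* Simulated attacker input: 16 filler bytes followed by a pointer
     * value of the attacker's choosing (here, the address of secret -
     * the value that address space randomisation makes hard to guess). */
    unsigned char evil[sizeof s.name + sizeof s.msg];
    const char *target = secret;
    memset(evil, 'A', sizeof s.name);
    memcpy(evil + sizeof s.name, &target, sizeof target);

    /* The unchecked copy overflows name and rewrites msg ... */
    memcpy(&s, evil, sizeof evil);

    /* ... so the program now does the sensitive read on the attacker's behalf. */
    printf("message: %s\n", s.msg);
    return 0;
}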