what's the real SLOW parts in popular OS/OS theories?

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to check whether your question is answered in the wiki first! When in doubt, post here.
os64dev
Member
Posts: 553
Joined: Sat Jan 27, 2007 3:21 pm
Location: Best, Netherlands

Re: what's the real SLOW parts in popular OS/OS theories?

Post by os64dev »

berkus wrote:Desktop is a dying breed.
A good OS is not bound to a specific platform but only restricted by limited hardware support.
Author of COBOS
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: what's the real SLOW parts in popular OS/OS theories?

Post by Brendan »

Hi,
lemonyii wrote:Now that the segmentation war is over, why don't we continue discussing OS performance?
Ok. Here's a list of things that I suspect may be causing some performance loss on modern OSs (Windows, OS X, Linux, etc.), in no particular order:
  • Anti-virus: Checking files for signatures/patterns, hooking APIs to add extra checking, etc.
  • Graphics: Higher resolutions, anti-aliased scaled fonts, alpha blending, desktop effects and animations, etc.
  • Search: Generating and maintaining indexes to improve file search speeds
  • Internationalisation: For example, you can't just print a string (you might have to load and parse a file for the current locale during application startup and find the string you want in it) and you can't just print a number (you have to decide whether to do "123 456.78" or "123,456.78" or "123.456,78"). You also need to support things like right-to-left text, Unicode canonicalisation, conversions between character encodings, etc. (a small localeconv() sketch follows this list).
  • Misplaced Optimisation: There's a difference between throughput and latency. Some OSs (Linux) tend to optimise for throughput at the expense of latency, which can make the OS feel sluggish to the end user (e.g. applications that take longer to respond to user input)
  • Ineffective Optimisation: Some of the mechanisms used by some OSs that are intended to improve performance aren't as effective as they should be. This can include "over simplistic" algorithms to determine which page/s to send to swap, which data to pre-fetch from disk, etc.
  • Lacking Optimisation: Some OSs are still trying to catch up with changes in modern systems, like multi-core and NUMA, and things like schedulers and memory management still aren't designed for these things.
  • Serialised startup: During boot, most OSs will start device drivers one at a time, then start services (network stack, file systems, daemons, etc) one at a time. Most device drivers need delays during initialisation, things like starting networking and starting file systems need to wait for IO, and modern hardware (especially multi-core/multi-CPU) isn't being used effectively.
  • DLLs/shared libraries: They have advantages (faster software development, easier code maintenance) but they have disadvantages too (slower process startup, run-time overhead).
  • Development time: For most modern user-space software the emphasis is on reducing development time. This includes using less error-prone languages and also spending less time profiling/optimising the code.
  • Scalability: It's still a major problem. Lots of software is still "single threaded" (and I suspect that a lot of software that is multi-threaded isn't as well designed as it could be).
  • Lack of prioritised asynchronous IO: While most (all?) OSs support asynchronous IO and most OSs support prioritised IO, application programmers rarely use them to improve performance. This is partly because the APIs being used are obsolete (e.g. "read()") and/or too messy (POSIX asynchronous IO) and/or inadequate (still no "aio_open()"?). A POSIX AIO sketch follows this list.
  • Lack of thread priorities: Some OSs are stupidly brain-dead when it comes to thread priorities (Linux). POSIX has no clear (portable) definition for thread priorities either, so portable applications designed for (Unix-like) OSs tend not to use thread priorities even when run on an OS that does support them properly (e.g. FreeBSD, Solaris).
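To make the internationalisation point concrete, here's a minimal (untested) C sketch using localeconv(). The "de_DE.UTF-8" locale name is an assumption - it has to be installed on the host for the second call to succeed:

Code: Select all

#include <locale.h>
#include <stdio.h>

static void print_amount(double value)
{
    /* localeconv() exposes the separators the current locale expects;
       a full implementation would also honour the grouping rules. */
    struct lconv *lc = localeconv();
    printf("decimal point \"%s\", thousands separator \"%s\": %.2f\n",
           lc->decimal_point,
           lc->thousands_sep[0] ? lc->thousands_sep : "<none>",
           value);
}

int main(void)
{
    setlocale(LC_ALL, "C");
    print_amount(123456.78);              /* "123456.78" */

    if (setlocale(LC_ALL, "de_DE.UTF-8")) /* assumed locale name */
        print_amount(123456.78);          /* decimal comma instead */
    return 0;
}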
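And here's a rough sketch of prioritised asynchronous IO with the POSIX AIO API (link with -lrt on glibc; the "/etc/hostname" path is just an assumption for the example). Note that open() itself still blocks because there is no aio_open(), and the busy-wait at the end is only there to keep the sketch short:

Code: Select all

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static char buf[4096];
    int fd = open("/etc/hostname", O_RDONLY); /* open() still blocks: no aio_open() */
    if (fd < 0) return 1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes  = fd;
    cb.aio_buf     = buf;
    cb.aio_nbytes  = sizeof buf;
    cb.aio_offset  = 0;
    cb.aio_reqprio = 4;   /* lower this request's priority by 4 (must not exceed AIO_PRIO_DELTA_MAX) */

    if (aio_read(&cb) != 0) return 1;

    /* ... do useful work here instead of blocking in read() ... */

    while (aio_error(&cb) == EINPROGRESS)
        ;                 /* real code would use a completion notification instead */

    printf("read %zd bytes asynchronously\n", aio_return(&cb));
    close(fd);
    return 0;
}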
I should point out that I haven't done any benchmarking or profiling or other tests to determine if any of the things on this list actually are causing performance loss (and/or to quantify how much performance loss on which OS/s). Also, it should be fairly obvious that a lot of the things on this list don't apply to all OSs; and some are kernel-space issues and some are user-space issues.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: what's the real SLOW parts in popular OS/OS theories?

Post by JamesM »

gravaera wrote:
JamesM wrote:
* Portable designs cannot use segmentation
* Portable designs are bloated with endian-issues
* Portable designs are always "every possible platform must support this feature".
Segmentation is slow. It's deprecated, and Intel doesn't design its chips to be fast with it any more. It optimises for paging, with a flat memory model. The small exceptions are TLS and the use of swapgs to get into the kernel.
Segmentation exists in x86-32 and is implemented quite nicely: there's an internal cache in the CPU that is reloaded on segment reload, so looking up segment descriptors and calculating offsets isn't slow at all. There's not much more optimisation they can do; on-die caching is as close to "the fastest possible lookup" as you can get.
Just because it is on-die doesn't mean it is optimised - sometimes quite the opposite. Chips are optimised for the hot paths: the hot components are designed to be close together to maximise throughput and minimise latency, while components that are used once and never again (such as during boot) are designed to share resources with other non-critical components.

Intel don't optimise for their segmentation system because they've deprecated it. It is not fast. For all you know they could shift it off into the northbridge next rev...
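If anyone wants to measure it, something like this gives a rough cycle count per segment register reload. Completely unscientific (rdtsc isn't even serialised), and it assumes a 32-bit GCC build where %ds holds a real selector - in 64-bit mode %ds is usually a null selector, so the reload would be trivial:

Code: Select all

#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    uint16_t sel;
    __asm__ __volatile__("mov %%ds, %0" : "=r"(sel));     /* current data selector */

    enum { N = 1000000 };
    uint64_t t0 = rdtsc();
    for (int i = 0; i < N; i++)
        __asm__ __volatile__("mov %0, %%es" :: "r"(sel)); /* forces a descriptor reload */
    uint64_t t1 = rdtsc();

    printf("~%.1f cycles per reload (loop overhead included)\n",
           (double)(t1 - t0) / N);
    return 0;
}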
Little-endian is the de facto standard; unless you're writing an RTOS to run in routers, you'll be using little endian.
I'm not so sure about that: at least one thing makes it probably a good idea to use a big-endian architecture for hardcore networking - the fact that IP's stack of protocols is big-endian encoded. From a pragmatic point of view, a big-endian processor has the chance to shave a lot of cycles off each network transmission.
As I said - unless you're writing a networking operating system (which by definition will be real-time, otherwise why are you writing a networking operating system?), you'll be using little-endian.
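For the sceptics, this little sketch shows where the cycles go: ntohs() is a no-op on a big-endian host but a byte swap per field on x86. The struct is just the first four bytes of a TCP header in wire order; the port values are arbitrary:

Code: Select all

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct tcp_ports {            /* first 4 bytes of a TCP header, wire order */
    uint16_t src;
    uint16_t dst;
};

int main(void)
{
    const unsigned char wire[4] = { 0x00, 0x50, 0xC3, 0x50 };  /* ports 80 and 50000 */
    struct tcp_ports hdr;
    memcpy(&hdr, wire, sizeof hdr);

    /* Little-endian hosts pay a bswap/xchg here; big-endian hosts pay nothing. */
    printf("src=%u dst=%u\n", (unsigned)ntohs(hdr.src), (unsigned)ntohs(hdr.dst));
    return 0;
}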
gravaera
Member
Posts: 737
Joined: Tue Jun 02, 2009 4:35 pm
Location: Supporting the cause: Use \tabs to indent code. NOT \x20 spaces.

Re: what's the real SLOW parts in popular OS/OS theories?

Post by gravaera »

@JamesM: thanks for the info, yo: always useful to have someone in the actual business clarify things

--All the best
gravaera
17:56 < sortie> Paging is called paging because you need to draw it on pages in your notebook to succeed at it.
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: what's the real SLOW parts in popular OS/OS theories?

Post by JamesM »

gravaera wrote:@JamesM: thanks for the info, yo: always useful to have someone in the actual business clarify things

--All the best
gravaera
I'm always a bit wary about saying where I work because of this exactly - I'm not a chip designer or hardware engineer; I'm a compiler engineer so I don't know a *huge* amount more than normal about hardware design. Especially Intel's.

Not that I think I'm wrong, I just don't want you to think I'm using an Industry Hammer to back my own opinions and thoughts up. I'm not :)
xfelix
Member
Posts: 25
Joined: Fri Feb 18, 2011 5:40 pm

Re: what's the real SLOW parts in popular OS/OS theories?

Post by xfelix »

We are also willing to put up with extra overhead for protection. Safeguards such as ASLR (Address Space Layout Randomisation) randomise the virtual address layout: the code/data/stack regions are placed at randomised addresses at load time. This greatly deters attackers from exploiting buffer overflows, since they no longer know where their smuggled code is sitting. I think there are still ways around it, though (e.g. return-to-libc style attacks that work with relative addresses).
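You can watch ASLR at work with a trivial program - run it twice and, on an OS with ASLR enabled, the addresses change between runs:

Code: Select all

#include <stdio.h>
#include <stdlib.h>

int global_data = 42;

int main(void)
{
    int   stack_var = 0;
    void *heap_ptr  = malloc(16);

    printf("code : %p\n", (void *)main);         /* text segment      */
    printf("data : %p\n", (void *)&global_data); /* data segment      */
    printf("stack: %p\n", (void *)&stack_var);   /* stack             */
    printf("heap : %p\n", heap_ptr);             /* heap              */

    free(heap_ptr);
    return 0;
}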
The More I See, The More I See There Is To See!
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: what's the real SLOW parts in popular OS/OS theories?

Post by FlashBurn »

That's a good example (ASLR): with segmentation (if you used it right) you wouldn't need something like that, and you also wouldn't need the NX bit. I think it would speed things up a little (but I have no proof, just a feeling ;) ).
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: what's the real SLOW parts in popular OS/OS theories?

Post by JamesM »

That's a good example (ASLR): with segmentation (if you used it right) you wouldn't need something like that, and you also wouldn't need the NX bit. I think it would speed things up a little (but I have no proof, just a feeling).
Not quite - address space randomisation exists to remove the ability to predict where a particular piece of code is. Segmentation would actually increase that predictability dramatically, as a piece of code will always be at the same position relative to a segment selector.

Unless you plan to have some sort of fine-grained access rights about which code from which selector can access which data/code from another? ;)
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: what's the real SLOW parts in popular OS/OS theories?

Post by FlashBurn »

I would say code segments are read-only and you can't execute data segments. So where is the problem? Another idea would be to make two stack segments, one for the call graph and one for the data, so you can't manipulate the return address. That should be safe, shouldn't it?
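Roughly what the two-stack idea looks like if you spell it out by hand in portable C - a real system would need the compiler/CPU (or segments) to do this transparently, and all the names here are made up for illustration:

Code: Select all

#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The control stack lives apart from the data stack, so an overflow of a
   buffer on the data stack cannot reach the saved control-flow targets. */
static void (*control_stack[64])(const char *);
static int control_top;

static void copy_input(const char *input)
{
    char buf[16];                         /* data lives on the ordinary stack */
    strncpy(buf, input, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    printf("copied: %s\n", buf);
}

static void checked_call(void (*fn)(const char *), const char *arg)
{
    control_stack[control_top++] = fn;           /* push control data   */
    fn(arg);
    assert(control_stack[--control_top] == fn);  /* detect tampering    */
}

int main(void)
{
    checked_call(copy_input, "hello");
    return 0;
}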
rdos
Member
Posts: 3310
Joined: Wed Oct 01, 2008 1:55 pm

Re: what's the real SLOW parts in popular OS/OS theories?

Post by rdos »

FlashBurn wrote:I would say code segments are read-only and you can't execute data segments. So where is the problem? Another idea would be to make two stack segments, one for the call graph and one for the data, so you can't manipulate the return address. That should be safe, shouldn't it?
Exactly. In a flat memory model, the code segment can be written through the DS selector, unless the OS maps the pages in the image as read-only, which is not always the case. In a segmented memory model, the application cannot write to its own code segments because there exists no selector with write access (a code selector is at most readable).
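For reference, this is what that looks like at the descriptor level - two flat 4 GiB ring-3 segments, with the access-byte type fields taken from the IA-32 manuals. A code descriptor can be execute-only or execute/read, never writable; a data descriptor can be read-only or read/write:

Code: Select all

#include <stdint.h>

struct gdt_entry {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_mid;
    uint8_t  access;      /* P | DPL | S | type */
    uint8_t  gran;        /* G | D/B | L | AVL | limit[19:16] */
    uint8_t  base_high;
} __attribute__((packed));

/* access byte: P=1, DPL=3, S=1 (code/data), then the 4-bit type */
#define ACC_CODE_EXEC_READ  0xFA   /* type 1010b: execute/read - never writable */
#define ACC_DATA_READ_WRITE 0xF2   /* type 0010b: read/write                    */

/* e.g. loaded into the GDT by the kernel: base 0, limit 4 GiB, 32-bit */
static struct gdt_entry user_code = {
    .limit_low = 0xFFFF, .access = ACC_CODE_EXEC_READ,  .gran = 0xCF,
};
static struct gdt_entry user_data = {
    .limit_low = 0xFFFF, .access = ACC_DATA_READ_WRITE, .gran = 0xCF,
};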
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: what's the real SLOW parts in popular OS/OS theories?

Post by JamesM »

FlashBurn wrote:I would say code segments are read-only and you can't execute data segments. So where is the problem? Another idea would be to make two stack segments, one for the call graph and one for the data, so you can't manipulate the return address. That should be safe, shouldn't it?
Two things:

(1) Malicious code execution isn't the only type of exploit. Data mining can be just as harmful, and you won't stop that with just read/write permissions (these are implemented in paging anyway, as well as no-execute, so I don't see what you're gaining here). ASR is implemented on top of permission-based paging systems precisely for this reason. The only advantage you're gaining with segmentation, again, is protection at sub-page granularity.

(2) Self-modifying code needs to be able to write to its own code.
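To illustrate (2): with page-level W^X, the self-modifying/JIT case has to ask the OS to flip permissions explicitly. A POSIX sketch - the machine-code bytes assume an x86-64 host, and the hardcoded 4096 assumes 4 KiB pages:

Code: Select all

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code for: mov eax, 42; ret */
    static const unsigned char payload[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    unsigned char *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return 1;

    memcpy(page, payload, sizeof payload);

    /* Drop write, add execute: the page is never writable and executable at once. */
    if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) return 1;

    int (*fn)(void) = (int (*)(void))page;  /* common POSIX practice, not strict ISO C */
    printf("%d\n", fn());                   /* prints 42 */
    return 0;
}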
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: what's the real SLOW parts in popular OS/OS theories?

Post by FlashBurn »

What do you mean by data mining? And how would ASR help there?
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: what's the real SLOW parts in popular OS/OS theories?

Post by JamesM »

FlashBurn wrote:What do you mean by data mining? And how would ASR help there?
Reading data you shouldn't be able to read - essentially guessing the location of a sensitive buffer and reading its contents.

ASR helps by randomising the location of that buffer.
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: what's the real SLOW parts in popular OS/OS theories?

Post by FlashBurn »

Yeah, but to read some data you want, you need to execute your own code, don't you? And to use the data you would need to send it somewhere. So how will you do that if you can't insert your code and run it?
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: what's the real SLOW parts in popular OS/OS theories?

Post by JamesM »

FlashBurn wrote:Yeah, but to read some data you want, you need to execute your own code, don't you? And to use the data you would need to send it somewhere. So how will you do that if you can't insert your code and run it?
This assumes you've already found a runnable-code exploit. Or you could rewrite a pointer so the subject program does the read for you.
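A contrived sketch of the pointer-rewrite case. The overflowing copy spills past an 8-byte field into the pointer behind it, and the program then leaks the secret itself - no injected code ever runs. All the names are made up for illustration, and it assumes there is no padding between the two fields (true on typical 64-bit ABIs). Note the attacker must know the secret's address to build the payload, which is exactly the knowledge address randomisation takes away:

Code: Select all

#include <stdio.h>
#include <string.h>

static char secret[] = "the root password";

struct session {
    char name[8];            /* attacker-controlled input lands here  */
    const char *motd;        /* the program later prints through this */
};

int main(void)
{
    struct session s = { .name = "guest", .motd = "welcome!" };

    /* Simulated overflow: 16 bytes written into an 8-byte field,
       overwriting the pointer behind it with &secret. */
    unsigned char payload[sizeof s.name + sizeof s.motd];
    const char *target = secret;
    memset(payload, 'A', sizeof s.name);
    memcpy(payload + sizeof s.name, &target, sizeof target);
    memcpy(&s, payload, sizeof payload);   /* the overflowing copy */

    printf("motd: %s\n", s.motd);          /* the program leaks the secret */
    return 0;
}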