what's the real SLOW parts in popular OS/OS theories?

lemonyii
Member
Posts: 153
Joined: Thu Mar 25, 2010 11:28 pm
Location: China

Re: what's the real SLOW parts in popular OS/OS theories?

Post by lemonyii »

FlashBurn wrote:I would say code segments are read-only and you can't execute data segments. So where is the problem? Another idea would be to make 2 stack segments, one for the call graph and one for the data, so you can't manipulate the return address; that should be safe, shouldn't it?
That's a stupid idea. If your code is unsafe, then even with everything separated it will still overwrite something, so what is your definition of "safe"? Not overwriting code and return addresses, but overwriting other data is fine?
And as for protection, paging is enough. Privileged/unprivileged, read/write: isn't that enough, or should an OS take responsibility for ensuring that every piece of code is safe? It's something like this: the OS is a country, which will keep you from killing others and from accessing government secrets, but it will not keep you from committing suicide.
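For reference, those protections are literally just a few bits in each x86 page-table entry; a minimal sketch (the descriptive names are mine, not from any particular kernel):

Code: Select all

#include <stdint.h>

/* x86 page-table entry permission bits (NX needs PAE/long mode + EFER.NXE). */
#define PTE_PRESENT   (1ull << 0)
#define PTE_WRITABLE  (1ull << 1)   /* read/write vs read-only */
#define PTE_USER      (1ull << 2)   /* unprivileged vs supervisor-only */
#define PTE_NX        (1ull << 63)  /* no-execute */

/* Typical mappings built from just these bits: */
static const uint64_t user_code = PTE_PRESENT | PTE_USER;                /* r-x */
static const uint64_t user_data = PTE_PRESENT | PTE_USER | PTE_WRITABLE
                                | PTE_NX;                                /* rw- */
static const uint64_t kernel_rw = PTE_PRESENT | PTE_WRITABLE | PTE_NX;   /* rw-, ring 0 only */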
And as for MY TOPIC, this discussion really doesn't have much effect on performance, right? So if it continues, please discuss some performance topics. Thank you.
Enjoy my life!------A fish with a tattooed retina
j8long
Posts: 12
Joined: Wed Jun 09, 2010 8:13 am

Re: what's the real SLOW parts in popular OS/OS theories?

Post by j8long »

Bad code achieves it.
stranger
Posts: 17
Joined: Thu Mar 03, 2011 4:05 am

Re: what's the real SLOW parts in popular OS/OS theories?

Post by stranger »

xfelix wrote:The code/data/stack segments are randomized at runtime.
Hi, can you provide examples of systems that use runtime randomization? I am interested in the implementation details (pointer-update algorithm, how often it is randomized, overhead cost, etc.). For example, PaX, AFAIK, uses only load-time randomization, making daemons that fork() vulnerable to brute-force attacks, but I suppose they have good reasons (overhead, code limitations) not to implement runtime randomization.
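To illustrate why fork() matters here: the child inherits the parent's already-randomized address space, so the layout is never re-rolled per worker. A minimal demonstration (plain POSIX; it prints the same address in parent and child):

Code: Select all

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* The address of main() is fixed once, at load time, by ASLR. */
    printf("parent: main = %p\n", (void *)main);

    pid_t pid = fork();
    if (pid == 0) {
        /* The child inherits the same layout, so an attacker who can crash
           and re-trigger forked workers can guess addresses byte by byte
           without the layout ever changing. */
        printf("child:  main = %p\n", (void *)main);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return 0;
}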
Owen
Member
Posts: 1700
Joined: Fri Jun 13, 2008 3:21 pm
Location: Cambridge, United Kingdom

Re: what's the real SLOW parts in popular OS/OS theories?

Post by Owen »

FlashBurn wrote:I would say code segments are read-only and you can't execute data segments. So where is the problem? Another idea would be to make 2 stack segments, one for the call graph and one for the data, so you can't manipulate the return address; that should be safe, shouldn't it?
In the age of non-executable stacks, ASLR's main purpose is defense against 'Return-to-Libc' and similar attacks: Attacks where you craft a backwards stack frame which invokes methods of libc (or some other system library) of your choosing. This is a pretty trivial process on x86; significantly less so on amd64 and other architectures which have adopted register-based calling conventions.
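For anyone following along, a minimal sketch of the kind of code such an attack targets; the function, buffer size and payload layout are illustrative, not taken from any real exploit:

Code: Select all

#include <string.h>

/* A classic stack-smashing target: caller-controlled input can overrun
   'buf' and overwrite the saved return address stored above it. */
void parse_request(const char *input)
{
    char buf[64];
    strcpy(buf, input);    /* no bounds check */
}

/*
 * With a non-executable stack the attacker cannot jump into 'buf' itself.
 * Instead, on 32-bit x86 (cdecl, arguments passed on the stack) the
 * overflow is crafted so the saved return address becomes the address of
 * a libc function such as system(), followed by a fake return address and
 * a pointer to an attacker-chosen string:
 *
 *   [ padding to fill buf and saved EBP ][ &system ][ junk ][ &"/bin/sh" ]
 *
 * Randomizing where libc (and the stack) sit is what makes those two
 * addresses hard to guess. On amd64 the first argument travels in a
 * register, so the attacker also needs a gadget to load it, which is why
 * the attack is "significantly less" trivial there.
 */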

I would say that segmentation makes things worse here: If you do randomize the segments, you still have less room to do so, and a segmentation based OS is likely to have some pre-defined special entry points (e.g. call gates) with static numbers.
Brendan wrote:
  • DLLs/shared libraries: They have advantages (faster software development, easier code maintenance) but they have disadvantages too (slower process startup, run-time overhead)
I'd probably say this is actually a net win; while it adds runtime overhead, it provides a significant reduction in cache pressure. I'm inclined to believe Apple here when they say that compiling with optimization for size produces the best performance for most of their system libraries.
Brendan wrote:
  • Lack of prioritised asynchronous IO: While most (all?) OSs support asynchronous IO and most OSs support prioritised IO; application programmers rarely use it to improve performance. This is partly because the APIs being used are obsolete (e.g. "read()") and/or too messy (POSIX asynchronous IO) and/or inadequate (still no "aio_open()"?).
Also, on many systems (e.g. Linux) support for AIO is limited to the point of verging on non-existent (kernel support is minimal; userspace support is emulated using threads). POSIX traditionally only supports waiting on file descriptors, with AIO completion delivered via (realtime) signals and select; realtime signals are not widely implemented, and signals themselves have problems in general.

This is an area where POSIX could learn a lot from Windows NT 3...
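To show what is being criticised, here is a minimal POSIX AIO read that polls for completion (error handling trimmed; the file name is made up):

Code: Select all

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static char buf[4096];
    int fd = open("example.dat", O_RDONLY);      /* hypothetical file */
    if (fd < 0)
        return 1;

    /* Every request is described by a control block... */
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;
    /* ...and completion is reported out-of-band: poll aio_error(), take a
       signal, or have the library spawn a thread (SIGEV_THREAD). There is
       no aio_open(), and the aio_reqprio priority hint may be ignored. */

    if (aio_read(&cb) != 0)
        return 1;
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);                            /* crude wait, for brevity */

    printf("read %zd bytes\n", aio_return(&cb));
    close(fd);
    return 0;
}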
Brendan wrote:
  • Lack of thread priorities: Some OSs are stupidly brain-dead when it comes to thread priorities (Linux). POSIX has no clear (portable) definition for thread priorities either, so portable applications designed for (Unix-like) OSs tend not to use thread priorities when they are run on an OS that does support it properly (e.g. FreeBSD, Solaris).
See my comment for AIO...
rdos
Member
Posts: 3310
Joined: Wed Oct 01, 2008 1:55 pm

Re: what's the real SLOW parts in popular OS/OS theories?

Post by rdos »

Owen wrote:
FlashBurn wrote:I would say code segments are read-only and you can't execute data segments. So where is the problem? Another idea would be to make 2 stack segments, one for the call graph and one for the data, so you can't manipulate the return address; that should be safe, shouldn't it?
In the age of non-executable stacks, ASLR's main purpose is defense against 'Return-to-Libc' and similar attacks: Attacks where you craft a backwards stack frame which invokes methods of libc (or some other system library) of your choosing. This is a pretty trivial process on x86; significantly less so on amd64 and other architectures which have adopted register-based calling conventions.
* OpenWatcom supports register-based calling conventions for 32-bit targets
* I don't see the problem with returning to libc. Sane designs do not depend on a user-level library as a protection enforcer.
Owen wrote:I would say that segmentation makes things worse here: If you do randomize the segments, you still have less room to do so, and a segmentation based OS is likely to have some pre-defined special entry points (e.g. call gates) with static numbers.
* You cannot return to a call gate
* A sane call gate design can handle applications calling random gates, just like a centralized kernel entry point must support applications calling random functions. There is no difference here.
* Call gates ARE the trusted API, and as such can be (but need not be) encapsulated in libc and/or other libraries.
* I don't use static call gates. I allocate GDT selectors for call gates on first use of an API function by an application; the static numbers are the gate numbers, not the call gate selectors (see the sketch after this list).
* When call gates are used in flat-mode applications, there are basically no far calls (i.e. call gate accesses) that are not part of the API definition, because flat-mode compilers do not generate such code.
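For readers unfamiliar with the mechanism: installing a 32-bit call gate just means writing an 8-byte descriptor into the GDT whose selector user code can then far-call. A minimal sketch (descriptor layout per the x86 manuals; the gdt array, entry stub and allocation policy are placeholders, not RDOS code):

Code: Select all

#include <stdint.h>

/* 8-byte x86 call gate descriptor (32-bit gate, type 0xC). */
struct call_gate {
    uint16_t offset_low;    /* target offset, bits 15:0 */
    uint16_t selector;      /* code segment of the handler */
    uint8_t  param_count;   /* dwords copied from the caller's stack */
    uint8_t  type_attr;     /* P | DPL | type */
    uint16_t offset_high;   /* target offset, bits 31:16 */
};

extern struct call_gate gdt[];   /* placeholder: the real GDT */
extern void api_entry(void);     /* placeholder kernel entry stub */

/* Install a gate in a free GDT slot and return the selector user code
   must far-call. In the scheme described above this would happen lazily,
   on an application's first use of the API function. */
uint16_t install_call_gate(int slot, uint16_t kernel_cs)
{
    uint32_t target = (uint32_t)(uintptr_t)api_entry;

    gdt[slot].offset_low  = target & 0xFFFF;
    gdt[slot].selector    = kernel_cs;
    gdt[slot].param_count = 0;
    gdt[slot].type_attr   = 0xEC;        /* present, DPL 3, 32-bit call gate */
    gdt[slot].offset_high = target >> 16;

    return (uint16_t)(slot << 3);        /* GDT selector for this slot */
}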
Owen wrote:
Brendan wrote:
  • Lack of prioritised asynchronous IO: While most (all?) OSs support asynchronous IO and most OSs support prioritised IO; application programmers rarely use it to improve performance. This is partly because the APIs being used are obsolete (e.g. "read()") and/or too messy (POSIX asynchronous IO) and/or inadequate (still no "aio_open()"?).
Also, on many systems (e.g. Linux) support for AIO is limited to the point of verging on non-existent (kernel support is minimal; userspace support is emulated using threads). POSIX traditionally only supports waiting on file descriptors, with AIO completion delivered via (realtime) signals and select; realtime signals are not widely implemented, and signals themselves have problems in general.

This is an area where POSIX could learn a lot from Windows NT 3...
IMO, both the POSIX and Win32 AIO interfaces are too complex, non-portable, and do not offer any advantage over creating more threads instead. RDOS has no AIO interface, as I don't think it is a good idea. If you need parallel IO, create a new thread and do the IO there instead. AIO is for mono-tasking.
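For what that thread-based alternative looks like, a minimal pthread sketch of "one blocking read per worker thread" (the file names are made up):

Code: Select all

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Each worker does one plain, blocking read of "its" file. */
static void *load_file(void *arg)
{
    const char *path = arg;
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    char *buf = malloc(1 << 20);
    size_t got = fread(buf, 1, 1 << 20, f);
    printf("%s: %zu bytes\n", path, got);

    free(buf);
    fclose(f);
    return NULL;
}

int main(void)
{
    const char *files[] = { "a.jpg", "b.jpg", "c.jpg" };   /* hypothetical */
    pthread_t tid[3];

    /* The reads proceed in parallel because each thread blocks on its own. */
    for (int i = 0; i < 3; i++)
        pthread_create(&tid[i], NULL, load_file, (void *)files[i]);
    for (int i = 0; i < 3; i++)
        pthread_join(tid[i], NULL);
    return 0;
}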

OTOH, I have multiwait support for some objects that are usually encapsulated in the read/write functions in POSIX/Win32. I can therefore set up multiple waits for sockets, com ports and keyboard input, which RDOS does not regard as "files". But I do not support waiting for real file IO to complete.
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: what's the real SLOW parts in popular OS/OS theories?

Post by JamesM »

* I don't see the problem with returning to libc. Sane designs do not depend on a user-level library as a protection enforcer.
It's not about protection in the privileged sense (getting access to kernel data or code) - it's about data mining, harvesting, and taking control of the user-space app.
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: what's the real SLOW parts in popular OS/OS theories?

Post by FlashBurn »

As I said, if you would/could use a separate stack for data and return addresses, you would not have such problems (but others ;) ).

A simpler solution would be if you could tell the CPU where the return address of the current frame is, and have it fire an exception if something tries to overwrite it (this would also work for paging without segmentation).
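The first idea can be approximated in software; a minimal sketch of a "shadow stack" where prologue/epilogue code keeps return addresses on a separate stack and aborts on a mismatch (uses the GCC/Clang __builtin_return_address extension; the macros are illustrative, not an existing compiler feature):

Code: Select all

#include <stdio.h>
#include <stdlib.h>

/* Separate stack holding only return addresses (per thread in real life). */
static void *shadow_stack[1024];
static int   shadow_top;

/* What a compiler would emit in each function's prologue/epilogue. */
#define PROLOGUE()  (shadow_stack[shadow_top++] = __builtin_return_address(0))
#define EPILOGUE()                                                          \
    do {                                                                    \
        if (shadow_stack[--shadow_top] != __builtin_return_address(0)) {    \
            fprintf(stderr, "return address smashed!\n");                   \
            abort();                                                        \
        }                                                                   \
    } while (0)

static void handle_input(const char *input)
{
    PROLOGUE();
    char buf[16];
    /* ...an overflow of buf that clobbers the saved return address would
       be caught by EPILOGUE() before the corrupted address is used... */
    (void)buf; (void)input;
    EPILOGUE();
}

int main(void)
{
    handle_input("hello");
    puts("ok");
    return 0;
}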

And many security solutions are not good for performance.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!

Re: what's the real SLOW parts in popular OS/OS theories?

Post by Brendan »

Hi,
rdos wrote:
Owen wrote:
Brendan wrote:
  • Lack of prioritised asynchronous IO: While most (all?) OSs support asynchronous IO and most OSs support prioritised IO; application programmers rarely use it to improve performance. This is partly because the APIs being used are obsolete (e.g. "read()") and/or too messy (POSIX asynchronous IO) and/or inadequate (still no "aio_open()"?).
Also, on many systems (e.g. Linux) support for AIO is limited to the point of verging on non-existent (kernel support is minimal; userspace support is emulated using threads). POSIX traditionally only supports waiting on file descriptors, with AIO completion delivered via (realtime) signals and select; realtime signals are not widely implemented, and signals themselves have problems in general.

This is an area where POSIX could learn a lot from Windows NT 3...
IMO, both the POSIX and Win32 AIO interfaces are too complex, non-portable, and do not offer any advantage over creating more threads instead. RDOS has no AIO interface, as I don't think it is a good idea. If you need parallel IO, create a new thread and do the IO there instead. AIO is for mono-tasking.
For an example, imagine an image viewer. There's a directory containing 50 photos; and the user clicks on one of the photos. However, you know it's likely that the user will probably move forward/backward through all the photos and/or enable the "slide show" feature; so you want to pre-load all 50 photos in the background.

The photo the user clicked on needs to be loaded as "high priority" and displayed as quickly as possible; the next photo needs to be loaded as "medium priority" so that its file IO doesn't slow down the "high priority" file IO; and in the same way the other photos need to be loaded at progressively lower priorities. Basically, the priority of file IO for each photo being loaded/pre-loaded depends on the photo's "distance" from the photo currently being displayed.

While all this loading/pre-loading is happening, the user is moving forward/backward through all the photos. If the photo currently being displayed keeps changing, and priority of file IO for each photo depends on the "distance" from the currently displayed photo; then the priority of file IO for each photo being loaded/pre-loaded should also change. For example, maybe the user started at "photo #1" and you told the OS to pre-load "photo #9" at a very low priority, but then the user clicked on the "next photo" button 7 times and now you're displaying "photo #8"; so you tell the OS to change the priority of the file IO for "photo #9" to "medium/high".

Of course there's also memory constraints - there's 50 photos, but you've only got enough memory to hold 25 photos. As the user is moving through all the photos you need to be discarding "less likely to be needed" photos to free up memory and cancelling any file IO that is no longer necessary, and starting file IO for photos that become more likely to be needed.

There's also decoding. You don't just want to pre-load the photos, you want to pre-decode the JPGs into raw pixel data so that when the user clicks on the "next photo" button it can be displayed faster. Don't forget about the memory constraints though - maybe you've got enough memory to have 12 of the photos loaded and decoded, and 13 photos loaded but not decoded. Of course for the 4 "most likely to be needed soon" photos you also want to upload the data into display memory, so that those photos can be displayed with an extremely fast 2D accelerated "display memory to display memory" bit blit, and so that when the user clicks on the "next photo" button the next photo is displayed instantly.

Now, you're not using asynchronous file IO. Instead, you've got 50 threads (potentially all fighting for the same CPU/s) plus one main thread responsible for organising everything and displaying the current photo. Every time the user clicks on the "next photo" button the main thread unleashes a storm of thread switches as it tries to re-synchronise all those 50 threads. You're also dealing with lock contention and race conditions and who knows what else.

Remember this:
Brendan wrote:[*]Scalability: It's still a major problem. Lots of software is still "single threaded" (and I suspect that a lot of software that is multi-threaded isn't as well designed as it could be).
That's right - most application developers simply will not bother with any of this. They'd look at everything above and decide it's too hard and give up. They'll use plain file IO without any threads, without any priorities and without any pre-loading; and each time the user clicks on the "next photo" button there's going to be noticeable lag before the next photo is displayed. The user is going to think the image viewer is laggy. What about other applications? If the user thinks every application is slow/laggy then the user is going to blame the OS and not the applications. So, how do you make it easier for application developers?

Asynchronous file IO should make this image viewer a lot easier to write. You'd have one high priority thread doing all the synchronisation and file IO; and maybe a few "worker threads" for decoding images (e.g. no more than one worker thread per CPU). Almost all of the hassle of synchronising the threads (and the locks, race conditions, etc) disappears; and the "unleashes a storm of thread switches" problem is gone too.
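To make that concrete, the main thread's re-prioritising logic might look something like this against a hypothetical prioritised AIO interface; aio_submit_prio(), aio_set_prio() and aio_cancel_req() are invented names for illustration, not any existing OS's API:

Code: Select all

#include <stddef.h>

/* Hypothetical prioritised asynchronous-IO interface (names invented for
   illustration; lower 'prio' value = more urgent). */
typedef struct aio_req aio_req;
aio_req *aio_submit_prio(const char *path, void *buf, size_t len, int prio,
                         void (*done)(aio_req *req, void *ctx), void *ctx);
void     aio_set_prio(aio_req *req, int prio);   /* re-prioritise in flight */
void     aio_cancel_req(aio_req *req);           /* drop work no longer needed */

#define NPHOTOS 50

static aio_req *load[NPHOTOS];   /* pending load request per photo, or NULL */

/* Called whenever the displayed photo changes: the priority of each pending
   load is simply its distance from the currently displayed photo, and loads
   that fall outside the memory budget are cancelled. */
void reprioritise(int current)
{
    for (int i = 0; i < NPHOTOS; i++) {
        if (!load[i])
            continue;
        int distance = i > current ? i - current : current - i;
        if (distance > 25) {                 /* outside the 25-photo budget */
            aio_cancel_req(load[i]);
            load[i] = NULL;
        } else {
            aio_set_prio(load[i], distance); /* photo #current is priority 0 */
        }
    }
}

The decode workers and the display-memory uploads would hang off the completion callback in the same distance-ordered way.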

Unfortunately, I believe you (and Owen) are right: asynchronous file IO is badly implemented and badly supported on most existing OSs. Asynchronous file IO should make the image viewer a lot easier to write, but probably won't.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
FlashBurn
Member
Posts: 313
Joined: Fri Oct 20, 2006 10:14 am

Re: what's the real SLOW parts in popular OS/OS theories?

Post by FlashBurn »

@Brendan

The next problem is that our computers are getting faster and faster, so the programmers of the photo viewer will think, "ah, it's fast enough, we don't need to bother with its performance" or "well, computers have enough memory, so why make things harder and decide which photos to load? Load all the photos, the OS can do the rest" (the rest being swapping and such things). Especially if they start to write a 64-bit photo viewer: why not load all the photos, we have a large address space.
rdos
Member
Posts: 3310
Joined: Wed Oct 01, 2008 1:55 pm

Re: what's the real SLOW parts in popular OS/OS theories?

Post by rdos »

Brendan wrote:For an example, imagine an image viewer. There's a directory containing 50 photos; and the user clicks on one of the photos. However, you know it's likely that the user will probably move forward/backward through all the photos and/or enable the "slide show" feature; so you want to pre-load all 50 photos in the background.
Good example. This does resemble the sequences our payment terminal has. They are built up from sequences of jpg/png images; this was easier than porting an MPEG (or similar) player. The slide show will run a variable number of images and then restart.
Brendan wrote:The photo the user clicked on needs to be loaded as "high priority" and displayed as quickly as possible; the next photo needs to be loaded as "medium priority" so that its file IO doesn't slow down the "high priority" file IO; and in the same way the other photos need to be loaded at progressively lower priorities. Basically, the priority of file IO for each photo being loaded/pre-loaded depends on the photo's "distance" from the photo currently being displayed.

While all this loading/pre-loading is happening, the user is moving forward/backward through all the photos. If the photo currently being displayed keeps changing, and priority of file IO for each photo depends on the "distance" from the currently displayed photo; then the priority of file IO for each photo being loaded/pre-loaded should also change. For example, maybe the user started at "photo #1" and you told the OS to pre-load "photo #9" at a very low priority, but then the user clicked on the "next photo" button 7 times and now you're displaying "photo #8"; so you tell the OS to change the priority of the file IO for "photo #9" to "medium/high".
In our terminal, the sequence can be broken at any point by user interaction that should show a new sequence. Therefore, the scenario is similar here as well.
Brendan wrote:Of course there's also memory constraints - there's 50 photos, but you've only got enough memory to hold 25 photos. As the user is moving through all the photos you need to be discarding "less likely to be needed" photos to free up memory and cancelling any file IO that is no longer necessary, and starting file IO for photos that become more likely to be needed.
In the beginning, we preloaded all the sequences, but as the number of sequences grew, it turned out there was not enough memory to do this, and it took too long when the terminal started up.
Brendan wrote:There's also decoding. You don't just want to pre-load the photos, you want to pre-decode the JPGs into raw pixel data so that when the user clicks on the "next photo" button it can be displayed faster.
Yes, this is the most time-consuming operation for a JPEG. The disc accesses are also done in small chunks (unless you memory-map the whole file, but then it is still read in small, page-sized chunks).
Brendan wrote:Now, you're not using asynchronous file IO.
Asynchronous file IO does not solve the issue. As pointed out above, file IO is the small part of handling JPEGs, and the (de)compression algorithm accesses the file in small chunks.
Brendan wrote:Instead, you've got 50 threads (potentially all fighting for the same CPU/s) plus one main thread responsible for organising everything and displaying the current photo. Every time the user clicks on the "next photo" button the main thread unleashes a storm of thread switches as it tries to re-synchronise all those 50 threads. You're also dealing with lock contention and race conditions and who knows what else.
No, you don't have 50 threads. When I couldn't preload all sequences at startup, I changed the algorithm to use a single preloader thread. The display thread tells the loader thread to start preloading a sequence, and then waits for completion of the images it needs. If the terminal needs to display another sequence, the preloader thread is stopped (it stops preloading) and is instructed to preload the other sequence instead. I think this algorithm can also be used (slightly modified) for the more general-purpose image viewer.
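A minimal pthread sketch of that single-preloader-thread pattern (load_image(), the sequence numbering and NIMAGES are placeholders, not the terminal's actual code):

Code: Select all

#include <pthread.h>
#include <stdbool.h>

#define NIMAGES 32                     /* images per sequence (placeholder) */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static int wanted_sequence = -1;       /* set by the display thread */
static int loaded_upto     = -1;       /* highest image loaded in that sequence */

extern bool load_image(int sequence, int index);   /* placeholder: IO + decode */

/* The single preloader thread: loads one image at a time and re-checks after
   each one whether the display thread has switched to another sequence. */
void *preloader(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        /* Sleep until there is a sequence with images left to preload. */
        while (wanted_sequence < 0 || loaded_upto + 1 >= NIMAGES)
            pthread_cond_wait(&wake, &lock);

        int seq = wanted_sequence;
        int idx = loaded_upto + 1;
        pthread_mutex_unlock(&lock);

        load_image(seq, idx);          /* blocking work, done without the lock */

        pthread_mutex_lock(&lock);
        if (seq == wanted_sequence)
            loaded_upto = idx;         /* publish progress */
        /* else: the display switched sequences while we were loading and has
           already reset loaded_upto, so this image is simply discarded. */
        pthread_cond_broadcast(&wake); /* the display thread may be waiting */
    }
}

/* Display thread: request a sequence and block until image 'idx' is ready. */
void wait_for_image(int seq, int idx)
{
    pthread_mutex_lock(&lock);
    if (wanted_sequence != seq) {
        wanted_sequence = seq;         /* switch sequences... */
        loaded_upto = -1;              /* ...and restart preloading */
        pthread_cond_broadcast(&wake);
    }
    while (loaded_upto < idx)
        pthread_cond_wait(&wake, &lock);
    pthread_mutex_unlock(&lock);
}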
rdos
Member
Posts: 3310
Joined: Wed Oct 01, 2008 1:55 pm

Re: what's the real SLOW parts in popular OS/OS theories?

Post by rdos »

FlashBurn wrote:@Brendan

The next problem is that our computers are getting faster and faster, so the programmers of the photo viewer will think, "ah, it's fast enough, we don't need to bother with its performance" or "well, computers have enough memory, so why make things harder and decide which photos to load? Load all the photos, the OS can do the rest" (the rest being swapping and such things). Especially if they start to write a 64-bit photo viewer: why not load all the photos, we have a large address space.
Yes. People are lazy. I was fortunate to only have 256MB of RAM, which wasn't enough for preloading all sequences. :lol:
JamesM
Member
Posts: 2935
Joined: Tue Jul 10, 2007 5:27 am
Location: York, United Kingdom

Re: what's the real SLOW parts in popular OS/OS theories?

Post by JamesM »

rdos wrote:
Brendan wrote:For an example, imagine an image viewer. There's a directory containing 50 photos; and the user clicks on one of the photos. However, you know it's likely that the user will probably move forward/backward through all the photos and/or enable the "slide show" feature; so you want to pre-load all 50 photos in the background.
Good example. This does resemble the sequences our payment terminal has. They are built up from sequences of jpg/png images; this was easier than porting an MPEG (or similar) player. The slide show will run a variable number of images and then restart.
Just somewhat off-topic, but was a risk analysis done for the development of a custom OS for a payment terminal? It seems like just inviting disaster when it's either not maintained any more or a security flaw is exposed.

Just purely out of interest, what was the business value in not reusing an existing OS such as linux or BSD?
rdos
Member
Posts: 3310
Joined: Wed Oct 01, 2008 1:55 pm

Re: what's the real SLOW parts in popular OS/OS theories?

Post by rdos »

JamesM wrote:Just somewhat off-topic, but was a risk analysis done for the development of a custom OS for a payment terminal? It seems like just inviting disaster when it's either not maintained any more or a security flaw is exposed.
The causation was reversed. RDOS was already a mature OS when the decision to transition to a PC platform was made. We evaluated both Windows CE (a disaster) and Linux, but IMO RDOS was the best choice. None of the desktop OSes are adapted to embedded, standalone systems. You just cannot show error messages or blue screens to end customers.

As for security flaws, there are no security flaws. Using RDOS, it is possible to turn off each and every entrance to the system, which is exactly what we have done. Even though the system operates over the Internet, nobody can connect to it; all connections are outbound. Also, since hardly anybody knows RDOS, there will be no viruses, and if somebody steals the whole system, they will not be able to get any sensitive information from it. We could fulfil the PCI requirements with ease, because those mostly relate to flaws in desktop OSes.

Additionally, I've worked with the terminal software on non-PC platforms since 1995. The development of RDOS started in 1988, so RDOS was already a stable OS when I started with our company's terminal software. In the time since, a lot of the code for the terminal kernel on non-PC systems was borrowed from RDOS, and at the end of the 1990s we had a 16-bit RDOS version of the terminal where all development and debugging were done. This was necessary since we had no debugger for our non-PC system. Because the source was 90-95% the same, we could find many bugs in the terminal with the RDOS version. Because of this, we already had a working terminal for RDOS (now using a flat memory model) when we decided to transition to a PC platform. IOW, it was much cheaper to use this already-existing software than to transition to Linux or Windows. And because RDOS and the terminal were developed in parallel, many of the features needed for a self-service terminal were added to RDOS, which made RDOS a much better choice because it had a natural adaptation to embedded systems, something that Windows or Linux never had. Their adaptations are add-ons, with varying degrees of effectiveness.

It is also worth pointing out that there is a well-designed interface between the terminal and RDOS, which can be ported to both Windows and Linux just in case. However, I don't anticipate that we will take that option anytime soon, because the problems we have are all related to new software functions in the terminal software and not to bugs in RDOS.
JackScott
Member
Posts: 1032
Joined: Thu Dec 21, 2006 3:03 am
Location: Hobart, Australia
Mastodon: https://aus.social/@jackscottau
GitHub: https://github.com/JackScottAU

Re: what's the real SLOW parts in popular OS/OS theories?

Post by JackScott »

rdos wrote:As for security flaws, there are no security flaws.
Therefore the only thing I know for absolute certain about your operating system is that there definitely are security flaws.
rdos
Member
Posts: 3310
Joined: Wed Oct 01, 2008 1:55 pm

Re: what's the real SLOW parts in popular OS/OS theories?

Post by rdos »

JackScott wrote:
rdos wrote:As for security flaws, there are no security flaws.
Therefore the only thing I know for absolute certain about your operating system is that there definitely are security flaws.
I'm sure there could be security flaws in RDOS if it is configured in an inappropriate way. However, the security of a system depends not only on OS security, but mostly on the specifics of the setup. And the card industry's PCI requirements are among the toughest in the world to comply with, especially for a desktop OS. Our setup, in combination with the method of configuration, makes it easy to comply with PCI:

* There are no open incoming ports. This is ensured by an external firewall (no open ports) as well as by the OS configuration (no drivers that listen on ports are present in the configuration)

* There are no user-accounts

* There is no command-line interpreter, nor any other type of shell or GUI that can be exploited. The only running applications are the loader and the payment application.

* No services or programs can be made to start locally.

* If somebody steals the system, he/she will not be able to operate it because of hardware switches that make the system inoperable

* All configuration of the system is done from our host, and therefore there are no sensitive settings locally

* Connection to our host is done by connecting a socket to a fixed IP. This IP is compiled into the payment application.

* Upgrading is done by the loader contacting a fixed IP. This IP is compiled into the loader application.

* If somebody steals the CompactFlash disc, they will not be able to extract sensitive card-data from the file system.

As you can see, security is based on the setup, not how secure the OS in itself is.