BCOS: Threads

Questions about which tools to use, bugs, the best way to implement a function, etc. should go here. Don't forget to check whether your question is answered in the wiki first! When in doubt, post here.
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:BCOS: Threads

Post by Candy »

kmcguire wrote: Yeah. It makes sense to avoid switching CR3 as much as possible. It wastes CPU cycles. And, for a web server you would want to thread it up so that clients do not sit waiting in a line; instead they start getting something right away, and the CR3 switch is going to be a performance drain when it could have been made simpler by using the process and thread model. Even for the example he uses above, it would be smarter not to switch CR3; there is no reason to do so except to protect threads from each other.
The idea behind a thread is that it's a lightweight process that has less "of its own" than a normal process. There are nearly infinitely many in-between forms, and there's a form above it (jobs, see also WinNT scheduling). You can have stuff Windows calls fibers, which are threads that avoid the kernel-land switch, or stuff in between with a dozen options (try the Linux clone() call).

Brendan's idea is another form, that has slightly more overhead due to switching than a normal thread would have. This causes you to have to limit the number of switches, which leads to this design. If you made the switch lighter you'd use more threads as it's easier to program.



You could generalize the idea as well. What if you made a "request queue"? On this queue you put all deferred function calls, in order of calling. Then you can make deferred calls for things like writes in an operating system. You would make a form of call that takes a function pointer and an argument structure pointer.

I also played with the idea of a thread-call, which would be useful for a number of things, the first of which is replacing the ugly C-style interface for creating a new thread in C++ code. You just tell it to tcall a certain function and it does.

I'm still not entirely sure you can use variable arguments, but if you can, all the better. It might be a problem with C++ name mangling, as ... is mangled as 'z', but I think the linker knows about that.

You'd have one or two threads running in the background to catch dcalls and to execute them. You'd use them for stuff that's "do and forget", stuff you'd use a servant for in ancient Rome. Sending a message, writing a complex structure, reading 20 buffers simultaneously (yes, you can use 20 dcalls at once and wait for them all to complete, so that the read ordering is left to the disk level and can be quite optimized)...

Of course, there's a (soft) real-time version of tcall, so you can use it to create real-time threads. Pri is relative: higher-pri threads get all the time until they complete, lower-pri threads get the RT leftovers. RT execution time is limited to roughly 70% of all CPU time. There's no load factor, since this is determined automatically (and it's near impossible to get right for all computers, so it'd be pointless).

Code: Select all

void dcall(void *(*)(...), ...);               // deferred call: "do and forget"
void tcall(void *(*)(...), ...);               // thread call: runs the function in its own thread
void tcall_rt(int pri, void *(*)(...), ...);   // soft real-time thread call, relative priority pri
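
For illustration, here's a minimal user-space sketch of the dcall idea, assuming C++11 threads rather than the kernel primitives Candy has in mind: one background "servant" thread drains a queue of deferred calls, and dcall() just enqueues a bound function. tcall/tcall_rt would additionally hand back something to wait on and a relative priority, which are left out here; write_log in the usage comment is a hypothetical example function.

Code: Select all

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Queue of deferred calls, drained by one background "servant" thread.
static std::queue<std::function<void()>> dcall_queue;
static std::mutex                        dcall_lock;
static std::condition_variable           dcall_ready;

// dcall: "do and forget" - bind the call, enqueue it, return immediately.
template <typename Fn, typename... Args>
void dcall(Fn fn, Args... args) {
    {
        std::lock_guard<std::mutex> guard(dcall_lock);
        dcall_queue.push(std::bind(fn, args...));
    }
    dcall_ready.notify_one();
}

// The background worker: pops deferred calls and executes them in order.
void dcall_worker() {
    for (;;) {
        std::function<void()> job;
        {
            std::unique_lock<std::mutex> guard(dcall_lock);
            dcall_ready.wait(guard, [] { return !dcall_queue.empty(); });
            job = dcall_queue.front();
            dcall_queue.pop();
        }
        job();   // run the deferred call outside the lock
    }
}

// Usage: start the worker once, then fire off "do and forget" work, e.g.
//   std::thread(dcall_worker).detach();
//   dcall(write_log, "boot complete");   // write_log is hypothetical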
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re:BCOS: Threads

Post by Brendan »

Hi,
Candy wrote:I don't consider it the best design. If any of the objects wanted to do an operation with the system, it'd have to be converted to non-blocking or async operations; no object can have a big calculation loop or the rest suffers; gang scheduling is almost a requirement for acceptable performance; and why reduce the number of threads (usually done to reduce the overhead of thread switching) when you're going to switch like hell because of this IPC and near-continual waits?
Candy, by now I thought you would've remembered that I use asynchronous (or non-blocking) messaging for everything. In practice the IPC would not cause constant thread switches, but the code does need to be designed for this non-blocking messaging.

I could have included a description of how my OS's IPC works, then explained how to write code where the "result of a request" happens at any time after the original request (where results may be returned in a different order to their requests), and then applied this to the examples presented.

Given that the post was exceptionally long and that the first 2 replies were about the length of the post rather than the content, I think it was wise not to include details of the IPC.

Also (for my OS), the overhead involved with having a process with multiple threads at the same priority level (where "number of threads" is greater than the number of CPUs) is not limited to just thread switch costs. Each thread also costs 20 KB of RAM (kernel stack, thread control block, page directory, a page for the thread's stack and a page table for this page). Then there are the performance effects of "multiple incomplete requests".

For a (simplified) example, imagine you need to do 2 large calculations that take one second each and that both calculations are equally important (same priority), and that it's a single CPU computer. If you use 2 threads (one thread for each calculation) then after one second both calculations would be "half complete" because both threads are sharing CPU time. If you use a single thread and do these calculations one at a time, then after one second the first calculation would be complete and the other calculation wouldn't have been started. This means that the first result is available 50% faster when only one thread is used (and the second result would take the same amount of time regardless).

Now extend this (and my little "person" and "location" objects example) to a multi-user database server. One user does a search for all person objects that have a name beginning with 'F' and ending with 'e', and just after this another user does a search for all location objects with names that begin with 'Q'. With non-blocking messaging, the "main thread" would receive the first "search request", send messages out to all "worker threads", and these worker threads would start doing the first search. Then the main thread would receive the second search request and send a second set of messages out to the worker threads. The worker threads would be busy doing the first search so these messages would wait on their message queues. As each worker thread finishes its part of the first search it'd send a result back to the main thread, grab the next message off its queue and start the second search. When the main thread has received all of the results from the first search it'd send the result back to the first user, and when the main thread has received all of the results from the second search it'd send the result back to the second user.

If the "person class" is implemented as seperate thread/s to the "location class" and both are at the same priority, then both users would probably need to wait until both searches are almost complete (due to CPU sharing by the threads). This could be avoided by using different priorities for the threads - for example, the "locations class" threads could be at a higher priority, which would mean that the second user gets their result sooner than the first user because the CPUs would stop doing the first search/thread when the second search/thread is started, and resume this earlier thread when the higher priority thread/s stop running. Depending on what you're doing, this might be desirable.

Now consider the worst possible design (in this situation, for my OS) - one thread per search using shared data. In this case, a single search wouldn't take advantage of multiple CPUs. For "2 searches at the same time" the results of the first search would be delayed because of multiple threads at the same priority sharing the same CPU. To make it worse, the data would need to be protected with re-entrancy locking (more overhead for acquiring/freeing the locks and possibly lock contention problems).
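
For illustration only (Brendan's actual messaging API isn't used here), this is a rough C++ sketch of the same fan-out/fan-in shape using plain threads and futures: one request is split across worker threads, each scans its own slice, and the "main thread" merges the partial results. In Brendan's model the workers would instead receive the request as a message and post their partial results back as messages, so a second request simply queues up behind the first on each worker's message queue.

Code: Select all

#include <algorithm>
#include <future>
#include <string>
#include <vector>

// One worker's share of the search: scan a slice of the names and keep the matches.
static std::vector<std::string> search_slice(const std::vector<std::string>& names,
                                             size_t begin, size_t end) {
    std::vector<std::string> hits;
    for (size_t i = begin; i < end; ++i) {
        const std::string& n = names[i];
        if (!n.empty() && n.front() == 'F' && n.back() == 'e')   // "begins with 'F', ends with 'e'"
            hits.push_back(n);
    }
    return hits;
}

// Fan-out/fan-in: split one request across workers, then merge the partial results.
std::vector<std::string> search_people(const std::vector<std::string>& names,
                                       unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<std::future<std::vector<std::string>>> parts;
    size_t chunk = (names.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        size_t begin = std::min(names.size(), static_cast<size_t>(w) * chunk);
        size_t end   = std::min(names.size(), begin + chunk);
        parts.push_back(std::async(std::launch::async, search_slice,
                                   std::cref(names), begin, end));
    }
    std::vector<std::string> result;
    for (auto& p : parts) {                       // the "main thread" collects every partial result
        std::vector<std::string> part = p.get();
        result.insert(result.end(), part.begin(), part.end());
    }
    return result;
}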


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re:BCOS: Threads

Post by Brendan »

Hi,
Candy wrote:
kmcguire wrote: Yeah. It makes sense to avoid switching CR3 as much as possible. It wastes CPU cycles. And, for a web server you would want to thread it up so that clients do not sit waiting in a line; instead they start getting something right away, and the CR3 switch is going to be a performance drain when it could have been made simpler by using the process and thread model. Even for the example he uses above, it would be smarter not to switch CR3; there is no reason to do so except to protect threads from each other.
The idea behind a thread is that it's a lightweight process that has less "of its own" than a normal process. There are nearly infinitely many in-between forms, and there's a form above it (jobs, see also WinNT scheduling). You can have stuff Windows calls fibers, which are threads that avoid the kernel-land switch, or stuff in between with a dozen options (try the Linux clone() call).
Yes - and I should point out that my terminology could be considered technically incorrect. My "thread" is closer to most people's "process", and my "process" is closer to most people's "job" (depending on which perspective you look at it from). To me this is similar to the term "task", where the meaning is ambiguous and probably should be defined when used.

In general, all systems have one or more "levels" where each level can have different properties and different sharing and ownership rules for resources (memory, CPU time, I/O ports, IRQs, IPC ports/exchanges/sockets/whatever, file handles, etc).
Candy wrote:Brendan's idea is another form, that has slightly more overhead due to switching than a normal thread would have. This causes you to have to limit the number of switches, which leads to this design. If you made the switch lighter you'd use more threads as it's easier to program.
I'd say that the single largest thing that influenced my design is the non-blocking messaging, and that this has contributed to most other design decisions (including the way threads use memory) - I've used non-blocking messaging since the earliest prototypes, while multi-threaded processes came later.

In practice, it's a set of design decisions intended to complement each other. If any one of these design decisions were changed then all other design decisions would need to be re-evaluated, and everything could end up radically different.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Colonel Kernel
Member
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada
Contact:

Re:BCOS: Threads

Post by Colonel Kernel »

Candy wrote:That way, multithreaded design is a trivial thing to add - easier than it is now in Unix, due to all the standard things kind of assuming you don't multithread, and a whole lot easier than in Windows, where the entire subsystem can't handle multithreading properly (so they create apartments).
You can thank 16-bit Windows for the horrors of Apartment threading. It's a way to run single-threaded COM objects (which is the only kind of COM object that existed in 16-bit Windows) in a multi-threaded process without the need for additional locking. It's yucky.

Brendan, I find your examples a little strange, mainly because I consider concurrency and OO design to be orthogonal concepts (the fact that COM tied them together with apartment threading is what made it so clunky for developers to deal with). IMO it's weird to assign threads and thread priorities to things like "Person" and "Location", which are data-centric things, not process-centric things (process in the sense of a sequence of steps, not an OS process).

In a nutshell, I think you need to spend more time looking at how applications are designed today, and seeing how that would apply to your system. After all, it's not the job of the OS to prescribe a particular design methodology to developers. Instead, the OS should allow developers to create designs that maximize the efficient use of available resources. Also, CPUs are not the only source of parallelism within a system, as I'll point out below...
Brendan wrote:For a (simplified) example, imagine you need to do 2 large calculations that take one second each and that both calculations are equally important (same priority), and that it's a single CPU computer. If you use 2 threads (one thread for each calculation) then after one second both calculations would be "half complete" because both threads are sharing CPU time.
No, they'd be less than half complete due to context-switching overhead.
If you use a single thread and do these calculations one at a time, then after one second the first calculation would be complete and the other calculation wouldn't have been started. This means that the first result is available 50% faster when only one thread is used (and the second result would take the same amount of time regardless).
Again, more than 50% faster, but you get the idea. To put it simply, you're describing a trade-off between latency and throughput.
Now extend this (and my little "person" and "location" objects example) to a multi-user database server.
Now things get ugly, because I work on database query engines for a living. ;D
One user does a search for all person objects that have a name beginning with 'F' and ending with 'e', and just after this another user does a search for all location objects with names that begin with 'Q'. With non-blocking messaging, the "main thread" would receive the first "search request", send messages out to all "worker threads", and these worker threads would start doing the first search. Then the main thread would receive the second search request and send a second set of messages out the worker threads.
There is actually nothing here so far that depends on kernel-level non-blocking messaging. On today's OSes, this would be done by having a shared queue (it can be a lock-free queue if the overhead of locking is a concern and if such a thing is available in your library of choice) into which the main thread puts requests, and out of which the worker threads take requests. Note that this doesn't even require system calls to work (except maybe for the locking, if that's how the queue is synchronized).
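
As a sketch of the user-space queue described here, using a plain mutex and condition variable (a lock-free queue would remove the locking but keep the same shape); the Request type and its fields are made up for the example:

Code: Select all

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// A search request as the main thread would hand it to the pool (fields are made up).
struct Request { std::string table; std::string pattern; };

class RequestQueue {
public:
    // Main thread: enqueue a request. No kernel messaging is involved -
    // only the lock (and the wakeup) touch the OS at all.
    void put(Request r) {
        { std::lock_guard<std::mutex> g(lock_); queue_.push(std::move(r)); }
        ready_.notify_one();
    }
    // Worker threads: block until a request is available, then take it.
    Request take() {
        std::unique_lock<std::mutex> g(lock_);
        ready_.wait(g, [this] { return !queue_.empty(); });
        Request r = std::move(queue_.front());
        queue_.pop();
        return r;
    }
private:
    std::queue<Request>     queue_;
    std::mutex              lock_;
    std::condition_variable ready_;
};

// A pool of workers draining the shared queue (q must outlive the detached workers).
void start_workers(RequestQueue& q, unsigned count) {
    for (unsigned i = 0; i < count; ++i)
        std::thread([&q] {
            for (;;) {
                Request r = q.take();
                // ... run the search described by r and send the result back ...
            }
        }).detach();
}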

continued...
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
User avatar
Colonel Kernel
Member
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada
Contact:

Re:BCOS: Threads

Post by Colonel Kernel »

The worker threads would be busy doing the first search so these messages would wait on their message queues. As each worker thread finishes its part of the first search it'd send a result back to the main thread, grab the next message off its queue and start the second search. When the main thread has received all of the results from the first search it'd send the result back to the first user, and when the main thread has received all of the results from the second search it'd send the result back to the second user.
Ok, I understand this part...
If the "person class" is implemented as seperate thread/s to the "location class" and both are at the same priority, then both users would probably need to wait until both searches are almost complete (due to CPU sharing by the threads). This could be avoided by using different priorities for the threads - for example, the "locations class" threads could be at a higher priority, which would mean that the second user gets their result sooner than the first user because the CPUs would stop doing the first search/thread when the second search/thread is started, and resume this earlier thread when the higher priority thread/s stop running. Depending on what you're doing, this might be desirable.
Or, you could not make things like "person" and "location" correspond to particular threads and scheduling priorities. In a real DBMS, the system has no notion of what content is being searched for (i.e. it isn't even aware of what "person" and "location" mean). Instead, it understands the particular actions it needs to take, such as parsing the query, building and optimizing an execution plan, and executing each step in the execution plan. The first two steps are CPU-bound, while the last is disk-bound and can be sped up a lot by using lots of memory for buffering.

An execution plan is set up as a pipeline where data is "pulled" out the top and subsequently gets pulled out of each stage on demand. This means that each stage can execute in parallel if need be. Also, each disk can be counted among the parallel "execution" resources available, since reading disk blocks is where most of the time goes in executing database queries.

To take your example, imagine that "person" and "location" are each tables sitting on disk, each of which has a "name" column of type string. IMO, the amount of parallelism you can achieve in querying these two tables has a lot more to do with how many disks you have rather than the type of IPC or threading scheme being used by the underlying OS. Assume that you have four disks, and each table spans two disks (the two tables do not share any disks). Now work backwards from there and look at the execution plan for each. Assuming that there is no index on the "name" columns of the two tables, then each table must be scanned. For each block of records, the CPU must spend some time examining the "name" fields. Within the execution plan itself, there is an opportunity for parallelism since you can profitably scan each half of the table concurrently, since they're on separate disks. This means on a four-CPU system, you'd have a thread on each scanning half of one of the tables. It makes no sense for these threads to be distributed based just on "priority" or how they somehow relate to the content being queried. My point is, let the developers using your OS decide how to maximize resource utilization.
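
To make the pull-style execution plan concrete, here's a toy iterator-model sketch (assuming C++17); a real engine pulls blocks of records with typed columns rather than single strings, but the shape is the same: each stage pulls rows from its child on demand, and one such pipeline per partition can run on its own thread, so the parallelism follows the disks rather than the "person"/"location" classes. The rows_from_disk0 name in the usage comment is hypothetical.

Code: Select all

#include <memory>
#include <optional>
#include <string>
#include <vector>

// Pull-model execution plan: each stage produces rows on demand ("pulled out of the top").
struct Operator {
    virtual std::optional<std::string> next() = 0;   // next row, or nothing when exhausted
    virtual ~Operator() = default;
};

// Leaf stage: scans one partition of a table (in a real DBMS, one disk's worth of blocks).
struct Scan : Operator {
    explicit Scan(std::vector<std::string> rows) : rows_(std::move(rows)) {}
    std::optional<std::string> next() override {
        if (pos_ == rows_.size()) return std::nullopt;
        return rows_[pos_++];
    }
    std::vector<std::string> rows_;
    size_t pos_ = 0;
};

// Filter stage: pulls from its child and passes on only matching rows.
struct Filter : Operator {
    Filter(std::unique_ptr<Operator> child, char first, char last)
        : child_(std::move(child)), first_(first), last_(last) {}
    std::optional<std::string> next() override {
        while (auto row = child_->next()) {
            if (!row->empty() && row->front() == first_ && row->back() == last_)
                return row;
        }
        return std::nullopt;
    }
    std::unique_ptr<Operator> child_;
    char first_, last_;
};

// Usage (one pipeline per partition, each on its own thread):
//   Filter plan(std::make_unique<Scan>(rows_from_disk0), 'F', 'e');
//   while (auto row = plan.next()) { /* emit row */ }
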
Now consider the worst possible design (in this situation, for my OS) - one thread per search using shared data. In this case, a single search wouldn't take advantage of multiple CPUs.
If you're talking about in-memory searches, you could parallelize the search in the same way I mentioned -- by searching each half of each "person" or "location" list separately, as long as there are enough CPUs to support such parallelism.
For "2 searches at the same time" the results of the first search would be delayed because of multiple threads at the same priority sharing the same CPU. To make it worse, the data would need to be protected with re-entrancy locking (more overhead for acquiring/freeing the locks and possibly lock contention problems).
I don't see why locking is required, unless other threads are trying to modify the data as it's being searched.

I think the real world is a little more complex than many OS devs appreciate. :) How would your OS handle my more realistic DBMS example given above, and how would it improve on it relative to today's OSes?
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
User avatar
Kevin McGuire
Member
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States
Contact:

Re:BCOS: Threads

Post by Kevin McGuire »

@Brendan
My OS uses IPC for all communication. It uses a process and thread model with a simple round-robin scheduler. I already appear to be running into performance problems from loops that wait for data. I have loops in my applications that wait for keys. In a normal system these loops would make some kind of blocking call, and the thread might be put to sleep while it waits - I am no expert and may be wrong here. However, mine just loop. I am going to have to track down the root cause, but it seems you might end up with a headache with your design too.
http://www.mega-tokyo.com/forum/index.php?board=7;action=display;threadid=8887
@Anyone & Brendan
The whole purpose of my design was to make distributed processing an easier task for the developer. If you hackers want to shoot the hell out of my idea you are welcome to. It has flaws, but nevertheless, is it going to run into similar problems as Brendan's?
User avatar
Kevin McGuire
Member
Member
Posts: 843
Joined: Tue Nov 09, 2004 12:00 am
Location: United States
Contact:

Re:BCOS: Threads

Post by Kevin McGuire »

I figured it out! I think - bear with me. The architecture I use is similar to Brendan's. We both want to go totally IPC to make distributed processing totally transparent to software. However, I have figured out what the differences are and possibly how to solve them.

Code: Select all

------------------- top of call stack -------------------
 |  | |                            |   |
------------------- kernel layer ------------------------
 |  |/                             |   |
  \/                               /__/
  |                                |
------------------- user layer --------------------------
Job0                              Job1
Alright, the above depicts what I think is a normal system. The jobs execute code and make calls into the kernel layer, and of course they go through the normal transitions. As the scheduler switches jobs it saves the stack at each transition level. So, basically, jobs are strung all throughout the system.

In a totally IPC system, it could look like this.

(Job0) (Job1) (Kernel)

Now, of course, they have those funny looking lines, but they never actually cross into another environment or space. The IPC provides the communication and separation.

The problem is you have a hell of a load of waits. Job0 waits for Job1 to send something, while Job1 waits for the kernel to do some work. If the waits do not release CPU time you just sit there wasting CPU power. To fix this you release CPU time, with some inventive methods managed by the scheduler. These inventive methods could prevent a switch into a waiting process (or its threads) entirely - no having to load CR3 and transition.

The next problem - apparently with Brendan's design, and possibly with mine since I am using IPC to separate systems - is process-switching overhead, even if these inventive wait states work really well. If something has to go through quite a few processes, instead of staying in the same environment using DLLs and such, it will embark upon hellish CR3 changes. The more granular the system becomes through using objects, the more hellish the CR3 changes become. This "something" could be data coming down an object-oriented device chain.

1. application
[could add some network protocol things in here just to waste some more CPU power]
2. device type system (network devices)
3. pci bus (bus)
4. pci device (network card)
That could end up being four processes. It would for sure be four processes using Brendan's approach, unless he implemented drivers and such differently. Anyway, he has no threads so he is stuck with a definite 4 switches.

A totally IPC-based system won't be the best idea.

A better approach might be a system that uses the normal means of interacting with the kernel, through interrupts and such, along with DLLs, but employs an advanced IPC system to allow distributed processing if a developer wants to support it, or fast transfer of data from a driver to a user application. Then again, isn't this what already happens in a Windows system, for example using DirectX and such?

I'm starting to feel like any idea that is innovative has already been done in OS dev - by the guys who are paid for this. :) The Singularity project does seem to be the only one that is actually innovative and could do something useful.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re:BCOS: Threads

Post by Brendan »

Hi,
Colonel Kernel wrote:Now things get ugly, because I work on database query engines for a living. ;D
Colonel Kernel wrote:I think the real world is a little more complex than many OS devers appreciate. :) How would your OS handle my more realistic DBMS example given above, and how would it improve on it relative to today's OSes?
I'm not sure it'd be wise for me to even attempt to answer this - I'd assume that professional DBMS designers have learnt to bypass the operating system's design a long time ago. For example, with enough work it'd be possible to implement a multi-threaded distributed data base server on top of DOS - you wouldn't need a multi-threaded distributed OS to do it, and could possibly get better performance by using your own "application specific" code instead of using general purpose OS services.

To be honest, if I were actually designing a DBMS for my OS I probably wouldn't use OOP (part of the context of this topic) to begin with.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re:BCOS: Threads

Post by Brendan »

Hi,
kmcguire wrote: @Brendan
My OS uses IPC for all communication. It uses a process and thread model with a simple round-robin scheduler. I already appear to be running into performance problems from loops that wait for data. I have loops in my applications that wait for keys. In a normal system these loops would make some kind of blocking call, and the thread might be put to sleep while it waits - I am no expert and may be wrong here. However, mine just loop. I am going to have to track down the root cause, but it seems you might end up with a headache with your design too.
http://www.mega-tokyo.com/forum/index.php?board=7;action=display;threadid=8887
What you're describing is called a "polling loop" - something like:

Code: Select all

   while( isDataPresent() == NotYet) { }   // spin, burning CPU time, until data shows up
Polling loops should be avoided completely - they waste CPU time and there's never any need for them (except for joysticks due to old and crappy hardware).

To avoid a polling loop you need to ensure that some sort of signal is sent when the data does become ready, and then, instead of polling for the data, you make the thread block until the signal arrives. For OSs that use messaging this signal would be a message and the thread would block until the message arrives.

This can make "non-blocking messaging" sound confusing - my "get_message()" function does block. The reason it's "non-blocking messaging" is because sending a message does not block.

Consider the following examples...

Example 1 - Polling

Code: Select all

main() {
    while( (keypress = get_keypress() ) == NotYet) { }
    while( (fp = fopen(filename, "r") ) == NotYet) { }
    result = do_large_calculation();
    while( (file_char = fgetc(fp) ) == NotYet) { }

    printf("Keypress: %c, File: %c, Result: %d\n", keypress, file_char, result);
}

Example 2 - Blocking IPC

Code: Select all

main() {
    keypress = get_keypress();
    fp = fopen(filename, "r");
    result = do_large_calculation();
    file_char = fgetc(fp);

    printf("Keypress: %c, File: %c, Result: %d\n", keypress, file_char, result);
}
Note: The actual IPC/messaging stuff is hidden in the "get_keypress()" and "fopen()" functions, which don't return until the data is available (the thread blocks until the data arrives).


Example 3 - Non-blocking Messaging

Code: Select all

main() {
    key_status = NotYet;
    fget_status = NotYet;

    ask_for_keypress();
    ask_for_fopen(filename, "r");
    result = do_large_calculation();

    do {
        get_message();              // Thread blocked waiting for *any* message
        switch(function) {
            case KEYPRESS_STATUS:
                key_status = yes;
                keypress = get_the_data_from_the_message();
                break;
            case FOPEN_STATUS:
                fp = get_the_data_from_the_message();
                ask_for_fget(fp);
                break;              // don't fall through and clobber fget_status/file_char
            case FGET_STATUS:
                fget_status = yes;
                file_char = get_the_data_from_the_message();
                break;
        }
    } while( (key_status == NotYet) || (fget_status == NotYet) );

    printf("Keypress: %c, File: %c, Result: %d\n", keypress, file_char, result);
}
[continued]
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re:BCOS: Threads

Post by Brendan »

[continued]

The first example (polling) is similar to what you might find on a single-tasking OS like DOS. This is fine for DOS because there's nothing to do until the data arrives anyway, but for a multi-threaded OS it's not very good because other threads can be run instead of wasting CPU time.

The second example (blocking IPC) is what most OSs do. It's easy to write software for and does allow other threads to run while you're waiting for the data.

The last example (non-blocking messaging) is what my OS does. It also allows other threads to run while you're waiting for the data, but it's harder to write software for. The difference between this and blocking IPC is that it allows everything to happen in parallel, whereas in the blocking IPC example everything happens one at a time.

Consider what would happen if it takes 5 seconds for the user to press a key, 1 second to open the file, 1 second to get the first character from the file, and 3 seconds to do the large calculation. For the polling example it would consume 10 seconds of CPU time and take 10 seconds to complete. For the blocking IPC example it'd consume 3 seconds of CPU time and take 10 seconds to complete. For the non-blocking messaging it'd consume 3 seconds of CPU time and take 5 seconds to complete.

Of course for the blocking IPC you can do everything in parallel by spawning more threads. For example:

Code: Select all

main() {
    key_status = NotYet;
    fget_status = NotYet;

    spawn_thread(&first_thread);
    spawn_thread(&second_thread);
    result = do_large_calculation();

    while( (key_status == NotYet) || (fget_status == NotYet) ) {
        sleep(1);
    }

    printf("Keypress: %c, File: %c, Result: %d\n", keypress, file_char, result);
}

first_thread() {
    keypress = get_keypress();
    key_status = yes;
    terminate_thread();
}

second_thread() {
    fp = fopen(filename, "r");
    file_char = fgetc(fp);
    fget_status = yes;
    terminate_thread();
}
This will get similar performance to the non-blocking example, but it's (IMHO) just as hard to write and there is extra overhead (spawning and terminating threads). I also don't like the "while" loop in "main()" - it looks too much like polling to me.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Colonel Kernel
Member
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada
Contact:

Re:BCOS: Threads

Post by Colonel Kernel »

Brendan wrote: [continued]Of course for the blocking IPC you can do everything in parallel by spawning more threads.

<snip/>

This will get similar performance to the non-blocking example, but it's (IMHO) just as hard to write and there is extra overhead (spawning and terminating threads). I also don't like the "while" loop in "main()" - it looks too much like polling to me.
It's hard to write the way you've written it, but a good library with thread pooling and "futures" support solves some problems here:

Code: Select all

main() {
    // Each background task starts when its future object is
    // constructed.
    future<char> keypress( first_thread );
    future<char> file_char( second_thread );

    result = do_large_calculation();

    // The future::get() method is blocking and returns the
    // result of the background task.
    printf("Keypress: %c, File: %c, Result: %d\n", keypress.get(), file_char.get(), result);
}

char first_thread() {
    return get_keypress();
}

char second_thread() {
    fp = fopen(filename, "r");
    return fgetc(fp);
}
This has the added benefit that the programmer doesn't have to care how many processors are present and therefore how many threads are needed to fully utilize them -- it's the job of the thread pool to keep track of that.
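
For what it's worth, this pattern is now available directly in standard C++ (std::async/std::future from <future>, C++11 onwards). Here's a runnable approximation of the example above, with get_keypress(), do_large_calculation() and the file name as stand-ins for the real operations:

Code: Select all

#include <cstdio>
#include <future>

// Stand-ins for the blocking operations in the example above.
char get_keypress() { return static_cast<char>(std::getchar()); }   // blocks on stdin
int  do_large_calculation() {
    long long s = 0;
    for (long long i = 0; i < 100000000; ++i) s += i;               // stand-in for real work
    return static_cast<int>(s % 1000);
}

int main() {
    const char* filename = "input.txt";                             // hypothetical file

    // Each background task starts when async() launches it.
    std::future<char> keypress = std::async(std::launch::async, get_keypress);
    std::future<char> file_char = std::async(std::launch::async, [filename] {
        std::FILE* fp = std::fopen(filename, "r");
        if (!fp) return '\0';
        char c = static_cast<char>(std::fgetc(fp));
        std::fclose(fp);
        return c;
    });

    int result = do_large_calculation();

    // future::get() blocks until the corresponding task has finished.
    std::printf("Keypress: %c, File: %c, Result: %d\n",
                keypress.get(), file_char.get(), result);
    return 0;
}
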
I'm not sure it'd be wise for me to even attempt to answer this
You should try -- it will make you think critically about the role your OS plays in the entire system.
I'd assume that professional DBMS designers have learnt to bypass the operating system's design a long time ago.
Heh, that's one way of looking at it. Another way of looking at it is that if I as a developer want to synchronize multiple threads with user-space data structures such as task queues, the OS should not get in my way or prevent me from doing it. IMO it's not the job of the OS to dictate how developers should solve problems -- that's the job of the programming languages, tools, and run-time environments. :)
To be honest, if I were actually designing a DBMS for my OS I probably wouldn't use OOP (part of the context of this topic) to begin with.
This is a misconception. It is very possible to design system-level software in an OOP fashion, including DBMSes. The problem is a fundamental misunderstanding of what OOP is. (Please don't think I'm singling you out here, I see this all the time.) Your "person" and "location" example was not OOP, it was OOA -- OO analysis. You proposed a model of the problem domain, which was data-centric and quite sensible. However, the problem domain rarely if ever maps cleanly to the solution domain, which I demonstrated with my little rant on how DBMSes work. :)

For your example to be real and meaningful, you have to bridge the gap between OOA and OOD. The real objects in the solution space are records, queries, disks, buffers, users, permissions, connections, etc. These are not things that can (or should) be transparently distributed across a network. IMHO, until a lot of research into distributed algorithms and co-ordination really solidifies, software architects are the only ones in a position to decide what parts of the solution space ought to be distributed. It certainly isn't the job of the OS to make these decisions, or to suggest a particular way to map a given problem space to that OS' solution space (in your case, "active objects" with priorities interacting via non-blocking messaging).
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re:BCOS: Threads

Post by Brendan »

Hi,
Colonel Kernel wrote:
Brendan wrote:This will get similar performance to the non-blocking example, but it's (IMHO) just as hard to write and there is extra overhead (spawning and terminating threads). I also don't like the "while" loop in "main()" - it looks too much like polling to me.
It's hard to write the way you've written it, but a good library with thread pooling and "futures" support solves some problems here:
Why not use a library that does the calculation, gets a character from a file, gets a keypress and displays them? The code would be incredibly easy:

Code: Select all

main() {
    do_everything();
}
That looks a lot simpler than my "non-blocking messaging" example (which included all code except the kernel API calls)...
Colonel Kernel wrote:
To be honest, if I were actually designing a DBMS for my OS I probably wouldn't use OOP (part of the context of this topic) to begin with.
This is a misconception. It is very possible to design system-level software in an OOP fashion, including DBMSes. The problem is a fundamental misunderstanding of what OOP is. (Please don't think I'm singling you out here, I see this all the time.) Your "person" and "location" example was not OOP, it was OOA -- OO analysis. You proposed a model of the problem domain, which was data-centric and quite sensible. However, the problem domain rarely if ever maps cleanly to the solution domain, which I demonstrated with my little rant on how DBMSes work. :)
That depends on what the problem domain is. For my person and location example the "problem domain" was creating an example to describe a concept - I wasn't expecting people to build the next killer application out of it or anything.
Colonel Kernel wrote:For your example to be real and meaningful, you have to bridge the gap between OOA and OOD. The real objects in the solution space are records, queries, disks, buffers, users, permissions, connections, etc. These are not things that can (or should) be transparently distributed across a network. IMHO, until a lot of research into distributed algorithms and co-ordination really solidifies, software architects are the only ones in a position to decide what parts of the solution space ought to be distributed. It certainly isn't the job of the OS to make these decisions, or to suggest a particular way to map a given problem space to that OS' solution space (in your case, "active objects" with priorities interacting via non-blocking messaging).
I'm also not convinced "OOD" is something I'll ever care about - from here:

"The maximum impact from OOD is achieved when used with the goal of designing reusable software systems. For objects without significant reuse potential, OOD techniques were more costly than traditional software development methodologies. This was because of the costs associated with developing objects and the software to implement these objects for a one-time use [Maring 96]."

I've never liked the idea of code reuse - to me it sounds like a fast way to create bloated code (which is a commercially viable approach) rather than code that is custom designed and optimized for a specific problem (which isn't necessarily a commercially viable approach). I guess this is why I'm writing an OS and using assembly instead of being content with Windows or starting with Linux From Scratch.

Of course I'm expecting it to take less than 24 hours for someone to take the above paragraph out of context and complain that code reuse is quicker, that computers are fast enough anyway, that I shouldn't expect everyone in the world to stop reusing code or that they knew a guy who's brother's girlfriend said that it saves her 10 minutes in the bathroom each morning.

I guess I should employ a team of lawyers to filter out all the potential interpretations of my wording, draw up a definition of terms, add clauses describing the conditions of each statement in painful detail, etc. Either that or just give people a meaningless grunt and pretend my OS is yet another boring POSIX system so that nothing needs to be described.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
User avatar
Candy
Member
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:BCOS: Threads

Post by Candy »

Colonel Kernel wrote: You can thank 16-bit Windows for the horrors of Apartment threading. It's a way to run single-threaded COM objects (which is the only kind of COM object that existed in 16-bit Windows) in a multi-threaded process without the need for additional locking. It's yucky.
I'd rather blame them for it.
There is actually nothing here so far that depends on kernel-level non-blocking messaging. On today's OSes, this would be done by having a shared queue (it can be a lock-free queue if the overhead of locking is a concern and if such a thing is available in your library of choice) into which the main thread puts requests, and out of which the worker threads take requests. Note that this doesn't even require system calls to work (except maybe for the locking, if that's how the queue is synchronized).
That'd be the idea behind my dcall implementation, but in library form, so you just tell it to start N threads for dcall handling and then you can just dcall a function instead of calling it normally, and expect it to get done. tcall would be used when you're interested in when it finishes and/or its outcome. For orthogonality and a nice finish, shutting the dcall threads off again is a matter of telling them to execute deferred thread_end calls :).
Colonel Kernel wrote: <his code>

Code: Select all

main() {
    // Each background task starts when its tcall is issued.
    char key, fchar;
    list<thread> threads;
    threads.add(tcall(&key, first_thread));
    threads.add(tcall(&fchar, second_thread));

    result = do_large_calculation();
    twait(threads);

    printf("Keypress: %c, File: %c, Result: %d\n", key, fchar, result);
}

char first_thread() {
    return get_keypress();
}

char second_thread() {
    fp = fopen(filename, "r");
    return fgetc(fp);
}
That would be my idea.
Brendan wrote: I've never liked the idea of code reuse - to me it sounds like a fast way to create bloated code (which is a commercially viable approach) rather than code that is custom designed and optimized for a specific problem (which isn't necessarily a commercially viable approach). I guess this is why I'm writing an OS and using assembly instead of being content with Windows or starting with Linux From Scratch.
I think you are exactly opposed to me.

I consider current application of code reuse to be quite bad in design, thereby slowing down applications and so on. There are two fundamental problems causing this:

1. Code reuse isn't transparent and is "hacky" at best
2. People have no clue on algorithmic performance and why some operations are a very bad idea.

#2 can be solved by proper education or by making a free manual explaining the idea. #1 can't be solved easily.

Reused code should be algorithmically optimal and, if possible, coded optimally as well. If code is "baked in" to applications, each form of code reuse is going to be "static", insofar as later optimizations can't be applied to it afterwards and you can only hope the application designer fits them in. Also, largely-unused applications (very specific ones, for example) are going to be very unoptimized and cumbersome to handle.

As a last bit, code reuse is mainly for big companies. They have loads of code and they control how it can be used.


Solving this requires a generic interface that allows runtime linking. I'm not going to go into the details again, but I think it suffices to say that such a system is possible.

I think that, given generic interfaces and runtime linking, code reuse is simple and fast. I hope this is a "silver bullet".
Of course I'm expecting it to take less than 24 hours for someone to take the above paragraph out of context and complain that code reuse is quicker, that computers are fast enough anyway, that I shouldn't expect everyone in the world to stop reusing code or that they knew a guy who's brother's girlfriend said that it saves her 10 minutes in the bathroom each morning.
Candy reporting for duty, of course :) For the first bit at least. Computers aren't so fast that you can waste their power (it's like saying "there's enough water in the oceans, so let's waste it") and people just plain aren't going to stop reusing code. See the first bit of the Therac accidents for examples of why it can be very bad to do so.

It's like having a bulldozer. If you use it crudely, you're going to do more damage than you would do without it. If you use it without knowledge, accidents occur. If you use it for evil, god help your victims. If you use it with computer-controlled fine degree handling, it can be one of the best tools for the job.
User avatar
Colonel Kernel
Member
Member
Posts: 1437
Joined: Tue Oct 17, 2006 6:06 pm
Location: Vancouver, BC, Canada
Contact:

Re:BCOS: Threads

Post by Colonel Kernel »

@Brendan:
Whoa, calm down. I'm not trying to pick a fight here. I'm trying to evaluate the advantages of the BCOS architecture, and the tools you've provided me to do this evaluation are the OOP-ish examples above. If the examples aren't sufficiently clear or realistic, then I can't grasp the advantages of your approach. That's all.

You're trying to develop a commercial OS, and as such, you should have a really rock-solid understanding of how developers are supposed to use your OS in order to get benefits that they wouldn't get on other platforms. I'm not seeing this understanding from the examples you've given -- either because I'm dense (it has been known to happen) or because the examples need work.

On to specifics...
Brendan wrote:Why not use a library that does the calculation, gets a character from a file, gets a keypress and displays them? The code would be incredibly easy:

Code: Select all

main() {
    do_everything();
}
That looks a lot simpler than my "non-blocking messaging" example (which included all code except the kernel API calls)...
I'll take this as a segue into your discussion of code reuse below...
I'm also not conviced "OOD" is something I'll ever care about
That article is quite dated. It's quite right that specific OOD methodologies have not always been successful, but I was referring more to "small-o OOD" of the sort that has been considered mainstream for the past few years. I found a quote in the article that sums up quite nicely where I disagree with it:
Analysis and design are closer to each other in the object-oriented approach than in structured analysis and design.
This is indeed how many misguided individuals (like those responsible for J2EE) have attempted to design systems -- by directly translating an analysis into a design just by adding detail. IMO, design is somewhat orthogonal to analysis because it deals with how things will get done as opposed to what things we need to deal with. Candy's right on the mark with his comment about people not understanding algorithmic complexity. I'd put it another way -- the requirements drive analysis, while the architecture of the system on which the software will run should drive design towards an optimal fulfillment of those requirements. The two need to meet in the middle somewhere.
I've never liked the idea of code reuse - to me it sounds like a fast way to create bloated code (which is a commercially viable approach) rather than code that is custom designed and optimized for a specific problem (which isn't necessarily a commercially viable approach). I guess this is why I'm writing an OS and using assembly instead of being content with Windows or starting with Linux From Scratch.
Your approach to kernel development is right for you. Your intention is to optimize for the target processors, which is a perfectly valid goal. It was also valid for the first versions of L4, which were also written entirely in assembler.

Reuse in the context of this discussion is not reuse in the kernel -- it's reuse in user-mode libraries. The key point I was making about future<T> is that you can have user-mode libraries for managing concurrency, and they'll work just fine. I think you're confusing two separate issues in your critique of blocking messaging: convenience for the programmer, and optimal use of hardware resources. My point was that you can make things convenient for the programmer via user-mode libraries, which leaves your argument hinging on the efficient use of hardware resources.
I guess I should employ a team of lawyers to filter out all the potential interpretations of my wording, draw up a definition of terms, add clauses describing the conditions of each statement in painful detail, etc. Either that or just give people a meaningless grunt and pretend my OS is yet another boring POSIX system so that nothing needs to be described.
There's no need for a definition of terms, just a set of really compelling use cases for why your OS architecture solves certain problems better than other OSes do. I'm trying to offer constructive feedback, not shoot holes in your ideals. To put it another way, I want to believe, but I'm naturally a skeptic, so I need more evidence that your design offers compelling advantages. Now that you're recruiting volunteers by way of the "community OS" project, I have no doubt I won't be the last person to ask such questions.

Please don't interpret my relentless counterpoints as hostility -- I value your contributions to this forum, and the help that you've given me over the past 18 months. This kind of back-and-forth discussion can serve to teach us both more about OS design, and help you to refine your ideas.
Top three reasons why my OS project died:
  1. Too much overtime at work
  2. Got married
  3. My brain got stuck in an infinite loop while trying to design the memory manager
Don't let this happen to you!
User avatar
Brendan
Member
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re:BCOS: Threads

Post by Brendan »

Hi,
Colonel Kernel wrote: @Brendan:
Whoa, calm down. I'm not trying to pick a fight here. I'm trying to evaluate the advantages of the BCOS architecture, and the tools you've provided me to do this evaluation are the OOP-ish examples above. If the examples aren't sufficiently clear or realistic, then I can't grasp the advantages of your approach. That's all.
Sorry - it's just sometimes I feel like I'm defending rather than explaining, and last night I had no sleep and a head full of physical memory management stuff (it's surprising how messy "simple free page stacks" can get).
Colonel Kernel wrote:You're trying to develop a commercial OS, and as such, you should have a really rock-solid understanding of how developers are supposed to use your OS in order to get benefits that they wouldn't get on other platforms. I'm not seeing this understanding from the examples you've given -- either because I'm dense (it has been known to happen) or because the examples need work.
It probably has more to do with the main benefits being for users rather than developers. For example, the asynchronous messaging makes things harder for developers, not easier, because it forces them to minimize thread blocking rather than providing them with the possibility of being lazy and using something like "fopen()" without thinking about it. Of course things like libraries don't help with this - sometimes it seems that as long as it works no-one cares how it works.

Don't get me wrong here - there are a lot of very good programmers who actually would use something like thread pools and "futures" support, but there are also plenty of programmers who've never written anything that uses more than one single threaded process. For your profession, I'd expect that your colleagues would be the former rather than the latter, but I wouldn't consider your colleagues to be a good statistical sample.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.