Hi,
Candy wrote:
I schedule threads, and for my OS I recommend using the TLS areas for as much as possible - process space should only be used for the executable file, and any data that is shared by all threads (which should be almost nothing if good OOP practices are used).
I tend to disagree quite strongly with that. Given a bunch of programs that run together, they can be multiple processes. Interaction between threads can be more than just a few classes, some designs I've seen are based on proper concurrent access to the entire model for all threads. They were good OO designs, but not on separating threads off all the rest. Threads were created especially for when there are more than one context running in the same environment. In some design patterns they run side by side sharing only their buffer, in some they run in the same code or in interweaved bits of code.
Let me clarify my statement:
...
for my OS and only my OS
I recommend but do not require or insist upon
using the TLS areas for as much as practically
possible - process space should only be used for the executable file, and any data that is shared by all threads (which should be almost nothing if and only if
good OOP practices are used, where the term "good OOP practices" does not imply that other OOP practices or non-OOP practices are less "good" in general
).
Some notes...
In an OS where multiple threads belonging to the same process are pre-emptively scheduled or run at the same time (i.e. on separate CPUs), any user-level data structure shared by multiple threads must be protected by re-entrancy locking. Also, in an OS where multiple threads belonging to the same process can run at the same time (i.e. on separate CPUs), every linear address space region that can be accessed by multiple threads must have re-entrancy locks used by any code that may modify that region. In both cases, lock contention and lock overhead can be reduced by maximizing the use of thread local storage, as anything that can only be accessed by one thread never needs a lock.
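As a rough illustration of that last point (a sketch only, in Python rather than anything specific to my OS), each worker below keeps its running count in thread local storage and only takes a lock once, when it merges its result into the shared total. The function name `count_with_thread_local` and its parameters are made up for this example, and Python's interpreter lock means it shows the structure rather than any real speed-up.

```python
import threading

def count_with_thread_local(num_threads, items_per_thread):
    """Each worker counts privately, then merges once under a lock.

    Returns (total, lock_acquisitions).
    """
    local = threading.local()      # per-thread storage: never needs a lock
    merge_lock = threading.Lock()  # protects the two shared variables below
    total = 0
    lock_acquisitions = 0

    def worker():
        nonlocal total, lock_acquisitions
        local.count = 0
        for _ in range(items_per_thread):
            local.count += 1       # only this thread can see local.count
        with merge_lock:           # the only access to shared data
            total += local.count
            lock_acquisitions += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total, lock_acquisitions
```

With 4 threads doing 1000 increments each, the lock is taken 4 times in total. If the counter itself were shared, it would have to be taken on each of the 4000 increments.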
For my scheduler, threads are given CPU time in order of priority. Therefore (within a single process), with one CPU, several threads at the same priority doing one "thing" each will get the same amount of CPU time as one thread at that priority doing all of the "things". For N CPUs (within a single process), more than N threads at the same priority doing one "thing" each will get the same amount of CPU time as N threads at that priority doing the same "things". The additional threads increase the number of thread switches without increasing the amount of work done. For these cases, scheduling overhead can be reduced by having one thread per CPU at each priority.
For my OS, "good OOP practices" means that when the code is being designed, you split it into classes, assign each class a priority, and group together all classes at the same priority. At run time, each group would be split into one thread per CPU. Of course, this is difficult in practice (it's a guideline only).
For example, your webserver could have 4 classes: "main", "connection", "cache" and "log". The main class and the connection class would be at the same priority, the cache class would be at a higher priority and the log class would be at a lower priority. On a single-CPU computer this works out to 3 threads, one for each priority.
On a computer with 2 CPUs, you only need one thread for the log class, as it's not that important and there's only one instance of it. The cache class can be split into 2 threads where each thread caches roughly one half of the data (put the file name through a hash function to determine which file is cached by which thread). For the remaining classes, I'd have one thread for the main class (only one instance) plus half of the connections, and another thread for the other half of the connections (with some sort of load balancing to determine which thread handles each new connection).
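The "hash the file name" step could look something like the sketch below. The function name `cache_thread_for` is hypothetical, and CRC32 is just one convenient hash that gives the same answer on every run (any reasonably uniform hash would do).

```python
import zlib

def cache_thread_for(file_name, num_cache_threads):
    """Return the index of the cache thread that owns file_name.

    The same name always maps to the same thread, so no two cache
    threads ever hold (or need to lock) the same cached file.
    """
    checksum = zlib.crc32(file_name.encode("utf-8"))
    return checksum % num_cache_threads
```

A connection thread that wants "/index.html" would compute `cache_thread_for("/index.html", 2)` and send its request to that cache thread only.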
For a computer with 4 CPUs, you still only need one thread for the log class, the cache class would be split across 4 threads, and I'd probably go for one thread for the main class (only one instance) and 3 threads for the connections.
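To summarise the three cases, here is a small sketch that works out the number of threads per priority for a given number of CPUs. The `CLASSES` table, the priority labels and the function name `plan_threads` are invented for this example; the rule it applies is "one thread per CPU per priority group, unless every class in the group has only one instance".

```python
# Hypothetical description of the webserver's classes:
# class name -> (priority label, True if there is only one instance)
CLASSES = {
    "main":       ("normal", True),
    "connection": ("normal", False),
    "cache":      ("high",   False),
    "log":        ("low",    True),
}

def plan_threads(num_cpus):
    """Return {priority label: number of threads} for num_cpus CPUs."""
    plan = {}
    for priority in {prio for prio, _ in CLASSES.values()}:
        singles = [single for prio, single in CLASSES.values()
                   if prio == priority]
        # A group made up only of single-instance classes cannot be
        # split any further, so it gets exactly one thread.
        plan[priority] = 1 if all(singles) else num_cpus
    return plan
```

This gives 3 threads in total for 1 CPU, 5 for 2 CPUs and 9 for 4 CPUs. How the "normal" group divides main and the connections between its threads is a separate decision.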
This example probably isn't too good, as I'm not that familiar with the internals of a web server, and I do realise standard high-level languages might have trouble with it, but it should illustrate the point.
For my original comment, I was only trying to get Pype to think about the possible uses for his TLS - for me and my OS, anything small enough to use INVLPG instead of an address space switch would be far too small to be useful.
Cheers,
Brendan