OdinOS: I'd love some design feedback

Discussions on more advanced topics such as monolithic vs micro-kernels, transactional memory models, and paging vs segmentation should go here. Use this forum to expand and improve the wiki!
mrstobbe
Member
Posts: 62
Joined: Fri Nov 08, 2013 7:40 pm

OdinOS: I'd love some design feedback

Post by mrstobbe »

So I've been working on a new OS lately. Previously I'd get to a stage of "holy monkeys, I can control everything!", get frustrated, and probably wander back to chasing uniformly distributed data down a rabbit hole (as is a messed-up tendency of mine [wait! shiny mystical thing! get it!]... seriously... math... I love it and hate it). This one, though, seems to be sticking. I have a vision for it that I'd like to run by you guys (red flags, "oooh, have you thought of this", "you're an idiot", and what-not).

This is a tad long-winded, so if you're not in the mood then please move on before complaining (no one likes complaints from someone who didn't bother to read).

First, basic overview of design goals:
  1. Microkernel with (minimal) drivers in ring1/2 and their segments appropriately assigned (I'm planning to still use sysenter/syscall stuff and I have some questions about that)
  2. Minimal hardware support (specific hardware supported... don't have it, out of luck).
  3. Single-process, multi-task. Multi-core/cpu necessary, but 4+ core/cpu highly recommended.
  4. General IRQs firing on CPU0, drivers running on CPU0, any general task/gc on the ring0 level, and the main process on CPU0.
  5. Main process "workers" are sticky to CPU1..n and there's no preemption except during panic.
  6. All I/O is async by nature unless explicitly overridden.
  7. 64 MB minimum RAM (including a simple ring3 process)
  8. The heart of the kernel is something similar to kqueue or epoll or IO completion ports, but much more powerful and much more elegant.
  9. A basic TTY and user system to allow controlling/reviewing the system, but nothing beyond that.
Second, a few basic non-goals:
  1. Support most environments (I already said this was a non-goal... repeating).
  2. POSIX/BSD/LIBC/whatever compatibility of any kind. The antithesis of a goal.
  3. Support of any standards or designs that might hinder the intention of this design (such as would cause crippling resource consumption on CPU0)
  4. Conventional ring3 task switching (although semi-support for CPU0 given its need to handle the drivers).
  5. Support for things like OpenGL, any kind of window management system, stdio, what-have-you... it's all out the window. No web browsers or super-duper shell scripting or wine or whatever. You get the picture.
Settled. So what I'm envisioning is a mono-process multitasking system that operates in real time for all the process's "workers" (sticky to each remaining CPU and preempted only by signals). What separates this concept from most other approaches is my emphasis on non-blocking I/O, generic event signalling, and high-level synchronization. I'm aiming for something that can act as either an insanely high-throughput static HTTP server, an insanely high-throughput TCP proxy, a node in a fast distributed-computing environment, or a fantastically responsive DB, but not more than one of those things at the same time.

Full disclosure: I was inspired by this idea through a cascade of events and thoughts, and the one that solidified it for me was a random OS I came across. [snip] (EDIT: Brendan pointed out that I was thinking of BareMetal OS. Check it out if you haven't yet.)

The kernel's responsibilities will include:
  1. Management of all CPU0 IRQs
  2. Task switching between the kernel, all drivers, and the main process thread.
  3. Handling low-level errors and fataling gracefully as necessary.
  4. Proxying/buffering/etc all "driver" requests (filesystem/network/etc).
  5. Managing all I/O wait states in a poll() state (the heart of this)
  6. Avoiding event starvation via defined distribution patterns and event preemption/aborting as necessary.
  7. Provision of generic event tracking and firing.
  8. Provision of low to mid-level atomic, semaphore, mutex, and consumer/producer models.
  9. Panic support and recovery.
So, here's some userland pseudocode... thought up completely at random just before writing this. It's just a representation of a simple ring3 telnet "hello world"-then-hup program and how it might operate. Again, pseudocode, and not even reasonable code at that. I just wanted to express some sort of conceptual end goal. Again, unrefined vision.

Code:

handle sockworker = undefined;
handle logger = undefined;
handle userland_event = undefined;
handle conn_consumer = undefined;

struct worker {
	// worker specific vars can go in here (really just thread-local-storage)
	handle events;
};

main() {
	// In CPU0
	handle events = events_create();
	logger = open(events, "file:///var/log/whatever.log", APPEND);
	// WAITFOR here does a couple of things... tells the write call to I/O block,
	// and possibly rw locks (in this case write). The workers aren't started
	// yet so contention isn't possible... so the rwlock doesn't matter.
	write(logger, "Starting up\n", WAITFOR);
	sockworker = worker_register(sockworker_setup, sockworker_teardown, sizeof(worker));
	conn_consumer = worker_consumer(sockworker, sockworker_acceptconn, ROUNDROBIN, nullptr);
	worker_startup(sockworker);
	handle listener = listen(events, "tcp://[*]:12345");
	events_on(events, PROCESS | EXCEPTION, UNHANDLED, unhandled_error, nullptr); // << just an idea... don't know where it goes yet though.
	events_on(events, TIMER | ONESHOT, 60*60*1000000, after_one_hour, nullptr);
	events_on(events, LISTENER | INCOMING, listener, do_accept, nullptr);
	userland_event = events_register(events);
	events_on(events, USERLAND, userland_event, arbitrary_userland_event, nullptr);
	status = events_wait(events, INFINITE);
	// We're shutting down for 'status' reason now... a couple of things:
	// First, if it's a graceful shutdown (SIGTERM or whatever), then
	// sockworker_teardown() has already fired and it _had_ an opportunity to
	// gracefully shut down (or hang indefinitely if it's a jerk).
	// An event hook like PROCESS | HANG could probably be provided to force
	// a SIGKILL of hanging workers after x-amount of time.
};

do_accept(handle events, handle listener, void* userland /* is nullptr */) {
	// We're still in the "main" process on CPU0.
	// This _could_ be given to a worker, but it's just an accept(), so why?
	handle conn = accept(listener); //immediate so no blocking/event poll is needed
	worker_give(conn_consumer, conn); //this, however... _may_ be blocking depending on system resource limits
};

sockworker_setup(worker* wls) {
	// This is now run on CPU1..n in parallel
	// Initialize anything in wls (worker* ... worker local storage)
	wls->events = events_create();
	// Note the possible synchronization issue below... this could be handled
	// kernel-level because of the WAITFOR status (a classic rwlock).
	write(logger, "Worker online\n", WAITFOR);
};

sockworker_teardown(worker* wls) {
	// Still in CPU1..n
	// Shutdown/waitfor any active connections this worker was handling
	write(logger, "Worker going offline\n", WAITFOR);
	// Destroy anything in wls
};

sockworker_acceptconn(worker* wls, handle conn, void* userland /* is nullptr */) {
	// Still in CPU1..n
	// Possible contention again, but this time we don't care. Just try to gain
	// a lock... if not, move on without completing the action.
	write(logger, "Hey there, I've got a connection to work with!\n", FIREANDFORGET);
	// Normally, you would place the conn in a poll and track its state of course.
	// For example, a simple HTTP server would have a structure for that handle
	// and figure out what to do on each event based on what state it's in.
	// Of course, additionally, we'd register/handle all appropriate events like
	// READREADY, EXCEPTION, etc.
	// Here though we'll just do a simple send-and-hup.
	events_on(wls->events, CONNECTION | WRITEREADY, conn, socketworker_can_write, wls);
	events_on(wls->events, CONNECTION | HUP, conn, socketworker_conn_hup, wls);
};

socketworker_can_write(handle events, handle conn, worker* wls /* this was "userland" elsewhere */) {
	// Still in CPU1..n
	// What's interesting here is that WAITFOR _usually_ doesn't actually block
	// this time because this function is _only_ fired when the handle is writable.
	// We can extend this so it's always non-blocking of course (even when the buffer is full)
	// by simply providing a WRITESIZEREADY event with an exact size for what we want to
	// write.
	write(conn, "Hello there!\nAnd goodbye...", WAITFOR);
	close(conn, HARD | FIREANDFORGET); // immediate if possible, queued if necessary, killed if sys queue limits are reached.
};

socketworker_conn_hup(handle events, handle conn, worker* wls) {
	// Still in CPU1..n
	// Note that the below isn't worker-local
	events_trigger(events, USERLAND, userland_event, "whatever userland param\n" /* << this could be a struct ref or handle or whatever */);
};


arbitrary_userland_event(handle events, handle event, void* initial_bound_arg, char* triggered_arg) {
	// Haven't decided if we're in CPU1..n, or CPU0 (the registrar of the event) here... probably will be flag-controlled.
	// Whatever, now a user-defined event has been fired and we can do whatever with the args.
	// Just an example.
	write(logger, triggered_arg, FIREANDFORGET);
};

after_one_hour(handle events) {
	// Yes, yes, I know, the args aren't right, but it's a prototype example. Simply an unhindered chain of thought.
	write(logger, "It's been an hour and I need to go to sleep... good night.\n", WAITFOR);
	process_term();
};
So what do you guys think? Overall? See any problems (besides portability and what-not)? Think I'm an idiot? Been done before? Etc. I'm all ears.

Oh yeah... googling turned up nothing... anyone know if I'm stepping on anyone's toes by using OdinOS?

Thanks a great deal in advance for the thoughts.

EDIT: As a name... I meant using "OdinOS" as a name (not as some kind of product that I'm "using")... I quite like it for a bunch of synchronistic reasons and would love to hang onto it if no one's using it.
EDIT: Removed the reference to SMP as OSwhatever correctly pointed out that it was antithetical.
EDIT: Clarification that I don't plan to support anything that would hinder the overall design goal of this OS. Anything. As per clarification inspired by Kevin.
Last edited by mrstobbe on Fri Nov 22, 2013 10:45 pm, edited 4 times in total.
bwat
Member
Posts: 359
Joined: Fri Jul 03, 2009 6:21 am

Re: OdinOS: I'd love some design feedback

Post by bwat »

My not so important input:

1) It's nice to see that you are not wanting to be POSIX/BSD/LIBC/Whatever compliant. However don't be surprised if you implement large chunks of behaviour defined by some standard. My rule of thumb is 75% of an ANSI/BSI/IEEE/ISO standard is the result of hard won experience, 25% is there to protect the economic interests of current implementers and raise the bar of entrance to newcomers. Try to identify the 75% which is good and you'll be getting all that experience on the cheap.

2) As Alan Kay said "The future is not laid out on a track. It is something that we can decide, and to the extent that we do not violate any known laws of the universe, we can probably make it work the way that we want to.". I think the only thing that will stop you with this design is you. Keep on keeping on and you'll get her finished.
Every universe of discourse has its logical structure --- S. K. Langer.
mrstobbe
Member
Posts: 62
Joined: Fri Nov 08, 2013 7:40 pm

Re: OdinOS: I'd love some design feedback

Post by mrstobbe »

bwat wrote: 1) It's nice to see that you are not wanting to be POSIX/BSD/LIBC/Whatever compliant. However don't be surprised if you implement large chunks of behaviour defined by some standard. My rule of thumb is 75% of an ANSI/BSI/IEEE/ISO standard is the result of hard won experience, 25% is there to protect the economic interests of current implementers and raise the bar of entrance to newcomers. Try to identify the 75% which is good and you'll be getting all that experience on the cheap.

2) As Alan Kay said "The future is not laid out on a track. It is something that we can decide, and to the extent that we do not violate any known laws of the universe, we can probably make it work the way that we want to.". I think the only thing that will stop you with this design is you. Keep on keeping on and you'll get her finished.
Excellent points. Thank you.

The problem with following those standards, though, is that they're entirely built around organically evolving ideas that started 50+ years ago. Blocking I/O is the de facto standard... general-purpose task switching is the de facto standard. These ideas were developed before 4-, 6-, or 8-core machines were an everyday commodity (let alone common in server hardware), and when RAM was scarce. The problem (of course) with not following them is that software contributions will be nil. I'm quite frankly not worried about the second problem. If this OS dies because of a lack of portability, then so be it.

To your point about the future... again, excellent point. I'm not holding things in stone. I was just looking for some general feedback on the overall concept. I will reiterate, though, that my "line" is turning this project into something generalized. The world doesn't need another everyday OS or even another RTOS to be embedded somewhere. The world needs immensely efficient processing for everyday server tasks on low-energy commodity hardware.

Don't get me wrong, I understand and appreciate your warning. And thank you for it.

Any immediate technical thoughts on the general idea?
bwat
Member
Posts: 359
Joined: Fri Jul 03, 2009 6:21 am

Re: OdinOS: I'd love some design feedback

Post by bwat »

mrstobbe wrote: Any immediate technical thoughts on the general idea?
It looks good enough for a high-level design. If it were me, I would just build some example benchmark applications, one for each area you've identified, and then just build the thing, using the benchmarks to evaluate each design choice you come across - proper quantitative design. If you listen too much to other people you'll end up rebuilding their designs.
Every universe of discourse has its logical structure --- S. K. Langer.
bwat
Member
Posts: 359
Joined: Fri Jul 03, 2009 6:21 am

Re: OdinOS: I'd love some design feedback

Post by bwat »

mrstobbe wrote: EDIT: As a name... I meant using "OdinOS" as a name (not as some kind of product that I'm "using")... I quite like it for a bunch of synchronistic reasons and would love to hang onto it if no one's using it.
Somebody somewhere has used it before it seems. My second Alan Kay quote of the day:"The name was also a reaction against the "IndoEuropean god theory" where systems were named Zeus, Odin, and Thor, and hardly did anything. "

This is taken from: http://www.smalltalk.org/smalltalk/TheE ... ltalksName

p.s. I'm not trying to say your Odin won't do anything.
Every universe of discourse has its logical structure --- S. K. Langer.
OSwhatever
Member
Posts: 595
Joined: Mon Jul 05, 2010 4:15 pm

Re: OdinOS: I'd love some design feedback

Post by OSwhatever »

mrstobbe wrote:Single-process, multi-task. SMP necessary, but 4+ core/cpu highly recommended.
[*]General IRQs firing on CPU0, drivers running on CPU0, any general task/gc on the ring0 level, and the main process on CPU0.
[*]Main process "workers" are sticky to CPU1..n and there's no preemption except during panic.
To me this sounds like you are not utilizing the hardware to its full potential. First you mention that it should be SMP, then certain workers are "dedicated" to specific CPUs, which contradicts the whole SMP paradigm. The point of SMP is to spread the load evenly in order to utilize the processing power as much as possible. Also, most interrupt controllers on SMP systems are designed to pick the most appropriate CPU, so usually that job is already done for you.

Operating systems where you "stick" certain tasks to a specific CPU exist and they are often used in safety critical systems. SMP systems can be difficult to debug and predict and therefore predetermined CPUs can help to make the system more predictable.

The question is what you want to achieve with your operating system.
mrstobbe
Member
Posts: 62
Joined: Fri Nov 08, 2013 7:40 pm

Re: OdinOS: I'd love some design feedback

Post by mrstobbe »

OSwhatever wrote:you mention that is should be SMP then certain workers are "dedicated" to specific CPUs which contradicts the whole SMP paradigm
True... so, I suppose I just meant to say multi-cpu system [facepalm]. I'll edit the original in a sec.
OSwhatever wrote:To me this sounds like you are not utilizing the hardware to its full potential. [snip] The point of SMP is to spread the load evenly in order to utilize the processing power as much as possible. Also most interrupt controllers on SMP systems are designed to pick the most appropriate CPU so usually that job is already done for you.
The idea in this design is that CPU0, being the most underutilized by nature, picks up all the hardware-level work, leaving the rest of the CPUs to focus on userland tasks unmolested. Hypothetically they could then become almost perfectly CPU-bound as work demands, but would be extremely energy efficient while more "idle". In a general-purpose SMP design, there is tons of context switching going on even in a mostly "idle" scenario, which is expensive in terms of energy, and a process (or processes) never gets the opportunity to fully utilize a CPU because it is constantly being preempted. This system would be for running something like nginx with the exact number of workers that the system can efficiently handle. The load balancing would be handled by the producer handler in the kernel on CPU0, so I don't think that would necessarily be more expensive (probably quite a bit cheaper, actually) than leaving it entirely up to userland like what currently happens in server software (think nginx again). I could be wrong though.

Am I totally missing anything about the point you're making?

The interrupt "shuttling" nature of this design (worker->(maybe kernel)->driver->kernel->(maybe driver)->worker) might be expensive though... probably more expensive than a monolithic kernel allowed to use the interrupts on each CPU as needed. I'm not sure though. I really need to do some research into what's been experimented with already along these lines.

Any other thoughts, details, comments, suggestions? Did you have a chance to look the whole thing over or did you just stop at that quoted point? I really appreciate the feedback.
mrstobbe
Member
Posts: 62
Joined: Fri Nov 08, 2013 7:40 pm

Re: OdinOS: I'd love some design feedback

Post by mrstobbe »

Oh yeah, one more important point about why I think the mono-process/multitask design with the tasks sticky to each CPU will be more efficient: correct me if I'm wrong, but doesn't a CPU's internal caching (TLB, opcode, and mem caches) and most (if not all?) of its branch-prediction accuracy get thrown out the window every time a context switch happens?

Once again, thanks for the input.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: OdinOS: I'd love some design feedback

Post by Brendan »

Hi,
mrstobbe wrote:Oh yeah, one more important point about why I think the mono-process/multitask design with the tasks sticky to each CPU will be more efficient: correct me if I'm wrong, but doesn't a CPU's internal caching (TLB, opcode, and mem caches) and most (if not all???) of it's branch prediction accuracy get thrown out the window every time a context switch happens?
That depends.

For the TLB, newer CPUs have "address space IDs" and (if the OS uses this feature) the TLB contents aren't thrown away when virtual address spaces are switched. For older CPUs, or OSs that don't use this feature, TLB contents (excluding pages that were marked as "global") are invalidated. A CPU's data caches are only ever invalidated when you use the "WBINVD" or "INVD" instruction to explicitly invalidate them (and you should never use or need these instructions). Things like trace caches and branch prediction are different for different CPUs.

Also; most CPUs do things "out of order" to help hide any delays caused by cache misses, TLB misses, branch mispredictions, etc. (so if one instruction can't be done yet, the CPU tries to keep itself busy doing other instructions). Some cores use hyper-threading to hide these delays even better (if one logical CPU can't do anything, then keep the core busy by executing more instructions from the other logical CPU). In either case the hardware may hide some or all of the penalties.

Finally; there's a severe difference in magnitudes here. For example, to avoid a task switch that might cause a few microseconds of delays, you're considering wasting entire CPUs for many seconds.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
mrstobbe
Member
Posts: 62
Joined: Fri Nov 08, 2013 7:40 pm

Re: OdinOS: I'd love some design feedback

Post by mrstobbe »

Brendan wrote:That depends. [... snip]
All of that makes sense (although, in terms of mem caching, a mem cache is implicitly invalid once a different task starts doing stuff with memory, right?). I didn't think about hyperthreading, but then that would just be n number of workers per CPU at that point. Preemption would have to happen under that condition, so I'll think about that some more.
Brendan wrote:Finally; there's a severe difference in magnitudes here. For example, to avoid a task switch that might cause a few microseconds of delays, you're considering wasting entire CPUs for many seconds.
Wait, what? Explain if you could please. I can't even remotely envision this at all.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: OdinOS: I'd love some design feedback

Post by Brendan »

Hi,

Just a few random notes..

"Multi-tasking" is running multiple processes at the same time, and doesn't make any sense for a "single-process" OS. I think you mean "single-process with multi-threading and a limit of one thread per CPU".

"Much more complicated" and "much more elegant" are opposites. It's probably a bad idea to have an impossibility at the heart of your kernel.

The idea of a micro-kernel is to isolate drivers from each other and other processes (and the kernel). With drivers running at CPL=1 or 2 they can trash a process' code and data, and (by loading each others segments) trash each other. What you've described is a monolithic kernel with drivers running at CPL=1 or 2 (and not a micro-kernel at all). Please note that segmentation sucks (it's slow on 32-bit 80x86 and not supported on 64-bit 80x86 or anything else).

For critical systems, "single-process" means severe downtime for maintenance (you can't do backups, mount new file systems, update software, etc. in the background, and have to take the critical system offline to use other processes). It also means that you can't have a "keep alive" process (e.g. a process that monitors another process, and restarts the other process if it crashes).

For IRQ handling; with all IRQs (and all device drivers) limited to a single CPU you won't be able to handle the load of high speed networking (e.g. two or more gigabit ethernet cards running near max. bandwidth); which will probably make the OS "undesirable" for certain things (e.g. HPC).

I think the OS you were thinking about is BareMetal OS.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: OdinOS: I'd love some design feedback

Post by Brendan »

Hi,
mrstobbe wrote:
Brendan wrote:That depends. [... snip]
All of that makes sense (although, in terms of mem caching, a mem cache is implicitly invalid once a different task starts doing stuff with memory, right?).
"Invalidating" means the cache is made empty (all data it contained purged). What you're thinking of is "valid but not needed" (where cache is full of data and is not explicitly or implicitly invalid).

The end result is still cache misses. However, this situation may not happen (e.g. cache large enough to hold data from several tasks) and it may happen without any task switches involved (e.g. cache too small to hold one task's data).
mrstobbe wrote:
Brendan wrote:Finally; there's a severe difference in magnitudes here. For example, to avoid a task switch that might cause a few microseconds of delays, you're considering wasting entire CPUs for many seconds.
Wait, what? Explain if you could please. I can't even remotely envision this at all.
Ok, imagine a computer being used for a HTTP server. There's 4 CPUs and 4 threads, and all CPUs spend 95% of their time waiting for network and/or waiting for disk.

Now imagine you also want to run an FTP server too. You've got 4 CPUs that are 95% wasted, but you can't use that wasted CPU time for the FTP server because the OS can't handle that; so you have to go and buy a complete second computer so that the FTP server can waste 95% of 4 more CPUs.

To avoid wasting a tiny little bit of CPU time (with task switches), you're wasting massive truckloads of CPU time.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
mrstobbe
Member
Posts: 62
Joined: Fri Nov 08, 2013 7:40 pm

Re: OdinOS: I'd love some design feedback

Post by mrstobbe »

Brendan wrote:"Multi-tasking" is running multiple processes at the same time, and doesn't make any sense for a "single-process" OS. I think you mean "single-process with multi-threading and a limit of one thread per CPU".
I thought that was obvious in both the description and the details. You were confused about it? Think other people will be too? Should I rewrite everything in the original to call them "threads" instead of "workers" or "jobs"? (serious question... not sarcasm).
Brendan wrote:"Much more complicated" and "much more elegant" are opposites. It's probably a bad idea to have an impossibility at the heart of your kernel.
What I was trying to say was powerful in terms of API. I'll change the original.
Brendan wrote:The idea of a micro-kernel is to isolate drivers from each other and other processes (and the kernel). With drivers running at CPL=1 or 2 they can trash a process' code and data, and (by loading each others segments) trash each other. What you've described is a monolithic kernel with drivers running at CPL=1 or 2 (and not a micro-kernel at all).
Well, the idea was to not give them access to each other's memory space, just as they wouldn't have access to the kernel's... you're saying that's not possible? I've never played around with ring1-2. I also was not planning to give them each a segment, but to do the standard thing of managing cr3 and relying on paging (similar to ring3 but with slightly higher privileges to do certain things). Worst case I can make them ring3... after all, CPU0's sole job is to be the classic preemptive multitasker and manage stuff.

Second follow up though... presuming driver isolation here, how is this not a microkernel?
Brendan wrote:Please note that segmentation sucks (it's slow on 32-bit 80x86 and not supported on 64-bit 80x86 or anything else).
Good to know.
Brendon wrote:For critical systems, "single-process" means severe downtime for maintenance (you can't do backups, mount new file systems, update software, etc. in the background, and have to take the critical system offline to use other processes). It also means that you can't have a "keep alive" process (e.g. a process that monitors another process, and restarts the other process if it crashes).
Well, modern infrastructures are distributed, load-balanced, and fault-tolerant. That's the use case here... out of 5 servers, one can easily be "down" without impacting services in the slightest (maybe for a handful of seconds, but you get what I'm saying). It's been a while since I've worked in an environment where that wasn't the case. As for the keep-alive process thing... why couldn't that be a wrapped launcher for the process (like mysqld_safe)? It would sleep until the process ends and never context switch. When the process ends, it could easily figure out why and react accordingly.
Brendan wrote: For IRQ handling; with all IRQs (and all device drivers) limited to a single CPU you won't be able to handle the load of high speed networking (e.g. two or more gigabit ethernet cards running near max. bandwidth); which will probably make the OS "undesirable" for certain things (e.g. HPC).
Hmmm... that could be a problem. Need to think about that more. The whole IRQ management part of this is still fuzzy in my head and I'd like to have a firmer grasp on it by the time the basics are done (I'm still working on simple/generic things like paging and clock management and the like).
Brendan wrote: I think the OS you were thinking about is BareMetal OS.
That's the one! Nice project. Definitely has potential particularly in the distributed computing realm.

Thanks,
Tyler
Last edited by mrstobbe on Thu Nov 21, 2013 12:04 am, edited 1 time in total.
mrstobbe
Member
Posts: 62
Joined: Fri Nov 08, 2013 7:40 pm

Re: OdinOS: I'd love some design feedback

Post by mrstobbe »

Brendan wrote:The end result is still cache misses. However, this situation may not happen (e.g. cache large enough to hold data from several tasks) and it may happen without any task switches involved (e.g. cache too small to hold one task's data).
In modern use (depending on use of course), this seems incredibly likely to happen frequently.
Brendan wrote:Now imagine you also want to run an FTP server too.
Aaaaahhh... source of the confusion. I stated explicitly that this OS doesn't do more than one major thing. Period. It's only an HTTP server, or it's only an FTP server, or it's only a memcached server, or it's only a [enter something appropriate here] server. The FTP server doesn't seem like a good fit for this type of OS, but a high-volume HTTP server certainly does. Recommendations to help clarify that in the original?

EDIT: And to clarify a bit more... I understood what you were saying... I'm saying that the point of this OS is to be as idle as possible when idle, and as active as possible when active. I don't consider "down time" to be "wasting seconds". Last time I checked, down time was far from a bad thing. The energy consumption alone is a serious concern for most major server environments, and those same environments strive to segment everything exactly as I'm describing... one major purpose per server. At the same time, they want the best they can get under full load.
Brendan
Member
Posts: 8561
Joined: Sat Jan 15, 2005 12:00 am
Location: At his keyboard!
Contact:

Re: OdinOS: I'd love some design feedback

Post by Brendan »

Hi,
mrstobbe wrote:
Brendan wrote:"Multi-tasking" is running multiple processes at the same time, and doesn't make any sense for a "single-process" OS. I think you mean "single-process with multi-threading and a limit of one thread per CPU".
I thought that was obvious in both the description and the details. You were confused about it? Think other people will be too? Should I rewrite everything in the original to call them "threads" instead of "workers" or "jobs"? (serious question... not sarcasm).
I'm not too sure what it is now (see further down).
mrstobbe wrote:
Brendan wrote:The idea of a micro-kernel is to isolate drivers from each other and other processes (and the kernel). With drivers running at CPL=1 or 2 they can trash a process' code and data, and (by loading each others segments) trash each other. What you've described is a monolithic kernel with drivers running at CPL=1 or 2 (and not a micro-kernel at all).
Well, the idea was to not give them access to each other's memory space, just as they wouldn't have access to the kernel's... you're saying that's not possible? I've never played around with ring1-2. I also was not planning to give them each a segment, but to do the standard thing of managing cr3 and relying on paging (similar to ring3 but with slightly higher privileges to do certain things). Worst case I can make them ring3... after all, CPU0's sole job is to be the classic preemptive multitasker and manage stuff.
Ah - I thought you meant "drivers running in their own segments; and all mapped into the single process' address space" (e.g. to avoid switching virtual address spaces completely). If you mean "drivers running in their own virtual address spaces", then just run them at CPL=3 and don't bother with CPL = 1 or 2 (you can still give them access to all IO ports by changing IOPL during task switches in this case, or only give them access to their own IO ports).
mrstobbe wrote:The keep-alive process thing... why couldn't that be a wrapped launcher for the process (like mysqld_safe)? It would sleep until the process ends and never context switch. When the process ends, it could easily figure out why and react accordingly.
If one process starts a second process and waits until the child process terminates, then that's 2 processes (where only one is given CPU time, but both share memory, have file handles, etc) and not a single process. Of course if you're planning to have drivers running in their own virtual address spaces (as processes) it's not really single process anyway; and you're effectively doing "multiple processes and multi-tasking, with different obscure limits on what different types of processes can do".
mrstobbe wrote:
Brendan wrote:For IRQ handling; with all IRQs (and all device drivers) limited to a single CPU you won't be able to handle the load of high speed networking (e.g. two or more gigabit ethernet cards running near max. bandwidth); which will probably make the OS "undesirable" for certain things (e.g. HPC).
Hmmm... that could be a problem. Need to think about that more. The whole IRQ management part of this is still fuzzy in my head and I'd like to have a firmer grasp on it by the time the basics are done (I'm still working on simple/generic things like paging and clock management and the like).
I'd assume that you wanted to keep IRQs away from other CPUs (and concentrate them on a single CPU) so that you could do some hard real-time thing on those other CPUs (even though SMM and power management on 80x86 will screw that up more than IRQ handling will). Because of the "single application" nature of the OS, everyone is just going to run it inside virtual machines to avoid wasting hardware resources, and their virtual machine is going to completely destroy any hard real-time thing you attempt.
mrstobbe wrote:
Brendan wrote:The end result is still cache misses. However, this situation may not happen (e.g. cache large enough to hold data from several tasks) and it may happen without any task switches involved (e.g. cache too small to hold one task's data).
In modern use (depending on the workload, of course), this seems incredibly likely to happen frequently.
For modern use both are likely - several small processes that all fit in cache is likely (for some people, including you and your "device driver processes"); and a single process that uses so much data that it can't all fit in cache at once is also likely (for other people, including those people that need to handle so much data that it makes sense to have a dedicated computer for it).

I guess what I'm saying is that cache misses, TLB misses, etc. that are caused by task switching adds up to "negligible background noise" (irrelevant). Also note that if an OS is able to run multiple applications at once, you get the same "less task switching" benefit when the OS is only running one application; so the only thing you gain from having a "one application" limit is a "one application" limit (and no other advantage against a "one or more applications" OS).
mrstobbe wrote:
Brendan wrote:Now imagine you also want to run an FTP server too.
Aaaaahhh... source of the confusion. I stated explicitly that this OS doesn't do more than one major thing. Period. It's only an HTTP server, or it's only an FTP server, or it's only a memcached server, or it's only a [enter something appropriate here] server. The FTP server doesn't seem like a good fit for this type of OS, but a high-volume HTTP server certainly does. Any recommendations to help clarify that in the original post?
It was clear in the original that your OS won't do more than one major thing. It's also clear that an OS that won't do more than one major thing is only good for a niche that is so small it can be ignored.
mrstobbe wrote:EDIT: And to clarify a bit more... I understood what you were saying... I'm saying that the point of this OS is to be as idle as possible when idle, and as active as possible when active. I don't consider down-time to be "wasting seconds". Last time I checked, down-time was far from a bad thing. Energy consumption alone is a serious concern for most major server environments, and those same environments strive to segment everything exactly as I'm describing: one major purpose per server. At the same time, they want the best performance they can get under full load.
RAM that is not being put to good use (e.g. caching something to improve performance) is wasted RAM. The worst possible situation is "all RAM is free" (which means all RAM is wasted instead of being used for anything useful).

CPUs that are not being put to good use (e.g. doing less important work when there's no "most important work" to do) are wasted CPUs. The worst possible situation is "all CPUs idle" (which means all CPUs are being wasted).

If the point of this OS is to be as idle as possible when there's no "most important work" to do, then the point of the OS is to waste CPU time for no sane reason whenever there's no "most important work" to do.

For example, if the computer is being used as a dedicated HTTP server, then it might hit 100% CPU load at peak times but often drop back to 25% load, averaging something like "50% of CPU time wasted" per day. That wasted CPU time could be used to do backups, install updated software and content, defragment file systems, pre-cache/pre-process/pre-compress data, etc.; so that less CPU time is wasted, downtime is reduced, and the server performs better during peak time.

The same applies to a pool of computers. For example, a load balancer and a pool of 5 HTTP servers that don't waste CPU time for no reason is cheaper to purchase, cheaper to run and cheaper to maintain; while a load balancer, a pool of 7 HTTP servers that do waste CPU time plus 3 more "warm spares" to cover the downtime problem is just plain stupid.


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.