Page 4 of 14

Re: Concise Way to Describe Colour Spaces

Posted: Sun Jul 12, 2015 5:43 pm
by Rusky
embryo2 wrote:If the words "a lot of subtle tricks" mean just try-catch-finally blocks, then I strongly disagree. It introduces an additional level of indentation and a few extra lines of code, but I just can't call that "subtle" or a "trick" (or even "a lot" of such mess).

And Java, like Rust, (or is it more correct to say "Rust, like Java"?) also supports so-called runtime exceptions, which don't require a developer to write exception handlers.
You're missing the point, and clearly have never thought about exception safety before. Exception safety means more than just using finally blocks. It means being aware of every point at which your code could be interrupted by an exception, and making sure that such an event would not leave anything in a corrupted state.

Java runtime exceptions do not help you here. They are the problem: because they can be caught at any point in the call stack, they can easily expose state corrupted by non-exception-safe code. Rust panics, on the other hand, cannot be caught anywhere- they abort the entire thread. Combined with Rust's rules for how memory can and cannot be shared between threads, this means Rust programs do not need to worry about exception safety- corrupted state is impossible to observe.
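A small illustration of that behaviour (modern Rust; the `Guard` type and messages are invented for the example) - a panic kills only its own thread, and destructors still run while it unwinds:

Code: Select all

```rust
use std::thread;

// `Guard` is an invented type for this example; its destructor runs even
// when the owning thread panics, because unwinding runs all destructors.
struct Guard(&'static str);

impl Drop for Guard {
    fn drop(&mut self) {
        println!("cleaned up: {}", self.0);
    }
}

fn main() {
    let handle = thread::spawn(|| {
        let _g = Guard("worker state");
        panic!("something went wrong");
    });
    // The panic aborts only the worker thread; `join` reports it as an error.
    assert!(handle.join().is_err());
    println!("main thread continues normally");
}
```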
embryo2 wrote:How something like C style union can force a developer to write an exception handler?
Rust's "unions" are actually extended enums, and the language does not allow access to their contents without checking which variant they are using. For example, using the result of a function returning a Result looks like this (simplified for clarity):

Code: Select all

fn open(path: Path) -> Result<File>;

match open("foo.txt") {
    Ok(file) => { /* file is usable in here */ },
    Err(e) => { /* file could not be opened, error information is in e */ },
}
You cannot simply say "let file = open("foo.txt");" and ignore any errors, because the only way to get the actual return value is through a match on which variant was returned. Now, lest you cry about verbosity, there is a macro "try!" that explicitly checks for errors and returns them to the caller for you; this lets the programmer avoid handling an error while still forcing them to decide whether or not to do so:

Code: Select all

fn possibly_fail() -> Result<i32> {
    let file = try!(open("foo.txt")); // just pass errors back to the caller
    let x = try!(something_that_could_fail(file));
    let z = match try_something(x) {
        Ok(y) => use_value(y), // (`use` is a keyword in Rust, so a different name is needed here)
        Err(e) => default_value(e), // explicitly handle this error instead of passing it back to the caller
    };
    Ok(z)
}
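For reference, in current Rust the `try!` macro has been superseded by the `?` operator; a self-contained version of the same pattern looks like this (the file name and the byte counting are only examples):

Code: Select all

```rust
use std::fs::File;
use std::io::{self, Read};

// A runnable version of the sketch above, using the `?` operator that has
// since replaced the `try!` macro.
fn read_file(path: &str) -> io::Result<String> {
    let mut file = File::open(path)?;       // pass errors back to the caller
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;    // same here
    Ok(contents)
}

fn main() {
    match read_file("foo.txt") {
        Ok(text) => println!("read {} bytes", text.len()),
        // handled explicitly instead of passed back to the caller
        Err(e) => println!("could not read file: {}", e),
    }
}
```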
embryo2 wrote:
Rusky wrote:Another example that Brendan has brought up before is passing in error handling continuations to functions, so that callers are forced, at compile time, to provide handlers for each error condition.
Java way of hiding those additional parameters is more concise.
That's the entire problem with exceptions- they hide errors so things fail at runtime instead of forcing the programmer to decide what should be done with them. There are better ways to make things concise than to bury your head in the sand and hope someone farther up the call stack knows what to do.

Re: Concise Way to Describe Colour Spaces

Posted: Sun Jul 12, 2015 11:07 pm
by Brendan
Hi,
embryo2 wrote:
Brendan wrote:I mean, the OS starts another instance of the service "somewhere" (on any computer that is still part of the group of computers/cluster). This could be either local or remote. Note: it doesn't matter much where, except for performance. For performance, "where" is a compromise between CPU load and communication overhead; and I'll be adding a little information into the executable's header (e.g. "average amount of processing per request") so that the OS can make a more effective decision. For example; if something uses a massive amount of CPU time but has very little communication then you'd want it on whichever computer has the least work to do (regardless of communication costs), and if something doesn't use much CPU time but communicates a lot then you're going to want it "close" to whatever uses it instead (regardless of existing load on the computer).

If there's no binary for the service on the computer that's chosen; then the OS uses a distributed file system so the computer will just have to fetch the file. More specifically, the computer will try to fetch an "already optimised for this computer" native version of the executable, and if that doesn't exist it will try to fetch the "portable byte-code" version of the executable file and compile it for the specific computer. In either case it will try to cache the executable locally after it's been fetched (to avoid fetching/compiling if its needed again later).
Ok, I see your point. But this algorithm is exactly the helper I was talking about in my previous message. So, instead of an "elegant and clean" messaging solution you will have a set of implementations of different helper algorithms that handle all the special cases, one algorithm per case. In the example above we see the cases for service deployment and start-up mixed with the load optimization job and the need for a software repository. And for each case you have to invent and implement a helper algorithm.
Most of the stuff I mentioned above is just code to start a process, which is more complex in my case because of additional features that typical OS's lack (trying to distribute load across multiple computers intelligently, executables as portable byte-code rather than native binaries). However; those additional features have nothing to do with fault tolerance or redundancy or messaging or anything else that we've been talking about, and it's unfair to claim my approach to fault tolerance/redundancy/messaging is not clean or elegant because of things that have nothing to do with fault tolerance/redundancy/messaging.

As far as programmers writing applications/processes are concerned, the fault tolerance/redundancy is clean and elegant because they don't have to worry about any of it (unlike alternative approaches, where programmers writing applications/processes need to worry about it, and then screw it up half the time).
embryo2 wrote:
Brendan wrote:If the user never plugs the cable back in then the application won't recover successfully (because the OS lacks the ability to start a new instance of the service that the application requires).
Yes, the ability to start a backup service is a nice thing. But it has nothing to do with messaging, because the messaging is just one of the possible transport protocols below the service level.
Brendan wrote:If the user does plug the cable in then it might recover successfully; but someone is going to have to write code to handle that (e.g. catch to exception and retry) and test to make sure it works properly (the code to handle it doesn't just magically appear out of thin air). Of course in my case the code to handle it doesn't magically appear out of thin air either; but it is built into the kernel and normal/application programmers don't need to do anything.
In my case it is the VM where the handling code is located. If a developer doesn't implement an exception handler, then the VM catches the exception outside of the root thread's method (function) and ensures that the required bookkeeping is always done.
Imagine you're writing a spreadsheet application. You've got a function to calculate the value in a cell, which sends the formula in that cell and the values that the formula depends on to an arbitrary precision maths library and gets the result. The arbitrary precision maths library fails due to a network time-out (because the user unplugged the network cable); and you haven't written any code to handle that failure at all. In this case you're trying to pretend that the VM will automatically recover and the application will behave as if the arbitrary precision maths library didn't fail.

There are only two possibilities here. Either you're lying and the application can't continue to operate normally in this case (most likely), or the VM has extra code to restart the arbitrary precision maths library and retry the operation. Note that "VM has extra code to handle it" could work; but in that case you'd be doing something that's essentially the same as what I'm doing (except the extra code is in the VM and not in the kernel), which means that you're complaining about my approach while using the same approach.
embryo2 wrote:
Brendan wrote:What you can do is minimise the amount of code that processes depend on, that could have bugs.
Minimization alone is not a silver bullet. More complex solutions with a lot of code can be safer than solutions with less complexity and code volume. For example, a database plus its client represents a lot of code in total, but such a solution is much safer than a freshly invented file-based storage.
Wrong. You're using an extremely misleading/biased comparison (mature code vs. freshly invented) to make false assumptions.

If the simpler/smaller code is tested by the same amount of people for the same amount of time, then it will be less likely to have bugs/problems than that complicated/bloated alternative.
embryo2 wrote:
Brendan wrote:it still works (even with only 1 computer); it's just less effective. For example (for the "1 computer" case) it would still guard against transient faults, and still guard against RAM faults (because each instance of the service is using different physical pages of RAM); and if the computer has 2 or more CPUs it can guard against one CPU failing (even if that CPU fails while an instance of the service is using it and that instance of the service has to be terminated).
So, it works just partially, for some hardware or some special kinds of problems.
Yes; and this is completely different to not working at all in the "1 computer" case, and (even with reduced effectiveness) may still be desirable for cases where reliability is more important than performance.
embryo2 wrote:
Brendan wrote:The problem with (e.g.) RPC is that it's synchronous (designed to mimic the behaviour of function calls), which means that you can't just send 10 requests to 10 different computers and do other work while you're waiting for the replies to get all 11 computers doing useful work in parallel. To achieve that with RPC you'd have to spawn 10 new threads and do one "request+reply" RPC on each thread, which ends up causing more overhead (thread creation, thread switching, etc) and is a very ugly (e.g. needing locks and stuff to manage state) and is much more error prone.
In fact your solution also requires locks and stuff to manage state. Every time your message is posted a lock is required to ensure the queue structure (the state) is not corrupted by multiple threads. So, here we see the standard solution is on par with yours. Also, thread creation is not required because of the widely used thread pooling approach. And thread switching also has to be implemented in your solution (just because it has threads).
Yes, my solution has a few locks in the kernel that normal programmers don't have to care about, so that normal programmers aren't forced to deal with the extra threads, the extra locks and the extra complexity that makes RPC significantly inferior.
embryo2 wrote:
Brendan wrote:Note that most distributed systems suck - e.g. they require manual configuration and assigned roles, and don't dynamically shift things around to cope with hardware failures or to balance load.
Cloud computing offers dynamic reconfiguration variants. And automatic load balancers are used everywhere. So, your variant isn't the best in this area.
Where can I download the "cloud computing" OS that lets all applications use all of the computers on my LAN, with almost no configuration at all?
embryo2 wrote:
Brendan wrote:if the OS detects that a user's mouse is due for cleaning it gets added as a low priority job in the "maintenance tool", and when admin/maintenance staff do that job they tell the OS it's been completed.
If an OS is able to automatically detect the need for mouse cleaning, then it should detect the change after the mouse was cleaned. And by the way, such detection isn't a trivial thing.
The OS can't detect when the mouse is actually dirty (the hardware lacks that capability). The mouse driver just keeps track of the total distance the mouse has been moved in its lifetime, and the OS detects when "total_distance_moved - distance_at_last_cleaning > threshold".
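That bookkeeping can be sketched like this (a hypothetical illustration; the field names, units and threshold are made up):

Code: Select all

```rust
// Hypothetical driver bookkeeping for the "distance since last cleaned"
// heuristic described above; all names and numbers are invented.
struct MouseStats {
    total_distance_moved: u64,       // lifetime movement, in arbitrary units
    distance_at_last_cleaning: u64,  // reading when maintenance last ran
}

impl MouseStats {
    fn needs_cleaning(&self, threshold: u64) -> bool {
        self.total_distance_moved - self.distance_at_last_cleaning > threshold
    }
}

fn main() {
    let mouse = MouseStats { total_distance_moved: 1500, distance_at_last_cleaning: 400 };
    assert!(mouse.needs_cleaning(1000));   // 1100 units since last cleaning
    assert!(!mouse.needs_cleaning(2000));
    println!("cleaning due: {}", mouse.needs_cleaning(1000));
}
```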

Note that the maintenance tool is mostly intended for larger companies with many users (e.g. where there actually are on-site maintenance people). The idea is that a maintenance person spends their day doing whatever the OS's maintenance tool tells them to do; and the OS automates everything it can to avoid the need to employ someone to manage the maintenance people.


Cheers,

Brendan

Re: Concise Way to Describe Colour Spaces

Posted: Mon Jul 13, 2015 1:40 pm
by embryo2
Rusky wrote:Exception safety means more than just using finally blocks. It means being aware of every point at which your code could be interrupted by an exception, and making sure that such an event would not leave anything in a corrupted state.
Well, do your words mean that a developer should be careless while writing a complex application? Or do you know a way of writing code without paying attention to all possible execution paths?

At this point I see just some criticism of exceptions, but there's still no viable alternative.
Rusky wrote:Java runtime exceptions do not help you here. They are the problem: because they can be caught at any point in the call stack, they can easily expose state corrupted by non-exception-safe code. Rust panics, on the other hand, cannot be caught anywhere- they abort the entire thread. Combined with Rust's rules for how memory can and cannot be shared between threads, this means Rust programs do not need to worry about exception safety- corrupted state is impossible to observe.
Here you claim that Rust magically manages to roll back or commit whatever state a program may have. And it does it without the programmer's effort. Nice case for AI, but without AI it seems that you overestimate the power of a programming language.
Rusky wrote:Rust's "unions" are actually extended enums, and the language does not allow access to their contents without checking which variant they are using.
Ok, now I see that your message was about Rust, but it was unclear in its original form. Maybe Rust has some interesting language constructs, but unfortunately I haven't studied it, so for now I can't compare exception handling in Rust and other languages.
Rusky wrote:That's the entire problem with exceptions- they hide errors so things fail at runtime instead of forcing the programmer to decide what should be done with them. There are better ways to make things concise than to bury your head in the sand and hope someone farther up the call stack knows what to do.
Exceptions are a part of method declarations in Java, so it is impossible to hide them. But it is possible not to catch subclasses of RuntimeException. If a programmer doesn't use runtime exceptions, then he is always forced to do something with them by the Java compiler, so nobody buries their head and the situation is clear and controllable.

Re: Concise Way to Describe Colour Spaces

Posted: Mon Jul 13, 2015 2:33 pm
by embryo2
Brendan wrote:Most of the stuff I mentioned above is just code to start a process, which is more complex in my case because of additional features that typical OS's lack (trying to distribute load across multiple computers intelligently, executables as portable byte-code rather than native binaries). However; those additional features have nothing to do with fault tolerance or redundancy or messaging or anything else that we've been talking about, and it's unfair to claim my approach to fault tolerance/redundancy/messaging is not clean or elegant because of things that have nothing to do with fault tolerance/redundancy/messaging.
Your words were about messaging that makes the implementation of fault tolerance/redundancy elegant. My answer was about separation of concerns: the messaging is just a protocol, the redundancy is usable but requires helpers, fault tolerance can be implemented in another way, and so on.
Brendan wrote:As far as programmers writing applications/processes are concerned, the fault tolerance/redundancy is clean and elegant because they don't have to worry about any of it (unlike alternative approaches, where programmers writing applications/processes need to worry about it, and then screw it up half the time).
Yes, shifting the corresponding code to the kernel or some other framework can help. But the complexity of possible situations prevents you from implementing a solution for all faults, so a developer still needs code that handles such cases. And even if your efforts relieve some of the burden from a developer, I'm still afraid of situations where a developer will be required to understand the internals of your solution just to be able to write an efficient handler for a fault. For example, for an application to behave correctly instead of failing every time a file isn't found, a developer should understand how your solution will handle such a situation (would it connect to the internet and ask Google about the file?) and how to stop it from doing the wrong thing.

Your fault tolerance should be defined very strictly, so that a developer has no chance of being confused by the way your system behaves.
Brendan wrote:Imagine you're writing a spreadsheet application. You've got a function to calculate the value in a cell, which sends the formula in that cell and the values that the formula depends on to an arbitrary precision maths library and gets the result. The arbitrary precision maths library fails due to a network time-out (because the user unplugged the network cable); and you haven't written any code to handle that failure at all. In this case you're trying to pretend that the VM will automatically recover and the application will behave as if the arbitrary precision maths library didn't fail.
I claim that after the cable is plugged back in, the application will behave as if the arbitrary precision maths library didn't fail. But while the cable is unplugged there would be something like a message with some technical information (a network error or something alike). And in many cases no additional code is required on the VM side nor on the application side.

Maybe with your additional functionality the set of cases that need no developer intervention can be extended, but with my approach that set is already big enough to cover 95% of possible variants (maybe with some slight developer attention).
Brendan wrote:Wrong. You're using an extremely misleading/biased comparison (mature code vs. freshly invented) to make false assumptions.
Ok, if you insist then here is another version - imagine one page of code with 20 nested ifs and 100 pages of code with prints only. Both variants can have the same number of bugs regardless of the code size.
Brendan wrote:Where can I download the "cloud computing" OS that lets all applications use all of the computers on my LAN, with almost no configuration at all?
You have no need to download anything, or even for the computers and your LAN, because cloud computing already has it all available online (yes, you need one PC and some kind of internet connection, but that's all you need).

And more seriously - Amazon's or Google's solutions have the means to allow all applications to use all of the cloud's computers. And I suppose the configuration process is also heavily automated, though maybe not as much as your approach potentially can deliver.
Brendan wrote:The OS can't detect when the mouse is actually dirty (the hardware lacks that capability).
It can use something like pattern matching for detection of "non-standard" mouse behaviour. But it's not a trivial task.
Brendan wrote:The idea is that a maintenance person spends their day doing whatever the OS's maintenance tool tells them to do; and the OS automates everything it can to avoid the need to employ someone to manage the maintenance people.
Unfortunately, automation of such narrow areas as mouse cleaning, without understanding the bigger context of an organization's maintenance requirements, often leads to fragmentary and inefficient systems. But maybe if an organization has a clear understanding of its maintenance and sees that your mouse cleaning fits the global picture, then it can be helpful.

Re: Concise Way to Describe Colour Spaces

Posted: Mon Jul 13, 2015 8:46 pm
by Brendan
Hi,
embryo2 wrote:
Brendan wrote:As far as programmers writing applications/processes are concerned, the fault tolerance/redundancy is clean and elegant because they don't have to worry about any of it (unlike alternative approaches, where programmers writing applications/processes need to worry about it, and then screw it up half the time).
Yes, shifting the corresponding code to the kernel or some other framework can help. But the complexity of possible situations prevents you from implementing a solution for all faults, so a developer still needs code that handles such cases. And even if your efforts relieve some of the burden from a developer, I'm still afraid of situations where a developer will be required to understand the internals of your solution just to be able to write an efficient handler for a fault. For example, for an application to behave correctly instead of failing every time a file isn't found, a developer should understand how your solution will handle such a situation (would it connect to the internet and ask Google about the file?) and how to stop it from doing the wrong thing.

Your fault tolerance should be defined very strictly, so that a developer has no chance of being confused by the way your system behaves.
You're using a very loose definition of "faults". To me, a failure is when software doesn't do something it should or does do something it shouldn't (e.g. crash). If a file isn't found then software (e.g. the VFS) should return a "file not found" error. In other words, returning a "file not found" error is correct behaviour and not a failure. Applications/processes are expected to handle correct behaviour (e.g. VFS returning a "file not found" error) and fault tolerance is never intended to guard against correct behaviour.

Also note that fault tolerance (the ability to tolerate faults) and fault immunity (the ability to be immune from faults) are related, but are also very different. Something with fault tolerance isn't immune to all faults and can still fail. Something with fault immunity will never fail under any circumstances (regardless of how unlikely). Fault immunity is impossible (e.g. if you have 1000 computers there's still an incredibly tiny chance that all 1000 computers will fail at the same time). I'm only providing fault tolerance.
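A back-of-the-envelope sketch of why simultaneous total failure is so unlikely (the per-machine failure probability here is invented purely for illustration, and failures are assumed independent):

Code: Select all

```rust
fn main() {
    // Illustrative numbers only: if each machine is independently down with
    // probability 1% in a given interval, the chance that all of them are
    // down at once shrinks exponentially with the number of machines.
    let p_single: f64 = 0.01;
    for n in [1, 2, 10] {
        println!("P(all {} down) = {:e}", n, p_single.powi(n));
    }
    // Even at 10 machines the probability is already astronomically small.
    assert!(p_single.powi(10) < 1e-19);
}
```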
embryo2 wrote:
Brendan wrote:Wrong. You're using an extremely misleading/biased comparison (mature code vs. freshly invented) to make false assumptions.
Ok, if you insist then here is another version - imagine one page of code with 20 nested ifs and 100 pages of code with prints only. Both variants can have the same number of bugs regardless of the code size.
So you're saying that for a fair comparison (same quality control, same code maturity, same code size); complex code (e.g. the code in high performance VM that does JIT compiling and optimisation) is more likely to have bugs and security vulnerabilities than simpler code (e.g. the code you find in typical desktop applications)?
embryo2 wrote:
Brendan wrote:Where can I download the "cloud computing" OS that lets all applications use all of the computers on my LAN, with almost no configuration at all?
You have no need to download anything, or even for the computers and your LAN, because cloud computing already has it all available online (yes, you need one PC and some kind of internet connection, but that's all you need).
How do I plug 100 keyboards and 100 monitors into this "cloud"? When the cloud tells me networking is a bottleneck, can I just install some extra network cards in my cloud?

Mostly all I'm saying here is that "cloud" is intended for an extremely different use case. I want all my computers on my LAN to work together (with or without any Internet connection), and I don't want to pay some third party for processing time. I want the typical ("one computer per user plus a server") office to shift to my OS to reduce hardware costs ("one computer per pair of users with no server"). I want this to work for mobile devices (laptops, tablets, smartphones) where (e.g.) if you're using a weak/low power tablet while you're walking around and happen to wander into wifi range, the OS automatically shifts processing to the faster systems and the software you were already running starts running much faster (and if you walk out of wifi range the applications just start running slower without any "network connection lost" failures). I also want to do things like let users send running applications to each other (e.g. you open a word processor document, write half of a letter, then send the application "as is" to another user so they can finish writing the letter); and to have multi-user applications (where 10 programmers working on a project can all use the same IDE at the same time). For power management, I want the OS to automatically shut down computers when they aren't needed and (using "wake on LAN") automatically start them again when they are needed.

I also want all of this to be as close as possible to "zero configuration". For example, if you buy 20 new computers with no OS on them at all; you should be able to plug them into your network and do nothing else; where those 20 new computers boot from network, automatically become part of the cluster, and automatically start doing work. Note that "as close as possible to zero configuration" does not imply "zero configuration". Some things; like partitioning disks, formatting file systems and installing the OS on the hard drives; won't be done automatically by the OS (e.g. because the OS can't know if the data that is on the hard disks is important or not and therefore won't/shouldn't automatically wipe the hard drives) and it will involve (e.g.) a remote administrator clicking a few buttons on some sort of "cluster management" tool.
embryo2 wrote:
Brendan wrote:The OS can't detect when the mouse is actually dirty (the hardware lacks that capability).
It can use something like pattern matching for detection of "non-standard" mouse behaviour. But it's not a trivial task.
For existing mouse hardware, I very much doubt that this is possible without an unacceptable number of false positives and/or false negatives (and if it's too unreliable it's just going to annoy people instead of being useful). It would be possible to design a mouse with special hardware that does reliably detect if/when cleaning is needed, and if that ever exists then the driver for that mouse will be able to use it instead of the "distance since last cleaned" estimation.
embryo2 wrote:
Brendan wrote:The idea is that a maintenance person spends their day doing whatever the OS's maintenance tool tells them to do; and the OS automates everything it can to avoid the need to employ someone to manage the maintenance people.
Unfortunately, automation of such narrow areas as mouse cleaning, without understanding the bigger context of an organization's maintenance requirements, often leads to fragmentary and inefficient systems. But maybe if an organization has a clear understanding of its maintenance and sees that your mouse cleaning fits the global picture, then it can be helpful.
There would also be a way for normal users to add a "trouble ticket" to the system; for whatever the OS can't automatically detect (even if it's "my office chair broke!").

I really don't think it'd be hard to design a maintenance tool that suits almost everyone; and if people don't like my tool then they're able to suggest improvements, or replace any or all of the "pieces that communicate" with their own alternative piece/s; and they're also free to go back to using crappy OSs that give them nothing useful.


Cheers,

Brendan

Re: Concise Way to Describe Colour Spaces

Posted: Tue Jul 14, 2015 1:39 am
by Rusky
embryo2 wrote:Well, do your words mean that a developer should be careless while writing a complex application? Or do you know a way of writing code without paying attention to all possible execution paths?
...
Here you claim that Rust magically manages to roll back or commit whatever state a program may have. And it does it without the programmer's effort.
...
Exceptions are a part of method declarations in Java, so it is impossible to hide them. But it is possible not to catch subclasses of RuntimeException.
People make mistakes, whether they're careless or not. Giving the language the ability to point out their mistakes before they release their program helps alleviate that problem.

My point is that Rust has two error handling modes. Only unrecoverable exceptions (called panics) are allowed to unwind the program in the middle, and they will always run all destructors, so you can't catch the exception partway through the rollback. This means programmers don't have to care about exception safety, because the language takes care of it in a general way.

Java, on the other hand, allows you to invisibly pass exceptions on up the call stack. If the programmer doesn't check the documentation or method signature, all runtime exceptions become invisible, and adding "throws Exception" makes all the rest invisible. Adding "throws Exception" is not a specific enough choice- it affects the entire method, while Rust's "try!" macro only affects a single potential error. This is especially bad as the method gets modified over time and calls are added or removed.

Further, Rust's "Result" type is much more composable than Java exceptions, because it's part of the normal data flow, rather than in a side channel requiring separate syntactic structures. Results have methods like map, and_then, or_else, unwrap_or, etc. (https://doc.rust-lang.org/std/result/enum.Result.html) that make handling them much more succinct than exceptions. For example, instead of this multiline monstrosity:

Code: Select all

ResultType result;
try {
    result = compute_result();
} catch (SomeException e) {
    result = default_value;
}
use(result);
You could just do this:

Code: Select all

use(compute_result().unwrap_or(default_value));
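To make that concrete, here is a self-contained version of the one-liner, where `compute_result` is a stand-in that just parses an integer:

Code: Select all

```rust
use std::num::ParseIntError;

// `compute_result` is a stand-in for any fallible computation; here it
// just parses an integer so the example is self-contained.
fn compute_result(input: &str) -> Result<i32, ParseIntError> {
    input.parse::<i32>()
}

fn main() {
    let default_value = 0;
    // unwrap_or collapses the try/catch boilerplate into one expression,
    // and map shows how Results compose with further computation.
    assert_eq!(compute_result("41").map(|n| n + 1).unwrap_or(default_value), 42);
    assert_eq!(compute_result("oops").map(|n| n + 1).unwrap_or(default_value), 0);
    println!("ok");
}
```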

Re: Concise Way to Describe Colour Spaces

Posted: Tue Jul 14, 2015 3:42 am
by Antti
@Brendan:

What would you think if a big company (e.g. Microsoft) thought that your ideas have so much potential that they wanted to invest in them? They would give you:
  • Technical leadership. You would lead the project.
  • Long contract. Minimum of 10 years and you would be able to fail the project without any consequences.
  • Significant financial support. Practically you would be financially independent for the rest of your life even if the project failed.
  • Resources. You would have huge resources at hand if you wanted something implemented. You would be able to set the acceptable quality level. You would not need to worry about human resource management or anything.
  • Contacts. OEM vendors would ship some computer models with the new OS pre-installed. Some hardware manufacturers would need to write native drivers (based on some kind of OEM agreement).
In short; in the worst case your project would just fail the big exceptions. However, it would still be a well-recognized and real OS. There would be a small market niche in any case. Perhaps this sounds good but there would be a teeny-tiny detail: you would not own it.

Re: Concise Way to Describe Colour Spaces

Posted: Tue Jul 14, 2015 5:50 am
by Brendan
Hi,
Antti wrote:What would you think if a big company (e.g. Microsoft) thought your ideas had so much potential that they wanted to invest in them? They would give you:
  • Technical leadership. You would lead the project.
  • Long contract. Minimum of 10 years and you would be able to fail the project without any consequences.
  • Significant financial support. Practically you would be financially independent for the rest of your life even if the project failed.
  • Resources. You would have huge resources at hand if you wanted something implemented. You would be able to set the acceptable quality level. You would not need to worry about human resource management or anything.
  • Contacts. OEM vendors would ship some computer models with the new OS pre-installed. Some hardware manufacturers would need to write native drivers (based on some kind of OEM agreement).
In short; in the worst case your project would just fail the big exceptions. However, it would still be a well-recognized and real OS. There would be a small market niche in any case. Perhaps this sounds good but there would be a teeny-tiny detail: you would not own it.
I only really care about the design of it, and a few related things (like ensuring formal standardisation processes are put in place for file formats, messaging protocols, etc). Implementations of that design are just a means to an end; and it doesn't really matter who owns the initial implementation (or any subsequent implementations).


Cheers,

Brendan

Re: Concise Way to Describe Colour Spaces

Posted: Tue Jul 14, 2015 9:26 am
by Antti
My posts are written at an unacceptable literacy level, e.g. "the big exceptions" should have been "the big expectations".
Brendan wrote:Implementations of that design are just a means to an end; and it doesn't really matter who owns the initial implementation (or any subsequent implementations).
This is a little bit surprising because I thought perfect implementations were one of your ambitions (given how much attention you have paid to the 80x86 platform). However, it really does make sense to think the design is the most valuable property. For example, who cares about the initial C compiler implementation? The answer to that is rather obvious.

Re: Concise Way to Describe Colour Spaces

Posted: Wed Jul 15, 2015 11:21 am
by AndrewAPrice
Brendan - this is a topic I'm interested in.

My wife is a graphic designer and she deals with the Pantone Matching System for printed media. She has one of these monitor calibrators (http://www.xrite.com/i1display-pro) that measures the wavelengths output by the monitor to create a monitor profile, so that what she sees on the printed output matches what her monitor shows. The main problem is that monitors are backlit, whereas printed media isn't and the colour on printed media depends on ambient light. In the end, you end up with 3 colour profiles - for the input media (the camera), the workstation monitor, and the printer. Since you're just dealing with screens it's a simpler problem, and X-Rite (the company that now owns Pantone) might have some kind of gadget for measuring your monitor output.

I've thought about describing colour in a universal way, and I've also thought about using CIE XYZ. The 'cleanest' (but not the most efficient) solution is to use XYZ for all internal colour representations, and do the conversion during the final stages (in the monitor or printer driver.)

My concerns;
  • A large portion of the CIE XYZ colour space is imaginary, in contrast with RGB where every possible colour value creates a unique colour (HSV also suffers this problem - a V of 0 is output as black, regardless of H and S.)
  • The blending algorithms are somewhat more difficult in this colour space. Imagine you're writing a 3D game and two light sources are shining on the same wall, you can easily do final colour = (wall colour * light a) + (wall colour * light b) in RGB. Transparency is final colour = (source a * (1 - alpha)) + (source b * alpha).
  • Performance. If you're watching a video or playing a real time 3D game, can you convert 1920*1080*60 pixels per second? I'd imagine for an 8-bit video game, you could simplify the problem by generating a lookup table for all 256 colours - but what if the screen is shared across multiple monitors? (A video wall or a dual-monitor workstation.)
There have also been RGB colour space standards (like sRGB, Adobe RGB), and many monitors and HDTVs are sRGB compliant, and I thought the theory was that a colour on any two sRGB screens should be identical. My monitor has an sRGB mode, but it also lets me change the brightness/contrast/colour temperature (which obviously alters the output colour) while claiming to be in sRGB mode, so that throws out that theory.

Re: Concise Way to Describe Colour Spaces

Posted: Thu Jul 16, 2015 12:58 am
by Brendan
Hi,
MessiahAndrw wrote:My wife is a graphic designer and she deals with the Pantone Matching System for printed media. She has one of these monitor calibrators (http://www.xrite.com/i1display-pro) that measures the wavelengths output by the monitor to create a monitor profile, so that what she sees on the printed output matches what her monitor shows. The main problem is that monitors are backlit, whereas printed media isn't and the colour on printed media depends on ambient light. In the end, you end up with 3 colour profiles - for the input media (the camera), the workstation monitor, and the printer. Since you're just dealing with screens it's a simpler problem, and X-Rite (the company that now owns Pantone) might have some kind of gadget for measuring your monitor output.
The fact that your wife needs to do this calibration in the first place (and the fact that colours aren't automatically the same on all devices wherever possible) is appalling. It represents an unacceptable failure in the IT industry.

For monitors, EDID provides the information needed to characterise the monitor's colour profile and convert from a known/standardised colour space to whatever the monitor's colour space happens to be, and has provided this information since it was first designed (about 15 years ago). For cameras, scanners and printers, I haven't researched it properly; but do know that for USB cameras the USB Video Class specification does have a "Color Matching Descriptor" and do know that HP's "Printer Control Language" has a lot of stuff for colour management (support for multiple colour spaces including device independent colour spaces, ICC profiles, etc).

Basically, as far as I can tell, hardware is not to blame for the unacceptable failure in the IT industry and it's almost entirely software's fault.
MessiahAndrw wrote:My concerns;
  • A large portion of the CIE XYZ colour space is imaginary, in contrast with RGB where every possible colour value creates a unique colour (HSV also suffers this problem - a V of 0 is output as black, regardless of H and S.)
For a system that uses 3 primary colours there are only 2 possibilities:
  • it is able to represent all colours that humans can see and there is some wasted space for imaginary colours.
  • it's unacceptable as a device independent colour space (incapable of representing all colours that humans can see).
For the only acceptable case; at least 2 of the primary colours must be imaginary, and the amount of space wasted by imaginary colours can be reduced by using 3 imaginary primaries. If you carefully chose 3 imaginary primaries to minimise the amount of space wasted by imaginary colours, then you'd end up with something very similar to CIE XYZ.

In other words, CIE XYZ does waste some space for imaginary colours, but this is impossible to avoid (while remaining acceptable as a device independent colour space) and the amount of wasted space is very close to the minimum possible.
MessiahAndrw wrote:
  • The blending algorithms are somewhat more difficult in this colour space. Imagine you're writing a 3D game and two light sources are shining on the same wall, you can easily do final colour = (wall colour * light a) + (wall colour * light b) in RGB. Transparency is final colour = (source a * (1 - alpha)) + (source b * alpha).
RGB is crippled and completely unacceptable for 3D rendering. It fails for "additive colour" (e.g. you can have 2 or more light sources that use colours that RGB is unable to represent where combining the light sources results in a colour that RGB can represent); and it also fails for "subtractive colour" (e.g. you can have a light source that uses a colour that RGB is unable to represent where that light passes through a filter and becomes a colour that RGB can represent).

However; it really doesn't matter what the renderer does or which colour space/s it uses - I'd still need some way to convert "renderer colour space" into "device specific colour space", which means I need some way to describe this colour space conversion. The nice thing is that most colour space conversions use matrices, and these conversion matrices can be multiplied. For example, if the monitor's description provided an "XYZ to monitor colour space conversion matrix", and if the renderer felt like using sRGB and had an "sRGB to XYZ conversion matrix", then both conversion matrices can be multiplied together to create an "sRGB to monitor colour space conversion matrix" that the renderer can use to convert its colour space into whatever the monitor uses without anything ever actually being converted to XYZ.
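That pre-multiplication property can be checked numerically: applying the combined matrix once gives the same result as converting in two steps. The matrices below are made-up placeholders, not real colour-space matrices:

```rust
type Mat3 = [[f64; 3]; 3];

// Apply a 3x3 matrix to a colour vector.
fn mul_vec(m: &Mat3, v: [f64; 3]) -> [f64; 3] {
    let mut out = [0.0; 3];
    for r in 0..3 {
        out[r] = m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2];
    }
    out
}

// Multiply two 3x3 matrices: (a * b)[r][c] = sum over k of a[r][k] * b[k][c].
fn mul_mat(a: &Mat3, b: &Mat3) -> Mat3 {
    let mut out = [[0.0; 3]; 3];
    for r in 0..3 {
        for c in 0..3 {
            for k in 0..3 {
                out[r][c] += a[r][k] * b[k][c];
            }
        }
    }
    out
}

fn main() {
    // Placeholder "sRGB to XYZ" and "XYZ to monitor" matrices (invented values).
    let srgb_to_xyz: Mat3 = [[0.4, 0.3, 0.2], [0.2, 0.7, 0.1], [0.0, 0.1, 0.9]];
    let xyz_to_mon: Mat3 = [[1.1, -0.1, 0.0], [0.0, 1.0, 0.0], [0.0, -0.2, 1.2]];
    let colour = [0.2, 0.4, 0.6];

    // Two-step conversion: sRGB -> XYZ -> monitor.
    let two_step = mul_vec(&xyz_to_mon, mul_vec(&srgb_to_xyz, colour));
    // One-step: pre-multiply the matrices, then apply once.
    let combined = mul_mat(&xyz_to_mon, &srgb_to_xyz);
    let one_step = mul_vec(&combined, colour);

    for i in 0..3 {
        assert!((two_step[i] - one_step[i]).abs() < 1e-12);
    }
    println!("ok");
}
```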

MessiahAndrw wrote:
  • Performance. If you're watching a video or playing a real time 3D game, can you convert 1920*1080*60 pixels per second? I'd imagine for an 8-bit video game, you could simplify the problem by generating a lookup table for all 256 colours - but what if the screen is shared across multiple monitors? (A video wall or a dual-monitor workstation.)
Converting XYZ to one of the RGB colour spaces costs the same as converting one RGB space to another RGB space (it's a single matrix multiplication in both cases).

For the total cost of the simple/brute force approach, a matrix multiplication is 9 multiplications and 6 additions per pixel (15 floating point operations). For 1920*1200*60 you'd be looking at 138240000 pixels per second, or 2.0736 billion floating point operations per second. Modern Intel CPUs range from about 10 GFLOPS to a few hundred GFLOPS, which means they'd be able to handle 5 times as much (or more).
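The arithmetic above can be double-checked in a few lines (the resolution and per-pixel cost are the figures from the paragraph; the GFLOPS figure for a low-end CPU is an assumption):

```rust
fn main() {
    // 9 multiplications + 6 additions per pixel for a 3x3 matrix multiply.
    let flops_per_pixel: u64 = 9 + 6;
    let pixels_per_second: u64 = 1920 * 1200 * 60;
    assert_eq!(pixels_per_second, 138_240_000);

    let flops_per_second = pixels_per_second * flops_per_pixel;
    assert_eq!(flops_per_second, 2_073_600_000); // ~2.07 GFLOPS

    // At an assumed ~10 GFLOPS, a low-end modern CPU has roughly 5x headroom.
    assert!(10_000_000_000 / flops_per_second >= 4);
    println!("ok");
}
```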

However, please note that the simple/brute force approach isn't necessarily the approach I'd use. I have theories... Mostly, the results from rasterisation are naturally horizontal line segments, where each line segment is 1 pixel tall and any width (up to the full width of the screen), and each line segment has a starting colour and an ending colour (where the ending colour for one line segment is the next line segment's starting colour). It should be possible to keep the data in this "line segment" format during subsequent processing stages (HDR, accessibility and conversion to device specific colour space); so that the number of colour conversions depends on the number of line segments and not the number of pixels, where if the average line segment width is larger than one pixel (which is likely) the overhead of colour space conversion is less than it would've been for the simple/brute force approach.

Finally, don't forget that I will be planning "fixed frame rate variable quality" (and not using the typical "fixed quality, variable frame rate" approach). What this means is that (e.g.) if the video driver doesn't think it'll be able to do 1920*1200 pixels before the frame's deadline then it can reduce the resolution (e.g. do colour space conversion for 960*1200 pixels instead) and upscale after (and then, if the next frame is the same do a subsequent 1920*1200 pixel colour space conversion of the previous frame's data and re-display the previous frame at the higher quality before the user has enough time to notice that the initial frame was lower quality). Basically, for 1920*1200 at 60 frames per second it doesn't have to be able to process 1920*1200*60 pixels per second.

Essentially what I'm saying here is that the question isn't whether the CPU is able to process 1920*1200*60 pixels per second; the real question is whether I'm able to design and optimise the graphics pipeline to give "good enough quality" when every frame is different (e.g. 3D games) and to reach "max. possible quality" before the user has time to notice lower quality frames when most frames aren't different (e.g. desktop apps).

MessiahAndrw wrote:There have also been RGB colour space standards (like sRGB, Adobe RGB), and many monitors and HDTVs are sRGB compliant, and I thought the theory was that a colour on any two sRGB screens should be identical. My monitor has an sRGB mode, but it also lets me change the brightness/contrast/colour temperature (which obviously alters the output colour) while claiming to be in sRGB mode, so that throws out that theory.
For 2 displays that both claim to be sRGB and are both in their default configuration, in theory a colour should be the same on both displays but in practice it's unwise to trust marketing departments and there's no guarantee that the colours actually will be the same.

Also, take a look at the sRGB gamut (from the wikipedia page):

[Image: CIE chromaticity diagram showing the sRGB gamut triangle]

That larger horseshoe shape represents all the colours that humans can see, and that pathetic little triangle represents how inadequate and crippled sRGB actually is in comparison. ;)



Cheers,

Brendan

Re: Concise Way to Describe Colour Spaces

Posted: Thu Jul 16, 2015 2:11 am
by Brendan
Hi,
MessiahAndrw wrote:
  • The blending algorithms are somewhat more difficult in this colour space. Imagine you're writing a 3D game and two light sources are shining on the same wall, you can easily do final colour = (wall colour * light a) + (wall colour * light b) in RGB. Transparency is final colour = (source a * (1 - alpha)) + (source b * alpha).
I've done some checking; and as far as I know you can do the exact same operations on XYZ colours.

For example; if you add the 2 sRGB colours [0.2, 0.4, 0.6] and [0.7, 0.3, 0.1] together you get [0.9, 0.7, 0.7]. If you convert the colours from RGB into XYZ you get the XYZ colours [0.333784, 0.371900, 0.621726] and [0.414036, 0.370634, 0.144322], adding them together as XYZ gives you [0.74782, 0.742534, 0.766048], and converting from XYZ back into sRGB gives [0.900000, 0.699999, 0.700000].

For another example, for the sRGB colour [0.2, 0.4, 0.6] if you take 25% of it you get the result [0.05, 0.1, 0.15]. If you convert the colour from RGB into XYZ you get the XYZ colour [0.333784, 0.371900, 0.621726], taking 25% of that gives you [0.083446, 0.092975, 0.1554315], and converting from XYZ back into sRGB gives [0.05, 0.1, 0.15].

Basically, the result is identical for both cases, regardless of whether you do the operation on sRGB colours or with XYZ colours.

Note: I used this calculator to do the conversions between sRGB and XYZ; with gamma set to 1.0 and D65 reference white, so that neither gamma nor whitepoint adjustment interfere with the results.
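The commutativity shown above follows from the conversion being a linear map: M(a + b) = Ma + Mb and M(ka) = k(Ma). A quick numeric check, using the commonly published linear-sRGB-to-XYZ (D65) matrix - exact coefficients differ slightly between calculators, so this is an approximation of the one used in the thread:

```rust
// Approximate linear sRGB -> XYZ matrix (D65 white point, gamma 1.0).
const SRGB_TO_XYZ: [[f64; 3]; 3] = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
];

fn to_xyz(rgb: [f64; 3]) -> [f64; 3] {
    let m = &SRGB_TO_XYZ;
    let mut out = [0.0; 3];
    for r in 0..3 {
        out[r] = m[r][0] * rgb[0] + m[r][1] * rgb[1] + m[r][2] * rgb[2];
    }
    out
}

fn main() {
    let a = [0.2, 0.4, 0.6];
    let b = [0.7, 0.3, 0.1];

    // Additive: adding in sRGB then converting equals converting then adding.
    let sum_then_convert = to_xyz([a[0] + b[0], a[1] + b[1], a[2] + b[2]]);
    let convert_then_sum: Vec<f64> =
        (0..3).map(|i| to_xyz(a)[i] + to_xyz(b)[i]).collect();

    // Scaling: 25% of the XYZ colour equals the XYZ of 25% of the sRGB colour.
    let quarter_then_convert = to_xyz([a[0] * 0.25, a[1] * 0.25, a[2] * 0.25]);
    let convert_then_quarter: Vec<f64> =
        (0..3).map(|i| to_xyz(a)[i] * 0.25).collect();

    for i in 0..3 {
        assert!((sum_then_convert[i] - convert_then_sum[i]).abs() < 1e-12);
        assert!((quarter_then_convert[i] - convert_then_quarter[i]).abs() < 1e-12);
    }
    println!("ok");
}
```

Note that this only holds with gamma 1.0, as in the thread's examples; with gamma encoding applied the operations no longer commute.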


Cheers,

Brendan

Re: Concise Way to Describe Colour Spaces

Posted: Thu Jul 16, 2015 10:02 am
by embryo2
Sorry for the slightly late answer.
Brendan wrote:You're using a very loose definition of "faults". To me, a failure is when software doesn't do something it should or does do something it shouldn't (e.g. crash).
The "something" here is the loose part. Your approach is based on your understanding of the "something"; sometimes your understanding is limited to a "file not found" error, but in other cases it flies away and suggests an OS that automatically heals the user's misbehaviour. So, the OS's intrusion into application operations can vary from non-existent to creating a very high load on all your 100 computers. But what if a user just doesn't want all this high-load mess?
Brendan wrote:
embryo2 wrote:Ok, if you insist then here is another version - imagine one page of code with 20 nested ifs and 100 pages of code with prints only. Both variants can have the same number of bugs despite the code size.
So you're saying that for a fair comparison (same quality control, same code maturity, same code size); complex code (e.g. the code in high performance VM that does JIT compiling and optimisation) is more likely to have bugs and security vulnerabilities than simpler code (e.g. the code you find in typical desktop applications)?
Your bracketed examples are skewed towards your vision and distort my vision of the fair comparison. It is the whole system that should be compared with another system, but you are trying to compare randomly chosen parts of different systems.
Brendan wrote:
embryo2 wrote:You have no need for downloading anything or even for the computers and your LAN, just because the cloud computing already has it available online (yes, you need one PC and some kind of internet connection, but that's all you need).
How do I plug 100 keyboards and 100 monitors into this "cloud"? When the cloud tells me networking is a bottleneck, can I just install some extra network cards in my cloud?
I hope that your goal is not to plug 100 of something into another something; most probably your goal is to have some job done. The job is perfectly done by the cloud without any networking bottlenecks. You just tell the cloud to do something usual, and the cloud does it and returns some result, even if the result should be shown to 100 people.
Brendan wrote:Mostly all I'm saying here is that "cloud" is intended for an extremely different use case. I want all my computers on my LAN to work together (with or without any Internet connection), and I don't want to pay some third party for processing time. I want the typical ("one computer per user plus a server") office to shift to my OS to reduce hardware costs ("one computer per pair of users with no server"). I want this to work for mobile devices (laptops, tablets, smartphones) where (e.g.) if you're using a weak/low power tablet while you're walking around and happen to wander into wifi range the OS automatically shifts processing to the faster systems and the software you were already running starts running much faster (and if you walk out of wifi range the applications just start running slower without any "network connection lost" failures).
The goal of efficiently using all available resources was always on the radar of all system designers. But the cost of distributing a job across all available devices is considered too high for it to be implemented. However, you can try to beat the heavily funded designers of the world's top corporations. And I do not want to tell you that it is impossible. It is possible, but I doubt it is possible within an acceptable time frame for one developer. So, in the end you will have an incomplete system that is many years behind, but you will still be able to claim that the goal is achieved in the form you see it.
Brendan wrote:I also want to do things like let users send running applications to each other (e.g. you open a word processor document, write half of a letter, then send the application "as is" to another user so they can finish writing the letter);
It's just about sending a snapshot of a document. MS Word saves such snapshots so the user has the ability to restore an old version; add network support to MS Word and your goal is accomplished.
Brendan wrote:and to have multi-user applications (where 10 programmers working on the project can all use the same IDE at the same time).
I just can't imagine what 10 developers can do in the same IDE. Interfere with and interrupt each other? The world has had solutions for teamwork for decades (code repositories, for example), so why should there be another solution? What is it better for?
Brendan wrote:For power management, I want the OS to automatically shutdown computers when they aren't needed and (using "wake on LAN") automatically start them again when they are needed.
Yes, such a feature seems viable for an OS that manages many distributed computers.
Brendan wrote:I also want all of this to be as close as possible to "zero configuration". For example, if you buy 20 new computers with no OS on them at all; you should be able to plug them into your network and do nothing else; where those 20 new computers boot from network, automatically become part of the cluster, and automatically start doing work.
I see it as viable in the case of a new computer being connected to an existing cluster. But in the case of a new cluster there should be many user-defined actions, because the OS just doesn't know the goal of the final cluster. So, the configuration task is always required for a new setup, but can be avoided for additional computers. If you look at such cases separately you can see more ways of creating a convenient OS.
Brendan wrote:
embryo2 wrote:It can use something like pattern matching for detection of "non-standard" mouse behaviour. But it's not a trivial task.
For existing mouse hardware, I very much doubt that this is possible without an unacceptable number of false positives and/or false negatives (and if its too unreliable it's just going to annoy people instead of being useful).
I think the false results will be on par with calling the maintenance person on a fixed schedule. And if we remember those erratic movements of a dirty mouse then it becomes obvious that they differ a lot from a standard mouse movement pattern. We can measure the distance between two adjacent mouse events, and for a dirty mouse the distance is almost always too big.
Brendan wrote:There would also be a way for normal users to add a "trouble ticket" to the system; for whatever the OS can't automatically detect (even if it's "my office chair broke!").

I really don't think it'd be hard to design a maintenance tool that suits almost everyone; and if people don't like my tool then they're able to suggest improvements, or replace any or all of the "pieces that communicate" with their own alternative piece/s;
Humanity has used improvement suggestions for thousands of years. But there's still no "maintenance tool that suits almost everyone", because it's not a software problem.

Re: Concise Way to Describe Colour Spaces

Posted: Thu Jul 16, 2015 10:07 am
by kzinti
Brendan wrote: Basically, as far as I can tell, hardware is not to blame for the unacceptable failure in the IT industry and it's almost entirely software's fault.
Nope, it's marketing's fault.
Brendan wrote: in theory a colour should be the same on both displays but in practice it's unwise to trust marketing departments and there's no guarantee that the colours actually will be the same.

Well, you knew already... Haha.

Re: Concise Way to Describe Colour Spaces

Posted: Thu Jul 16, 2015 10:28 am
by embryo2
Rusky wrote:People make mistakes, whether they're careless or not. Giving the language the ability to point out their mistakes before they release their program helps alleviate that problem.
So, now we are talking about compile time checks, right?
Rusky wrote:My point is that Rust has two error handling modes. Only unrecoverable exceptions (called panics) are allowed to unwind the program in the middle, and they will always run all destructors, so you can't catch the exception partway through the rollback. This means programmers don't have to care about exception safety, because the language takes care of it in a general way.
But the destructors must exist for the bookkeeping to be done. What is the difference between Rust's destructors and Java's catch clause? It is possible in Java to define an uncaught-exception handler, so the situation is absolutely similar to the one described above.
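The destructor behaviour under discussion can be observed directly in a small sketch: a Drop impl runs while a panic unwinds its thread, and the panic is only visible to other threads as a failed join. The Guard type and flag below are illustrative stand-ins for real cleanup (releasing a lock, restoring an invariant, etc.):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

static CLEANED_UP: AtomicBool = AtomicBool::new(false);

// Stand-in for a guard that restores an invariant on the way out.
struct Guard;
impl Drop for Guard {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let handle = thread::spawn(|| {
        let _guard = Guard;
        panic!("boom"); // unwinds this thread; Guard::drop still runs
    });
    // The panic kills only the spawned thread; here it shows up as a join error.
    assert!(handle.join().is_err());
    // The destructor ran during unwinding, before anyone could observe bad state.
    assert!(CLEANED_UP.load(Ordering::SeqCst));
    println!("ok");
}
```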
Rusky wrote:Java, on the other hand, allows you to invisibly pass exceptions on up the call stack. If the programmer doesn't check the documentation or method signature, all runtime exceptions become invisible
Yes, the runtime exceptions can cause a pain in the neck, but to eliminate them we need to define a handler for each of them. A similar requirement applies to Rust - the Rust developer should define all the required destructors. With exception handlers the situation is more intuitive than with destructors, because the developer sees the problem (the exception) and has all the means to act accordingly. With destructors the situation is more subtle, because a destructor works on an object without drawing attention to the exception (the root cause of the problem).
Rusky wrote:and adding "throws Exception" makes all the rest invisible.
In Java it is considered a bad practice to add "throws Exception" because it's too general and hides many possible variants of application behavior.
Rusky wrote:while Rust's "try!" macro only affects a single potential error.
So, it looks like a complete analogue of one catch per exception type in Java. The difference is only in the textual representation.
Rusky wrote:Further, Rust's "Result" type is much more composable than Java exceptions, because it's part of the normal data flow, rather than in a side channel requiring separate syntactic structures.
The separation of paths here makes it easier to work with different tasks. And when you have the Result object you still need to separate the paths, but with additional language constructs like if or switch.
Rusky wrote:For example, instead of this multiline monstrosity:

Code: Select all

ResultType result;
try {
    result = compute_result();
} catch (SomeException e) {
    result = default_value;
}
use(result);
You could just do this:

Code: Select all

use(compute_result().unwrap_or(default_value));
There's no need for Result in Java. When we work with the code in the try clause we just assume that everything is ok, and in the separate catch clause we pay attention to the cases when something is not ok. In your Rust example the separate section is still required, but wasn't shown. So, in Java we have a clear separation of concerns (cases when everything is ok and cases when something is bad), while in your Rust example we see just the case when everything is ok and lose the case when something goes wrong.