~ wrote:
I just feel that if a team out of all of those out there wanted to know how to implement things at the level of CPU technology, it would be worthwhile.

You feel that if someone wanted to know how to implement things at the level of CPU technology, it would be worthwhile to waste their time with silly garbage?
~ wrote:
It just seems to me that if a team was to create a massive amount of hand-made assembly routines, with a style that resembled the usage of other instructions, maybe even using an opcode-like byte structure for the parameters, those routines could be very durable, as much as the rest of the native instructions, taking their logic to concise, minimal, optimized bits per routine. They would just need to be routinely optimized as much as the instructions themselves. But if there is a part of the team that only works on the low-level optimizations, also with the intention of gaining CPU-implementing technology (for hardware projects of theirs), then the high-level team would simply make the rest of the program; and if they know that something is too slow, they could easily direct hand-optimized assembly only at the most critical parts of the program, not necessarily at everything, and it could become more understandable just because of being in the context of something that urges optimization.

No; this is very wrong. You can optimise each little function as much as you like and it will still be worthless garbage, because nothing will be able to optimise the interactions between functions (inlining, constant folding, common sub-expression elimination, tail call optimisation, etc).
~ wrote:
It seems to me that if it was possible to design an optimizing CPU, then it's possible for a good team to create optimized code where it's really needed. Not many things truly require brutal optimization, only the most specialized algorithms. The rest can be gradually improved over time. It could be an interesting platform to re-explore.

It can make sense to spend a massive amount of time creating a "highly optimised by hand" piece of assembly for a very specific purpose (e.g. an MPEG decoder); but these things can already be done and already are done (and are often put in libraries and used from high-level languages).
~ wrote:
At least x86 assembly has been proven to be able to have portable code for 16, 32 and 64-bit as much as if it was C, so the possibility is there.

Sure, if you don't care about performance at all, you can make 80x86 assembly language portable between 16-bit, 32-bit and 64-bit, just by not using anything that isn't supported in all CPU modes; and all assemblers support this (without macros or other ugly hacks).
If you do care about performance then you need to optimise for specific cases; because some instructions are faster/slower on different CPUs, some CPUs do "micro-op fusion" and some don't, some have "loop stream detection" and some don't, etc; and also because some CPUs support features/instructions that others don't (SSE, AVX, etc).
As one example: for something like "find the 2 points in an array of points which are the closest to each other", you'd need about 100 variations, where each variation is optimal for a specific CPU in a specific mode.
~ wrote:
And with the current bulk of software it would probably be beneficial to do this, at least for making the machine execute a considerably shorter stream of code.

I don't see how the current bulk of software could benefit from becoming extremely inefficient and slow.
Cheers,
Brendan