Dawn

Kevin · Post by **Kevin** » Wed May 03, 2017 2:54 am

Geri wrote:is this good? does anybody have comparable data from their system?

It shouldn't be hard for you to run the same kind of test on Linux. I think you'll see a big difference.

I never measured something like this on tyndur, because I already know that it implements most of those operations in the most naive way I can think of, so of course it will perform badly.

Geri · Post by **Geri** » Wed May 03, 2017 9:08 am

Kevin: i don't want to mess with the file system of my daily work linux. maybe i will install one in VirtualBox to see.

Geri · Post by **Geri** » Sun Jul 02, 2017 3:02 pm

The operating system is now available for free. The free version was previously limited to 7 cores; this limit has been removed. License key management has also been removed from the operating system entirely.

http://DawnOS.tk

dozniak · Post by **dozniak** » Mon Jul 03, 2017 11:42 am

If you don't mind fixing the typos on your website:

"You can choose beethwen the folowing payment methods:"

Geri · Post by **Geri** » Tue Jul 04, 2017 6:05 am

true, i will fix it

Geri · Post by **Geri** » Tue Aug 01, 2017 10:09 am

2017, aug 1. - Opensource emulators
New, opensource emulator released for mobile phones (droid).
A new, opensource minimalistic example-emulator also released.

so i finally put the emulator together on android / arm.
90% of the bugs were the result of the crappy design of arm cpu (no unaligned memory access)
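
For readers hitting the same class of bug: a common portable way to read or write a 32-bit value at an arbitrary byte offset is to go through `memcpy` instead of casting the pointer. The sketch below is only an illustration; the helper names `load_u32_unaligned` and `store_u32_unaligned` are hypothetical and are not taken from the Dawn emulator source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the Dawn emulator): reads a 32-bit value
   from any byte address without requiring the address to be aligned. */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* byte-wise copy, valid at any alignment */
    return v;
}

/* Hypothetical helper: writes a 32-bit value to any byte address. */
static void store_u32_unaligned(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Compilers typically turn these fixed-size copies into a single load or store where the target allows it, so the cost on x86 is usually nil, while on a strict-alignment target the compiler emits whatever byte sequence is needed.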

as the arm code is not optimized, the performance is poor.

emulators for android opensourced.
i also created a minimalistic opensource emulator using sdl2 for mouse input and opengl es for graphics output, it may compile under desktop linux too.

Schol-R-LEA · Post by **Schol-R-LEA** » Wed Aug 09, 2017 12:52 am

Geri wrote:as the arm code is not optimized, the performance is poor.

No, Geri, it is poor because you are compiling C to an OISC, and any idiot can see that combining C and OISC is ludicrous.

I tried to prod you into looking at other types of languages, hoping you'd notice this - more than once, actually - but apparently you didn't get the message, so screw subtlety, here's the unvarnished fact: even if by some miracle you found someone who was willing and able to put the time, effort, and $5 billion into building a non-toy OISC CPU, and by some even bigger miracle it didn't run like a drugged sloth, it won't matter with regard to Dawn because your C compiler will never produce performant code for OISC.

Why? Because the primary - if not sole - advantage of C is that its primitive operations closely mirror the instruction sets of conventional register machines, the PDP-11 in particular. In fact, it is safe to say that C has never compiled so efficiently for any other processor as it did for the one it was designed for, and many of the operators - especially the accumulating-assignment, indirection, and bitwise operators - exist solely to exploit features of the PDP-11, features which didn't exist in most other contemporary ISAs in the early 1970s.

C is designed to compile efficiently on a register machine. Compiling it to an implicit-state machine without registers, without multiple direct, indirect, and immediate addressing modes (even RISCs generally have at least one of each, though usually presented as separate assembly language instructions for each mode), and with only a single universal instruction, while obviously possible, leads to a horrible abstraction inversion - you lose efficiency with C over a more abstract language, because you need to implement the C operations in terms of dozens of instructions rather than one or two, while a compiler for something closer to the OISC model might be able to find more efficient expressions of the operations. Using C on an OISC is simply a bad idea.
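
To make the "several instructions per C operation" point concrete, here is a minimal sketch of the textbook subleq machine and of a single addition expressed in it. It assumes the classic definition (subtract, then branch if the result is less than or equal to zero), which is not necessarily the exact variant Dawn uses; the names `run_subleq` and `subleq_add` and the memory layout are made up for this illustration.

```c
/* Textbook subleq: each instruction is three cells a, b, c.
   mem[b] -= mem[a]; if the result is <= 0, jump to c, else fall through.
   A negative jump target halts the machine. */
static void run_subleq(long mem[], long pc)
{
    while (pc >= 0) {
        long a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
        mem[b] -= mem[a];
        pc = (mem[b] <= 0) ? c : pc + 3;
    }
}

/* Hypothetical helper: computes x + y with a three-instruction subleq
   program, using a scratch cell Z that starts at zero. */
static long subleq_add(long x, long y)
{
    enum { X = 9, Y = 10, Z = 11 };   /* data cells follow the code */
    long mem[12] = {
        X, Z, 3,    /* Z = 0 - x; next instruction is at 3 either way */
        Z, Y, 6,    /* Y = y - (-x) = y + x; next is at 6 either way  */
        Z, Z, -1,   /* Z = 0, which is <= 0, so jump to -1 and halt   */
        0, 0, 0     /* X, Y, Z */
    };
    mem[X] = x;
    mem[Y] = y;
    run_subleq(mem, 0);
    return mem[Y];
}
```

What is a single `add` on a register machine takes three instructions and a scratch cell here, and operations such as multiplication, shifts, or indexed addressing expand much further.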

Insisting on both OISC and C is a flat-out contradiction, an impedance mismatch of epic proportions. That you didn't see that contradiction even after I pointed it out to you tells me that you have no idea what you are doing.

alexfru · Post by **alexfru** » Wed Aug 09, 2017 1:58 am

Schol-R-LEA wrote:C is designed to compile efficiently on a register machine. Compiling it to an implicit-state machine without registers, without multiple direct, indirect, and immediate addressing modes (even RISCs generally have at least one of each, though usually presented as separate assembly language instructions for each mode), and with only a single universal instruction, while obviously possible, leads to a horrible abstraction inversion - you lose efficiency with C over a more abstract language, because you need to implement the C operations in terms of dozens of instructions rather than one or two, while a compiler for something closer to the OISC model might be able to find more efficient expressions of the operations. Using C on an OISC is simply a bad idea.

This reminds me of Radix economy. Geri's "invention" is akin to using a huge base (long instruction) for just a handful of values (how many different kinds of useful instructions there are).
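
For anyone unfamiliar with the term, the radix economy of a number N in base b is usually defined as b times the number of base-b digits needed to write N. The function below is a small sketch of that definition; the name `radix_economy` is just chosen for this example.

```c
/* Radix economy E(b, N) = b * (number of base-b digits needed to write N).
   Returns 0 for bases below 2, where the notion is not defined. */
static unsigned long radix_economy(unsigned long base, unsigned long n)
{
    unsigned long digits = 1;   /* even 0 takes one digit */

    if (base < 2)
        return 0;

    while (n >= base) {         /* strip one digit per iteration */
        n /= base;
        digits++;
    }
    return base * digits;
}
```

Writing 999 costs 20 in base 2, 21 in base 3 and 30 in base 10, but 1000 in base 1000: a very large base spent on a small set of values is the expensive end of the scale, which is the analogy being drawn.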

Geri · Post by **Geri** » Wed Aug 09, 2017 3:03 am

Schol-R-LEA wrote:No, Geri, it is poor because you are compiling C to an OISC, and any idiot can see that combining C and OISC is ludicrous.

ok, then i repeat: the arm code is not optimized, the performance is poor.

and no, i will not optimize on arm, as i don't know that architecture.

(on x86 it runs somewhat usably, even though that code has only tiny optimizations.)

there is nothing wrong with subleq and c compilation to it.

Schol-R-LEA · Post by **Schol-R-LEA** » Wed Aug 09, 2017 12:09 pm

Geri wrote:there is nothing wrong with subleq

As a theoretical tool for discussing computability? There certainly isn't. As an interesting hobby problem? Sure. As a practical computing platform? I think Gene Amdahl would disagree.

Geri wrote:and c compilation to it.

I think you need to reassess this, assuming you actually gave it any thought at all.

Seriously, I am trying to help, here. You just aren't looking at what I am saying. Consider the difference in how you would write a loop to compute an array containing a series of factorials in C, in x86 assembly, and in OISC state transitions, and consider how you would change the compiler to emit the same code as the hand-written assembly, versus how you would need to change it to emit the same state transitions for OISC as the hand-written version.

Even with no significant optimizations, the C compiler's original output targeting x86 would probably be quite close to the hand-written version. The same would almost certainly not be the case for OISC, unless you were deliberately mimicking the x86 style of assembly programming.

In fact, if you would like me to, I would be happy to help you work that out. I think you would find it eye-opening. Let's start with the C code:

Code: Select all

/* factArray() - takes a pointer to an empty int array
and an array size 'n', and populates the array
with the factorials from 0 to n-1.
*/
unsigned long long* factArray(unsigned long long dest[], unsigned int n)
{
    unsigned int a;
    dest[0] = 1;

    for (a = 1; a < n; a++)
    {
        dest[a] = a * dest[a - 1];  /* computes a! */
    }

    return dest;
}

hgoel · Post by **hgoel** » Wed Aug 09, 2017 1:54 pm

Schol-R-LEA, I think by now you probably ought to give up and just ignore Geri's ignorance and his unwillingness to address his ignorance. A person who does not understand and does not wish to understand the realities of modern processor design cannot be reasoned with.

Schol-R-LEA · Post by **Schol-R-LEA** » Wed Aug 09, 2017 4:14 pm

You are probably right. Still, I did want to try to get through his thick skull one more time.

As a hint on the previous problem, a more efficient version of the C code might be:

Code: Select all

/* factArray() - takes a pointer to an empty int array
and an array size 'n', and populates the array
with the factorials from 0 to n-1.
*/
unsigned long long* factArray(unsigned long long dest[], unsigned int n)
{
  unsigned long long *curr, *end, a_1, a;
  dest[0] = a_1 = 1;

  for (curr = dest + 1, end = dest + n, a = 1; curr < end; curr++, a++)
  {
    a_1 *= a;  /* computes a! */
    *curr = a_1;
  }

  return dest;
}

Now, you may well be wondering why in <insert your deity here>'s name you would do this, but in point of fact, this sort of pointer juggling is the classic C programming style, mainly because it avoids a lot of recalculation of the indices and offsets and uses fewer memory-to-memory operations.

I set up a test harness that calls the function 100 million times, and compiled both versions, once each with and without full optimization (using GCC v.5.4). Let's see how it performs on my laptop (Thinkpad T410, i5 @ 2.5GHz, Linux Mint 18.2 x86-64).

Code: Select all

Each sample counts as 0.01 seconds.

First version, unoptimized:
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
 98.55      7.15     7.15 100000000    71.45    71.45  factArray
  2.23      7.31     0.16                             main

First version, with full optimizations on
100.82      5.59     5.59                             factArray
  0.55      5.62     0.03                             frame_dummy

Second version, no optimizations
 92.28      6.43     6.43 100000000    64.32    64.32  factArray
  8.69      7.04     0.61                             main

Second version, full optimizations on
 99.10      1.37     1.37                             factArray
  2.20      1.40     0.03                             frame_dummy

As you can see, the second version performs slightly better than the first when both have optimizations off; the first with optimizations is slightly better than both of those, but the second with full opts blows all of them out of the water - roughly a fivefold speedup over the original with no optimizations (7.15 s versus 1.37 s).
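
The harness itself was not posted, so the following is only a guess at its general shape: a timing routine that calls `factArray()` repeatedly on one buffer. The name `time_fact_array`, the use of `clock()`, and the array size passed in are all assumptions, not details from the measurements above.

```c
#include <stdlib.h>
#include <time.h>

/* First version of factArray() from the post above, repeated here so
   the sketch is self-contained. */
unsigned long long *factArray(unsigned long long dest[], unsigned int n)
{
    unsigned int a;
    dest[0] = 1;
    for (a = 1; a < n; a++)
        dest[a] = a * dest[a - 1];
    return dest;
}

/* Hypothetical harness: calls factArray() `calls` times on an n-element
   buffer and stores the last element, (n-1)!, in *last.
   Returns the elapsed CPU time in seconds, or -1.0 on bad input. */
double time_fact_array(unsigned long calls, unsigned int n,
                       unsigned long long *last)
{
    unsigned long long *buf;
    unsigned long i;
    clock_t start, stop;

    if (calls == 0 || n == 0)
        return -1.0;
    buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1.0;

    start = clock();
    for (i = 0; i < calls; i++)
        factArray(buf, n);
    stop = clock();

    *last = buf[n - 1];
    free(buf);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Something like `time_fact_array(100000000UL, 21, &last)` would match the 100 million calls mentioned above; 21 elements is the largest size whose factorials (up to 20!) still fit in 64 bits. For optimized builds, keeping `factArray()` in its own source file, as the per-function gprof lines suggest was done, stops the compiler from inlining the calls away.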

So why is this relevant? Because the second version is much closer to what any sensible x86 assembly programmer would write. I haven't looked at the assembly output for these yet, because I want to finish writing the hand-coded assembly version first, but the point is, even a small change in the way the code is written makes a huge difference in performance. The size differences aren't significant in this case, as most of the executable size in all cases is in the linked static libraries; they are all around 12 KiB. However, similar size optimizations exist for realistic code, and in many cases they would show up in the executable size as well.

Mind you, this isn't even close to realistic code, but it does serve to illustrate my point, which is that all of these differences come down to how often the program accesses memory, versus how often it can do all the operations in registers. Without looking, I can reasonably surmise that the differences between the optimized and unoptimized builds of the second version will come down to how extensively the compiler applied register painting to reduce memory accesses.

What does all of this mean with regards to OISC? Simple: on an OISC, I would expect that the tweaked version would run about the same as the original, or perhaps slightly worse. This isn't because of the limitations of OISC (not entirely, at least), but has more to do with the assumptions C makes about the underlying hardware. C is designed for a CPU with enough registers to hold most of the temporary values, and a hardware stack for those that won't fit in them; the language design itself optimizes for that use case.

Getting C to compile efficiently on something without explicit hardware registers, like a stack machine (think the JVM, though hardware-based stack machines have existed) or an OISC, is an uphill struggle because the language sacrifices abstraction in favor of mirroring register hardware in order to squeeze more performance out of the code. While this has worked out on most hardware designed after it was developed, it simply is a poor fit for a system without explicit registers and a hardware stack.

In other words, if you really think that OISC is better hardware, you need to find better software development techniques that take advantage of it. Conventional approaches to programming won't do, at least not for the low-level system software.

Notturno · Post by **Notturno** » Wed Aug 09, 2017 4:49 pm

Guys, calm down. Geri is not stupid or ignorant. He is simply religiously devoted to Subleq. Sooner or later this religious devotion will wear off and he will realize the shortcomings of Subleq.

Schol-R-LEA · Post by **Schol-R-LEA** » Wed Aug 09, 2017 5:40 pm

Regardless, I do want to finish what I've been working out, and hear what he has to say about it.

Here is the hand-coded assembly version I wrote:

Code: Select all

// factArray in hand-coded assembly language
        
        .globl factArray

factArray:
        // in the x86-64 calling convention the first six
        // arguments are passed in registers, so there is no
        // need to build a stack frame.
        // %rdi == ptr to array of 64-bit quadwords
        // %esi == array size (32-bit unsigned int)
        //
        // %rax is the return value
        // %rdx is the value being computed
        //      (caller-saved, unlike %rbx, so no need to preserve it)
        // %rcx is the number whose factorial is being computed
        // %rdi is the pointer to the current insertion point
        // %rsi is the endpoint of the array

        // set the starting point aside to return it later
        mov %rdi, %rax

        // zero-extend the 32-bit size; the upper half of %rsi
        // is not guaranteed to be clear on entry
        movl %esi, %esi
        
        // find an offset to the endpoint of the array
        // then compute the actual endpoint
        imulq $8, %rsi
        addq %rdi, %rsi

        // initialize the accumulated value and the counter
        movq $1, %rdx
        movq $0, %rcx
        jmp .loop_test

.populate_loop:
        inc %rcx
        imulq %rcx, %rdx

.loop_test:
        movq %rdx, (%rdi)
        addq $8, %rdi
        cmp %rdi, %rsi
        // unsigned comparison, since these are addresses
        ja .populate_loop

.loop_exit:     
        ret

And here are the gprof results for it:

Code: Select all

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
 89.39      1.78     1.78                             factArray
 11.68      2.01     0.23                             main

As you can see, this is close to, but not quite as fast as, the optimized results for the second C version. Let's take a look at the compiler's assembly output for that:

Code: Select all

	.p2align 4,,15
	.globl	factArray
factArray:
.LFB0:
	.cfi_startproc
	movl	%esi, %esi
	leaq	8(%rdi), %rcx
	movq	%rdi, %rax
	leaq	(%rdi,%rsi,8), %rdx
	movq	$1, (%rdi)
	cmpq	%rcx, %rdx
	jbe	.L6
	subq	%rdi, %rdx
	movl	$1, %ecx
	leaq	-9(%rdx), %rsi
	movl	$1, %edx
	shrq	$3, %rsi
	addq	$2, %rsi
	.p2align 4,,10
	.p2align 3
.L3:
	imulq	%rdx, %rcx
	movq	%rcx, (%rax,%rdx,8)
	addq	$1, %rdx
	cmpq	%rsi, %rdx
	jne	.L3
.L6:
	rep ret
	.cfi_endproc

While this seems pretty different on the surface, it is actually quite similar, just doing basically the same things using slightly different instructions and with some tweaks to things such as instruction alignments.

Compare this to the other versions (version 1 with and without optimizations, and version 2 without optimizations):

version 1, no opt:

Code: Select all

	.globl	factArray
	.type	factArray, @function
factArray:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movq	%rdi, -24(%rbp)
	movl	%esi, -28(%rbp)
	movq	-24(%rbp), %rax
	movq	$1, (%rax)
	movl	$1, -4(%rbp)
	jmp	.L2
.L3:
	movl	-4(%rbp), %eax
	leaq	0(,%rax,8), %rdx
	movq	-24(%rbp), %rax
	addq	%rax, %rdx
	movl	-4(%rbp), %ecx
	movl	-4(%rbp), %eax
	subl	$1, %eax
	movl	%eax, %eax
	leaq	0(,%rax,8), %rsi
	movq	-24(%rbp), %rax
	addq	%rsi, %rax
	movq	(%rax), %rax
	imulq	%rcx, %rax
	movq	%rax, (%rdx)
	addl	$1, -4(%rbp)
.L2:
	movl	-4(%rbp), %eax
	cmpl	-28(%rbp), %eax
	jbe	.L3
	movq	-24(%rbp), %rax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

version 1, opt:

Code: Select all

	.p2align 4,,15
	.globl	factArray
	.type	factArray, @function
factArray:
.LFB0:
	.cfi_startproc
	testl	%esi, %esi
	movq	%rdi, %rax
	movq	$1, (%rdi)
	movl	$1, %edx
	je	.L7
	.p2align 4,,10
	.p2align 3
.L5:
	leal	-1(%rdx), %edi
	movl	%edx, %ecx
	addl	$1, %edx
	movq	%rcx, %r8
	imulq	(%rax,%rdi,8), %r8
	cmpl	%edx, %esi
	movq	%r8, (%rax,%rcx,8)
	jnb	.L5
.L7:
	rep ret
	.cfi_endproc

version 2, no opt:

Code: Select all

	.globl	factArray
	.type	factArray, @function
factArray:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movq	%rdi, -40(%rbp)
	movl	%esi, -44(%rbp)
	movq	$1, -24(%rbp)
	movq	-40(%rbp), %rax
	movq	-24(%rbp), %rdx
	movq	%rdx, (%rax)
	movq	-40(%rbp), %rax
	addq	$8, %rax
	movq	%rax, -32(%rbp)
	movl	-44(%rbp), %eax
	leaq	0(,%rax,8), %rdx
	movq	-40(%rbp), %rax
	addq	%rdx, %rax
	movq	%rax, -8(%rbp)
	movq	$1, -16(%rbp)
	jmp	.L2
.L3:
	movq	-24(%rbp), %rax
	imulq	-16(%rbp), %rax
	movq	%rax, -24(%rbp)
	movq	-32(%rbp), %rax
	movq	-24(%rbp), %rdx
	movq	%rdx, (%rax)
	addq	$8, -32(%rbp)
	addq	$1, -16(%rbp)
.L2:
	movq	-32(%rbp), %rax
	cmpq	-8(%rbp), %rax
	jb	.L3
	movq	-40(%rbp), %rax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

Now, I won't pretend that I am an expert on OISC, or even on x86, but I am pretty certain that the way your compiler would implement these would be to emit blocks of code corresponding to the individual C operations. While this makes sense when said operations have exact or near-exact analogs in the instruction set, it makes no sense to do this if each operation requires a page of code to implement, and a jury-rigged 'procedure call' operation to invoke.

For an implicit-state machine, imitating a register machine is a dead-end approach. You need to be generating code that works with the state machine, not against it. The factorial operation I've been discussing could probably be generated as less than two pages of OISC states - but only if it is written in a notation that doesn't try to mimic something else when doing so.

I can't promise you will find a better programming model for OISC, nor am I convinced that doing so will be sufficient to make OISC practical as a hardware platform, but even just trying to find one would be well worth your trouble if you are unwavering in your decision to follow this path. Even if it fails, it will have contributed to your knowledge of what is and isn't practical with an OISC.

Geri · Post by **Geri** » Thu Aug 10, 2017 4:11 am

Schol-R-LEA:

-when you say that a hand-optimized binary (or in this case, assembly code) is somewhat more clock efficient than C, you are obviously right, but nowadays people are using high-level languages, like C. therefore, promoting the use of a low-level language is a false dilemma.

-what you were saying before is that a more basic low-level language would suit subleq better than C-compiled code, which is not true: if you wrote your algorithms directly in some higher subleq variant, your code would take a huge amount of time to write, and the result would still be just as much garbage as the code a C compiler outputs, even if the compiled code is minimally slower.

-basically, it's futile to compare the possibility of writing pure and fast assembly code on x86 with the lack of such a possibility on subleq, as this can only be a theoretical viewpoint, due to the fact that people are using C (and other high-level languages).

-meanwhile, you wrote this in reply to my remark about my unoptimized emulator port running on arm, compared to running the emu on x86. so your comment has nothing to do with the comment you replied to. but let's investigate it further:

-c compilers generate garbage on all platforms. for example, a hand-created naive bswap on arm is 100 opcodes, while you can do it in 10 if you write it in assembly. so i don't see how c would kill the performance specifically on OISCs, when it can certainly kill the performance everywhere else just as easily in some cases (yes, this will be just as crappy on subleq).

-the difference between gcc/clang/whatever/other generic compilers and the C compiler in Dawn is that the compiler in Dawn is designed only to generate SUBLEQ bytecode. it was designed to generate efficient SUBLEQ code, code that suits subleq well (acknowledging that there are no registers, instructions, hardware-accelerated stack or offsets), and therefore it will compile sufficiently optimized code.

-it's 2017, the performance of a platform is decided only by the performance of programs written in high-level languages. optimized programs, of course, where the creators are aware of the platform and write good and fast code. but still high-level code. even if there were a way to create a low-level language for subleq, nobody would learn it and use it in practice, as programmers will use high-level languages.

-if you compare emulated x86/arm and dawn bytecode (emulated - as there are no dawn-compatible fpga-s yet to do clock-by-clock comparisons with), the overall performance is somewhat similar in most cases. you can find and do testcases where one would be much faster than the other (like your factorial code posted above would be faster on x86) - but the overall performance of typical software would be about the same everywhere. displaying a right click menu in dawn os will consume just as many cpu cycles as displaying a right click menu on an ARM linux system (it will consume 600 million clock cycles or so on both platforms); there are no significant differences.
