Ok, so I'm feeling lucky: Neon pre-alpha preview 1

Post by mystran »

It seems I'm feeling lucky at this point.
So here's a kernel that is largely rewritten. It doesn't do everything it used to, but what it does, it hopefully does better.

Namely: no userspace. And since this is a microkernel, without userspace there isn't much to see.

But I've cheated a bit and added a keyboard handler that dumps scancodes, plus a console driver (serial debug output isn't enabled in the preview build).

It also won't run on a 486 (it's built with large-page support, which isn't autodetected yet), so a Pentium at least, please.
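What autodetection would involve, roughly: on x86 the CPU advertises 4 MB pages through the PSE flag (bit 3) and global pages through the PGE flag (bit 13) in the EDX value returned by CPUID leaf 1, and each is switched on by its own CR4 bit (PSE is bit 4, PGE is bit 7). The sketch below only makes that decision; it assumes the caller has already executed CPUID and passes EDX in, and `cr4_paging_flags` is a hypothetical name, not something from Neon.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX feature bits. */
#define CPUID_EDX_PSE (1u << 3)   /* 4 MB pages   */
#define CPUID_EDX_PGE (1u << 13)  /* global pages */

/* CR4 bits that enable those features. */
#define CR4_PSE (1u << 4)
#define CR4_PGE (1u << 7)

/* Hypothetical helper: given CPUID.1:EDX, return the CR4 bits that are
 * safe to set. A 486 reports neither feature and gets 0, so a kernel
 * would have to fall back to 4 KB page tables there. */
uint32_t cr4_paging_flags(uint32_t cpuid1_edx)
{
    uint32_t cr4 = 0;
    if (cpuid1_edx & CPUID_EDX_PSE)
        cr4 |= CR4_PSE;
    if (cpuid1_edx & CPUID_EDX_PGE)
        cr4 |= CR4_PGE;
    return cr4;
}
```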

What to expect then?

Well, I've got these testing kludges in place:
- A "ticker" in the top left corner of the screen. It should turn every second.
- A "busyness" indicator in the top right corner of the screen: a dot for idle, a star for busy.
- A CPU hog. It sleeps 5 seconds, then does an awful amount of busylooping, repeats this 3 times, and then causes a (friendly) panic. Note that the hog runs at lower priority, so if you notice any slowdowns while it's running (see the busy indicator), I'd be interested.
- If you hit the keyboard, you should get scancodes on the screen. If you hit it enough, you should see the screen scroll. And yes, it assumes 80x25, so if that's not the actual mode, it'll probably do funny things with the screen.
- If it crashes before the hogtask reports that it's done, I'd be interested.
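Why the 80x25 assumption matters: in VGA text mode the screen is a flat array of 16-bit cells (character in the low byte, colour attribute in the high byte), and scrolling is just a memory move over a buffer of fixed size. A minimal sketch, assuming the buffer is passed in as an ordinary array so it can be tried outside a kernel (on real hardware it would be the colour text buffer at physical address 0xB8000); `console_scroll` and the attribute parameter are illustrative, not Neon's console driver.

```c
#include <stdint.h>
#include <string.h>

#define CON_COLS 80
#define CON_ROWS 25

/* Scroll an 80x25 text buffer up by one line and blank the bottom row.
 * Each cell is (attribute << 8) | character. */
void console_scroll(uint16_t *buf, uint8_t attr)
{
    /* Rows 1..24 move up to rows 0..23; the regions overlap, so memmove. */
    memmove(buf, buf + CON_COLS,
            (size_t)(CON_ROWS - 1) * CON_COLS * sizeof buf[0]);

    /* Fill the last row with spaces in the given attribute. */
    uint16_t blank = (uint16_t)(((uint16_t)attr << 8) | ' ');
    for (int col = 0; col < CON_COLS; col++)
        buf[(CON_ROWS - 1) * CON_COLS + col] = blank;
}
```

If the display were actually in a 50-line mode, a routine like this would only ever shuffle the top 25 rows, which is one way to get the "funny things" mentioned above.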

Both a kernel image and a floppy image are provided. The kernel should be booted with multiboot compliant bootloader. The floppy image boots it with GRUB (surprise).

http://www.cs.hut.fi/~tvoipio/files/neo ... 0050616.gz
http://www.cs.hut.fi/~tvoipio/files/neo ... 616.img.gz
Take the image posted further down, which has better page-fault reporting, instead.

Post by AxelDominatoR »

Tested on Bochs; it seems to run as expected, except that the star "busyness" indicator is always on.

Unfortunately I can't test on real hardware (I don't have a floppy drive on my notebook).

Nice OS :D

Axel

Post by Brendan »

Hi,


Some results...

[tt]COMPUTER E - Cyrix 6x86 (Pentium) 133 MHz
1.44 MB floppy
2 GB hard drive
CD-ROM using Panasonic/Creative connector
Unknown PCI video card
NE2000 compatible ethernet card

COMPUTER H - 160 MHz Cyrix/Pentium
16 MB memory
1.44 MB floppy
514 MB hard disk
CD-ROM using IDE interface
Cirrus Logic 5430/40 PCI video card
ESS ES688 AudioDrive sound card
NE2000 compatible ethernet card

COMPUTER N - AMD-K6(tm) 3D processor 300 MHz
64 MB memory
24x CD-ROM
1335 MB hard disk
S3 video card
Ethernet card[/tt]

Rebooted during boot (after GRUB loads stage 2). These CPUs might not support large pages, but it didn't even get to the GRUB boot menu (!?).



[tt]COMPUTER O - Pentium Pro 200 MHz
128 MB RAM
Adaptec AHA-2940 SCSI controller
Compaq ST32550W SCSI hard-drive
IBM DCAS-34330W SCSI hard-drive
IBM DGHS09U SCSI hard-drive
Sony CDU-76S SCSI CD-ROM
Ethernet on motherboard[/tt]

On the first boot it loaded OK, but crashed after the virtual manager init:
- Memory available: 0x07EF4000 bytes

Kernel panic in vm.c: PAGE FAULT!
0x0010146D <- ip ... stack -> 0x00105F64
0x0010006D <- ip ... stack -> 0x00000000
Kernel Halted.
Further attempts at booting had the same results..



[tt]COMPUTER Q (Compaq Proliant 1600) - Dual Pentium II 400 MHz
384 MB RAM
Dual-channel SCSI on motherboard
5 x 4.5 GB SCSI hard drives
IDE CD-ROM
Ethernet on motherboard

WORK MACHINE - Pentium IV 1.6 GHz
256 MB RAM
40 GB hard drive
8 GB hard drive
NVIDIA RIVA TNT2 Model64 Pro video card
NETGEAR FA311/FA312 PCI Ethernet card[/tt]

For these computers everything worked correctly. Busy for a little, then 5 seconds of idle (where keyboard scan codes are displayed) followed by the intentional panic :).


Cheers,

Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.

Post by mystran »

Brendan wrote: Rebooted during boot (after GRUB loads stage 2). These CPUs might not support large pages, but it didn't even get to the GRUB boot menu (!?).
Crap. I think I know what causes this: I forgot that the GRUB on my floppy image is compiled for Athlon and refuses to run on anything older than a Pentium Pro. Oh well, I'll have to fix that at some point. I'll do it once I've got large-page detection working, so that there's some point in testing on old computers.
Brendan wrote: [tt]COMPUTER O - Pentium Pro 200 MHz
128 MB RAM
Adaptec AHA-2940 SCSI controller
Compaq ST32550W SCSI hard-drive
IBM DCAS-34330W SCSI hard-drive
IBM DGHS09U SCSI hard-drive
Sony CDU-76S SCSI CD-ROM
Ethernet on motherboard[/tt]

On the first boot it loaded OK, but crashed after the virtual manager init:
- Memory available: 0x07EF4000 bytes

Kernel panic in vm.c: PAGE FAULT!
0x0010146D <- ip ... stack -> 0x00105F64
0x0010006D <- ip ... stack -> 0x00000000
Kernel Halted.
Further attempts at booting had the same results..
Interesting. Could you send me the memory map that the kernel prints. It is theoretically possible that I'm doing something that Pentium Pro doesn't support, but even the global pages stuff is only for optimization and isn't relied on so.

What makes this panic rather weird is that the message (and stack trace) indicates that the page fault comes from the user-mode page-fault handler. Now, there shouldn't be any user mode at all (ATM), so why on earth am I getting a page fault with the user bit set in the error code???

Oh, and it seems I should make the stack dumper understand interrupt frames as well.
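For context on the stack dumper: the `ip ... stack` pairs in the panic output, ending with a stack value of 0x00000000, look like the output of a walk along the saved-EBP chain. A sketch of such a walker, assuming the kernel is built with frame pointers so every frame begins with the caller's saved EBP followed by the return address; the `struct frame` layout and `walk_frames` are illustrative. An interrupt or exception frame is different: the CPU pushes EFLAGS, CS and EIP (plus an error code for #PF), so the interrupted EIP never appears in a return-address slot and a plain walk like this one typically skips the faulting instruction. That is the part a dumper would have to special-case.

```c
#include <stdint.h>
#include <stdio.h>

/* Frame layout produced by `push ebp; mov ebp, esp` (assumed). */
struct frame {
    const struct frame *prev;  /* caller's saved EBP                */
    uintptr_t ret;             /* return address into the caller    */
};

/* Follow the saved-EBP chain from `fp`, printing up to `max` entries in
 * roughly the same "ip ... stack" shape as the panic message. Stops at a
 * NULL frame pointer, which is how the chain is usually terminated at
 * thread entry. Returns the number of frames visited. */
int walk_frames(const struct frame *fp, int max)
{
    int n = 0;
    while (fp != NULL && n < max) {
        printf("0x%08lX <- ip ... stack -> 0x%08lX\n",
               (unsigned long)fp->ret, (unsigned long)(uintptr_t)fp->prev);
        fp = fp->prev;
        n++;
    }
    return n;
}
```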
Brendan wrote: [tt]COMPUTER Q (Compaq Proliant 1600) - Dual Pentium II 400 MHz
384 MB RAM

WORK MACHINE - Pentium IV 1.6 GHz
256 MB RAM[/tt]

For these computers everything worked correctly. Busy for a little, then 5 seconds of idle (where keyboard scan codes are displayed) followed by the intentional panic :).
Good.

Thanks a lot for testing.

Post by Brendan »

Hi,
mystran wrote:Interesting. Could you send me the memory map that the kernel prints.
Yes :)
Interrupt setp. IDT at: 0x00104000
Virtual manager init:
- Building initial page directory
- Page directory at 0x00109000
- Enabling paging..done
- Enabling global pages...done
- Current context: 0x00109000
- Multiboot memory map:
- address: 0x00000000 length: 0x0009FC00 type: 0x01
- address: 0x0009FC00 length: 0x00000400 type: 0x02
- address: 0x000E0000 length: 0x00020000 type: 0x02
- address: 0x00100000 length: 0x07F00000 type: 0x01
- address: 0xFFFE0000 length: 0x00020000 type: 0x02
- Memory available: 0x07EF4000 bytes

Kernel panic in vm.c: PAGE FAULT!
0x0010146D <- ip ... stack -> 0x00105F64
0x0010006D <- ip ... stack -> 0x00000000
Kernel Halted.
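In a Multiboot memory map, type 1 marks usable RAM and any other type is reserved. A sketch that totals the usable regions, using a simplified entry struct rather than the real packed Multiboot layout (which also has a `size` field in front of each entry); `usable_bytes` is a hypothetical helper. For the map above it comes to 0x07F9FC00 bytes. The kernel's printed 0x07EF4000 is smaller, and the thread doesn't say what the kernel excludes.

```c
#include <stddef.h>
#include <stdint.h>

#define MMAP_AVAILABLE 1  /* Multiboot: type 1 = usable RAM */

/* Simplified memory-map entry (the real Multiboot entry is packed and
 * preceded by a size field). */
struct mmap_entry {
    uint64_t addr;
    uint64_t len;
    uint32_t type;
};

/* Sum the lengths of all usable (type 1) regions. */
uint64_t usable_bytes(const struct mmap_entry *map, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        if (map[i].type == MMAP_AVAILABLE)
            total += map[i].len;
    return total;
}
```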
mystran wrote:It is theoretically possible that I'm doing something that Pentium Pro doesn't support, but even the global pages stuff is only for optimization and isn't relied on so.
The Pentium Pro supports the following features (CPUID EDX flags):

[tt]FPU   0x00000001
VME   0x00000002
DE    0x00000004
PSE   0x00000008
TSC   0x00000010
MSR   0x00000020
PAE   0x00000040
MCE   0x00000080
CX8   0x00000100
APIC  0x00000200
SEP   0x00000800
MTRR  0x00001000
PGE   0x00002000
MCA   0x00004000
CMOV  0x00008000[/tt]
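These values are the CPUID leaf 1 EDX feature flags; together they make up the mask 0xFBFF. A small sketch that decodes such a mask back into names, with the table limited to the flags listed above (the register defines more); `print_features` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>

struct feature { const char *name; uint32_t bit; };

/* CPUID.1:EDX flags from the list above. */
static const struct feature edx_features[] = {
    {"FPU", 1u << 0},  {"VME", 1u << 1},  {"DE", 1u << 2},
    {"PSE", 1u << 3},  {"TSC", 1u << 4},  {"MSR", 1u << 5},
    {"PAE", 1u << 6},  {"MCE", 1u << 7},  {"CX8", 1u << 8},
    {"APIC", 1u << 9}, {"SEP", 1u << 11}, {"MTRR", 1u << 12},
    {"PGE", 1u << 13}, {"MCA", 1u << 14}, {"CMOV", 1u << 15},
};

/* Print the name of every known feature set in `edx`; returns the count. */
int print_features(uint32_t edx)
{
    int n = 0;
    for (size_t i = 0; i < sizeof edx_features / sizeof edx_features[0]; i++) {
        if (edx & edx_features[i].bit) {
            printf("%s ", edx_features[i].name);
            n++;
        }
    }
    putchar('\n');
    return n;
}
```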


Cheers,

Brendan

Post by mystran »

Could you try this floppy image instead? I added some variable dumping to the placeholder page-fault handler, so that it prints where the code was executing when it faulted and what the faulting address was.

http://www.cs.hut.fi/~tvoipio/files/neo ... 617.img.gz

Other than the extra printing it's the same.

Now, it's strange: I get the 'user' bit set in the #PF error code when I intentionally cause a page fault in the kernel. I have to figure out why that happens. It's possible that I'm doing something wrong in the entry code, but that part used to work just fine and is a few years old now... so..

Post by mystran »

AxelDominatoR wrote: Tested on Bochs; it seems to run as expected, except that the star "busyness" indicator is always on.
Are you sure about this?

The "awful amount of busylooping" takes a lot longer than the sleep when I'm running it under Bochs on this 2 GHz Athlon64, so on anything slower it's going to run for a long time.

I could take it away, actually, or replace it with something like:

[tt]while true:
    sleep(3)
    t0 = get_ticktime
    while (get_ticktime - t0 < 3sec): pass[/tt]

Then it would be CPU-speed independent. I think I'll do that for the next test version.
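The pseudocode above, sketched in C. The kernel's timer API isn't shown in the thread, so the tick source is passed in as a function pointer and the tick rate is a parameter; `hog_busyloop`, `tick_fn` and the simulated clock are assumptions for illustration. It counts loop iterations, and a real hogtask would call the kernel's sleep between rounds.

```c
#include <stdint.h>

/* Hypothetical tick source: returns a monotonically increasing count. */
typedef uint32_t (*tick_fn)(void);

/* Busyloop until `seconds` worth of ticks have elapsed, independent of
 * CPU speed. Unsigned subtraction keeps the comparison correct even if
 * the tick counter wraps. Returns the number of loop iterations. */
uint32_t hog_busyloop(tick_fn get_ticktime, uint32_t ticks_per_sec,
                      uint32_t seconds)
{
    uint32_t t0 = get_ticktime();
    uint32_t budget = ticks_per_sec * seconds;
    uint32_t i = 0;
    while ((uint32_t)(get_ticktime() - t0) < budget)
        i++;
    return i;
}

/* Stand-in clock for trying this outside a kernel: every call
 * advances time by one tick. */
static uint32_t sim_ticks;
static uint32_t sim_get_ticktime(void) { return sim_ticks++; }
```

Because the exit condition depends only on elapsed ticks, a faster CPU just produces a larger iteration count rather than a longer busy period.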

Edit: Oh, and please note that it leaves the busyness indicator 'on' when it panics, even though in reality the machine is stopped.

Post by mystran »

Ah. Part of the problem solved. I was testing the R/W bit and not the user bit.

So all we know is that the fault was caused by a write, which isn't much.
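The mix-up is easy to make because the flags sit next to each other in the error code the CPU pushes for a page fault: bit 0 (P) is set for a protection violation and clear for a not-present page, bit 1 (W/R) is set for a write, and bit 2 (U/S) is set when the access came from user mode. The faulting linear address comes separately from CR2. A sketch that decodes the bits through named masks instead of bare numbers; the names and the output format are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* x86 page-fault (#PF) error code bits. */
#define PF_PRESENT (1u << 0)  /* 1 = protection violation, 0 = not present */
#define PF_WRITE   (1u << 1)  /* 1 = write access, 0 = read                */
#define PF_USER    (1u << 2)  /* 1 = access from user mode (CPL 3)         */

struct pf_info {
    int protection;  /* page was present, access was not allowed */
    int write;       /* write access                              */
    int user;        /* CPU was in user mode                      */
};

struct pf_info decode_pf(uint32_t err)
{
    struct pf_info info = {
        .protection = (err & PF_PRESENT) != 0,
        .write      = (err & PF_WRITE) != 0,
        .user       = (err & PF_USER) != 0,
    };
    return info;
}

/* Print a one-line report; cr2 is the faulting address, eip the
 * faulting instruction (both taken from the exception context). */
void print_pf(uint32_t err, uint32_t cr2, uint32_t eip)
{
    struct pf_info i = decode_pf(err);
    printf("#PF at eip=0x%08lX addr=0x%08lX: %s-mode %s, %s\n",
           (unsigned long)eip, (unsigned long)cr2,
           i.user ? "user" : "kernel", i.write ? "write" : "read",
           i.protection ? "protection violation" : "page not present");
}
```

Testing `err & 2` where `err & PF_USER` was meant is exactly the R/W versus user-bit confusion described above.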

Post by mystran »

At least it works better on the average machine than on my own. For some reason my desktop triple-faults after successful initialization and a "spurious IRQ7" (which the kernel doesn't attempt to handle in any way).
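Spurious IRQ7 is a known 8259 PIC behaviour: if an interrupt request goes away before the CPU acknowledges it, the master PIC still delivers IRQ7, but without setting bit 7 of its In-Service Register (ISR). The usual handling is to read the ISR (write the OCW3 value 0x0B to port 0x20, then read port 0x20) and return without an EOI when bit 7 is clear. A sketch, with port I/O behind hypothetical `outb_fn`/`inb_fn` callbacks and a simulated PIC so it can run outside a kernel; it is not Neon's code.

```c
#include <stdint.h>

#define PIC1_CMD     0x20  /* master 8259 command/status port       */
#define PIC_READ_ISR 0x0B  /* OCW3: next read returns the ISR       */
#define PIC_EOI      0x20  /* non-specific end-of-interrupt command */

/* Hypothetical port-I/O hooks; in a kernel these would be outb/inb. */
typedef void (*outb_fn)(uint16_t port, uint8_t val);
typedef uint8_t (*inb_fn)(uint16_t port);

/* Body of an IRQ7 handler. Returns 1 for a real IRQ7 (EOI sent),
 * 0 for a spurious one (no EOI, because nothing is in service). */
int handle_irq7(outb_fn out, inb_fn in)
{
    out(PIC1_CMD, PIC_READ_ISR);
    uint8_t isr = in(PIC1_CMD);
    if (!(isr & 0x80))
        return 0;            /* spurious: bit 7 of the ISR is clear */
    /* ... service the device actually wired to IRQ7 here ... */
    out(PIC1_CMD, PIC_EOI);
    return 1;
}

/* --- Simulated master PIC for trying this outside a kernel --- */
static uint8_t pic_sim_isr;  /* ISR value the "hardware" reports */
static int pic_sim_eois;     /* number of EOIs received          */
static void pic_sim_outb(uint16_t port, uint8_t val)
{
    if (port == PIC1_CMD && val == PIC_EOI)
        pic_sim_eois++;
}
static uint8_t pic_sim_inb(uint16_t port)
{
    (void)port;
    return pic_sim_isr;
}
```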

Post by mystran »

Ok, so I find it strange that the kernel DID work on so many machines :D

The code that was supposed to zero the pre-allocated kernel page tables was zeroing some totally random piece of memory instead, and the only reason it worked on the computers it did work on was that the entries for the allocated heap region HAPPENED to have the present bit clear. Oh well.

The exact point of failure would vary depending on what a given computer happened to have in that piece of memory.
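Why the bug could hide: the MMU ignores every other bit of a page-table entry whose present bit (bit 0) is clear, so leftover garbage in a page table that was never actually zeroed only causes trouble in the entries where bit 0 happens to be set. A sketch of zeroing pre-allocated tables and checking that no entry is left present, assuming 32-bit non-PAE tables of 1024 four-byte entries; `zero_page_tables` and `count_present` are illustrative names.

```c
#include <stdint.h>
#include <string.h>

#define PTES_PER_TABLE 1024  /* 32-bit non-PAE: 1024 x 4-byte entries */
#define PTE_PRESENT    0x1u

typedef uint32_t page_table_t[PTES_PER_TABLE];

/* Zero `count` pre-allocated page tables. The whole point is to pass
 * the tables' own address; zeroing through a wrong pointer leaves
 * whatever was in memory, and any stale entry with bit 0 set becomes
 * a bogus mapping. */
void zero_page_tables(page_table_t *tables, size_t count)
{
    memset(tables, 0, count * sizeof tables[0]);
}

/* Count entries that are still marked present (should be 0 after
 * zeroing). */
size_t count_present(page_table_t *tables, size_t count)
{
    size_t n = 0;
    for (size_t t = 0; t < count; t++)
        for (size_t i = 0; i < PTES_PER_TABLE; i++)
            if (tables[t][i] & PTE_PRESENT)
                n++;
    return n;
}
```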

Brendan, could you test whether it still crashes on that unfortunate box?

http://www.cs.hut.fi/~tvoipio/files/neo ... 7-2.img.gz

If that still fails, then... hmm... back to the drawing board, I guess.

Oh, and there's no intentional panic anymore. Instead, the hogtask polls the timer to busyloop for 5 seconds, then sleeps for 5 seconds, and so on.

Post by Brendan »

Hi,

I re-tested COMPUTER O (the Pentium Pro 200 MHz) and it all works :).

I'm getting Hogtask displaying "Hogtask 5sec loop: i = 0x76??????" every 5 seconds, where "??????" is different each time. The busy indicator is also toggling between '.' and '*' roughly every 2.5 seconds...

It looks like my OS is the only one broken now - 26 hours of debugging and still counting :-[.


Cheers,

Brendan

Post by AxelDominatoR »

Tried your latest release on Bochs 2.2.
The busyness indicator works ok now :)

Great work!

Axel

Post by mystran »

Brendan wrote: I re-tested COMPUTER O (the Pentium Pro 200 MHz) and it all works :).
Great. And thanks a lot.
Brendan wrote: I'm getting Hogtask displaying "Hogtask 5sec loop: i = 0x76??????" every 5 seconds, where "??????" is different each time. The busy indicator is also toggling between '.' and '*' roughly every 2.5 seconds...
Toggling on and off with more or less equal delays, or blinking to * once in a while? You get some *-blinks when it schedules the ticker task, which updates the rotating thing. The * should toggle on after 5 seconds, then after another 5 seconds you get the "hogtask..." message, where i is simply the number of iterations it ran in the busyloop until the timer claimed 5 seconds had elapsed.

(The counter will wrap around on a fast machine, but it's only there to check that priorities work properly, so that's not important.) Then it should sleep (with busyness '.') for another 5 seconds before busylooping again.

If you get the debug print every 5 seconds (and not every 10 seconds), then there's a timing problem, but I'll have to rework the timer a bit in the future anyway, so I'm not going to worry about it at this point.
Brendan wrote: It looks like my OS is the only one broken now - 26 hours of debugging and still counting :-[.
What's the problem? A triple fault, or something less serious? For triple faults I've found it VERY useful to dump debugging output to the serial port and record it with another computer over a null-modem cable. It's also useful for getting the whole log out of Bochs (just set a COM port to output to a file :))
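For anyone who wants to try the serial-logging approach: COM1 conventionally sits at I/O base 0x3F8, and a byte may be written to the transmit register once bit 5 (transmitter holding register empty) of the line status register at base+5 is set. A minimal polled-output sketch, assuming the UART has already been initialized (baud rate, 8N1) and using hypothetical port-I/O callbacks plus a simulated UART so it can run outside a kernel.

```c
#include <stdint.h>

#define COM1_BASE 0x3F8
#define UART_THR  0     /* transmit holding register offset */
#define UART_LSR  5     /* line status register offset      */
#define LSR_THRE  0x20  /* transmit holding register empty  */

/* Hypothetical port-I/O hooks; in a kernel these would be outb/inb. */
typedef void (*outb_fn)(uint16_t port, uint8_t val);
typedef uint8_t (*inb_fn)(uint16_t port);

/* Polled write of one byte: wait until the UART can take it, then send. */
static void serial_putc(outb_fn out, inb_fn in, char c)
{
    while (!(in(COM1_BASE + UART_LSR) & LSR_THRE))
        ;  /* spin */
    out(COM1_BASE + UART_THR, (uint8_t)c);
}

/* Write a string, turning "\n" into "\r\n" for terminal programs. */
void serial_puts(outb_fn out, inb_fn in, const char *s)
{
    for (; *s; s++) {
        if (*s == '\n')
            serial_putc(out, in, '\r');
        serial_putc(out, in, *s);
    }
}

/* --- Simulated UART that captures output, for use outside a kernel --- */
static char uart_sim_log[256];
static unsigned uart_sim_len;
static void uart_sim_outb(uint16_t port, uint8_t v)
{
    if (port == COM1_BASE + UART_THR && uart_sim_len < sizeof uart_sim_log - 1)
        uart_sim_log[uart_sim_len++] = (char)v;
}
static uint8_t uart_sim_inb(uint16_t port) { (void)port; return LSR_THRE; }
```

Polling keeps the output path free of interrupts, which is what makes it usable right up to a triple fault.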

Post by Brendan »

Hi,
mystran wrote: Toggling on and off with more or less equal delays, or blinking to * once in a while? You get some *-blinks when it schedules the ticker task, which updates the rotating thing. The * should toggle on after 5 seconds, then after another 5 seconds you get the "hogtask..." message, where i is simply the number of iterations it ran in the busyloop until the timer claimed 5 seconds had elapsed.
It's a '.' for about 2.5 seconds, then changes to '*', and after about 2.5 seconds changes back to '.' again. More technically, it's "running at 0.2 Hz with a 50% duty cycle".

mystran wrote: What is the problem? Triple fault, or something less serious? For triple faults I've found it VERY useful to dump debugging outputs to serial port, and record that using a null-modem cable with another computer. Also useful for getting the whole log out of Bochs (just set a comport to output into a file :))


I wish it was something like a triple fault, where there's a specific point of failure...

The problem is that it only locks up on one of my computers and runs fine on everything else. The point where it locks up has little to do with the code running at the time - it's an IRQ/timing/scheduler/IO APIC bug.

I think somehow it fails to read RTC register C to get/clear pending RTC interrupts and this is why IRQ8 stops, but I still don't know what's causing this (current theory is that the IDT entry for IRQ8 gets overwritten during IO APIC initialization, but this doesn't explain why it works for a while before stopping). I also haven't figured out why this would cause the OS to lock up, as IRQ8 isn't really used for much (and the scheduler and its timer IRQ keep ticking).
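Some background on the RTC theory: on the MC146818-compatible RTC, reading status register C is what acknowledges an RTC interrupt and clears its flags, and until that happens the chip raises no further IRQ8. Reading it means writing index 0x0C to port 0x70 and then reading port 0x71. A sketch of that acknowledgement, with port I/O behind hypothetical callbacks and a simulated RTC so it can run outside a kernel; it is not Brendan's code.

```c
#include <stdint.h>

#define CMOS_INDEX 0x70
#define CMOS_DATA  0x71
#define RTC_REG_C  0x0C
#define RTC_C_PF   0x40  /* periodic interrupt flag */

/* Hypothetical port-I/O hooks; in a kernel these would be outb/inb. */
typedef void (*outb_fn)(uint16_t port, uint8_t val);
typedef uint8_t (*inb_fn)(uint16_t port);

/* IRQ8 body: read status register C, which acknowledges the interrupt
 * inside the RTC. Skipping this read leaves the flags set and IRQ8
 * stays silent. The PIC or IO APIC still needs its own EOI afterwards.
 * Returns the register-C value so the caller can see the cause. */
uint8_t rtc_irq8(outb_fn out, inb_fn in)
{
    out(CMOS_INDEX, RTC_REG_C);
    return in(CMOS_DATA);
}

/* --- Simulated RTC: register C reads as pending until it is read --- */
static uint8_t rtc_sim_index;
static uint8_t rtc_sim_reg_c = 0xC0;  /* IRQF | PF pending */
static void rtc_sim_outb(uint16_t port, uint8_t v)
{
    if (port == CMOS_INDEX)
        rtc_sim_index = v;
}
static uint8_t rtc_sim_inb(uint16_t port)
{
    if (port == CMOS_DATA && rtc_sim_index == RTC_REG_C) {
        uint8_t v = rtc_sim_reg_c;
        rtc_sim_reg_c = 0;  /* reading register C clears it */
        return v;
    }
    return 0;
}
```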

The reason it's been taking me so long to find is that the kernel does a lot of things to reduce latency and lock contention - a fully interruptible/pre-emptible multi-CPU kernel with a pile of lockless algorithms to complicate everything further. To make it worse, I'm fairly sure there are other unrelated bugs somewhere that only occur on multi-CPU computers. The only computer I have that locks up is a multi-CPU computer - a dual Pentium II Compaq Proliant that sounds like a jumbo jet taking off, weighs almost as much as I do and reboots slower than anything I've ever owned due to a 5 disk SCSI RAID array.

I am getting closer though - I've got a version that does work without problems. Unfortunately, I had to put code to read RTC register C (to get/clear pending RTC interrupts) into the IRQ handler for IRQ 0 (the local APIC timer) to stop it locking up, and then had to remove the free page cleaning code from the idle threads to stop it causing critical errors. Neither of these solutions is acceptable, but it's a start...


Cheers,

Brendan

Post by distantvoices »

I've given Neon a test in Virtual PC emulation.

It boots up correctly, starts the hogtask and works happily. No delays whatsoever when the keyboard handler gets busy printing scancodes. It happily switches between star and dot in the upper right corner of the screen, and a bar is spinning in the upper left corner. Nice signs of activity, I say.

@Brendan: does your kernel/OS work with QEMU? (I'd like to give it a test; Virtual PC refuses to take your floppy image.) As for heavy and endless bug searching: the more it evolves, the more hidden bugs you will find. *gg* Don't despair. They usually show up when you don't expect them at all, and just as suddenly you find solutions for such problems. *gg* Just go for a hike 'n a chat.
... the osdever formerly known as beyond infinity ...
BlueillusionOS iso image