OSDev.org

Posted: **Sun Jan 01, 2012 6:53 am**

Brendan wrote:When a reset is requested by the user (e.g. they close all their applications and ask the OS to reset the computer so they can boot a different OS); the system shouldn't be unstable. When a reset is requested by software (e.g. maybe some sort of kernel upgrade thing?) then the system still shouldn't be unstable.

The only case where the OS wouldn't be stable is if the OS was unstable before the reset was requested. For example, if your kernel crashes and the "kernel panic" requests a reboot. Hopefully this never happens in official releases because you're potentially screwed regardless of what you do.

There is no user involved in resetting the systems we have deployed. They have no keyboard, so nobody can do a user-requested reset. What end-users eventually do as a last resort is to turn off the power.

The resets that happen are of the following types:

1. Remote resets from our management system. If that fails, you need to call service personnel to the site (expensive).

2. Automatic resets because of program bugs in the application or kernel. These are triggered by faults (typically page faults or protection faults). When these faults happen in the application, the system state is generally stable. When they happen in the kernel, the system might be unstable. Currently, most of these faults are in the application, but some are in device-drivers.

3. Kernel panics. These happen in the scheduler, so the system is generally unstable. We don't know how many of these there are, as they leave no traces other than reinstalls. Probably quite uncommon.

4. Upgrades. When the application or kernel is upgraded, the system must be restarted.

5. Hard resets that turn off the power for the complete system when some external component has hung. This is done by dedicated hardware, triggered from an OS driver.

Posted: **Sun Jan 01, 2012 7:06 am**

Owen wrote:The ACPI reset vector is, guess what, an I/O port or memory write! You query ACPI for it at system startup and note it down.

That is correct. You could hardcode it into the RESET-code segment, and then it would be safe to use. However, the current release we deploy does not have the ACPI-driver loaded, and so knows nothing about ACPI reset. But I might very well build this into future releases.

Edit: Writing to a physical memory address is more problematic though. It assumes you set up a virtual memory address at boot-time that points to the reset-memory address, and then record the virtual address. I suppose that would also be relatively safe if done like that.
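To make the "note it down at boot" idea concrete, here is a minimal sketch in C. Everything named here (`reset_info`, `reset_info_record`, `reset_info_fire`, the `outb_fn` hook) is made up for illustration and is not from any real kernel. It assumes the caller has already checked that the FADT advertises a reset register, and that a memory-space register was mapped to a virtual address during boot. A reset register in PCI configuration space is simply rejected in this sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* Address space IDs as used by the ACPI Generic Address Structure. */
enum { GAS_SYSTEM_MEMORY = 0, GAS_SYSTEM_IO = 1 };

/* Hypothetical snapshot of the reset register, filled in once at boot. */
struct reset_info {
    int valid;              /* non-zero once a usable register was recorded */
    uint8_t space_id;       /* GAS_SYSTEM_MEMORY or GAS_SYSTEM_IO */
    uint16_t io_port;       /* used when space_id == GAS_SYSTEM_IO */
    volatile uint8_t *mmio; /* virtual address mapped at boot (memory space) */
    uint8_t value;          /* value to write, as given by the FADT */
};

/* Hypothetical port-write hook; a kernel would pass its own outb(). */
typedef void (*outb_fn)(uint16_t port, uint8_t value);

/* Boot-time step: copy what the FADT said into the snapshot.
 * 'mapped' is the already-established virtual mapping for a memory-space
 * register and is ignored for I/O space. Returns 0 on success, -1 otherwise. */
int reset_info_record(struct reset_info *ri, uint8_t space_id,
                      uint64_t address, uint8_t value,
                      volatile uint8_t *mapped)
{
    ri->valid = 0;
    if (space_id == GAS_SYSTEM_IO) {
        if (address > 0xFFFF)
            return -1;              /* x86 port space is 16 bits wide */
        ri->io_port = (uint16_t)address;
        ri->mmio = NULL;
    } else if (space_id == GAS_SYSTEM_MEMORY) {
        if (mapped == NULL)
            return -1;              /* must be mapped before it is needed */
        ri->io_port = 0;
        ri->mmio = mapped;
    } else {
        return -1;                  /* other address spaces not handled here */
    }
    ri->space_id = space_id;
    ri->value = value;
    ri->valid = 1;
    return 0;
}

/* Reset-time step: a single write using only the recorded data.
 * Returns -1 if nothing usable was recorded; returning 0 means the write
 * was issued but the machine is (unexpectedly) still running. */
int reset_info_fire(const struct reset_info *ri, outb_fn outb)
{
    if (!ri->valid)
        return -1;
    if (ri->space_id == GAS_SYSTEM_IO) {
        if (outb == NULL)
            return -1;
        outb(ri->io_port, ri->value);
    } else {
        *ri->mmio = ri->value;
    }
    return 0;
}
```

The point of splitting it this way is that the reset path touches no ACPI code and no page tables, only a small structure that was filled in while the system was still known to be healthy.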

Owen wrote:However, you touch my nonexistent keyboard controller and you'll suddenly find that your code is no longer executing; indeed, you'll find yourself completely hung. After all, my machine doesn't have a keyboard controller, nor does it claim to, so if you don't note down whether ACPI said there was an 8042 or not you'll bork my machine.

That seems like a fatal design-flaw. Code should not hang because it touches unknown IO-addresses!

Posted: **Sun Jan 01, 2012 8:57 am**

rdos wrote:
Owen wrote:However, you touch my nonexistent keyboard controller and you'll suddenly find that your code is no longer executing; indeed, you'll find yourself completely hung. After all, my machine doesn't have a keyboard controller, nor does it claim to, so if you don't note down whether ACPI said there was an 8042 or not you'll bork my machine.
That seems like a fatal design-flaw. Code should not hang because it touches unknown IO-addresses!

You touch the I/O address, the ICH captures that write and prods the ACPI embedded controller, and the embedded controller does who-knows-what because it wasn't programmed to respond to that address because this is a legacy free machine. What doesn't happen is the embedded controller signaling the completion of the IO transaction, and so that core is hung, forever. Also the embedded controller's IO port interface is now locked up so the next time another core tries to interact with the embedded controller, that hangs.

This is all speculation, of course, but the situation remains: You're interacting with the embedded controller in ways it was never expecting to be interacted with.

And that the machine hangs is, at the end of the day, fine, because you shouldn't be touching random IO addresses anyway (Note that it is well within the rights of the system's ACPI tables to tell you that there is no ISA legacy hardware whatsoever! I expect to see that bit being set more and more often in the coming years)
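As a rough illustration of "note down whether ACPI said there was an 8042", the check could look something like the sketch below. The function name and the `flags_field_present` parameter are hypothetical. The bit positions are the IA-PC boot architecture flags from the FADT as I understand them (bit 0 for legacy devices, bit 1 for the 8042). Older FADTs do not carry that field at all, so the caller has to decide whether the table is new enough; in that case this sketch assumes a classic PC with a keyboard controller.

```c
#include <stdint.h>

/* IA-PC boot architecture flag bits in the FADT. */
#define IAPC_LEGACY_DEVICES (1u << 0) /* ISA/LPC legacy devices present */
#define IAPC_8042           (1u << 1) /* 8042-style keyboard controller present */

/* Returns non-zero when it seems reasonable to use the keyboard controller
 * (e.g. for the classic "pulse the reset line" trick).
 *
 * flags_field_present: 0 if the FADT is too old to contain the flags field.
 * iapc_boot_arch:      the flags value recorded at boot (ignored if absent). */
int may_touch_8042(int flags_field_present, uint16_t iapc_boot_arch)
{
    if (!flags_field_present)
        return 1; /* assumption: pre-flags firmware implies a PC-style 8042 */
    return (iapc_boot_arch & IAPC_8042) != 0;
}
```

A reset routine would record the result once at startup and skip the keyboard-controller method entirely when it is zero, falling through to whatever other methods it has.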

Posted: **Sun Jan 01, 2012 9:16 am**

Owen wrote:You touch the I/O address, the ICH captures that write and prods the ACPI embedded controller, and the embedded controller does who-knows-what because it wasn't programmed to respond to that address because this is a legacy free machine. What doesn't happen is the embedded controller signaling the completion of the IO transaction, and so that core is hung, forever. Also the embedded controller's IO port interface is now locked up so the next time another core tries to interact with the embedded controller, that hangs.

It shouldn't. It should just ignore the write. For a read, it could just return random junk. That is how this was done on older systems. It is easy to envision how "buggy" device-drivers could try to interact with non-existent IO-addresses. Suppose you look for PCI-serial adapters, and test various PCI BARs. The BIOS might "forget" to initialize all the PCI BARs correctly, so they contain random junk. If some hardware hangs because of this, that hardware is seriously broken!

Owen wrote:This is all speculation, of course, but the situation remains: You're interacting with the embedded controller in ways it was never expecting to be interacted with.

IMO, an embedded controller that hangs because of unknown IO-accesses is seriously broken. Legacy-free or not.

Owen wrote:And that the machine hangs is, at the end of the day, fine, because you shouldn't be touching random IO addresses anyway (Note that it is well within the rights of the system's ACPI tables to tell you that there is no ISA legacy hardware whatsoever! I expect to see that bit being set more and more often in the coming years)

Perhaps. However, legacy-free hardware should be gracious as to what it does with unknown IO-accesses, whether these fall in the legacy-range or not. ACPI is such a complex environment, and so is PCI, that we almost expect various BIOSes to produce incorrect information. If the hardware is not gracious about this, and the software doesn't do sanity-checks, we'll see a lot of hangs for no apparent reason.
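For the software side of that, a sanity-check on a PCI BAR before probing it as an I/O range might be sketched as follows. The function name is hypothetical and the policy is only one possible choice: it accepts a BAR only if it is marked as I/O space, has a non-zero base, and fits in the 16-bit x86 port range. It cannot tell whether a plausible-looking value is actually correct.

```c
#include <stdint.h>

/* Decode an I/O-space BAR and reject values that look uninitialized.
 * On success stores the port base in *port_out and returns 0.
 * Returns -1 if the BAR should not be probed as an I/O range. */
int io_bar_port_base(uint32_t bar, uint16_t *port_out)
{
    if ((bar & 0x1u) == 0)
        return -1;                /* bit 0 clear: this is a memory BAR */

    uint32_t base = bar & ~0x3u;  /* low two bits are flags, not address */

    if (base == 0)
        return -1;                /* never assigned by the firmware */
    if (base > 0xFFFFu)
        return -1;                /* outside x86 port space, e.g. all ones */

    *port_out = (uint16_t)base;
    return 0;
}
```

A driver scanning for PCI-serial adapters would call this on each candidate BAR and skip the device (or log it) when the result is -1, instead of poking whatever address happens to be in the register.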

Posted: **Sun Jan 01, 2012 9:37 am**

Saying hardware "should" do something is all well and good... but in practice:

The only behavior that you can rely on is that of recent versions of Windows.

Most BIOSes are broken if your response from _OS is not "Microsoft Windows NT" (they go into some "This OS supports *nothing* mode").

Most BIOSes think you are ancient if you do not answer to the _OSI calls defined by the Windows WHQL ACPI specifications (For example, if you want to see the HPET, you will often have to respond affirmative to _OSI(“Windows 2006”), because otherwise the BIOS will go into Windows XP compatibility mode and hide it (so it doesn't show up as an "unknown device" in the XP device manager)). If you don't respond to any of the WHQL ACPI methods, the BIOS will probably again go into "This OS supports *nothing* mode".

In particular, most systems are designed for running one, and only one, operating system: some recent version of Windows. In many regards, this machine is better than most, for it was designed for running two (of course, that's hardly surprising; it is, after all, a Mac).

Buggy drivers shouldn't be allowed to interact with just any address. This is why the IO permission bitmap was invented...

Posted: **Sun Jan 01, 2012 9:50 am**

Hi

Note: I re-ordered things for clarity...

rdos wrote:
Brendan wrote:When a reset is requested by the user (e.g. they close all their applications and ask the OS to reset the computer so they can boot a different OS); the system shouldn't be unstable. When a reset is requested by software (e.g. maybe some sort of kernel upgrade thing?) then the system still shouldn't be unstable.
1. Remote resets from our management system. If that fails, you need to call service personnel to the site (expensive).

A remote user is a user. A remote reset from your management system is a request by the (remote) user to reset. The system shouldn't be unstable in this case.

rdos wrote:4. Upgrades. When the application or kernel is upgraded, the system must be restarted.

The system shouldn't be unstable in this case.

rdos wrote:5. Hard-resets that turn off the power for the complete system when some external component has hung-up. This is done by dedicated hardware from an OS-driver.

The system itself (rather than the external component) shouldn't be unstable in this case.

rdos wrote:
Brendan wrote:The only case where the OS wouldn't be stable is if the OS was unstable before the reset was requested. For example, if your kernel crashes and the "kernel panic" requests a reboot. Hopefully this never happens in official releases because you're potentially screwed regardless of what you do.
2. Automatic resets because of program bugs in the application or kernel. These are triggered by faults (typically page faults or protection faults). When these faults happen in the application, the system state is generally stable. When they happen in the kernel, the system might be unstable. Currently, most of these faults are in the application, but some are in device-drivers.

Unstable crap shouldn't be running on production systems to begin with (and if reset fails on a development/test machine it's not like anyone is going to care much).

rdos wrote:3. Kernel panics. These happen in the scheduler, so the system is generally unstable. We don't know how many of these there are, as they leave no traces other than reinstalls. Probably quite uncommon.

Unstable crap shouldn't be running on production systems to begin with.

Cheers,

Brendan

Posted: **Sun Jan 01, 2012 11:22 am**

Owen wrote:Saying hardware "should" do something is all well and good... but in practice,

The only behavior that you can rely on is that of recent versions of Windows

Most BIOSes are broken if your response from _OS is not "Microsoft Windows NT" (they go into some "This OS supports *nothing* mode")

Most BIOSes think you are ancient if you do not answer to the _OSI calls defined by the Windows WHQL ACPI specifications (For example, if you want to see the HPET, you will often have to respond affirmative to _OSI(“Windows 2006”), because otherwise the BIOS will go into Windows XP compatibility mode and hide it (so it doesn't show up as an "unknown device" in the XP device manager)). If you don't respond to any of the WHQL ACPI methods, the BIOS will probably again go into "This OS supports *nothing* mode"

Really ugly.

I wonder if this is why the HPET isn't listed on some of my machines, even if it is present? I also wonder why I seem to have no processor-nodes. Maybe this is because ACPI doesn't believe I handle processors? What is the magic way to obtain a list of processors?

Owen wrote:Buggy drivers shouldn't be allowed to interact with just any address. This is why the IO permission bitmap was invented...

The IO permission bitmap is for non-kernel only, and doesn't apply to my device-drivers that run at ring 0. Additionally, the IO permission bitmap is part of the TSS, so if you use software task-switching, you might not even have per-thread IO permission bitmaps.
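The rule being described can be modelled roughly like this. It is a simplified sketch of the protected-mode check, not production code, and the function and parameter names are made up: when CPL is numerically less than or equal to IOPL the bitmap is never consulted, which is why ring 0 drivers are not restricted by it. Otherwise every port touched by the access must have a clear bit, and ports beyond the end of the bitmap count as denied.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified model of the x86 I/O permission check.
 * cpl, iopl:      current privilege level and I/O privilege level (0..3)
 * bitmap:         I/O permission bitmap from the TSS (bit set = denied)
 * bitmap_bytes:   number of bitmap bytes that lie inside the TSS limit
 * port, width:    first port and access width in bytes (1, 2 or 4)
 * Returns non-zero if the access would be allowed. */
int io_access_allowed(unsigned cpl, unsigned iopl,
                      const uint8_t *bitmap, size_t bitmap_bytes,
                      uint16_t port, unsigned width)
{
    if (cpl <= iopl)
        return 1; /* privileged enough: the bitmap is not looked at */

    for (unsigned i = 0; i < width; i++) {
        uint32_t p = (uint32_t)port + i;
        size_t byte = p / 8;

        if (bitmap == NULL || byte >= bitmap_bytes)
            return 0; /* port not covered by the bitmap: denied */
        if (bitmap[byte] & (1u << (p % 8)))
            return 0; /* bit set: denied */
    }
    return 1;
}
```

With CPL 0 the first test always succeeds, so a buggy kernel-mode driver can reach any port regardless of what the bitmap says.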

Posted: **Sun Jan 01, 2012 11:28 am**

Brendan wrote:Unstable crap shouldn't be running on production systems to begin with (and if reset fails on a development/test machine it's not like anyone is going to care much).

Well, if you like "blue screens" or hung systems better, certainly. It is actually a great feature to be able to do an automatic reboot instead of showing a "blue screen" or "your application encountered a pagefault at x0FF0567F5, press any key to continue".

The worst case we had was USB-serial converter hangups at one site, which the auto-reboot couldn't detect. You can imagine how angry the end-customers were about this!

Posted: **Sun Jan 01, 2012 2:11 pm**

rdos wrote:
Owen wrote:Saying hardware "should" do something is all well and good... but in practice,

The only behavior that you can rely on is that of recent versions of Windows

Most BIOSes are broken if your response from _OS is not "Microsoft Windows NT" (they go into some "This OS supports *nothing* mode")

Most BIOSes think you are ancient if you do not answer to the _OSI calls defined by the Windows WHQL ACPI specifications (For example, if you want to see the HPET, you will often have to respond affirmative to _OSI(“Windows 2006”), because otherwise the BIOS will go into Windows XP compatibility mode and hide it (so it doesn't show up as an "unknown device" in the XP device manager)). If you don't respond to any of the WHQL ACPI methods, the BIOS will probably again go into "This OS supports *nothing* mode"

Really ugly.

I wonder if this is why the HPET isn't listed on some of my machines, even if it is present? I also wonder why I seem to have no processor-nodes. Maybe this is because ACPI doesn't believe I handle processors? What is the magic way to obtain a list of processors?

Perhaps: lie that you're "Microsoft Windows NT" in response to _OS(), and then start declaring that you conform to the various Windows ACPI models (Starting with Windows 2000, and declaring everything up to and including Windows 7 - following Microsoft's document on how to detect what version of Windows is running using _OSI). Also make sure you're returning affirmative for various ACPI features (e.g. "3.0 Thermal Zone")
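Something along these lines, as a minimal sketch. The function names and the table are made up for illustration and this is not how any particular ACPI interpreter implements it. The Windows strings are the ones I know from Microsoft's _OSI documentation (up to "Windows 2009" for Windows 7). For the feature strings I have used the spellings as I recall them from the ACPI specification ("Processor Device", "3.0 Thermal Model"), so verify them against the spec before relying on them.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Interfaces this (hypothetical) OS claims to support when asked via _OSI. */
static const char *const osi_supported[] = {
    "Windows 2000",      /* Windows 2000 */
    "Windows 2001",      /* Windows XP */
    "Windows 2001 SP1",
    "Windows 2001 SP2",
    "Windows 2001.1",    /* Windows Server 2003 */
    "Windows 2006",      /* Windows Vista */
    "Windows 2006 SP1",
    "Windows 2006.1",    /* Windows Server 2008 */
    "Windows 2009",      /* Windows 7 */
    "Processor Device",  /* ACPI feature strings (assumed spellings) */
    "3.0 Thermal Model",
};

/* The string to hand back when AML evaluates _OS. */
const char *osi_os_name(void)
{
    return "Microsoft Windows NT";
}

/* Answer an _OSI query: all ones if the interface is claimed, 0 otherwise. */
uint32_t osi_query(const char *interface)
{
    if (interface == NULL)
        return 0;
    for (size_t i = 0; i < sizeof osi_supported / sizeof osi_supported[0]; i++) {
        if (strcmp(interface, osi_supported[i]) == 0)
            return 0xFFFFFFFFu;
    }
    return 0;
}
```

The firmware's AML typically probes these strings in order and remembers the newest one that was accepted, so answering yes to the whole range up to the version you want to imitate matters more than any single entry.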

Posted: **Mon Jan 02, 2012 5:58 am**

Owen wrote:Perhaps: lie that you're "Microsoft Windows NT" in response to _OS(), and then start declaring that you conform to the various Windows ACPI models (Starting with Windows 2000, and declaring everything up to and including Windows 7 - following Microsoft's document on how to detect what version of Windows is running using _OSI). Also make sure you're returning affirmative for various ACPI features (e.g. "3.0 Thermal Zone")

I added "Processor Devices" and "3.0 Thermal Zone". ACPICA already pretends to be Windows NT, and all versions of Windows seem to be listed (including Windows 7).

Anyway, I found the processor devices, but I hadn't listed them before as they are a different object type. I only listed devices, not processors.


[Newb question]How to shutdown and reboot the computer?
