Machine identification
Hyperdrive
Machine identification
Hi all,
Today I had a discussion about how to identify machines. After a while we had some ideas, but none of them seemed really good. So I'd like to pass the question on to you, the great brains here at the forums.
This is the scenario: we have an (x86-64) kernel which should be configured with one of two options. Depending on the machine the kernel runs on, we want it to figure out which option to use. More precisely, we currently have exactly one box - let's call it box X - that should run the kernel with option A (and some other machines - let's call them Y-boxes - that should run with option B).
The question is: how can we determine that the kernel is running on machine X, so that it should use option A?
We first thought about checking the processor's serial number, but this feature is disabled by the BIOS and there is no BIOS setup switch to turn it back on. And since disabling serial number reporting is a write-once setting, we could not re-enable it ourselves.
There was the idea of plugging some random PCI card - one the Y-boxes don't have - into box X and scanning for it: if we find the PCI device, we're on machine X; in its absence, we can assume we're on one of the Y-boxes. But hey, that's really not a nice solution, is it?
One of the nicer options would be to just check the MAC address of the NIC(s). But for that we need to load a driver for these devices. Unfortunately, we need to find out which box we are running on at a very early stage, before any drivers are loaded (i.e. in the bootloader or shortly after it).
We also thought about using some unused NVRAM locations in the BIOS CMOS to signal "this is box X"/"this is box Y". But is that a proper solution? Will the BIOS complain about failed checksums, or would it simply zero out ranges that are "unused"? I honestly don't know; I just don't trust such things.
So - how would you do this (automatically, without user interaction like asking "If this is machine X, press X - otherwise Y")? Any ideas?
Your help would be much appreciated! Thanks in advance!
--TS
(Note: Yes it really has its uses, even though it might not seem so. Besides, I find it interesting enough to think about it and I'm wondering how others would solve it.)
Re: Machine identification
Well, first off: to figure out what the CPU is, you need to read the values returned by the CPUID instruction. The Intel manuals have a ton of info regarding CPUID.
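To make that concrete, here is a minimal userland sketch (GCC/Clang on x86, using the compiler's cpuid.h) that reads the vendor string from leaf 0 and the CPU signature from leaf 1 - both documented in the Intel manuals:

```c
/* Minimal CPUID sketch: vendor string (leaf 0) and CPU signature (leaf 1). */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned eax, ebx, ecx, edx;
    char vendor[13] = {0};

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 1;                      /* CPUID not supported */
    memcpy(vendor + 0, &ebx, 4);       /* the vendor string is spread  */
    memcpy(vendor + 4, &edx, 4);       /* over EBX, EDX, ECX, in that  */
    memcpy(vendor + 8, &ecx, 4);       /* order                        */

    __get_cpuid(1, &eax, &ebx, &ecx, &edx);  /* family/model/stepping in EAX */
    printf("vendor=%s signature=0x%08x\n", vendor, eax);
    return 0;
}
```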
I'll probably provide more later, but I'm pressed for time.
Website: https://joscor.com
Combuster
Re: Machine identification
Well, what's the point of having one X and several Ys specifically? It either means the computers are significantly different for a reason (and you can just check for the thing that sets them apart), or they're identical and it technically doesn't matter which one is X and which ones are Y, as long as you have exactly one.
You may want to check whether an election algorithm is better suited to the task, for that very reason.
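The post doesn't spell the algorithm out, but the core rule of such an election can be tiny: every node takes some stable unique value (its MAC address, say) and the smallest one wins. A hypothetical sketch in C, with the message transport deliberately left out:

```c
/* Hypothetical "lowest ID wins" election rule. Each node compares its own
 * stable ID (e.g. a MAC address packed into an integer) against the IDs it
 * has heard from its peers; how those IDs are exchanged is out of scope. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if we hold the smallest ID seen, i.e. we become "box X". */
static bool i_am_elected(uint64_t my_id, const uint64_t *peer_ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (peer_ids[i] < my_id)
            return false;
    return true;
}
```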
Re: Machine identification
Hi,
Hyperdrive wrote: So - how would you do this (automatically, without user interaction like asking "If this is machine X, press X - otherwise Y")? Any ideas?
I'll assume that the computers are identical...
SMBIOS might provide a valid/unique serial number and/or a valid/unique UUID in its System Information structure. However, I wouldn't expect either of these to be present (it's much easier for BIOS/motherboard manufacturers not to bother).
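For what such a check involves: a kernel-context sketch (not from Brendan's post) that scans the legacy BIOS area for the SMBIOS 2.x "_SM_" entry point and pulls the 16-byte UUID out of the Type 1 (System Information) structure. It assumes 0xF0000-0xFFFFF and the structure table are identity-mapped, and keeps error handling minimal:

```c
/* Find the SMBIOS 2.x entry point and return the Type 1 UUID, if any. */
#include <stdint.h>
#include <string.h>

static const uint8_t *find_smbios_entry(void)
{
    for (uint32_t p = 0xF0000; p < 0x100000; p += 16) {
        const uint8_t *e = (const uint8_t *)(uintptr_t)p;
        if (memcmp(e, "_SM_", 4) != 0)
            continue;
        uint8_t sum = 0;
        for (uint8_t i = 0; i < e[5]; i++)   /* e[5] = entry point length */
            sum += e[i];
        if (sum == 0)                        /* bytes must sum to zero    */
            return e;
    }
    return 0;
}

/* Returns a pointer to the 16-byte system UUID, or 0 if not found. */
static const uint8_t *smbios_system_uuid(void)
{
    const uint8_t *e = find_smbios_entry();
    if (!e)
        return 0;

    const uint8_t *s   = (const uint8_t *)(uintptr_t)*(const uint32_t *)(e + 0x18);
    const uint8_t *end = s + *(const uint16_t *)(e + 0x16);

    while (s + 4 <= end && s[0] != 127) {     /* type 127 = end of table  */
        if (s[0] == 1 && s[1] >= 0x19)        /* Type 1, long enough for  */
            return s + 0x08;                  /* the UUID at offset 0x08  */
        s += s[1];                            /* skip the formatted area  */
        while (s + 1 < end && (s[0] || s[1])) /* then the string-set up   */
            s++;                              /* to its double NUL        */
        s += 2;
    }
    return 0;
}
```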
If the computers have TPM chips, then these could be used to identify the special computer. I've only read a little about the TPM chip, though (I'm not too sure about the exact details of how this is done).
Ethernet card MAC addresses can be a good idea. If you know the ethernet cards present in all machines are the same type, then you only need a little code to get the MAC address (you don't need a full device driver).
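As an illustration of how little code that can be, a freestanding sketch under the assumption that the known NIC is an RTL8139, whose station address registers sit at offsets 0-5 of its I/O BAR; it scans PCI bus 0 only, via configuration mechanism #1:

```c
/* Read the MAC of an RTL8139 (10EC:8139) without a device driver. */
#include <stdint.h>

static inline void outl(uint16_t port, uint32_t v)
{ __asm__ volatile("outl %0, %1" : : "a"(v), "Nd"(port)); }
static inline uint32_t inl(uint16_t port)
{ uint32_t v; __asm__ volatile("inl %1, %0" : "=a"(v) : "Nd"(port)); return v; }
static inline uint8_t inb(uint16_t port)
{ uint8_t v; __asm__ volatile("inb %1, %0" : "=a"(v) : "Nd"(port)); return v; }

static uint32_t pci_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
{
    outl(0xCF8, 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (off & 0xFC));
    return inl(0xCFC);
}

/* Scan bus 0, function 0 for the NIC and copy its MAC into mac[6].
 * Returns 0 on success, -1 if the NIC wasn't found as expected. */
static int read_rtl8139_mac(uint8_t mac[6])
{
    for (uint8_t dev = 0; dev < 32; dev++) {
        if (pci_read32(0, dev, 0, 0x00) != 0x813910EC)  /* device:vendor */
            continue;
        uint32_t bar0 = pci_read32(0, dev, 0, 0x10);
        if (!(bar0 & 1))                  /* expect an I/O-space BAR */
            return -1;
        uint16_t io = (uint16_t)(bar0 & ~0x3u);
        for (int i = 0; i < 6; i++)
            mac[i] = inb(io + i);         /* IDR0..IDR5 = station address */
        return 0;
    }
    return -1;
}
```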
Hard drives have serial numbers and model numbers (in the "Identify Drive Information" structure) which could be used like the ethernet card's MAC.
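A freestanding sketch of that approach, under the assumption of a legacy primary-master drive at the standard ports, polling only and with no real error paths: it issues IDENTIFY DEVICE (0xEC) and extracts the serial number from words 10-19 (the two bytes within each word are swapped):

```c
/* Fetch the ATA serial number of the primary master via legacy port I/O. */
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t v)
{ __asm__ volatile("outb %0, %1" : : "a"(v), "Nd"(port)); }
static inline uint8_t inb(uint16_t port)
{ uint8_t v; __asm__ volatile("inb %1, %0" : "=a"(v) : "Nd"(port)); return v; }
static inline uint16_t inw(uint16_t port)
{ uint16_t v; __asm__ volatile("inw %1, %0" : "=a"(v) : "Nd"(port)); return v; }

static int ata_serial(char serial[21])
{
    uint16_t id[256];

    outb(0x1F6, 0xA0);                  /* select primary master      */
    outb(0x1F7, 0xEC);                  /* IDENTIFY DEVICE            */
    if (inb(0x1F7) == 0)
        return -1;                      /* status 0: no drive at all  */
    while (inb(0x1F7) & 0x80)           /* spin while BSY is set      */
        ;
    if (!(inb(0x1F7) & 0x08))           /* expect DRQ before reading  */
        return -1;
    for (int i = 0; i < 256; i++)
        id[i] = inw(0x1F0);             /* 256 words of identify data */

    for (int i = 0; i < 10; i++) {      /* serial: words 10-19,       */
        serial[2 * i]     = (char)(id[10 + i] >> 8);   /* byte-swapped */
        serial[2 * i + 1] = (char)(id[10 + i] & 0xFF);
    }
    serial[20] = '\0';
    return 0;
}
```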
You could store some sort of magic marker on any storage device (for example, on one computer you could fill a sector on the hard disk with prime numbers and use this sector to determine if the computer is special or not).
You could create some sort of "dongle". Old fashioned dongles used the parallel port or the serial port, where software tests that certain pins are connected together or behave in a specific way. Depending on your requirements you might be able to do the same with a USB device instead (e.g. get a cheap USB flash stick and remove a specific pin on the main IC, then test for a faulty USB device). In this case you might need to regularly test that the device is still present, so that people can't boot the special computer and then shift the dongle to a different computer.
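A sketch of the serial variant, assuming a hypothetical dongle that wires DTR (DE-9 pin 4) to RI (pin 9): raise DTR via the Modem Control Register and see whether the Ring Indicator bit in the Modem Status Register follows it. COM1's base port is 0x3F8; whether the logging use of the port tolerates a DTR toggle is a further assumption:

```c
/* Poor man's serial dongle test: does RI track the DTR line we drive? */
#include <stdbool.h>
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t v)
{ __asm__ volatile("outb %0, %1" : : "a"(v), "Nd"(port)); }
static inline uint8_t inb(uint16_t port)
{ uint8_t v; __asm__ volatile("inb %1, %0" : "=a"(v) : "Nd"(port)); return v; }

static bool dongle_present(uint16_t base)   /* base = 0x3F8 for COM1 */
{
    outb(base + 4, 0x01);                   /* MCR bit 0: raise DTR  */
    bool high = inb(base + 6) & 0x40;       /* MSR bit 6: RI state   */
    outb(base + 4, 0x00);                   /* drop DTR again        */
    bool low  = inb(base + 6) & 0x40;
    return high && !low;                    /* RI followed DTR: dongle */
}
```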
Another way would be to use a special boot CD, where the special computer needs to boot from the special boot CD and then lock the media in the drive (so that the CD can't be removed while the OS is running).
Mostly it depends on why you need one computer to be treated differently (e.g. how tamper-proof it needs to be and how you plan to deal with hardware upgrades) and what hardware is present.
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Re: Machine identification
Brendan partially touched on this already, but ...
There are 10 bytes in the MBR (the very first sector) of your boot hard disk that are specifically dedicated to an "OS-specified disk ID number". You could just set a specific value there for machine X and test it during boot - you have to load the MBR of your boot disk into memory anyway.
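A sketch of that check, assuming the boot sector has already been loaded into memory; the 10-byte area at offset 0x1B4 is compared against a marker value that is entirely made up here:

```c
/* Test the OS-specified disk-ID area of an already-loaded MBR. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DISK_ID_OFFSET 0x1B4                       /* 10-byte OS-use area */

static const char BOX_X_MARK[10] = "BOX-X-0001";   /* made-up marker      */

static bool is_box_x(const uint8_t mbr[512])
{
    return memcmp(mbr + DISK_ID_OFFSET, BOX_X_MARK, sizeof BOX_X_MARK) == 0;
}
```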
Firestryke31
Re: Machine identification
I personally would just go with the hard drive serial, because that means if the hard drive isn't what you expected, it's not the same physical disk, and therefore isn't an X. If all of the machines are identical, then you could move the hard disk in the event of another piece of hardware failing, and if X has a special piece of hardware that the Ys don't, then you can move that too. Of course, if X is the only one with the special hardware, you should just check for that instead.
Owner of Fawkes Software.
Weird Al wrote: You think your Commodore 64 is really neato,
What kind of chip you got in there, a Dorito?
Hyperdrive
Re: Machine identification
Combuster wrote: Well, what's the point of having one X and several Ys specifically? It either means the computers are significantly different for a reason (and you can just check for the thing that sets them apart), or they're identical and it technically doesn't matter which one is X and which ones are Y, as long as you have exactly one.
The problem is we have a distributed OS for clusters, and it's very likely that all machines are technically identical (i.e. same hardware configuration). But we want one node to do special things (i.e. persistency for our transactional distributed memory).
Combuster wrote: You may want to check whether an election algorithm is better suited to the task, for that very reason.
That's exactly the thing we'd like to do. But unfortunately we have to elect the one box at a very early stage, i.e. no drivers loaded (no NIC operating) and no network stack. So we have to "elect" the box in some other way...
01000101 wrote: Well, first off: to figure out what the CPU is, you need to read the values returned by the CPUID instruction. The Intel manuals have a ton of info regarding CPUID.
The thing is, the boxes could all have the same CPU. Sorry, I should have been clearer in my post.
Brendan wrote: I'll assume that the computers are identical...
Perfect ...
Brendan wrote: SMBIOS might provide a valid/unique serial number and/or a valid/unique UUID in its System Information structure. However, I wouldn't expect either of these to be present (it's much easier for BIOS/motherboard manufacturers not to bother).
That seems to be exactly what we want. I'll check whether our BIOSes support it.
Brendan wrote: If the computers have TPM chips, then these could be used to identify the special computer. ...
Right, nice idea - that's what TPM is (partially) intended for. Sadly, I doubt we have TPM chips in our boxes, but I'll check...
Brendan wrote: Ethernet card MAC addresses can be a good idea. If you know the ethernet cards present in all machines are the same type, then you only need a little code to get the MAC address (you don't need a full device driver).
Good point. We do indeed use the same NIC in all of our machines, so we could strip the code down to just reading the MAC address. We'll probably want to support more NICs later, and it would break then - but for now it's fine.
Brendan wrote: Hard drives have serial numbers and model numbers (in the "Identify Drive Information" structure) which could be used like the ethernet card's MAC. You could store some sort of magic marker on any storage device (for example, on one computer you could fill a sector on the hard disk with prime numbers and use this sector to determine if the computer is special or not).
Nice ideas, too...
Brendan wrote: You could create some sort of "dongle". Old fashioned dongles used the parallel port or the serial port, where software tests that certain pins are connected together or behave in a specific way. ...
Hehe, we thought about that, too. A simple loopback on the serial/parallel port would do the trick. Unfortunately our (single) serial port is occupied for logging purposes. Our parallel port may be free (I don't know yet whether our mainboards even provide a parallel port, though).
Brendan wrote: Another way would be to use a special boot CD, where the special computer needs to boot from the special boot CD and then lock the media in the drive (so that the CD can't be removed while the OS is running).
Very cool - nice idea. Our cluster machines don't have CD drives, but granted, I hadn't thought about that at all...
Thanks to all of you who joined the discussion. There were some nice ideas that could help a lot.
Just wondering - is the "let's use some unused space in CMOS" idea a good or a bad one? And why? Personally, I don't know, but I have a bad feeling about it...
Thanks again to you all.
--TS
Re: Machine identification
If all the machines are identical, and you need one to be unique, why not just make it not identical? Add another piece of hardware to it, anything would do (got an old Soundblaster lying around?) and put it in. For one machine, that would be an easy thing to do.
Hyperdrive
Re: Machine identification
JackScott wrote: If all the machines are identical, and you need one to be unique, why not just make it not identical? Add another piece of hardware to it, anything would do (got an old Soundblaster lying around?) and put it in. For one machine, that would be an easy thing to do.
Yes, I mentioned this solution in my original post. It is a possible way, but I think it's not as nifty as some UUID from somewhere... You're right though - it's a very practical solution.
--TS
Re: Machine identification
Hi,
Hyperdrive wrote: The problem is we have a distributed OS for clusters, and it's very likely that all machines are technically identical (i.e. same hardware configuration). But we want one node to do special things (i.e. persistency for our transactional distributed memory).
Ok - there are several things here that don't make sense...
If the OS is installed on the hard drive of each computer, then you can have a special version and a normal version, where the person who installs the OS decides which version to install on which computer.
If the OS boots from network this is even easier (it adds about 3 lines to "dhcpd.conf"). Just tell the DHCP server to give one computer (the one with a special MAC address) a special boot loader. Of course if you were using PXE to boot from network then you'd be able to ask PXE for the MAC address too (no network card driver needed). Because you didn't know this already I'm going to assume you aren't booting from network.
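For reference, those "3 lines" would look something like this in ISC dhcpd syntax (the MAC address and boot file name are placeholders):

```
host box-x {
    hardware ethernet 01:23:45:67:89:ab;   # the special machine's NIC
    filename "pxeboot-special.0";          # boot loader only box X gets
}
```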
So, why aren't you booting from network? Imagine someone has a large distributed cluster of computers that all run (essentially) exactly the same OS software and they want to upgrade the OS. Would they want to run around with a boot floppy or something and update every computer in the cluster by hand; or would it be easier to send a command to all the computers telling them to shutdown, then change some files on the boot server and send a special "wake on LAN" packet to start all the computers again? I'm guessing that for large clusters upgrading the OS could cost hours of downtime, or it could cost a few minutes of downtime without anyone leaving their desk...
Then there's the idea of having one computer for "special things". If the OS is distributed, then why aren't these special things also distributed? If these special things can't be distributed then you'll probably need some sort of central server to handle the load (e.g. a special computer with 4 network cards instead of one, RAID instead of a single hard drive, etc), but then you'd have one computer with different hardware (e.g. if there's more than one network card then assume the computer is special).
Hyperdrive wrote: Hehe, we thought about that [a dongle], too. A simple loopback on the serial/parallel port would do the trick. Unfortunately our (single) serial port is occupied for logging purposes. Our parallel port may be free (I don't know yet whether our mainboards even provide a parallel port, though).
Just because a serial port is being used for logging doesn't necessarily mean that all of the serial port's pins are being used for logging - I'd assume that the "ring indicator" and "carrier detect" lines aren't being used (maybe some sort of pass-through dongle, where you can plug anything except a modem in?)...
Hyperdrive wrote: Just wondering - is the "let's use some unused space in CMOS" idea a good or a bad one? And why? Personally, I don't know, but I have a bad feeling about it...
Hmmm - you change one bit in the CMOS, and when you reboot the BIOS has a checksum failure and restores defaults (wiping out your special bit)...
Maybe you can find a spare bit that's not covered by a checksum, or maybe you can find a spare bit and figure out how your BIOS calculates its checksum. There are some bits in the RTC registers (which typically aren't covered by a checksum) that are marked as "unused" by Ralf Brown's Interrupt List (for example there are 4 bits in RTC Status Register C), but Ralf Brown's Interrupt List is getting old and there's no guarantee that the BIOS or the RTC hardware doesn't actually use these bits.
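For completeness, the access mechanism being discussed is just the index/data port pair 0x70/0x71; which index is actually safe (not checksummed, not used by the BIOS or RTC) is exactly the open question, so the sketch leaves that choice to the caller:

```c
/* Raw CMOS/RTC access via the legacy index (0x70) and data (0x71) ports. */
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t v)
{ __asm__ volatile("outb %0, %1" : : "a"(v), "Nd"(port)); }
static inline uint8_t inb(uint16_t port)
{ uint8_t v; __asm__ volatile("inb %1, %0" : "=a"(v) : "Nd"(port)); return v; }

static uint8_t cmos_read(uint8_t index)
{
    outb(0x70, index);      /* bit 7 of the index also gates NMI; kept 0 */
    return inb(0x71);
}

static void cmos_write(uint8_t index, uint8_t value)
{
    outb(0x70, index);
    outb(0x71, value);      /* beware: may break the BIOS checksum */
}
```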
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Hyperdrive
Re: Machine identification
Hi,
Brendan wrote: Ok - there are several things here that don't make sense...
I'd rather say it's more of a not-so-common setup, and I haven't explained it in much detail.
Brendan wrote: If the OS is installed on the hard drive of each computer, then you can have a special version and a normal version, where the person who installs the OS decides which version to install on which computer.
Hm, you are absolutely right. We boot from network (see below).
Brendan wrote: If the OS boots from network this is even easier (it adds about 3 lines to "dhcpd.conf"). Just tell the DHCP server to give one computer (the one with a special MAC address) a special boot loader. ...
We have just one image to load - not a "normal image" and a "special image". The "special" functionality is always present, because it's the same OS operating in another mode. So - yes, that would be a perfect solution, but not for us.
Brendan wrote: So, why aren't you booting from network? ...
Fully agreed. We more or less work this way. It's even a bit cooler: our OS image resides in our distributed shared memory. When a node starts, it searches for an already existing DSM in the cluster. If it finds one, it can simply load the image from there. If not, it gets the image some other way and spawns a new DSM, placing the image there. If we want to upgrade, someone recompiles and replaces the image in the DSM, and then we only need to restart. In detail it's not as simple as that, but I hope you get the idea.
Brendan wrote: Then there's the idea of having one computer for "special things". If the OS is distributed, then why aren't these special things also distributed? ...
Well, yes and no. The "special things" are the persistency part of our DSM. If a node fails for whatever reason and hangs, we could lose some data (everything that currently resided on that node). So we make snapshots of consistent states by monitoring what is going on in the DSM and periodically updating our snapshot. For storing the snapshots we use hard disks.
(That is now getting a bit off-topic. Maybe we should discuss this in the design forum. I know you had some thoughts about DSM and transactional memory, too...)
Brendan wrote: Just because a serial port is being used for logging doesn't necessarily mean that all of the serial port's pins are being used for logging - I'd assume that the "ring indicator" and "carrier detect" lines aren't being used (maybe some sort of pass-through dongle, where you can plug anything except a modem in?)...
Nice point. The idea is not that bad.
Brendan wrote: Hmmm - you change one bit in the CMOS, and when you reboot the BIOS has a checksum failure and restores defaults (wiping out your special bit)... Maybe you can find a spare bit that's not covered by a checksum, or maybe you can find a spare bit and figure out how your BIOS calculates its checksum. ...
Exactly what I thought about it. Thanks for affirming my arguments.
--TS
Re: Machine identification
Check the media serial number of the primary hard drive (0x80) using INT 0x13, function 0x25. It should be unique to each machine, so a simple test will do: if it matches, it's box X; if not, it's one of the Y-boxes.
Re: Machine identification
Hi,
Hyperdrive wrote: Hm, you are absolutely right. We boot from network (see below).
The difference between the "normal image" and the "special image" could be one bit that's either clear (for a client node) or set (for a DSM node).
Hyperdrive wrote: We have just one image to load - not a "normal image" and a "special image". The "special" functionality is always present, because it's the same OS operating in another mode. So - yes, that would be a perfect solution, but not for us.
Normally when you boot from network (using PXE) the BIOS downloads one file at 0x7C00 and jumps to it. This means you're either limited to about 840 KiB or you need a boot loader that downloads the rest. In any case the cost of having a separate file for the DSM is always less than about 840 KiB of disk space on the TFTP server. As an alternative, my OS gets the MAC address from PXE and uses that as part of the file name to download some files. For example, a computer with the MAC address 01:23:45:67:89:AB would attempt to download a boot script called "0123456789AB.bsc" from the TFTP server.
Hyperdrive wrote: Fully agreed. We more or less work this way. It's even a bit cooler: our OS image resides in our distributed shared memory. When a node starts, it searches for an already existing DSM in the cluster. If it finds one, it can simply load the image from there. If not, it gets the image some other way and spawns a new DSM, placing the image there. ...
You turn a computer on and the BIOS loads <something> from <hard disk | USB | network | floppy> and jumps to it. What is "<something>" and where does the BIOS get it from?
Regardless of what your answer is, "<something>" can know if the computer is the DSM or not (e.g. the user could have told your installer to set or clear a "DSM enabled" flag when they originally installed it), and "<something>" can tell everything/anything else if the computer is the DSM or not. If "<something>" is overwritten when the image is downloaded/installed then the code that downloads/installs could set or clear a "DSM enabled" flag in the image before installing it (so that next time you boot "<something>" still knows if the computer is the DSM or not).
Hyperdrive wrote: Well, yes and no. The "special things" are the persistency part of our DSM. If a node fails for whatever reason and hangs, we could lose some data. So we make snapshots of consistent states... For storing the snapshots we use hard disks.
And if the DSM fails? For example, what happens if the DSM crashes (software problem), what happens if the DSM's hard disk dies (hardware problem), and what happens if someone unplugs the DSM's network cable (network problem)?
I'm also curious how well your system scales - how many computers can you have in a cluster before you reach a "break even" point (where adding more computers makes performance worse)? For "transactional distributed shared memory" with a single computer used for synchronization, I have a feeling the break even point is closer to 4 computers than 40...
Cheers,
Brendan
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.
Hyperdrive
Re: Machine identification
Hi,
Well, let me put some things straight here. I hope explaining the concepts makes everything a bit clearer.
In traditional approaches for distributed applications you would pass messages around over the network. The problem is, the larger your application becomes, the more complex your protocol and the application implementation will be.
We want to provide a much simpler paradigm: shared memory. Just write to a memory location that is shared by all (and by "all", I mean all). Every task can simply access memory (read/write), no matter which physical box it runs on. The OS takes care that everyone sees the same values. It does so by catching accesses and sending messages over the network to the other nodes, informing them about the state change. So we have shared memory in a distributed fashion - a distributed shared memory (DSM). It's all about abstracting away the message sending.
How does it work - the big picture
If "someone" accesses a page that isn't locally present, we ask the other nodes in the cluster for that page (a page request). Someone will send us the data and we map it in. Then the access can be successfully done. That's just the same like demand paging, except that we don't fetch the page in question from disk, but from somewhere in the cluster.
We allow many copies of the same page to exist. Maybe many nodes have read a page, so each of them gets a copy. The moment the page is modified, all copies are invalidated - only the writer's copy remains valid. We don't force a page to be physically stored at a specific node. Instead, the page can "migrate": the node which modified the page last and invalidated all other copies stores the page.
So, all in all, we have a DSM with page granularity, where replication and migration are allowed and updates are done via a "write-invalidate" protocol.
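A sketch of the per-page bookkeeping such a write-invalidate DSM implies (illustrative only, not Hyperdrive's actual code; the network calls are stubs):

```c
/* Per-page state machine for a write-invalidate DSM, driven from the
 * page-fault handler. Transport stubs stand in for the cluster network. */
#include <stdint.h>

typedef enum { PAGE_ABSENT, PAGE_SHARED, PAGE_EXCLUSIVE } page_state_t;

typedef struct {
    uint64_t     vaddr;    /* page-aligned address inside the DSM */
    page_state_t state;    /* what our local copy currently is    */
} dsm_page_t;

static void net_request_page(uint64_t vaddr)     { (void)vaddr; /* stub */ }
static void net_broadcast_invalidate(uint64_t v) { (void)v;     /* stub */ }

static void dsm_fault(dsm_page_t *pg, int is_write)
{
    if (pg->state == PAGE_ABSENT) {
        net_request_page(pg->vaddr);         /* like demand paging, but the
                                                "disk" is the cluster      */
        pg->state = PAGE_SHARED;             /* we now hold one copy       */
    }
    if (is_write && pg->state != PAGE_EXCLUSIVE) {
        net_broadcast_invalidate(pg->vaddr); /* all other copies die;      */
        pg->state = PAGE_EXCLUSIVE;          /* ours is the only valid one */
    }
}
```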
How is consistency of the data enforced?
We have "transactional consistency"....
All of our tasks are transactional, with the well-known ACID characteristics (as in databases). Every time a task is run, there is implicitly a begin-of-transaction (BOT). And when it yields execution (or terminates) we have an implicit end-of-transaction (EOT). Everything the task does between BOT and EOT is recorded: all pages touched by a read are recorded in a "read set"; similarly, all pages touched by a write are recorded in a "write set". If a task modifies a page, we catch the access (page fault), create a copy of the page (a "shadow copy") and then grant write access to that page.
Everyone can modify any page at any time. There are no locks - no "pessimistic synchronisation". We instead do "optimistic synchronisation", because we believe that collisions (see below) are rare, and in case they occur we can afford the time to do a roll-back.
Because two different transactional tasks may see inconsistent data (remember, we do not lock), we have to validate ("Are there inconsistencies?"). It goes like this: on EOT we send out the write set, so everyone can see which pages the task has modified. If some task can match an entry in the received write set with an entry in its own read set, then there is a possibility that it saw stale data - a so-called "collision" occurred. The validating transaction and the "colliding" transactions now have to choose a winner. All losers are aborted and all of their effects are rolled back; the (only) winner can commit (it's just not aborted/rolled back). (In the current implementation we chose "first wins", meaning the validating transaction always wins and all conflicting transactions are aborted. That leads to some fairness issues, which could be solved by implementing some further strategies.)
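Illustrative only, not the actual implementation: the validation step above amounts to an intersection test between the committing transaction's write set and each peer's read set, with "first wins" meaning the peer aborts on a hit:

```c
/* EOT validation sketch: abort if the committing peer's write set
 * intersects our read set ("first wins": the validator survives). */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint64_t *pages;   /* sorted page numbers */
    size_t          count;
} page_set_t;

/* True if the two sorted sets share at least one page. */
static bool sets_collide(page_set_t a, page_set_t b)
{
    size_t i = 0, j = 0;
    while (i < a.count && j < b.count) {
        if (a.pages[i] == b.pages[j]) return true;
        if (a.pages[i] < b.pages[j])  i++;
        else                          j++;
    }
    return false;
}

/* Called when a committing peer broadcasts its write set. */
static void on_peer_commit(page_set_t peer_writes, page_set_t my_reads,
                           void (*abort_and_rollback)(void))
{
    if (sets_collide(peer_writes, my_reads))
        abort_and_rollback();    /* we lost: undo all our effects */
}
```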
See how it works? Maybe I wasn't very clear (that's very often my problem... sorry), but believe me: it really works...
Some more things
There's no single "synchronizer". The synchronisation is entirely distributed. So there's no bottleneck and no single point of failure in this regard (synchronisation).
On top of the raw TDM (transactional distributed (shared) memory), we maintain an object-oriented heap ("distributed heap storage", DHS).
All of our code (tasks, and even parts of the OS) is in the TDM, and all data is in the DHS on top of it. So yes, "everything" is in the TDM. All nodes use the image in the TDM as "the OS" and only need memory management, a page fault handler and some networking code to participate in TDM operation.
What about fault tolerance?
If a machine goes down (for whatever reason), there may be data loss (all pages that are stored only on that machine). That's not very nice. So we have a "page server". The page server is, like all nodes, a simple TDM participant. It knows the protocol and the workings of the TDM, like all the other nodes, but it's more of a passive node. From time to time it takes a snapshot of the TDM, and that snapshot is guaranteed to be consistent.
If a machine fails, it may take data with it into hell. But the page server has a consistent snapshot from just a few seconds before, and we fall back to that state. The page server invalidates everything out there. All the nodes will then experience page faults and request the pages - and the page server has valid copies and does what it's named after: it serves the pages.
All you lose is a few seconds of work (depending on the snapshot frequency). It's just done again.
Questions?
Whoa, that was a lengthy post. I hope I got all this across to you - at least the ideas behind it. Feel free to ask, if you have any questions...
--TS
Re: Machine identification
I have a quick question. Does this work over 56K modems, or do you need something more like 10-gigabit fibre-optic Ethernet? I suspect the latter. If you have time (and are legally able to), an overview of the hardware you're planning to use would be quite interesting.