OS-less programming

tolgame · Post by **tolgame** » Thu May 26, 2011 2:20 am

Hi everybody,
First post!
I'm looking for a way to run a program on my computer directly from boot, like I would with a microcontroller. So basically a bootable USB stick which goes directly to run the program, and the output gets written back onto the stick. If I can get keyboard and simple console working easily, that would be fantastic, but even this is not essential. The purpose is for scientific computing, so the whole point is number crunching performance.
Question 1) has anyone heard of a framework to do this? I have not found one but perhaps I don't know how to look.
Question 2) if not, what do you think about the feasibility?
I have done a bit of reading, and from what I understand I can use BIOS functions in real mode to boot from the USB stick and recognize it as a hard drive, then go into long mode for my calculations, then drop back into unreal mode to write to the stick. I can also drop back into unreal mode for keyboard/display. (or, replace unreal mode with EFI calls for an EFI board) Other than that, multiprocessing is done by writing commands to IOAPIC, and the rest is just the program. Am I missing any key points, or oversimplifying anything?
Thanks in advance,
Tolgame

MDM · Post by **MDM** » Thu May 26, 2011 2:23 am

There are no frameworks when you first boot -,-

That'd be hell to program a program of any size on. Also if you are dropping down into unreal/real mode it'd be slow.

shikhin · Post by **shikhin** » Thu May 26, 2011 2:37 am

Hello,

MDM wrote:Also if you are dropping down into unreal/real mode it'd be slow.

You could fix it simply by using virtual real mode, but I believe if you really want speed (to focus on number crunching), writing your own driver or porting one is the way to go!

tolgame wrote:First post!

Welcome aboard!

tolgame wrote:If I can get keyboard and simple console working easily, that would be fantastic, but even this is not essential.

Simple console and keyboard are quite easy. If you look around on the wiki and some tutorials, you'll find excellent resources to help you do that.

tolgame wrote:Question 1) has anyone heard of a framework to do this? I have not found one but perhaps I don't know how to look.

/me doesn't understand what you mean by 'framework' in this context?

tolgame wrote:Question 2) if not, what do you think about the feasibility? I have done a bit of reading, and from what I understand I can use BIOS functions in real mode to boot from the USB stick and recognize it as a hard drive, then go into long mode for my calculations, then drop back into unreal mode to write to the stick.

Rather than dropping back to real mode, something more feasible would be to use virtual real mode, or perhaps a Real Mode emulator. More on our wiki sir!

tolgame wrote:Am I missing any key points, or oversimplifying anything?

Well, I might never tell more, unless you say what type of number crunching, the usage of this "thingy", etc.

Regards,
Shikhin

bluemoon · Post by **bluemoon** » Thu May 26, 2011 3:24 am

tolgame wrote:The purpose is for scientific computing, so the whole point is number crunching performance.

IMO this is not a strong reason to roll your own OS, but it's your choice.

Usually, the effort spent writing a minimal OS (or even a bare-metal hello world plus the memory management you would need for a serious computing application, and you also mentioned some storage I/O) could instead be used to improve the calculation algorithm itself.

I don't think the overhead induced by any modern OS (e.g. scheduling, interrupts, etc.) is perceptible for a carefully designed compute-intensive application.

And in reality there is a chance that your own OS would impose a larger overhead than a modern OS.

tolgame wrote:Other than that, multiprocessing is done by writing commands to IOAPIC, and the rest is just the program. Am I missing any key points, or oversimplifying anything?

AFAIK there is much more to it than just starting the engine. Some processors require changing the speed stepping and pushing the turbo button while balancing the thermal monitor (overclock for a while, then give a core a tea break) for maximum overall performance. If this is not done right, the board can freeze the CPU rather than let it start a fire. You may check Turbo Boost for more on this.

tolgame · Post by **tolgame** » Thu May 26, 2011 3:50 am

Wow, thanks for the fast replies!

/me doesn't understand what you mean by 'framework' in this context?

Maybe I am misusing a reserved word, sorry. I certainly don't mean framework in the sense that MDM understood. What I mean is that I doubt I am the first person to want to do this, and that there might be examples of how to do this, or tools to produce the bootable USB (specific to this purpose, I mean). I was just wondering if anyone's heard about such a thing before.

Rather than dropping back to real mode, something more feasible would be to use virtual real mode, or perhaps a Real Mode emulator. More on our wiki sir!

Thank you for the tip! I will look into it. I am not enthused about writing my own USB driver.
As an aside, "virtual real mode" is not in the wiki, which might be why I missed it when I was absorbing real/protected/long mode stuff from the wiki. But Google and Wikipedia helped me to realize that "virtual real mode" is the same as "virtual 8086 mode", which is in the wiki.
But the switching would only be to set program parameters at startup, to give some sort of visual feedback while it's running so you know the program hasn't crashed, and a message that says "done!" at the end, so I think any context switching penalty would be negligible.

That'd be hell to program a program of any size on.

Well, I might never tell more, unless you say what type of number crunching

My goal right now is Monte Carlo (= guess and check) simulation of macromolecules. This entails an enormous number of calculations but not an enormous output file, because most conformations are rejected. Mostly it's just trying different conformations (xyz coordinates) and measuring the bonding energy, etc. I am currently doing it (successfully) in a Scheme environment, which was fast to write since it was my first programming language, but obviously is just crap for using the hardware. The obvious next step would be to switch to a professional simulation package or write in Visual Studio with Threading Building Blocks, but that just doesn't capture my excitement the same way as rolling my own (I guess you know what I mean if you are on this forum). But if OS developers are telling me that it's too hard, I'll listen!
One thing I am still a bit confused about, for example, is whether I would be able to use a compiler. In principle I don't see why one couldn't use a high-level language to order the CPU around (albeit without libraries, and with assembler or bytecode inserted for booting and inter-processor communication). But perhaps most compilers are designed around outputting object files, which wouldn't work here?
Also, I think debugging should in principle be straightforward using a virtual machine which shows you the state of all the registers etc. But again I haven't done this with anything larger than a single PIC many years ago.
What do you think?
After I wrote this:

Not feasible. There are simpler ways to achieve the same.

IMO this is not a strong good reason to roll your own OS, but it's your choice.

Alrighty, I guess it's harder than I thought. I take it back then.

boot linux replacing init with your own binary: kernel vmlinuz init=/bin/mycrunch

Then I will look into this.
Thanks again for all your help.

schilds · Post by **schilds** » Thu May 26, 2011 4:36 am

If it's parallelisable, you could try implementing your code to work on a cluster, or possibly even with programmable graphics hardware (shaders). Doing so would probably be just as interesting as writing your own OS.

If you still really want to write code for a bare machine, I would still suggest some kind of networked solution with the client running on one machine with a normal OS (so it can make use of normal gui + storage mechanisms) and your bare machine(s) booting up just a network driver (rather than i/o for disk, screen, keyboard, etc.) to receive computations and send back results. Well, I guess this just brings us back to cluster computing.

OSwhatever · Post by **OSwhatever** » Thu May 26, 2011 5:01 am

tolgame wrote:Hi everybody,
First post!
I'm looking for a way to run a program on my computer directly from boot, like I would with a microcontroller. So basically a bootable USB stick which goes directly to run the program, and the output gets written back onto the stick. If I can get keyboard and simple console working easily, that would be fantastic, but even this is not essential. The purpose is for scientific computing, so the whole point is number crunching performance.
Question 1) has anyone heard of a framework to do this? I have not found one but perhaps I don't know how to look.
Question 2) if not, what do you think about the feasibility?
I have done a bit of reading, and from what I understand I can use BIOS functions in real mode to boot from the USB stick and recognize it as a hard drive, then go into long mode for my calculations, then drop back into unreal mode to write to the stick. I can also drop back into unreal mode for keyboard/display. (or, replace unreal mode with EFI calls for an EFI board) Other than that, multiprocessing is done by writing commands to IOAPIC, and the rest is just the program. Am I missing any key points, or oversimplifying anything?
Thanks in advance,
Tolgame

This is basically what DPMI did (falling back to BIOS calls in 16-bit mode, buffers below 1 MB, etc.), which was a layer under Windows 95 and other programs. Sure, you can use BIOS calls for I/O, but it will be limited, and don't expect rocket-speed performance. If you need to get storage devices working quickly, this could be a way. Don't expect BIOS calls to be thread-safe, so if you are going to add multiprocessor support, you have to guard the BIOS calls with a mutex. Also, if you hijack the BIOS default interrupt setup, I'm not sure the BIOS calls will work properly.

There are open-source DPMI TSRs available out there, so you can always look at how they solved things. A well-known "DOS extender" is "PMODE", for example.

xenos · Post by **xenos** » Thu May 26, 2011 5:20 am

If you want to go for scientific programming / number crunching performance, you might want to consider parallel programming. There are lots of frameworks to do that:

Using parallel computers / multi-core CPUs / computer clusters:
- MPI
- PVM

Using graphics cards:
- CUDA
- OpenCL

I'm currently using CUDA for scientific computing (simulation of galaxy cluster formation after the Big Bang) and it's simply a great boost in computing performance, much better than anything one could achieve with "OS-less programming".

tolgame · Post by **tolgame** » Thu May 26, 2011 6:54 am

Again I want to thank you all, it's all phenomenal advice. Some of it I knew, some I didn't, but all relevant and helpful.

I certainly don't want to set my computer on fire. And, I agree that the parallel computing strategies suggested especially by XenOS are a very logical step forward, and probably my best bet.

The reason why I have been avoiding those solutions can be summed up best by a nice sentence at the top of the OpenMP article on Wikipedia:

(OpenMP) consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

While I know that for getting work done, it is probably best for a non-expert like me to use libraries made by experts, I have always been happiest (and often most productive, as a result) when I could really see what I was doing. If I go the MPI direction, I can see myself learning a lot of tricks about, well, compiler directives and environment variables, and perhaps still not really know how the computer is doing it, if it's even running on all cores like I expect it to, etc. For example, I'm a little embarrassed at how long it took me to figure out how multi-processor systems even worked at the basic level. It took an awful lot of searching and reading to finally learn that each processor has an IOAPIC memory-mapped which handles this. Initially I thought there must be separate instructions to handle inter-processor communication, or something, and I was looking through the x86 instruction set to find them. This may be solely a testament to my incompetence, but personally I think it also is indicative of how insulated programmers can get from the actual computer if they don't make the effort. Pretend you don't know it already and go on google/wikipedia (or the OSdev wiki even) and try to learn how multiprocessing is implemented, and you might see what I mean - it's not so easy to find a clear explanation.
I can also imagine requiring increasingly complicated debugging setups to figure out how the program is running if I start using these libraries. Am I wrong about this?

So, my revised question is, is it possible to do scientific computing while "seeing what you are doing"? That is, to be able to watch the execution play out through a VM or a good debugger, see the cores working as I told them to, and being able to debug at that level? If this is possible, I guess that would (in principle) gain me the performance advantages of CUDA or MPI while also satisfying my urge to see what's going on. Perhaps now you see why I had in mind this OS-less program (not to be stubborn, I believe you that it was a bad idea).
Anyway, if it's not possible to have my cake and eat it too, I'll just go for one of the suggestions already made.

schilds · Post by **schilds** » Thu May 26, 2011 7:33 am

If you want to work at a really low level, while writing an OS-less program for a single machine may be a bad idea, it becomes easier if you accept you're going to be using multiple machines, since other than the controlling machine (which you can just run a normal OS with all usual tools on) all you need to do with most of the machines is boot up a network driver that accepts instructions from the controlling machine and returns results (no need for disk, keyboard, screen, etc).

I wrote a cheap and nasty serial bootloader: https://github.com/schilds. It does the bare minimum, is not suitable as is (though there are serial to tcp/ip devices available) and I'm not entirely sure it's correct/safe/good, however you could do something similar with tcp/ip instead.

This way you could still get down to all the low level stuff and know exactly how you're dividing up the computation without having to write any kind of OS. Also, this would allow for a fairly flexible partition between stuff you'd prefer to write at a high level vs a low level, since any computation you can't be bothered partitioning up and generating code for to run on the cluster, you can write with the usual tools to run on the controlling machine.

xenos · Post by **xenos** » Thu May 26, 2011 9:08 am

tolgame wrote:So, my revised question is, is it possible to do scientific computing while "seeing what you are doing"?

I can try to give a rough answer to this question only for CUDA since it's the only framework I used for actual programming so far. (I also have some code here that uses MPI, but I haven't written that on my own.)

Using CUDA, one can in some sense "see what's happening". The principle of CUDA is the following: Your program consists of two types of functions. The first function type runs on the CPU, the second type runs on the GPU (i.e., the graphics card). Both types of functions are defined in a C source file, and you tell the compiler which of them should run on the CPU and which should run on the GPU. Typically, there is a "main" function which runs on the CPU and loads some data into the graphics memory, and then calls some GPU function. The GPU consists of several cores (8 ~ 1024, depending on the graphics card), and now all of them start executing the GPU function in parallel. On each core, the "core ID" is passed as another parameter to the GPU function. The GPU function can use it as an index to grab some data from the graphics card memory, crunch it and write the result back to graphics card memory. When the GPU has finished processing data, control returns to the CPU function, which copies the results from graphics card memory back to ordinary RAM.

There's a lot of documentation on how CUDA works and a lot of example programs floating around. I would recommend the CUDA C Programming Guide as it explains the basic concepts in a bit more detail.

IanSeyler · Post by **IanSeyler** » Thu May 26, 2011 11:44 am

This is similar to our goals for BareMetal Node except we are using Ethernet to initially load the system via PXE and also to transfer the program/data/results.

http://www.returninfinity.com/baremetalnode.html

tolgame · Post by **tolgame** » Fri May 27, 2011 1:17 am

I am taking a closer look at CUDA now; it is indeed better than I had thought in terms of seeing what is going on. I will also surely check out BareMetal, which from the website seems to be almost exactly what I was looking for (but apparently more elaborate). I will post back when I have something more intelligent to say.
Again, thanks for all the replies, I'm really glad I posted here.

bifferos · Post by **bifferos** » Fri May 27, 2011 11:17 am

tolgame wrote: I'm looking for a way to run a program on my computer directly from boot, like I would with a microcontroller.

I read that first part and immediately thought of Libpayload (http://www.coreboot.org/Libpayload). That's certainly a 'bare metal' framework, and it gives you a fair amount of functionality, although the hardware support is limited to what Coreboot supports, obviously.

regards,
Biff.

tolgame · Post by **tolgame** » Sat May 28, 2011 8:19 am

Hi Biff,
After looking around the website and documentation, I can't tell how coreboot deals with multiprocessor systems, or how I could control them from the context of a payload program. Do you know? Would I have to include one of the previously recommended multiprocessing libraries and the associated dependencies in the payload program?

OSDev.org
