The Terrific Tell and Talk Thread
A picture is worth a thousand words, or 3396 posts. But "As the Chinese say, 1001 words is worth more than a picture." This is the thread for your 1001 words!
Have you recently added a new feature to your OS, or reached some milestone? Well, we'd love to hear about it! This aims to be an in-depth counterpart to the screenshot thread: here you can go into detail about what you worked on. We encourage you to talk about the challenges you faced, the fun or interesting things you learned in the process, and even how you managed to solve the problem.
The motive here is to stimulate conversation not about abstract ideas or designs, but about things you yourself got working in your OS: a place where you can discuss the pros and cons of your choices, consider alternative ideas, and share results with the community. Whether you want to expose yourself to alternative views, inspire ideas in others, or are just proud of some really cool thing you did, we want your words!
To those working on their own OS, we encourage you to start with an introduction to your project. Tell us a little bit about it, and over time share your progress.
To those just curious or in-between projects, feel free to join in and ask questions and get involved.
Re: The Terrific Tell and Talk Thread
I guess I will start this thread.
In the last two weeks, I did an overhaul of my microkernel's memory management. The features that I support now are:
- Evicting pages from the page cache on a least recently used (LRU) basis
- Memory locking: locking pages so that they cannot be evicted
- Writeback of dirty pages back to disk
- And, finally, copy on write (CoW)
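The first three features in the list above (LRU eviction, memory locking, and writeback) fit together roughly as in the following sketch. It is not managarm's code: the structure and function names (cached_page, lru_evict, page_lock, and so on) are hypothetical, and write_back() is a stand-in for a real disk write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page-cache entry; names are illustrative, not managarm's. */
struct cached_page {
    struct cached_page *prev, *next; /* LRU links: head = most recently used */
    unsigned long file_offset;       /* which page of the file is cached */
    unsigned lock_count;             /* > 0: page is locked and must stay resident */
    bool dirty;                      /* modified since the last writeback */
};

struct lru_list {
    struct cached_page *head, *tail;
};

static void lru_unlink(struct lru_list *l, struct cached_page *p) {
    if (p->prev) p->prev->next = p->next; else l->head = p->next;
    if (p->next) p->next->prev = p->prev; else l->tail = p->prev;
    p->prev = p->next = NULL;
}

static void lru_push_front(struct lru_list *l, struct cached_page *p) {
    p->prev = NULL;
    p->next = l->head;
    if (l->head) l->head->prev = p; else l->tail = p;
    l->head = p;
}

/* Called on every access, so recently used pages move to the front. */
static void lru_touch(struct lru_list *l, struct cached_page *p) {
    lru_unlink(l, p);
    lru_push_front(l, p);
}

static void page_lock(struct cached_page *p) { p->lock_count++; }

static void page_unlock(struct cached_page *p) {
    assert(p->lock_count > 0);
    p->lock_count--;
}

/* Stand-in for a real disk write: only counts writebacks and clears the flag. */
static int writebacks;
static void write_back(struct cached_page *p) {
    writebacks++;
    p->dirty = false;
}

/* Evict the least recently used page that is not locked.
   Dirty pages are written back before they leave the cache. */
static struct cached_page *lru_evict(struct lru_list *l) {
    for (struct cached_page *p = l->tail; p != NULL; p = p->prev) {
        if (p->lock_count > 0)
            continue;            /* locked pages are never evicted */
        if (p->dirty)
            write_back(p);
        lru_unlink(l, p);
        return p;
    }
    return NULL;                 /* every cached page is locked */
}
```

Walking from the tail means the first unlocked page found is the least recently used one that may legally leave the cache.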
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Re: The Terrific Tell and Talk Thread
Korona wrote:
Out of all those features, CoW is the most interesting one. Managarm actually did support CoW some years ago, but I had to disable it due to the lack of support for memory locking. The interaction of CoW and locking is crucial for correctness: locked pages have to be copied eagerly during fork(); if this is not honored, the CoW mechanism would change the physical address of a page while it is already locked (which makes programs miss the results of DMA operations or miss futex wakeups). I'm glad that this works now. It only gave a moderate improvement in performance (copying memory is really not that slow), but a huge improvement in memory consumption (from about 1 GiB to 600 MiB to boot to Weston).
Cool! So this page cache is for pages originating from files? Or is it more general, handling swap and shared memory too? Who is able to lock pages? I imagine it's limited to kernel stuff?
Aside: I like seeing another C++ OS that embraces Modern™ C++ and not just C with classes, at least from a quick look. And I may have spent way too long laughing at one line, imagining a Jersey guy saying "friggin' make a shared cow chain, get smarter and allocate a shared cow mapping already."
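The fork() rule Korona describes can be illustrated with a small model: unlocked pages are shared read-only with a reference count and copied on the first write, while locked pages are copied up front so the parent's physical page never moves. Everything here (phys_page, mapping, fork_mapping, handle_write_fault) is a hypothetical sketch, not managarm's implementation; page tables, TLB shootdowns, and concurrency are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical physical page, shared between CoW mappings via a refcount. */
struct phys_page {
    unsigned refcount;
    unsigned char data[PAGE_SIZE];
};

/* Hypothetical per-process mapping of one virtual page. */
struct mapping {
    struct phys_page *page;
    bool writable;   /* write permission currently granted by the page table */
    bool cow;        /* a write fault must copy (or reclaim) the page first */
    bool locked;     /* locked: the physical address must not change */
};

static struct phys_page *alloc_page(void) {
    struct phys_page *p = calloc(1, sizeof *p);
    assert(p != NULL);
    p->refcount = 1;
    return p;
}

/* Duplicate one parent mapping into the child during fork(). */
static void fork_mapping(struct mapping *parent, struct mapping *child) {
    if (parent->locked) {
        /* Copy eagerly: the parent keeps its physical page, so DMA targets
           and futex addresses in the parent stay valid. */
        struct phys_page *copy = alloc_page();
        memcpy(copy->data, parent->page->data, PAGE_SIZE);
        *child = (struct mapping){ .page = copy, .writable = parent->writable };
        return;
    }
    /* Share the page; if it was writable, both sides become CoW. */
    parent->page->refcount++;
    if (parent->writable) {
        parent->writable = false;
        parent->cow = true;
    }
    *child = *parent;
}

/* Write fault on a CoW mapping: copy if the page is still shared. */
static void handle_write_fault(struct mapping *m) {
    assert(m->cow);
    if (m->page->refcount > 1) {
        struct phys_page *copy = alloc_page();
        memcpy(copy->data, m->page->data, PAGE_SIZE);
        m->page->refcount--;
        m->page = copy;
    }
    /* Sole owner now: writing in place is safe again. */
    m->writable = true;
    m->cow = false;
}
```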
Re: The Terrific Tell and Talk Thread
Thanks! Yes, the page cache is for pages originating from files (it's basically the pages that get mapped to userspace on mmap()). I do not have swapping yet, but the mechanisms will probably be similar. Right now, only the kernel locks memory, but I will also make this available to userspace drivers and use it to implement mlock().
- bellezzasolo
- Member
- Posts: 110
- Joined: Sun Feb 20, 2011 2:01 pm
Re: The Terrific Tell and Talk Thread
I've been thinking about NUMA and caching with a view to writing a proper pmmngr (physical memory manager).
Getting NUMA information is conceptually simple: just consult the SRAT if it exists. Of course, there are also the DSDT processor objects, but my understanding is that they shouldn't be required at boot.
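A minimal sketch of walking the SRAT, assuming the table has already been located through the RSDT/XSDT, mapped, and checksum-verified. The byte offsets follow the ACPI specification's Processor Local APIC (type 0), Memory (type 1), and x2APIC (type 2) affinity structures; other entry types are skipped. The numa_cpu/numa_mem structs and srat_parse are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 36-byte standard ACPI header + 12 reserved bytes, then affinity entries. */
#define SRAT_ENTRIES_OFFSET 48

enum { SRAT_CPU_APIC = 0, SRAT_MEMORY = 1, SRAT_CPU_X2APIC = 2 };

struct numa_cpu { uint32_t apic_id, domain; };
struct numa_mem { uint64_t base, length; uint32_t domain; };

/* ACPI tables are little-endian; assemble values byte by byte. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint64_t rd64(const uint8_t *p) { return rd32(p) | (uint64_t)rd32(p + 4) << 32; }

/* Collect enabled CPUs and memory ranges with their proximity domains.
   Returns 0 on success, -1 if the table is malformed. Entries beyond the
   output capacity are silently dropped in this sketch. */
static int srat_parse(const uint8_t *table,
                      struct numa_cpu *cpus, size_t max_cpus, size_t *ncpus,
                      struct numa_mem *mems, size_t max_mems, size_t *nmems) {
    assert(table != NULL);
    if (memcmp(table, "SRAT", 4) != 0)
        return -1;
    uint32_t total = rd32(table + 4);
    *ncpus = *nmems = 0;
    for (uint32_t off = SRAT_ENTRIES_OFFSET; off + 2 <= total; ) {
        const uint8_t *e = table + off;
        uint8_t type = e[0], len = e[1];
        if (len < 2 || off + len > total)
            return -1;
        if (type == SRAT_CPU_APIC && len >= 16 && (rd32(e + 4) & 1)) {
            /* Domain is split: bits 7:0 at byte 2, bits 31:8 at bytes 9-11. */
            uint32_t dom = e[2] | (uint32_t)e[9] << 8 | (uint32_t)e[10] << 16 | (uint32_t)e[11] << 24;
            if (*ncpus < max_cpus)
                cpus[(*ncpus)++] = (struct numa_cpu){ e[3], dom };
        } else if (type == SRAT_CPU_X2APIC && len >= 24 && (rd32(e + 12) & 1)) {
            if (*ncpus < max_cpus)
                cpus[(*ncpus)++] = (struct numa_cpu){ rd32(e + 8), rd32(e + 4) };
        } else if (type == SRAT_MEMORY && len >= 40 && (rd32(e + 28) & 1)) {
            if (*nmems < max_mems)
                mems[(*nmems)++] = (struct numa_mem){ rd64(e + 8), rd64(e + 16), rd32(e + 2) };
        }
        off += len;
    }
    return 0;
}
```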
Caching is a bit more interesting, since ARM big.LITTLE configurations can have cache lines of different sizes. The best approach I can think of is to keep an object representing each cache and have each CPU object refer to it. Of course, getting the complete hierarchy requires running the detection code on each processor, as described in an Intel document. The problem with big.LITTLE is that the CPUs are switched on the fly, so code that depends on line size can break, no matter how much conceptual information you possess.
The solution, of course, is to work in multiples of the LCM of all cache line sizes in the system (after all, powers of two are fairly popular, so in practice that is simply the largest line size). AMP can get messy!
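A sketch of the approach described above: each cache is its own object that CPU descriptors point to (so a shared L2 is represented once), and the granularity that is safe on every core is the LCM over all caches. The struct layout is hypothetical, and the line sizes in the usage are made-up illustrative values; real ones would come from hardware enumeration (e.g. CPUID leaf 4 on x86, CTR_EL0 and CCSIDR_EL1 on AArch64).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE_LEVELS 4

/* One object per physical cache; CPUs that share a cache point at the same one. */
struct cache_desc {
    unsigned level;       /* 1 = L1, 2 = L2, ... */
    unsigned line_size;   /* bytes */
};

struct cpu_desc {
    unsigned id;
    const struct cache_desc *caches[MAX_CACHE_LEVELS];  /* unused slots are NULL */
};

static unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

static unsigned lcm(unsigned a, unsigned b) {
    assert(a != 0 && b != 0);
    return a / gcd(a, b) * b;
}

/* Granularity that respects every cache line size on every CPU, so it stays
   valid even if the code migrates between big and LITTLE cores. */
static unsigned safe_line_size(const struct cpu_desc *cpus, size_t ncpus) {
    unsigned result = 1;
    for (size_t i = 0; i < ncpus; i++)
        for (size_t j = 0; j < MAX_CACHE_LEVELS && cpus[i].caches[j] != NULL; j++)
            result = lcm(result, cpus[i].caches[j]->line_size);
    return result;
}
```

With power-of-two line sizes the LCM reduces to the largest size; the general form only matters if an unusual size ever shows up.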
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS
Re: The Terrific Tell and Talk Thread
Okay, I have a few minutes and will contribute to this thread. I think this thread is a good idea, so that the screenshot thread doesn't get overrun with non-screenshot posts.
Anyway, I have had a few generous readers send in bug reports which have got me working on some of my stuff.
For example, a fellow reader sent in a bug report just today stating that I had a bug in my MBR.asm source at line 162.
Rather than being:
    jne short part_done
he suggested that it should be:
    jne short part_walk_next
This is a good point. However, for the purposes of this thread, the code at that link is a bit old. I have since added TPM support to simply create the hash value for the boot. Because of the 512-byte size constraint, doing so forced me to size-optimize the other code, which, unknown to me, fixed the issue he pointed out. (smile)
Another reader tried my OS on a couple of his 64-bit machines and was (very) unsuccessful. Now, before I go too much further: my OS is targeted at 32-bit machines, so I didn't think too much about it until I started looking. As it turned out, the way I was organizing memory-mapped devices was causing an issue, returning a physical memory address 0x80 past the value it should have been for certain devices. I don't know why; it just was. After a rewrite of this memory-mapped device organizer code, we started seeing much more successful runs of my OS on his machines.
However, his initial interest was to see whether my OS would recognize his xHCI controller and attached devices. As it turns out, my OS does not successfully enumerate the devices on his laptop. I had not tried my OS on a 64-bit machine with a 64-bit capable xHCI controller, so I needed to check that I was clearing the high-order dword of all my addresses. While looking over my code, I found a few other small bugs and issues as well.
Unfortunately, I still haven't found why it isn't working on his machine(s). Since my code works on all the controllers I have tested it with, noting that they are all 32-bit-only controllers on 32-bit machines, I have to guess that it has something to do with it being a 64-bit machine with a 64-bit capable controller, though it could be something else as well. Thanks to this reader's help, I have been able to narrow it down to a few things and hope to have it fixed soon. (My code retrieves the first 8 bytes of the device descriptor, but fails when trying to set the address of the device.)
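A minimal sketch of the "always write the high dword" rule for 64-bit pointer fields, assuming the controller's HCCPARAMS1 value has already been read from its capability registers. Bit 0 of HCCPARAMS1 is the xHCI 64-bit Addressing Capability (AC64) flag; struct addr64 and xhci_write_addr are hypothetical names, and real code must also honor each data structure's alignment rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCCPARAMS1_AC64 (1u << 0)  /* 64-bit Addressing Capability */

/* A 64-bit pointer field stored as two 32-bit dwords (low, then high). */
struct addr64 {
    volatile uint32_t lo, hi;
};

/* Store a physical address into a 64-bit field.
   Returns false if a 32-bit-only controller could not reach the address. */
static bool xhci_write_addr(struct addr64 *field, uint64_t phys, uint32_t hccparams1) {
    bool ac64 = (hccparams1 & HCCPARAMS1_AC64) != 0;
    if (!ac64 && (phys >> 32) != 0)
        return false;                      /* buffer must live below 4 GiB */
    field->lo = (uint32_t)phys;
    field->hi = (uint32_t)(phys >> 32);    /* always written, so stale high bits never survive */
    return true;
}
```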
I send him an email stating that I have a new bootable image for him to try, he boots it, takes a picture of the screen, and sends me a JPEG of the screen where the error is shown. It has been very helpful, and I thank him for it.
Granted, it would be a lot easier if he could send me the log file, but since his keyboard is attached to the xHCI root hub, and this root hub is the current issue, there is no way for him to type to save the log, nor can I hard code it to save the log since the target media device is also attached to this root hub. uuurrrgggg. :-) He has been very kind and patient by sending the pics.
I also have an issue with my EHCI code that I cannot put my finger on. It only happens every once in a while.
On a different note, the thing I am currently working on is PCI IRQ routing. I am trying to rewrite my code so that a device driver can request an IRQ number from the system software, accounting for the PCI Interrupt Pin and Interrupt Line values, PCI bridges, etc. For example, I recently hard-coded an IRQ value of 5 for a PCI device and couldn't figure out why I wasn't getting interrupts. As it turned out, 5 was not part of the Pin/Line combination of the PCI bridge/host it was attached to: there was no physical line for IRQ 5. :-)
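A usual starting point for this kind of routing is the standard PCI-to-PCI bridge "swizzle": an interrupt pin asserted by a device behind a bridge appears on the bridge's upstream side rotated by the device number. The sketch below (hypothetical function names) only computes which pin reaches the root bus; turning that (device, pin) pair into an actual IRQ still requires the platform's routing tables, such as ACPI _PRT, which are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Interrupt Pin register values: 1 = INTA# ... 4 = INTD# (0 = no interrupt). */

/* One bridge hop: rotate the pin by the device number on the secondary bus. */
static unsigned pci_swizzle(unsigned device, unsigned pin) {
    assert(pin >= 1 && pin <= 4);
    return ((pin - 1 + device) % 4) + 1;
}

/* path[0] is the endpoint's device number, path[1] the device number of the
   bridge above it, ..., path[depth - 1] the device on the root bus.
   Returns the pin as seen at path[depth - 1] on the root bus. */
static unsigned pci_root_pin(const unsigned *path, size_t depth, unsigned pin) {
    for (size_t i = 0; i + 1 < depth; i++)
        pin = pci_swizzle(path[i], pin);
    return pin;
}
```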
Well, this is my story and progress. If any of you wish to try my stuff, a (USB) bootable image can be found at http://www.fysnet.net/zips/fysos.zip (7.0 Meg) or the stripped down version that I have been sending to the reader mentioned above can be found at http://www.fysnet.net/temp/tempfysos.zip (450k). To repay the favor, if you have a (USB) bootable image, let me know and I will test it out on a few of my machines.
Thank you all for your comments, and thanks to the hosts of this forum and all the contributors here. This is a fun and interesting hobby.
Ben
- http://www.fysnet.net/osdesign_book_series.htm
P.S. The current builds at the two links have graphics turned off, i.e., they are text only. This makes it much faster and easier to test something that doesn't require graphics. After I fix the xHCI issue I spoke of above, I will rebuild it with the graphics code so you can run the GUI if you wish.
- Kazinsal
- Member
- Posts: 559
- Joined: Wed Jul 13, 2011 7:38 pm
- Libera.chat IRC: Kazinsal
- Location: Vancouver
Re: The Terrific Tell and Talk Thread
This is a pretty neat idea for a thread.
I've never actually posted about my project on the forums. It's come up in discussion and in passing in #osdev on Freenode, since I spend a lot more time hanging around there than posting here, but I don't think I've ever explained what it *is*. Put simply, I'm designing and developing a router.
Okay, it's a bit more complex than that. I'm not a programmer by trade or by formal education; I'm a network engineer. I work on networks all day long, touching everything from straightforward campus ethernet and IP networks to multinational WANs utilizing multipoint VPN protocols living on multiple types of transports. My project is intended to produce a relatively simple environment that can be used to turn spare hardware into a solid device that can route between IP networks, and be configured without needing to look up piles of conf file documentation.
Now, of course, it'd be fun to be the person saying "heck yeah, I'm going to take on $incumbent_router_manufacturer", but I have no expectation of being able to do that. I can't write and maintain enterprise-grade routing software in my spare time. I can't design and fab IP forwarding ASICs. Hell, I don't even know how I'd go about OEMing some 1U x86-based routers. What I'm doing is mostly just for my own entertainment, and also to get a bit more lower-than-low-level knowledge of networking in case it comes in handy in my career. But as far as this hobby goes, it's something relatively unique: I can't recall ever seeing a hobby OS project that's designed to be a router.
I'm a big proponent of setting real-world usability goals in the more advanced "levels" of hobby operating systems development. This is mine.
What have I been working on recently? Eliminating stability problems and packet loss. I took a few weeks off from working on it, so coming back to the issues I had before (5-8% packet loss, mass forwarding failures after large traffic spikes) with a fresh set of troubleshooting plans has done wonders. Currently, doing incredibly naïve forwarding, including full routing table lookups, MAC table lookups, packet copying, and input/output queue processing for every packet, I'm maintaining a forwarding rate of about 120 Mbps with 1200-byte packets when testing on ESXi 6.5 and its relatively terrible Intel NIC emulation. This is honestly pretty damn slow, but considering there's effectively no forwarding optimization going on here, and I'm fairly certain I'm being limited by the sheer number of VMEXITs, it's not bad.
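For a sense of scale, the 120 Mbps figure at 1200-byte packets works out as follows (ignoring Ethernet framing overhead):

```latex
\frac{120 \times 10^{6}\ \text{bit/s}}{1200\ \text{B/packet} \times 8\ \text{bit/B}} = 12\,500\ \text{packets/s},
\qquad
\frac{1\ \text{s}}{12\,500\ \text{packets}} = 80\ \mu\text{s per packet}
```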
Theoretically, the forwarding engine is capable of being used in production for my lab network. However, I'm not quite sure of its stability. I'm leaving it forwarding as fast as it can overnight to see if it holds up. If it can stay alive overnight, I'll add a third interface to the router VM, put it on my production vSwitch in ESXi, and configure the interface in my OS to have an IP on my "production" home network. Then we'll see how it handles traffic from real machines.
My next goals are:
- Reduce the amount of time I spend in the forwarding process by caching MAC header rewrites, implementing a FIB, and reducing copy amounts.
- Acquire more real machines so I don't have to rely on emulated NICs and be constrained by VMEXIT performance.
- Implement network address translation.
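One way the first goal could look: a FIB whose entries carry a precomputed Ethernet header, so forwarding becomes a longest-prefix match plus a 14-byte copy instead of a full routing-table and MAC-table lookup. This is a hypothetical sketch (fib_entry, fib_lookup, and fib_rewrite are illustrative names); it uses a linear scan for clarity, whereas a production FIB would typically use a trie, and TTL decrement and checksum updates are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical FIB entry: route plus cached MAC header rewrite. */
struct fib_entry {
    uint32_t prefix;        /* network address, host byte order */
    uint8_t  prefix_len;    /* 0..32 */
    unsigned out_if;        /* output interface index */
    uint8_t  l2_header[14]; /* dst MAC, src MAC, EtherType */
};

static uint32_t prefix_mask(uint8_t len) {
    assert(len <= 32);
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Longest-prefix match by linear scan over n entries. */
static const struct fib_entry *fib_lookup(const struct fib_entry *fib, size_t n, uint32_t dst) {
    const struct fib_entry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(fib[i].prefix_len);
        if ((dst & m) == fib[i].prefix && (best == NULL || fib[i].prefix_len > best->prefix_len))
            best = &fib[i];
    }
    return best;   /* NULL: no route */
}

/* Overwrite the frame's Ethernet header with the cached copy. */
static void fib_rewrite(const struct fib_entry *e, uint8_t *frame) {
    memcpy(frame, e->l2_header, sizeof e->l2_header);
}
```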
-
- Member
- Posts: 232
- Joined: Mon Jul 25, 2016 6:54 pm
- Location: Adelaide, Australia
Re: The Terrific Tell and Talk Thread
I'm quite excited to see this thread! I think it's a good idea to have a place on the forums where people can just discuss what they're working on without necessarily asking a question or anything. It's just in time for me personally, too, as my project has finally crossed the border (in my mind) from bare-metal 386 executable to operating system, so I finally have something to say about it.
Today, for the first time I have a system where an arbitrary number of unprivileged ELF binaries (in GRUB modules) can run indefinitely side by side in separate address spaces with the kernel switching between them. Technically I have user mode applications running on an operating system. yay!
To get to this point, I have implemented physical and virtual memory management (including a kernel heap), process control, interrupt handling (including some syscalls), and the ability to load a subset of ELFs. From here I feel things open up, almost a scary amount! Of course this is hardly groundbreaking stuff, but it's quite thrilling to hit this point for the first time.
- bellezzasolo
- Member
- Posts: 110
- Joined: Sun Feb 20, 2011 2:01 pm
Re: The Terrific Tell and Talk Thread
Nice progress, StudlyCaps! An HDD driver could be a good next step, letting you ditch the module requirement. Of course, the proliferation of SATA makes that a bit more fun.
-
- Member
- Posts: 501
- Joined: Wed Jun 17, 2015 9:40 am
- Libera.chat IRC: glauxosdever
- Location: Athens, Greece
Re: The Terrific Tell and Talk Thread
Hi,
Congratulations on your progress, StudlyCaps! I'd say it's definitely an operating system now, even if you are using a ramdisk or similar modules. Now, as you said, things open up. You can go in several directions: storage, filesystems, networking, graphics, audio, userspace, porting (if applicable), etc. Do whatever you like!
Or an ATAPI driver instead of the HDD driver bellezzasolo suggested; it depends on where you are booting from. But one would probably also need a filesystem driver for a storage driver to be useful.
Regards,
glauxosdever
-
- Member
- Posts: 232
- Joined: Mon Jul 25, 2016 6:54 pm
- Location: Adelaide, Australia
Re: The Terrific Tell and Talk Thread
Thanks for the kind words guys! I'm currently adding a basic serial port driver to make debugging a bit easier, but I like the idea of adding drive support next, from there I can load programs at run time, then I think a user shell makes sense.
Although I'm booting from CD-ROM, I might go straight for AHCI, implementing ATAPI support on top of it first and SATA after that.
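When going the AHCI route, each port's SATA Status (PxSSTS) and Signature (PxSIG) registers tell you whether a device is present and whether it is an ATAPI drive (like the boot CD-ROM) or a SATA disk. A minimal classification sketch, assuming both register values have already been read from the port; the enum and function names are hypothetical, while the signature values are the standard ones reported after reset.

```c
#include <assert.h>
#include <stdint.h>

enum ahci_dev { AHCI_DEV_NONE, AHCI_DEV_SATA, AHCI_DEV_SATAPI,
                AHCI_DEV_SEMB, AHCI_DEV_PM, AHCI_DEV_UNKNOWN };

#define SATA_SIG_ATA   0x00000101u  /* SATA disk */
#define SATA_SIG_ATAPI 0xEB140101u  /* ATAPI device, e.g. CD-ROM */
#define SATA_SIG_SEMB  0xC33C0101u  /* enclosure management bridge */
#define SATA_SIG_PM    0x96690101u  /* port multiplier */

/* ssts = the port's PxSSTS register, sig = its PxSIG register. */
static enum ahci_dev ahci_classify(uint32_t ssts, uint32_t sig) {
    uint32_t det = ssts & 0xF;          /* device detection */
    uint32_t ipm = (ssts >> 8) & 0xF;   /* interface power management state */
    if (det != 3 || ipm != 1)
        return AHCI_DEV_NONE;           /* no device, or link not active */
    switch (sig) {
    case SATA_SIG_ATA:   return AHCI_DEV_SATA;
    case SATA_SIG_ATAPI: return AHCI_DEV_SATAPI;
    case SATA_SIG_SEMB:  return AHCI_DEV_SEMB;
    case SATA_SIG_PM:    return AHCI_DEV_PM;
    default:             return AHCI_DEV_UNKNOWN;
    }
}
```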
Re: The Terrific Tell and Talk Thread
Kazinsal, this is a cool project. I wonder what the performance implications are of designing and owning the entire stack for one explicit purpose, and what doing so would allow you to do that you otherwise couldn't.
Welcome aboard, StudlyCaps! I agree with glauxosdever: getting to full memory and CPU management crosses the line from bootable program to the beginnings of an operating system. Exciting times are ahead. If you start to feel a bit of analysis paralysis about where to go next, pick one thing that you'd like to define your project with and start going down that path.
@Kazinsal and @StudlyCaps, do you have your projects on GitHub or somewhere else you'd like to share?
Re: The Terrific Tell and Talk Thread
Great idea for a thread! I'll happily share one of my more recent troubles, which took me at least a week to fix. I've spent the last few months making sure that my graphics stack runs as smoothly on 64-bit as it did on 32-bit, which meant getting LLVM and Mesa3D to play nicely with my operating system. During rendering I kept getting a strange memory fault in dlfree (dlmalloc); it seemed that memory was being corrupted.
On further inspection I found that memory was not being corrupted; rather, LLVM was calling free on an invalid address. The address being freed was 32 bytes larger than the address that had been returned by malloc. So I had to dig into both Mesa3D and LLVM to figure out where the address was allocated, where it was used, and why the wrong address was being freed. It took me at least one or two days to establish that the address was both allocated and used correctly, BUT the problem actually arose during the initialization of a stack object.
Because the stack objects were very large, the optimizer used the XMM6 and XMM7 registers to initialize them, which I first thought was weird: why not use XMM0 and XMM1? Researching this, I found out that XMM6 and XMM7 are non-volatile (callee-saved), while XMM0-XMM5 are volatile, so the function could rely on a register it had zeroed keeping its value across calls. Instead, those registers suddenly contained various addresses.
This immediately made me suspicious of my memcpy, for which I've written both SSE and SSE2 versions that use these registers. Sure enough, the culprit was that it did not save XMM6 and XMM7 in 64-bit mode, so it overwrote the cleared XMM6 register. As a result, the object was initialized with stale values instead of zeros.
This bug was also present in the 32-bit versions, but the optimizer did not use those registers when building the 32-bit image.
What do you know.
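For anyone hitting the same class of bug: one way to avoid it (an alternative, not the fix described above, which was to save XMM6/XMM7 in the existing routine) is to write the SIMD copy loop in C with SSE2 intrinsics. The compiler then knows which XMM registers are callee-saved under the target ABI and saves any it uses in the prologue. The volatility split described above is that of the Microsoft x64 calling convention, where XMM6-XMM15 are non-volatile. The name `sse2_memcpy` is hypothetical, and the sketch assumes an x86/x86-64 target with SSE2.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical SSE2 memcpy written with intrinsics instead of hand-written
 * assembly. Register allocation is left to the compiler, so if it picks a
 * callee-saved XMM register it also emits the save/restore code. */
void *sse2_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy 16 bytes per iteration; loadu/storeu do not require alignment. */
    while (n >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, chunk);
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Copy the remaining tail (fewer than 16 bytes) one byte at a time. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

It is called exactly like memcpy and returns `dst`; as with memcpy, overlapping buffers are not supported.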
Re: The Terrific Tell and Talk Thread
Kazinsal wrote: What have I been working on recently? Well, eliminating stability problems and loss issues, recently. I've taken a few weeks off from working on it, so coming back to the issues I had before (5-8% packet loss, mass forwarding failures after too high traffic spikes) with a fresh set of troubleshooting plans has done wonders. Currently, doing incredibly naïve forwarding including full routing table lookups, MAC table lookups, packet copying, and input/output queue processing for every packet, I'm maintaining a 1200 byte packet forwarding rate of about 120 Mbps when testing with ESXi 6.5 and its relatively terrible Intel NIC emulation. This is honestly pretty damn slow, but considering there's effectively no forwarding optimization going on here, and I'm fairly certain I'm being limited by the sheer number of VMEXITs that are being done, it's not bad.
Your project sounds very interesting. Have you considered implementing a driver for a paravirtualized NIC (such as virtio-net)? I do not know what ESXi supports in that direction; for QEMU/KVM, however, the virtio interfaces give a huge performance boost.
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
Re: The Terrific Tell and Talk Thread
pat wrote: Have you recently added a new feature to your OS, or accomplished some milestone?
Over the course of two weeks spent at a small vacation home on the North Sea coast, I spent quite a few hours working on PDCLib, my C standard library implementation aimed specifically at compliance, minimalism (i.e. no "extensions"), and adaptability to your OS of choice. The project, for those who don't know, is what's left of my own OS project from the early 2000s, and is explicitly geared toward the OSDev crowd (although I learned it has been adopted in a couple of other places as well, including an unofficial Xbox SDK, which I think is awesome).
The things I achieved in these two weeks, aside from recharging my batteries and spending some quality time with wife and kids:
- Improving the recent switch to CMake as PDCLib's build system;
- Implementing symbol visibility (i.e., __attribute__((visibility("hidden"))) / __declspec(dllexport));
- Integrating dlmalloc, so PDCLib now has decent memory management;
- Implementing C11's <threads.h> as a wrapper around pthreads.
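A common pattern for the symbol-visibility item is a pair of macros that expand to GCC/Clang visibility attributes on ELF targets and to `__declspec(dllexport)` on Windows. This is a minimal sketch: the macro names `MYLIB_API`, `MYLIB_LOCAL` and `MYLIB_BUILDING_DLL` are hypothetical and not PDCLib's actual macros, and it leaves out the `__declspec(dllimport)` side that DLL consumers would normally use.

```c
/* Hypothetical export/hide macros; PDCLib's own macros may differ. */
#if defined(_WIN32) || defined(__CYGWIN__)
#  ifdef MYLIB_BUILDING_DLL
#    define MYLIB_API __declspec(dllexport)
#  else
#    define MYLIB_API
#  endif
#  define MYLIB_LOCAL /* PE has no ELF-style hidden visibility */
#elif defined(__GNUC__)
#  define MYLIB_API   __attribute__((visibility("default")))
#  define MYLIB_LOCAL __attribute__((visibility("hidden")))
#else
#  define MYLIB_API
#  define MYLIB_LOCAL
#endif

/* Internal helper: kept out of the shared object's dynamic symbol table. */
MYLIB_LOCAL int mylib_internal_add(int a, int b)
{
    return a + b;
}

/* Public entry point: exported from the shared object / DLL. */
MYLIB_API int mylib_sum3(int a, int b, int c)
{
    return mylib_internal_add(mylib_internal_add(a, b), c);
}
```

Building a shared object with GCC's `-fvisibility=hidden` makes hidden the default, so only symbols explicitly marked `MYLIB_API` are exported.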
The very stopgap memory allocation that had been part of PDCLib since 2005 had been a sore point for me ever since. I always intended to eventually include dlmalloc or some other public-domain memory allocator, but never actually got around to it. Erin Shepherd did it while she was maintaining PDCLib, but with an older version of dlmalloc, and I could not really make heads or tails of the modifications and config settings that had been made, so I could not easily adapt that solution to my branch of the library.
(*)
What I did now was take the latest version of dlmalloc (2.8.6) and "plug it in" to PDCLib. (Both dlmalloc and PDCLib are CC0 licensed, so that wasn't an issue.) It now resides in functions/_dlmalloc, the underscore in the subdirectory name indicating that this code is "internal" (like _PDCLIB) rather than implementing a specific header (the other directories being string, stdlib, locale, you get the idea). Aside from the first couple of lines, in which I make some configuration defines, includes and whatnot, I aimed to leave Doug's source as untouched as possible. Then I added a diff file to the repository, so that anybody can see at a glance what changes I made to v2.8.6 of Doug's malloc.c, and easily apply the same changes to upcoming versions or to other forks of Doug's work.
There are some subtle changes to fix compiler warnings I got with the rather strict warning settings employed by PDCLib's build system. I defined USE_DL_PREFIX, making all the functions use the dl prefix (dlmalloc, dlfree, dlcalloc, ...). I also forwarded DLMALLOC_EXPORT to _PDCLIB_LOCAL, so that all of dlmalloc's internals remain invisible from the outside. I then "unmasked" the standard functions...
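```c
/* With USE_DL_PREFIX defined, dlmalloc names its entry points dlmalloc,
   dlfree, ...; these defines map them back onto the standard names. */
#define dlmalloc malloc
#define dlcalloc calloc
#define dlrealloc realloc
#define dlfree free
/* aligned_alloc() only exists since C11; dlmemalign takes its
   (alignment, size) arguments in the same order. */
#if __STDC_VERSION__ >= 201112L
#define dlmemalign aligned_alloc
#endif
```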
This adds some requirements on the hosting OS (dlmalloc makes use of a variety of system calls), but I feel it's a massive improvement over my own stopgap implementation (which used only brk / sbrk, never gave memory back to the OS until the process terminated, fragmented like crazy and was all kinds of inefficient).
----
I then went ahead and implemented <threads.h> as a wrapper around pthreads.
<threads.h> isn't part of the C99 standard; it only made its appearance in C11 (which I had so far considered out of scope for PDCLib v1.0). But I fully understood the various requests to make PDCLib thread-safe. The easiest and most portable way to do so was to implement <threads.h> and then use those facilities to get thread safety (most notably in <stdio.h>).
The implementation was actually pretty easy. C11's <threads.h> is modelled after pthread anyway, only differing in the types (e.g. thrd_t instead of pthread_t, mtx_t instead of pthread_mutex_t and so on) and the names of the functions.
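A sketch of what such a wrapper can look like for the mutex part, assuming a POSIX host. The `my_` names and the numeric values of the mutex-type and result constants are hypothetical (C11 leaves those values to the implementation). Because `my_mtx_t` is simply `pthread_mutex_t`, nothing needs converting on each call; note that this translation unit includes `<pthread.h>` directly, which a public header could not do.

```c
#define _XOPEN_SOURCE 700 /* exposes PTHREAD_MUTEX_RECURSIVE */
#include <pthread.h>

/* Hypothetical C11-style names layered directly on pthreads. */
typedef pthread_mutex_t my_mtx_t;

enum { my_mtx_plain = 0, my_mtx_recursive = 1, my_mtx_timed = 2 };
enum { my_thrd_success = 0, my_thrd_error = 1 };

int my_mtx_init(my_mtx_t *mtx, int type)
{
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0)
        return my_thrd_error;

    /* Any pthread mutex can be used with pthread_mutex_timedlock, so only
       the recursive flag needs translating into a pthread mutex type. */
    int kind = (type & my_mtx_recursive) ? PTHREAD_MUTEX_RECURSIVE
                                         : PTHREAD_MUTEX_NORMAL;
    int rc = pthread_mutexattr_settype(&attr, kind);
    if (rc == 0)
        rc = pthread_mutex_init(mtx, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc == 0 ? my_thrd_success : my_thrd_error;
}

int my_mtx_lock(my_mtx_t *mtx)
{
    return pthread_mutex_lock(mtx) == 0 ? my_thrd_success : my_thrd_error;
}

int my_mtx_unlock(my_mtx_t *mtx)
{
    return pthread_mutex_unlock(mtx) == 0 ? my_thrd_success : my_thrd_error;
}

void my_mtx_destroy(my_mtx_t *mtx)
{
    pthread_mutex_destroy(mtx);
}
```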
A snag was that no PDCLib header must actually include <pthread.h>. The user's namespace needs to be kept clean, so dragging in all the identifiers declared by <pthread.h> was a no-go. The solution I ended up with was a helper program, pthread_readout.c, which -- compiled separately -- includes <pthread.h>, takes its definitions, and uses them to generate the lines of code that need to be included in _PDCLIB_config.h to ensure that PDCLib's data types and pthread's data types are actually cast-compatible, and can be used without having to marshal / unmarshal them on every function call.
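The readout idea can be sketched as follows, assuming that matching size and alignment is enough for the cast-compatibility described, which holds for types only ever handled through pointers (mutexes, condition variables). The function names, the `my_mtx_t`/`my_cnd_t` names and the exact shape of the emitted lines are hypothetical; PDCLib's actual pthread_readout.c may generate something quite different. In a real build, a small main() would call this and redirect the output into the configuration header.

```c
#include <pthread.h>
#include <stdalign.h>
#include <stdio.h>

/* Emit one opaque typedef with the same size and alignment as a host type. */
static int emit_opaque(FILE *out, const char *name, size_t size, size_t align)
{
    return fprintf(out,
                   "typedef struct { _Alignas(%zu) unsigned char _Opaque[%zu]; } %s;\n",
                   align, size, name);
}

/* Writes config lines for the pointer-only C11 types; returns the number
   of lines written, or -1 on an output error. */
int emit_thread_config(FILE *out)
{
    if (emit_opaque(out, "my_mtx_t", sizeof(pthread_mutex_t),
                    alignof(pthread_mutex_t)) < 0)
        return -1;
    if (emit_opaque(out, "my_cnd_t", sizeof(pthread_cond_t),
                    alignof(pthread_cond_t)) < 0)
        return -1;
    return 2;
}
```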
Adapting to other threading APIs should be possible, either by similar direct-cast compatibility or by more involved marshalling / unmarshalling.
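One spot where C11 and pthreads do not line up one-to-one is the thread start routine: C11's `thrd_start_t` returns `int`, while `pthread_create` expects a function returning `void *`. A portable way to bridge that is a small trampoline, shown here as an illustration of the marshalling case rather than as PDCLib's code. All `my_` names and `example_worker` are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef pthread_t my_thrd_t;
typedef int (*my_thrd_start_t)(void *);
enum { my_thrd_success = 0, my_thrd_nomem = 1, my_thrd_error = 2 };

/* Heap-allocated bundle handed to the pthread start routine. */
struct start_info {
    my_thrd_start_t func;
    void *arg;
};

/* Adapts the C11 signature int(void *) to pthread's void *(void *). */
static void *trampoline(void *p)
{
    struct start_info info = *(struct start_info *)p;
    free(p);
    int result = info.func(info.arg);
    return (void *)(intptr_t)result;
}

int my_thrd_create(my_thrd_t *thr, my_thrd_start_t func, void *arg)
{
    struct start_info *info = malloc(sizeof *info);
    if (info == NULL)
        return my_thrd_nomem;
    info->func = func;
    info->arg = arg;
    if (pthread_create(thr, NULL, trampoline, info) != 0) {
        free(info);
        return my_thrd_error;
    }
    return my_thrd_success;
}

int my_thrd_join(my_thrd_t thr, int *res)
{
    void *ret;
    if (pthread_join(thr, &ret) != 0)
        return my_thrd_error;
    if (res != NULL)
        *res = (int)(intptr_t)ret; /* unpack the int smuggled through void * */
    return my_thrd_success;
}

/* Example C11-style start function: doubles the int it is given. */
static int example_worker(void *arg)
{
    return *(int *)arg * 2;
}
```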
Once that had been done, I went through <stdio.h> and added conditionally-compiled mutex locks to the FILE structure, so that thread safety can be had. (Conditionally compiled, as a platform might not have threading support, and I wouldn't want to inflict compilation failure on such a platform. Remember, PDCLib is aimed at you lot, so it's "making do" with what's actually there.) The most difficult part was, again, figuring out all the bold assumptions made in Linux space (GCC making statements on the standard library's capabilities, for example, by predefining __STDC_NO_THREADS__ as glibc eschews <threads.h>)...
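The conditionally compiled locking can be sketched roughly like this. `struct my_stream` is a hypothetical, heavily simplified stand-in for FILE, and `MY_HAVE_THREADS` is a hypothetical configuration switch; with it set to 0, the lock macros compile away and no pthread dependency remains.

```c
#include <stddef.h>

#ifndef MY_HAVE_THREADS
#define MY_HAVE_THREADS 1 /* hypothetical switch: set to 0 without threads */
#endif

#if MY_HAVE_THREADS
#include <pthread.h>
#define MY_LOCK(s)   pthread_mutex_lock(&(s)->mtx)
#define MY_UNLOCK(s) pthread_mutex_unlock(&(s)->mtx)
#else
#define MY_LOCK(s)   ((void)0)
#define MY_UNLOCK(s) ((void)0)
#endif

/* Hypothetical, heavily simplified stand-in for FILE. */
struct my_stream {
    unsigned char buf[64];
    size_t pos;
#if MY_HAVE_THREADS
    pthread_mutex_t mtx; /* only present when threads are available */
#endif
};

void my_stream_init(struct my_stream *s)
{
    s->pos = 0;
#if MY_HAVE_THREADS
    pthread_mutex_init(&s->mtx, NULL);
#endif
}

/* putc-like: appends one byte under the stream lock;
   returns the byte written, or -1 when the buffer is full. */
int my_putc(int c, struct my_stream *s)
{
    int result;
    MY_LOCK(s);
    if (s->pos < sizeof s->buf) {
        s->buf[s->pos++] = (unsigned char)c;
        result = (unsigned char)c;
    } else {
        result = -1;
    }
    MY_UNLOCK(s);
    return result;
}
```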
What remains to be done is shoving errno into thread-specific storage, as required by the standard. That sounds rather easy, until you get to the point where you have to initialize that on process startup... at this point, PDCLib doesn't hook into pre-main() stuff at all, but uses static initialization for all its runtime support. I'd much prefer to keep it that way (easier to interface with the host OS), but my vacation is over and I have to get back into the rut of my 9-to-5 job instead of figuring that one out right now.
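One way around the startup problem that keeps everything statically initialized is lazy initialization: a `pthread_once_t` can be set to `PTHREAD_ONCE_INIT` at compile time, and the thread-specific key is then created on the first access to errno. This is a sketch assuming a pthreads host, with hypothetical names. Inside a real C library it would need extra care, because the allocator used for the per-thread slot may itself set errno and recurse. Using C11 `_Thread_local` instead would avoid the allocation but shifts the work to the compiler and the OS's TLS setup.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_once_t errno_once = PTHREAD_ONCE_INIT; /* static init, no pre-main hook */
static pthread_key_t errno_key;
static int errno_key_ok;   /* set once the key has been created */
static int errno_fallback; /* shared slot if per-thread setup fails */

static void errno_make_key(void)
{
    /* free() releases each thread's slot when that thread exits. */
    errno_key_ok = (pthread_key_create(&errno_key, free) == 0);
}

/* Returns a pointer to the calling thread's errno slot. */
int *my_errno_location(void)
{
    pthread_once(&errno_once, errno_make_key);
    if (!errno_key_ok)
        return &errno_fallback;

    int *slot = pthread_getspecific(errno_key);
    if (slot == NULL) {
        slot = calloc(1, sizeof *slot); /* new threads start with errno == 0 */
        if (slot == NULL || pthread_setspecific(errno_key, slot) != 0) {
            free(slot);
            return &errno_fallback;
        }
    }
    return slot;
}

/* The standard requires errno to be a modifiable lvalue. */
#define my_errno (*my_errno_location())
```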
Oh, and I realized freopen() is probably broken in more ways than one (doesn't remove closed streams from the list of open ones, for one thing), but that's small fish compared to what's been achieved.
----
With the above modifications, I am very close to the point where I can actually claim to have overtaken the Shepherd branch of PDCLib. That had been irking me to no end. I applaud Erin for the work she did on PDCLib, but felt I couldn't continue where she left off, so I had forked my own branch at the point where I left off, leaving PDCLib with an unsupported but more feature-complete (Shepherd) branch and my supported but inferior branch (trunk/master). Being at the point where I can wholeheartedly say "use trunk, it's got everything the other branch has and more" is a really good feeling of achievement. No more being of two minds about which branch to recommend...
----
(*): Yes I know dlmalloc is no longer considered "the cool sh*t". I know there is ptmalloc, and yet other allocators. I will probably look into making them available as options. For now dlmalloc was a) code I had looked at before, b) code that had been "touched" by the maintainer more recently than alternatives I browsed through at a glance, c) code that's also CC0 (hence does not require additional licensing legalese), and d) so much of an improvement over what was previously in PDCLib that merits relative to other available allocator implementations didn't really matter.
Every good solution is obvious once you've found it.