Avoiding code bloat and overhead, particularly in C++
- zesterer
- Member
- Posts: 59
- Joined: Mon Feb 22, 2016 4:40 am
- Libera.chat IRC: zesterer
- Location: United Kingdom
Hello,
I recently (2 months ago) began rewriting my kernel in C rather than C++. I love C++ as a language: it's powerful, usually quite fast, and it has a lot of features that I find are missing in C. My switch to C was therefore not due to dislike of the language, but had more to do with the way in which my style of coding had a tendency to introduce bloat and needless overhead to the code while sucking up time that could be spent writing features.
Now, I'm considering switching back to C++. My reasoning for this is simple: I miss many features that C++ has. Destructors, templating, some OO features, namespaces, etc. These features are of course not essential and I've been managing without them, but I find myself unintentionally wrapping C features up in a way that mirrors how I'd structure my code in C++.
My question is this: How can I avoid bloat and overhead in my code? Do you have any 'top-tips'? I've long since learned that avoiding OO is usually a good idea when trying to reduce bloat. Do any of you have any other strategies for keeping your code clean, efficient, and readable? I'm a full-time student with multiple ongoing projects, so I usually only get about 6-8 hours a week working on my OS. I don't want to spend that time implementing things that have little or no useful impact on the functionality of the OS.
Thanks for reading,
Joshua Barretto
Currently developing Tupai, a monolithic x86 operating system
http://zesterer.homenet.org/projects.shtml
- max
- Member
- Posts: 616
- Joined: Mon Mar 05, 2012 11:23 am
- Libera.chat IRC: maxdev
- Location: Germany
Re: Avoiding code bloat and overhead, particularly in C++
Hey,
that's a good question. I'm using C++ too and I currently have a tendency to just write C-style code, using simple functions for the most part and only using classes where I have a real reason to. A positive aspect is that you avoid the overhead of wrapping it in a class with a lot of static methods... a negative aspect is that you need to take care that your function names are unique.
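A minimal sketch of the two styles being compared (the `PhysicalMemory` class, the `pmm` namespace and the 4 KiB page size are illustrative assumptions). In C++, a namespace gives you the scoping of the static-method class without the class itself, which also takes care of the unique-name problem of plain C-style functions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Style 1: a class used only as a bag of static methods.
// No object is ever created; the class merely scopes the names.
class PhysicalMemory {
public:
    static std::size_t page_size() { return 4096; }
    static std::uintptr_t align_down(std::uintptr_t addr) {
        return addr & ~(static_cast<std::uintptr_t>(page_size()) - 1);
    }
};

// Style 2: plain functions in a namespace. Same scoping (pmm::align_down),
// same generated code, and more functions can be added from any file.
namespace pmm {
inline std::size_t page_size() { return 4096; }
inline std::uintptr_t align_down(std::uintptr_t addr) {
    return addr & ~(static_cast<std::uintptr_t>(page_size()) - 1);
}
} // namespace pmm
```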
What looks most bloated to you/creates the most overhead in your code?
Greets
Re: Avoiding code bloat and overhead, particularly in C++
You'll have to be more precise if you want a good answer to that question. Can you give us an example where C++ causes overhead compared to C?
In general use the right tools for the job. C++ does not force you to use complex classes, inheritance, templates and other features if you don't want to. You can just write C code and use C++ to augment that where it makes sense.
Regarding OOP: The suggestion that OOP should be avoided makes me think that you are confusing OOP with the use of complex classes or deep inheritance hierarchies. Those notions do not coincide. In fact, overly complex classes and deep inheritance hierarchies should be avoided even in application-level programming. As a good example of how OOP can be applied while avoiding these pitfalls, take a look at FreeType, Cairo or the Linux kernel's device subsystem.
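A rough sketch of that style, loosely modelled on the operation-table pattern used by kernel driver subsystems such as Linux's (the names here are hypothetical, not Linux's actual structures): polymorphism is a single flat table of function pointers, with no inheritance hierarchy at all.

```cpp
#include <cassert>
#include <cstddef>

struct Device;

// One flat table of operations per driver type: the only "polymorphism".
struct DeviceOps {
    std::size_t (*read)(Device* dev, char* buf, std::size_t len);
};

// The generic device object: its data plus a pointer to its driver's ops.
struct Device {
    const DeviceOps* ops;
    const char* name;
};

// Generic code dispatches through the table without knowing the driver.
inline std::size_t device_read(Device* dev, char* buf, std::size_t len) {
    return dev->ops->read(dev, buf, len);
}

// A trivial hypothetical "zero" driver that fills the buffer with zero bytes.
inline std::size_t zero_read(Device*, char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) buf[i] = 0;
    return len;
}

const DeviceOps zero_ops = { zero_read };
```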
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].
-
- Member
- Posts: 595
- Joined: Mon Jul 05, 2010 4:15 pm
Re: Avoiding code bloat and overhead, particularly in C++
C++ can blow up if you use it in a way that is not appropriate for operating systems or embedded systems. I have actually seen expert programmers create code that blows up while thinking they are very smart for writing complex templates. For operating systems, keep it simple so that the compiler can optimize. C++ is a complex language, and the more complex the code becomes, the more likely it is that the compiler won't know how to optimize it.
Don't exercise the type system too much when using templates. Behaviour differs from compiler to compiler: I've seen compilers emit two separate copies of a template's code for two different types that were derived from the same base type, even though the underlying type, and therefore the generated code, was exactly the same. Don't assume the C++ compiler is intelligent. When writing node-based template algorithms, it can be better to embed member nodes in the elements, let an outer template class handle the conversions from pointer to member node and from member node back to pointer, and then let an inner class that only deals with the node type do the actual work on the member nodes. This can increase code reuse, for example.
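One way to read that advice, sketched under my own assumptions (an intrusive singly linked list; `ListNode`, `ListCore` and `IntrusiveList` are hypothetical names): all the real list logic lives in a non-template core that is compiled once, and the typed template only converts between an element pointer and its embedded node, using the same offset arithmetic as the C `container_of` idiom. Each instantiation then adds almost no code.

```cpp
#include <cassert>
#include <cstddef>

// The link node embedded inside each element (the "member node").
struct ListNode {
    ListNode* next = nullptr;
};

// Non-template core: compiled once and shared by every element type.
struct ListCore {
    ListNode* head = nullptr;
    void push_front(ListNode* n) { n->next = head; head = n; }
    std::size_t size() const {
        std::size_t count = 0;
        for (ListNode* n = head; n != nullptr; n = n->next) ++count;
        return count;
    }
};

// Thin typed wrapper: it only converts between T* and the embedded node,
// so its member functions inline down to a pointer adjustment.
template <typename T, std::size_t NodeOffset>
class IntrusiveList {
public:
    void push_front(T* item) { core_.push_front(to_node(item)); }
    T* front() const { return core_.head ? from_node(core_.head) : nullptr; }
    std::size_t size() const { return core_.size(); }
private:
    // Pointer -> member node.
    static ListNode* to_node(T* item) {
        return reinterpret_cast<ListNode*>(reinterpret_cast<char*>(item) + NodeOffset);
    }
    // Member node -> pointer (the container_of idiom).
    static T* from_node(ListNode* node) {
        return reinterpret_cast<T*>(reinterpret_cast<char*>(node) - NodeOffset);
    }
    ListCore core_;
};
```

Usage would look like `IntrusiveList<Task, offsetof(Task, link)>` for a standard-layout `Task` that contains a `ListNode link;` member.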
-
- Member
- Posts: 1146
- Joined: Sat Mar 01, 2014 2:59 pm
Re: Avoiding code bloat and overhead, particularly in C++
There's nothing wrong with implementing OO-like functionality in C. For a long time I've written C code that defines a structure for each "type"/"class" of "object" and then functions that take a pointer to one of these structures as their first parameter. Where needed, I also write a "destroy"/"free" function that takes a pointer to the structure, performs any required cleanup, and deallocates the memory. There's nothing wrong with doing this. Obviously you can't have inheritance like this (unless you use a union, or make another structure that has the same members as the parent structure with additional members at the end, but I don't recommend either of these for a number of reasons), but the overhead is going to be smaller than whatever a full-blown OOP language like C++ produces, and the code is (I find, anyway) a lot simpler to read and maintain.
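A minimal sketch of the pattern described above, using a hypothetical `buffer` type. It's written in the common subset of C and C++, so it compiles as either:

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The "class": a plain structure holding the object's state. */
struct buffer {
    char *data;
    size_t len;
};

/* "Constructor": allocate the object and its resources. Returns NULL on failure. */
struct buffer *buffer_create(size_t len)
{
    struct buffer *buf = (struct buffer *)malloc(sizeof *buf);
    if (buf == NULL)
        return NULL;
    buf->data = (char *)calloc(len, 1);
    if (buf->data == NULL) {
        free(buf);
        return NULL;
    }
    buf->len = len;
    return buf;
}

/* "Method": the object pointer is always the first parameter. */
void buffer_fill(struct buffer *buf, char value)
{
    memset(buf->data, value, buf->len);
}

/* "Destructor": release owned resources, then the object itself. */
void buffer_destroy(struct buffer *buf)
{
    if (buf == NULL)
        return;
    free(buf->data);
    free(buf);
}
```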
And you can name your functions in a namespace-like manner to avoid naming conflicts. I *always* prefix function names with the name of whatever module they belong to (and the names of any sub-modules, if applicable). Also make as many functions "static" as possible: in C, the "static" keyword is like "private" in an OOP language. It restricts the visibility of a variable or function to the translation unit in which it's defined, i.e. the source file plus anything it #includes (I always compile one source file to one object file, so for me that is the same as the object file). This is great for avoiding naming conflicts, although personally I still tend to use the same naming convention as for other functions simply because it makes the code more readable.
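A small illustration of the naming and `static` conventions, with a hypothetical `kbd_map` module (keyboard module, scancode-map sub-module) and a deliberately tiny table covering only two keys of PC scancode set 1:

```cpp
#include <assert.h>

/* Module "kbd", sub-module "map": every function carries the kbd_map_ prefix. */

/* static: internal linkage, invisible to other translation units.
   The prefix is kept anyway, purely for readability. */
static char kbd_map_lookup(unsigned char scancode)
{
    /* Hypothetical, tiny table: scancode 0x02 -> '1', 0x03 -> '2'. */
    static const char table[] = { 0, 0, '1', '2' };
    return scancode < sizeof table ? table[scancode] : 0;
}

/* Exported: here the prefix is what keeps the name globally unique. */
char kbd_map_to_ascii(unsigned char scancode)
{
    return kbd_map_lookup(scancode);
}
```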
TL;DR There's no problem with using C in an OO-like manner and no need to force yourself to use C++ when you feel that C is more suitable for your project/coding style/workflow/whatever.
When you start writing an OS you do the minimum possible to get the x86 processor in a usable state, then you try to get as far away from it as possible.
Syntax checkup:
Wrong: OS's, IRQ's, zero'ing
Right: OSes, IRQs, zeroing
Re: Avoiding code bloat and overhead, particularly in C++
One of the basic maxims of C++ is, you don't pay for what you don't use.
Example 1, OOP. The abstract idea is to keep state, and code operating on that state, in one place. That's basically a way to keep relevant things in one place (the class definition). You can mimic that in C, C++ just makes it easier by adding syntactic sugar. Whether you are lugging around a "data pointer" explicitly or a "this pointer" implicitly doesn't make a difference.
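A minimal sketch of that equivalence (the `counter` names are made up): the C-style version passes the state pointer explicitly, the C++ member function receives it implicitly as `this`, and without virtual functions the class is no larger than the equivalent C struct on mainstream compilers.

```cpp
#include <cassert>

// C style: state in a struct, the "data pointer" passed explicitly.
struct counter_c { int value; };
inline void counter_increment(counter_c* c) { ++c->value; }

// C++ style: same state, the "this pointer" passed implicitly.
class Counter {
public:
    void increment() { ++value_; }  // effectively counter_increment(this)
    int value() const { return value_; }
private:
    int value_ = 0;
};

// No virtual functions, so no hidden per-object data is added.
static_assert(sizeof(Counter) == sizeof(counter_c), "same layout cost as the C struct");
```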
What most people mean when they talk about "OOP overhead" is when you're looking at virtual functions, polymorphic code, that kind of stuff. But you don't have to. You can use classes just like you would use C structures, without coming up with complex class hierarchies -- and implementing such hierarchies in C would mean going through multiple pointer indirections as well. So, use it if it is necessary, and if not, don't use it. OOP is a means to an end -- organizing your code -- not an end in itself, making everything derive from "Object". (The way Java does it to get around the limitations imposed by the JVM.)
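To show where the cost actually appears, a sketch (type names are illustrative): on mainstream ABIs, such as the Itanium C++ ABI used by GCC and Clang, adding a virtual function gives each object a hidden vtable pointer, while a class without virtual functions stays the size of its data. The size relationship is an implementation property, not something the language standard guarantees.

```cpp
#include <cassert>

// Like a C struct: just the data, calls resolved at compile time.
struct Plain {
    int id;
    int get() const { return id; }
};

// One virtual function: every object now carries a vtable pointer,
// and calls through a base pointer go through an indirection.
struct Polymorphic {
    int id = 0;
    virtual ~Polymorphic() = default;
    virtual int get() const { return id; }
};

// Holds on GCC/Clang/MSVC; not mandated by the standard.
static_assert(sizeof(Polymorphic) > sizeof(Plain), "vtable pointer costs space per object");
```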
The same goes for templates. Yes, sure, they can blow up your code. But they can also reduce it. You don't use templates to make "cooler" code; you use templates when you'd otherwise write multiple implementations of the same algorithm anyway. One way to keep this under control, if you can't trust your subconscious, is to put only the declaration in the header and the definition in a source file (as you would with a "normal" class / function), and to instantiate the template explicitly in the source file, for exactly those types you want it instantiated for. The build will then tell you when you use that template for types you didn't foresee (the linker fails with an undefined reference: loud and early, the way it should be). This also keeps recompilation times down...
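A minimal sketch of the explicit-instantiation approach, with the header and source file collapsed into one listing (the `checksum` function is a made-up example). When the two parts really live in separate files, only the listed types are ever compiled, and a call with any other type, say `checksum<long>`, fails at link time with an undefined reference:

```cpp
#include <cassert>
#include <cstddef>

// --- checksum.h: declaration only, no definition visible to callers ---
template <typename T>
T checksum(const T* data, std::size_t count);

// --- checksum.cpp: the definition ---
template <typename T>
T checksum(const T* data, std::size_t count) {
    T sum = 0;
    for (std::size_t i = 0; i < count; ++i) sum += data[i];
    return sum;
}

// Explicit instantiations: the only types this template is compiled for.
template unsigned char checksum<unsigned char>(const unsigned char*, std::size_t);
template unsigned int checksum<unsigned int>(const unsigned int*, std::size_t);
```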
I could go on in similar tones for RTTI, exceptions, you-name-it. As with any other language, you need to know what you are doing. And I admit that C++ makes this harder than C did, because it is the more complex language.
But the abilities granted by deterministic destructors alone -- namely RAII and the capability to handle resources securely and safely -- are something I wouldn't want to do without. And something I sorely miss whenever I am "forced" by circumstances to do plain C.
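A minimal RAII sketch, assuming a simple spinlock (the `Spinlock`, `LockGuard` and `take_item` names are hypothetical, and a real kernel lock would also need to deal with interrupts). The destructor releases the lock on every path out of the scope, including early returns:

```cpp
#include <atomic>
#include <cassert>

// A minimal spinlock (illustrative only: no backoff, no interrupt handling).
class Spinlock {
public:
    void lock() { while (locked_.exchange(true, std::memory_order_acquire)) { } }
    bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }
    void unlock() { locked_.store(false, std::memory_order_release); }
private:
    std::atomic<bool> locked_{false};
};

// RAII guard: acquire in the constructor, release in the destructor.
class LockGuard {
public:
    explicit LockGuard(Spinlock& l) : lock_(l) { lock_.lock(); }
    ~LockGuard() { lock_.unlock(); }
    LockGuard(const LockGuard&) = delete;
    LockGuard& operator=(const LockGuard&) = delete;
private:
    Spinlock& lock_;
};

// Hypothetical caller: neither return path needs an explicit unlock.
inline int take_item(Spinlock& l, int& items) {
    LockGuard guard(l);
    if (items == 0) return -1;  // ~LockGuard releases the lock here...
    return --items;             // ...and here
}
```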
Iterators and the separation of containers and algorithms come a close second.
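A small sketch of that separation (the `StaticVector` container and `count_matching` algorithm are made-up names; in hosted code `std::count_if` does the same job): the algorithm is written once against iterators and works on any container that provides them, including a plain array.

```cpp
#include <cassert>
#include <cstddef>

// A fixed-capacity container that exposes iterators (here, plain pointers).
template <typename T, std::size_t N>
class StaticVector {
public:
    bool push_back(const T& v) {
        if (size_ == N) return false;  // full: no allocation, just refuse
        data_[size_++] = v;
        return true;
    }
    T* begin() { return data_; }
    T* end() { return data_ + size_; }
private:
    T data_[N]{};
    std::size_t size_ = 0;
};

// A generic algorithm written once against iterators; it neither knows
// nor cares which container the range came from.
template <typename It, typename Pred>
std::size_t count_matching(It first, It last, Pred pred) {
    std::size_t n = 0;
    for (; first != last; ++first)
        if (pred(*first)) ++n;
    return n;
}
```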
Just one thing. Remember to make your public ABI plain extern "C". The C++ ABI is not stable, and not portable even among compilers on the same platform. When talking to other languages or components, you really want to be "plain C", as it is the lingua franca for software components. Since you can pass out opaque pointers to C++ classes through a C API (think "FILE *" here), that isn't really a limitation.
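A sketch of the opaque-pointer pattern, with a hypothetical `ticker` API: the C++ class stays internal, and the exported functions use C linkage and an incomplete handle type, much as C code treats `FILE *`.

```cpp
#include <cassert>
#include <new>

// --- Internal C++ implementation, never exposed across the API ---
class Ticker {
public:
    void tick() { ++ticks_; }
    unsigned long ticks() const { return ticks_; }
private:
    unsigned long ticks_ = 0;
};

// --- Public API: C linkage, opaque handle ---
extern "C" {
// Incomplete type: callers can hold the pointer but never see inside.
typedef struct ticker_handle ticker_handle;

ticker_handle* ticker_create(void) {
    // nothrow: report failure as NULL instead of throwing across the C boundary.
    return reinterpret_cast<ticker_handle*>(new (std::nothrow) Ticker());
}
void ticker_tick(ticker_handle* h) { reinterpret_cast<Ticker*>(h)->tick(); }
unsigned long ticker_count(const ticker_handle* h) {
    return reinterpret_cast<const Ticker*>(h)->ticks();
}
void ticker_destroy(ticker_handle* h) { delete reinterpret_cast<Ticker*>(h); }
}
```

The matching header would contain only the `typedef` and the four prototypes, so it can be included from C or from any other language that speaks the C ABI.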
Every good solution is obvious once you've found it.
Re: Avoiding code bloat and overhead, particularly in C++
I find this discussion of moving towards C from C++ slightly bizarre, since I'm doing the total opposite!
Originally, my kernel was written in a "C/C++" or "C with (a few) classes" style. Over time, this has become difficult to maintain, understand and extend. It doesn't help that "first attempt" implementations often lack "direction" and are "designed" as one goes (although I maintain that the best way to get a proper understanding of a programming problem is to write an implementation; even if it's terrible, you've learned more about the problem).
I'm (slowly) working my way through my OS moving towards a more "OO", better designed structure. I've re-written my kernel's memory manager, written OO wrappers for my entire userspace API in such a way that it's basically become something of an "application framework" and am planning to re-write most of the rest of my kernel subsystems in similar style (e.g. vfs, device management, process management, possibly even parts of the scheduler).
The C++ "unstable" ABI issue doesn't really matter much for a hobby OS where you're unlikely to have more than one C++ compiler anyway. I do plan to upgrade from my aging GCC 4.8 to GCC 8.0 when it's released.
RTTI and exceptions are not used in my kernel (and I have no plans to use them), but are used in userspace. Templates are used a fair amount (and will likely increase), but since they add zero runtime overhead (apart from possibly the issue of slightly increased code size by having the same template instantiated in multiple object modules, e.g. kernel modules or shared libraries), there's no reason not to use them. They're certainly better than the alternatives: function-like macros and multiple manually-written algorithm implementations.
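To illustrate the macro comparison with a generic example (not taken from mallard's code): a function-like macro evaluates the winning argument twice, whereas a function template evaluates each argument once and is type-checked. `MAX_MACRO` and `max_of` are made-up names.

```cpp
#include <cassert>

// Function-like macro: textual substitution, so the chosen argument
// is evaluated a second time.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Template: each argument is evaluated exactly once, and both must
// deduce to the same type T.
template <typename T>
constexpr const T& max_of(const T& a, const T& b) {
    return a > b ? a : b;
}
```

With `int i = 5, j = 1;`, `MAX_MACRO(i++, j)` yields 6 and leaves `i` at 7, while `max_of(i++, j)` yields 5 and leaves `i` at 6.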
I've recently written (in userspace) a basic SQLite ORM and an "RPC" system based on my OS's native IPC messaging system, both making extensive use of templates. It's massively easier to write something like "rpc::NewProcServer<0>(rpc::make_function(&GetAssociation));" (actual code using my RPC system) than it is to manually implement the "server" for each individual procedure an application makes available... I'm thinking about making something along the same lines to make syscalls easier...
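`rpc::NewProcServer` and `rpc::make_function` are mallard's own code and aren't shown in the thread. As a hypothetical sketch of the general technique only, a `make_function`-style adapter can deduce a function's parameter list and unpack a message into it, so no per-procedure glue has to be written. This assumes, purely for illustration, that a message is a list of integers and that the wrapped function has a non-void return type:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical wire format: a request is just a list of integer arguments.
using Message = std::vector<long>;
using Handler = std::function<long(const Message&)>;

// Unpack msg[0..N) into the function's parameters, converting each one.
// msg.at() throws if the message is too short (fine in userspace).
template <typename R, typename... Args, std::size_t... I>
R call_with_message(R (*fn)(Args...), const Message& msg, std::index_sequence<I...>) {
    return fn(static_cast<Args>(msg.at(I))...);
}

// Wrap any function with integer-like parameters as a uniform Handler.
// The parameter list is deduced, so each procedure needs no hand-written glue.
template <typename R, typename... Args>
Handler make_function(R (*fn)(Args...)) {
    return [fn](const Message& msg) -> long {
        return static_cast<long>(
            call_with_message(fn, msg, std::index_sequence_for<Args...>{}));
    };
}
```

A server could then keep a table of `Handler`s indexed by procedure number and dispatch incoming messages without knowing any of the signatures.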
Deep inheritance hierarchies aren't necessary, but certainly make things easier when used correctly. In userspace, I have a base class for everything that is represented by a "handle" and subclasses for each specific kind of handle (file, thread, device, memory-mapped region, etc.). I intend to do something similar in kernel-space for the new VFS and device driver systems.
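A rough sketch of such a handle hierarchy; the class names and the `kind`/`close` members are hypothetical, not mallard's actual API:

```cpp
#include <cassert>

// Base class for everything represented by a handle.
class Handle {
public:
    virtual ~Handle() = default;
    virtual const char* kind() const = 0;
    virtual void close() { open_ = false; }  // subclasses may add cleanup
    bool is_open() const { return open_; }
private:
    bool open_ = true;
};

class FileHandle : public Handle {
public:
    const char* kind() const override { return "file"; }
};

class ThreadHandle : public Handle {
public:
    const char* kind() const override { return "thread"; }
};

// Generic code works on any kind of handle through the base class.
inline void close_all(Handle* const* handles, int count) {
    for (int i = 0; i < count; ++i) handles[i]->close();
}
```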
Re: Avoiding code bloat and overhead, particularly in C++
mallard wrote:The C++ "unstable" ABI issue doesn't really matter much for a hobby OS where you're unlikely to have more than one C++ compiler anyway.
There is a bit more to it than just that. The C++ ABI sometimes changes between versions of the same compiler. You would have to recompile everything. And that means the user-space applications as well, with all the breakage that might introduce: applications or drivers no longer supported by the supplier suddenly stop working after the latest OS update. Linux has gone through some of these phases (not related to the C++ ABI but to the kernel ABI / API), and it has never been a pretty sight.
Or imagine your OS passing std::string parameters... but the userspace app using a different C++ library implementation (as it would be free to do). That might fail noisily and early, or it might fail when you least expect it...
Then there's the point that most "third party" languages (take Perl, for example) basically expect C linkage to "talk" to other components. Sure, you can wrap around this, but all in all, you'll be much easier off if you keep your OS API plain C.
And that's coming from me, your resident "I like C++ best" guy. I still very much prefer it on the inside, but my outsides are all C these days. Far fewer headaches all around.
Every good solution is obvious once you've found it.
- bellezzasolo
- Member
- Posts: 110
- Joined: Sun Feb 20, 2011 2:01 pm
Re: Avoiding code bloat and overhead, particularly in C++
The C++ ABI issue, as I understand it, can be combated. There is an ABI that comes as close as possible to a standard for C++ classes. That ABI? Microsoft's Component Object Model (COM).
Obviously, there are some issues to be worked around when developing an operating system. However, I don't see them as insurmountable. After all, at the core of the Windows Executive is an Object Manager. Admittedly, the NT kernel came before COM, but I don't think there's any basic obstruction to structuring classes as COM objects in a kernel. And of course, you can use COM from C. Anything is possible.
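To make the COM idea concrete, a simplified sketch in the style of COM's `IUnknown` (the names are hypothetical, and real COM also requires `QueryInterface`, GUIDs and specific calling conventions): an interface consists only of pure virtual functions with no data, so what crosses the boundary is effectively a pointer to a table of function pointers, and lifetime is managed by reference counting rather than by the caller's `delete`.

```cpp
#include <cassert>

// COM-style interface: pure virtual functions only, no data members.
struct IRefCounted {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IRefCounted() = default;  // clients may not delete through the interface
};

// A concrete object; only Release() is allowed to destroy it.
class Event final : public IRefCounted {
public:
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;  // last reference gone
        return remaining;
    }
private:
    ~Event() = default;
    unsigned long refs_ = 1;  // the creator holds the first reference
};
```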
You could then pass COM objects as parameters; however, there would of course be a natural boundary between kernel objects and user mode. You'd need some kind of thunk.
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS
Re: Avoiding code bloat and overhead, particularly in C++
My approach to managing the "ABI issue" is to avoid doing GCC updates too frequently (hence the intention to upgrade from the 4-year-old 4.8 series directly to 8.0) and, in future, to have multiple versions of libstdc++ and C++ compiled shared libraries for legacy applications. GCC guarantees ABI compatibility within an "x.y" compiler series.
The basic syscall API is implemented in a C-only shared library. If some third-party wants to port another C++ compiler/runtime, how/if they link to existing OS-provided C++ modules is their problem.
In kernel-space, I fully expect all modules to be compiled with the same compiler. If there are ever any third-party kernel modules, they will have to be re-compiled by the vendor whenever the kernel moves to a new compiler. While I have no "philosophical" aversion to third-party closed-source kernel modules, I'm not going to put any effort into supporting them.
In addition, the "RPC" system that I recently created should have no issues communicating between modules using different C++ runtimes (as long as the fundamental types are the same), or even with other languages entirely, given a bit of work. Eventually, I intend to update and port an OO IPC system that I created in my University days to fill the "COM"/"CORBA" role; this will have bindings for multiple languages and a fully specified "wire protocol". I don't intend to have anything equivalent to Windows' "in-process" COM servers at this time, although I did consider the possibility when working on the aforementioned University project.
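As a small illustration of what a fully specified wire protocol buys you (a generic sketch, not mallard's format; `put_u32_le` and `get_u32_le` are made-up names): encoding each field as a fixed number of little-endian bytes keeps the message independent of the host's byte order, struct padding and C++ runtime.

```cpp
#include <cassert>
#include <cstdint>

// Write a 32-bit value as exactly 4 little-endian bytes.
inline void put_u32_le(std::uint8_t* out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Read it back the same way, regardless of the host's endianness.
inline std::uint32_t get_u32_le(const std::uint8_t* in) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return v;
}
```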
- zesterer
- Member
- Posts: 59
- Joined: Mon Feb 22, 2016 4:40 am
- Libera.chat IRC: zesterer
- Location: United Kingdom
Re: Avoiding code bloat and overhead, particularly in C++
Thanks very much for all of the advice and tips everyone! After a lot of thought (even considering rewriting my kernel in Rust) I've decided to use C++, with the aim of being very strict about where and when I use certain features of the language in order to avoid code bloat (not performance overhead).
Currently developing Tupai, a monolithic x86 operating system
http://zesterer.homenet.org/projects.shtml
Re: Avoiding code bloat and overhead, particularly in C++
The tendency of C++ to cause code bloat is part of the reason I'm making my own language instead.
The class-oriented programming model means that you can only extend the behavior of a class by using inheritance. Which is silly, because the vast majority of the time your methods aren't virtual, and thus don't need to be *inside* the class definition at all. It just increases the nesting of the program. Templates are often used to reduce the amount of code you have to write, but they don't actually understand the type system or structure of the language, so it's really no different from the compiler doing a bunch of arbitrary string manipulation. Extremely slow to compile, extremely painful to debug. Exception handling is another huge pain. Unlike Java, functions have to opt out of exceptions rather than into exceptions. There's also a lot of syntax noise in C++, largely due to its insistence on being sorta backwards compatible with C.
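To make the exception point concrete, a minimal sketch (function names are made up): since C++11, a function is treated as potentially throwing unless it opts out with `noexcept`, and the `noexcept` operator lets you check that at compile time.

```cpp
#include <cassert>

// Default: may throw, and callers must assume it can.
int parse_digit(char c) {
    if (c < '0' || c > '9') throw c;  // hypothetical error reporting
    return c - '0';
}

// Opted out: if an exception tried to escape, std::terminate would be called.
int parse_digit_or_zero(char c) noexcept {
    return (c >= '0' && c <= '9') ? c - '0' : 0;
}

// The noexcept operator inspects the declarations without calling anything.
static_assert(!noexcept(parse_digit('1')), "potentially throwing by default");
static_assert(noexcept(parse_digit_or_zero('1')), "explicitly opted out");
```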
Not going to wrap it in disclaimers. C++ has poorly thought-out abstractions touted as genius in circlejerks, with the bad byproduct that it's hard to write good code with it. Then again, most programming languages are the same, so pick your poison (or, as others would say, "pick the right tool for the job").