embryo2 wrote:Schol-R-LEA wrote:Paging on the x86 also serves to simplify the sharing of memory (and through that, other resources) between processes and even between paravirtualized operating systems
What can be more simplified than direct access to a memory segment?
Sorry, perhaps 'simplify' was not the best wording I could have used; I was not referring to the simplicity of the software (more on that momentarily) but to the complexity of the computations the L2 cache system would have to make. My understanding of this matter may be incomplete, however. To the best of my knowledge, the OS developer has only limited options for optimizing the L1, L2 and (when present) L3 caches on the x86 chips, mostly flushing a cache or disabling it entirely, but the cache hardware itself will be more effective at controlling its own mapping of segments of memory to cache cells when it can limit the set of memory it needs to map. Page maps, as I understand it, provide some of that information.
If anyone can correct me, or provide additional information in general about this, I would appreciate it.
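For what it's worth, the knobs the architecture actually exposes to the OS are few. A rough summary (the bit positions are from the Intel manuals; the comments are my own gloss, and obviously none of this can run outside kernel mode):

```c
/* The coarse cache controls x86 exposes to an OS; bit positions per the
   Intel SDM, the comments are my summary.  These bits live in CR0 and
   can only be modified at privilege level 0. */
#include <stdint.h>

#define CR0_CD (1u << 30)  /* Cache Disable: stop new cache fills entirely */
#define CR0_NW (1u << 29)  /* Not Write-through: changes the write policy  */

/* Beyond these, there are the WBINVD (write back and invalidate all
   caches) and CLFLUSH (flush one line) instructions, plus the MTRR/PAT
   machinery for setting per-region memory types.  There is no interface
   for directly steering which lines the cache keeps -- which is part of
   why the page tables are the main hint the OS can give the hardware. */
```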
Furthermore, using unpaged memory is not simpler from the perspective of the operating system. Most 'flat' memory modes are really only flat from the perspective of a userland application; any multitasking OS still needs to parcel the memory out into multiple process spaces. The paging hardware makes this easier by allowing non-contiguous memory regions to be mapped to a contiguous virtual space (important when memory is being allocated at runtime), among other things. It also reduces the computational burden for the OS itself, again by automatically restricting the space of memory that the system needs to be concerned with at a given moment. Yes, there are costs to using it, and most of the results could be achieved in other ways, but I am not certain that any of those ways are, in fact, simpler from the perspective of the OS developer, nor am I convinced that simplicity for the operating system developers is really a wise goal. Providing simplicity for the users and the application developers often requires considerable complexity on the part of the OS implementation.
Again, any corroborations, refutations, or clarifications on these points would be useful here.
In any case, it is my (admittedly imperfect) understanding that you cannot enter x86-64 long mode without enabling paging, and I have not yet found anything indicating that it can be disabled for applications once in long mode. Similarly, I have the impression that at least some models of the ARM and other RISC architectures require paging in order to execute code in user mode - the OS, of course, does not need to actually implement meaningful page loading and unloading handlers, but the paging will be in place no matter what AFAICT. I haven't really delved into the matter, however, so I will try to read up on it.
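For reference, these are the architectural bits involved (positions per the AMD64 manuals; the sequence in the comments is the standard one, though this is a summary of the mechanism, not runnable entry code):

```c
/* Why long mode implies paging: long mode activates only when CR0.PG is
   set while EFER.LME=1 and CR4.PAE=1; clearing PG afterwards drops the
   CPU back out of long mode, so there is no way to run 64-bit code
   without page tables in place.  Bit positions per the AMD64 manuals. */
#include <stdint.h>

#define CR4_PAE       (1u << 5)    /* Physical Address Extension          */
#define IA32_EFER_MSR 0xC0000080u  /* the EFER model-specific register    */
#define EFER_LME      (1u << 8)    /* Long Mode Enable (software-set)     */
#define EFER_LMA      (1u << 10)   /* Long Mode Active (hardware status)  */
#define CR0_PG        (1u << 31)   /* Paging enable                       */

/* Conceptual entry order: load CR3 with a PML4, set CR4.PAE, set
   EFER.LME via WRMSR, then set CR0.PG -- paging comes last and is
   mandatory. */
```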
Finally, for my OS design, the details of paging mechanisms, especially those of specific processor models, are not an inherent part of the design; I mean to leave most of that to the hardware-specific code generation in the JIT compiler and the system-level code generator, and separate the rest into hardware-specific modules in the OS itself. Most of the OS itself will be unaware of the specific garbage-collection methods in use at any given time, though a module could interrogate the memory manager if it needs to know something, and could request a GC cycle on its current memory usage (though whether that would be done, or is even applicable to the particular memory usage pattern, would be the call of the memory manager itself).
embryo2 wrote:Schol-R-LEA wrote:if I am not mistaken can be used to make the L2 caching more efficient (since, IIRC, the content-addressable memory lookup works better when it can restrict the possible cache mappings to the pages currently in use, and an OS can, in theory anyway, reschedule to a different process whose data is already in cache when a cache miss occurs, rather than waiting for the cache to refill the memory for the process that experienced the miss... er, I think the cache can both read and be read at the same time, anyway, as long as the consumer isn't reading from the same memory as the producer, maybe I'm just confused).
Currently x86 paging caches information only for x86 paging (like in TLBs). So if we get rid of it then there will be nothing missing in our caches. But may be there are additional mechanisms employed by the cache related hardware. I haven't studied deeply the relations between caching of code/data and paging.
I will admit that I have not done so either; however, you seem to have misunderstood what I am talking about. The L2 cache is a content-addressable memory hardware unit on the processor die that backs the L1 cache, which in turn holds the memory most immediately accessed by the CPU. It is a fairly complex subsystem that is used in some form by nearly every current-generation CPU regardless of architecture, and uses content tagging to map a specific piece of code or data in memory to the copy of it in the cache. It is mostly automatic, but the paging system can (AFAIK) give it additional information about the current state of memory usage, to help it make caching and flushing decisions.
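To make the 'restrict the possible cache mappings' idea concrete: in a set-associative cache, the address is split into tag, set index, and line offset, so the tag comparison only has to search the ways of one set rather than the whole cache. A toy sketch with made-up sizes:

```c
/* How a set-associative cache locates a line: the address is split into
   tag | set index | line offset, and the CAM tag lookup only searches
   one set.  Sizes here are illustrative (64-byte lines, 1024 sets). */
#include <stdint.h>

#define LINE_BYTES 64
#define NUM_SETS   1024

/* Which set an address falls into -- only this set's ways are searched. */
static uint32_t cache_set(uint64_t paddr) {
    return (uint32_t)((paddr / LINE_BYTES) % NUM_SETS);
}

/* The tag stored alongside the line, compared to detect a hit. */
static uint64_t cache_tag(uint64_t paddr) {
    return paddr / ((uint64_t)LINE_BYTES * NUM_SETS);
}
```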
embryo2 wrote:However, would we rely on such mechanisms that change too often then it will be the best distraction from more important goals.
I gather that you intend to write a single high-level garbage collector that would be used unchanged on all hardware, correct? While this is fine for your overall reference design, I don't think you would find it feasible in practice. Real-world systems need to have hardware-specific implementations covering multiple configurations, at least as modules that can be substituted for the general-purpose version. Given that I am borrowing heavily from Synthesis, much of the hardware optimization would be handled programmatically, but I would still need to have templates and specializers for the specific hardware in place beforehand to give the system the information it would need for those optimizations.
embryo2 wrote:I suppose it's much better to concentrate on algorithms like JIT or garbage collection instead of hunting negligible advantages of transient hardware related hacks.
Fine, except that garbage collection itself is difficult to implement efficiently without applying a variety of techniques, some of which are indeed going to be 'transient hardware related hacks'. You seem to be seeing virtual memory as a complication, but I do not agree; using a hardware-unaware garbage collection algorithm may be acceptable for a single application using a conservative library collector (e.g., the popular Boehm collector) in a language that does not normally apply GC, but I can't see any way to get adequate performance for system-wide GC without applying every hardware and software hack you can. Paging can actually make this easier, IMAO, as it provides a hardware tool for doing something you will probably have to do anyway (that is, partition the memory into several memory arenas, with different specializations, generation aging criteria, and collection algorithms) in order to provide acceptable performance.
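A hypothetical sketch of that partitioning (every name here is mine, invented for illustration): each arena carries its own collection policy, and the allocation path invokes whichever policy the arena was configured with when it runs out of room.

```c
/* Hypothetical per-arena GC policies: each arena owns its pages and its
   own collection strategy, chosen to match the usage pattern of the
   data it holds.  All names here are invented for illustration. */
#include <stddef.h>

typedef struct Arena Arena;
struct Arena {
    const char *name;
    size_t used, capacity;           /* bytes                          */
    void (*collect)(Arena *self);    /* policy chosen for this arena   */
};

static int copy_runs, sweep_runs;    /* counters for the demo          */

/* Stand-in policies: pretend each reclaims half the arena. */
static void copy_collect(Arena *a) { copy_runs++;  a->used /= 2; }
static void mark_sweep(Arena *a)   { sweep_runs++; a->used /= 2; }

/* Allocation path: on pressure, trigger this arena's own collector. */
static int arena_alloc(Arena *a, size_t n) {
    if (a->used + n > a->capacity) a->collect(a);
    if (a->used + n > a->capacity) return -1;   /* still full */
    a->used += n;
    return 0;
}
```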
embryo2 wrote:Schol-R-LEA wrote:As for garbage collection, I actually see paging as a benefit for that, not a hindrance. Pages serve as a natural quantum for hierarchical meta-collectors, as well as allowing fairly fine-grained copy collection - I am planning on experimenting with a compacting copier that would run when a dirty page is to be swapped out, copy the live data to an empty page, remap the swap space to the new page (if there is any live data still) and swap it, then free the original page.
Well, I see here a memory copy operation implemented because "there's paging and it should be used somehow". In the end we have the same amount of memory occupied by the same data but marked as a different page. And what is the advantage here? It's simpler just not to copy it.
No, it is copying because that is how copying collectors work - 'garbage collection' is something of a misnomer here, as it is the live data that is actually collected by a copying collector, with the memory containing any remaining dead data then being freed in bulk. The algorithm works by copying the live data to a reserved area of memory in order to both eliminate garbage and compact the heap; when the copying is done, the reserved area is then marked as the active area, and the previous active area is marked as freed. My idea is simply to use paging to trigger the garbage collector, and to use pages to partition the memory for faster collection cycles.
Of course, copy collection is only one family of algorithms for GC, and has performance characteristics that lend it to certain uses and not to others. In my planned design, not all memory arenas need to use copy collection, but since the 4K pages happen to be a convenient size for the individual arena blocks - just large enough to be useful while still small enough that collecting one would be brief regardless of algorithm, reducing immediate latency - it is simply a useful tool for the hardware-specific part of the memory manager.
The same applies to using page misses to trigger collection - the memory manager has to track when memory is being exhausted within a given arena anyway, and the pager happens to be a convenient mechanism for detecting that. In some cases, this combination may even allow a memory arena to avoid paging a given block out entirely - if the page has no live data, then it can simply be marked as freed, and for arenas using a non-copying algorithm such as mark and sweep, it may be able to clear enough heap to let the memory allocation proceed without actually swapping the page. It is 'just' an optimization, but a practical memory manager probably would require such optimizations to perform adequately, and the OS should be able to apply them when they are available.
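As a sketch of that swap-avoidance case (the names here are hypothetical, not from any real system): when the pager wants to evict a page, the memory manager can consult the arena's liveness metadata first, and a page holding no live objects never needs to touch the disk at all.

```c
/* Hypothetical per-page liveness metadata: one bit per 64-byte slot, so
   a 4K page has 64 slots and the map fits in one 64-bit word. */
#include <stdint.h>

typedef struct {
    uint64_t live_bitmap;    /* set bit = slot holds a live object */
} PageMeta;

typedef enum { EVICT_FREE, EVICT_SWAP } EvictAction;

/* On an eviction request from the pager: a page with no live data is
   simply freed; only pages still holding live objects need swapping. */
static EvictAction plan_eviction(const PageMeta *p) {
    return p->live_bitmap == 0 ? EVICT_FREE : EVICT_SWAP;
}
```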
True, for a portable system, the reference design of an OS should not rely on any specific mechanism, but it should surely allow such a mechanism to be used for a particular implementation. At some point, any software implementation has to apply the mechanisms of the hardware it is running on, and regardless of whether the hardware-specific implementation is in the OS code, in a compiler or other code generator (regardless of when it is running, so long as it is producing the final executable image), or in a pseudo-machine interpreting a byte code, it should be able to use the information it has regarding the capabilities and state of the system to use the hardware efficiently. Ignoring that reality is almost certain to lead to an unusable system.
embryo2 wrote:Schol-R-LEA wrote:Collecting per page, or per a group of pages, also makes a mixed-algorithm approach more feasible - you could have different collection methods for different types of data, for example a set of pages exclusively for elements of the same size and type
Why we can't do the same without paging?
Schol-R-LEA wrote:It even would make it feasible to have some parts of memory garbage collected and not other parts of memory while still maintaining a system-wide memory management policy (the memory management would still be automatic, but other techniques such as reference counting or automatic references could be employed depending on the usage).
Why we can't do the same without paging?
You can, of course, just not as efficiently or securely (as a hardware mechanism, paging can, at least in principle, both run faster than the software equivalents and block uncontrolled access to pages from outside the processes those pages are assigned to when running userland processes - and unlike segmentation, paging is fine-grained enough and automatic enough to be practical for my intended design). If the paging hardware is there (and you may need to use it anyway), why eschew it if it makes the process simpler? It is automating things I intend to do anyway, so it would make little sense not to use it for its intended purpose.
Again, you seem to be thinking in terms of writing a single, wholly abstract operating system that does not address the specific hardware in any meaningful way. Doing this would be to ignore a fundamental aspect of OS design, even for a nominally portable OS. It makes far more sense to see the majority of the abstract OS as a reference design to be specialized later, or at least as a set of templates for the OS services. Any working system would do well to provide an API of hooks which can be used to apply hardware-specific code when it is available.
To repeat one more time: if anyone can give me reasons why I am mistaken about any of this, or can provide any additional illumination on these matters, it would be appreciated.