
Program vs. data in memory

Posted: Sat Aug 24, 2019 1:24 am
by Seahorse
I decided to read the book "Programming From The Ground Up" and it talks about program instructions and data both living in memory. Even though I get this, I'm still trying to understand the technological implications of that. Is it only saying that you can access both in the same way?

Re: Program vs. data in memory

Posted: Sat Aug 24, 2019 3:57 am
by bzt
Seahorse wrote:I decided to read the book "Programming From The Ground Up" and it talks about program instructions and data both living in memory. Even though I get this, I'm still trying to understand the technological implications of that. Is it only saying that you can access both in the same way?
Yes, that's one of the von Neumann principles (in contrast to the Harvard architecture, where instructions are stored separately, not in the same memory as data).

Neumann's original idea was that this architecture would allow self-teaching computers, but we now know that it also allows viruses and malicious buffer-overflow attacks. So most machines have some sort of protection that allows only designated parts of memory to be interpreted as instructions (segmentation, paging access rights, etc.). To learn more, read The Computer and the Brain by von Neumann.

Cheers,
bzt

Re: Program vs. data in memory

Posted: Sat Aug 24, 2019 2:36 pm
by Schol-R-LEA
I am going to recommend something unusual for this site (though not for me, as others here will tell you), and recommend a book and series of lectures well removed from the nuts and bolts aspects of systems programming, one famous (or infamous) for its ivory-tower approach to programming: Structure and Interpretation of Computer Programs by Hal Abelson and Gerry Sussman, and the companion video lecture series (which are also available on YouTube).

OK, I can already hear the puzzled (and/or angry) reactions to this, but hear me out.

Despite its title, the book is actually an introductory textbook on Computer Science, one which was used at MIT from 1982 to 2008 for their first two semesters of Comp Sci (the videos are from 1986 and cover the first edition of the book, but they are still mostly relevant to the second edition, which is the one available for free from the MIT website).

The relevant part here is that it uses Scheme (a language in the Lisp family) for all its code examples, and Scheme, like all Lisps, focuses heavily on code generation, code transformation (though they don't cover syntactic macros, which is the usual system in Lisp languages for code transformation - not to be confused with the textual macros of languages such as C, BTW) and higher-order functions. Later chapters of the book (and later lectures in the series) discuss not only how to interpret Scheme code in Scheme and how to compile it to a toy machine instruction set (of a sort), but also how to write domain-specific languages in Scheme (including a sort of rump Prolog interpreter and a picture manipulation library/embedded mini-language).

This is in large part because Lisps are homoiconic - that is to say, Lisp code as interpreted/pre-compiled is itself a first-class Lisp data structure (specifically, a heterogeneous singly-linked list of atoms and other lists). That is to say, Lisp programs are both code and data at the same time, and (in interpreted form) can be manipulated as either one without specialized introspection tools. Furthermore, even in compiled Lisp, executable functions are first-class data structures, meaning that a function can take another function as an argument, or return a function as its result, without resorting to explicit pointer manipulation or other work-arounds.

In other words, the book and lectures are all about the duality of code and data.

It's a pretty amazing book, really, given that it starts from almost nothing and runs non-stop to cover some of the most advanced topics in the field. However, most people find it a tough slog to read through, and many a college lecturer has failed to give it the energy and life it needs to work as a course textbook, which is part of why it has such a terrible reputation in some circles.

Now, I don't want to mislead you, they discuss the topic in a way far, far removed from the manner in which code and data are stored in a von Neumann architecture computer at the hardware level, and compiled Lisp code is an executable binary like any other. But going through at least a few of the lecture videos - say, the first four or five days of the lectures, or 10 videos of about an hour apiece - and the first two massive chapters of the text should give you a helpful, if somewhat unconventional, grasp of the concept.

If nothing else, it shows a side of, and approach to, software development with which most programmers are almost entirely unfamiliar.

For something a bit closer to the topic, you might want to read Alexia Massalin and Calton Pu's work on the Synthesis operating system, which uses an abstraction called 'quajects' to allow code to be generated at run-time from a series of templates by the OS in a secure manner.

Or just, you know, read up on how executable binaries are loaded into a process's memory, and how the memory pages holding code are set to executable mode. For most systems and uses, that is about the extent of the code/data interplay in modern computer systems, though you'll occasionally run into some piece of dynamically-generated code in the form of a thunk or a trampoline.

Some reading up on more conventional types of compilers and assemblers might help, too, as they naturally have to produce object-code binaries as their primary output. Conversely, linkers have to take said object files as their primary input and output an executable file (or an executable image in memory, for a run-time or dynamic/shared-library linker). I recommend David Salomon's Assemblers and Loaders (available for free from the author's website) and Grune, et al.'s Modern Compiler Design for the former, and John Levine's Linkers and Loaders (his website keeps an incomplete beta version available for free) for the latter.

For that matter, just a good textbook on assembly programming, such as Assembly Language Step by Step, would go far in explicating the matter. Though if you can find a good one on, say, MIPS or ARM assembly, you might find it easier to grasp than the Lovecraftian horror that is the x86 architecture (MIPS Assembly Language Programming and See MIPS Run would be my recommendations there, though Raspberry Pi Assembly for Raspbian isn't bad either, and finding live hardware to practice on is obviously neither difficult nor expensive).

Re: Program vs. data in memory

Posted: Tue Aug 27, 2019 5:07 pm
by linguofreak
Seahorse wrote:I decided to read the book "Programming From The Ground Up" and it talks about program instructions and data both living in memory. Even though I get this, I'm still trying to understand the technological implications of that. Is it only saying that you can access both in the same way?
That, and everything that that implies.

On the one hand, it means that code that loads a file from disk to a specified address in memory doesn't have to care whether the file contains data or executable code; it uses the same instructions either way. On the other hand, it means that instruction fetches and data accesses have to share the bus, since any part of memory can hold both code and data. It also heightens the security implications of certain classes of bugs: if a program accepts more data from the user than it expects, an attacker can cause code to be overwritten with arbitrary bytes of their choosing, which can themselves be code that does something nasty (real attacks tend to be a bit more nuanced than that, but that should give you the general picture).

Because of the flexibility advantages and the security and memory-bandwidth disadvantages of the von Neumann architecture, computers these days tend to be von Neumann at the level of the CPU instruction set and motherboard layout, but Harvard at the level of memory management (allowing OSes to impose a Harvard architecture on programs even though the computer itself is von Neumann) and caches (so that commonly used code and data are stored separately and don't have to compete for cache bandwidth, even though less commonly used code and data are stored together and share bandwidth to RAM).

Re: Program vs. data in memory

Posted: Wed Aug 28, 2019 8:46 pm
by Seahorse
linguofreak wrote:
Seahorse wrote:I decided to read the book "Programming From The Ground Up" and it talks about program instructions and data both living in memory. Even though I get this, I'm still trying to understand the technological implications of that. Is it only saying that you can access both in the same way?
That, and everything that that implies.

On the one hand, it means that code that loads a file from disk to a specified address in memory doesn't have to care whether the file contains data or executable code; it uses the same instructions either way. On the other hand, it means that instruction fetches and data accesses have to share the bus, since any part of memory can hold both code and data. It also heightens the security implications of certain classes of bugs: if a program accepts more data from the user than it expects, an attacker can cause code to be overwritten with arbitrary bytes of their choosing, which can themselves be code that does something nasty (real attacks tend to be a bit more nuanced than that, but that should give you the general picture).

Because of the flexibility advantages and the security and memory-bandwidth disadvantages of the von Neumann architecture, computers these days tend to be von Neumann at the level of the CPU instruction set and motherboard layout, but Harvard at the level of memory management (allowing OSes to impose a Harvard architecture on programs even though the computer itself is von Neumann) and caches (so that commonly used code and data are stored separately and don't have to compete for cache bandwidth, even though less commonly used code and data are stored together and share bandwidth to RAM).
Is the OS-imposed Harvard-style architecture similar to a virtual architecture, then? An analogy with virtual memory and that sort of thing? That's what it sounds like.

Re: Program vs. data in memory

Posted: Wed Aug 28, 2019 10:42 pm
by linguofreak
Seahorse wrote:
Is the OS-imposed Harvard-style architecture similar to a virtual architecture, then? An analogy with virtual memory and that sort of thing? That's what it sounds like.
Basically, the hardware that provides virtual memory on modern CPUs also provides, as part of that, a way for the OS to specify whether the contents of each page can be written, read, and executed.