Opinions On A New Programming Language
Re: Opinions On A New Programming Language
That looks very much like the old style of C declarations.
Re: Opinions On A New Programming Language
~ wrote:That looks very much like the old style of C declarations.
...
Code: Select all
// my language
func main(argc: int, argv: **char) -> int {
...
}
// old style of c decl
main(argc, argv)
int argc;
char **argv;
{
}
Code: Select all
const char *const s = "hello world";
Code: Select all
const s: *const char = "hello world";
Code: Select all
const s: int16ptr -> const char = "hello world"
Code: Select all
const s = "hello world";
Code: Select all
var object: object;
Code: Select all
func fclose(FILE) {
...
}
Re: Opinions On A New Programming Language
You could make it so that your compiler generates debug/information files that store all user identifiers, data types, etc. It could work as a tool to list all functions, variables, data types, and so on, in a program.
You could also generate an information file that shows how each custom data type is resolved, from the top level down to the point where a construct made only of basic data types remains, so that the compiler makes it trivial to see what a data type is really made of.
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Opinions On A New Programming Language
~ wrote:It's a matter of making that assembly language easier to understand, simpler, more portable, like NASM syntax over plain Intel syntax over AT&T syntax.
~ wrote:I've managed the low level problem at least for x86 by making it possible to use the very same NASM assembly code portably to generate 16, 32 or 64-bit code with a header file called x86 Portable, which adds things like wideword, wideax, widebx, widecx, widedx, widesi, widedi, widesp, widebp, a16/a32/awide, o16/o32/owide.
~ wrote:I intend to use x86 Portable always for the code generated by my compiler.
~ wrote:In that way I can generate assembly code with portability at the same level as that of C.
~ wrote:x86 Portable is just a NASM include header file that adds automatically-sized instructions and registers for word/dword/qword/wideword according to the target x86 CPU,
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Re: Opinions On A New Programming Language
~ wrote:You could make it so that your compiler generates debug/information files that store all user identifiers, data types, etc. It could work as a tool to list all functions, variables, data types, and so on, in a program.
You could also generate an information file that shows how each custom data type is resolved, from the top level down to the point where a construct made only of basic data types remains, so that the compiler makes it trivial to see what a data type is really made of.
I already plan on doing that, in the same way that the '-g' switch of the C/C++ compiler generates debugging information.
Re: Opinions On A New Programming Language
About using automatically-sized variables for increased portability: if you think about it, it would be better to use something like intwide_t/uintwide_t than having to decide between standard longs and ints.
With automatic size selection, you bring automatic register/word width portability to any CPU architecture, not just x86, and with that your code would be much cleaner.
I really don't know why types like ptrwide_t/uintwide_t/intwide_t were never added to the C/C++ standard and to assembly language. Code today would be far more portable.
Even though automatic word-width selection is a very important programming concept used in x86, it doesn't seem to be integrated anywhere else, not even in the most recent .NET languages, Java, or anywhere else. I will add those types to my compilers because I know that they alone can simplify the whole programming facilities of any language in existence.
- Octocontrabass
- Member
- Posts: 5486
- Joined: Mon Mar 25, 2013 7:01 pm
Re: Opinions On A New Programming Language
~ wrote:About using automatically-sized variables for increased portability: if you think about it, it would be better to use something like intwide_t/uintwide_t than having to decide between standard longs and ints.
C already has an automatically-sized type optimized for the target CPU's register width: int.
~ wrote:ptrwide_t
C already has this too, though only from C99 onwards. The signed form is intptr_t, and the unsigned form is uintptr_t.
Good news, you don't need to add any types to your compiler to make C portable.
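For illustration, a minimal C99 sketch of what those types look like in practice (nothing here is specific to any one compiler; intptr_t/uintptr_t are technically optional in the standard but near-universal):
Code: Select all
#include <stdint.h>   /* intptr_t / uintptr_t (C99) */
#include <stdio.h>

int main(void) {
    int x = 42;                        /* plain int: the implementation's natural size */
    uintptr_t bits = (uintptr_t)&x;    /* an unsigned integer wide enough to hold the pointer */
    int *back = (int *)bits;           /* round-trips back to the same pointer */
    printf("sizeof(int)=%zu sizeof(uintptr_t)=%zu *back=%d\n",
           sizeof(int), sizeof(uintptr_t), *back);
    return 0;
}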
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Opinions On A New Programming Language
Sigh I really ought to leave this alone, just leave your imbecilic ranting buried under the Killfile, but noooooo...
~ wrote:About using automatically-sized variables for increased portability: if you think about it, it would be better to use something like intwide_t/uintwide_t than having to decide between standard longs and ints.
That is exactly what int and long already are! The standard C numeric types are (IIRC) defined as:
char - an integer value of at least eight bits, capable of holding an ASCII character; its exact width is defined by the platform and the specific compiler.
short int, or short - an integer value no smaller than the size of a char, and no larger than the size of an int.
int - an integer value no smaller than a short, but no larger than a long.
long int or long - an integer value no smaller than an int.
float - a floating-point type, preferably matching one of the native FPU formats, not larger than a double
double - a floating-point type, preferably matching one of the native FPU formats, not smaller than a float
In all cases, the signed versions would use (one of) the native sign method(s) of the CPU's arithmetic operations, though most C code assumes 2's-complement wherever it matters, because very, very few CPUs after the mid-1960s used anything else.
This ambiguity was deliberate, because C doesn't just run on x86 systems, or even just on 32-bit and 64-bit systems. The language standard leaves a lot of wiggle room, in part to avoid deprecating a lot of existing code, but mostly because its authors couldn't predict the hardware it would be used on.
However, this came at the price of exactitude in the language - you could have two compilers, on different systems or even on the same system, which used different bit widths. The fixed-size integer types - int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t - were introduced to bring back the predictability, but at the cost of - you guessed it - the very flexibility you are looking for. The flexibility, I might remind you, which is still part of the language with the native-sized types.
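To make that trade-off concrete, here is a short C99 check - it simply reports whichever widths your particular compiler happened to pick:
Code: Select all
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Native types: the standard only guarantees minimum ranges. */
    printf("short=%zu int=%zu long=%zu long long=%zu bytes\n",
           sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
    /* Fixed-width types: exact sizes everywhere, at the cost of that flexibility. */
    printf("int8_t=%zu int16_t=%zu int32_t=%zu int64_t=%zu bytes\n",
           sizeof(int8_t), sizeof(int16_t), sizeof(int32_t), sizeof(int64_t));
    return 0;
}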
Now, on the 32-bit x86, and most other 32-bit systems, compilers usually mapped these as char - byte, short - two bytes, int - four bytes, long - usually 4 bytes, but occasionally 6 or 8 bytes (in either case, some special data juggling would be needed to use a bit width greater than the largest native value). However, this was just a convention, not a part of the language definition, or even a necessary method on the x86.
When 64-bit systems started to appear, this led to an obvious compatibility problem for code that mistakenly assumed a 32-bit long, so the long long int type was added, and (IIRC) defined as 'no shorter than long and at least 64 bits'. One wonders what will happen if 128-bit registers, addresses, and data paths ever become widely used...
For addresses - or rather, pointers - things get a bit trickier, because some CPU ISAs have multiple pointer sizes, either for different memory modes (as in the x86) or for different kinds of addressing (I can't think of any examples OTTOMH, but I understand it was something known to come up in things like the Burroughs mainframes). The default assumption is that addresses are all of a fixed size for a given CPU mode, that the address space is flat, and that the CPU won't change modes or mix memory models at runtime. This led to some complications for the MS-DOS C compilers, which meant that you generally needed to select a memory model at compile time, but again, the standard allowed for this. Hence the 'tiny' (single code/data/stack segment), 'small' (single code and data segment, separate stack segment), 'standard' (single code segment, separate data and stack segments), 'big' (multiple code segments, one each for data and stack), and 'huge' (anything goes, but all pointers are FAR, and watch the segment boundaries) models used by most x86 compilers of the era.
I am, as I said, working mostly from memory, so I probably have some of this wrong or out of date. Comments and corrections welcome.
~ wrote:With automatic size selection, you bring automatic register/word width portability to any CPU architecture, not just x86, and with that your code would be much cleaner.
I really don't know why types like ptrwide_t/uintwide_t/intwide_t were never added to the C/C++ standard and to assembly language. Code today would be far more portable.
Well, gee, we can just use the type tags too... oh, wait, you tell me that the x86 doesn't support tagged memory, and neither do any of the other major general-purpose ISAs today? Shocking!
(Actually, it is shocking, or at least unfortunate, but the truth is that tagged memory has never been common, mainly for reasons only tangentially related to the technique itself. I would love to see a tagged memory architecture go into widespread use, but I don't expect it to ever happen.)
Write this on the blackboard 1000 times, ~: THERE ARE NO DATA TYPES IN x86 ASSEMBLY.
None. They don't exist. There are data sizes, but they only exist to tell the CPU how many bytes to fetch, and, more importantly, they are part of the opcode. The assembler syntax may show them as modifiers, rather than giving them separate mnemonics, but in the actual machine code, they are part of the instruction. Changing the value after it is assembled would require changing the executable image - either by putting patch tags in for the linker to reset them, or by using self-modifying code at runtime.
It can be done, but it would require either changing the object format to support it, or adding extra code to perform the runtime modification - not too hard in real mode, though it would be difficult to do safely and consistently, but in protected mode or long mode it would require a system call asking the kernel to remove the 'executable, read-only' flags on the page in question, make the modification, and set the flags back before returning to the user application.
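As a very rough sketch of that system-call dance on a POSIX-style system (the mprotect calls are real, but patch_code is an invented name, error handling is minimal, and a strict W^X policy may simply refuse the request):
Code: Select all
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>   /* mprotect, PROT_READ/PROT_WRITE/PROT_EXEC */
#include <unistd.h>     /* sysconf */

/* Overwrite 'len' bytes of code at 'target': unprotect the page(s), patch, re-protect. */
static int patch_code(void *target, const void *bytes, size_t len)
{
    uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)target & ~(page - 1);      /* round down to a page boundary */
    size_t    span  = ((uintptr_t)target + len) - start;

    if (mprotect((void *)start, span, PROT_READ | PROT_WRITE) != 0)
        return -1;                       /* kernel said no */
    memcpy(target, bytes, len);          /* the actual self-modification */
    return mprotect((void *)start, span, PROT_READ | PROT_EXEC);
}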
Better, I would say, to do it all in a high-level language - and C doesn't count as one, really - which can abstract the numeric types entirely, either by default or by specific syntax, and leave all this work to the compiler.
~ wrote:Even though automatic word-width selection is a very important programming concept used in x86,
WUT?!?! Which x86 are you talking about? There is no such thing in x86! This is one of the most absurd statements you have ever made, and frankly, that's astounding by itself!
~ wrote:it doesn't seem to be integrated anywhere else, not even in the most recent .NET languages, Java, or anywhere else. I will add those types to my compilers because I know that they alone can simplify the whole programming facilities of any language in existence.
Actually, those are now more the exception than the rule. Dynamic languages - both older ones like SNOBOL, Icon, Prolog, and almost the entire Lisp family, and newer ones like Python, Ruby, and Io (and newer Lisps such as Clojure) - all use flexible numbers of one variety or another. Since there is no express typing, the compiler or interpreter is free to select the appropriate size and representation, and to resize it when something bigger or less precise is needed - often with some sort of arbitrary-precision data types (AKA BigInt, BigFixed, and BigFloat types) used when there is an overflow. They generally have a 'numeric tower', or a class hierarchy that serves the same purpose, and will generally go from
unnamed max supported system integers -> integer bignums -> rational numbers -> fixed-point bignums -> max system supported floating-point numbers -> floating-point bignums -> complex numbers
Not all have all of these, but most have at least system longs, bignum ints, and max floats. Some may have a Binary Coded Decimal fixed-point type as well, or (as Python does) have libraries to support them. When they do support rationals, fixnums, fixed BCDs, or complex numbers, they will generally either size the components dynamically, or just use bigints (for the numerator and denominator of rationals, or for the underlying integer value of fixnums), a BCD equivalent (for big BCDs), or any of the above for the real and imaginary parts of complex numbers.
This doesn't apply only to implicitly-typed languages, either. Haskell, Erlang, Go, and (I think) Rust all have some ability to work in this way, though it isn't the default behavior in any of them and they all have their quirks. For example, while Haskell does require typing for all data, it allows you to define 'typeclasses' for groups of similar types - something almost, but not quite, completely unlike an OOP class hierarchy, but serving basically the same purpose; the Haskell compiler or interpreter applies type inference to determine what the actual type of a datum is, and uses that type to hold the value.
Even C# has some ability to do this now, with the 'var' types, though I have no idea whether it does any kind of type resolution either at compile time or run time. I assume so, because that's sort of an important feature of a well-designed OOP language with sensible support for polymorphism (not all do - I am looking at you, C++), but I can't be bothered to check.
Mind you, bignums are usually orders of magnitude slower than the native machine types, so these languages generally let you coerce a value to a specific size and representation when you need to, at least to some extent.
Each language has its own way of handling (or not handling) ambiguities and loss of exactitude, but for the most part they do a decent job of it for most purposes, and most can force an inexact representation of an exact value, or vice versa, at the possible cost of precision (e.g., when going from a rational '1/3' to a floating-point '0.333333...').
In all of these cases, this sort of numeric flexibility depends on two things: a practical separation of the language definition from the language implementation, and a conceptual separation of the data from the variables. This means that in all of these languages, the assumed structure is that variables are references - either typed or untyped, depending - to typed data objects, but the translator (compiler, interpreter, whatev') is free to 'snap pointers' and use the memory of the references to hold suitably small elements, so long as doing so doesn't change the semantics or run-time behavior.
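Under the hood that usually comes down to pointer tagging - the sort of machinery a language runtime has to build for itself, since C gives you nothing like it out of the box. A toy C sketch of the idea, with every name invented for illustration (it assumes two's complement and at least one free alignment bit, which holds on all mainstream targets):
Code: Select all
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A 'value' is one machine word: low bit 1 = small integer stored inline,
   low bit 0 = pointer to a heap object (alignment keeps that bit free).   */
typedef uintptr_t value;

static value    box_int(intptr_t n) { return ((uintptr_t)n << 1) | 1u; }
static intptr_t unbox_int(value v)  { return (intptr_t)v >> 1; }   /* arithmetic shift assumed */
static value    box_ptr(void *p)    { return (uintptr_t)p; }
static int      is_int(value v)     { return (int)(v & 1u); }

int main(void) {
    int heap_obj = 99;
    value a = box_int(-7);
    value b = box_ptr(&heap_obj);
    assert(is_int(a) && !is_int(b));
    printf("%ld %d\n", (long)unbox_int(a), *(int *)b);
    return 0;
}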
As you can see, this just doesn't fit C well, and doesn't fit assembly language - which, by definition, approximates a one-to-one correspondence between mnemonics and machine opcodes.
I am done for now, but I have a feeling that you are going to say more foolish things in the near future.
Last edited by Schol-R-LEA on Sat Dec 09, 2017 12:35 pm, edited 7 times in total.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Opinions On A New Programming Language
~ wrote:ptrwide_t
Octocontrabass wrote:C already has this too, though only from C99 onwards. The signed form is intptr_t, and the unsigned form is uintptr_t.
Ah, thank you, I had a feeling I was missing things.
I also forgot to mention wchar_t, but that's its own special flavor of insanity. I am guessing that, in the absence of other specifying information, most compilers - but not all - use a four-byte or larger value for those, in case a UTF-32 character, or a UTF-8 character with modifiers, comes along to make a hash of the size assumptions. And that's without even taking into consideration the fact that C and C++ added different types with the same type name... meh, at least they tried, I guess, which is more than I can say for the Unicode committee themselves over the past several years.
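A one-line C check shows what your own toolchain decided (typically 4 bytes with glibc on Linux and 2 bytes on Windows):
Code: Select all
#include <stddef.h>   /* wchar_t */
#include <stdio.h>

int main(void) {
    printf("sizeof(wchar_t) = %zu bytes\n", sizeof(wchar_t));
    return 0;
}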
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Re: Opinions On A New Programming Language
So, a few things I should mention right quick:
- The 'intptr' type in my language is not the same as an 'intptr' in C. In C, it means an integer that's large enough to hold a pointer; in my language, it's a pointer that's small enough to fit into an integer.
- The number of bits that an integer can have is arbitrary. You could create an 'int47' if you wanted. All that tells the compiler is that the size of the value (as stored in memory) is 47 bits. For doing arithmetic, it'll promote up to the next largest hardware-accelerated integer size (64 bits), perform the operation, and truncate the result back to 47 bits. This behavior makes bitfields unnecessary (see the sketch below).
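Roughly, that lowering could look like the following C sketch; the helper names int47_wrap/int47_add are invented for illustration and are not part of the actual compiler:
Code: Select all
#include <stdint.h>
#include <stdio.h>

#define INT47_BITS 47

/* Keep the low 47 bits and sign-extend bit 46 back into the upper bits. */
static int64_t int47_wrap(int64_t x) {
    uint64_t mask = (UINT64_C(1) << INT47_BITS) - 1;
    int64_t v = (int64_t)((uint64_t)x & mask);          /* value in [0, 2^47) */
    if (v & (INT64_C(1) << (INT47_BITS - 1)))           /* bit 46 set: negative in 47-bit terms */
        v -= INT64_C(1) << INT47_BITS;                  /* sign-extend by subtracting 2^47 */
    return v;
}

/* Promote to the next hardware size (64 bits), operate, truncate back to 47 bits. */
static int64_t int47_add(int64_t a, int64_t b) {
    return int47_wrap(a + b);
}

int main(void) {
    int64_t max47 = (INT64_C(1) << (INT47_BITS - 1)) - 1;   /* 2^46 - 1 */
    printf("%lld\n", (long long)int47_add(max47, 1));       /* wraps to -2^46 */
    return 0;
}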
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Opinions On A New Programming Language
The real problem is that ~ isn't talking about portability in the sense anyone else ever understands the term. What he means by 'portability' is, as far as I can tell, 'using the same x86 assembly source code regardless of system mode'.
Or perhaps, given his obsession with MS-DOS-like systems, real mode programming, and resurrecting pre-Windows95 DOS Extender based games, "being able to turn compiled Windows programs into something that will work in my Vastly Superior CLI-based DOS Extender, which everyone will immediately realize is the ideal way to use a computer and drop all of this GUI nonsense".
Maybe.
In ~'s world, portability to other architectures doesn't matter, because x86 is the be-all and end-all of computing, and nothing else exists - or if they did, everyone in the world would still always use x86 assembly programming when writing programs targeting them, for Reasons.
Because ARM runs legacy MS-DOS programs natively, right?
Similarly, to him C is the One True High-Level Language, and anyone talking about any other language is in a state of sin.
You know, I might be projecting my personal opinions about his early statements on him here, just a bit.
Oh, well, at least he doesn't spew paranoid, racist and homophobic BS, unlike A Certain Now-Banned OS Dev who is awaiting trial at the moment, nor does he claim to be making money off of a closed-source OS that runs on the World's Slowest Virtual Machine Interpreter for a CPU design that most people consider to be of purely academic interest but which another now banned individual insists is inherently easier to program for and more cost effective to produce a hardware implementation of than any other ISA, for Reasons. But that's little comfort for the rest of us.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Re: Opinions On A New Programming Language
Writing assembly code is only fun if you're fighting space and time.
You couldn't pay me enough to program in assembly code these days. Instruction sets are way too obtuse, there are too many layers of abstraction over the hardware, hardware is too varied and has little to no documentation, and pesky memory protection kills the fun of using thunks to dynamically load stuff to crunch the spatial requirements.
- Schol-R-LEA
- Member
- Posts: 1925
- Joined: Fri Oct 27, 2006 9:42 am
- Location: Athens, GA, USA
Re: Opinions On A New Programming Language
Fair enough. Sorry to keep re-de-railing your discussion, BTW; it's just that a lot of what Tilde says is so outrageous, and he is so stridently insistent on interfering in other members' posts, that it is hard not to confront him about it.
That having been said, my point about dynamic type resolution might be of interest here, in light of your discussion earlier with Solar about virtual vs. non-virtual methods. My view is that, for polymorphism in both functions and types, it is something better bound as late as possible - possibly as late as runtime. The trick here is for the compiler to identify cases where polymorphism isn't necessary - cases where the type, or at least its size constraints, are known ahead of time so it can resolve them early and choose a fixed representation that is more efficient.
Maybe this is a Lisp-ish view, as non-toy Lisp compiler implementations are full of this sort of thing, and some are even used by (non-toy) interpreters. The highly dynamic approach inherent to Lisp makes them necessary for anything like acceptable performance. Even things as basic to the language as lists are often not what they seem to be - nearly every serious Lisp interpreter since the days of the MIT AI Lab (circa 1965) has used 'CDR-coding' to snap the pointers for lists when they are created, turning the linked lists into tagged arrays of elements (which could be addresses to a datum or another list, or any tagged datum that would fit into the same space as an address, or in some cases more exotic things such as tagged indexes into blocks of homogeneous data elements which could be treated as a unit by the garbage collector) as a way of reducing both space and time overhead.
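For anyone who hasn't run into the idea before, here is a deliberately simplified toy of that flavor of CDR-coding, written in C; real implementations pack the tag into the cell's word itself, and every name here is invented purely for illustration:
Code: Select all
#include <stdio.h>
#include <stdlib.h>

/* Each cell's tag says where its cdr is: the next slot in the same block,
   nowhere (end of list), or an explicit pointer elsewhere.                */
typedef enum { CDR_NEXT, CDR_NIL, CDR_NORMAL } cdr_tag;

typedef struct cell {
    cdr_tag tag;
    int car;              /* payload; a real Lisp stores a tagged machine word here */
    struct cell *cdr;     /* only consulted when tag == CDR_NORMAL */
} cell;

/* Build an n-item list as one contiguous block: no per-cell cdr pointers needed. */
static cell *make_list(const int *items, size_t n) {
    cell *block = malloc(n * sizeof *block);
    for (size_t i = 0; i < n; i++) {
        block[i].car = items[i];
        block[i].tag = (i + 1 < n) ? CDR_NEXT : CDR_NIL;
        block[i].cdr = NULL;
    }
    return block;
}

static const cell *rest(const cell *c) {
    if (c->tag == CDR_NEXT)   return c + 1;    /* implicit pointer: the adjacent slot */
    if (c->tag == CDR_NORMAL) return c->cdr;   /* fall back to a real pointer */
    return NULL;                               /* CDR_NIL: end of the list */
}

int main(void) {
    int xs[] = { 1, 2, 3 };
    cell *head = make_list(xs, 3);
    for (const cell *p = head; p; p = rest(p))
        printf("%d ", p->car);
    printf("\n");
    free(head);
    return 0;
}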
In my view, it would make sense to have virtual (or at least potentially virtual) be the default, and have some modifier indicate when something has to be resolved early. Similarly, a variable in a 'parent class', 'interface', 'template', or 'typeclass' could be resolved as the actual type/class if the compiler can determine early on that it can only be an object of a certain sub-class, even if the programmer can't be certain.
Or, in the same spirit as 'CDR-coding', it could make an object of a parent class, and use a pointer to the parts not shared by the child classes. This could involve the compiler having to determine which functions/methods can be kept as-is, and which need to be in the vtable or equivalent. This means that the class runtime implementations may well differ from program to program, or even from one compilation of the same program to the next.
(And, since in my OS I mean to allow both load-time resolution and runtime code synthesis, it may vary between loads of the same program, or even over time as it runs in the case of some system-service-related objects - but that's getting ahead of things, as it isn't clear if it is really applicable to modern hardware at all.)
Basically, my goal in my own language(s) is to allow the programmer to make these decisions, but not to require them to do so if they aren't directly relevant to the programmer's needs - basically, allow optimization without forcing premature optimization. This is contrary to both the C/C++ school of thought, where the programmer has to make these decisions in all cases, and to the Java/C# school, where such decisions are often taken out of their hands (while anomalously still requiring them to make some of them without a sensible default). It is a fine line to walk, and I will admit that it may not be possible, but I do mean to try.
I wouldn't blame you in the least for being a bit more conservative in your design.
On the topic of type or class polymorphism, I was also wondering if you had considered going with the sort of type inference found in Haskell and Rust. This relates back to the previous assertion, in that it is essentially a way of exposing that mechanism for use by the developers of new types or classes.
I was also wondering if you were familiar with predicate dispatch, and in particular Millstein's seminal paper on the topic, and whether you saw it as something that would fit in with your design.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Re: Opinions On A New Programming Language
Schol-R-LEA wrote:That having been said, my point about dynamic type resolution might be of interest here, in light of your discussion earlier with Solar about virtual vs. non-virtual methods. My view is that, for polymorphism in both functions and types, it is something better bound as late as possible - possibly as late as runtime. The trick here is for the compiler to identify cases where polymorphism isn't necessary - cases where the type, or at least its size constraints, are known ahead of time so it can resolve them early and choose a fixed representation that is more efficient.
It's definitely better in terms of flexibility and simplicity. It's one of the things I love about languages like Lisp and Self (Smalltalk). The downside, though, is that you pay for the overhead of runtime compilation and dynamic binding. This specific language is intended for the same niche as C/C++, so that's a tradeoff I'm not willing to make.
That being said though, I'm also working on a Self-based command language (like the Unix bash shell) in which this will be relevant.
Schol-R-LEA wrote:Maybe this is a Lisp-ish view, as non-toy Lisp compiler implementations are full of this sort of thing, and some are even used by (non-toy) interpreters. The highly dynamic approach inherent to Lisp makes them necessary for anything like acceptable performance. Even things as basic to the language as lists are often not what they seem to be - nearly every serious Lisp interpreter since the days of the MIT AI Lab (circa 1965) has used 'CDR-coding' to snap the pointers for lists when they are created, turning the linked lists into tagged arrays of elements (which could be addresses to a datum or another list, or any tagged datum that would fit into the same space as an address, or in some cases more exotic things such as tagged indexes into blocks of homogeneous data elements which could be treated as a unit by the garbage collector) as a way of reducing both space and time overhead.
It sounds more or less like it's just an optimization for linked lists that happens to work because the lists in Lisp are value types (immutable). In my language, they're references by default, like in C/C++, but can be made into value types by adding a $ sigil:
Code: Select all
func cons(item: int, list: $[int]) -> $[int] {
    return [int] { item, list }
}
Schol-R-LEA wrote:In my view, it would make sense to have virtual (or at least potentially virtual) be the default, and have some modifier indicate when something has to be resolved early. Similarly, a variable in a 'parent class', 'interface', 'template', or 'typeclass' could be resolved as the actual type/class if the compiler can determine early on that it can only be an object of a certain sub-class, even if the programmer can't be certain.
Yeah, as I mentioned before, it's certainly debatable. Being virtual never causes problems, but being non-virtual makes a system more rigid and hard to change. That being said, I also think it's important to base the design of the language on what we do empirically rather than on a one-size-fits-all approach. It would get old rather quickly if I had to constantly sprinkle keywords all over the place to improve performance because the compiler is making assumptions that are rarely the case.
Schol-R-LEA wrote:(And, since in my OS I mean to allow both load-time resolution and runtime code synthesis, it may vary between loads of the same program, or even over time as it runs in the case of some system-service-related objects - but that's getting ahead of things, as it isn't clear if it is really applicable to modern hardware at all.)
These are definitely on my workbench. I'm thinking long and hard about things like allowing the language to modify its own syntax trees before evaluation/compilation (like Lisp does).
Schol-R-LEA wrote:Basically, my goal in my own language(s) is to allow the programmer to make these decisions, but not to require them to do so if they aren't directly relevant to the programmer's needs - basically, allow optimization without forcing premature optimization. This is contrary to both the C/C++ school of thought, where the programmer has to make these decisions in all cases, and to the Java/C# school, where such decisions are often taken out of their hands (while anomalously still requiring them to make some of them without a sensible default). It is a fine line to walk, and I will admit that it may not be possible, but I do mean to try.
I certainly agree with the philosophy, but as I stated, I think the true path here is to design the language based on what the programmer is usually doing rather than trying to create a one-size-fits-all tool.
Schol-R-LEA wrote:On the topic of type or class polymorphism, I was also wondering if you had considered going with the sort of type inference found in Haskell and Rust. This relates back to the previous assertion, in that it is essentially a way of exposing that mechanism for use by the developers of new types or classes.
Yeah. That's actually why there's a 'var' keyword in the language:
Code: Select all
var foo = "bar";
Schol-R-LEA wrote:I was also wondering if you were familiar with predicate dispatch, and in particular Millstein's seminal paper on the topic, and whether you saw it as something that would fit in with your design.
The language currently supports it in the sense of overloaded functions and UFCS (Uniform Function Call Syntax), and I'm currently working on an implementation of constrained type substitution. Originally, I had planned a syntax like:
Code: Select all
func add(x: $T, y: T) -> T;
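For comparison only, C11's _Generic can fake a similar compile-time dispatch on an argument's type - an analogy, not the semantics planned for $T:
Code: Select all
#include <stdio.h>

static int    add_int(int a, int b)          { return a + b; }
static double add_double(double a, double b) { return a + b; }

/* Pick an implementation at compile time based on the type of the first argument. */
#define add(x, y) _Generic((x), int: add_int, double: add_double)((x), (y))

int main(void) {
    printf("%d %g\n", add(2, 3), add(2.5, 3.25));
    return 0;
}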
A lot of good suggestions there btw.
Re: Opinions On A New Programming Language
I've been thinking a bit about a new memory management model. In it, objects have owners and borrowers.
An owner is a function that can:
- destroy the object
- write to the object
- lend the object to a borrower
- give the object to a new owner
An object can be passed to a function in one of three ways:
- Copy - Equivalent to "pass by value", this gives the function a copy of an object, for which it is the sole owner. The notation for a copy is a $ sigil.
- Link - This is a kind of "pass by reference" in which a function is allowed to borrow an object. The notation for a link is a % sigil.
- Move - This is a kind of "pass by reference" in which a function is given ownership of an object. The notation for a move is a / sigil.
Code: Select all
func strdup(str: %cstr) -> /cstr;
By default, all primitives like integers and strings are copied, and all non-primitives like arrays and structs are linked. The sigils are only necessary if you need to override that behavior, such as in the above example, where the caller is responsible for destroying the return value.
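For comparison, the closest plain C gets to that contract is convention rather than notation - roughly the sketch below, where the const char * parameter plays the role of the % borrow and the malloc'd return value is the / move that the caller must eventually free (my_strdup is just an illustrative name):
Code: Select all
#include <stdlib.h>
#include <string.h>

/* Borrows 'src' for the duration of the call; the returned buffer is owned by
   the caller, who is responsible for free()ing it. Nothing in the C type system
   enforces either half of that contract - it lives in the documentation.       */
char *my_strdup(const char *src)
{
    size_t n = strlen(src) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, src, n);
    return copy;
}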