Programs without arguments

Jezze · Post by **Jezze** » Wed May 02, 2012 4:18 pm

All right. I have just recently started something that I might come to regret. In my world it makes sense somehow, but I'm often wrong (hey, I don't have an IQ of 160), and I like the feedback I get here, so I will try to explain the reasoning behind my decision to the best of my ability.

What I've done is remove the ability to pass program arguments to my userspace programs. This means that instead of the usual:

Code: Select all

unsigned int main(int argc, char **argv)
{

    return 0;

}

Now all my programs must look like this:

Code: Select all

void main()
{

}

When a program is started with the system call execute(), it automatically inherits all the open file descriptors from its parent, so it shares STDIN and STDOUT and any other open files the parent has. If it needs another file it can use fopen() just as on any other UNIX operating system, but those descriptors will be local to the child itself.

To illustrate what I mean, let's take the program cat. Normally you would give cat some arguments saying what files it should concatenate. My implementation of cat does not have any notion of what files it should work on (it does not get any arguments with the filenames); all it needs to know is that it can read data from some predetermined file descriptor like stdin and print it to stdout. This is pretty much the same as a normal pipe.

A simple implementation could look like this:

Code: Select all

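void main()
{

    char buffer[0x1000];
    unsigned int count;

    /* First inherited file: copy up to 0x1000 bytes to stdout. */
    count = fread(3, 0x1000, buffer);
    fwrite(STDOUT, count, buffer);

    /* Second inherited file: same again. */
    count = fread(4, 0x1000, buffer);
    fwrite(STDOUT, count, buffer);

}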

This version of cat is very limited in that it only prints the first 0x1000 bytes of its file descriptors 3 and 4 (and not stdin, as you might have noticed).

I like this way of writing programs because it almost forces programs to do one thing and do it well, which is what the UNIX philosophy is about (I've heard). If a program has no options, it can't really do much more than one thing. Many programs on modern UNIXes have a shitload of different parameters, and you often don't remember or use half of them anyway. Without arguments you need many smaller programs to do all the things one typical program could do, but you only use a subset of that typical program anyway.

Another good thing about not having arguments is that when calling execute() I don't need to copy the arguments from the parent's memory space to the child which I would have to do otherwise. This saves a few lines of code so it's nothing important but it separates the processes even more which I like because now I don't need to add stuff to the stack or the heap or wherever you might store the arguments.

Also it makes the implementation of a shell quite interesting. Given an input string like this:

Code: Select all

$ cat myfile.txt anotherfile.txt

All the shell has to do is call fopen() on "myfile.txt" and "anotherfile.txt" before calling execute("cat"), and cat, when started, will automatically print the contents of the supplied files without needing to know their names. It would be just a little bit more complicated to add pipes and redirects, but still pretty simple.
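For readers who want to try the idea on a POSIX system, the shell side and the cat side could be sketched roughly as below. This is only an illustration under assumptions: it uses fork(), dup2() and execlp() instead of the execute() call described in this post, and the names open_as(), run_with_files() and cat_descriptors() are hypothetical, not part of the author's OS.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Shell side: open `path` read-only and move it onto descriptor `target`. */
int open_as(const char *path, int target)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fd != target) {
        int moved = dup2(fd, target);
        close(fd);
        if (moved < 0)
            return -1;
    }
    return 0;
}

/* Shell side: start `program` with paths[0..count-1] preopened as
 * descriptors 3, 4, ... and nothing passed beyond the conventional argv[0]. */
pid_t run_with_files(const char *program, const char *const paths[], int count)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (int i = 0; i < count; i++)
            if (open_as(paths[i], 3 + i) < 0)
                _exit(127);
        execlp(program, program, (char *)NULL);
        _exit(127); /* exec failed */
    }
    return pid;
}

/* Child side: copy each open descriptor from `first` upward to `out`,
 * stopping at the first descriptor number that is not open. */
int cat_descriptors(int first, int out)
{
    char buffer[0x1000];

    assert(first >= 0);
    for (int fd = first; fcntl(fd, F_GETFD) != -1; fd++) {
        ssize_t got;

        while ((got = read(fd, buffer, sizeof buffer)) > 0) {
            ssize_t done = 0;
            while (done < got) {
                ssize_t put = write(out, buffer + done, (size_t)(got - done));
                if (put < 0)
                    return -1;
                done += put;
            }
        }
        if (got < 0)
            return -1;
    }
    return 0;
}
```

A cat built this way would simply call cat_descriptors(3, STDOUT_FILENO). The sketch only works if nothing else happens to be open at descriptor 3 and above in the child.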

Well... that was all I had. Does it make any sense whatsoever?

AndrewAPrice · Post by **AndrewAPrice** » Wed May 02, 2012 4:55 pm

Jezze wrote:When a program is started with the system call execute(), it automatically inherits all the open file descriptors from its parent, so it shares STDIN and STDOUT and any other open files the parent has. If it needs another file it can use fopen() just as on any other UNIX operating system, but those descriptors will be local to the child itself.

I understand what you're trying to accomplish, and this is not necessarily a bad idea, but if my program launches an external program I don't want it to have access to all the files I have opened.

For example, imagine a database server. It has confidential database files open, and it wants to launch an external parser. It'll have to lock all threads that may be doing file I/O, close all confidential file handles, launch the external program, and re-open all the file handles again.

The other thing is, I think you may be confusing the input stream and the arguments. You're treating the arguments as an endless stream, rather than a fixed-length list. There is nothing wrong with this (your OS, your rules), but some arguments change the entire behaviour of the program, so it'd be useful to know what all of the arguments are before you begin processing.

You can get rid of argc/argv by passing arguments in some other way. For example, pass the exact command used to execute the tool (e.g. "cat filea.txt fileb.txt") followed by an ASCII control character in stdin, with the standard stdin data following after this. That way you can still dynamically stream as many arguments as you want into it, and the program knows when to stop listening for arguments and start processing the incoming data.
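A rough sketch of the receiving end of that scheme, in standard C, might look like this. The choice of separator (ASCII record separator, 0x1E) and the name read_command_header() are assumptions made for illustration; the post does not say which control character to use.

```c
#include <assert.h>
#include <stdio.h>

#define ARG_END 0x1E /* ASCII record separator; this choice is an assumption */

/* Read the command line that precedes ARG_END on `in` into `buf`.
 * Returns the header length, or -1 if the stream ends before ARG_END
 * or the header does not fit in `size` bytes (including the terminator). */
long read_command_header(FILE *in, char *buf, size_t size)
{
    size_t len = 0;
    int c;

    assert(in != NULL && buf != NULL && size > 0);
    while ((c = fgetc(in)) != EOF) {
        if (c == ARG_END) {
            buf[len] = '\0';
            return (long)len;
        }
        if (len + 1 >= size)
            return -1; /* header too long for the buffer */
        buf[len++] = (char)c;
    }
    return -1; /* stream ended before the separator */
}
```

After a successful call, the next byte read from the stream is the first byte of the ordinary input data, so the program can parse the header and then carry on reading stdin as usual.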

Alternatively, you could have a function such as:

Code: Select all

char *getProcessString();

That returns the exact command, typed into bash or whatever, arguments and all.

Or, you could remove arguments completely, and rely on IPC for one program to tell another what to do.

Jezze · Post by **Jezze** » Wed May 02, 2012 5:21 pm

Yeah, the first argument is a good point. There is no correlation between what the parent's file descriptors are and where the child expects its input to come from, meaning the child might expect input from file descriptor 4 while the parent already has another file open as descriptor 4. That could cause trouble, I agree. Also the confidential thing. You gave me something to think about there.

For the second argument there must have been some misunderstanding, and I'm sorry for being unclear. I don't want arguments to be sent to my programs at all; they are not supposed to come from any stream either. A program is meant to be small enough that it only does one thing, so no arguments would be required (it could use a configuration file, but that's another story).

Thanks for the reply!

piranha · Post by **piranha** » Wed May 02, 2012 7:56 pm

If you do a fork() and exec() you should close any files that you don't want the child to have access to before you do the exec, and after the fork.
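On a POSIX system that sequence could be sketched as below. The function names spawn_without() and wait_for() are hypothetical helpers for this example, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, close the descriptors the child must not see, then exec `program`.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t spawn_without(const char *program, const int *private_fds, size_t count)
{
    pid_t pid;

    assert(program != NULL);
    pid = fork();
    if (pid == 0) {
        /* Child: it has its own copy of the descriptor table, so closing
         * here leaves the parent's descriptors open. */
        for (size_t i = 0; i < count; i++)
            close(private_fds[i]);
        execlp(program, program, (char *)NULL);
        _exit(127); /* exec failed */
    }
    return pid;
}

/* Wait for the child; return its exit status, or -1 if it did not exit normally. */
int wait_for(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the closing happens in the child, the parent does not need to lock its threads or reopen anything. POSIX also allows a descriptor to be marked close-on-exec with fcntl(fd, F_SETFD, FD_CLOEXEC), which makes the explicit loop unnecessary for that descriptor.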

Basically what you're trying to do is make every program do exactly and only one thing, in essence making each argument its own different program. The major flaw that I see in this is extreme data duplication. The difference in code between 'cat' and 'cat -n' is so little that they should be in the same program. Also, what if a program has more than one input file? Say, 'cat a b c d > e'. I'm assuming that your shell would open up all the files and then pass them to cat in the right place. Cat would then loop through all its open file descriptors and output them to e. It would give no output to the screen. So why not reroute stdout then? Same thing applies to "cat < a.txt > b.txt". Instead of using file descriptors 3 and 4, why not use stdin/out? Say you want to pipe the output of something to another program? You'd have to standardize which filedes is input and which is output. But that's already done, with stdin and stdout.

What if the argument is not a file? What if you want to ping google.com? How does your shell open 'google.com'? It would need to be programmed in with networking support, which is cumbersome to have in a simple shell.

The biggest problem I see is the first one, the unneeded duplication of data. Instead of 'cat' and 'cat -n' being the same program, you have 'cat', 'cat_linenumbers', and one for every different switch that cat may accept. The alternative is to pass the program arguments. This is easy and simple to do in exec. Like you said, it doesn't waste many lines of code, nor take up a lot of resources. So why not just use that?

-JL

iansjack · Post by **iansjack** » Thu May 03, 2012 2:49 am

I'm not sure that duplication of data need be a problem. Two solutions immediately come to mind.

1. The different programs are links to the same executable, which determines how to act depending upon the name it was called with. This is commonly done in Unix.

2. The programs are separate executables, but most of the work is done by dynamically linked libraries. The program itself is essentially a set of stubs, which take up little space.

However, in either case, these solutions occupy additional disk space unnecessarily.
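The first option (one executable, many names) is usually built around a small lookup table. A minimal sketch in C, using the ls and ls_detailed names from this thread as placeholder applets; the table layout and the function names applet_name() and find_applet() are assumptions for illustration only:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*applet_fn)(void);

/* Placeholder applets; a real program would do the actual work here. */
static int applet_ls(void)          { puts("plain listing");    return 0; }
static int applet_ls_detailed(void) { puts("detailed listing"); return 0; }

static const struct {
    const char *name;
    applet_fn run;
} applets[] = {
    { "ls", applet_ls },
    { "ls_detailed", applet_ls_detailed },
};

/* Strip any directory part from the name the program was invoked under. */
const char *applet_name(const char *argv0)
{
    const char *slash;

    assert(argv0 != NULL);
    slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}

/* Find the applet matching the invocation name, or NULL if there is none. */
applet_fn find_applet(const char *argv0)
{
    const char *name = applet_name(argv0);

    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(applets[i].name, name) == 0)
            return applets[i].run;
    return NULL;
}
```

A main() would then call find_applet(argv[0]) and run the result. Note that this relies on the program learning the name it was invoked under, which in C arrives as argv[0].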

By far the bigger objection to programs without arguments, to my mind, is the sheer difficulty of remembering and documenting the huge number of programs that would be required. I want a simple directory listing: "ls". I want a detailed listing: "ls_detailed". I want a simple listing including hidden files: "ls_hidden". I want a detailed listing including hidden files: "ls_detailed_hidden", and so on. The number of combinations soon spirals out of control. It's easy to document, and to remember, a number of switches to the same basic command, and it's very easy to combine those switches in different ways. But to do this via separate programs is unwieldy. And that's before you even start worrying about variable numbers of parameters and other variations.

The paradigm of having a simple program that acts in different ways according to information passed to it at run time has served us well; it requires something with more functionality to replace it, not something that is more constrained.

Jezze · Post by **Jezze** » Thu May 03, 2012 4:43 am

I think both of your posts have very good points to them.

To start with piranha's points: yes, STDIN and STDOUT would still be used as normal, so that doesn't change. My example was a bit clumsy, I admit. Regarding the problem with small differences like cat and cat -n, I thought this could be solved by simple piping. In this case it would be something like $ cat myfile.txt | linenums. It is much more dynamic in my opinion to have a separate program that adds line numbers to output instead of having that functionality inside cat. Actually, this example perfectly highlights why I think arguments are bad: they enable programs to do stuff like -n that shouldn't really be part of them. As for the example with ping www.google.com, my solution would be to start ping without arguments. This will tell ping to read from stdin; you then just type www.google.com and press CTRL+D, and ping will read the hostname from its standard input and start pinging it.

As for iansjack's points, this is an issue of piping as well. ls could by default print all files, and then you apply a filter to it, like ls | hiddenfiles.

sandras · Post by **sandras** » Thu May 03, 2012 4:52 am

(Jezze made some points while I was writing this.)

All I use cat for is either

Code: Select all

cat a

or

Code: Select all

cat a b > c

, so I would have my cat not only not take arguments but also not read from stdin if no arguments are given. It was annoying when I issued it without arguments and it just read from stdin, as back then I did not know how to turn it off. The only case where I need line numbering is in my editor, when the compiler says there's an error on line 123. Anyway, if you need a command-line way to do it, have a command named linenumber, which will read from stdin and output the same thing on the other end, just with line numbers. Notice that linenumber does not concatenate like cat does; its only function is to number the lines. Also, cat's only function is to concatenate files and output them to stdout. That we can use it for outputting only one file is just a nice side effect. Thinking about it, I think such a cat is very beautiful. And bugs? I believe you'd be very capable of making it bugless. Also, I think linenumber is easy to remember for a command used to number lines, isn't it? Well, at least I find it easier than cat -n, as it's more descriptive, though longer too. Some reading: http://harmful.cat-v.org/cat-v/ : )
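To show how small such a filter can be, here is a minimal sketch of the core of a linenumber program in standard C. The output format (number, tab, line) and the function name number_lines() are assumptions; the thread does not specify either.

```c
#include <assert.h>
#include <stdio.h>

/* Copy `in` to `out`, prefixing every line with its 1-based number and a tab.
 * Returns the number of lines that were numbered. */
unsigned long number_lines(FILE *in, FILE *out)
{
    unsigned long line = 0;
    int at_line_start = 1;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF) {
        if (at_line_start) {
            fprintf(out, "%lu\t", ++line);
            at_line_start = 0;
        }
        fputc(c, out);
        if (c == '\n')
            at_line_start = 1;
    }
    return line;
}
```

The program itself would be little more than a main() that calls number_lines(stdin, stdout), used as in cat myfile.txt | linenumber.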

As Jezze said, he's only looking for ways to pass file descriptors, if I'm right, but let me tell you why I like the stdout > stdin way of passing arguments:
* it can pass both flags and files (and anything else) as arguments (think grep, sed);
* it's more portable than passing arguments on stack;
* (compared to Jezze's way) you do not need to close all the files you do not want your child to have access to, when you spawn your satan's spawn;
* if you're going the Unix way (which I do, not blindly (I can only hope)), it's a text stream;
* as a concept, it's simpler than passing arguments in other ways, and (almost?) entirely reuses an already existing concept (text streams) (what I try to do is simplify my OS model and implementation, while still keeping nice things like security, flexibility, etc.).

What I'm worried about in this model, is the overhead of parsing the stream from stdin, as compared to argc + **argv way.

In this model cat has to remember how to read from stdin, but hey, if you forgot to give arguments, do it now, while cat is reading'em. What are the possibilities of making it interactive? I'm not sure; cat could detect that you didn't give arguments through the command line, and print "give me some cat food".

As for code duplication, yes you can use the BusyBox way and/or shared libs, and I don't see how it takes more disk space. The code becomes shared (by definition, it consumes less space), and also, there's less wasted space due to internal block fragmentation. Unless you're talking about the overhead of the code, that figures out which applet in the program to launch, in which case, I'd say, that sharing the code makes up for this overhead.

sandras · Post by **sandras** » Thu May 03, 2012 4:58 am

What about if you have the line numbers flag in many programs, like 256 of them? : ) Wouldn't it be more space efficient to have a separate linenumbers program and have all the others pipe through it when you need the lines numbered? Even without the two mentioned code sharing examples, I think this would be more space efficient.

Also, maybe not quite on topic, but some of you might be interested in the suckless.org way of program configuration. If you don't know what I'm talking about: they just use compile-time configuration. This results in smaller, faster, and safer executables.

iansjack · Post by **iansjack** » Thu May 03, 2012 5:27 am

Jezze wrote:As for iansjack's points, this is an issue of piping as well. ls could by default print all files, and then you apply a filter to it, like ls | hiddenfiles.

You could do that, but it's going to start to get a bit convoluted, and inefficient with all those processes being created, if you wish to apply a number of switches.

iansjack · Post by **iansjack** » Thu May 03, 2012 6:03 am

Sandras wrote:As for code duplication, yes you can use the busybox way and/or shared libs, and I don't see how it takes more disk space. The code becomes shared (by definition, it consumes less space), and also, there's less wasted space due to internal block fragmentation. Unless you're talking about the overhead of the code, that figures out which applet in the program to launch, in which case, I'd say, that sharing the code makes up for this overhead.

Each soft link that you create uses another inode. So a program with multiple switches uses one inode; multiple links to the same executable use many. And if you have many executables, rather than just links, each uses at least one block on the disk.

And you can't, as I see it, dispense with arguments altogether. If your C compiler is to compile a program it needs to get the filename(s) somehow. So at some point some executable must take that filename as an argument.

iansjack · Post by **iansjack** » Thu May 03, 2012 6:07 am

berkus wrote:If you look at it as a set of transformations it's not that bad.

$ alldirs | ls | hiddenfiles | bytype | colored

It becomes much more verbose for end-users, so you can introduce aliases; but the applications themselves stay very small and focused, and perhaps much easier to audit and secure.

I don't honestly believe that most switches to most programs could be coded like that. Take, as an extreme example, gcc with its multitude of switches. Do you really want a separate executable to process each of these? And how reusable are they? How many other programs take a switch like "-mno-red-zone", for example?

sandras · Post by **sandras** » Thu May 03, 2012 6:15 am

iansjack wrote:
Sandras wrote:As for code duplication, yes you can use the BusyBox way and/or shared libs, and I don't see how it takes more disk space. The code becomes shared (by definition, it consumes less space), and also, there's less wasted space due to internal block fragmentation. Unless you're talking about the overhead of the code, that figures out which applet in the program to launch, in which case, I'd say, that sharing the code makes up for this overhead.
Each soft link that you create uses another inode. So a program with multiple switches uses one inode; multiple links to the same executable use many. And if you have many executables, rather than just links, each uses at least one block on the disk.

TBH, I am not sure how much space an inode consumes. But as the inode has no non shared blocks associated with it, if you have enough inodes in your fs, you're set. For example, I have never seen a Linux + BusyBox system, that ran out of symlinks with standard mke2fs configuration. So practically, symlinks don't consume space.

iansjack wrote:
berkus wrote:If you look at it as a set of transformations it's not that bad.

$ alldirs | ls | hiddenfiles | bytype | colored

It becomes much more verbose for end-users, so you can introduce aliases; but the applications themselves stay very small and focused, and perhaps much easier to audit and secure.
I don't honestly believe that most switches to most programs could be coded like that. Take, as an extreme example, gcc with its multitude of switches. Do you really want a separate executable to process each of these? And how reusable are they? How many other programs take a switch like "-mno-red-zone", for example?

As far as I understand, Jezze intends to pass only files as arguments, but no flags or anything. That means you could do cc file.c.

iansjack · Post by **iansjack** » Thu May 03, 2012 6:16 am

Sandras wrote:As far as I understand, Jezze intends to pass only files as arguments, but no flags or anything. That means you could do cc file.c.

And have a separate filter program to process -mno-red-zone?

sandras · Post by **sandras** » Thu May 03, 2012 6:25 am

iansjack wrote:
Sandras wrote:As far as I understand, Jezze intends to pass only files as arguments, but no flags or anything. That means you could do cc file.c.
And have a separate filter program to process -mno-red-zone?

I do not know what that option stands for (what does it stand for?), but I agree, that some things may be better left for command line flags. I was talking about what I think Jezze wants, not what gcc needs. For example, he could use a different compiler, which does not take your mentioned option, if he decides to self-host. Whatever that option does, I do not believe it is essential for a self hosting OS.

iansjack · Post by **iansjack** » Fri May 04, 2012 12:47 am

berkus wrote:
iansjack wrote:
Sandras wrote:As far as I understand, Jezze intends to pass only files as arguments, but no flags or anything. That means you could do cc file.c.
And have a separate filter program to process -mno-red-zone?
You can have an options file

$ cc file.c file.opt

That's certainly a possibility, but I don't think it's as elegant as the command-line switches paradigm. The truth is that many, if not most, programs can usefully take options in some form; this is particularly true of versatile programs such as compilers where there may be a host of configurations. The discussion is just about the most useful, and most elegant, way to communicate that information to the executable.

The command-line switch communicates this information in a simple, compact way that doesn't depend upon a large number of executables for a simple task or the editing of other files to vary the execution of the program. As they say, if it ain't broke then don't fix it. Command-line switches are well established, do the job with the minimum of extraneous detail, and ain't broke.
