LtG wrote:
linguofreak wrote:
The only way to absolutely guarantee it is to have an OOM killer that asks the user which task should be killed, because only the user knows what is "innocent" and what is "malicious".
I'm not sure how feasible that is. Are "normal" end users supposed to know what is malicious and what is innocent? Do they in practice?
OK, so even then it's not an absolute guarantee, but for an experienced user who knows what programs they want running, it is. In any case, the claim is that asking the user is the only way to guarantee that innocent processes survive and malicious ones die, not that it is feasible in all cases.
Also, asking really only works in an interactive desktop type of scenario; how would you do it in a console/shell/prompt situation? Shells are generally expected to do what you tell them to do, not to come up with their own questions all of a sudden while you're using 'vi' or something.
Actually, a console is the use case where I'd be most confident of pulling it off. Doing it in a GUI would be much more likely to end up allocating memory in non-obvious ways, while our theoretical user-querying OOM killer has to be able to do its job without allocating memory. Ideally, its memory usage would be O(1) in the number of processes. If its memory usage had to grow with the number of processes, it would have to receive the memory for its per-process recordkeeping as the very first step the system takes in starting each process, so that nothing needs to be allocated once memory has actually run out.
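Roughly what I have in mind (just a sketch; MAX_PROCS, the record layout, and failing process creation when the pool is empty are made up for illustration, not anything real):

[code]
/* The OOM killer's per-process record comes out of a pool reserved at boot,
 * claimed as the first step of starting a process, so the killer itself
 * never has to allocate once memory is already exhausted. */
#include <stddef.h>

#define MAX_PROCS 1024                  /* made-up hard limit on processes */

struct oom_record {
    int    pid;
    size_t resident_pages;              /* kept up to date by the VM */
    int    in_use;
};

static struct oom_record oom_pool[MAX_PROCS];   /* allocated once, at boot */

/* Called as the very first step of process creation; if no record is free,
 * creation fails before any other resources are committed. */
struct oom_record *oom_reserve_record(int pid)
{
    for (int i = 0; i < MAX_PROCS; i++) {
        if (!oom_pool[i].in_use) {
            oom_pool[i].in_use = 1;
            oom_pool[i].pid = pid;
            oom_pool[i].resident_pages = 0;
            return &oom_pool[i];
        }
    }
    return NULL;    /* refuse to create the process rather than overcommit */
}
[/code]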
As for system messages coming up while you're working in a terminal, Linux will print all sorts of errors to the system console regardless of what you may be doing on that virtual terminal. My box is currently spewing errors about a failed CD drive that I haven't had time to open up the machine and disconnect. An OOM situation is the next thing to a kernel panic / bluescreen, both of which will happily intrude while you're doing other things, so I don't see any big problem with the OOM killer doing the same.
What about servers? The system freezes, waiting for the user to answer what to kill, but server admins don't typically look at each server every minute of the day, so it might take quite a while before someone thinks to check the server.
The OS doesn't need to grind to a total halt. It will have to stall outstanding allocations until the user makes a decision, or until a process ends on its own or otherwise returns memory to the OS, but processes that aren't actively allocating (or are allocating from free chunks in their own heaps rather than going to the OS for memory) can continue running. Now, if a mission-critical process ends up blocking on an allocation, yes, you have a problem, and a user-queried OOM killer might not be appropriate for situations where that is likely to cause more trouble than an OOM situation generally does in the first place.
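To sketch the "stall instead of kill" idea (try_alloc_pages, wait_on, wake_all, and oom_notify_user are placeholders for whatever primitives the kernel actually has, not a real API):

[code]
#include <stddef.h>

/* Assumed kernel primitives: */
struct wait_queue { int waiters; };
static struct wait_queue oom_waiters;

void *try_alloc_pages(size_t npages);     /* returns NULL when memory is gone */
void  release_pages(void *p, size_t npages);
void  wait_on(struct wait_queue *q);      /* sleep until woken */
void  wake_all(struct wait_queue *q);
void  oom_notify_user(void);              /* prompt the user for a victim */

/* Allocation path: the caller stalls; everyone else keeps running. */
void *alloc_pages_or_wait(size_t npages)
{
    void *p;
    while ((p = try_alloc_pages(npages)) == NULL) {
        oom_notify_user();
        wait_on(&oom_waiters);
    }
    return p;
}

/* Free path: returning memory retries whoever is stalled on an allocation. */
void free_pages_and_wake(void *p, size_t npages)
{
    release_pages(p, npages);
    wake_all(&oom_waiters);
}
[/code]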
Of course SSH into the server might no longer work, because SSH too needs memory, so the moment it tries to spawn it gets frozen due to the OOM condition. So you basically get yourself into a chicken-and-egg situation.
For a server, your OOM killer could actually make use of a reverse-SSH arrangement: the OOM killer makes an *outbound*, SSH-like connection, using pre-allocated memory, to a machine running server-management software. That machine could then send alerts to admins' cell phones, take an inbound SSH connection from an administrator's workstation (or phone), and pass input from the administrator through to the OOMed server and output from the server back to the administrator.
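The userspace half of that might look something like this, as a very rough sketch. The management host 192.0.2.10:2222 is made up, and I'm glossing over the fact that even connect() needs some kernel memory for socket buffers, which a real design would have to reserve ahead of time on the kernel side:

[code]
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static char report_buf[4096];   /* fixed buffer, allocated before any OOM */

int main(void)
{
    /* Set everything up early and pin it, so the agent needs no new pages
     * once memory is tight. */
    mlockall(MCL_CURRENT | MCL_FUTURE);
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* created ahead of time */

    struct sockaddr_in mgmt = { .sin_family = AF_INET,
                                .sin_port   = htons(2222) };   /* made-up port */
    inet_pton(AF_INET, "192.0.2.10", &mgmt.sin_addr);          /* made-up host */

    /* ... block here until the kernel signals an OOM condition ... */

    if (connect(fd, (struct sockaddr *)&mgmt, sizeof mgmt) == 0) {
        int n = snprintf(report_buf, sizeof report_buf,
                         "OOM: pick a victim and send its pid back\n");
        write(fd, report_buf, (size_t)n);
        /* ... read the admin's decision from fd and hand it to the kernel ... */
    }
    close(fd);
    return 0;
}
[/code]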
In extreme conditions the user might not even be able to open the task manager to check what to kill, because of the same chicken-and-egg problem.[/quote]
This can be solved by the OOM killer presenting its own task list kept in pre-allocated memory.
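Same idea as the pool above, just on the display side: the list lives in memory reserved at boot and is printed through a fixed buffer, so showing it can never itself need an allocation. console_write() stands in for whatever allocation-free console output path the kernel provides:

[code]
#include <stddef.h>
#include <stdio.h>

#define OOM_MAX_TASKS 1024      /* made-up hard limit */

struct oom_task {
    int    pid;
    size_t resident_pages;
    char   name[16];
    int    in_use;
};

static struct oom_task oom_tasks[OOM_MAX_TASKS];   /* reserved at boot */
static char oom_line[80];                          /* fixed output buffer */

void console_write(const char *s);                 /* assumed, allocation-free */

void oom_show_task_list(void)
{
    for (int i = 0; i < OOM_MAX_TASKS; i++) {
        if (!oom_tasks[i].in_use)
            continue;
        snprintf(oom_line, sizeof oom_line, "%5d  %8zu pages  %s\n",
                 oom_tasks[i].pid, oom_tasks[i].resident_pages,
                 oom_tasks[i].name);
        console_write(oom_line);
    }
}
[/code]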
[quote]I suppose if each process had min/max memory requirements (like in the ELF header), as well as a progress indication and a description of how the min/max changes as it progresses, then it might be possible to turn the problem into a scheduling problem. But given how difficult scheduling is already, I'm not sure the extra complexity really serves anyone.[/quote]
Not every process has bounded memory requirements. Not every process has bounded runtime. Daemons are generally supposed to keep running indefinitely.
One useful feature would be a facility for processes to provide the kernel with a list of free pages in their heaps so that the kernel could reclaim those pages if needed.
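For what it's worth, Linux (and some BSDs) already have something close to this in madvise(): an allocator can tell the kernel that whole pages in its heap hold nothing live, and the kernel may reclaim them lazily under pressure without the mapping going away. A minimal sketch of a (hypothetical) allocator handing back a free run of pages:

[code]
#include <stddef.h>
#include <sys/mman.h>

/* Called by the allocator when a run of pages in its heap has no live
 * objects left. addr must be page-aligned. */
void heap_release_free_run(void *addr, size_t len)
{
#ifdef MADV_FREE
    /* The kernel may take these pages back whenever it needs them; the
     * mapping stays valid and reads return zeroes once they're reclaimed. */
    madvise(addr, len, MADV_FREE);
#else
    /* Older systems: MADV_DONTNEED gives the pages up immediately. */
    madvise(addr, len, MADV_DONTNEED);
#endif
}
[/code]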
Perhaps allow overcommit, but keep track of it, and when used memory gets too close to the maximum available, start throttling scheduling and letting individual processes run to completion to free their memory, until you drop below some threshold and can go back to the normal overcommit-allowed state.
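As a rough sketch of that hysteresis (the 90%/75% watermarks and the scheduler/allocator hooks are made up for illustration):

[code]
#include <stddef.h>

/* Placeholder hooks into the scheduler and allocator: */
void scheduler_favor_finishing_tasks(void);
void refuse_new_overcommit(void);
void allow_overcommit_again(void);

enum mem_mode { MODE_OVERCOMMIT, MODE_DRAIN };
static enum mem_mode mode = MODE_OVERCOMMIT;

/* Called periodically (or on every allocation) with current usage. */
void memory_pressure_tick(size_t used_pages, size_t total_pages)
{
    size_t hi = total_pages * 90 / 100;    /* enter drain mode above 90% used */
    size_t lo = total_pages * 75 / 100;    /* only leave it again below 75% */

    if (mode == MODE_OVERCOMMIT && used_pages > hi) {
        mode = MODE_DRAIN;
        refuse_new_overcommit();
        scheduler_favor_finishing_tasks();
    } else if (mode == MODE_DRAIN && used_pages < lo) {
        mode = MODE_OVERCOMMIT;
        allow_overcommit_again();
    }
}
[/code]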