
Job Opening: Live patching for the Linux kernel at SUSE

Posted: Wed Jan 04, 2017 7:01 am
by ggherdov
Hello,

I work on Linux kernel development at SUSE Labs and I'd like to promote one of our job openings. This is my first post on the forum, but I've been reading for a while.

Live Patching developer

"Live Patching" is the Linux kernel infrastructure for applying changes to a running kernel; it makes it possible, for instance, to fix a security vulnerability with no downtime. For a more detailed introduction to live patching see the links [0][1][2] below. As a live patching developer you will extend and maintain our live patching tooling, for both kernel and userspace live patching. The main purpose of this role is to improve the automation that helps generate and verify the actual live patches. In addition, you will implement tooling for userspace live patching and participate in developing the actual live patches as well.

This is a remote working position; if you would rather work at a SUSE office, such as Nuremberg (DE), Prague (CZ) or elsewhere, that is fine as well.

What we offer

You will take part in the development of the core parts of our enterprise and community distributions and you will be encouraged to submit your work upstream. We will sponsor travel to relevant conferences where you can present your work. Working time is flexible and we offer a bunch of the usual benefits (these vary by country, though).

How to apply

Preferably, submit all relevant information in a single PDF file, so that no important detail is lost in transit. Give us some time to process your application. Expect the interview to be done over the phone. The application form for this position is at this link.

This is not the only job opening currently available at SUSE; see https://jobs.suse.com/

[0] "Low-level Function of kGraft" in the SUSE Administration Guide.
[1] "Topics in live kernel patching", a summary of current upstream live patching challenges from LWN (Nov 2016)
[2] kGraft Wikipedia entry

EDIT: fixed a dead link.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Wed Jan 04, 2017 7:52 am
by Ycep
Firstly offer money.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Wed Jan 04, 2017 8:18 am
by matt11235
Lukand wrote:Firstly offer money.
Have a look at SUSE on Glassdoor. It looks like the pay is very good.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Wed Jan 04, 2017 8:31 am
by iansjack
Lukand wrote:Firstly offer money.
Working time is flexible and we offer a bunch of the usual benefits (these differ in different countries, though).
I think you can assume that "usual benefits" includes remuneration; the rate, no doubt, will depend upon experience.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Wed Jan 04, 2017 10:02 am
by ggherdov
Compensation is generous and depends upon the experience of the candidate, as well as the region where they live.

As an obvious example, the cost of living is different in California, Utah, France and the Czech Republic: the salary offer will reflect that difference, as employees in different countries are attached to different financial entities within the company itself.

People are expected to know their market rate and negotiate accordingly; before I myself was hired, I consulted various online datasets like glassdoor.com or the "Prices and Earnings 2015" report by the Swiss bank UBS. I set my price and the company happily matched it.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Wed Jan 04, 2017 8:48 pm
by Love4Boobies
I really think that live updating is the wrong architectural approach. Instead of putting a lot of costly effort into developing a solution that is almost certainly going to contain all sorts of ugly bugs and which involves state transition functions at certain points in the kernel, a more sensible approach would be to serialize its state, discard the current kernel, start up the new one and deserialize the prior state. The downtime would really be negligible, the likelihood of bugs would be severely decreased, and the maintenance would be so much less expensive that it's not even funny.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Wed Jan 04, 2017 9:35 pm
by dchapiesky
Love4Boobies wrote:I really think that live updating is the wrong architectural approach. Instead of putting a lot of costly effort into developing a solution that is almost certainly going to contain all sorts of ugly bugs and which involves state transition functions at certain points in the kernel, a more sensible approach would be to serialize its state, discard the current kernel, start up the new one and deserialize the prior state. The downtime would really be negligible, the likelihood of bugs would be severely decreased, and the maintenance would be so much less expensive that it's not even funny.
as well as how are the updates versioned and delivered to the running kernel? And how do you take an inventory to *know* just what is currently running... I can't see enterprise security review passing this at all... Ultimately you will need a Live Patching Security Team as well.... I see nothing but pain in this job offering (no offense)

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Wed Jan 04, 2017 10:21 pm
by Love4Boobies
dchapiesky wrote:as well as how are the updates versioned and delivered to the running kernel? And how do you take an inventory to *know* just what is currently running...
Well, versioning and delivering are necessary regardless of your update strategy. As for how you would know what's running, let me elaborate on what I meant above (while still disagreeing with this approach). First, you'd wait for the kernel to reach a safe state (one that you can more easily reason about), perhaps by notifying it of an incoming update, at which point it might resolve all current requests and postpone any incoming ones. The safe state alone is not sufficient because the new version might expect some computation to have been performed before the kernel reached that point. To resolve this issue, a special transfer function would translate the state that makes sense in the old version into one that makes sense in the new version. After that, the updated kernel could come out of its safe state and start accepting requests once again. For simplicity, you'd only have transfer functions from one kernel version to the next and perform incremental updates in order to reach the latest version from any prior one. It's a bit more involved than that (not by much) but I was trying to keep the explanation short.
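
For illustration only, the incremental scheme described above (one transfer function per version step, chained to reach the latest version) might be sketched in Python as follows. The state fields, the version numbers and the `TRANSFERS` registry are all hypothetical; nothing here reflects how any real kernel does it.

```python
from typing import Callable, Dict

State = Dict[str, object]


def v1_to_v2(state: State) -> State:
    # Hypothetical change: version 2 expects a queue of postponed requests.
    new = dict(state)
    new["pending_requests"] = []
    return new


def v2_to_v3(state: State) -> State:
    # Hypothetical change: version 3 stores the timeout in milliseconds.
    new = dict(state)
    new["timeout_ms"] = new.pop("timeout_s") * 1000
    return new


# TRANSFERS[n] translates state valid in version n into state valid in n + 1.
TRANSFERS: Dict[int, Callable[[State], State]] = {1: v1_to_v2, 2: v2_to_v3}


def upgrade(state: State, current: int, target: int) -> State:
    """Apply transfer functions one version at a time until target is reached."""
    if target < current:
        raise ValueError("downgrades are not supported in this sketch")
    for version in range(current, target):
        if version not in TRANSFERS:
            raise KeyError(f"no transfer function from version {version}")
        state = TRANSFERS[version](state)
    return state
```

Calling `upgrade({"timeout_s": 2}, 1, 3)` walks through both steps in order, which is why only adjacent-version functions need to be written.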

The biggest problem is that this requires maintainers to write up these transfer functions for every new version. That's a lot of work and there are opportunities for bugs. And if you've discovered a bug in the transfer function from version A to version B while preparing the transfer function from B to C then this latest update might need to treat a pure installation of version B differently than one that was updated from A, as that bug might mean it is in some weird state. It gets worse and worse the later you discover the bug.

On the other hand, simply restarting the kernel and telling it what the old kernel was doing is, in principle, as simple as a text editor loading a file saved by a prior version of that editor. What are you worried about, the less than half a second downtime it might take for serializing, reading the new version from disk, and deserializing? I'd say it's well worth the trade-off.
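
The text-editor analogy can be made concrete with a small Python sketch: the old version writes its state together with a format number, and the new version reads it back, filling in whatever the older format lacked. The field names and format numbers are assumptions made up for the example, and a real kernel's state is of course far less tidy than a JSON document.

```python
import json

FORMAT_VERSION = 2  # hypothetical format written by the "new" version


def serialize(state: dict) -> str:
    """Write the state together with the format version that produced it."""
    return json.dumps({"format": FORMAT_VERSION, "state": state})


def deserialize(blob: str) -> dict:
    """Load state saved by this version or by the previous one."""
    doc = json.loads(blob)
    fmt = doc["format"]
    state = doc["state"]
    if fmt == 1:
        # Hypothetical difference: format 1 did not record open files.
        state = {**state, "open_files": []}
    elif fmt != FORMAT_VERSION:
        raise ValueError(f"unsupported format {fmt}")
    return state
```

Note that the loader still has to know about the older format, so some translation logic remains even in this approach.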

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Thu Jan 05, 2017 4:17 am
by Kevin
Love4Boobies wrote:The safe state alone is not sufficient because the new version might expect some computation to have been performed before the kernel reached that point. To resolve this issue, a special transfer function would translate the state that makes sense in the old version into one that makes sense in the new version.
I think you're overestimating the scope of this. If you read the links in the original post, this is mostly about security fixes that involve replacing code, but don't require any change in data structures. Given this, no transfer function is necessary at all.
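
As a rough illustration of a code-only fix, here is a Python sketch in which a function is swapped for a hardened version while the data it operates on is left untouched. The `FUNCTIONS` table, `call` and `live_patch` are invented for the example; this is not a description of how kGraft is implemented.

```python
from typing import Callable, Dict, List

# Hypothetical indirection table: callers look functions up by name on each call.
FUNCTIONS: Dict[str, Callable[..., int]] = {}


def read_slot_v1(buf: List[int], index: int) -> int:
    # Flawed: a negative index silently reads from the end of the buffer.
    return buf[index]


def read_slot_v2(buf: List[int], index: int) -> int:
    # Fixed: same data layout, same signature, only a bounds check added.
    if not 0 <= index < len(buf):
        raise IndexError("index out of range")
    return buf[index]


FUNCTIONS["read_slot"] = read_slot_v1


def call(name: str, *args: object) -> int:
    """Dispatch through the table so a replacement takes effect immediately."""
    return FUNCTIONS[name](*args)


def live_patch(name: str, replacement: Callable[..., int]) -> None:
    """Redirect future calls; existing data is not modified."""
    FUNCTIONS[name] = replacement
```

After `live_patch("read_slot", read_slot_v2)`, the same buffer is still valid and no state needs translating, because only the code path changed.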
On the other hand, simply restarting the kernel and telling it what the old kernel was doing is, in principle, as simple as a text editor loading a file saved by a prior version of that editor. What are you worried about, the less than half a second downtime it might take for serializing, reading the new version from disk, and deserializing? I'd say it's well worth the trade-off.
Well, we all know that nobody has ever seen problems with suspend-to-disk, right? ;)

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Thu Jan 05, 2017 5:04 am
by Love4Boobies
Kevin wrote:
Love4Boobies wrote:The safe state alone is not sufficient because the new version might expect some computation to have been performed before the kernel reached that point. To resolve this issue, a special transfer function would translate the state that makes sense in the old version into one that makes sense in the new version.
I think you're overestimating the scope of this. If you read the links in the original post, this is mostly about security fixes that involve replacing code, but don't require any change in data structures. Given this, no transfer function is necessary at all.
Except the new code might use an extra variable or have its logic altered in such a way that a variable in the old code has a value that makes no sense after the transition to the new code. These are very basic common scenarios where transfer functions would be required.
Kevin wrote:
On the other hand, simply restarting the kernel and telling it what the old kernel was doing is, in principle, as simple as a text editor loading a file saved by a prior version of that editor. What are you worried about, the less than half a second downtime it might take for serializing, reading the new version from disk, and deserializing? I'd say it's well worth the trade-off.
Well, we all know that nobody has ever seen problems with suspend-to-disk, right? ;)
Coming out of hibernation can take a long time because you're not just talking about the kernel but also about all the programs running in user space and their data. What I am suggesting should be much faster than booting the machine.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Thu Jan 05, 2017 6:09 am
by Kevin
Love4Boobies wrote:Except the new code might use an extra variable or have its logic altered in such a way that a variable in the old code has a value that makes no sense after the transition to the new code. These are very basic common scenarios where transfer functions would be required.
It's easy to think of such cases, but it still seems to be out of scope for this specific project.

And a security fix can pretty much always do without such changes. In some cases, it might not be able to provide the fully correct implementation with live patching, but removing the exploitability is most likely possible. For the full fix, a scheduled maintenance window (which doesn't have to be "right now" like without live patching) can be used then.
Coming out of hibernation can take a long time because you're not just talking about the kernel but also about all the programs running in user space and their data. What I am suggesting should be much faster than booting the machine.
I'm not talking about the time it takes, but about reliability. Things like hardware not working any more after resume because the state wasn't correctly restored. And that's within a single kernel version. In your live patching with changed data structures, even if you serialise some state and deserialise it in the new kernel, the new kernel still needs to translate that state.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Thu Jan 05, 2017 7:57 am
by ggherdov
Love4Boobies wrote:I really think that live updating is the wrong architectural approach. Instead of putting a lot of costly effort into developing a solution that is almost certainly going to contain all sorts of ugly bugs and which involves state transition functions at certain points in the kernel, a more sensible approach would be to serialize its state, discard the current kernel, start up the new one and deserialize the prior state. The downtime would really be negligible, the likelihood of bugs would be severely decreased, and the maintenance would be so much less expensive that it's not even funny.
I'm not going to lie and tell you that live patching is a walk in the park. It's a hard technical problem for all the reasons you stated and more; with some kinds of patches it's easier and works better, but sometimes you need to get really creative and all you can achieve is a temporary solution. The role I posted above is exactly within the scope of tooling to verify the correctness of live patches and extend the territory where the mechanism works reliably. From an engineering perspective I think this job offers some unique intellectual challenges to an operating system enthusiast, and that's why I recommended it in this technical community.

Your objection is on the "strategy" level, as in "you're solving the wrong problem". Well, it's a practical solution to a practical problem. Sure, live patching is not appropriate for every use case, but sometimes it's the right tool: some SUSE customers demanded it, and it makes economic sense for the company to invest in this technology. From the presentation "SUSE Linux Enterprise Live Patching Roadmap", given at the SUSE Conference 2015, here are some examples of situations where organizations need live patching, be it to respond to an incident or to perform an emergency change:
  * In-memory databases: SAP partnered with SUSE to optimize their HANA database on our distro; there are deployments with ~10 terabytes of memory where the problem is not the time it takes to reboot the machine but the time it takes to stop and restart the database.
  * Virtualization hosts: an organization might have many (~1000) lightweight virtual machines on a single host, and a reboot of the host could require a level of coordination between parties that takes time, while an emergency change could be needed immediately.
  * Numerical simulation: supercomputers, like Pleiades at NASA which runs SUSE Linux, perform long-running numerical simulations (weeks/months) which cannot be stopped at will. In such cases live patching gives the operators an appropriate tool for facing unexpected situations.
In all those cases live patching buys you some time to plan for a proper reboot and update, while still mitigating an imminent threat.

The entire presentation is ~50 minutes but the question "Why live patching?" is addressed at the very beginning.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Thu Jan 05, 2017 8:12 am
by matt11235
How is living in Prague, by the way? I've noticed there's quite a few tech companies (JetBrains, BI, SUSE...) and that the beer is cheap, but I think there must be more to it than that :wink:

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Thu Jan 05, 2017 8:46 am
by ggherdov
zenzizenzicube wrote:How is living in Prague, by the way? I've noticed there's quite a few tech companies (JetBrains, BI, SUSE...) and that the beer is cheap, but I think there must be more to it than that :wink:
I am new to the city; I relocated here 10 months ago from southern France (although I'm originally from Italy) to join SUSE Labs. I'm very happy with the change; the city is very livable and affordable, local people are very friendly, public transportation is extremely reliable and works 24x7 (I don't need a car at all). Culturally it is very rich: you can be into anything from classical music, to the underground hacker scene, to free climbing (to name some random examples I found) and you'll find something to fill your free time. The language can be an obstacle, but I've seen people learn it in a year and I'm studying it. The city is very well connected to the rest of Europe via trains and low-cost flights. Also, as you mention, the tech scene is quite active.

Re: Job Opening: Live patching for the Linux kernel at SUSE

Posted: Thu Jan 05, 2017 1:06 pm
by Ycep
Kazakhstan seems rich to me. ($13,000 GDP per capita!)