How about a stable Kernel API?

Back in my ill-spent youth, I worked for three years on Solaris binary stability. These days, I work on Linux, which, it turns out, has to deal with exactly the same problems.

One of them is retiring deprecated kernel interfaces.

Introduction

We ship continuously to add features, but we keep having to fix bugs. It’s genuinely hard to keep an interface the same when fixing a bug means changing it.

That’s especially hard when it’s not the same codebase. If someone else writes applications while I maintain stdio, every time I fix something, I threaten their code with obsolescence.

Imagine, if you would, that I had just invented strscpy() and wanted to retire strncpy(). Do you think my proposal would be greeted with applause? Whether I was addressing kernel or application developers, I’m pretty sure they wouldn’t approve.

My experience

Edsel Adap and I maintained a suite of linker tools that depended on a specification file, rather like the .spec file used to build RPMs. Edsel did the front end and some of the back ends; I did the other back ends.

Every once in a while, one of the back ends would need some additional information, so we’d change the file and then be faced with changing all the programs to suit. We didn’t like doing that: flag days are a pain.

So we used a versioned interface, much like the “SUNW_1.1” versioning notation used by the Solaris kernel. We had an include file that declared the version of this interface to be 2.9, up from 2.8 because we’d added some information. If an existing back end called for a 2.8 structure, it would get one via a “downdater”, and nothing would change for it. When you wanted to update to 2.9, you’d recompile, see what had just failed (we made sure there would be a meaningful error) and update it. When all the back ends had been updated, we’d throw away the downdater for that particular version.
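
A minimal C sketch of the downdater idea, assuming a hypothetical spec record that gained one field between versions 2.8 and 2.9. The struct names, the `visibility` field and the `spec_downdate_2_9_to_2_8()` and `spec_get()` functions are invented for illustration; they are not the formats of the original tools.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical interface version, as declared in the shared include file. */
#define SPEC_VERSION_MAJOR 2
#define SPEC_VERSION_MINOR 9

/* Version 2.8 of a hypothetical spec-file record: what old back ends expect. */
struct spec_2_8 {
    char name[32];
    unsigned flags;
};

/* Version 2.9 adds one field that a newer back end needed. */
struct spec_2_9 {
    char name[32];
    unsigned flags;
    unsigned visibility; /* new in 2.9 */
};

/*
 * Downdater: build the 2.8 view from the current 2.9 record, so a back end
 * still written against 2.8 sees nothing change. Delete it once every back
 * end has been updated to 2.9.
 */
static void spec_downdate_2_9_to_2_8(const struct spec_2_9 *in, struct spec_2_8 *out)
{
    assert(in != NULL && out != NULL);
    memcpy(out->name, in->name, sizeof out->name);
    out->flags = in->flags;
    /* in->visibility is dropped: 2.8 consumers don't know about it. */
}

/* The front end hands each back end the record version it asked for. */
static int spec_get(const struct spec_2_9 *current, int minor, void *out)
{
    switch (minor) {
    case SPEC_VERSION_MINOR:       /* current version: hand it over as-is */
        memcpy(out, current, sizeof(struct spec_2_9));
        return 0;
    case 8:                        /* previous version: go through the downdater */
        spec_downdate_2_9_to_2_8(current, out);
        return 0;
    default:                       /* unknown or already-retired version: refuse */
        return -1;
    }
}
```

A back end written against 2.8 calls `spec_get(&current, 8, &mine)` and keeps working; once the last one asks for 9, the downdater and its `case` can simply be deleted.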

This is easy in a single codebase, with only about eight people working in it. It’s harder if the producer is in one codebase (the standard libraries) and the consumer is in another (a user-level application).

It’s still possible, mind you, just slower. I was part of an earlier project that introduced a new kind of “linker record” in Honeywell GCOS, and we had to maintain the old record for many releases. By the time I left Honeywell years later, we had only got as far as requiring customers to set a special linker option to keep using it ;-)

A proposal for the kernel

The process of controlled mutation via versioning came to us from the Solaris kernel (well, actually from David J. Brown, our boss, who worked on both kernel and libraries), so it is applicable to the Linux kernel as well.

The limitation is that it applies to things which have an interface: it’s applicable to BUG_ON(), but not to switch statements or variable-length arrays.

Start by defining a version for an interface you want to change, say memcpy in the standard libraries. (Imagine, if you would, that there is a similar interface in the kernel: strcpy/strlcpy/strscpy, perhaps.) Call it 2.2 in the include file that defines it, and change the declaration so that the linker sees it as memcpy@2.2. Then create a memcpy@2.1 in the library, pointing either to a backwards-compatible version or to a “downdater” that calls the new version and returns a valid result. Now run sed over everything to make all existing calls go to memcpy@2.1, and recompile.
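
To make the memcpy@2.1 / memcpy@2.2 step concrete, here is a userspace C sketch using a hypothetical string-copy interface. On a real GNU toolchain the two versions would be bound to versioned names such as `str_copy@2.1` and `str_copy@2.2` with the assembler’s `.symver` directive and a linker version script; plain C can’t spell a versioned symbol, so this sketch just gives each version its own function name. `str_copy_v2_2()` follows the strscpy() convention of returning the copied length or `-E2BIG` on truncation, and `str_copy_v2_1()` is the downdater that keeps the old return-the-destination behaviour on top of it. All names here are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Version 2.2 (new): copy at most size-1 bytes, always NUL-terminate when
 * size > 0, and return the length copied, or -E2BIG if src did not fit.
 * Modelled on the kernel's strscpy() convention; a sketch, not that code.
 */
static long str_copy_v2_2(char *dst, const char *src, size_t size)
{
    size_t len = 0;

    assert(dst != NULL && src != NULL);
    if (size == 0)
        return -E2BIG;
    while (len < size && src[len] != '\0')   /* bounded strlen */
        len++;
    if (len == size) {                       /* src too long: truncate */
        memcpy(dst, src, size - 1);
        dst[size - 1] = '\0';
        return -E2BIG;
    }
    memcpy(dst, src, len + 1);               /* fits, including the NUL */
    return (long)len;
}

/*
 * Version 2.1 (old): return dst and truncate silently, as existing callers
 * expect. Implemented as a downdater on top of 2.2, so there is only one
 * copy loop to fix bugs in.
 */
static char *str_copy_v2_1(char *dst, const char *src, size_t size)
{
    (void)str_copy_v2_2(dst, src, size);     /* truncation error deliberately dropped */
    return dst;
}
```

After the sed pass, every existing caller lands on `str_copy_v2_1()` and behaves exactly as before, while new code is written against 2.2 and has to handle `-E2BIG`.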

At this point, you’re ready to start the process of phasing out the old memcpy. Anyone who tries to link to just “memcpy” will get an error from the linker.  Oh yes, you’d better change the linker message, too, to help them understand that something changed.
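
The linker’s side of the bargain can be modelled as a lookup in a table of versioned names. This is a toy stand-in for real symbol resolution, not how ld works internally, and the `str_copy@2.1`/`str_copy@2.2` entries and the `resolve()` function are hypothetical; what it shows is a bare `str_copy` failing to resolve with a message that lists the versions that do exist.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One entry per exported, versioned symbol (hypothetical names). */
struct versioned_sym {
    const char *name;     /* base name, e.g. "str_copy" */
    const char *version;  /* "2.1", "2.2", ... */
    const char *note;     /* advice shown when a request doesn't match */
};

static const struct versioned_sym exports[] = {
    { "str_copy", "2.1", "old behaviour, kept for existing callers" },
    { "str_copy", "2.2", "new behaviour: returns length or -E2BIG" },
};

/*
 * Resolve "name@version". A request without "@version", or with an unknown
 * version, is refused with a message listing the versions that exist.
 * Returns the matching entry, or NULL after writing a diagnostic to err.
 */
static const struct versioned_sym *resolve(const char *request, FILE *err)
{
    const char *at = strchr(request, '@');
    size_t base_len = at ? (size_t)(at - request) : strlen(request);
    size_t n = sizeof exports / sizeof exports[0];

    for (size_t i = 0; i < n; i++) {
        const struct versioned_sym *s = &exports[i];
        if (at != NULL && strlen(s->name) == base_len &&
            strncmp(s->name, request, base_len) == 0 &&
            strcmp(s->version, at + 1) == 0)
            return s;
    }

    /* No match: explain what changed rather than just "undefined symbol". */
    fprintf(err, "error: cannot resolve '%s'; %.*s is now a versioned interface:\n",
            request, (int)base_len, request);
    for (size_t i = 0; i < n; i++)
        if (strlen(exports[i].name) == base_len &&
            strncmp(exports[i].name, request, base_len) == 0)
            fprintf(err, "  %s@%s: %s\n",
                    exports[i].name, exports[i].version, exports[i].note);
    return NULL;
}
```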

There is a family of variations on this: if you can check at runtime which version a caller should be using, you can choose to leave the function declared as plain memcpy and diagnose when the caller should be using version 2.1. That was the approach we used with the Honeywell linker, followed in the next release by requiring the caller to pass the linker an option that says “use the old one”.
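
The runtime variant can be sketched for a hypothetical `str_copy()` whose versions 2.1 and 2.2 differ only when the source has to be truncated. The unversioned name stays; the function warns whenever a caller hits the case where the versions disagree, and once the hypothetical `legacy_copy_allowed` switch (standing in for the “use the old one” linker option) is turned off, it refuses instead. All names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stands in for the "use the old one" option: release N true, N+1 false. */
static bool legacy_copy_allowed = true;
static unsigned legacy_warnings;   /* how many calls still relied on 2.1 behaviour */

/*
 * Unversioned str_copy(): behaves like 2.1 (returns dst, truncates silently)
 * but notices, at runtime, the one case where 2.1 and 2.2 differ.
 */
static char *str_copy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size == 0)
        return dst;
    if (len >= size) {                 /* truncation: 2.1 and 2.2 disagree here */
        legacy_warnings++;
        fprintf(stderr, "str_copy: truncating \"%s\"; this caller relies on "
                "version 2.1 behaviour and should ask for str_copy@2.1\n", src);
        if (!legacy_copy_allowed)
            return NULL;               /* next release: old behaviour needs opt-in */
        len = size - 1;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

Callers that never truncate never see a message, so the diagnostics point straight at the code that still needs attention.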

With an interface whose name is changing, such as BUG_ON() becoming check_data_corruption(), together with a requirement that the calling code be prepared to fall through, the old name can disappear completely, and only BUG_ON@2.0 exists henceforth. Anyone who tries to call BUG_ON() will get a #error directive instead, with a nice message saying what they need to do.
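
A sketch of the rename case. A `#error` directive can’t appear inside a macro expansion, so it can’t fire only at the places that still use the old name; this sketch swaps in a C11 `_Static_assert` inside the retired macro to get the same effect, a compile-time error with a readable message at each remaining call site. `check_data_corruption()` here is a simplified userspace stand-in that reports the problem and returns true so the caller can fall through; it is not the kernel’s own definition, and `get_checked()` is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * The old name is retired. Any remaining use expands to a static assertion
 * that always fails, so the build stops with a message saying what to do.
 */
#define BUG_ON(cond) \
    _Static_assert(0, "BUG_ON() has been removed: use check_data_corruption() " \
                      "and handle the failure instead of stopping")

/* Report the problem and tell the caller, who must be prepared to continue. */
static bool check_data_corruption(bool corrupted, const char *what)
{
    if (corrupted)
        fprintf(stderr, "data corruption detected: %s\n", what);
    return corrupted;
}

/* Hypothetical caller: a bounds-checked read that falls through on bad input. */
static int get_checked(const int *arr, size_t len, size_t i, int *out)
{
    if (check_data_corruption(i >= len, "index out of range"))
        return -1;          /* fall through: refuse the read, keep running */
    *out = arr[i];
    return 0;
}
```

Compiling any leftover `BUG_ON(x);` statement now stops with that message, while converted callers such as `get_checked()` report the problem and return an error instead of halting.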

I think you can see how this would apply to strcpy, strlcpy and strscpy.

Conclusions

It’s easier to have a kernel-only API than a public ABI, but that changes the problem space: instead of deciding what to use, it becomes a question of how to get rid of the last user of the old way.

Complete removal is the easiest, but evolution through controlled mutation is not just possible, it’s something we did for years.

Bibliography

  1. Replacement of deprecated kernel APIs, at LWN.net: the article I’m responding to.
  2. “DLL Hell”, and avoiding an NP-complete problem, in my blog.
  3. You Don’t Know Jack About Software Maintenance, by Paul Stachour: on the origin of versioning.
  4. Library Interface Versioning in Solaris and Linux, by David J. Brown and Karl Runge: DJB’s paper on Linux versioning.
