Get front row seat and watch the development of micro-isv. We make cool product that solves all your problems...

logo

Friends
         
We need a (Formal) Failure Model
2009-03-12

So today's slashdot story about the bugs between gnome/kde and the ext4 file system highlight what happens when your try to reason about concurrent systems without a formal model.

For example, take a look at the rename(2) command, which has a pretty complete description:

rename() renames a file, moving it between directories if required.

Any other hard links to the file (as created using link(2)) are unaffected. Open file descriptors for oldpath are also unaffected.

If newpath already exists it will be atomically replaced (subject to a few conditions; see ERRORS below), so that there is no point at which another process attempting to access newpath will find it missing.

If oldpath and newpath are existing hard links referring to the same file, then rename() does nothing, and returns a success status.

If newpath exists but the operation fails for some reason rename() guarantees to leave an instance of newpath in place.

oldpath can specify a directory. In this case, newpath must either not exist, or it must specify an empty directory.

However, when overwriting there will probably be a window in which both oldpath and newpath refer to the file being renamed.

If oldpath refers to a symbolic link the link is renamed; if newpath refers to a symbolic link the link will be overwritten.

Ok. Now answer this question: is this sequence of events between 4 processes allowed:

P1

P2

P3

P4

Create a file 'head' write 'first' to it, close it




Create a file 'p1-newhead', content 'p1 wins'

Create a file 'p2-newhead', content 'p2 wins'





Reads the file 'head' => 'first'


Rename “p1-newhead” “head”

Rename “p2-newhead” “head”





Reads the file 'head' => 'p2-newhead'


Rename succeeds

Rename fails





Reads the file 'head' => 'p1-newhead'


See the problem: can P2's rename operation cause a temporarily visible effect, even though it eventually fails? Even thorough informal documentation of a trivial command isn't enough.

A real world effect of this is that making a database not loose transactions is actually rather more tricky in a real system than it looks on paper, generally because the components of a system will always trade correctness for performance. They found this at Livejournal way back in 2005.

Optimisations are always going to happen, but it would be nice to know when they are likely to break a system, especially as there are no easy ways to inject failures into such systems that allow them to be tested.

I think the only way is to start documenting the exact behaviour of the interfaces between, say the ext3 filesystem, mysql, or an IDE block device in a formal, machine readable way.

This way MySQL can say 'if the file system obeys these properties, then I'll provide this interface' and ext3 will say 'if the hard disk obeys these properties, then I'll provide this interface'. Once this is in place is should be easy to check that the properties provided by each layer are enough to satisfy the layer above.

It won't eliminate bugs, of course. What it will mean is that we can assemble systems without worrying 'is this safe to run over NFS', because there will be an actual model of the behaviour of the NFS server and some requirements imposed by the program to check against.

It isn't obvious which language to express the formal model in, but temporal logic would be an obvious choice. Could the valid behaviours be expressed in Ruby or Python? It'd be great if they could.