unwrapping wrappers

I started writing this on Thursday and got distracted indulging my inner eukaryote. I was hung up on interface design again, so I dove into coding a working example, then went out to live a little outside the house, all of which resulted in a backlog of thoughts that I'm finally publishing now, on Tuesday, because this is a time-traveling blog.

My first idea for building a libnbd IO manager for libext2fs was for pyext2fs to expose an IOManager type that could be subclassed from python, with the IO manager interface implemented through its methods (presumably using the libnbd python bindings). Then I'd have an internal glue structure exposing the callbacks libext2fs expects, each of which would CallFunction() on a stored pointer to a concrete IOManager.
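Roughly this shape, as a minimal sketch. The IOManager base class, the block_size attribute, and the idea that pyext2fs would call these methods are all assumptions about an interface that doesn't exist yet; only nbd.NBD() and its methods are the real libnbd python bindings, and the method names mirror libext2fs's actual io_manager callbacks:

import nbd  # the libnbd python bindings

class NBDIOManager:
    # would subclass a hypothetical IOManager base exposed by pyext2fs;
    # method names mirror libext2fs's io_manager callback members
    def open(self, name, flags):
        # "name" carries the NBD URI through to the server
        self.handle = nbd.NBD()
        self.handle.connect_uri(name)

    def set_blksize(self, blksize):
        self.block_size = blksize

    def read_blk(self, block, count):
        # translate block addressing into the byte offsets pread() takes
        return self.handle.pread(count * self.block_size,
                                 block * self.block_size)

    def write_blk(self, block, data):
        self.handle.pwrite(data, block * self.block_size)

    def flush(self):
        self.handle.flush()

    def close(self):
        self.handle.shutdown()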

The first hiccup there is that io_manager.open(), and consequently ext2fs_open(), doesn't accept user data, which means I have no way of storing that pointer to do the dispatching from within the callbacks. I could still pass an object to the high-level ext2fs.open() that I would use to fiddle with a global pointer, and have the callbacks dispatch through there. I've seen this in the pyalpm wrapper, for instance, where callbacks specified by the public portions of the python module are stored in a global lookup table. In the case of libalpm those callback functions do accept a data pointer, so that design isn't as necessary there as it is here.

But even though I could make the argument of necessity here, I'm still shying away from that statefulness. I'm worried that shared state might prevent me from reading from one filesystem backed by one type of storage while writing to another backed by a different type, or at least make it more difficult. Though that's not a high-priority use case at the moment, the approach sounds too limiting from the get-go. I don't think I'm even considering locking access to that pointer as an option; I think I'm motivated to seek out an alternative.
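For concreteness, the workaround I'm rejecting would look something like this; every name here is hypothetical glue, not pyext2fs's (or pyalpm's) actual code:

# module-level state shared by every open filesystem: opening a second
# filesystem clobbers the first one's manager
_current_manager = None

def open(path, manager):
    # the only place we can stash the manager is module state, because
    # nothing in the callback chain will carry it for us
    global _current_manager
    _current_manager = manager

def _read_blk(block, count):
    # every C-facing callback has to route through the global
    return _current_manager.read_blk(block, count)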

Another thought was to implement an upstream ext2fs_open3() that does accept user data to be handed to io_manager.open(). Since that data has to flow through the manager's open() callback, its signature would have to change, breaking any extant custom IO managers out in the world; so in that approach I think I'd need to amend that structure to support an additional open2(), and re-implement unix_io_manager.open() to call open2(..., NULL). If I had that, then I could pass a python object to store in the io_channel that the io_manager.open2() implementation allocates, as I first had in mind.

I tossed around ideas of separate library packages, like libext2io and pyext2io, but eventually dove in by adding the IO manager to pyext2fs itself. That means that package now has a dependency on libnbd, but I think I have a rough outline for iterating away from there. I modeled the opening API after the engine URIs in sqlalchemy. By default the library assumes a file scheme, which dispatches to the "unix" IO manager provided by libext2fs.

import ext2fs

# equivalent to the unqualified "path/to/image.ext2fs"
with ext2fs.open("file:path/to/image.ext2fs") as fs:
    with fs.open("/a/file") as fin:
        print(fin.read(1024))

The NBD IO manager uses nbd_connect_uri() under the hood, and any URI with a supported scheme is passed straight through to it. So if only the location of the image changes, then only the URI has to change:

import ext2fs

with ext2fs.open("nbd+unix:///?socket=nbd.sock") as fs:
    with fs.open("/a/file") as fin:
        print(fin.read(1024))

From here I think I'll move the NBD parts to a separate pye2nbd. It'll need a dependency on pyext2fs, which will need to expose a factory to wrap the opener that pye2nbd implements. Then pyext2fs can attempt an import of the "engine"-specific IO manager package, and that can be where I dictate and centralize the URI scheme mapping. And then, finally, e2http will depend on both of those packages.
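A rough sketch of how that optional-import dispatch could look on the pyext2fs side; pye2nbd and its opener attribute don't exist yet, and the scheme set here is only a subset of what libnbd accepts:

def _unix_opener(path):
    # stand-in for the dispatch to libext2fs's builtin "unix" manager
    raise NotImplementedError

_NBD_SCHEMES = {"nbd", "nbds", "nbd+unix", "nbds+unix"}  # a subset; libnbd knows more

def _opener_for(uri):
    # pyext2fs owns the scheme mapping and imports engine packages lazily
    scheme, _, _ = uri.partition(":")
    if scheme in _NBD_SCHEMES:
        try:
            import pye2nbd  # hypothetical package from the plan above
        except ImportError as err:
            raise RuntimeError(f"{scheme} URIs need pye2nbd installed") from err
        return pye2nbd.opener
    return _unix_opener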