OxidizedFinder Behavior and Compliance

OxidizedFinder strives to be as compliant as possible with other meta path importers. In general, the behavior described by the importlib documentation should hold. In other words, things should mostly just work, and any deviation from the importlib documentation constitutes a bug worth reporting.

That being said, OxidizedFinder’s approach to loading resources is drastically different from more traditional means, notably loading files from the filesystem. OxidizedFinder breaks a lot of assumptions about how things have worked in Python, and some behavior may seem odd or in violation of documented Python behavior.

The sections below attempt to call out known areas where OxidizedFinder deviates from typical behavior.

__file__ and __cached__ Module Attributes

Python modules typically have a __file__ attribute holding a str defining the filesystem path the source module was imported from (usually a path to a .py file). There is also the similar - but lesser known - __cached__ attribute holding the filesystem path of the bytecode module (usually the path to a .pyc file).

Important

OxidizedFinder will not set either attribute when importing modules from memory.

These attributes are not set because it isn’t obvious what the values should be! Typically, __file__ is used by Python as an anchor point to derive the path to some other file. However, when loading modules from memory, the traditional filesystem hierarchy of Python modules does not exist. In the opinion of PyOxidizer’s maintainer, exposing __file__ would be lying and this would cause more potential for harm than good.

While we may make it possible to define __file__ (and __cached__) on modules imported from memory someday, we do not yet support this.

OxidizedFinder does, however, set __file__ and __cached__ on modules imported from the filesystem. So, a workaround to restore these missing attributes is to avoid in-memory loading.
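Code that has to run under both OxidizedFinder and the standard filesystem importer can treat __file__ as optional instead of assuming it exists. The following is a minimal sketch of that defensive pattern. The helper name module_file_or_none is hypothetical, and the module objects are built with types.ModuleType purely for illustration; they are not produced by OxidizedFinder.

```python
import types


def module_file_or_none(module):
    """Return the module's __file__, or None when the attribute is unset."""
    # getattr() with a default avoids AttributeError for modules imported
    # from memory, where __file__ is not set.
    return getattr(module, "__file__", None)


# Illustration only: a bare module object has no __file__, which mirrors
# what calling code sees for a module imported from memory.
in_memory = types.ModuleType("in_memory_example")

# Illustration only: simulate a filesystem-backed module.
on_disk = types.ModuleType("on_disk_example")
on_disk.__file__ = "/hypothetical/path/on_disk_example.py"
```

Callers can then branch on a None result and fall back to a resource-loading API rather than deriving paths from __file__.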

Note

Use of __file__ is commonly encountered in code loading resource files. See Loading Resource Files for more on this topic, including how to port code to more modern Python APIs for loading resources.

__path__ Module Attribute

Python modules that are also packages must have a __path__ attribute containing an iterable of str. The iterable can be empty.

If a module is imported from the filesystem, OxidizedFinder will set __path__ to the parent directory of the module’s file, just like the standard filesystem importer would.

If a module is imported from memory, __path__ will be set to the path of the current executable joined with the package name. e.g. if the current executable is /usr/bin/myapp and the module/package name is foo.bar, __path__ will be ["/usr/bin/myapp/foo/bar"]. On Windows, paths might look like C:\dev\myapp.exe\foo\bar.
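The in-memory __path__ layout described above can be sketched as follows. This is an illustration of the naming scheme only, not OxidizedFinder’s actual implementation; in_memory_package_path and its path_module parameter are hypothetical names, and posixpath/ntpath are used so that both examples can be reproduced on any platform.

```python
import ntpath
import posixpath


def in_memory_package_path(current_exe, package, path_module=posixpath):
    """Build the single __path__ entry for a package imported from memory."""
    # Each dotted component of the package name becomes one path component
    # appended to the path of the current executable.
    return [path_module.join(current_exe, *package.split("."))]


# UNIX-style example from the text above.
unix_path = in_memory_package_path("/usr/bin/myapp", "foo.bar")

# Windows-style example from the text above.
windows_path = in_memory_package_path("C:\\dev\\myapp.exe", "foo.bar", ntpath)
```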

Python’s zipimport importer uses the same approach for modules imported from zip files, so there is precedent for OxidizedFinder doing things this way.

Support for __init__ in Module Names

There exists Python code that does things like from .__init__ import X.

__init__ is special in Python module names because __init__.py is the filename that denotes a Python package. So syntax like from .__init__ import X is probably intended to be equivalent to from . import X, and import foo.__init__ is probably intended to be written as import foo.

Python’s filesystem importer doesn’t treat __init__ in module names as special. If you attempt to import a module named foo.__init__, it will attempt to locate a file named foo/__init__.py. If that module is a package, this will succeed. However, the module name seen by the importer has __init__ in it and the name on the created module object will have __init__ in it. This means that you can have both a module foo and foo.__init__. These will both be derived from the same file but are actually separate module objects.

PyOxidizer will automatically remove trailing .__init__ from module names. This will enable PyOxidizer to work with syntax such as import foo.__init__ and from .__init__ import X and therefore be compatible with Python code in the wild. However, PyOxidizer may not preserve the .__init__ in the module name. For example, with Python’s path based importer, you could have both foo and foo.__init__ in sys.modules but PyOxidizer will only have foo.

A limitation of PyOxidizer’s module name normalization is that it only strips a single trailing .__init__ from the module name: occurrences of __init__ inside the module name are not normalized. e.g. foo.__init__.bar is not normalized to foo.bar. This may introduce incompatibilities with Python code in the wild. However, for that to matter, the filesystem layout would have to be something like foo/__init__/bar.py. This hopefully does not occur in the wild, but it is conceivable that it does.
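The normalization rule above can be sketched in a few lines. This is a plain-Python approximation of the described behavior, not PyOxidizer’s own code, and normalize_module_name is a hypothetical name.

```python
def normalize_module_name(name):
    """Strip one trailing ".__init__" from a module name, if present."""
    suffix = ".__init__"
    if name.endswith(suffix):
        # Only the single trailing occurrence is removed.
        return name[: -len(suffix)]
    # An __init__ component elsewhere in the name is left untouched.
    return name
```

Under this sketch, foo.__init__ becomes foo, while foo.__init__.bar is returned unchanged.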

See https://github.com/indygreg/PyOxidizer/issues/317 and https://bugs.python.org/issue42564 for more discussion on this issue.

ResourceReader Compatibility

ResourceReader has known compatibility differences with Python’s default filesystem-based importer. See Support for ResourceReader for details.

ResourceLoader Compatibility

The ResourceLoader interface is implemented but behavior of get_data(path) has some variance with Python’s filesystem-based importer.

See Support for ResourceLoader for details.

Note

ResourceLoader is deprecated as of Python 3.7. Code should be ported to ResourceReader / importlib.resources if possible.

importlib.metadata Compatibility

OxidizedFinder implements find_distributions() and therefore provides the required hook for importlib.metadata to resolve Distribution instances. However, the returned objects do not implement the full Distribution interface.

Here are the known differences between OxidizedDistribution and importlib.metadata.Distribution instances:

  • OxidizedDistribution is not an instance of importlib.metadata.Distribution.

  • locate_file() is not defined.

  • @staticmethod at() is not defined.

  • @property files raises NotImplementedError.

There are additional _ prefixed attributes of importlib.metadata.Distribution that are not implemented. But we do not consider these part of the public API and don’t feel they are worth calling out.

In addition, OxidizedFinder.find_distributions() ignores the path attribute of the passed Context instance. Only the name attribute is consulted. If name is None, all packages with registered distribution files will be returned. Otherwise, the returned list contains at most one OxidizedDistribution corresponding to the requested package name.

pkgutil Compatibility

The pkgutil package in Python’s standard library reacts to special functionality on MetaPathFinder instances.

pkgutil.iter_modules() attempts to use an iter_modules() method to obtain results.

OxidizedFinder implements iter_modules(prefix="") and pkgutil.iter_modules() should work. However, there are some differences in behavior:

  • iter_modules() is defined to be a generator but OxidizedFinder.iter_modules() returns a list. list is iterable and this difference should hopefully be a harmless implementation detail.

  • Support for the path argument to pkgutil.iter_modules() requires that OxidizedFinder’s path_hook is installed in sys.path_hooks. This will be done automatically if OxidizedFinder is installed at interpreter initialization time.
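To illustrate why a list return value is harmless, the sketch below registers a stand-in meta path finder whose iter_modules() returns a list and shows that pkgutil.iter_modules() still reports its entries. ListBackedFinder and discovered_names are hypothetical names for illustration; the stand-in is not OxidizedFinder and never actually imports anything.

```python
import pkgutil
import sys


class ListBackedFinder:
    """Hypothetical meta path finder whose iter_modules() returns a list."""

    def __init__(self, entries):
        # entries is a list of (module_name, is_package) tuples.
        self._entries = entries

    def find_spec(self, fullname, path=None, target=None):
        # This stand-in never resolves imports.
        return None

    def iter_modules(self, prefix=""):
        # A list rather than a generator; pkgutil only iterates over it.
        return [(prefix + name, is_pkg) for name, is_pkg in self._entries]


def discovered_names(finder):
    """Return module names reported by pkgutil while finder is installed."""
    sys.meta_path.append(finder)
    try:
        return {info.name for info in pkgutil.iter_modules()}
    finally:
        # Always restore sys.meta_path to its previous state.
        sys.meta_path.remove(finder)
```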

Path Hooks Compatibility

The path_hook method of an OxidizedFinder instance can be installed on sys.path_hooks to enable an OxidizedFinder to function as a path entry finder.

As a brief refresher, callables on sys.path_hooks are called with paths, giving them the opportunity to service a particular path. If a path hook responds to a path by returning a path entry finder, that returned object will service that path. Often, the paths passed to path hooks are from sys.path. However, arbitrary paths can be passed in. A property of the returned path entry finder is that it only targets a particular level in the package hierarchy. Unlike meta path finders (which can service any named resource they know about), path entry finders are bound to a specific package target level and will only return resources existing at that level.

Path hooks are used by the following mechanisms:

  • The standard library PathFinder (the meta path finder that Python uses to load resources from the filesystem) uses sys.path_hooks as part of resolving a finder for a given sys.path entry.

  • pkgutil.get_importer() uses them to resolve the finder for a given sys.path entry. This function is in turn used by various code, including other pkgutil APIs.

  • pkg_resources maps path entry finder types to functions to enable a resolution of pkg_resources.Distribution instances for individual paths.
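The general path hook mechanism can be sketched with a stand-in hook that claims exactly one virtual path and is resolved through pkgutil.get_importer(). Everything named here (VIRTUAL_ROOT, VirtualEntryFinder, virtual_hook, resolve) is hypothetical and exists only to show how sys.path_hooks is consulted; it does not reproduce OxidizedFinder.path_hook.

```python
import pkgutil
import sys

# Hypothetical virtual path that only the stand-in hook responds to.
VIRTUAL_ROOT = "/hypothetical/virtual/root"


class VirtualEntryFinder:
    """Hypothetical path entry finder bound to a single path."""

    def __init__(self, path):
        self.path = path

    def find_spec(self, fullname, target=None):
        # This stand-in never resolves imports.
        return None

    def invalidate_caches(self):
        pass


def virtual_hook(path):
    # Decline every path except the exact str value this hook owns.
    if not isinstance(path, str) or path != VIRTUAL_ROOT:
        raise ImportError(path)
    return VirtualEntryFinder(path)


def resolve(path):
    """Resolve the finder for path while the stand-in hook is installed."""
    sys.path_hooks.insert(0, virtual_hook)
    sys.path_importer_cache.pop(path, None)
    try:
        return pkgutil.get_importer(path)
    finally:
        # Undo the registration and drop any cached finder for path.
        sys.path_hooks.remove(virtual_hook)
        sys.path_importer_cache.pop(path, None)
```

Raising ImportError is how a hook declines a path, which lets the remaining hooks on sys.path_hooks try it.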

When installed on sys.path_hooks, OxidizedFinder.path_hook will respond to the following path values:

Important

path_hook is very strict about what values it will respond to.

The value must be a str and be equal to OxidizedFinder.path_hook_base_str or have OxidizedFinder.path_hook_base_str plus a directory separator as the exact string prefix.

path_hook will not respond to bytes, pathlib.Path, or any other path-like type.

OxidizedFinder.path_hook_base_str may not be the same value as sys.executable. Always use OxidizedFinder.path_hook_base_str to derive sys.path values to ensure the path hook will respond.

When path_hook is called with its OxidizedFinder.path_hook_base_str value, an OxidizedPathEntryFinder bound to the source OxidizedFinder is returned. This finder is able to service root resources (i.e. top-level modules and packages).

When path_hook is called with a virtual sub-directory of OxidizedFinder.path_hook_base_str, the same thing happens except the returned OxidizedPathEntryFinder will only service resources at the exact package hierarchy specified by that virtual sub-directory.

The validation and normalization of path values is similar to the following:

def path_hook(self, path: str):
    # A path exactly matching path_hook_base_str is bound to resources at
    # the root.
    if path == self.path_hook_base_str:
        return ...

    # Virtual sub-directories must begin with path_hook_base_str + directory
    # separator.
    if not path.startswith((self.path_hook_base_str + "/", self.path_hook_base_str + "\\")):
        raise ImportError()

    # Part after directory separator.
    package_part = path[len(self.path_hook_base_str) + 1:]

    # Normalize to UNIX style directory separators, allowing Windows
    # separators to exist.
    package_part = package_part.replace("\\", "/")

    # Ban leading, trailing, and consecutive directory separators.
    if package_part.startswith("/") or package_part.endswith("/") or "//" in package_part:
        raise ImportError()

    # Ban dots in directory components.
    for part in package_part.split("/"):
        if part.startswith(".") or part.endswith(".") or ".." in part:
            raise ImportError()

    # Normalize directory tree to package hierarchy. e.g. foo/bar -> foo.bar.
    package = package_part.replace("/", ".")

    # When converting the package string to a Rust string to facilitate
    # resource name comparisons, it is encoded to UTF-8, replacing
    # "bad" code points with the Unicode replacement code point.
    rust_package_string = package.encode("utf-8", "replace")

    # The returned finder is bound to this package level.
    return ...

Note that when the package component of virtual sub-directories is converted to a Rust string, we use the UTF-8 encoding, not Python’s active filesystem encoding. This is to keep things simpler. And since OxidizedFinder indexes resource names using Rust’s UTF-8 backed string type anyway, this seems semantically correct from the perspective of oxidized_importer.

As an example, if path were os.path.join(finder.path_hook_base_str, "a"), the finder would only service modules of the form a.*. So a, a.b would match but a.b.c and d would not.

For best results, use os.path.join(finder.path_hook_base_str, str) to define values that will be accepted by the path hook.
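The recommendation above can be wrapped in a small helper that turns a dotted package name into a str path entry. This is a sketch under stated assumptions: sys_path_entry_for is a hypothetical name, and base stands for the value of path_hook_base_str read from an OxidizedFinder instance obtained elsewhere.

```python
import os


def sys_path_entry_for(base, package=""):
    """Return a str path entry targeting package under base."""
    if not package:
        # The base value itself targets top-level modules and packages.
        return base
    # Each dotted component becomes one virtual sub-directory, joined with
    # the platform's directory separator.
    return os.path.join(base, *package.split("."))
```

The result is always a str, which matters because the path hook does not respond to bytes or pathlib.Path values.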

OxidizedPathEntryFinder complies with the PathEntryFinder protocol and implements OxidizedPathEntryFinder.find_spec() and OxidizedPathEntryFinder.invalidate_caches(). However, the deprecated methods find_loader and find_module are not implemented. Instances also implement OxidizedPathEntryFinder.iter_modules(), enabling them to be used by pkgutil.iter_modules().

pkg_resources Compatibility

OxidizedFinder can be registered as a provider for pkg_resources, enabling pkg_resources APIs to be used with resources tracked by OxidizedFinder instances.

However, there are known compatibility differences. See Support for pkg_resources for more.