.. _pitfalls: ================== Packaging Pitfalls ================== While PyOxidizer is capable of building fully self-contained binaries containing a Python application, many Python packages and applications make assumptions that don't hold inside PyOxidizer. This section talks about all the things that can go wrong when attempting to package a Python application. .. _no_file: Reliance on ``__file__`` ======================== Python modules typically have a ``__file__`` attribute that defines the path of the file from which the module was loaded. (When a file is executed as a script, it masquerades as the ``__main__`` module, so non-module scripts can behave as modules too.) It is relatively common for Python modules in the wild to use ``__file__``. For example, modules may do something like ``module_dir = os.path.abspath(os.path.dirname(__file__))`` to locate the directory that a module is in so they can load a non-Python file from that directory. Or they may use ``__file__`` to resolve paths to Python source files so that they can be loaded outside the typical ``import`` based mechanism (various plugin systems do this, for example). Strictly speaking, the ``__file__`` attribute on modules is not required. Therefore any Python code that requires the existence of ``__file__`` is either broken or has made an explicit choice to not support module loaders - like PyOxidizer - that don't store modules as files and may not set ``__file__``. **Therefore required use of __file__ is highly discouraged.** It is recommended to instead use a *resources* API for loading *resource* data relative to a Python module and to fall back to ``__file__`` if a suitable API is unavailable or doesn't work. See the next section for more. Resource Reading ================ Many Python application need to load *resources*. *Resources* are typically non-Python *support* files, such as images, config files, etc. In some cases, *resources* could be Python source or bytecode files. For example, many plugin systems load Python modules outside the context of the normal ``import`` mechanism and therefore treat standalone Python source/bytecode files as non-module *resources*. PyOxidizer can break existing resource reading code by invalidating assumptions about where resources are located. Historically, resources almost always translate to individual paths on the filesystem. One can use ``__file__`` to derive the path to a resource file and ``open()`` the file. So there is a lot of code in the wild that relies on ``__file__`` for this use case. .. important:: Use of ``__file__`` will not work for in-memory resources in PyOxidizer applications and Python code will need to use a resource reading API to access resources data within the binary. Depending on your need to support Python versions older than 3.7, the solution may or may not be simple. That's because for most of its lifetime, Python hasn't had a robust story for loading *resource* data. ``pkg_resources`` was the recommended solution for a while. Python 3 introduced the `ResourceLoader `_ interface on module loaders. But this interface became deprecated in Python 3.7 in favor of the `ResourceReader `_ interface and associated APIs in the `importlib.resources module `_ But even the modern ``ResourceReader`` interface isn't perfect, as some of its behavior is `seemingly inconsistent `_. ``ResourceReader`` is the best interface for importing non-module *resource* data to date. Unfortunately, that API requires Python 3.7. And a lot of the Python universe hasn't yet fully adopted Python 3.7 and its APIs. This means that Python projects in the wild tend to target the *lowest common denominator* for loading *resource* data. And this solution tends to be to rely on ``__file__`` (directly or abstracted away) for deriving paths to things because ``__file__`` has worked nearly everywhere for seemingly forever. .. important:: PyOxidizer supports the `ResourceReader `_ interface on module loaders and highly encourages Python libraries and applications to adopt it as the preferred mechanism for loading resources data. Let's talk about what this means in practice. Say you have resources next to a Python module. Legacy code in a module might do something like the following: .. code-block:: python def get_resource(name): """Return a file handle on a named resource next to this module.""" module_dir = os.path.abspath(os.path.dirname(__file__)) # Warning: there is a path traversal attack possible here if # name continues values like ../../../../../etc/password. resource_path = os.path.join(module_dir, name) return open(resource_path, 'rb') Modern code targeting Python 3.7+ can use the `ResourceReader` API directly: .. code-block:: python def get_resource(name): """Return a file handle on a named resource next to this module.""" # get_resource_reader() may not exist or may return None, which this # code doesn't handle. reader = __loader__.get_resource_reader(__name__) return reader.open_resource(name) Alternatively, you can use the functions in `importlib.resources `_: .. code-block:: python import importlib.resources with importlib.resources.open_binary('mypackage', 'resource-name') as fh: data = fh.read() The ``importlib.resources`` functions are glorified wrappers around the low-level interfaces on module loaders. But they do provide some useful functionality, such as additional error checking and automatic importing of modules, making them useful in many scenarios, especially when loading resources outside the current package/module. See the `importlib_resources documentation site `_ for more. ``ResourceReader`` and ``importlib.resources`` were introduced in Python 3.7. So if you want your code to remain compatible with older Python versions, you will need to write an abstraction for obtaining resources. Try something like the following: .. code-block:: python import importlib try: import importlib.resources # Defeat lazy module importers. importlib.resources.open_binary HAVE_RESOURCE_READER = True except ImportError: HAVE_RESOURCE_READER = False try: import pkg_resources # Defeat lazy module importers. pkg_resources.resource_stream HAVE_PKG_RESOURCES = True except ImportError: HAVE_PKG_RESOURCES = False def get_resource(package, resource): """Return a file handle on a named resource in a Package.""" # Prefer ResourceReader APIs, as they are newest. if HAVE_RESOURCE_READER: # If we're in the context of a module, we could also use # ``__loader__.get_resource_reader(__name__).open_resource(resource)``. # We use open_binary() because it is simple. return importlib.resources.open_binary(package, resource) # Fall back to pkg_resources. if HAVE_PKG_RESOURCES: return pkg_resources.resource_stream(package, resource) # Fall back to __file__. # We need to first import the package so we can find its location. # This could raise an exception! mod = importlib.import_module(package) # Undefined __file__ will raise NameError on variable access. try: package_path = os.path.abspath(os.path.dirname(mod.__file__)) except NameError: package_path = None if package_path is not None: # Warning: there is a path traversal attack possible here if # resource contains values like ../../../../etc/password. Input # must be trusted or sanitized before blindly opening files or # you may have a security vulnerability! resource_path = os.path.join(package_path, resource) return open(resource_path, 'rb') # Could not resolve package path from __file__. raise Exception('do not know how to load resource: %s:%s' % ( package, resource)) (The above code is dedicated to the public domain and can be used without attribution.) The above code is just a demonstration. It may *just work* for your needs. It may need additional tweaking. The state of resource management in Python has historically been a mess. So don't be surprised if you need to modify code to support the modern resource interfaces. But this effort should be well spent, as the new resource APIs are hopefully the most future compatible. And, using them will enable applications built with PyOxidizer to import resources data from memory! .. _pitfall_extension_modules: C and Other Native Extension Modules ==================================== Many Python packages compile *extension modules* to native code. (Typically C is used to implement extension modules.) The way this typically works is some build system (often ``distutils`` via a ``setup.py`` script) produces a shared library file containing the extension. On Linux and macOS, the file extension is typically ``.so``. On Windows, it is ``.pyd``. Python's importing mechanism looks for these files in addition to normal ``.py`` and ``.pyc`` files when an ``import`` is requested. PyOxidizer currently has :ref:`limited support ` for extension modules. Under some circumstances, building extension modules as part of regular package build machinery *just works* and the resulting extension module can be embedded in the produced binary. The way PyOxidizer achieves this is a bit crude, but effective. When PyOxidizer invokes ``pip`` or ``setup.py`` to build a package, it installs a modified version of ``distutils`` into the invoked Python's ``sys.path``. This modified ``distutils`` changes the behavior of some key build steps (notably how C extensions are built) such that the build emits artifacts that PyOxidizer can use to integrate the extension module into a custom binary. For example, on Linux, PyOxidizer copies the intermediate object files produced by the build and links them into the same binary containing Python: PyOxidizer completely ignores the shared library that is or would typically be produced. If ``setup.py`` scripts are following the traditional pattern of using `distutils.core.Extension `_ to define extension modules, things tend to *just work* (assuming extension modules are supported by PyOxidizer for the target platform). However, if ``setup.py`` scripts are doing their own monkeypatching of ``distutils``, rely on custom build steps or types to compile extension modules, or invoke separate Python processes to interact with ``distutils``, things may break. If you run into an extension module packaging problem that isn't recorded here or on the :ref:`static page `, please `file an issue `_ so it may be tracked. Identifying PyOxidizer ====================== Python code may want to know whether it is running in the context of PyOxidizer. At packaging time, ``pip`` and ``setup.py`` invocations made by PyOxidizer should set a ``PYOXIDIZER=1`` environment variable. ``setup.py`` scripts, etc can look for this environment variable to determine if they are being packaged by PyOxidizer. At run-time, PyOxidizer will always set a ``sys.oxidized`` attribute with value ``True``. So, Python code can test whether it is running in PyOxidizer like so:: import sys if getattr(sys, 'oxidized', False): print('running in PyOxidizer!')