Packaging User Guide

So you want to package a Python application using PyOxidizer? You’ve come to the right place to learn how! Read on for all the details on how to oxidize your Python application!

First, you’ll need to install PyOxidizer. See Installing for instructions.

Creating a PyOxidizer Project

The process for oxidizing every Python application looks the same: you start by creating a new PyOxidizer configuration file via the pyoxidizer init-config-file command:

# Create a new configuration file in the directory "pyapp"
$ pyoxidizer init-config-file pyapp

Behind the scenes, PyOxidizer works by leveraging a Rust project to build binaries embedding Python. The auto-generated project simply instantiates and runs an embedded Python interpreter. If you would like your built binaries to offer more functionality, you can create a minimal Rust project to embed a Python interpreter and customize from there:

# Create a new Rust project for your application in ~/src/myapp.
$ pyoxidizer init-rust-project ~/src/myapp

The auto-generated configuration file and Rust project will launch a Python REPL by default. The pyoxidizer executable looks in the current directory for a pyoxidizer.bzl configuration file. Let’s test that the new configuration file or project works:

$ pyoxidizer run
...
   Compiling pyapp v0.1.0 (/home/gps/src/pyapp)
    Finished dev [unoptimized + debuginfo] target(s) in 53.14s
writing executable to /home/gps/src/pyapp/build/x86_64-unknown-linux-gnu/debug/exe/pyapp
>>>

If all goes according to plan, you just built a Rust executable which contains an embedded copy of Python. That executable started an interactive Python REPL on startup. Try typing in some Python code:

>>> print("hello, world")
hello, world

It works!

(To exit the REPL, press CTRL+d or CTRL+z, or run import sys; sys.exit(0) from the REPL.)

Note

If you have built a Rust project before, the output from building a PyOxidizer application may look familiar to you. That’s because under the hood Cargo - Rust’s package manager and build system - is doing a lot of the work to build the application. If you are familiar with Rust development, you can use cargo build and cargo run directly. However, Rust’s build system is only responsible for building binaries, and some of the higher-level functionality from PyOxidizer’s configuration files (such as application packaging) will likely not be performed unless tweaks are made to the Rust project’s build.rs.

Now that we’ve got a new project, let’s customize it to do something useful.

Packaging an Application from a PyPI Package

In this section, we’ll show how to package the pyflakes program using a published PyPI package. (Pyflakes is a Python linter.)

First, let’s create an empty project:

$ pyoxidizer init-config-file pyflakes

Next, we need to edit the configuration file to tell PyOxidizer about pyflakes. Open the pyflakes/pyoxidizer.bzl file in your favorite editor.

Find the make_exe() function. This function returns a PythonExecutable instance which defines a standalone executable containing Python. This function is a registered target, which is a named entity that can be individually built or run. By returning a PythonExecutable instance, this function/target is saying build an executable containing Python.

The PythonExecutable type holds all state needed to package and run a Python interpreter. This ranges from low-level interpreter configuration settings to which Python resources (like source and bytecode modules) are embedded in that executable binary. This type exposes an add_python_resources() method, which adds an iterable of objects representing Python resources to the set of embedded resources.

Elsewhere in this function, the dist variable holds an instance of PythonDistribution. This type represents a Python distribution, which is a fancy way of saying an implementation of Python. In addition to defining the files constituting that distribution, a PythonDistribution exposes methods for performing Python packaging. One of those methods is pip_install(), which invokes pip install using that Python distribution.

To add a new Python package to our executable, we call dist.pip_install() then add the results to our PythonExecutable instance. This is done like so:

exe.add_python_resources(dist.pip_install(["pyflakes==2.1.1"]))

The inner call to dist.pip_install() will effectively run pip install pyflakes==2.1.1 and collect a set of installed Python resources (like module sources and bytecode data) and return that as an iterable data structure. The exe.add_python_resources() call will then embed these resources in the built executable binary.

Next, we tell PyOxidizer to run pyflakes when the interpreter is executed:

run_eval="from pyflakes.api import main; main()",

This says to effectively evaluate the Python code from pyflakes.api import main; main() when the embedded interpreter starts.
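As a rough illustration of what evaluating such a string means, here is a plain-Python sketch. It is not PyOxidizer's actual startup code. The string holds statements rather than a single expression, so in ordinary Python the closest equivalent is exec(). The standard library's json module stands in for pyflakes so the sketch runs without extra packages, and the namespace and result names are illustrative only.

```python
# Statements separated by ";" behave like a tiny script.
# json stands in for pyflakes so no third-party package is needed.
code = "from json import dumps; result = dumps({'ok': True})"

namespace = {}
# exec() accepts statements; eval() would raise SyntaxError on the import.
exec(code, namespace)

print(namespace["result"])
```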

The new make_exe() function should look something like the following (with comments removed for brevity):

def make_exe():
    dist = default_python_distribution()

    config = PythonInterpreterConfig(
        run_eval="from pyflakes.api import main; main()",
    )

    exe = dist.to_python_executable(
        name="pyflakes",
        config=config,
        extension_module_filter="all",
        include_sources=True,
        include_resources=False,
        include_test=False,
    )

    exe.add_python_resources(dist.pip_install(["pyflakes==2.1.1"]))

    return exe

With the configuration changes made, we can build and run a pyflakes native executable:

# From outside the pyflakes directory
$ pyoxidizer run --path /path/to/pyflakes/project -- /path/to/python/file/to/analyze

# From inside the pyflakes directory
$ pyoxidizer run -- /path/to/python/file/to/analyze

# Or if you prefer the Rust native tools
$ cargo run -- /path/to/python/file/to/analyze

By default, pyflakes analyzes Python source code passed to it via stdin.

What Can Go Wrong

Ideally, packaging your Python application and its dependencies just works. Unfortunately, we don’t live in an ideal world.

PyOxidizer breaks various assumptions about how Python applications are built and distributed. When attempting to package your application, you will inevitably run into problems due to incompatibilities with PyOxidizer.

The Packaging Pitfalls documentation can serve as a guide to identify and work around these problems.

Packaging Additional Files

By default PyOxidizer will embed Python resources such as modules into the compiled executable. This is the ideal method to produce distributable Python applications because it can keep the entire application self-contained to a single executable and can result in performance wins.

But sometimes embedding resources in the binary isn’t desired or doesn’t work. Fear not: PyOxidizer has you covered!

Let’s give an example of this by attempting to package black, a Python code formatter.

We start by creating a new project:

$ pyoxidizer init-config-file black

Then edit the pyoxidizer.bzl file to have the following:

def make_exe():
    dist = default_python_distribution()

    config = PythonInterpreterConfig(
        run_module="black",
    )

    exe = dist.to_python_executable(
        name="black",
        config=config,
    )

    exe.add_python_resources(dist.pip_install(["black==19.3b0"]))

    return exe

Then let’s attempt to build the application:

$ pyoxidizer build --path black
processing config file /home/gps/src/black/pyoxidizer.bzl
resolving Python distribution...
...

Looking good so far!

Now let’s try to run it:

$ pyoxidizer run --path black
Traceback (most recent call last):
  File "black", line 46, in <module>
  File "blib2to3.pygram", line 15, in <module>
NameError: name '__file__' is not defined
SystemError

Uh oh - that didn’t work as expected.

As the error message shows, the blib2to3.pygram module is trying to access __file__, which is not defined. As explained by Reliance on __file__, PyOxidizer doesn’t set __file__ for modules loaded from memory. This is perfectly legal as Python doesn’t mandate that __file__ be defined. So black (and every other Python file assuming the existence of __file__) is arguably buggy.
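To make the failure mode concrete, here is a minimal sketch in plain Python, independent of PyOxidizer. It loads a module from a string through a hypothetical InMemoryLoader class and shows that __file__ is absent. The module source uses a defensive try/except instead of assuming the attribute exists. The names InMemoryLoader, SOURCE, inmem_demo and HERE are invented for this illustration.

```python
import importlib.abc
import importlib.util

# Module source that tolerates a missing __file__ instead of assuming it.
SOURCE = '''
try:
    HERE = __file__
except NameError:
    HERE = None  # loaded from memory: there is no backing file
'''


class InMemoryLoader(importlib.abc.Loader):
    """Hypothetical loader that executes source held in a string."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        exec(self.source, module.__dict__)


spec = importlib.util.spec_from_loader("inmem_demo", InMemoryLoader(SOURCE))
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

print(hasattr(module, "__file__"))  # False: the spec has no file location
print(module.HERE)                  # None: the fallback branch ran
```

Code written this way keeps working whether it is imported from the filesystem or from memory.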

Let’s assume we can’t easily change the offending source code to work around the issue.

To fix this problem, we change the configuration file to install black relative to the built application. This requires changing our approach a little. Before, we ran dist.pip_install() from make_exe() to collect Python resources and added them to our PythonExecutable instance. This meant those resources were embedded in the self-contained executable returned from make_exe().

Our auto-generated pyoxidizer.bzl file also contains an install target defined by the make_install() function. This target produces a FileManifest, which represents a collection of relative files and their content. When this type is resolved, those files are manifested on the filesystem. To package black’s Python resources next to our executable instead of embedded within it, we need to move the pip_install() invocation from make_exe() to make_install().

Change your configuration file to look like the following:

def make_python_dist():
    return default_python_distribution()

def make_exe(dist):
    python_config = PythonInterpreterConfig(
        run_module="black",
        sys_paths=["$ORIGIN/lib"],
    )

    return dist.to_python_executable(
        name="black",
        config=python_config,
        extension_module_filter='all',
        include_sources=True,
        include_resources=False,
        include_test=False,
    )


def make_install(dist, exe):
    files = FileManifest()

    files.add_python_resource(".", exe)

    files.add_python_resources("lib", dist.pip_install(["black==19.3b0"]))

    return files

register_target("python_dist", make_python_dist)
register_target("exe", make_exe, depends=["python_dist"])
register_target("install", make_install, depends=["python_dist", "exe"], default=True)

resolve_targets()

There are a few changes here.

We added a new make_python_dist() function and python_dist target to represent obtaining the Python distribution. This isn’t strictly required, but it helps avoid redundant work during execution.

The PythonInterpreterConfig construction adds a sys_paths=["$ORIGIN/lib"] argument. This argument says to adjust sys.path at run-time to include the lib directory next to the executable file. It allows the Python interpreter to import Python files on the filesystem instead of just from memory.
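The effect of the $ORIGIN token can be approximated in a few lines of Python. This is only a sketch of the idea, assuming the token is replaced with the directory containing the executable. The expand_origin() helper and the /opt/black install location are hypothetical and are not part of PyOxidizer.

```python
import os


def expand_origin(paths, exe_path):
    """Replace the $ORIGIN token with the directory containing exe_path."""
    origin = os.path.dirname(os.path.abspath(exe_path))
    return [p.replace("$ORIGIN", origin) for p in paths]


# Hypothetical install location, for illustration only.
print(expand_origin(["$ORIGIN/lib"], "/opt/black/black"))
```

Entries without the token pass through unchanged, so absolute paths can be mixed with executable-relative ones.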

The make_install() function/target has also gained a call to files.add_python_resources(). This method call takes the Python resources collected from running pip install black==19.3b0 and adds them to the FileManifest instance under the lib directory. When the FileManifest is resolved, those Python resources will be manifested as files on the filesystem (e.g. as .py and .pyc files).
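For intuition about what a collection of relative files being manifested on the filesystem means, here is a small Python model. It is a hypothetical stand-in, not the real FileManifest type. The SimpleManifest class, its add() and materialize() methods, and the placeholder file contents are all assumptions made for illustration.

```python
import os
import tempfile


class SimpleManifest:
    """Hypothetical stand-in: maps relative paths to file content."""

    def __init__(self):
        self.files = {}

    def add(self, prefix, name, data):
        # Normalize so a "." prefix means "directly in the install root".
        self.files[os.path.normpath(os.path.join(prefix, name))] = data

    def materialize(self, dest):
        """Write every entry below dest, creating directories as needed."""
        for rel_path, data in self.files.items():
            full_path = os.path.join(dest, rel_path)
            os.makedirs(os.path.dirname(full_path), exist_ok=True)
            with open(full_path, "wb") as fh:
                fh.write(data)


manifest = SimpleManifest()
manifest.add(".", "black", b"<executable bytes>")    # next to the root
manifest.add("lib", "black.py", b"# module source\n")  # under lib/

with tempfile.TemporaryDirectory() as dest:
    manifest.materialize(dest)
    for root, _dirs, names in os.walk(dest):
        for name in sorted(names):
            print(os.path.relpath(os.path.join(root, name), dest))
```

The manifest itself is only a description; nothing touches the disk until it is materialized, which mirrors the distinction the text draws between building a manifest and resolving it.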

With the new configuration in place, let’s re-build the application:

$ pyoxidizer build --path black install
...
packaging application into /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug
purging /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug
copying /home/gps/src/black/build/target/x86_64-unknown-linux-gnu/debug/black to /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug/black
resolving packaging state...
installing resources into 1 app-relative directories
installing 46 app-relative Python source modules to /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug/lib
...
black packaged into /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug

If you examine the output, you’ll see that various Python module files were written to the output directory, just as our configuration file requested!

Let’s try to run the application:

$ pyoxidizer run --path black --target install
No paths given. Nothing to do 😴

Success!

Trimming Unused Resources

By default, packaging rules are very aggressive about pulling in resources such as Python modules. For example, the entire Python standard library is embedded into the binary by default. These extra resources take up space and can make your binary significantly larger than it could be.

It is often desirable to prune your application of unused resources. For example, you may wish to only include Python modules that your application uses. This is possible with PyOxidizer.

Essentially, all strategies for managing the set of packaged resources boil down to crafting config file logic that chooses which resources are packaged.

But maintaining explicit lists of resources can be tedious. PyOxidizer offers a more automated approach to solving this problem.

The PythonInterpreterConfig(...) type defines a write_modules_directory_env setting, which when enabled will instruct the embedded Python interpreter to write the list of all loaded modules into a randomly named file in the directory identified by the environment variable defined by this setting. For example, if you set write_modules_directory_env="PYOXIDIZER_MODULES_DIR" and then run your binary with PYOXIDIZER_MODULES_DIR=~/tmp/dump-modules, each invocation will write a ~/tmp/dump-modules/modules-* file containing the list of Python modules loaded by the Python interpreter.

One can therefore use write_modules_directory_env to produce files that a different build target can reference to filter resources down to an only-include set of names.
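Combining several such dump files into one set of module names could look like the following Python sketch. The file format is an assumption: the text above only says each file contains a list of modules, and the code assumes one module name per line. The load_module_names() helper is hypothetical.

```python
import glob
import os


def load_module_names(dump_dir):
    """Union the module names found in every modules-* file in dump_dir.

    Assumes one module name per line; adjust if the real format differs.
    """
    names = set()
    for path in glob.glob(os.path.join(dump_dir, "modules-*")):
        with open(path, encoding="utf-8") as fh:
            names.update(line.strip() for line in fh if line.strip())
    return names


# Directory taken from the example above; prints [] if it does not exist.
print(sorted(load_module_names(os.path.expanduser("~/tmp/dump-modules"))))
```

Taking the union across many runs matters because a single invocation rarely exercises every code path, so any one dump file is likely to miss modules that another run would load.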

TODO this functionality was temporarily dropped as part of the Starlark port.

Adding Extension Modules At Run-Time

Normally, Python extension modules are compiled into the binary as part of the embedded Python interpreter.

PyOxidizer also supports providing additional extension modules at run-time. This can be useful for larger Rust applications providing extension modules that are implemented in Rust and aren’t built through normal Python build systems (like setup.py).

If the PythonConfig Rust struct used to construct an embedded Python interpreter contains a populated extra_extension_modules field, the extension modules listed therein will be made available to the Python interpreter.

Please note that Python stores extension modules in a global variable. So instantiating multiple interpreters via the pyembed interfaces may result in duplicate entries or unwanted extension modules being exposed to the Python interpreter.

Masquerading As Other Packaging Tools

Tools to package and distribute Python applications existed several years before PyOxidizer. Many Python packages have learned to perform special behavior when the fingerprint of these tools is detected at run-time.

First, PyOxidizer has its own fingerprint: sys.oxidized = True. The presence of this attribute can indicate an application running with PyOxidizer. Other applications are discouraged from defining this attribute.

Since PyOxidizer’s run-time behavior is similar to other packaging tools, PyOxidizer supports falsely identifying itself as these other tools by emulating their fingerprints.

The EmbeddedPythonConfig configuration section defines the boolean flag sys_frozen to control whether sys.frozen = True is set. This can allow PyOxidizer to advertise itself as a frozen application.

In addition, the sys_meipass boolean flag controls whether a sys._MEIPASS = <exe directory> attribute is set. This allows PyOxidizer to masquerade as having been built with PyInstaller.

Warning

Masquerading as other packaging tools is effectively lying and can be dangerous, as code relying on these attributes won’t know if it is interacting with PyOxidizer or some other tool. It is recommended to set these attributes only as a stopgap to get packages working with PyOxidizer until those packages learn to check for sys.oxidized = True. Setting sys._MEIPASS is definitely the riskier option, as a case can be made that PyOxidizer should set sys.frozen = True by default.
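From the point of view of a package author, checking these fingerprints might look like the sketch below. It relies only on the attributes described above (sys.oxidized, sys.frozen and sys._MEIPASS). The detect_packager() helper and its return values are hypothetical. PyOxidizer's own fingerprint is checked first so that an emulated fingerprint does not mask it.

```python
import sys


def detect_packager(sys_module=sys):
    """Guess which packaging tool produced the running application."""
    # Check sys.oxidized first: PyOxidizer may also emulate other tools.
    if getattr(sys_module, "oxidized", False):
        return "pyoxidizer"
    if hasattr(sys_module, "_MEIPASS"):
        return "pyinstaller"
    if getattr(sys_module, "frozen", False):
        return "frozen"
    return None


print(detect_packager())
```

The sys_module parameter exists only to make the function easy to exercise with a stand-in object; ordinary callers would rely on the default.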