.. _packaging:
====================
Packaging User Guide
====================
So you want to package a Python application using ``PyOxidizer``? You've come
to the right place to learn how! Read on for all the details on how to
*oxidize* your Python application!
First, you'll need to install ``PyOxidizer``. See :ref:`installing` for
instructions.
Creating a PyOxidizer Project
=============================
The process for *oxidizing* every Python application looks the same: you
start by creating a new ``PyOxidizer`` configuration file via the
``pyoxidizer init-config-file`` command::
# Create a new configuration file in the directory "pyapp"
$ pyoxidizer init-config-file pyapp
Behind the scenes, ``PyOxidizer`` works by leveraging a Rust project to
build binaries embedding Python. The auto-generated project simply
instantiates and runs an embedded Python interpreter. If you would like
your built binaries to offer more functionality, you can create a minimal
Rust project to embed a Python interpreter and customize from there::
# Create a new Rust project for your application in ~/src/myapp.
$ pyoxidizer init-rust-project ~/src/myapp
The auto-generated configuration file and Rust project will alunch a Python
REPL by default. And the ``pyoxidizer`` executable will look in the current
directory for a ``pyoxidizer.bzl`` configuration file. Let's test that the
new configuration file or project works::
$ pyoxidizer run
...
Compiling pyapp v0.1.0 (/home/gps/src/pyapp)
Finished dev [unoptimized + debuginfo] target(s) in 53.14s
writing executable to /home/gps/src/pyapp/build/x86_64-unknown-linux-gnu/debug/exe/pyapp
>>>
If all goes according to plan, you just built a Rust executable which
contains an embedded copy of Python. That executable started an interactive
Python debugger on startup. Try typing in some Python code::
>>> print("hello, world")
hello, world
It works!
(To exit the REPL, press CTRL+d or CTRL+z or ``import sys; sys.exit(0)`` from
the REPL.)
.. note::
If you have built a Rust project before, the output from building a
``PyOxidizer`` application may look familiar to you. That's because under the
hood Cargo - Rust's package manager and build system - is doing a lot of the
work to build the application. If you are familiar with Rust development,
you can use ``cargo build`` and ``cargo run`` directly. However, Rust's
build system is only responsible for build binaries and some of the
higher-level functionality from ``PyOxidizer``'s configuration files (such
as application packaging) will likely not be performed unless tweaks are
made to the Rust project's ``build.rs``.
Now that we've got a new project, let's customize it to do something useful.
Packaging an Application from a PyPI Package
============================================
In this section, we'll show how to package the
`pyflakes `_ program using a published
PyPI package. (Pyflakes is a Python linter.)
First, let's create an empty project::
$ pyoxidizer init-config-file pyflakes
Next, we need to edit the :ref:`configuration file ` to tell
PyOxidizer about pyflakes. Open the ``pyflakes/pyoxidizer.bzl`` file in your
favorite editor.
Find the ``make_exe()`` function. This function returns a
:ref:`PythonExecutable ` instance which defines
a standalone executable containing Python. This function is a registered
*target*, which is a named entity that can be individually built or run.
By returning a ``PythonExecutable`` instance, this function/target is saying
*build an executable containing Python*.
The ``PythonExecutable`` type holds all state needed to package and run
a Python interpreter. This includes low-level interpreter configuration
settings to which Python resources (like source and bytecode modules)
are embedded in that executable binary. This type exposes an
:ref:`add_python_resources() `
method which adds an iterable of objects representing Python resources to the
set of embedded resources.
Elsewhere in this function, the ``dist`` variable holds an instance of
:ref:`PythonDistribution `. This type
represents a Python distribution, which is a fancy way of saying
*an implementation of Python*. In addition to defining the files
constituting that distribution, a ``PythonDistribution`` exposes
methods for performing Python packaging. One of those methods is
:ref:`pip_install() `,
which invokes ``pip install`` using that Python distribution.
To add a new Python package to our executable, we call
``dist.pip_install()`` then add the results to our ``PythonExecutable``
instance. This is done like so:
.. code-block:: python
exe.add_python_resources(dist.pip_install(["pyflakes==2.1.1"]))
The inner call to ``dist.pip_install()`` will effectively run
``pip install pyflakes==2.1.1`` and collect a set of installed
Python resources (like module sources and bytecode data) and return
that as an iterable data structure. The ``exe.add_python_resources()``
call will then embed these resources in the built executable binary.
Next, we tell PyOxidizer to run ``pyflakes`` when the interpreter is executed:
.. code-block:: python
run_eval="from pyflakes.api import main; main()",
This says to effectively run the Python code
``eval(from pyflakes.api import main; main())`` when the embedded interpreter
starts.
The new ``make_exe()`` function should look something like the following (with
comments removed for brevity):
.. code-block:: python
def make_exe():
dist = default_python_distribution()
config = PythonInterpreterConfig(
run_eval="from pyflakes.api import main; main()",
)
exe = dist.to_python_executable(
name="pyflakes",
config=config,
extension_module_filter="all",
include_sources=True,
include_resources=False,
include_test=False,
)
exe.add_python_resources(dist.pip_intsall(["pyflakes==2.1.1"]))
return exe
With the configuration changes made, we can build and run a ``pyflakes``
native executable::
# From outside the ``pyflakes`` directory
$ pyoxidizer run --path /path/to/pyflakes/project -- /path/to/python/file/to/analyze
# From inside the ``pyflakes`` directory
$ pyoxidizer run -- /path/to/python/file/to/analyze
# Or if you prefer the Rust native tools
$ cargo run -- /path/to/python/file/to/analyze
By default, ``pyflakes`` analyzes Python source code passed to it via
stdin.
What Can Go Wrong
=================
Ideally, packaging your Python application and its dependencies *just works*.
Unfortunately, we don't live in an ideal world.
PyOxidizer breaks various assumptions about how Python applications are
built and distributed. When attempting to package your application, you will
inevitably run into problems due to incompatibilities with PyOxidizer.
The :ref:`pitfalls` documentation can serve as a guide to identify and work
around these problems.
Packaging Additional Files
==========================
By default PyOxidizer will embed Python resources such as modules into
the compiled executable. This is the ideal method to produce distributable
Python applications because it can keep the entire application self-contained
to a single executable and can result in
:ref:`performance wins `.
But sometimes embedded resources into the binary isn't desired or doesn't
work. Fear not: PyOxidizer has you covered!
Let's give an example of this by attempting to package
`black `_, a Python code formatter.
We start by creating a new project::
$ pyoxidizer init-config-file black
Then edit the ``pyoxidizer.bzl`` file to have the following:
.. code-block:: python
def make_exe():
dist = default_python_distribution()
config = PythonInterpreterConfig(
run_module="black",
)
exe = dist.to_python_executable(
name="black",
)
exe.add_python_resources(dist.pip_intsall(["black==19.3b0"]))
return exe
Then let's attempt to build the application::
$ pyoxidizer build --path black
processing config file /home/gps/src/black/pyoxidizer.bzl
resolving Python distribution...
...
Looking good so far!
Now let's try to run it::
$ pyoxidizer run --path black
Traceback (most recent call last):
File "black", line 46, in
File "blib2to3.pygram", line 15, in
NameError: name '__file__' is not defined
SystemError
Uh oh - that's didn't work as expected.
As the error message shows, the ``blib2to3.pygram`` module is trying to
access ``__file__``, which is not defined. As explained by :ref:`no_file`,
``PyOxidizer`` doesn't set ``__file__`` for modules loaded from memory. This is
perfectly legal as Python doesn't mandate that ``__file__`` be defined. So
``black`` (and every other Python file assuming the existence of ``__file__``)
is arguably buggy.
Let's assume we can't easily change the offending source code to work around
the issue.
To fix this problem, we change the configuration file to install ``black``
relative to the built application. This requires changing our approach a
little. Before, we ran ``dist.pip_install()`` from ``make_exe()`` to collect
Python resources and added them to a ``PythonEmbeddedResources`` instance.
This meant those resources were embedded in the self-contained
``PythonExecutable`` instance returned from ``make_exe()``.
Our auto-generated ``pyoxidizer.bzl`` file also contains an ``install``
*target* defined by the ``make_install()`` function. This target produces
an ``FileManifest``, which represents a collection of relative files
and their content. When this type is *resolved*, those files are manifested
on the filesystem. To package ``black``'s Python resources next to our
executable instead of embedded within it, we need to move the ``pip_install()``
invocation from ``make_exe()`` to ``make_install()``.
Change your configuration file to look like the following:
.. code-block:: python
def make_python_dist():
return default_python_distribution()
def make_exe(dist):
python_config = PythonInterpreterConfig(
run_module="black",
sys_paths=["$ORIGIN/lib"],
)
return dist.to_python_executable(
name="black",
config=python_config,
extension_module_filter='all',
include_sources=True,
include_resources=False,
include_test=False,
)
def make_install(dist, exe):
files = FileManifest()
files.add_python_resource(".", exe)
files.add_python_resources("lib", dist.pip_install(["black==19.3b0"]))
return files
register_target("python_dist", make_python_dist)
register_target("exe", make_exe, depends=["python_dist"])
register_target("install", make_install, depends=["python_dist", "exe"], default=True)
resolve_targets()
There are a few changes here.
We added a new ``make_dist()`` function and ``python_dist`` *target* to
represent obtaining the Python distribution. This isn't strictly required,
but it helps avoid redundant work during execution.
The ``PythonInterpreterConfig`` construction adds a ``sys_paths=["$ORIGIN/lib"]``
argument. This argument says *adjust ``sys.path`` at run-time to include the
``lib`` directory next to the executable file*. It allows the Python
interpreter to import Python files on the filesystem instead of just from
memory.
The ``make_install()`` function/target has also gained a call to
``files.add_python_resources()``. This method call takes the Python resources
collected from running ``pip install black==19.3b0`` and adds them to the
``FileManifest`` instance under the ``lib`` directory. When the ``FileManifest``
is resolved, those Python resources will be manifested as files on the
filesystem (e.g. as ``.py`` and ``.pyc`` files).
With the new configuration in place, let's re-build the application::
$ pyoxidizer build --path black install
...
packaging application into /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug
purging /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug
copying /home/gps/src/black/build/target/x86_64-unknown-linux-gnu/debug/black to /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug/black
resolving packaging state...
installing resources into 1 app-relative directories
installing 46 app-relative Python source modules to /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug/lib
...
black packaged into /home/gps/src/black/build/apps/black/x86_64-unknown-linux-gnu/debug
If you examine the output, you'll see that various Python modules files were
written to the output directory, just as our configuration file requested!
Let's try to run the application::
$ pyoxidizer run --path black --target install
No paths given. Nothing to do 😴
Success!
Trimming Unused Resources
=========================
By default, packaging rules are very aggressive about pulling in
resources such as Python modules. For example, the entire Python standard
library is embedded into the binary by default. These extra resources take up
space and can make your binary significantly larger than it could be.
It is often desirable to *prune* your application of unused resources. For
example, you may wish to only include Python modules that your application
uses. This is possible with ``PyOxidizer``.
Essentially, all strategies for managing the set of packaged resources
boil down to crafting config file logic that chooses which resources
are packaged.
But maintaining explicit lists of resources can be tedious. ``PyOxidizer``
offers a more automated approach to solving this problem.
The :ref:`config_python_interpreter_config` type defines a
``write_modules_directory_env`` setting, which when enabled will instruct
the embedded Python interpreter to write the list of all loaded modules
into a randomly named file in the directory identified by the environment
variable defined by this setting. For example, if you set
``write_modules_directory_env="PYOXIDIZER_MODULES_DIR"`` and then
run your binary with ``PYOXIDIZER_MODULES_DIR=~/tmp/dump-modules``,
each invocation will write a ``~/tmp/dump-modules/modules-*`` file
containing the list of Python modules loaded by the Python interpreter.
One can therefore use ``write_modules_directory_env`` to produce files
that can be referenced in a different build *target* to filter resources
through a set of *only include* names.
TODO this functionality was temporarily dropped as part of the Starlark
port.
Adding Extension Modules At Run-Time
====================================
Normally, Python extension modules are compiled into the binary as part
of the embedded Python interpreter.
``PyOxidizer`` also supports providing additional extension modules at run-time.
This can be useful for larger Rust applications providing extension modules
that are implemented in Rust and aren't built through normal Python
build systems (like ``setup.py``).
If the ``PythonConfig`` Rust struct used to construct an embedded Python
interpreter contains a populated ``extra_extension_modules`` field, the
extension modules listed therein will be made available to the Python
interpreter.
Please note that Python stores extension modules in a global variable.
So instantiating multiple interpreters via the ``pyembed`` interfaces may
result in duplicate entries or unwanted extension modules being exposed to
the Python interpreter.
Masquerading As Other Packaging Tools
=====================================
Tools to package and distribute Python applications existed several
years before ``PyOxidizer``. Many Python packages have learned to perform
special behavior when the _fingerprint* of these tools is detected at
run-time.
First, ``PyOxidizer`` has its own fingerprint: ``sys.oxidized = True``. The
presence of this attribute can indicate an application running with
``PyOxidizer``. Other applications are discouraged from defining this
attribute.
Since ``PyOxidizer``'s run-time behavior is similar to other packaging
tools, ``PyOxidizer`` supports falsely identifying itself as these other
tools by emulating their fingerprints.
The ``EmbbedPythonConfig`` configuration section defines the
boolean flag ``sys_frozen`` to control whether ``sys.frozen = True``
is set. This can allow ``PyOxidizer`` to advertise itself as a *frozen*
application.
In addition, the ``sys_meipass`` boolean flag controls whether a
``sys._MEIPASS = `` attribute is set. This allows
``PyOxidizer`` to masquerade as having been built with PyInstaller.
.. warning::
Masquerading as other packaging tools is effectively lying and can
be dangerous, as code relying on these attributes won't know if
it is interacting with ``PyOxidizer`` or some other tool. It is
recommended to only set these attributes to unblock enabling
packages to work with ``PyOxidizer`` until other packages learn to
check for ``sys.oxidized = True``. Setting ``sys._MEIPASS`` is
definitely the more risky option, as a case can be made that
PyOxidizer should set ``sys.frozen = True`` by default.