Technical Notes

CPython Initialization

Most of the initialization code lives in Python/pylifecycle.c.

Call tree with Python 3.7:

``Py_Initialize()``
  ``Py_InitializeEx()``
    ``_Py_InitializeFromConfig(_PyCoreConfig config)``
      ``_Py_InitializeCore(PyInterpreterState, _PyCoreConfig)``
        Sets up allocators.
        ``_Py_InitializeCore_impl(PyInterpreterState, _PyCoreConfig)``
          Does most of the initialization.
          Initializes the runtime, a new interpreter state, the thread state, the GIL, and built-in types.
          Initializes sys module and sets up sys.modules.
          Initializes builtins module.
          ``_PyImport_Init()``
            Copies ``interp->builtins`` to ``interp->builtins_copy``.
          ``_PyImportHooks_Init()``
            Sets up ``sys.meta_path``, ``sys.path_importer_cache``,
            ``sys.path_hooks`` to empty data structures.
          ``initimport()``
            ``PyImport_ImportFrozenModule("_frozen_importlib")``
            ``PyImport_AddModule("_frozen_importlib")``
            ``interp->importlib = importlib``
            ``interp->import_func = interp->builtins.__import__``
            ``PyInit__imp()``
              Initializes ``_imp`` module, which is implemented in C.
            ``sys.modules["_imp"] = imp``
            ``importlib._install(sys, _imp)``
            ``_PyImportZip_Init()``

      ``_Py_InitializeMainInterpreter(interp, _PyMainInterpreterConfig)``
        ``_PySys_EndInit()``
          ``sys.path = XXX``
          ``sys.executable = XXX``
          ``sys.prefix = XXX``
          ``sys.base_prefix = XXX``
          ``sys.exec_prefix = XXX``
          ``sys.base_exec_prefix = XXX``
          ``sys.argv = XXX``
          ``sys.warnoptions = XXX``
          ``sys._xoptions = XXX``
          ``sys.flags = XXX``
          ``sys.dont_write_bytecode = XXX``
        ``initexternalimport()``
          ``interp->importlib._install_external_importers()``
        ``initfsencoding()``
          ``_PyCodec_Lookup(Py_FilesystemDefaultEncoding)``
            ``_PyCodecRegistry_Init()``
              ``interp->codec_search_path = []``
              ``interp->codec_search_cache = {}``
              ``interp->codec_error_registry = {}``
              # This is the first non-frozen import during startup.
              ``PyImport_ImportModuleNoBlock("encodings")``
            ``interp->codec_search_cache[codec_name]``
            ``for p in interp->codec_search_path: p(codec_name)``
        ``initsigs()``
        ``add_main_module()``
          ``PyImport_AddModule("__main__")``
        ``init_sys_streams()``
          ``PyImport_ImportModule("encodings.utf_8")``
          ``PyImport_ImportModule("encodings.latin_1")``
          ``PyImport_ImportModule("io")``
          Consults ``PYTHONIOENCODING`` and gets encoding and error mode.
          Sets up ``sys.__stdin__``, ``sys.__stdout__``, ``sys.__stderr__``.
        Sets warning options.
        Sets ``_PyRuntime.initialized``, which is what ``Py_IsInitialized()``
        returns.
        ``initsite()``
          ``PyImport_ImportModule("site")``
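The codec branch of the tree (check interp->codec_search_cache, then try each function on interp->codec_search_path) can be modeled in Python. This is a simplified sketch under stated assumptions, not CPython's code: codec_search_path, codec_search_cache, register(), and lookup() are hypothetical stand-ins, name normalization is reduced to lowercasing, and the real encodings.search_function is used as the only search function.

```python
import encodings  # already imported at startup; it registers search_function with codecs itself

# Hypothetical stand-ins for interp->codec_search_path and interp->codec_search_cache.
codec_search_path = []
codec_search_cache = {}


def register(search_function):
    """Append a search function, like codecs.register()."""
    codec_search_path.append(search_function)


def lookup(encoding):
    """Simplified model of _PyCodec_Lookup(): cache first, then each search function."""
    name = encoding.lower()  # CPython's normalization also rewrites spaces; omitted here
    if name in codec_search_cache:
        return codec_search_cache[name]
    for search in codec_search_path:
        info = search(name)  # search functions are called; they return a CodecInfo or None
        if info is not None:
            codec_search_cache[name] = info
            return info
    raise LookupError(f"unknown encoding: {encoding}")


register(encodings.search_function)
utf8 = lookup("UTF-8")
print(utf8.name, lookup("utf-8") is utf8)
```

The real codecs.lookup() consults the interpreter's own registry, so this model does not affect it.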

CPython Importing Mechanism

Lib/importlib defines importing mechanisms and is 100% Python.

Programs/_freeze_importlib.c is a program that takes the path to an input .py file and the path to an output .h file. It initializes a Python interpreter and compiles the .py file to marshalled bytecode. It then writes out a .h file defining a const unsigned char array (such as _Py_M__importlib) that contains the bytecode.
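The compile, marshal, and emit-a-C-array steps can be sketched in Python. freeze_to_header() is a hypothetical helper that mirrors the idea; it is not the real tool, whose output may differ in formatting and comments.

```python
import marshal


def freeze_to_header(source, filename, array_name):
    """Compile source, marshal the code object, and render it as a C array definition.

    Hypothetical helper illustrating what Programs/_freeze_importlib.c does.
    """
    code = compile(source, filename, "exec")
    data = marshal.dumps(code)
    lines = [f"const unsigned char {array_name}[] = {{"]
    for start in range(0, len(data), 16):  # 16 bytes per output line
        chunk = data[start:start + 16]
        lines.append("    " + ",".join(str(byte) for byte in chunk) + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(freeze_to_header("x = 6 * 7\n", "<frozen demo>", "_Py_M__demo"))
```

marshal's code-object format is specific to the Python version that wrote it, so a header like this only works with the matching interpreter.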

Lib/importlib/_bootstrap_external.py is compiled to Python/importlib_external.h, which defines _Py_M__importlib_external[].

Lib/importlib/_bootstrap.py is compiled to Python/importlib.h, which defines _Py_M__importlib[].

Python/frozen.c has _PyImport_FrozenModules[] effectively mapping _frozen_importlib to importlib._bootstrap and _frozen_importlib_external to importlib._bootstrap_external.

initimport() calls PyImport_ImportFrozenModule("_frozen_importlib"), which effectively imports importlib._bootstrap. Executing the module doesn't appear to have meaningful side effects.

The builtins module's __import__ (the C builtin, not importlib._bootstrap.__import__) is stored as interp->import_func.
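The call tree records interp->import_func being read from interp->builtins.__import__. In an unmodified interpreter that is the C builtin from Python/bltinmodule.c, a different object from the pure-Python importlib.__import__ (re-exported from importlib._bootstrap). A quick check, assuming nothing has replaced builtins.__import__:

```python
import builtins
import importlib

print(builtins.__import__)   # the C-level builtin cached as interp->import_func
print(importlib.__import__)  # importlib._bootstrap.__import__, written in Python
print(builtins.__import__ is importlib.__import__)
```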

The C-implemented _imp module is initialized and stored in sys.modules.

importlib._bootstrap._install(sys, _imp) is called. It calls _setup(sys, _imp) and adds BuiltinImporter and FrozenImporter to sys.meta_path.
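The resulting order is visible in sys.meta_path. PathFinder is appended later, during the importlib._bootstrap_external step; .pth files in site-packages may add extra finders, so this sketch compares relative positions rather than an exact list:

```python
import importlib.machinery as machinery
import sys

print([getattr(finder, "__name__", type(finder).__name__) for finder in sys.meta_path])

# BuiltinImporter and FrozenImporter come from _install(); PathFinder is appended afterwards.
positions = [
    sys.meta_path.index(finder)
    for finder in (machinery.BuiltinImporter, machinery.FrozenImporter, machinery.PathFinder)
]
print(positions == sorted(positions))
```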

_setup() sets the module globals _imp and sys. It populates __name__, __loader__, __package__, __spec__, and, where applicable, __path__, __file__, and __cached__ on every built-in or frozen module already in sys.modules. It also loads the builtin modules _thread, _warnings, and _weakref.
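The back-filled attributes can be inspected directly. In this sketch sys and builtins predate importlib, while _thread is loaded by _setup() itself; all three should report BuiltinImporter as their loader and have no __file__:

```python
import importlib.machinery as machinery
import sys

for name in ("sys", "builtins", "_thread"):
    module = sys.modules[name]
    # __spec__ and __loader__ were filled in by importlib._bootstrap._setup().
    print(name, module.__spec__.name, module.__loader__, hasattr(module, "__file__"))
```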

Later during interpreter initialization, initexternalimport() effectively calls importlib._bootstrap._install_external_importers(). This runs import _frozen_importlib_external, which effectively imports importlib._bootstrap_external, and stores the module as importlib._bootstrap._bootstrap_external.
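The aliasing can be confirmed from Python. This relies on private module globals (_bootstrap_external and _bootstrap), so treat it as a version-dependent sketch:

```python
import sys

boot = sys.modules["_frozen_importlib"]
ext = sys.modules["_frozen_importlib_external"]

# _install_external_importers() stores the external module as a global of the bootstrap
# module, and _bootstrap_external keeps a reference back to the bootstrap module.
print(boot._bootstrap_external is ext, ext._bootstrap is boot)
```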

Importing importlib._bootstrap_external doesn't appear to have significant side effects either.

importlib._bootstrap_external._install() is called with a reference to importlib._bootstrap, and it calls importlib._bootstrap_external._setup().

importlib._bootstrap_external._setup() imports the builtin modules _io, _warnings, builtins, and marshal. It then imports either posix or nt, depending on the OS, and sets various module-level attributes describing the run-time environment, such as the path separators. On Windows this includes importing winreg, which is stored as the _winreg attribute. SOURCE_SUFFIXES and EXTENSION_SUFFIXES are updated accordingly.

importlib._bootstrap_external._get_supported_file_loaders() returns a list of (loader, suffixes) pairs: ExtensionFileLoader with _imp.extension_suffixes(), SourceFileLoader with SOURCE_SUFFIXES, and SourcelessFileLoader with BYTECODE_SUFFIXES.

FileFinder.path_hook() is called with all of these pairs, and the resulting hook is appended to sys.path_hooks. PathFinder is then appended to sys.meta_path.
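The same hook can be built with the public APIs in importlib.machinery. This sketch rebuilds the (loader, suffixes) pairs by hand, uses EXTENSION_SUFFIXES as a stand-in for _imp.extension_suffixes(), and creates a throwaway module named demo_mod in a temporary directory:

```python
import importlib.machinery as machinery
import os
import tempfile

# (loader, suffixes) pairs matching the description of _get_supported_file_loaders().
loader_details = [
    (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
    (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
    (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
]
hook = machinery.FileFinder.path_hook(*loader_details)

with tempfile.TemporaryDirectory() as directory:
    # demo_mod is a hypothetical module name used only for this example.
    with open(os.path.join(directory, "demo_mod.py"), "w") as handle:
        handle.write("VALUE = 1\n")
    finder = hook(directory)  # what a sys.path_hooks entry returns for a directory
    spec = finder.find_spec("demo_mod")
    print(type(finder).__name__, type(spec.loader).__name__, spec.origin)
```

Calling the hook on a directory returns a FileFinder for that directory; PathFinder caches such finders in sys.path_importer_cache.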

sys.modules After Interpreter Init

========================== ======= ====================================================
Module                     Type    Source
========================== ======= ====================================================
__main__                           add_main_module()
_abc                       builtin abc
_codecs                    builtin initfsencoding()
_frozen_importlib          frozen  initimport()
_frozen_importlib_external frozen  initexternalimport()
_imp                       builtin initimport()
_io                        builtin importlib._bootstrap_external._setup()
_signal                    builtin initsigs()
_thread                    builtin importlib._bootstrap._setup()
_warnings                  builtin importlib._bootstrap._setup()
_weakref                   builtin importlib._bootstrap._setup()
abc                        py
builtins                   builtin _Py_InitializeCore_impl()
codecs                     py      encodings via initfsencoding()
encodings                  py      initfsencoding()
encodings.aliases          py      encodings
encodings.latin_1          py      init_sys_streams()
encodings.utf_8            py      init_sys_streams() + initfsencoding()
io                         py      init_sys_streams()
marshal                    builtin importlib._bootstrap_external._setup()
nt                         builtin importlib._bootstrap_external._setup(), Windows only
posix                      builtin importlib._bootstrap_external._setup(), POSIX only
readline                   builtin
sys                        builtin _Py_InitializeCore_impl()
winreg                     builtin importlib._bootstrap_external._setup(), Windows only
zipimport                  builtin initimport()
========================== ======= ====================================================
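The Type column can be approximated for any interpreter. module_kind() is a hypothetical helper: it checks sys.builtin_module_names, then _imp.is_frozen(), and otherwise treats a .py or .pyc spec origin as py. Running it under python -S approximates this table, which excludes site.py's imports; on Python 3.11 and later several py entries report frozen instead.

```python
import _imp
import sys


def module_kind(name):
    """Roughly reproduce the table's Type column for a module already in sys.modules."""
    if name in sys.builtin_module_names:
        return "builtin"
    if _imp.is_frozen(name):
        return "frozen"
    spec = getattr(sys.modules.get(name), "__spec__", None)
    origin = getattr(spec, "origin", None) or ""
    return "py" if origin.endswith((".py", ".pyc")) else "other"


if __name__ == "__main__":
    for name in sorted(sys.modules):
        print(f"{name:30} {module_kind(name)}")
```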

Modules Imported by site.py

_collections_abc, _sitebuiltins, _stat, atexit, genericpath, os, os.path, posixpath, rlcompleter, site, stat

Random Notes

The frozen importer iterates over an array looking for a module name. For each item, it calls _PyUnicode_EqualToASCIIString(), which verifies that the search name is ASCII before comparing. Because every frozen-module lookup is an O(n) scan, importing all n frozen modules costs O(n^2) comparisons, which could add noticeable overhead when there are many frozen modules. A better frozen importer would use a map/hash/dict for lookups. This may require breaking the CPython API, since the PyImport_FrozenModules data structure is documented as part of the public API and its value can be changed dynamically at run time.
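The cost can be made concrete. Finding the k-th entry with a linear scan takes k comparisons, so importing all n frozen modules once takes 1 + 2 + ... + n = n(n + 1)/2 comparisons, which is 500,500 for n = 1,000. The sketch below models this in Python; FrozenEntry, find_frozen_linear(), and build_index() are hypothetical names, not CPython APIs. The dict index keeps the first entry for a duplicated name, so it returns what the scan would.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenEntry:
    """Hypothetical Python stand-in for one entry of the PyImport_FrozenModules array."""
    name: str
    code: bytes


def find_frozen_linear(table, name):
    """Scan entries in order, as the current importer does. Returns (entry, comparisons)."""
    for comparisons, entry in enumerate(table, start=1):
        if entry.name == name:
            return entry, comparisons
    return None, len(table)


def build_index(table):
    """One O(n) pass; afterwards each lookup is an average O(1) dict access."""
    index = {}
    for entry in table:
        index.setdefault(entry.name, entry)  # keep the first match, like the linear scan
    return index


table = [FrozenEntry(f"mod_{i}", b"") for i in range(1000)]
index = build_index(table)
total = sum(find_frozen_linear(table, entry.name)[1] for entry in table)
print(total)  # comparisons needed to look up every frozen module once
```

Because PyImport_FrozenModules can be reassigned at run time, a real implementation would have to notice that and rebuild the index, which is where the API concern comes in.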

importlib._bootstrap cannot call import because the global import hook isn’t registered until after initimport().

importlib._bootstrap_external is the best place to monkeypatch because of the limited run-time functionality available during importlib._bootstrap.

It's a bit wonky that Py_Initialize() imports modules from the standard library, and it doesn't appear possible to disable this. If site.py is disabled, the pure-Python modules imported are limited to codecs, encodings, abc, io, and whatever encodings.* modules initfsencoding() and init_sys_streams() need.
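To check this on a given interpreter, a fresh process can be started with and without -S (the flag that disables the automatic import of site) and its sys.modules compared. startup_modules() is a hypothetical helper that runs sys.executable in a subprocess. On Python 3.11 and later several of these modules are frozen into the interpreter rather than read from .py files, so the output will not match the 3.7-era notes exactly.

```python
import subprocess
import sys


def startup_modules(*flags):
    """Return the names in sys.modules of a fresh interpreter started with the given flags."""
    # Snapshot sys.modules before print() runs so nothing imported afterwards is counted.
    code = "import sys; names = sorted(sys.modules); print(*names, sep='\\n')"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    return set(result.stdout.split())


if __name__ == "__main__":
    without_site = startup_modules("-S")
    with_site = startup_modules()
    print("non-builtin without site:", sorted(without_site - set(sys.builtin_module_names)))
    print("added by site:", sorted(with_site - without_site))
```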

An attempt was made to freeze the set of standard library modules loaded during initialization. However, the built-in extension importer doesn't set all of the module attributes that the module system expects. Without these attributes, the "from . import aliases" statement in encodings/__init__.py gets confused, and relative imports seemed to have issues as well. One would think it would be possible to run an embedded interpreter with all standard library modules frozen, but this doesn't work.