Notes on Python package setup

Creating Python packages is a little fiddlier than one might hope.

Operating system prerequisites

If your package has non-standard system dependencies, there are a number of options:

  1. Manual installation by the user.

  2. Packaging your Python code within an OS package, such as:

    • a .deb file, for Debian Linux, easily installable via gdebi, e.g. sudo gdebi DEBFILE;

    • an .rpm file, for Red Hat Linux and derivatives, creatable from a .deb file via alien, and easily installable via sudo yum install RPMFILE;

    • Windows packaging.

  3. Docker

    In general, this is preferred, because it guarantees the OS environment exactly, is fairly simple to install, and performance remains good.

Python package dependencies: install_requires versus requirements.txt

Remember

Code in setup.py needs to cope with (a) installation, as in pip install ., and (b) package creation, as in python setup.py sdist.

Background

This is a standard Python problem: “my_package depends on other_package version 1.2.3”.

  • requirements.txt is read by “bots” such as Dependabot on GitHub, so if this is your primary list of requirements, automatic pull requests will work. It’s also read when users install manually from it (pip install -r requirements.txt), and by PyCharm and other IDEs.

    • But possibly it works without this file? Yes; it should. See below.

  • The setup(..., install_requires=[...]) parameter in setup.py is read by pip.

How do these differ? See

Note:

  • It’s better to specify a dependency version range when you are providing libraries, and an exact version when you are providing an application.

  • When you run a user-defined script from your package, it calls pkg_resources.load_entry_point(dist, group, name) (this is part of setuptools). See https://setuptools.readthedocs.io/en/latest/pkg_resources.html. This doesn’t seem to re-call setup.py or check requirements.txt.

  • pip uses install_requires and not requirements.txt when installing your package.

Options

One “single-source” approach is to define a variable such as INSTALL_REQUIRES in setup.py that is used in the setup(..., install_requires=INSTALL_REQUIRES) and is used to write requirements.txt (e.g. via an extra call by the developer: python setup.py --extras).
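As a rough sketch of this single-source idea: the helper name write_requirements_txt() and the pin other_package==1.2.3 are hypothetical, chosen only for illustration, and the setup() call itself is omitted. The demonstration writes to a temporary directory rather than to the project.

```python
import os
import tempfile

# Hypothetical single source of truth. In a real setup.py this list would
# also be passed as setup(..., install_requires=INSTALL_REQUIRES).
INSTALL_REQUIRES = [
    "other_package==1.2.3",  # illustrative pin, not a real dependency
]


def write_requirements_txt(requirements, filename):
    """Write one requirement per line, under a header marking the file as generated."""
    with open(filename, "w") as f:
        f.write("# Auto-generated from setup.py; do not edit by hand.\n")
        for req in requirements:
            f.write(req + "\n")


if __name__ == "__main__":
    # Demonstration only: write into a temporary directory.
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "requirements.txt")
        write_requirements_txt(INSTALL_REQUIRES, target)
        with open(target) as f:
            print(f.read())
```

With this arrangement, requirements.txt is a derived file, so it should be regenerated whenever INSTALL_REQUIRES changes.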

Another is to read and parse requirements.txt in setup.py.
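A minimal sketch of the parsing approach, assuming a simple requirements.txt with one requirement per line. The function names parse_requirements() and read_requirements() are hypothetical. The parser skips blank lines, comment lines, and pip option lines (anything starting with "-"), and it does not follow nested files such as -r other.txt.

```python
def parse_requirements(text):
    """Return requirement strings, skipping blanks, comments and pip options."""
    requirements = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, whole-line comments, and options like "--index-url".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # Drop a trailing inline comment (pip treats " #" as a comment start).
        line = line.split(" #", 1)[0].strip()
        requirements.append(line)
    return requirements


def read_requirements(filename):
    """Read a requirements file and return its parsed requirement strings."""
    with open(filename) as f:
        return parse_requirements(f.read())


if __name__ == "__main__":
    sample = """
    # runtime dependencies
    --index-url https://pypi.org/simple
    other_package==1.2.3  # illustrative pin
    """
    print(parse_requirements(sample))  # prints ['other_package==1.2.3']
```

In setup.py the result would be passed as install_requires=read_requirements(...). Static tools that inspect setup.py without running it may not follow this kind of indirection.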

Experimenting with a package that has a simple requirement for semantic_version:

  • No requirements specified: code will crash at runtime with ModuleNotFoundError: No module named 'semantic_version'

  • install_requires only:

    • PyCharm notices (even if indirected via a variable).

      But note: it will cope with simple indirection, e.g.

      from setuptools import setup

      REQUIREMENTS = [
          "semantic_version==32.8.4",
      ]
      setup(
          ...,
          install_requires=REQUIREMENTS,
      )
      

      but not with more complex indirection, e.g.

      from io import StringIO

      from setuptools import setup

      REQUIREMENTS_TEXT = """
          semantic_version==32.8.4
      """
      REQUIREMENTS = []
      with StringIO(REQUIREMENTS_TEXT) as f:
          for line in f:
              line = line.strip()
              # Skip blank lines, comments and pip options.
              if (not line) or line.startswith('#') or line.startswith('--'):
                  continue
              REQUIREMENTS.append(line)
      setup(
          ...,
          install_requires=REQUIREMENTS,
      )
      
    • Dependabot is meant to notice. Its code suggests it will cope with arbitrary indirection: https://github.com/dependabot/dependabot-core/blob/main/python/helpers/lib/parser.py

    • pip install does what’s required and the code runs.

  • requirements.txt only:

    • PyCharm notices.

    • We know Dependabot notices.

    • pip install does NOT install the necessary dependencies.

    So this option is useless for distributing a package.

The next question is whether requirements.txt is necessary at all. One view (e.g. Reddit above) is that it can be kept for development environments, i.e. the extras required for development but not for running your package.

Conclusion

  • For package distribution, install_requires in setup.py is mandatory, and requirements.txt is optional and therefore perhaps best avoided so that automatic code analysis tools don’t get confused.

Data and other non-Python files: setup.py versus MANIFEST.in

Here’s another tricky thing. In setup.py, you have package_data and include_package_data arguments to setup(). There is also the file MANIFEST.in.

# or MANIFEST.in ?
# - https://stackoverflow.com/questions/24727709/i-dont-understand-python-manifest-in
# - https://stackoverflow.com/questions/1612733/including-non-python-files-with-setup-py
#
# or both?
# - https://stackoverflow.com/questions/3596979/manifest-in-ignored-on-python-setup-py-install-no-data-files-installed
#   … MANIFEST gets the files into the distribution
#   … package_data gets them installed in the distribution
#
# data_files is from distutils, and we’re using setuptools
# - https://docs.python.org/3.5/distutils/setupscript.html#installing-additional-files

See:

… the last, in particular, suggesting that both MANIFEST.in (required for sdist) and package_data (used for install) are necessary. However, it seems that you can use just MANIFEST.in if you specify include_package_data=True.

For complex file specification, you could use Python and then write to MANIFEST.in, but actually the manifest syntax is quite good:

So, the two realistic options are:

  1. Have a setup.py that auto-writes MANIFEST.in when required.

  2. Specify MANIFEST.in properly and use include_package_data=True. This is probably better. See in particular https://ep2015.europython.eu/media/conference/slides/less-known-packaging-features-and-tricks.pdf
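For comparison, option 1 might look something like the sketch below. The package name, the file patterns and the helper names manifest_lines() and write_manifest() are all hypothetical. It uses only the standard recursive-include and global-exclude manifest commands, and it writes to a temporary directory for demonstration.

```python
import os
import tempfile

# Hypothetical values, for illustration only.
PACKAGE_NAME = "my_package"
DATA_PATTERNS = ["*.txt", "*.json"]
EXCLUDE_PATTERNS = ["*.pyc"]


def manifest_lines(package, data_patterns, exclude_patterns):
    """Build MANIFEST.in lines from lists of filename patterns."""
    lines = []
    if data_patterns:
        # recursive-include DIR PATTERN [PATTERN ...]
        lines.append("recursive-include {} {}".format(package, " ".join(data_patterns)))
    if exclude_patterns:
        # global-exclude PATTERN [PATTERN ...]
        lines.append("global-exclude " + " ".join(exclude_patterns))
    return lines


def write_manifest(filename, lines):
    """Write the manifest lines to a file, one command per line."""
    with open(filename, "w") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "MANIFEST.in")
        write_manifest(target, manifest_lines(PACKAGE_NAME, DATA_PATTERNS, EXCLUDE_PATTERNS))
        with open(target) as f:
            print(f.read())
```

The generated file is short enough that writing it by hand, as in option 2, is usually no harder.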

Conclusion

Use MANIFEST.in plus setup(..., include_package_data=True). Use the full syntax available for MANIFEST.in.

To find all extensions (for the global-exclude command), use:

find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
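If perl is not available (for instance on Windows), a roughly equivalent Python sketch is shown here. The function name find_extensions() is hypothetical. It applies the same pattern to each filename, so a dotfile such as .gitignore is reported as the extension "gitignore", as with the shell version. Unlike find -type f, it also reports symbolic links to files.

```python
import os
import re

# Same idea as the perl pattern: the text after the last dot in the filename.
EXTENSION_RE = re.compile(r"\.([^.]+)$")


def find_extensions(root="."):
    """Return a sorted list of the distinct file extensions found under root."""
    extensions = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            match = EXTENSION_RE.search(filename)
            if match:
                extensions.add(match.group(1))
    return sorted(extensions)


if __name__ == "__main__":
    for ext in find_extensions("."):
        print(ext)
```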

Beware a nasty caching effect

Consider deleting any old MY_PACKAGE_NAME.egg-info directory from within setup.py, before calling setup(). This may be particularly applicable for packages that ship “data”. See http://blog.codekills.net/2011/07/15/lies,-more-lies-and-python-packaging-documentation-on--package_data-/

Like this, for example:

# setup.py

import os
import shutil

from setuptools import setup

PACKAGE_NAME = "MY_PACKAGE_NAME"
THIS_DIR = os.path.abspath(os.path.dirname(__file__))  # contains setup.py
EGG_DIR = os.path.join(THIS_DIR, PACKAGE_NAME + ".egg-info")

# Remove any stale metadata directory left over from a previous build.
shutil.rmtree(EGG_DIR, ignore_errors=True)

setup(...)

This is perhaps meant to be unnecessary, per https://stackoverflow.com/questions/3779915/why-does-python-setup-py-sdist-create-unwanted-project-egg-info-in-project-r, but maybe isn’t.

It appears to be unnecessary once you shift to MANIFEST.in and include_package_data=True.