setuptools Tutorial: Getting Started (Part 1 of 2)

This is the first installment in a two-part series of setuptools tutorials. In this tutorial, I explore in detail the important aspects of setuptools, the setup.py directives, and some tips and tricks for getting started. In the next tutorial, I will provide step-by-step instructions for creating a simple project from the ground up and demonstrate how setuptools can make packaging and installing it a breeze. The second part will essentially be a hands-on rehash of the information covered here, so you may wish to read this post in detail before diving into the next part.


Contents

Other Issues: Questions and Answers

Introduction

setuptools is one of several distribution systems available for Python and is similar to the distutils package that ships with Python’s standard library. setuptools offers some enhancements over distutils and ships with a few extra utilities that simplify installing and managing packages from PyPI.

Since this is an introductory tutorial on getting started with setuptools, I won’t discuss setuptools’ history or its relative advantages and disadvantages in comparison to other distribution systems. Furthermore, because a common thread among complaints directed at setuptools is its reliance on a general understanding of distutils, this tutorial aims to clarify many of those areas. Plus, it serves as a useful exercise for me to learn a bit more about setuptools!

PEAK’s setuptools page is verbose but it can be difficult for beginners and individuals unfamiliar with setuptools or distutils to gather a condensed version of the information they need to know to get started. This post aggregates much of the data I have found and attempts to not only condense this information but also provide useful, real world examples. In particular, I have highlighted certain esoteric but useful things like configuring setuptools to install scripts into /usr/bin or copying a sample configuration file to other directories like /etc; if you’re distributing a Python application that needs either of these features, you might wish to jump straight to the section that covers this.

At the time of this writing, I have already used setuptools in one project. I recall the difficulty I encountered in learning more about the utility and its features, particularly for someone who is unfamiliar with distutils. Thus, I have elected to write this tutorial to fulfill two goals: first to serve as a means of teaching myself setuptools’ features by explaining it (the best way to reinforce what you learn is to teach it, after all!), and second to assist individuals who have a need to use setuptools in a project but have no idea where to begin. Most of the first few sections of this tutorial are targeted toward the latter group, and I provide details on most required steps, including examples.

Getting Started

Obtaining setuptools is an obvious first step to getting started with packaging your Python projects. Fortunately, installing setuptools is easy, and if it isn’t pre-installed on your distribution, it’s easy enough to find.

Advanced users: You can certainly place a copy of setuptools into a virtualenv if you’d rather avoid polluting your system-wide site-packages. I won’t be discussing this (I thought about it and elected not to) because it’s a topic far beyond the scope of a setuptools tutorial. If you want a virtual environment, you can probably start here for details on how to install virtualenv and set up an isolated environment. If you don’t have easy_install installed, the most trouble-free way to obtain virtualenv is to install setuptools first. If you don’t have root access, see the section Help, I don’t have root, but would like to create my own packages for some direction on installing setuptools in your own $HOME.

Installing Setuptools

Most distributions should come with setuptools pre-installed. If not, follow the instructions to install setuptools. If you’re looking for a quick summary, here’s a brief list:

Setuptools – Ubuntu:
apt-get install python-setuptools

Setuptools – Gentoo:
emerge setuptools

Setuptools – FreeBSD:
$ cd /usr/ports/devel/py-setuptools && make && make install

Or if you have portinstall:

$ portinstall py-setuptools

If your version of setuptools is painfully antiquated, you may need to upgrade it to access all features illustrated in this tutorial. You can accomplish this by running:

easy_install -U setuptools

Setting up your Directory Structure

One thing that annoyed me ever so slightly about switching to the setuptools paradigm was the slight change in directory structure required to accommodate the package manager. (I’m sure it isn’t necessarily required, but it is strongly recommended–deviate at your own risk!) In general, you’ll want a structure that looks vaguely like the following (taken from Ian Bicking’s presentation on setuptools):

PackageName/
    setup.py
    packagename/
            __init__.py
            sourcefile.py
            sourcefile2.py
    tests/
    docs/

This is a slightly simplified version of Mr. Bicking’s example layout, and for simple projects, you could do without the tests or docs folders. Be aware that complicated projects will have a use for these, so remove them only if you’re certain you won’t need unit tests or documentation. TurboGears and Pylons both make use of a similar structure except with greater breadth.

Incidentally, here’s where the first catch is typically encountered with a setuptools project. If you don’t create a directory layout similar to the one illustrated above, complete with package hints (notice the __init__.py under “packagename”), you’re going to run into enormous difficulty. I realize I’ve mentioned this before, but I feel it’s important to repeat: deviate at your own risk. If you do deviate, you might get lucky, or setuptools may appear to work but never actually do what you expected. Take my word for it and structure your project sources like this.
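If you’d rather not create the skeleton by hand, here is a minimal sketch that builds the layout shown above using only the standard library. The scaffold() helper and the temporary target directory are my own inventions for illustration; nothing in setuptools generates this for you.

```python
import os
import tempfile


def scaffold(root, package):
    """Create the basic setuptools-friendly layout under `root`."""
    package_dir = os.path.join(root, package)
    for directory in (package_dir,
                      os.path.join(root, "tests"),
                      os.path.join(root, "docs")):
        if not os.path.isdir(directory):
            os.makedirs(directory)
    # An (empty) __init__.py is what marks the directory as a package.
    for filename in (os.path.join(root, "setup.py"),
                     os.path.join(package_dir, "__init__.py")):
        if not os.path.exists(filename):
            open(filename, "w").close()
    return package_dir


# Build the layout in a throwaway location so nothing real is touched.
project_root = os.path.join(tempfile.mkdtemp(), "PackageName")
scaffold(project_root, "packagename")
print(sorted(os.listdir(project_root)))  # ['docs', 'packagename', 'setup.py', 'tests']
```

The generated setup.py is empty; filling it in is the subject of the sections that follow.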

If you’re using Eclipse’s PyDev or another IDE that generates project layouts for you, you’ll probably discover that it creates a layout similar to this (minus the setup.py). Don’t worry, it shouldn’t change things. However, you should realize that your package directory–that is, the directory with the __init__.py file(s)–will typically be named src. Substitute that for the actual project name in the examples to follow. In fact, my Watcher project illustrated in this section does precisely this!

Let’s take a brief look at the directory structure used in one of my own projects. This illustrates best what not to do:

watcher/
    setup.py
    watcher/
        docs/
        examples/
        src/
            watcher/
            config/
            core/
        tests/

The entire package is contained under src; however, you’ll note that this is where I made a mistake. I’m going to explain it here so you won’t do the same thing: don’t create unnecessary sub-directories. If you do, it might not break anything, but your namespaces will become grotesquely complex. In this case, my entire project was not contained immediately under src–it was contained under src/watcher. My original intent was to create the actual Watcher sources as a separate project within the Watcher project. That never happened, because I realized it would be much easier to fork off the supporting projects into separate packages and call them from Watcher as a dependency.

If I had known then what I know now, my project layout would have looked something like this:

watcher/
    setup.py
    docs/
    tests/
    examples/
    watcher/
        watcher/
        config/
        core/

For the rest of this tutorial, I will assume that the Watcher sources are structured as above. If you’re like me, you’ll probably be tempted to place everything related to one project under a meaningful name and then place the sources under that. Don’t do this.


setup.py

The next step in creating a project designed to be packaged with setuptools is to create a setup.py. This file gives setuptools all the information it needs to know about your project to create .egg distribution files, install it to site-packages, and many other things! In fact, there’s so much setuptools can do that I’ll be hard-pressed to illustrate them all here. I won’t cover building projects with custom C extensions via setuptools–if you need to know how to do that, you can find more at PEAK’s site.

First, we’ll examine an existing setup.py. Here’s one that I use for Watcher:

#!/usr/bin/env python
from setuptools import setup, find_packages
 
setup (
    name = "Watcher",
    version = "0.1",
    description="Watcher is a utility for automatically blocking SSH probes and more.",
    long_description="""\
Watcher is a utility for monitoring message logs and tracking repeated
connection events from remote addresses. Watcher is fully configurable and may
be used to block SSH probes, DNS reflector DDoS participation attempts, and
more!""",
    author="Benjamin A. Shelton",
    author_email="", # Removed to limit spam harvesting.
    url="http://bashelton.com/",
    #package_dir = {'': 'src'}, # See packages below
    package_data = {'': ['*.xml']},
    packages = find_packages(exclude=["tests"]), # exclude takes a list of patterns
    # Use this line if you've uncommented package_dir above.
    #packages = find_packages("src", exclude=["tests"]),
 
    entry_points = {
        'console_scripts': ['watcher = watcher.core.main:main']
                    },
 
    download_url = "http://bashelton.com/download/",
    zip_safe = True
)

The first half-dozen or so items are boilerplate setup cruft and should be self-explanatory. However, there are certain requirements you must meet when filling these fields out:

  • name – Provides the package name. This is also used as your package name if you choose to register it with PyPI. Avoid using spaces and keep it short.
  • description – Displayed by PyPI.
  • long_description – Displayed by PyPI.
  • url – Used by PyPI and links to your download site. This URL may be used to locate your package download.
  • download_url – Also used by PyPI but if this is present, it will use the URL here to locate your package. Make sure this points either directly to your project distribution files or to a top level document that can be used to locate them.

If you’re not intending to release your package via PyPI, you have little reason to worry much about the first few entries of your application’s setup.py. However…

Configuring setup.py Internals, the Important Stuff

This is where you need to pay particular attention to the values of:

  • package_dir
  • package_data
  • packages
  • zip_safe

Whether or not you’ll be using each of these depends largely on your individual project requirements. Let’s look at package_dir, package_data, and packages in detail. For more advanced topics, see the next section: Advanced Options in setup.py.

package_dir

This directive tells setuptools where the packages live in your source directory. Here’s a catch if you read most of the official setuptools documentation: Most of what you’ll read is unclear about the purpose of this directive until you dig through the distutils docs. (The PEAK documentation even references this as being a “normal distutils thing.”) Interestingly, you won’t need this directive in most cases. As I understand it, this directive is only needed if you provide a specific directory as an argument to find_packages. You can safely ignore this directive if you omit the first argument to find_packages; see packages below.

Also, while we’re on the topic of package_dir, there’s a subtle syntax oddity PEAK’s setuptools documentation doesn’t discuss clearly but is worthwhile for us to mention here. Let’s take a look at this line in a little more detail:

    package_dir = {'': 'src'},

Notice the empty string ('')? When passed to package_dir (and you’ll see it again for other directives), the empty string tells setuptools that all of your packages live under the directory specified–in this case src. If you have multiple source folders under a single directory, you may have to include the package_dir directive if find_packages() doesn’t work. If you only have one source folder and that folder lives under the same root directory as setup.py, you can omit this directive. Read packages for more.

package_data

The package_data directive tells setuptools where the data for your project lives. As with package_dir, the empty string ('') used as a dictionary key tells setuptools that this applies to all packages. In other words, any file inside a package directory whose name matches one of the listed strings or globs (in our case “*.xml”) will be included as a data file. This is useful, because if you don’t include those files (or globs matching a variety of files) in the package_data directive, you’ll soon discover that setuptools is mysteriously omitting them from your packaged project!

Also, be aware that arguments to the package_data keys must be lists. In our case, we used:

    package_data = {'': ['*.xml']},

If we wanted to add other items, such as text files, we would have written:

    package_data = {'': ['*.xml', '*.txt']},

Failure to use lists ([]) as the values for package_data keys will probably result in setuptools generating unusual errors.
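To get a feel for which files a given list of globs will pick up, here is a small standard-library sketch. It only mirrors shell-style pattern matching with fnmatch; it is not how setuptools itself collects data files, and the select_data_files() helper and the file names are made up for the example.

```python
from fnmatch import fnmatch


def select_data_files(filenames, patterns):
    """Return the file names matching at least one glob pattern."""
    return [name for name in filenames
            if any(fnmatch(name, pattern) for pattern in patterns)]


# Hypothetical contents of a package directory.
files = ["config.xml", "README.txt", "main.py", "schema.xsd"]

print(select_data_files(files, ["*.xml"]))           # ['config.xml']
print(select_data_files(files, ["*.xml", "*.txt"]))  # ['config.xml', 'README.txt']
```

Anything that doesn’t match a pattern (schema.xsd here) would be left out, which is exactly the “mysteriously omitted” situation described above.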

packages

The packages directive tells setuptools what packages (and directories) to include and exclude from your build. While it’s possible to specify these manually, setuptools provides a handy utility for automatically traversing our directory structure and locating anything that looks like a package (but it has to include an __init__.py file). Since this tutorial is aimed at individuals just starting with setuptools, I won’t cover how to configure the packages directive by hand. You’ll generally find fewer headaches if you let the software do all the work for you. Let’s take a look at a couple of examples for the Watcher script.

If all of your package directories live under the same root folder as setup.py, you can omit including them in the find_packages call, like this:

    packages = find_packages(exclude=["tests"]),

Of course, if this doesn’t work and you have multiple nested directories like I created with Watcher, you might instead need to write this:

    package_dir = {'': 'src'}, # Our packages live under src but src is not a package itself
    packages = find_packages("src", exclude=["tests"]),

But remember what I said about limiting unnecessary headaches! If you find you’re needing to add both package_dir and packages, your directory layout is probably confusing find_packages(). The best solution is to modify and simplify your directory structure.

Also note that in our examples, I included the argument exclude=["tests"] because there is a possibility that I may add a test suite to Watcher in the future. If I do, I don’t want setuptools to treat it as a package when generating the Watcher distribution. Any package whose name matches a pattern given here will be left out during package generation.

exclude takes a list of package name patterns, so wrap even a single name in a list; a bare string would be read one character at a time. To exclude multiple packages, list them all:

    packages = find_packages(exclude=["tests", "utils"])
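If you’re curious what find_packages is doing, the following is a deliberately simplified stand-in written with the standard library only. It is my own approximation, not the setuptools implementation: find_packages_sketch() and the throwaway directory tree are assumptions made for illustration. It does show the two rules that matter here: a directory only counts as a package if it contains __init__.py, and each entry in exclude is matched as a pattern.

```python
import os
import tempfile
from fnmatch import fnmatchcase


def find_packages_sketch(where=".", exclude=()):
    """Simplified stand-in for find_packages(): a directory counts as a
    package only when it contains an __init__.py file."""
    found = []
    stack = [(where, "")]
    while stack:
        directory, prefix = stack.pop()
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            if os.path.isfile(os.path.join(path, "__init__.py")):
                found.append(prefix + name)
                stack.append((path, prefix + name + "."))
    # Each entry of `exclude` is treated as a pattern. Iterating a bare
    # string yields single characters, which is why a list is needed.
    return sorted(pkg for pkg in found
                  if not any(fnmatchcase(pkg, pat) for pat in exclude))


# Throwaway tree: watcher/, watcher/core/ and tests/ are packages; docs/ is not.
root = tempfile.mkdtemp()
for package in ("watcher", os.path.join("watcher", "core"), "tests"):
    os.makedirs(os.path.join(root, package))
    open(os.path.join(root, package, "__init__.py"), "w").close()
os.mkdir(os.path.join(root, "docs"))

print(find_packages_sketch(root, exclude=["tests"]))  # ['watcher', 'watcher.core']
print(find_packages_sketch(root, exclude="tests"))    # ['tests', 'watcher', 'watcher.core']
```

The second call shows the trap: with a bare string, nothing is excluded, because the patterns being tested are the single letters t, e, and s.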

Advanced Options in setup.py

I didn’t discuss a few setup.py options earlier and for good reason. You probably won’t be using them for your first project. However, as your project grows, you might need to consider some of setuptools’ additional features. Let’s take a look at:

  • entry_points
  • download_url
  • zip_safe
  • install_requires

entry_points

setuptools can be extended to include custom commands and most of its internals can be modified. That’s where entry_points comes in handy. However, you can also use entry_points to have setuptools create a launch script for your application that will be installed into /usr/bin (or /usr/local/bin, depending on your platform). This is extremely handy if you’ve written a utility in Python and want some method for semi-automatic installation of scripts that call your application in various incantations. Here’s an example:

    entry_points = {
        'console_scripts': ['watcher = watcher.core.main:main']
    }

The format of the console_scripts line works as follows:

'script_name_to_install = package.module:entry_function'

Here script_name_to_install is the name of the script you want installed to /usr/bin, and package.module:entry_function names the module and the single function within it that launches your application. In the example above, running watcher calls main() from the watcher.core.main module. This function can perform any processing, such as running getopt or forking a daemon process, or you can delegate the processing to other modules in your project.
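To make the format a little more concrete, here is a rough sketch of how such a line can be taken apart and resolved to a callable. This is only an illustration using importlib; resolve_entry_point() is a hypothetical helper of mine and not the loader setuptools uses. I point it at os.path:basename simply because that function exists on every Python install.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'name = package.module:function' and import the target callable."""
    script_name, target = [part.strip() for part in spec.split("=", 1)]
    module_name, function_name = target.split(":", 1)
    module = importlib.import_module(module_name)
    return script_name, getattr(module, function_name)


name, func = resolve_entry_point("basename-demo = os.path:basename")
print(name)                      # basename-demo
print(func("/usr/bin/watcher"))  # watcher
```

The generated launch script does something similar in spirit: it imports the module on the right of the equals sign and calls the named function.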

download_url

The download_url directive isn’t used if someone downloads your .egg and installs it manually. However, if you register your project on PyPI, the download_url (and url) directives are examined for links to your project. It is important that this URL points to either the .egg itself or to a page from which the .egg can be downloaded. If you’re planning on releasing your project to PyPI, you will definitely want to read this page.

zip_safe

Zip safety tells setuptools whether or not your project can operate correctly from a .egg file. There’s a detailed discussion on this at PEAK and it effectively boils down to this:

  • Projects containing C extensions cannot be used from within a .egg
  • Projects containing data files are not considered safe
  • If the project relies on the __file__ or __path__ attributes, it very likely won’t work from a .egg

These conditions are determined by the bdist_egg command if no zip_safe directive is set. You can override it, and you certainly may want to, but most simple projects are considered zip safe. If you’re including fairly simple data files, like XML or plain text, bdist_egg won’t consider your project zip safe; however, XML files (and most other files) work just fine from within a .egg. In this case, you need to set this directive to True from within your setup.py’s setup() call.

    zip_safe = True
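The reason data files can still work from inside an archive is that they can be read through the package’s loader rather than through a path built from __file__. Here is a small sketch of that idea using the standard library’s pkgutil.get_data and a plain zip file standing in for an egg. It assumes a current Python 3 interpreter, and the archive and the zipsafe_demo_pkg package are invented for the demonstration; it is not the pkg_resources API used elsewhere in this tutorial.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a tiny zip archive containing a package and one XML data file.
archive = os.path.join(tempfile.mkdtemp(), "zipsafe_demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipsafe_demo_pkg/__init__.py", "")
    zf.writestr("zipsafe_demo_pkg/config.xml", "<config/>")

# Python can import packages straight from a zip file on sys.path.
sys.path.insert(0, archive)

# pkgutil.get_data asks the package's loader for the bytes, so it works
# whether the package lives in a directory or inside an archive.
data = pkgutil.get_data("zipsafe_demo_pkg", "config.xml")
print(data)  # b'<config/>'
```

Code that instead opens os.path.join(os.path.dirname(__file__), "config.xml") would fail here, since that path points inside the archive rather than at a real file.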

install_requires

Most projects will eventually require dependencies. You can instruct setuptools to verify these dependencies are met during installation by simply attaching this to your setup.py’s setup() call:

    install_requires=['Twisted>=8.2.0']

Version declarations, such as ">=8.2.0" in this example, are handled logically and should offer no surprises.
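For context, here is roughly how that line sits inside a complete, if bare-bones, setup.py. Treat it as a sketch: the project name “Example” and the lxml requirement with its upper bound are placeholders I made up to show that several comparisons can be combined with a comma.

```python
#!/usr/bin/env python
from setuptools import setup, find_packages

setup(
    name="Example",              # hypothetical project name
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "Twisted>=8.2.0",        # any release from 8.2.0 onward
        "lxml>=2.0,<3.0",        # hypothetical: at least 2.0 but below 3.0
    ],
)
```

Each entry is a separate requirement, so add one string per dependency rather than cramming several projects into one.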

Creating and Installing Packages

setuptools comes packaged with about two dozen commands. Most developers will only typically use a small subset of these (unless you’re building binary packages, too). The ones I use most frequently are:

  • install – Installs a package into site-packages.
  • bdist_egg – Creates an egg distribution file.
  • develop – Installs the project in “development” mode, allowing you to make in-place source changes.

Although there are commands for creating source distributions, managing custom extensions, and C libraries, the ones I tend to use the most match the requirements of an overwhelming majority of setuptools packaged projects. I won’t discuss install here as it’s fairly obvious what the command does.

bdist_egg

Whether you’re planning on releasing your project to PyPI or posting it on another host for downloading separately, .egg distributions are the best and most self-contained method of distributing Python projects. Incidentally, creating an egg distribution is also amazingly easy. Here’s what you need to do:

Change to the root of your source directory. You should have a setup.py file located here; if so, simply run:

$ python setup.py bdist_egg

If all goes well, Python will create a few extra sub-directories, and you’ll have a new .egg file located under dist/. Simply copy that file out of your dist folder:

$ cp dist/*.egg someotherdir/

And that’s it!

develop

setuptools also provides an aliasing mechanism for installing a “pseudo-package” into your site-packages for development purposes. This method is certainly the best one to choose for authoring packages; you needn’t install and reinstall after every minor change and can continue working on your package just as it would exist if it were installed. The develop command works by creating aliases in your site-packages directory that point to the source directories of your project. Since the package is already pointing to your sources, any changes you make are immediately reflected in the installed version of the project.

As with installing setuptools in isolation from the system’s site-packages, using the develop command to install your sources system-wide is not necessarily something you want to do with development code. The develop command allows us to specify a local directory to use for our site-packages path. If you haven’t followed the instructions (below) for installing setuptools in isolation from the base system, I’ll provide a quick way to get started with develop running from your $HOME.

Since running develop in isolation from the system-wide site-packages directory is important, you do need to be aware of certain issues. First, any dependencies you specified in install_requires will be downloaded and installed. Second, unless you include the system’s site-packages path in your $PYTHONPATH, you won’t have access to any of the Python modules installed system-wide. In other words: Be aware that this will run develop in complete and total isolation from the rest of Python. Generally, this is what you want; if not, you can always add the system’s site-packages to your $PYTHONPATH.

First, if you haven’t created a temporary directory to store your local site-packages, do so now:

$ mkdir -p ~/.python/lib/python2.5/site-packages

(You may need to change the Python version in the command above to match your own Python install, such as python2.6.)

Then, add this to your $PYTHONPATH. If you’re creating system scripts, you might need to add part of this path to your $PATH:

$ export PYTHONPATH=~/.python/lib/python2.5/site-packages
$ export PATH=~/.python/bin:$PATH

If you want to keep this handy, you could append these lines to a separate script such as ~/.python/.pathrc and then switch to it whenever you’re working on your Python project:

$ source ~/.python/.pathrc

Once you have a directory framework set up, run the develop command from your project root (which should contain your setup.py):

$ python setup.py develop --prefix=~/.python

Assuming everything works well, setuptools will generate several dozen lines of text. You can then examine the content of your ~/.python directory and see if it has generated any files for you. If you’re building with console_scripts, those will be placed in ~/.python/bin. Likewise, any dependencies will be located in ~/.python/lib/python2.5/site-packages.

Plus, since you’ve used the develop command, you’ll notice a file with $YOURPROJECTNAME.egg-link. This file is what performs the magic for you: Its presence essentially redirects Python (using setuptools, of course) to your development directory. Thus, any changes you make to your own sources will be immediately reflected by your project.
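If you want to see where those link files point, a few lines of Python will do. This sketch rests on my understanding that the first line of an .egg-link file holds the path to the project directory; read_egg_links() is a hypothetical helper, and the demonstration writes its own fake link file into a temporary directory rather than touching a real site-packages.

```python
import os
import tempfile


def read_egg_links(site_packages):
    """Map each *.egg-link file in `site_packages` to the path on its first line."""
    links = {}
    for name in sorted(os.listdir(site_packages)):
        if name.endswith(".egg-link"):
            with open(os.path.join(site_packages, name)) as handle:
                links[name] = handle.readline().strip()
    return links


# Demonstration against a throwaway directory with a hand-written link file.
site_packages = tempfile.mkdtemp()
with open(os.path.join(site_packages, "Watcher.egg-link"), "w") as handle:
    handle.write("/home/user/projects/watcher\n.")

print(read_egg_links(site_packages))  # {'Watcher.egg-link': '/home/user/projects/watcher'}
```

To inspect a real install, pass your own directory instead, such as the ~/.python/lib/python2.5/site-packages path created earlier.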

Advanced Topics

setuptools is powerful enough to help you create installers that perform a wide variety of tasks with little difficulty. Unfortunately, not all of these topics are well-illustrated in a single place. In this section, I explore how you can use setuptools to install configuration scripts to /etc by copying a template from your freshly installed .egg.

First, we’ll make some assumptions. Assume your project directory layout is something like this:

project/
    src/
        config/
            config.sample.xml
            config.xml
        __init__.py
        project.py
    setup.py

You have two configuration files: src/config/config.xml which is used for development and src/config/config.sample.xml which is your skeleton configuration script. You also have a setup.py that looks something like this:

#!/usr/bin/env python
 
from setuptools import setup, find_packages
 
setup (
    name = "Project",
    version = "1.0",
    description="This is my project. There are many like it, but this one is mine.",
    author="Benjamin A. Shelton",
    author_email="", # Removed.
    package_data = {'': ['*.xml']},
    packages = find_packages(exclude=["tests"]),
    zip_safe = True
)

Now, let’s assume that when someone runs the installer, you want to copy your sample configuration file from src/config/config.sample.xml to /etc/myproject/config.xml. Here’s where it gets tricky.

First, you’ll need to include a couple of items from pkg_resources:

from pkg_resources import Requirement, resource_filename

And then, to determine the file name of your configuration file, you’ll need to do this:

filename = resource_filename(Requirement.parse("Project"),
                                        "config/config.sample.xml")

This will provide you with a path you can then use to copy the sample configuration file. Let’s take a look at a longer script that employs all of these features.

import os
import shutil

from pkg_resources import Requirement, resource_filename


def copyConfig():
    '''copyConfig()
    Copies the sample configuration, if necessary, to /etc/myproject/config.xml.'''
    # Locate the sample file inside the installed "Project" distribution.
    filename = resource_filename(Requirement.parse("Project"),
                                 "config/config.sample.xml")

    try:
        # Create the directory.
        if not os.path.exists("/etc/myproject"):
            os.mkdir("/etc/myproject")

        # Copy the source file. Don't clobber existing files.
        if not os.path.exists("/etc/myproject/config.xml"):
            shutil.copyfile(filename, "/etc/myproject/config.xml")

    # os.mkdir raises OSError; shutil.copyfile raises IOError.
    except (IOError, OSError):
        print("Unable to copy configuration file to /etc/myproject/config.xml.")

Surprisingly, that’s all there is to it! If you need to copy additional files from your project to other locations, simply modify this example to suit your needs.
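Because the function above writes to /etc, it’s awkward to try out without root. One way to experiment is to pull the “copy only if missing” logic into a helper that takes its paths as arguments. The following is a sketch along those lines; copy_if_missing() and the temporary paths are my own stand-ins, and the source file here is created by hand rather than looked up with resource_filename.

```python
import os
import shutil
import tempfile


def copy_if_missing(source, destination):
    """Copy `source` to `destination` unless the destination already exists.
    Returns True when a copy was made."""
    directory = os.path.dirname(destination)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    if os.path.exists(destination):
        return False
    shutil.copyfile(source, destination)
    return True


# Stand-ins for the packaged sample file and for /etc/myproject/config.xml.
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "config.sample.xml")
with open(sample, "w") as handle:
    handle.write("<config/>")
target = os.path.join(workdir, "etc", "myproject", "config.xml")

print(copy_if_missing(sample, target))  # True
print(copy_if_missing(sample, target))  # False
```

The second call returns False because the file is already in place, which is the behavior you want so that an upgrade never overwrites a user’s edited configuration.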

Questions and Answers

Help, I don’t have root, but would like to create my own packages!

There are plenty of good reasons to be developing on a machine where you don’t have root access; I certainly won’t question why. Perhaps you’re developing on a shared system, or perhaps your local BOFH has decided that no one should have root access on their local workstations. Obviously, if you do have root access on your own workstation, server, or happen to be running this cruft from within a virtual machine, you have little need to read this section unless you intend to isolate setuptools. (Scroll up!)

For those poor souls who have a distinct need to keep reading, I feel your plight.

First, it’s important to ensure that you really do have a genuine need to install setuptools separately from the system site-packages. It certainly won’t hurt anything if it’s already installed and you’re planning on creating your own separate install; in fact, your sysadmin may not have updated it in eons. Be aware that you may run into headaches down the road if multiple versions of this utility are installed (including the one you’re using in your isolated environment), and you forget which you’re using during the build process. Take heed, and make certain you’re using the version you think you’re running.
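A quick way to check which copy Python would actually pick up is to ask the import system where it finds the module. This is a small sketch using importlib.util from a current Python 3 standard library; module_origin() is a hypothetical helper name of mine.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Prints the path of the setuptools that is first on sys.path,
# or None if no copy can be found at all.
print(module_origin("setuptools"))
```

If the path printed isn’t under the directory you expected, your $PYTHONPATH is probably not set the way you think it is.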

Installing Setuptools–Non-root

You have a need to install setuptools in isolation from your base system. Thankfully, it’s pretty easy. Here’s what you’re going to have to do:

  • Download setuptools-0.6c9-py2.5.egg (or whichever .egg is appropriate for your version of Python)
  • Create a directory to store setuptools: $ mkdir -p ~/.python/lib/python2.5/site-packages (you may need to substitute your specific version of Python)
  • Add this directory to your PYTHONPATH: $ export PYTHONPATH=$HOME/.python/lib/python2.5/site-packages
  • Install setuptools into your home directory: $ sh setuptools-0.6c9-py2.5.egg --prefix=~/.python (I prefer to add it under ~/.python as I tend to have a ~/python on most of my systems; change this directory as you see fit, but it's important to remember that setuptools will create a bin and a lib directory under whatever path you pick!)

If setuptools complains that it can’t find a specific version of Python, you can trick it with the following:

$ ln -s /usr/bin/python python2.5 # substitute whatever version setuptools is complaining about
$ tpath=$PATH # temporary path
$ export PATH=$HOME:$PATH

Then run the commands listed above. When you’re done, restore your old path. We’ll need to restore it for later steps.

export PATH=$tpath

Once you’ve installed setuptools into your $HOME, you may need to add its bin to your path. This example assumes you’ve installed setuptools into ~/.python:

$ export PATH=$HOME/.python/bin:$PATH

You should now have easy_install available for use. Test it:

$ easy_install --help

If easy_install prints its own usage, it’s installed and we can continue with adding other packages. If not, you may have missed a step. Repeat the steps listed in this section again (you may need to remove ~/.python and start over) until it works; otherwise, leave a comment.

Of course, once you close your shell and log off for the day, you’ll find that the next time you log in, easy_install is gone. Since we only temporarily changed the $PATH and $PYTHONPATH environment variables, we might find it useful to add the following to the ~/.bashrc start-up script. (Change these entries accordingly if you’re using another shell.)

export PATH=$HOME/.python/bin:$PATH
export PYTHONPATH=$HOME/.python/lib/python2.5/site-packages:$PYTHONPATH

Then run $ source ~/.bashrc to make sure it took the changes and try running easy_install --help again. Since this only adds your new directory to the front of the existing $PYTHONPATH, it should not affect accessing system packages. If you find that system packages aren’t working and you need them, modify your $PYTHONPATH to include:

$ export PYTHONPATH=/usr/lib/python2.5/site-packages:$HOME/.python/lib/python2.5/site-packages

Where python2.5 is your specific version of Python.
