Python Training – Part 5


This is part 5 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.


Module Basics

Python modules are a way to physically organize your source code. It would be impractical to have the entire source code in a single .py file. You can extract parts of your source code (functions, classes, constants, etc.) to their own .py files and use them in your main program.

Single program

program.py:

PI = 3.14

def PiMalDaumen(daumen):
   return daumen * PI

print PiMalDaumen(5) / PI

Program and module

utility.py:

PI = 3.14

def PiMalDaumen(daumen):
   return daumen * PI

program.py:

import utility
if __name__ == "__main__":
    print (utility.PiMalDaumen(5)
           / utility.PI)

Note: Once a .py file has been imported, a .pyc file is created so that the module can be loaded faster the next time it is needed. Modules can also be extensions written in other languages such as C++. These extension modules are DLL files with the filename extension .pyd (or .dll in older Python versions).

What’s the difference between a Python module and a Python program?

Both Python programs and Python modules are just .py files. The difference is merely in usage. Apart from the different usage, the code in programs and modules is executed by the Python interpreter in the same way.

  • A program is a .py (or .pyw [1]) file that you double-click or run with “python program.py”
  • A module is a .py file that you import in your main program or in other modules to reuse the objects (functions, classes, constants, etc.) that it contains

[1] The extension .pyw is used if you don’t want a console window to be opened when you double-click the file. Alternatively, you can use the extension .py and run the program with “pythonw program.py” instead of “python program.py”.

To be useful, a program contains some code that is executed right away when the .py file is double-clicked. A module, on the other hand, usually does not do anything at the moment it is imported. It only provides objects that can be used by the importing module later on.

Inside a .py file, you can find out whether you are the main program or a module:

def SomeFunc(a):
    print "To be used by others who import me or by myself."

if __name__ == "__main__":
    print "I was started as the main program."
    print "Do whatever I'm supposed to do."
else:
    print "I was imported as the module named", __name__

Note: You should always enclose code that is meant to run only when the file is started as the main program in an “if __name__ == "__main__":” block. This way, the code is not executed if you ever decide to import your main program as a module into a different program.

Syntax of “import”

The “import” keyword can be used in different ways. The following examples will import the module “mymodule.py”, which is shown here:

PI = 3.14

def SomeFunc():
    return 5

class SomeClass:
    def SomeMethod(self):
        return 10

There are three forms of import:

  • import mymodule: This is the most basic form. The name “mymodule” now refers to a module object whose attributes are the objects defined at the top level of the imported module:
    import mymodule
    print mymodule.PI * mymodule.SomeFunc()
    obj = mymodule.SomeClass()
    print obj.SomeMethod()
  • import mymodule as m: This is similar to the first form, but it assigns a different name to the imported module within the importing module:
    import mymodule as m
    print m.PI * m.SomeFunc()
    obj = m.SomeClass()
    print obj.SomeMethod()

    Note: This is the same as this:

    import mymodule
    m = mymodule
    ...
  • from mymodule import SomeFunc: You can list the things you want to use from the imported module and use them without prefixing them with the module name:
    from mymodule import PI, SomeFunc
    print PI * SomeFunc()

    Note: This is the same as this:

    import mymodule
    PI = mymodule.PI
    SomeFunc = mymodule.SomeFunc
    ...

    Note: You could write “from mymodule import *” to import everything from the given module at once. However, I strongly discourage you from doing this, because it makes the code less readable, and there’s a danger of conflicts between objects with the same names coming from different modules.
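To make the danger of name conflicts concrete, here is a minimal sketch using two standard-library modules that both export a function named sqrt. It is written for Python 3 (print is a function). The star imports are run through exec() into a throwaway dict, called scratch here purely for illustration, so that the sketch does not clutter your own namespace.

```python
# Both math and cmath define sqrt(); with star imports, whichever
# module is imported last silently wins.
scratch = {}

exec("from math import *", scratch)
sqrt_after_math = scratch["sqrt"]

exec("from cmath import *", scratch)
sqrt_after_cmath = scratch["sqrt"]

# The name "sqrt" now refers to a different function than before.
print(sqrt_after_math is sqrt_after_cmath)   # False
print(sqrt_after_math(4))                    # 2.0
print(sqrt_after_cmath(4))                   # (2+0j)
```

With “import math” and “import cmath” instead, both functions stay reachable as math.sqrt and cmath.sqrt, and a reader can always tell which one is meant.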

Where does Python look for .py files to import?

When Python encounters code like “import mymodule”, it looks for the module “mymodule” in these places:

  1. In the directory that contains the importing module
  2. In all the directories that are contained in the list “sys.path” at the time the import is executed (execute “import sys; print sys.path” to see what it contains)

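Note: Python does not look in the current working directory unless “sys.path” contains an entry that refers to it, such as “.” or the empty string (which Python puts at the front of “sys.path” when it is started interactively).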
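The following sketch shows one way to check what this search would find, without actually importing anything. It assumes Python 3.4 or later, where importlib.util.find_spec() is available (it does not exist in the Python 2.5 used elsewhere in this training). The module name no_such_module_for_demo is made up.

```python
import importlib.util
import sys

# The first few entries of the search path, in lookup order.
for index, entry in enumerate(sys.path[:3]):
    print(index, repr(entry))

# Ask the import system where "json" would be loaded from.
found = importlib.util.find_spec("json")
print(found.origin)

# A module that is on none of the search paths yields None.
missing = importlib.util.find_spec("no_such_module_for_demo")
print(missing)   # None
```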

The list “sys.path” is filled in this way:

  • From hard-coded paths in the Python interpreter, e.g., “C:\Python25\Lib”, “C:\Python25\Lib\site-packages”, etc.
  • From paths contained in the environment variable “PYTHONPATH” when Python was started
  • From paths listed in all files with the extension .pth found in “C:\Python25” when Python was started
  • Explicitly by your program by modifying “sys.path”, e.g.:
    import sys
    # Ensure that modules located in the current working
    # directory take precedence over all other directories.
    # Note: This refers to the cwd at the time the import will
    # be executed, not necessarily the cwd at this very moment.
    sys.path.insert(0, ".")
    
    # Add a sub-directory of the current working directory.
    # Use the absolute path via os.getcwd() so that it doesn't
    # change when we change the cwd later, e.g. via os.chdir().
    import os
    sys.path.append(os.path.join(os.getcwd(), "subdir"))

    Note: Fiddling with “sys.path” inside the program is sometimes necessary, but most often it is not the right way to do things. Maybe you should organize your modules in packages (see section “Packages”) or extend the search path with a .pth file?
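As a rough illustration of the .pth mechanism: a .pth file is a plain text file that lists one directory per line. The sketch below (Python 3 syntax) creates two temporary directories and a file named myproject.pth, all invented for this example, and asks the standard site module to process them the way it processes site directories at startup.

```python
import os
import site
import sys
import tempfile

# Hypothetical locations, created only for this demonstration.
site_dir = tempfile.mkdtemp()    # stands in for a site directory
extra_dir = tempfile.mkdtemp()   # the directory we want on sys.path

# A .pth file simply lists directories, one per line.
with open(os.path.join(site_dir, "myproject.pth"), "w") as pth_file:
    pth_file.write(extra_dir + "\n")

# Process site_dir like a site directory: it is added to sys.path,
# and so is every existing directory listed in its .pth files.
site.addsitedir(site_dir)

print(extra_dir in sys.path)
```

In a real installation you would not call site.addsitedir() yourself; dropping the .pth file into a directory that Python already scans at startup is enough.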

Modules are loaded only once

Each module is loaded only once. If two modules in the same program contain a line “import mymodule”, the module “mymodule” is loaded when the first one of them is executed. The second one receives a reference to the module that’s already loaded.

mymodule.py:

print "mymodule"
def X():
    return "X"

program.py:

print "program"
import mymodule
import utility
print "program:", mymodule.X()

utility.py:

print "utility"
import mymodule
print "utility:", mymodule.X()

When you execute program.py, the output is:

program
mymodule
utility
utility: X
program: X

Note: You can use “reload()” to force a module to be re-executed. This might be useful when you expect your modules to be modified while the program is being executed (when writing a debugger, for example), but shouldn’t be necessary otherwise.
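The caching can be observed directly. The sketch below is written for Python 3, where reload() lives in importlib rather than being a built-in. It writes a one-line module named counted_module (an invented name) to a temporary directory; the module counts how often its top-level code has been executed.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module whose top-level code counts its own executions.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "counted_module.py"), "w") as source:
    source.write("EXECUTIONS = globals().get('EXECUTIONS', 0) + 1\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()  # make sure the new file is noticed

import counted_module
import counted_module as second_reference   # served from the cache

executions_after_imports = counted_module.EXECUTIONS
print(executions_after_imports)                            # 1
print(second_reference is sys.modules["counted_module"])   # True

# reload() re-executes the module's code in the same module object.
importlib.reload(counted_module)
print(counted_module.EXECUTIONS)                           # 2
```

The dictionary sys.modules is where loaded modules are remembered; the second import statement is little more than a lookup in it.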

Packages

Packages are a way to organize modules in directories.

Everything in a single directory

  main.py
  database_logic.py
  database_reports.py
  gui_window.py
  gui_button.py
  gui_backend_gtk.py
  gui_backend_win32.py
  gui_backend_osx.py

Grouped hierarchically

  main.py
  database
    logic.py
    reports.py
  gui
    window.py
    button.py 
    backends
      gtk.py
      win32.py
      osx.py

If there were no packages, you could be tempted to add all sub-directories to Python’s module search path:

Bad practice
import sys
sys.path += ["database", "gui", "gui\\backends"]
import logic
from window import Window
import gtk

Packages provide a much cleaner way:

# Import the module database\logic.py
import database.logic

# Import "Window" from the module gui\window.py
from gui.window import Window

# Import the module gui\backends\gtk.py and assign a shorter name
import gui.backends.gtk as backend

To make this work, all you need to do is place an empty file named “__init__.py” in each directory that should be treated as a package:

  main.py
  database
    __init__.py
    logic.py
    reports.py
  gui
    __init__.py
    window.py
    button.py
    backends
      __init__.py
      gtk.py
      win32.py
      osx.py
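To try this out without touching a real project, the following sketch (Python 3 syntax) builds a miniature version of the layout in a temporary directory and imports from it. The package name demo_gui and the constant BACKEND_NAME are made up for the example.

```python
import os
import sys
import tempfile

# Build a miniature version of the layout above in a temp directory.
# The name "demo_gui" is made up to avoid clashing with real packages.
project_root = tempfile.mkdtemp()
backends_dir = os.path.join(project_root, "demo_gui", "backends")
os.makedirs(backends_dir)

# An empty __init__.py marks each directory as a package.
for package_dir in (os.path.join(project_root, "demo_gui"), backends_dir):
    open(os.path.join(package_dir, "__init__.py"), "w").close()

with open(os.path.join(backends_dir, "gtk.py"), "w") as module_file:
    module_file.write("BACKEND_NAME = 'gtk'\n")

# Only the directory containing the top-level package goes on sys.path.
sys.path.insert(0, project_root)

import demo_gui.backends.gtk as backend
print(backend.__name__)       # demo_gui.backends.gtk
print(backend.BACKEND_NAME)   # gtk
```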

Intra-Package Imports

Consider this package hierarchy:

myapp
  __init__.py
  database
    __init__.py
    logic.py
    reports.py
  gui
    __init__.py
    window.py
    button.py
    backends
      __init__.py
      gtk.py
      win32.py

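A module inside a package can import modules inside a sub-package normally (in Python 2, that is; Python 3 dropped this implicit form in favor of absolute imports or explicit relative ones such as “from .backends import gtk”):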

# From   myapp\gui\window.py
# Import myapp\gui\backends\gtk.py
import backends.gtk

A module inside a sub-package can also import modules from other parts of the package hierarchy using absolute imports:

# From   myapp\gui\backends\gtk.py
# Import myapp\database\reports.py
import myapp.database.reports

Note: For this to work, the top-level package “myapp” must be in the module search path (see “sys.path” in section “Module Basics”). If Python can’t find “myapp”, you will get the error “ImportError: No module named myapp”.

Use with care:

In Python 2.5 and later, you can also use relative imports (“.” refers to the current package, “..” to the parent package, “...” to the grand-parent package, etc.):

# From   myapp\gui\backends\gtk.py
# Import myapp\gui\button.py
from .. import button

# Import myapp\database\reports.py
from ...database import reports

There are many subtleties involving relative imports. They are not just a straight translation from filesystem paths to import syntax. For example, this code only works when gtk.py was itself imported from somewhere outside the package using something like “import myapp.gui.backends.gtk”, not when you run “python gtk.py”.
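The working case can be reproduced with a small experiment. This sketch (Python 3 syntax) recreates part of the hierarchy in a temporary directory under the invented package name demo_app, with made-up constants LABEL and TITLE, and imports gtk.py through its full package path so that the relative imports inside it can be resolved.

```python
import os
import sys
import tempfile

def write_file(path, text=""):
    """Create parent directories as needed and write text to path."""
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "w") as handle:
        handle.write(text)

root = tempfile.mkdtemp()
app = os.path.join(root, "demo_app")   # hypothetical top-level package

write_file(os.path.join(app, "__init__.py"))
write_file(os.path.join(app, "database", "__init__.py"))
write_file(os.path.join(app, "database", "reports.py"), "TITLE = 'reports'\n")
write_file(os.path.join(app, "gui", "__init__.py"))
write_file(os.path.join(app, "gui", "button.py"), "LABEL = 'button'\n")
write_file(os.path.join(app, "gui", "backends", "__init__.py"))
write_file(
    os.path.join(app, "gui", "backends", "gtk.py"),
    "from .. import button\n"
    "from ...database import reports\n",
)

sys.path.insert(0, root)

# Imported through its full package path, gtk.py knows that it lives in
# demo_app.gui.backends, so ".." and "..." can be resolved.
import demo_app.gui.backends.gtk as gtk_backend
print(gtk_backend.__name__)        # demo_app.gui.backends.gtk
print(gtk_backend.button.LABEL)    # button
print(gtk_backend.reports.TITLE)   # reports
```

Running the generated gtk.py directly with “python gtk.py” would fail instead, because a file started as the main program is not treated as part of any package.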

Modules and Reflection

Wikipedia says, “reflection is the process by which a computer program can observe and modify its own structure and behavior.” Reflection is a useful and powerful tool and can be used with modules just like with any other object.

Import Modules with Names Determined at Runtime

You can use the “__import__()” function to load a module whose name is only known at runtime. This does not work:

Does not work
# We basically want to
#    import mymodule
# but the name "mymodule" is stored in a string variable
module_name = "mymodule"
import module_name as m   # Nope. This actually looks for a file
                          # named "module_name.py".
m.FuncInsideModule()

This works:

module_name = "mymodule"
m = __import__(module_name)
m.FuncInsideModule()

The previous example isn’t particularly useful yet, so here’s a real-world example where “__import__()” can be put to good use.

Consider a program that the user can extend by providing plug-in modules. The user is expected to place the .py files inside the “plugins” package and the program scans the directory at startup and loads the modules.

  program.py
  plugins
    __init__.py
    colorize.py
    sort.py
    filter.py

If we wanted to hard-code the plug-in names, we could write something like this:

Bad because everything’s hard-coded
# program.py
import plugins.colorize
import plugins.sort
import plugins.filter

plugin_modules = [plugins.colorize,
                  plugins.sort,
                  plugins.filter]
...
for m in plugin_modules:
    data = m.ApplyPlugin(data)

But having to add “import” statements to the program manually is tedious. We can do better by using “os.listdir()” to scan the directory and the “__import__()” function to import the plugins:

# program.py

import os

if __name__ == "__main__":
    plugin_files = os.listdir("plugins")
    plugin_modules = []
    for fn in plugin_files:
        if fn.endswith(".py") and fn != "__init__.py":
            module_name = os.path.splitext(fn)[0]
            import_name = "plugins.%s" % module_name
            plugin_modules.append(
                # basically do "from plugins import <module_name>"
                __import__(import_name, fromlist=[module_name]))
    ...
    for m in plugin_modules:
        data = m.ApplyPlugin(data)
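The “fromlist” argument in the code above deserves a closer look. The sketch below (Python 3 syntax) uses the standard-library module json.decoder to show what “__import__()” returns with and without it. It also shows importlib.import_module(), which is available in Python 2.7 and 3.x (but not in Python 2.5) and is often the simpler choice when it can be used.

```python
import importlib

# Without fromlist, __import__ returns the top-level package ...
top_level = __import__("json.decoder")
print(top_level.__name__)    # json

# ... with a non-empty fromlist, it returns the submodule itself.
submodule = __import__("json.decoder", fromlist=["decoder"])
print(submodule.__name__)    # json.decoder

# importlib.import_module() always returns the module that was named.
same_submodule = importlib.import_module("json.decoder")
print(same_submodule is submodule)   # True
```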

Inspecting Module Contents

Module objects provide several built-in attributes:

  • __name__: The module name, as specified in the “import” statement
  • __file__: The path to the .py file [1] from which the module was loaded.
    • Be careful! This might be a path relative to the current working directory at the time when the import was executed. The current working directory might have changed since then.
  • __dict__: A dict containing all objects (functions, classes, variables, etc.) that the module contains
  • __doc__: The docstring of the module

[1] The file could also be a .pyc or .pyd file, or whatever the module was loaded from.

The following source code prints some information about the “os” module:

import os
print "Module os loaded from", os.__file__
print "Docstring:", os.__doc__
print "Contains the following objects:"
for name, obj in os.__dict__.iteritems():
    print name, ":", type(obj)

This will print something like this:

Module os loaded from c:\python25\lib\os.pyc
Docstring: OS routines for Mac, NT, or Posix depending on what system we're on.

This exports:
  - all functions from posix, nt, os2, mac, or ce, e.g. unlink, stat, etc.
...

Contains the following objects:
lseek : <type 'builtin_function_or_method'>
O_SEQUENTIAL : <type 'int'>
pathsep : <type 'str'>
execle : <type 'function'>
_Environ : <type 'classobj'>
urandom : <type 'builtin_function_or_method'>
execlp : <type 'function'>
...

You can also use dir(), getattr(), hasattr(), and setattr() to access the module’s contents just like with other objects.
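Here is a brief sketch of these four functions applied to modules, in Python 3 syntax. It uses the standard math module for reading and a scratch module object (created with types.ModuleType under the invented name scratch) for writing, so that no real module is modified.

```python
import math
import types

# getattr() looks up an attribute whose name is only known as a string.
function_name = "sqrt"
function = getattr(math, function_name)
print(function(16.0))                      # 4.0

# hasattr() tells you whether such a lookup would succeed.
print(hasattr(math, "pi"))                 # True
print(hasattr(math, "no_such_attribute"))  # False

# dir() lists attribute names; names starting with "_" are skipped here.
public_names = [name for name in dir(math) if not name.startswith("_")]
print("sqrt" in public_names)              # True

# setattr() works on modules too. A scratch module object (hypothetical
# name "scratch") is used so that no real module gets modified.
scratch = types.ModuleType("scratch")
setattr(scratch, "ANSWER", 42)
print(scratch.ANSWER)                      # 42
```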

As a real-world example, consider a program that runs test cases contained inside a user-provided Python module. A test case is any function whose name starts with “Test_”.

# test_runner.py
# Start with:
#    python test_runner.py <module_name>
import sys
if __name__ == "__main__":
    module_name = sys.argv[1]
    test_suite = __import__(module_name)
    print "Running", test_suite.__doc__
    if hasattr(test_suite, "InitTests"):
        init_func = getattr(test_suite, "InitTests")
        init_func()
    for obj_name in dir(test_suite):
        if obj_name.startswith("Test_"):
            test_func = getattr(test_suite, obj_name)
            print "Performing", test_func.__name__, "...",
            print test_func()

This is an example test suite:

# module_to_test.py

"My test suite"

def InitTests():
    print "Initializing..."

def Test_Case1():
    return 5 * 5 == 25

def Test_Case2():
    return -1 ** 0 == 0

This is the output of the program:

> python test_runner.py module_to_test
Running My test suite
Initializing...
Performing Test_Case1 ... True
Performing Test_Case2 ... False

Memory Leaks

In C++, a memory leak typically occurs because your program forgets about a chunk of memory that it reserved. The memory is never freed although your program doesn’t make any use of it. If this happens too often, memory usage of the program might reach critical levels.

In Python, a memory leak occurs because your program keeps references to unneeded objects.

Python uses reference counting and a garbage collector to prevent most types of memory leaks:

x = [1, 2, 3]
d = {4: x}
y = (x, d)
# Here, the list [1, 2, 3] has ref count 3.
d.clear()
# Now it's down to 2.
x = 0
# Now the tuple y has the only reference left.
del y
# The list [1, 2, 3] has ref count 0. CPython frees its memory
# right away; the garbage collector is only needed for cycles.
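Reference counts can be observed from within a program. The following sketch is specific to CPython and written in Python 3 syntax; the class name Payload is made up. It uses sys.getrefcount() to measure how many references a dict and a tuple add, and a weak reference to check when the object is actually gone. Because sys.getrefcount() itself temporarily holds a reference, only the difference between two readings is meaningful.

```python
import sys
import weakref

class Payload(object):
    """Stand-in for any object (hypothetical name)."""

payload = Payload()
probe = weakref.ref(payload)  # observes the object without keeping it alive

baseline = sys.getrefcount(payload)
holder = {"key": payload}
pair = (payload, holder)
# The dict and the tuple each added one reference.
added_references = sys.getrefcount(payload) - baseline
print(added_references)       # 2

holder.clear()
del pair
del payload
# With the last reference gone, CPython has already freed the object.
print(probe() is None)        # True
```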

In order to produce a memory leak in Python (or rather, to cause undesired memory consumption), you have to accumulate references to objects that you wouldn’t otherwise need. Here’s a hypothetical and somewhat trivial example:

class File:
    def __init__(self, filename):
        self.m_filename = filename
        self.m_file_contents = open(filename).read()

processed_files = []
for fn in filename_list:
    f = File(fn)
    DoSomeImportantProcessing(f)
    processed_files.append(f)

print "Processed files:", [f.m_filename for f in processed_files]

Here, the list “processed_files” keeps references to “File” objects, although only the filenames of the objects will be needed after the loop. However, the “File” objects contain references to the file data. The peak memory usage of the program is the total size of all processed files. Of course, it would be more efficient to just store “processed_filenames”, like this:

processed_filenames = []
for fn in filename_list:
    f = File(fn)
    DoSomeImportantProcessing(f)
    processed_filenames.append(fn)

print "Processed files:", processed_filenames

The previous example isn’t a real memory leak. It’s just a case of undesired memory consumption that might not have been as obvious if the program were larger and more convoluted.

Python can suffer from real memory leaks, though:

  • Memory leaks in C/C++ extension libraries (either caused by bugs in the libraries or by incorrect usage)
  • Reference cycles involving objects whose classes define “__del__()” methods

Here’s an example of a reference cycle:

Not a memory leak
class X():
    pass

x = X()
y = X()
z = X()
x.next = y
y.next = z
z.next = x
del x
del y
del z

In this case, although there’s a reference cycle, the Python garbage collector is able to break the cycle and all the memory is freed as expected.

Memory leak
class X():
    def __del__(self):
        pass

x = X()
y = X()
z = X()
x.next = y
y.next = z
z.next = x
del x
del y
del z
import gc
gc.collect() # let the garbage collector do its work right now
print gc.garbage
# Prints something like:
# [<__main__.X instance>, <__main__.X instance>,
#  <__main__.X instance>]

The objects “x”, “y”, and “z” are not garbage-collected. Python cannot decide which object to delete first, because it doesn’t know whether our implementation of “__del__()” relies on a specific order. Therefore, the memory of the three “X” objects cannot be freed. You can break the cycle manually, as described in the Python docs.

This problem affects your code only if you have cyclic references among your objects and the involved classes implement “__del__()” methods.
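One possible way to break such a cycle by hand is sketched below, in Python 3 syntax with an invented class name Node. Note that on Python 3.4 and later the garbage collector can reclaim cycles like the one above even when “__del__()” is defined, so the leak itself is specific to older interpreters such as Python 2.5. To show that breaking the cycle makes the collector unnecessary, the sketch switches the collector off and relies on reference counting alone (a CPython-specific behavior).

```python
import gc
import weakref

class Node(object):
    def __del__(self):
        pass

gc.disable()  # rule out help from the cyclic garbage collector
try:
    x = Node()
    y = Node()
    z = Node()
    x.next = y
    y.next = z
    z.next = x
    probe = weakref.ref(x)  # observes x without keeping it alive

    x.next = None  # break the cycle by hand before dropping the names
    del x
    del y
    del z

    # Reference counting alone has freed all three objects.
    freed = probe() is None
    print(freed)             # True
finally:
    gc.enable()
```

Where one of the links is only a back-reference, storing it as a weak reference (via the weakref module) avoids creating the cycle in the first place.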

