realmike.org (http://realmike.org/blog): Python and C++, GNU/Linux, computer stuff...

WIP: Mirror Your Linux Desktop to Your TV

Wed, 27 May 2015

A while back, I wrote a script to do a live screencast of your desktop to a smart TV. I am planning to continue development on that in order to make it more user-friendly. I will update this post with my progress.

May 27, 2015

I tried the old scripts and verified that they can be made to work on Ubuntu 15.04 (Vivid Vervet) on a Samsung UE40H7005 TV (2014 model).

FFmpeg is deprecated in Ubuntu, so I am using avconv (from the Libav project) instead, adapting a command line that user Avio posted in a comment here. In the file dlna_fuse.py, replace the command line in class DlnaFuse with this:


cmd = ("parec -d %(pulseaudio_monitor)s | "
       " avconv -f x11grab -s 1280x720 -r 30 -i :0.0+0,0 -ab 192k "
       " -f s16le -ac 2 -ar 44100 -i - "
       " -vcodec libx264 -crf 30 -preset ultrafast -tune animation -threads 0 "
       " -acodec libmp3lame -f matroska - "
       " | %(live_filter)s - ") % locals()

This captures a 1280×720 section of the screen at 30 frames per second, along with the audio played through PulseAudio, and encodes them as H.264 video and MP3 audio in a Matroska container.
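If you want to experiment with the capture size and frame rate (for example, while working on the latency), it may help to build the command line in a function instead of relying on “% locals()”. This is only a sketch: build_capture_cmd is a hypothetical helper, not part of dlna_fuse.py, and its pulseaudio_monitor and live_filter arguments are assumed to hold whatever the existing code puts into the variables of the same names.

```python
def build_capture_cmd(pulseaudio_monitor, live_filter, size="1280x720", fps=30):
    """Build the parec | avconv | live_filter pipeline as one shell string.

    Hypothetical helper: same options as the command line above, but the
    capture size and frame rate are parameters instead of fixed values.
    """
    params = {
        "pulseaudio_monitor": pulseaudio_monitor,
        "live_filter": live_filter,
        "size": size,
        "fps": fps,
    }
    return ("parec -d %(pulseaudio_monitor)s | "
            # Video input: grab the X11 display :0.0 starting at offset 0,0.
            " avconv -f x11grab -s %(size)s -r %(fps)d -i :0.0+0,0 -ab 192k "
            # Audio input: raw 16-bit little-endian stereo PCM from parec on stdin.
            " -f s16le -ac 2 -ar 44100 -i - "
            # Output: H.264 video and MP3 audio in Matroska, written to stdout.
            " -vcodec libx264 -crf 30 -preset ultrafast -tune animation -threads 0 "
            " -acodec libmp3lame -f matroska - "
            " | %(live_filter)s - ") % params


cmd = build_capture_cmd("example.monitor", "cat")
print(cmd)
```

For example, passing size="854x480" and fps=25 would grab a smaller area of the screen at a lower frame rate; whether that actually reduces latency would have to be measured.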

In my test, I started the capture two minutes before starting playback on the TV, and I kept streaming for about half an hour. Worked nicely. Next I will see how far I can bring down the latency.

WordPress: Adding AdSense and Google Analytics

Sun, 05 Apr 2015

In this article, I will describe how to add code for Google AdSense and Google Analytics to the WordPress themes TwentyFourteen and TwentyFifteen. The goal is to show ads in the sidebar, after the post excerpt (i.e., at the “Read More” tag), and at the end of a post.

I recently switched the themes on my web sites to TwentyFourteen and TwentyFifteen. While doing this, I figured it’s time to add Google Analytics to the web sites and to change the way I display ads. Before, I had one ad in the sidebar. What I wanted was to have ads within the posts as well, automatically added by the theme code.

  • The first ad should appear after the introduction of the post.
    • For longer posts, I use the “Read More” tag (“<!--more-->” in the post markup) to separate the introduction (or “excerpt” in WordPress lingo) from the rest of the post. This is a good place for an ad: The ad appears near the top of the post (depending on the length of the excerpt) and within the flow of the text.
  • The second ad should appear at the end of the content, right before the “Share” links.
  • A third ad is shown inside a widget in the sidebar.

For short posts that don’t have a “Read More” tag, only the ads in the sidebar and at the end of the post will be shown. In addition, I wanted to have the possibility to insert an ad explicitly into the post markup using a “shortcode”.

Creating a Child Theme

Instead of modifying the TwentyFourteen and TwentyFifteen theme code directly, it is recommended to create a child theme. (The WordPress Codex has more information about child themes.)

For example, to create a child theme of TwentyFifteen, create the directory .../wp-content/themes/twentyfifteen-child and add two files: style.css and functions.php.

The contents of style.css (we’ll add to this later):

/*
 Theme Name:   Twenty Fifteen Child
 Theme URI:    http://realmike.org/twenty-fifteen-child/
 Description:  Twenty Fifteen Child Theme
 Author:       Michael Fötsch
 Author URI:   http://realmike.org
 Template:     twentyfifteen
 Version:      1.0.0
 License:      GNU General Public License v2 or later
 License URI:  http://www.gnu.org/licenses/gpl-2.0.html
 Tags:         black, blue, gray, pink, purple, white, yellow, dark, light, two-columns, left-sidebar, fixed-layout, responsive-layout, accessibility-ready, custom-background, custom-colors, custom-header, custom-menu, editor-style, featured-images, microformats, post-formats, rtl-language-support, sticky-post, threaded-comments, translation-ready
 Text Domain:  twenty-fifteen-child
*/

/* This is where you will add your custom CSS styles */

The important part is “Template: twentyfifteen”, which tells WordPress that this is a child theme of TwentyFifteen.

The contents of functions.php (we’ll add to this later):

<?php
// Load the parent theme's CSS
add_action( 'wp_enqueue_scripts', 'theme_enqueue_styles' );
function theme_enqueue_styles() {
    wp_enqueue_style( 'parent-style', get_template_directory_uri() . '/style.css' );
}

Once you have created these files, the child theme will appear under “Appearance”→“Themes” in your WordPress dashboard. Apply the child theme to your site. (After you have done this, check your custom settings for navigation menus, custom color scheme, etc. and re-apply them if needed.)

In the next sections, we will add code to the child theme files.

Adding AdSense And Analytics Code to The <head>

In WordPress, themes can register functions that should be invoked at certain points during page rendering. In order to insert some JavaScript code into the HTML <head> element, the theme registers a function for the “wp_head” action. Paste the following code into the file functions.php to add the common code for Google Analytics and for AdSense. (The JavaScript snippets inside the PHP strings are copied and pasted from my AdSense/Analytics dashboard. Replace them with the code that you get for your own Google account, especially the IDs marked with “>>> <<<”.)

// Insert Google Analytics and AdSense code into
// the <head>.
add_action('wp_head','hook_google_head');

function hook_google_head()
{
    // AdSense script
    // Note: This is the first line of the
    // code that AdSense gives me for an ad
    // unit. As all ad units on the page load
    // the same JavaScript file, loading the
    // file can be done once for the page, and
    // the first line can be removed from the
    // code of the individual ad units.
    $output = '<script src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js" async=""></script>';

    // Google Analytics
    $output = $output . "<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
  ga('create', '>>>specify the Tracking ID from your Analytics account<<<', 'auto');
  ga('send', 'pageview');
</script>";
    echo $output;
}

Inserting an Ad at The End of The Content

In WordPress, a theme can register a filter function to modify the contents of a page before it is sent back to the browser. One such filter is “the_content”, which is applied to the main content part of a post (i.e., the HTML code that corresponds to the post text, without navigation bars, footers, etc.).

Add this code to functions.php to insert an ad at the end of the content. (The ad unit markup inside “get_ad_code()” is copied and pasted from my AdSense dashboard. Replace it with the code that you get for your own Google account, especially the IDs marked with “>>> <<<”.)

// Helper function to retrieve ad code for a
// given slot.
// $slot is the AdSense ad unit ID. Make sure
// that each ad you insert into a page uses its
// own ad unit ID, otherwise some ads will not
// load.
function get_ad_code( $slot ) {

    // The JavaScript code for the ad unit comes
    // from the Google AdSense web site. The
    // first line of the code, the one that
    // loads the adsbygoogle.js JavaScript file,
    // is removed here. We inserted that line
    // into the <head>, so we don't have to
    // repeat it for each ad unit.
    return '<p><span class="adsense-title">Advertisement</span><br/>
<ins class="adsbygoogle"
     style="display:block"
     data-ad-client=">>>AdSense Client ID (ca-pub-...)<<<"
     data-ad-slot="' . $slot . '"
     data-ad-format="auto"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</p>';

}

// Insert AdSense ads into the content.
add_filter('the_content', 'insert_adsense_into_page');
function insert_adsense_into_page($text) {
    // is_singular() for pages as well as
    // for posts, is_single() for posts only.
    if( is_singular() ) {
        // Insert an ad at the end of
        // the content
        $ad_code = get_ad_code( '>>>AdSense ad unit ID for end of content<<<' );
        $text = $text . $ad_code;
    }
    return $text;
}

The code inserts the word “Advertisement” before the ad and applies the CSS style “adsense-title” to it. In the child theme’s style.css file, add the CSS rules for this class, for example:

.adsense-title {
	font-size: .7em;
	color: #808080;
	text-transform: uppercase;
}

Inserting an Ad After The “<!--more-->” Tag

When you insert the “<!--more-->” markup into a post, the HTML contains a “<span id="more-…">” element in its place. When someone clicks the “Continue Reading…” link at the end of the post excerpt, the page scrolls to this “<span>” by default.

To insert an ad after the end of the excerpt, modify the function “insert_adsense_into_page” in functions.php as follows.

// Insert AdSense ads into the content.
add_filter('the_content', 'insert_adsense_into_page');
function insert_adsense_into_page($text) {
    // is_singular() for pages as well as
    // for posts, is_single() for posts only.
    if( is_singular() ) {
        // Insert an ad at the
        // "<!--more-->" tag, i.e.,
        // after the excerpt.
        // Don't insert if there is no
        // such tag.
        $ad_code = get_ad_code( '>>>AdSense ad unit ID for More tag<<<' );
        $start_pos = strpos($text, '<span id="more-');
        if ($start_pos !== false) {
            $end_pos = strpos($text, '</span>', $start_pos);
            if ($end_pos !== false) {
                // Move past "</span>" so the ad follows
                // the (empty) span instead of ending up
                // inside it.
                $end_pos += strlen('</span>');
                $text_before = substr($text, 0, $end_pos);
                $text_after = substr($text, $end_pos);
                $text = $text_before . $ad_code . $text_after;
            }
        }

        // Insert an ad at the end of
        // the content
        $ad_code = get_ad_code( '>>>AdSense ad unit ID for end of content<<<' );
        $text = $text . $ad_code;
    }
    return $text;
}

Note that you need to use different ad unit IDs for the two different ads. Otherwise, one of the ads will not be shown.
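The string splicing in “insert_adsense_into_page” can be tried out outside WordPress. Here is a Python transcription of that logic, offered as a sketch for experimenting only; the function name insert_after_more_span and the sample HTML are made up. This version places the ad after the closing “</span>” of the More marker, so the ad does not end up inside the span itself:

```python
def insert_after_more_span(text, ad_code):
    """Return text with ad_code inserted after the '<span id="more-...">' marker.

    Same idea as the strpos()/substr() calls in insert_adsense_into_page():
    if the content has no More marker, it is returned unchanged.
    """
    start_pos = text.find('<span id="more-')
    if start_pos == -1:
        return text  # short post without a More tag
    end_tag = '</span>'
    end_pos = text.find(end_tag, start_pos)
    if end_pos == -1:
        return text  # unexpected markup: leave the content alone
    end_pos += len(end_tag)  # skip past the closing tag
    return text[:end_pos] + ad_code + text[end_pos:]


html = '<p>Intro.</p>\n<p><span id="more-42"></span>Rest of the post.</p>'
print(insert_after_more_span(html, "[AD]"))
```

The early returns correspond to the “!== false” checks in the PHP code: a post without a More tag simply gets no ad at that position.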

Adding an AdSense Shortcode

In addition to these automatically inserted ads, you can insert ads into a post manually by providing a shortcode. When you add the markup [adsense slot=">>>AdSense ad unit ID<<<"] to a post, an ad for the given ad unit ID is inserted at that point.

Add the following code to functions.php to make this shortcode available:

function adsense_shortcode( $atts ) {

    // Merge the attributes with the defaults.
    // '[adsense slot="slot ID"]' uses the given
    // slot ID; '[adsense]' falls back to a
    // default slot ID.
    $atts = shortcode_atts(
        array(
            'slot' => '>>>default AdSense ad unit ID<<<',
        ), $atts );

    return get_ad_code( $atts['slot'] );

}
add_shortcode( 'adsense', 'adsense_shortcode' );

Note that each ad that appears on a page must have a unique ad unit ID, so you must make sure to use a different slot ID for the [adsense] shortcode than you do for the automatically inserted ads.
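“shortcode_atts()” is what makes the slot attribute optional: it combines the attributes written in the shortcode with the defaults and ignores attributes that have no default. The Python function below imitates that behavior so you can see which slot ID each form of the shortcode ends up with. It is an illustration under that assumption, not WordPress code, and the slot IDs are placeholders:

```python
def shortcode_atts(pairs, atts):
    """Imitate WordPress's shortcode_atts() for illustration.

    pairs: supported attributes and their defaults.
    atts:  attributes parsed from the shortcode (may be empty or None).
    Only keys from pairs are returned; unknown attributes are dropped.
    """
    atts = atts or {}
    return dict((key, atts.get(key, default)) for key, default in pairs.items())


defaults = {"slot": "DEFAULT-SLOT"}
print(shortcode_atts(defaults, {"slot": "1234567890"}))  # [adsense slot="1234567890"]
print(shortcode_atts(defaults, None))                    # [adsense]
```

This also shows why two bare [adsense] shortcodes on the same page would collide: both resolve to the same default slot ID.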

Inserting an Ad in The Sidebar

To insert an ad in the sidebar, no code changes are required. Instead, go to the “Widgets” section in the WordPress dashboard and add a “Text” widget to the sidebar. Paste the Google AdSense code into the Text widget. (You can leave out the first line of the code, the one that loads the file adsbygoogle.js, because we already added this line to the <head>.)

Again, make sure that you use a different ad unit ID than for the other ads on the page. Otherwise, some of the ads will not be shown.

EuroPython 2013: Add Music to Python

Sat, 06 Jul 2013

I gave a talk about Libspotify at the EuroPython 2013 conference in Florence, Italy this week.

The talk shows you how to build cool stuff around music in your Python applications.

  • Libspotify to access the Spotify music streaming service: High-quality streaming, playlist management, music metadata
  • Echo Nest for recommendations, musical analysis, mashups
  • Last.fm for scrobbling and building a musical user profile

The video of the talk is available on YouTube.

Slides and sample code are also online.

I am planning to put up an article about the talk later, but you know how it is with finding the time to write articles…

Embedding Python – Tutorial – Part 1

Sun, 08 Jul 2012

This is a follow-up to my talk at EuroPython 2012, “Supercharging C++ Code with Embedded Python”. An embedded Python interpreter allows users to extend the functionality of the program by writing Python plug-ins. In this series of tutorials, I will give you step-by-step instructions on how to use the Python/C API to do this.

I assume that you know how to write and compile C/C++ programs. If you have prior experience with writing Python extension modules, it may be helpful, although it’s not required. In my article about extending Python, you can find instructions for setting up your Makefiles/workspaces when working with the Python/C API.

The Example Program

In this part, we’re going to add Python plug-ins to a simple C++ program. The program reads lines of text from STDIN and outputs them unmodified to STDOUT.

#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    std::clog << "Type lines of text:" << std::endl;
    std::string input;
    while (true)
    {
        std::getline(std::cin, input);
        if (!std::cin.good())
        {
            break;
        }
        std::cout << input << std::endl;
    }
    return 0;
}

The user should now be able to write a Python plug-in that modifies the string before it is printed. These plug-ins will look something like this:

# elmer_fudd_filter.py
def filterFunc(s):
    return s.replace("r", "w").replace("l", "w")

# shout_filter.py
def filterFunc(s):
    return s.upper()

To make this work, we will link the program to the Python interpreter and use the Python/C API to import the plug-in module and to invoke the “filterFunc()” function inside it.

As a first step in that direction, we simply initialize the Python interpreter with “Py_Initialize()” and shut it down with “Py_Finalize()”.

#include <Python.h>
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    Py_Initialize();
    std::clog << "Type lines of text:" << std::endl;
    std::string input;
    while (true)
    {
        std::getline(std::cin, input);
        if (!std::cin.good())
        {
            break;
        }
        std::cout << input << std::endl;
    }
    Py_Finalize();
    return 0;
}

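According to the Python documentation, the include directive for “Python.h” should appear first in the C/C++ file, before any standard headers, because Python.h may define preprocessor symbols that affect the standard headers on some systems. It doesn’t sound like a good idea to require this, but if you don’t put “Python.h” first, you might get compiler warnings like these: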

In file included from /usr/include/python2.7/Python.h:8:0,
                 from program.cpp:3:
/usr/include/python2.7/pyconfig.h:1161:0: warning: "_POSIX_C_SOURCE" redefined [enabled by default]
/usr/include/features.h:164:0: note: this is the location of the previous definition
/usr/include/python2.7/pyconfig.h:1183:0: warning: "_XOPEN_SOURCE" redefined [enabled by default]
/usr/include/features.h:166:0: note: this is the location of the previous definition

To compile and link the program, you need to have the Python headers and the Python library on your machine. On Windows, the headers are in the “include” directory of the Python installation, and the .lib file is in the “libs” directory. On GNU/Linux, these files come with the Python development package. On my Ubuntu 12.04 system, I had to install “python-dev”: “sudo apt-get install python-dev”.

Here’s an example command line to build the program using GCC:

g++ -o program program.cpp -I/usr/include/python2.7 -Wall -lpython2.7

If you are on Windows and using Visual Studio, see my article about extending Python if you need help setting up your project.

Invoking the Plug-In

Once this works, we can start adding the code to call the Python plug-in. To find the appropriate Python/C API calls, it always helps to think about the equivalent Python code first. Written in Python, our finished program might look something like this:

PLUGIN_NAME = "shout_filter"

def CallPlugIn(ln):
    plugin = __import__(PLUGIN_NAME)
    filterFunc = getattr(plugin, "filterFunc")
    args = (ln,)
    ln = filterFunc(*args)
    return ln

while True:
    ln = raw_input()
    if not ln:
        break
    ln = CallPlugIn(ln)
    print ln

The “CallPlugIn()” function imports the “shout_filter.py” module, retrieves the “filterFunc()” function, and invokes it. The function could certainly be written in a more concise, more Pythonic way. However, it’s easier to find the corresponding Python/C API calls when the code is broken down into its basic building blocks like “__import__()” and “getattr()”.

By digging through the Python/C API reference manual, we can find the API calls for each piece of Python code:

Python               Python/C API
__import__           PyImport_Import()
getattr              PyObject_GetAttrString()
args = (ln,)         Py_BuildValue()
filterFunc(*args)    PyObject_CallObject()

Thus, the first attempt at writing the “CallPlugIn()” function in C++ looks like this:

static const char* PLUGIN_NAME = "shout_filter";

std::string CallPlugIn(const std::string& ln)
{
    PyObject* pluginModule = PyImport_Import(PyString_FromString(PLUGIN_NAME));
    PyObject* filterFunc = PyObject_GetAttrString(pluginModule, "filterFunc");
    PyObject* args = Py_BuildValue("(s)", ln.c_str());
    PyObject* result = PyObject_CallObject(filterFunc, args);
    return PyString_AsString(result);
}

In the main routine, call the function before printing the line of text:

std::cout << CallPlugIn(input) << std::endl;

In its current form, the “CallPlugIn()” function has two major problems:

  • There’s no error checking. When anything goes wrong (the module can’t be imported, “filterFunc()” raises an exception, etc.), the program will likely crash.
  • There are memory leaks. We create a number of objects, but we never decrement their reference counts. Eventually, the program will run out of memory.

We will fix the leaks later. For now, let’s at least return an error message if anything goes wrong. Most API calls that return a “PyObject*” return a NULL pointer if an error occurred. In this case, it is the caller’s responsibility to handle/report the error and to clear Python’s internal error indicator with “PyErr_Clear()”. To print the traceback of the last error, use “PyErr_Print()”, which has the side effect of also clearing the error indicator. It’s important to clear the error as soon as possible, otherwise subsequent Python calls might fail in unexpected ways or give you misleading error messages.

std::string CallPlugIn(const std::string& ln)
{
    PyObject* pluginModule = PyImport_Import(PyString_FromString(PLUGIN_NAME));
    if (!pluginModule)
    {
        PyErr_Print();
        return "Error importing module";
    }
    PyObject* filterFunc = PyObject_GetAttrString(pluginModule, "filterFunc");
    if (!filterFunc)
    {
        PyErr_Print();
        return "Error retrieving 'filterFunc'";
    }
    PyObject* args = Py_BuildValue("(s)", ln.c_str());
    if (!args)
    {
        PyErr_Print();
        return "Error building args tuple";
    }
    PyObject* result = PyObject_CallObject(filterFunc, args);
    if (!result)
    {
        PyErr_Print();
        return "Error invoking 'filterFunc'";
    }
    const char* cResult = PyString_AsString(result);
    if (!cResult)
    {
        PyErr_Print();
        return "Error converting result to C string";
    }
    return cResult;
}
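It can help to compare this with plain Python: each NULL check corresponds to an exception, and “PyErr_Print()” corresponds roughly to “traceback.print_exc()”. The sketch below is a Python counterpart of the error-checked “CallPlugIn()” with the same error messages; the function name call_plugin and its plugin_name parameter are additions for illustration, not part of the tutorial's program:

```python
import traceback

PLUGIN_NAME = "shout_filter"


def call_plugin(ln, plugin_name=PLUGIN_NAME):
    """Python counterpart of the error-checked CallPlugIn().

    Where the C++ version checks for a NULL return, this version catches
    the exception; traceback.print_exc() stands in for PyErr_Print().
    """
    try:
        plugin = __import__(plugin_name)
    except Exception:  # ImportError, or e.g. a SyntaxError in the plug-in
        traceback.print_exc()
        return "Error importing module"
    try:
        filter_func = getattr(plugin, "filterFunc")
    except AttributeError:
        traceback.print_exc()
        return "Error retrieving 'filterFunc'"
    args = (ln,)
    try:
        result = filter_func(*args)
    except Exception:  # whatever filterFunc() raised
        traceback.print_exc()
        return "Error invoking 'filterFunc'"
    if not isinstance(result, str):
        # Roughly like PyString_AsString(), which fails for non-string
        # results such as numbers.
        return "Error converting result to C string"
    return result
```

Feeding it deliberately broken plug-ins is a quick way to see which of the four messages a given mistake produces.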

Create the file “shout_filter.py” in the same directory that you run the program from and add a valid “filterFunc()” function:

def filterFunc(ln):
    return ln.upper()

When you run the program now, you get an error (most likely): “Error importing module”. Why is that?

Normally, when importing a module, Python tries to find the module file next to the importing module (the module that contains the import statement). Python then tries the directories in “sys.path”. The current working directory is usually not considered. In our case, the import is performed via the API, so there is no importing module in whose directory Python could search for “shout_filter.py”. The plug-in is also not on “sys.path”. One way of enabling Python to find the plug-in is to add the current working directory to the module search path by doing the equivalent of “sys.path.append(‘.’)” via the API.

Py_Initialize();
PyObject* sysPath = PySys_GetObject((char*)"path");
PyList_Append(sysPath, PyString_FromString("."));
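The effect of this path tweak can also be reproduced in plain Python. The sketch below writes a throwaway plug-in into a temporary directory, shows that it cannot be imported while that directory is not on “sys.path”, and then imports it after appending the directory. The module name demo_shout_filter and the helper import_plugin_from are made up for this example:

```python
import importlib
import os
import sys
import tempfile


def import_plugin_from(directory, name):
    """Append directory to sys.path (like sys.path.append('.')) and import name."""
    if directory not in sys.path:
        sys.path.append(directory)
    # Python 3.3+ caches directory listings; invalidate_caches() refreshes
    # them. The function does not exist on Python 2, hence the check.
    if hasattr(importlib, "invalidate_caches"):
        importlib.invalidate_caches()
    return __import__(name)


# Write a throwaway plug-in into a directory that is not on sys.path.
plugin_dir = tempfile.mkdtemp()
with open(os.path.join(plugin_dir, "demo_shout_filter.py"), "w") as f:
    f.write("def filterFunc(ln):\n    return ln.upper()\n")

try:
    __import__("demo_shout_filter")
    found_before = True
except ImportError:
    found_before = False  # expected: the directory is not searched yet

plugin = import_plugin_from(plugin_dir, "demo_shout_filter")
print(plugin.filterFunc("hello"))  # prints HELLO
```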

Run the program again and type a few lines of text. Everything should work now. You can also try to deliberately introduce errors into the plug-in function and see whether they are caught by the program.

When you think about it, this is quite an achievement: You just added a full-fledged scripting language to your program with an amazingly small amount of code.

Reference Counting

As I mentioned, this program leaks memory pretty badly. For example, every time we run “Py_BuildValue()”, a tuple and a string object are created, but they are never freed. The reference count (refcount) of the tuple is initially 1 and we never decrement it, so the object remains alive forever. The next time we run “CallPlugIn()”, a new object is created.

Whenever you receive a “PyObject*” via the Python/C API, you need to figure out whether you are responsible for decrementing its refcount. The API docs distinguish between these cases:

  • New reference. The refcount has been incremented before the object was returned to the caller. The caller is responsible for decrementing the refcount with “Py_DECREF()” when the object is no longer needed.
  • Borrowed reference. The refcount has not been incremented before the object was returned. For example, the “PyTuple_GetItem()” function, which returns an item of a tuple, returns a borrowed reference. You can work with the item normally, at least as long as the item is still in the tuple. When the tuple is destroyed, though, the item may be destroyed (if its refcount reaches zero). If you need to keep a reference to the item for longer, you are responsible for incrementing the item’s refcount yourself with “Py_INCREF()”.

The docs also talk about “stolen references”. Sometimes when you pass an object to an API, the API will “steal” the reference, which means that the API will take care of decrementing the refcount at some point and that the caller must refrain from doing the same. For example, “PyTuple_SetItem()” steals the reference to the item you put into the tuple, whereas “PyList_Append()” does not.
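CPython lets Python code peek at an object's refcount with “sys.getrefcount()”, which makes the bookkeeping above visible. In the sketch below, binding a second name plays the role of “Py_INCREF()”, deleting it plays the role of “Py_DECREF()”, and putting the object into a tuple shows the tuple owning a reference to its item. This is CPython-specific, and the absolute numbers include temporary references (such as the argument to getrefcount() itself), so only the differences are meaningful:

```python
import sys


class Payload(object):
    """A fresh class, so no other code holds references to its instances."""


obj = Payload()
base = sys.getrefcount(obj)

extra = obj                        # a second owner, like Py_INCREF()
after_incref = sys.getrefcount(obj)
del extra                          # that owner lets go, like Py_DECREF()
after_decref = sys.getrefcount(obj)

holder = (obj,)                    # the tuple owns a reference to its item
in_tuple = sys.getrefcount(obj)
del holder                         # destroying the tuple releases it again
after_tuple = sys.getrefcount(obj)

# Expected on CPython: 1 0 1 0
print("%d %d %d %d" % (after_incref - base, after_decref - base,
                       in_tuple - base, after_tuple - base))
```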

The “CallPlugIn()” function with added reference counting:

std::string CallPlugIn(const std::string& ln)
{
    PyObject* name = PyString_FromString(PLUGIN_NAME);
    PyObject* pluginModule = PyImport_Import(name);
    Py_DECREF(name);
    if (!pluginModule)
    {
        PyErr_Print();
        return "Error importing module";
    }
    PyObject* filterFunc = PyObject_GetAttrString(pluginModule, "filterFunc");
    Py_DECREF(pluginModule);
    if (!filterFunc)
    {
        PyErr_Print();
        return "Error retrieving 'filterFunc'";
    }
    PyObject* args = Py_BuildValue("(s)", ln.c_str());
    if (!args)
    {
        PyErr_Print();
        Py_DECREF(filterFunc);
        return "Error building args tuple";
    }
    PyObject* resultObj = PyObject_CallObject(filterFunc, args);
    Py_DECREF(filterFunc);
    Py_DECREF(args);
    if (!resultObj)
    {
        PyErr_Print();
        return "Error invoking 'filterFunc'";
    }
    const char* resultStr = PyString_AsString(resultObj);
    if (!resultStr)
    {
        PyErr_Print();
        Py_DECREF(resultObj);
        return "Error converting result to C string";
    }
    std::string result = resultStr;
    Py_DECREF(resultObj);
    return result;
}

Note that I try to decrement the refcount of each object as soon as possible. For example, after retrieving the “filterFunc” callable from the “pluginModule” object, we can immediately decrement the refcount of the “pluginModule” object. The underlying module will not go away, since its reference count is not zero yet: “sys.modules” still holds a reference to it.

We also need to make sure that the refcount is properly decremented even if we leave the function early due to an error. For example, when we fail to build the arguments tuple, we decrement the refcount of the “filterFunc” (the only object we have a reference to at that point in the code) before leaving the function.

At the end of the function, we must not decrement the refcount of the “resultObj” string object before we have created a copy of the underlying C string. (The pointer returned by “PyString_AsString()” is only valid as long as the string object has a refcount greater than zero.)

We created another temporary object when setting up “sys.path”. This must be freed as well:

Py_Initialize();
PyObject* sysPath = PySys_GetObject((char*)"path");
PyObject* curDir = PyString_FromString(".");
PyList_Append(sysPath, curDir);
Py_DECREF(curDir);

Note that “Py_DECREF()” is not called on “sysPath” since that one is a borrowed reference.

You might already see the problem with this: It is way too easy to make mistakes. If you forget to call “Py_DECREF()”, you have a leak. If you call “Py_DECREF()” on a borrowed reference, you probably cause a crash. With error handling mixed in, it’s very easy to cause both kinds of problems. Using a C++ library that wraps PyObjects and takes care of reference counting solves these issues (mostly). This will be the topic of a future tutorial.

Debugging Memory Leaks

Sometimes you will need to debug memory issues no matter whether you are using a C++ wrapper or not. For this, it is very useful to have a debug version of the Python interpreter. On GNU/Linux, you can usually just install it from the repositories. For example, on my Ubuntu 12.04 system, I have to “sudo apt-get install python-dbg”. I then build the program with debug options:

g++ -o program program.cpp -I/usr/include/python2.7 -Wall -DPy_DEBUG -g -lpython2.7_d

(Don’t forget the “Py_DEBUG” preprocessor definition when linking against the debug interpreter. Otherwise, you might see crashes and errors like: “Fatal Python error: UNREF invalid object”.)

On Windows, you might have to compile a debug version of the Python interpreter yourself.

One of the things that a debug interpreter allows you to do is query the total reference count of all objects. By calling the function “sys.gettotalrefcount()” at different points in your program, you can check whether this number remains stable.

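void PrintTotalRefCount()
{
#ifdef Py_REF_DEBUG
    // sys.gettotalrefcount() only exists in debug builds.
    // PySys_GetObject() returns a borrowed reference.
    PyObject* refCount = PyObject_CallObject(PySys_GetObject((char*)"gettotalrefcount"), NULL);
    if (!refCount)
    {
        PyErr_Print();
        return;
    }
    std::clog << "total refcount = " << PyInt_AsSsize_t(refCount) << std::endl;
    Py_DECREF(refCount);
#endif
}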

...

int main(int argc, char* argv[])
{
    ...
    std::cout << CallPlugIn(input) << std::endl;
    PrintTotalRefCount();
    ...
}

If you try to remove one of the “Py_DECREF()” calls in “CallPlugIn()”, you will notice that the total reference count goes up after each invocation.
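The same measurement can be taken from the Python side, for example around a plug-in call in a test script. A minimal sketch; refcount_delta is a hypothetical helper, and because “sys.gettotalrefcount()” only exists in debug builds, it returns None on a regular interpreter. A single measurement includes some noise from temporary objects, so what matters is whether the number keeps growing over repeated calls:

```python
import sys


def refcount_delta(func, *args):
    """Return the change in sys.gettotalrefcount() across func(*args).

    Returns None when the interpreter is not a debug build, because
    sys.gettotalrefcount() is only available with Py_REF_DEBUG.
    """
    gettotal = getattr(sys, "gettotalrefcount", None)
    if gettotal is None:
        return None
    before = gettotal()
    func(*args)
    return gettotal() - before


# Call the code under test several times; steadily growing deltas hint at a leak.
deltas = [refcount_delta(str.upper, "some text") for _ in range(3)]
print(deltas)
```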

Possible Improvements

The “CallPlugIn()” function in its current form is slightly inefficient. We don’t really have to re-import the plug-in module and retrieve the “filterFunc()” function each time a line of text needs to be transformed. (It’s not as bad as it may appear, though. Once the module has been imported, it remains in “sys.modules”, so each time we call “PyImport_Import()”, we receive a reference to the existing module object.) One possible optimization would be to keep a reference to the “filterFunc” object during the entire lifetime of the program. Then, for each invocation of “CallPlugIn()”, we’d merely have to build an arguments tuple and invoke “filterFunc()”.

If you implement this optimization, though, try to keep your regular C++ code separate from the parts that require knowledge of the Python/C API. It is nice to use only standard C++ types in the interface of “CallPlugIn()” and to keep the PyObjects and Python/C API calls an implementation detail.
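Here is the shape this could take, sketched in Python for brevity: import the plug-in once, keep a reference to “filterFunc”, and expose only a string-in, string-out call. FilterPlugIn is a hypothetical name; in the C++ version, the cached callable would be a “PyObject*” holding a new reference that is released with “Py_DECREF()” before “Py_Finalize()”.

```python
class FilterPlugIn(object):
    """Import a plug-in once and cache its filterFunc.

    Callers only deal with strings; the module and function objects stay
    an implementation detail of this class.
    """

    def __init__(self, plugin_name):
        module = __import__(plugin_name)
        # Keep the callable for the lifetime of this object instead of
        # looking it up again for every line of text.
        self._filter_func = getattr(module, "filterFunc")

    def __call__(self, ln):
        return self._filter_func(ln)
```

Usage would be along the lines of creating shout = FilterPlugIn("shout_filter") once at startup and then calling shout(input) for each line.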

Next Time

In the next part, we’ll start with a more complex project. Among other things, we will give the Python plug-ins access to the application classes.

You can download the complete source code for this tutorial.

Supercharging C++ Code With Embedded Python – EuroPython 2012 Talk

Thu, 05 Jul 2012

This is the talk that I gave at EuroPython 2012 in Florence, Italy. It was a 60-minute talk, so it’s light on technical details. I am planning to publish follow-up articles that provide step-by-step instructions along with complete code examples. The first part of the tutorial is available. If you want to know when the next part will become available, subscribe to the RSS or add me on Google+ or on Twitter.


(Watch on YouTube.)

You can download the slides in PDF format and the slide sources in SVG format. (Please note the licensing restrictions.)

BTW, our team is hiring. If you’re interested in extending/embedding Python, or just interested in Python in general, you should definitely get in touch with us. Benefits of the position include an agile development process, a variety of projects to work on…and a beach within walking distance.

About Me / About SPIELO

I work as a software architect in the mathematics department at SPIELO International in Graz, Austria.

SPIELO International designs, manufactures and distributes cabinets, games, central systems and associated software for diverse gaming segments, including distributed government-sponsored markets and commercial casino markets.

Our team is responsible for the mathematical game engine that controls all payout-relevant aspects of the game. Part of the engine is an embedded Python interpreter.

Embedded Python: What is it? When to use it?

This talk is about embedding the Python interpreter in a C/C++ program and using the Python/C API to run Python scripts inside the program.

Embedding in a nutshell: Put Python inside your app to run scripts.

Here are some examples of what you can do with this technique.

Plug-in/extension language. This is the “Microsoft Word macro” use case, in which users can extend the functionality of the program by writing their own scripts. Let’s say a user wants to apply random formatting to each word in the text. A simple macro does the trick without requiring you to add a feature that most users don’t need:

Embedding example: Scripting a word processor.

Test automation. Automated testing exercises a program’s functionality by following pre-defined steps. That’s not much different from the macro scenario. Throw in a few “assert” statements and you have a test case. For testing purposes, the embedded Python scripts might have access to functionality that would be unsafe or useless for macros.

Game engines. Video games have been using built-in scripting languages for a long time. While performance-critical parts like graphics, physics, and low-level AI are written in C++, the scripts control things like high-level enemy behavior, map generation, and scripted events. Civilization IV is an example of a game that uses Python for this.

# Hypothetical plug-in script: Enemy and its methods are provided by the game engine.
class Guard(Enemy):
    def OnGettingHit(self, actor):
        self.findCover()
        if self.distanceTo(actor) < self.maxShootingDistance:
            self.shootAt(actor)
        else:
            self.team.setAlarmed(True)

All of this can be done directly in C++, of course, but there are certain advantages to using an embedded Python interpreter:

  • Ease of use. You can’t expect the users of your program or the level designers on your team to write plug-ins in C++, but many of them will be willing to learn a bit of Python to get their jobs done.
  • Sandboxing. Plug-ins written in C++ can do pretty much anything on the computer. This may be a security issue. A stripped-down Python interpreter, on the other hand, provides a restricted execution environment for plug-ins.
  • Flexibility. Even if you do know C++, you can try new ideas faster inside a Python script. Also, Python’s reflection capabilities open up new possibilities for automated testing.

Extending Python Recap

Extending and embedding Python are closely related. Extending Python means taking existing C/C++ code and making its data types and functions available to Python programs. Whenever a C/C++ library offers “Python bindings”, it is a Python extension library that uses the Python/C API to plug into the interpreter.

Here’s the big picture.

Using an extension module from Python

When the Python code invokes the C++ function “Sum()”, three things need to happen:

  • The parameters, 5 and 3.2, need to be converted from Python objects to the “int” values that the C++ code understands.
  • The parameters need to be passed to the Sum() function and the CPU needs to execute the function.
  • The return value (8 if you’ve been paying attention) needs to be converted from an “int” to a Python object.

The Python program is compiled to Python bytecode and executed by the Python interpreter. The C++ code is compiled to CPU instructions and executed directly by the CPU. Furthermore, what Python sees when we say “5” or “3.2” is very different from what the compiled C++ code expects to see. These are very distinct (and seemingly incompatible) worlds.

Incompatible worlds: Python bytecodes vs. CPU instructions
Incompatible worlds: PyObjects vs. C++ data types

We said that the Python and C++ worlds are “seemingly incompatible”, but they are really the same world. At its lowest level, the Python interpreter is also just a C program. When you use Python’s built-in data types or the standard library modules, the interpreter calls C functions sooner or later. For example, here’s the (slightly simplified) C code of the “math.radians()” function that converts an angle from degrees to radians:

static PyObject* math_radians(PyObject* self, PyObject* arg)
{
    double x = PyFloat_AsDouble(arg);
    return PyFloat_FromDouble(x * PI / 180.0);
}

When this function is called in the Python code, for example as “math.radians(3.2)”, the argument is a Python object. The Python interpreter knows how to invoke the C function, as long as the C function accepts a Python object as the argument and returns a Python object as the result. Internally, the C code uses the Python interpreter’s “PyFloat_AsDouble()” and “PyFloat_FromDouble()” functions to convert between Python objects and the C “double” data type.
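To see that these really are ordinary C functions, you can even call them from Python through the standard “ctypes” module, whose “ctypes.pythonapi” object exposes the C API of the running interpreter. This is a minimal sketch that assumes CPython (other implementations may not provide “pythonapi”); the helpers “as_double()”, “from_double()”, and “radians_like_c()” are hypothetical names used only for illustration.

```python
import ctypes
import math

# ctypes.pythonapi is a PyDLL for the running CPython interpreter itself.
# Declaring the C signatures tells ctypes how to convert arguments/results.
_as_double = ctypes.pythonapi.PyFloat_AsDouble
_as_double.argtypes = [ctypes.py_object]   # double PyFloat_AsDouble(PyObject*)
_as_double.restype = ctypes.c_double

_from_double = ctypes.pythonapi.PyFloat_FromDouble
_from_double.argtypes = [ctypes.c_double]  # PyObject* PyFloat_FromDouble(double)
_from_double.restype = ctypes.py_object    # hand the PyObject* back as an object


def as_double(obj):
    """Python object -> C double (ctypes returns the double as a float)."""
    return _as_double(obj)


def from_double(value):
    """C double -> new Python float object."""
    return _from_double(value)


def radians_like_c(arg):
    """Same steps as the simplified math_radians() C function above."""
    x = as_double(arg)                          # PyObject* -> double
    return from_double(x * (math.pi / 180.0))   # double -> PyObject*


print(radians_like_c(180))   # approximately math.pi
```

Because “pythonapi” functions check the interpreter's error indicator after each call, passing something that cannot be converted, such as a string, should raise a TypeError rather than silently returning -1.0.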

Back to our “Sum()” function. For the Python interpreter to be able to invoke it, we need to rewrite it to take “PyObject” arguments (in this case a tuple of parameters) and return a “PyObject” result. Better yet, instead of rewriting the original “Sum()”, we can introduce a wrapper around “Sum()” that conforms to the required interface:

static PyObject* WrapSum(PyObject* self, PyObject* args)
{
    PyObject* oa;
    PyObject* ob;
    PyArg_UnpackTuple(args, "Sum", 2, 2, &oa, &ob);
    long a = PyInt_AsLong(oa);
    long b = PyInt_AsLong(ob);
    long result = Sum(a, b); // call the original Sum()
    return PyInt_FromLong(result);
}

To turn this into an actual extension module, a module object needs to be created that contains the function that we defined. The following code is an unabridged example of this:

static PyMethodDef MyLibMethods[] =
{
    {"Sum", WrapSum, METH_VARARGS, "Calculate the sum of two integers."},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initmy_c_lib(void)
{
    (void)Py_InitModule("my_c_lib", MyLibMethods);
}

This code is compiled and linked into a DLL/shared object named “my_c_lib.pyd” on Windows and “my_c_lib.so” on Linux. The resulting library can then be “imported” like any other Python module.

To summarize:

  • An extension module is a DLL/shared object.
  • The functions inside the library take PyObjects as arguments and return PyObjects as their results.
  • The library uses Python’s conversion functions to convert between PyObjects and C data types.
  • “import” is used to load an extension module just like any old .py module.

Making Your Life Easier

Writing extension modules in this way is repetitive, somewhat tedious, and error-prone. (And we haven’t even seen any error-checking or memory management code yet.) Here are two ways to simplify the process (I am sure there are more):

  • SWIG, the Simplified Wrapper and Interface Generator. This is a tool that parses your C/C++ header files and automatically generates the code for an extension module. Using SWIG is very easy in the simple cases, but it can be a bit fiddly in the complex cases. We have been using it successfully for mission-critical projects at SPIELO and I would recommend it any time.
  • Boost.Python is a library that basically wraps the Python/C API in C++ template classes. Compiling Boost.Python code requires a fairly modern C++ compiler and lots of memory and patience during the compilation process. For various reasons, we decided against using it for the projects at SPIELO, but that’s not to say you shouldn’t try it for yourself.

From Extending To Embedding

With an extension module, it is the Python executable that invokes functions in the C++ code when the .py code requests it. The main program is Python and the extension module merely provides services to the Python interpreter.

Extending

On the other hand, when you embed the Python interpreter in a C++ program, you are using the interpreter as a library. Just like you would use the Expat library to parse XML files, you can use the Python interpreter as a library to execute Python source code (inside .py files or otherwise).

Embedding

Whether you are extending or embedding, you will be using the Python/C API in both cases. Embedding usually involves a fair amount of extending as well. After all, the Python code running inside the embedded interpreter will have to call back into the application to be useful. This means that your program will contain the same kind of wrapper functions that we saw in the earlier “Sum()” example for all the objects and functions that you want to make available to the Python plug-ins.

High-Level Embedding

As a first step, here’s the code to initialize the embedded interpreter, execute some Python source code, and shut down the interpreter.

#include <Python.h>
int main(int argc, char* argv[])
{
    Py_Initialize();
    PyRun_SimpleString("name = raw_input('Who are you? ')\n"
                       "print 'Hi there, %s!' % name\n");
    Py_Finalize();
    return 0;
}

This isn’t terribly exciting yet. You can’t pass any arguments to the Python code and you don’t receive any results. If you don’t go beyond this, it would be easier to just run “python.exe” as a sub-process.

Simple Plug-In

Let’s try a more involved use case. Here’s a C++ program that allows the user to write a Python plug-in to transform a string.

#include <iostream>
#include <string>

std::string CallPythonPlugIn(const std::string& s);  // implemented below

void program()
{
    std::string input;
    std::cout << "Enter string to transform: ";
    std::getline(std::cin, input);
    std::string transformed = CallPythonPlugIn(input);
    std::cout << "The transformed string is: " << transformed << std::endl;
}

The magic is supposed to happen inside the “CallPythonPlugIn()” function that we’ll implement in a minute. This function will invoke a function named “transform()” in a user-provided Python file:

# Example "plugin.py"
def transform(s):
    return s.replace("e", "u").upper()

With this, the “CallPythonPlugIn()” function might look something like this. (For brevity, I left out all of the error checking. In other words, don’t use the code as is! I will present a more complete implementation in a follow-up article.)

// WARNING! This code doesn't contain error checks!
std::string CallPythonPlugIn(const std::string& s)
{
    // Import the module "plugin" (from the file "plugin.py")
    PyObject* moduleName = PyString_FromString("plugin");
    PyObject* pluginModule = PyImport_Import(moduleName);
    // Retrieve the "transform()" function from the module.
    PyObject* transformFunc = PyObject_GetAttrString(pluginModule, "transform");
    // Build an argument tuple containing the string.
    PyObject* argsTuple = Py_BuildValue("(s)", s.c_str());
    // Invoke the function, passing the argument tuple.
    PyObject* result = PyObject_CallObject(transformFunc, argsTuple);
    // Convert the result to a std::string.
    std::string resultStr(PyString_AsString(result));
    // Free all temporary Python objects.
    Py_DECREF(moduleName); Py_DECREF(pluginModule); Py_DECREF(transformFunc);
    Py_DECREF(argsTuple); Py_DECREF(result);

    return resultStr;
}

The “CallPythonPlugIn()” function is roughly equivalent to this Python code:

def CallPythonPlugIn(s):
    pluginModule = __import__("plugin")
    transformFunc = getattr(pluginModule, "transform")
    argsTuple = (s,)
    result = transformFunc(*argsTuple)
    return result

And that’s the whole secret of embedding Python: Once you know what you’d like the Python interpreter to do, it’s a matter of mapping the Python code to the respective Python/C API functions using the reference docs.
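Before writing any C++, it can help to run the Python equivalent for real. The sketch below assumes that the host puts the plug-in's directory on “sys.path” (here a temporary directory from “tempfile.mkdtemp()”); the helper “install_plugin()” and the constant “PLUGIN_SOURCE” are hypothetical. The comments name the Python/C API call that each line corresponds to in the C++ version.

```python
import os
import sys
import tempfile

# The example plug-in from above, as source text (hypothetical constant).
PLUGIN_SOURCE = (
    "def transform(s):\n"
    "    return s.replace('e', 'u').upper()\n"
)


def install_plugin(directory, module_name="plugin"):
    """Hypothetical helper: write plugin.py and make its directory importable."""
    with open(os.path.join(directory, module_name + ".py"), "w") as f:
        f.write(PLUGIN_SOURCE)
    if directory not in sys.path:
        sys.path.insert(0, directory)


def CallPythonPlugIn(s):
    pluginModule = __import__("plugin")                 # PyImport_Import()
    transformFunc = getattr(pluginModule, "transform")  # PyObject_GetAttrString()
    argsTuple = (s,)                                    # Py_BuildValue("(s)", ...)
    result = transformFunc(*argsTuple)                  # PyObject_CallObject()
    return result


install_plugin(tempfile.mkdtemp())
print(CallPythonPlugIn("Hello there"))   # HULLO THURU
```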

Extending And Embedding Combined

At some point, you will want to access your C++ functionality from the Python plug-ins. For example, a plug-in might want to call an internal C++ function of the program named “TransformHelper()”:

# Example "plugin.py"
import the_program
def transform(s):
    return the_program.TransformHelper(s).lower()

In this case, we don’t want the module “the_program” to be an extension module in a separate shared object. Instead, the module should live in the program itself so that it has access to the program’s internals.

The C++ function “TransformHelper()” needs to be wrapped using the same techniques that we applied to the “Sum()” function in an earlier example.

static PyObject* WrapTransformHelper(PyObject* self, PyObject* arg)
{
    const char* str = PyString_AsString(arg);
    std::string result = TransformHelper(str);  // invoke the C++ function
    return PyString_FromString(result.c_str());
}

// Register the wrapped functions.
static PyMethodDef TheProgramMethods[] =
{
    {"TransformHelper", WrapTransformHelper, METH_O, "Transforms a string."},
    {NULL, NULL, 0, NULL}
};

// Somewhere in your program (after Py_Initialize()), initialize the module.
// This is all that's required to allow the plug-in to run "import the_program".
Py_InitModule("the_program", TheProgramMethods);

With this, the plug-in can invoke the internal functionality of the program. Of course, it is also possible to wrap entire C++ classes and not just functions. This is a topic for a follow-up article.
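Outside the host program, the same arrangement can be imitated in pure Python for experimenting with plug-ins: create a module object at runtime with “types.ModuleType” and register it in “sys.modules”, which is the first place “import” looks. This only imitates the effect of “Py_InitModule()” from Python code, and the body of “TransformHelper()” below is a hypothetical stand-in for the program's real C++ function.

```python
import sys
import types


def TransformHelper(s):
    # Hypothetical stand-in for the program's internal C++ function.
    return s.replace("e", "u").upper()


# Create a module object in memory and give it the "exported" function,
# similar in effect to calling Py_InitModule("the_program", ...) in C.
the_program = types.ModuleType("the_program", "Functions provided by the host.")
the_program.TransformHelper = TransformHelper
# import checks sys.modules first, so no the_program.py file is needed.
sys.modules["the_program"] = the_program


# The plug-in's code from above, unchanged apart from living in this file.
def transform(s):
    import the_program
    return the_program.TransformHelper(s).lower()


print(transform("Hello"))   # hullo
```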

Summary

Embedding involves these tasks:

  • Using the Python/C API to do things that you would normally do in Python. The reference docs help you find the right function to do the job.
  • Lots of converting between PyObjects and C/C++ data types, just like with extending.
  • Wrapping the internal objects of your program so that the embedded Python code has access to them, just like with extending.

C++ Wrappers for the Python/C API

Using a C++ library that wraps the Python/C API offers several advantages:

  • Integration with C++ data types such as std::string, iostream, etc.
  • Simplified error handling using exceptions
  • Avoids memory leaks by taking care of the reference count of PyObjects

We have used two libraries in the past:

  • Boost.Python. Makes heavy use of C++ templates.
  • PyCXX. Simple and straightforward library.

At SPIELO, we’re currently not using any wrapper library for the embedded interpreter. We do, however, use our own code generators to generate large parts of the most repetitive glue code, which reduces the need for C++ wrappers.

SPIELO Case Study

In our mathematical game engine, embedded Python allows mathematicians and game designers to define game rules that go beyond the pre-defined building blocks that the engine has to offer. The game rules are encoded in a data file that is interpreted by the engine. At certain points in the game flow, Python plug-ins may be invoked to check for additional winning conditions, award special prizes, change the game flow, etc.

The following sections briefly describe certain decisions we made when adding Python to the engine.

Why Python?

Before we added the embedded Python interpreter, our mathematical game engine already had a built-in scripting language. In fact, it was a bytecode interpreter that had to run on an ancient Z80 CPU (one of our target platforms at the time), which meant its functionality was very limited: It had only a single register for storing intermediate results, a 255-byte limit for bytecode programs, no floating-point support, etc.

We had a C-like language on top of the bytecodes. When the limitations of the bytecode interpreter became too much of a burden, we initially considered extending this C-like language with killer features like local variables and sub-routines that would support actual parameter lists and return values.

But how do you design a powerful scripting language that’s easy to learn, readable, and extensible? The answer is, you use an existing language that gets it right, and Python had a lot going for it in this respect:

  • The engine team had lots of experience with Python from other projects.
  • Most of the users of the engine had Python experience.
  • The Python project is mature and has a great community behind it.
  • The interpreter is light-weight and highly portable and its license fits our needs.

Integrating Python into the game engine took us just a few weeks.

To Fork Or Not To Fork

Usually, you can embed the Python interpreter that’s already installed on the user’s system by dynamically linking your program to the installed Python DLL/shared object. This allows embedded Python scripts to use all packages that are available in the existing Python installation.

For us, on the other hand, it was not an option to rely on the versions of Python that come pre-packaged for our target platforms. First of all, there is not even an official Python port for some of these platforms (for example, Windows CE). Second, as the Python interpreter is directly responsible for evaluating parts of the game rules, it is subject to the same strict regulations as the rest of the game engine. Therefore, we are treating the Python interpreter as an integral part of the engine: We track its source code in the same repository as the rest of the engine and include it in the testing/release process of the engine.

To make Python compile on some platforms, we had to make changes to the source code. We usually just rip out the parts that don’t work well on all platforms and that we don’t need anyway. These changes are not suitable to be patched into mainline Python, so we essentially created a fork. The fork means that it is unlikely that we’ll upgrade to a newer version of Python any time soon, but it doesn’t really matter to plug-in authors what version of Python they’re using.

Stripping Down And Sandboxing

The Python standard library gives you access to operating system services, internet protocols, graphical user interfaces, and more. Most of this isn’t needed or even desired for a plug-in language.

For the mathematical game engine, we only included built-in modules, i.e., modules that are compiled directly into the interpreter and that don’t require additional .py files to operate. Of those modules, we only included the ones that plug-in authors actually need and that are safe to use. We don’t include things like networking or operating system support, because there’s no reason why the mathematical game engine should mess with the OS, open HTTP servers, or the like. Aside from the security concerns, we don’t want to give plug-in authors (more) opportunities to shoot themselves in the foot.

In a Python interpreter that’s part of, say, a word processor, security is a major concern. You don’t want a plug-in that’s contained in an email attachment to be able to change any files, send any network requests, or run any system commands without the user’s permission. In this case, it’s a good idea to compile your own Python interpreter and leave out all the dangerous bits.

Embedded Debugging

As the saying goes, with great power comes a great danger of bugs.

With a stand-alone Python program, you can step through the code in your favorite Python IDE. With an embedded interpreter, that’s not possible. Still, we wanted to give our users a nice graphical debugger for their plug-ins, integrated into the game editor much like the VBA Editor is integrated into Microsoft Office.

Our approach uses “PyEval_SetTrace()” to register a function that’s invoked when each source line is executed. This is almost enough to build all kinds of single-stepping (“Step Into”, “Step Over”, “Step Out”) and breakpoints. In addition, you need to be able to retrieve the stack frames (to display a call stack) and to evaluate Python expressions (to display and manipulate variables in the current stack frame and for conditional breakpoints).

A follow-up article will explain this in more detail.
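Python code can use the same kind of hook through “sys.settrace()”, which makes it easy to prototype the stepping logic before wiring it up through “PyEval_SetTrace()”. Here is a minimal sketch under the assumption that a "breakpoint" is just a line number inside one traced function; the “LineRecorder” class, “sample()”, and “run_traced()” are hypothetical. Where a real debugger would suspend execution and show the call stack, this sketch merely records lines and counts breakpoint hits.

```python
import sys


class LineRecorder(object):
    """Record (function name, line number) for every executed line."""

    def __init__(self, breakpoint_line=None):
        self.lines = []
        self.breakpoint_line = breakpoint_line
        self.breakpoint_hits = 0

    def global_trace(self, frame, event, arg):
        # Invoked for "call" events; returning a function enables
        # line-by-line tracing of the newly entered frame.
        if event == "call":
            return self.local_trace
        return None

    def local_trace(self, frame, event, arg):
        if event == "line":
            self.lines.append((frame.f_code.co_name, frame.f_lineno))
            if frame.f_lineno == self.breakpoint_line:
                # A real debugger would pause here and show frame.f_locals.
                self.breakpoint_hits += 1
        return self.local_trace


def sample(n):
    total = 0
    for i in range(n):
        total += i
    return total


def run_traced(recorder, func, *args):
    """Run func(*args) with the recorder installed, then remove it."""
    sys.settrace(recorder.global_trace)
    try:
        return func(*args)
    finally:
        sys.settrace(None)


# Treat the "total += i" line (3 lines below "def sample") as a breakpoint.
recorder = LineRecorder(breakpoint_line=sample.__code__.co_firstlineno + 3)
print(run_traced(recorder, sample, 4))   # 6
print(recorder.breakpoint_hits)          # 4
```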

Closing Remarks

Even though Python is awesome, some problems are best solved in C++. But even these C++ programs can be supercharged by adding some Python back in. Hopefully this talk inspires you to build something awesome with an embedded Python interpreter. If you do, I’d love to hear about it.


This article as well as the slide sources in SVG format are Copyright 2012 Michael Fötsch, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. The slides in PDF format contain additional material that is Copyright 2012 SPIELO International, All Rights Reserved.

Python Training – Part 5

Thu, 07 Jun 2012

Part 1 | Part 2 | Part 3 | Part 4

This is part 5 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Module Basics

Python modules are a way to physically organize your source code. It would be impractical to have the entire source code in a single .py file. You can extract parts of your source code (functions, classes, constants, etc.) to their own .py files and use them in your main program.

Single program

program.py:

PI = 3.14

def PiMalDaumen(daumen):
    return daumen * PI

print PiMalDaumen(5) / PI

Program and module

utility.py:

PI = 3.14

def PiMalDaumen(daumen):
    return daumen * PI

program.py:

import utility
if __name__ == "__main__":
    print (utility.PiMalDaumen(5)
           / utility.PI)

Note: Once a .py file has been imported, a .pyc file is created so that the module can be loaded faster the next time it is needed. Modules can also be extensions written in other languages such as C++. These extension modules are DLL files with the filename extension .pyd (or .dll in older Python versions).

What’s the difference between a Python module and a Python program?

Both Python programs and Python modules are just .py files. The difference lies merely in how they are used; otherwise, the Python interpreter executes the code in programs and modules in exactly the same way.

  • A program is a .py (or .pyw [1]) file that you double-click or run with “python program.py”
  • A module is a .py file that you import in your main program or in other modules to reuse the objects (functions, classes, constants, etc.) that it contains

[1] The extension .pyw is used if you don’t want a console window to be opened when you double-click the file. Alternatively, you can use the extension .py and run the program with “pythonw program.py” instead of “python program.py”.

To be useful, a program contains some code that is executed right away when the .py file is double-clicked. A module, on the other hand, usually does not do anything at the moment it is imported. It only provides objects that can be used by the importing module later on.

Inside a .py file, you can find out whether you are the main program or a module:

def SomeFunc(a):
    print "To be used by others who import me or by myself."

if __name__ == "__main__":
    print "I was started as the main program."
    print "Do whatever I'm supposed to do."
else:
    print "I was imported as the module named", __name__

Note: You should always enclose the code that is meant to run only when the file is started as the main program in “if __name__ == '__main__':”. This way, the code is not executed if you ever decide to import your main program as a module into a different program.

Syntax of “import”

The “import” keyword can be used in different ways. The following examples will import the module “mymodule.py”, which is shown here:

PI = 3.14

def SomeFunc():
    return 5

class SomeClass:
    def SomeMethod(self):
        return 10

There are three forms of import:

  • import mymodule: This is the most basic form. The name “mymodule” now refers to a module object whose attributes are the objects defined at the top level of the imported module:
    import mymodule
    print mymodule.PI * mymodule.SomeFunc()
    obj = mymodule.SomeClass()
    print obj.SomeMethod()
  • import mymodule as m: This is similar to the first form, but it assigns a different name to the imported module within the importing module:
    import mymodule as m
    print m.PI * m.SomeFunc()
    obj = m.SomeClass()
    print obj.SomeMethod()

    Note: This is the same as this:

    import mymodule
    m = mymodule
    ...
  • from mymodule import SomeFunc: You can list the things you want to use from the imported module and use them without prefixing them with the module name:
    from mymodule import PI, SomeFunc
    print PI * SomeFunc()

    Note: This is the same as this:

    import mymodule
    PI = mymodule.PI
    SomeFunc = mymodule.SomeFunc
    ...

    Note: You could write “from mymodule import *” to import everything from the given module at once. However, I strongly discourage you from doing this, because it makes the code less readable, and there’s a danger of conflicts between objects with the same names coming from different modules.
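The "same as" equivalences above have a practical consequence: “from mymodule import PI” creates a new name that refers to the current object, so rebinding “mymodule.PI” later does not affect the imported “PI”. To stay self-contained, the sketch below builds a stand-in for “mymodule” in memory with “types.ModuleType” and registers it in “sys.modules” instead of reading mymodule.py from disk; that shortcut is purely for demonstration.

```python
import sys
import types

# In-memory stand-in for mymodule.py (demonstration shortcut).
mymodule = types.ModuleType("mymodule")
mymodule.PI = 3.14
sys.modules["mymodule"] = mymodule

from mymodule import PI   # like: PI = sys.modules["mymodule"].PI
import mymodule as m      # like: m = sys.modules["mymodule"]

m.PI = 3.14159            # rebind the attribute inside the module object

print(PI)     # 3.14 (still the object that was bound at import time)
print(m.PI)   # 3.14159
```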

Where does Python look for .py files to import?

When Python encounters code like “import mymodule”, it looks for the module “mymodule” in these places:

  1. In the directory that contains the importing module
  2. In all the directories that are contained in the list “sys.path” at the time the import is executed (execute “import sys; print sys.path” to see what it contains)

Note: Python does not look in the current working directory unless the path “.” is explicitly contained in “sys.path”.

The list “sys.path” is filled in this way:

  • From hard-coded paths in the Python interpreter, e.g., “C:\Python25\Lib”, “C:\Python25\Lib\site-packages”, etc.
  • From paths contained in the environment variable “PYTHONPATH” when Python was started
  • From paths listed in all files with the extension .pth found in “C:\Python25” when Python was started
  • Explicitly by your program by modifying “sys.path”, e.g.:
    import sys
    # Ensure that modules located in the current working
    # directory take precedence over all other directories.
    # Note: This refers to the cwd at the time the import will
    # be executed, not necessarily the cwd at this very moment.
    sys.path.insert(0, ".")
    
    # Add a sub-directory of the current working directory.
    # Use the absolute path via os.getcwd() so that it doesn't
    # change when we change the cwd later, e.g. via os.chdir().
    import os
    sys.path.append(os.path.join(os.getcwd(), "subdir"))

    Note: Fiddling with “sys.path” inside the program is sometimes necessary, but most often it is not the right way to do things. Maybe you should organize your modules in packages (see section “Packages”) or extend the search path with a .pth file?
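The search order can be observed directly. This sketch writes a hypothetical module “searchpath_demo.py” into a fresh temporary directory, shows that importing it fails while that directory is not on “sys.path”, and shows that it succeeds after the directory is inserted at the front of the list. The guarded call to “importlib.invalidate_caches()” is there because Python 3.3 and later cache directory contents during imports; the function does not exist in Python 2.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "searchpath_demo"   # hypothetical module name

# A fresh directory that is certainly not on sys.path yet.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, MODULE_NAME + ".py"), "w") as f:
    f.write("ANSWER = 42\n")


def try_import(name):
    """Return the imported module, or None if Python cannot find it."""
    if hasattr(importlib, "invalidate_caches"):   # Python 3.3+ only
        importlib.invalidate_caches()
    try:
        return __import__(name)
    except ImportError:
        return None


before = try_import(MODULE_NAME)   # None: module_dir is not searched
sys.path.insert(0, module_dir)     # search module_dir before everything else
after = try_import(MODULE_NAME)    # found now
print("%s -> %s" % (before, after.ANSWER))   # None -> 42
```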

Modules are loaded only once

Each module is loaded only once. If two modules in the same program contain a line “import mymodule”, the module “mymodule” is loaded when the first one of them is executed. The second one receives a reference to the module that’s already loaded.

mymodule.py:

print "mymodule"
def X():
    return "X"
program.py:

print "program"
import mymodule
import utility
print "program:", mymodule.X()
utility.py:

print "utility"
import mymodule
print "utility:", mymodule.X()

When you execute program.py, the output is:

program
mymodule
utility
utility: X
program: X

Note: You can use “reload()” to force a module to be re-executed. This might be useful when you expect your modules to be modified while the program is being executed (when writing a debugger, for example), but shouldn’t be necessary otherwise.
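The cache behind this behavior is the dictionary “sys.modules”. With a standard-library module you can check that every import hands back the same object, so a change made through one name is visible through all of them. The attribute “demo_marker” is an arbitrary name used only for this illustration.

```python
import sys
import json
import json as json_again

# Both names are bound to the single module object cached in sys.modules.
print(json is json_again is sys.modules["json"])   # True

# __import__() consults the same cache; json is not executed again.
json_third = __import__("json")
json.demo_marker = "set through the first name"
print(json_third.demo_marker)   # set through the first name
```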

Packages

Packages are a way to organize modules in directories.

Everything in a single directory

  main.py
  database_logic.py
  database_reports.py
  gui_window.py
  gui_button.py
  gui_backend_gtk.py
  gui_backend_win32.py
  gui_backend_osx.py
Grouped hierarchically

  main.py
  database
    logic.py
    reports.py
  gui
    window.py
    button.py 
    backends
      gtk.py
      win32.py
      osx.py

If there were no packages, you could be tempted to add all sub-directories to Python’s module search path:

Bad practice
import sys
sys.path += ["database", "gui", "gui\\backends"]
import logic
from window import Window
import gtk

Packages provide a much cleaner way:

# Import the module database\logic.py
import database.logic

# Import "Window" from the module gui\window.py
from gui.window import Window

# Import the module gui\backends\gtk.py and assign a shorter name
import gui.backends.gtk as backend

To make this work, all you need to do is place an empty file named “__init__.py” in each directory that should be treated as a package:

  main.py
  database
    __init__.py
    logic.py
    reports.py
  gui
    __init__.py
    window.py
    button.py
    backends
      __init__.py
      gtk.py
      win32.py
      osx.py
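Such a tree can be built and imported in a few lines. This sketch creates a hypothetical package “pkgdemo” with a sub-package “gui” in a temporary directory, puts only the directory that contains “pkgdemo” on “sys.path”, and imports a class via its dotted name. On Python 2, leaving out any of the __init__.py files makes the import fail; Python 3.3 and later also accept such directories as "namespace packages", but including the files keeps the code working on both.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

# Hypothetical layout (relative to root):
#   pkgdemo/__init__.py
#   pkgdemo/gui/__init__.py
#   pkgdemo/gui/window.py
files = {
    ("pkgdemo", "__init__.py"): "",
    ("pkgdemo", "gui", "__init__.py"): "",
    ("pkgdemo", "gui", "window.py"):
        "class Window(object):\n    title = 'main window'\n",
}
for parts, source in files.items():
    path = os.path.join(root, *parts)
    if not os.path.isdir(os.path.dirname(path)):
        os.makedirs(os.path.dirname(path))
    with open(path, "w") as f:
        f.write(source)

sys.path.insert(0, root)   # only the directory that *contains* pkgdemo
if hasattr(importlib, "invalidate_caches"):   # Python 3.3+ only
    importlib.invalidate_caches()

from pkgdemo.gui.window import Window
print(Window.title)   # main window
```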

Intra-Package Imports

Consider this package hierarchy:

myapp
  __init__.py
  database
    __init__.py
    logic.py
    reports.py
  gui
    __init__.py
    window.py
    button.py
    backends
      __init__.py
      gtk.py
      win32.py

A module inside a package can import modules inside a sub-package normally:

# From   myapp\gui\window.py
# Import myapp\gui\backends\gtk.py
import backends.gtk

A module inside a sub-package can also import modules from other parts of the package hierarchy using absolute imports:

# From   myapp\gui\backends\gtk.py
# Import myapp\database\reports.py
import myapp.database.reports

Note: For this to work, the top-level package “myapp” must be in the module search path (see “sys.path” in section “Module Basics”). If Python can’t find “myapp”, you will get the error “ImportError: No module named myapp”.

Use with care:

In Python 2.5 and later, you can also use relative imports (“.” refers to the current package, “..” to the parent package, “...” to the grandparent package, etc.):

# From   myapp\gui\backends\gtk.py
# Import myapp\gui\button.py
from .. import button

# Import myapp\database\reports.py
from ...database import reports

There are many subtleties involving relative imports. They are not just a straight translation from filesystem paths to import syntax. For example, this code only works when gtk.py was itself imported from somewhere outside the package using something like “import myapp.gui.backends.gtk”, not when you run “python gtk.py”.
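The following sketch makes the difference concrete. It builds a hypothetical package “relpkg” shaped like part of the “myapp” tree (“gui” containing “button.py” and a “backends” sub-package with “gtk.py”) in a temporary directory. Imported as “relpkg.gui.backends.gtk”, the module's “from .. import button” resolves to “relpkg.gui.button”; run directly as a script, the same file exits with an error because there is no parent package for “..” to refer to.

```python
import importlib
import os
import subprocess
import sys
import tempfile

root = tempfile.mkdtemp()
files = {
    ("relpkg", "__init__.py"): "",
    ("relpkg", "gui", "__init__.py"): "",
    ("relpkg", "gui", "button.py"): "LABEL = 'OK'\n",
    ("relpkg", "gui", "backends", "__init__.py"): "",
    # ".." means the parent of relpkg.gui.backends, i.e. relpkg.gui
    ("relpkg", "gui", "backends", "gtk.py"):
        "from .. import button\n"
        "def render():\n"
        "    return 'gtk:' + button.LABEL\n",
}
for parts, source in files.items():
    path = os.path.join(root, *parts)
    if not os.path.isdir(os.path.dirname(path)):
        os.makedirs(os.path.dirname(path))
    with open(path, "w") as f:
        f.write(source)

sys.path.insert(0, root)
if hasattr(importlib, "invalidate_caches"):   # Python 3.3+ only
    importlib.invalidate_caches()

# Works: gtk.py is imported as a module inside its package.
gtk = importlib.import_module("relpkg.gui.backends.gtk")
print(gtk.render())   # gtk:OK

# Fails: run as "python gtk.py", the file has no parent package.
script = os.path.join(root, "relpkg", "gui", "backends", "gtk.py")
with open(os.devnull, "w") as devnull:
    status = subprocess.call([sys.executable, script], stderr=devnull)
print(status != 0)   # True
```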

Modules and Reflection

Wikipedia says, “reflection is the process by which a computer program can observe and modify its own structure and behavior.” Reflection is a useful and powerful tool and can be used with modules just like with any other object.

Import Modules with Names Determined at Runtime

You can use the “__import__()” function to load a module whose name is only known at runtime. This does not work:

Does not work
# We basically want to
#    import mymodule
# but the name "mymodule" is stored in a string variable
module_name = "mymodule"
import module_name as m   # Nope. This actually looks for a file
                          # named "module_name.py".
m.FuncInsideModule()

This works:

module_name = "mymodule"
m = __import__(module_name)
m.FuncInsideModule()

The previous example isn’t particularly useful yet, so here’s a real-world example where “__import__()” can be put to good use.

Consider a program that the user can extend by providing plug-in modules. The user is expected to place the .py files inside the “plugins” package, and the program scans that directory at startup and loads the modules.

  program.py
  plugins
    __init__.py
    colorize.py
    sort.py
    filter.py

If we wanted to hard-code the plug-in names, we could write something like this:

Bad because everything’s hard-coded
# program.py
import plugins.colorize
import plugins.sort
import plugins.filter

plugin_modules = [plugins.colorize,
                  plugins.sort,
                  plugins.filter]
...
for m in plugin_modules:
    data = m.ApplyPlugin(data)

But having to add “import” statements to the program manually is tedious. We can do better by using “os.listdir()” to scan the directory and the “__import__()” function to import the plugins:

# program.py

import os

if __name__ == "__main__":
    plugin_files = os.listdir("plugins")
    plugin_modules = []
    for fn in plugin_files:
        if fn.endswith(".py") and fn != "__init__.py":
            module_name = os.path.splitext(fn)[0]
            import_name = "plugins.%s" % module_name
            plugin_modules.append(
                # basically do "from plugins import <module_name>"
                __import__(import_name, fromlist=[module_name]))
    ...
    for m in plugin_modules:
        data = m.ApplyPlugin(data)
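The discovery step can be split into two small functions so that it can be tried out without a real “plugins” directory. This is a Python 3 sketch that uses “importlib.import_module()” in place of “__import__()”; the function names are made up for illustration, and the standard “json” package stands in for “plugins” in the example calls:

```python
import importlib
import os

def plugin_module_names(filenames):
    # Keep only Python source files, skip the package's __init__.py,
    # and sort so that plug-ins are applied in a predictable order.
    return sorted(os.path.splitext(fn)[0] for fn in filenames
                  if fn.endswith(".py") and fn != "__init__.py")

def load_plugins(package_name, module_names):
    # Roughly "from <package_name> import <module_name>" for each name.
    return [importlib.import_module("%s.%s" % (package_name, name))
            for name in module_names]

names = plugin_module_names(["__init__.py", "sort.py", "colorize.py", "notes.txt"])
modules = load_plugins("json", ["decoder", "encoder"])
```

In the program itself, the call would be “load_plugins("plugins", plugin_module_names(os.listdir("plugins")))”. Keep in mind that the relative path “plugins” is resolved against the current working directory, not against the directory of “program.py”.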

Inspecting Module Contents

Module objects provide several built-in attributes:

  • __name__: The module name, as specified in the “import” statement
  • __file__: The path to the .py file¹ from which the module was loaded.
    • Be careful! This might be a path relative to the current working directory at the time when the import was executed. The current working directory might have changed since then.
  • __dict__: A dict containing all objects (functions, classes, variables, etc.) that the module contains
  • __doc__: The docstring of the module

¹ The file could also be a .pyc or .pyd file, or whatever else the module was loaded from. Built-in modules such as “sys” have no “__file__” attribute at all.

The following source code prints some information about the “os” module:

import os
print "Module os loaded from", os.__file__
print "Docstring:", os.__doc__
print "Contains the following objects:"
for name, obj in os.__dict__.iteritems():
    print name, ":", type(obj)

This will print something like this:

Module os loaded from c:\python25\lib\os.pyc
Docstring: OS routines for Mac, NT, or Posix depending on what system we're on.

This exports:
  - all functions from posix, nt, os2, mac, or ce, e.g. unlink, stat, etc.
...

Contains the following objects:
lseek : <type 'builtin_function_or_method'>
O_SEQUENTIAL : <type 'int'>
pathsep : <type 'str'>
execle : <type 'function'>
_Environ : <type 'classobj'>
urandom : <type 'builtin_function_or_method'>
execlp : <type 'function'>
...

You can also use dir(), getattr(), hasattr(), and setattr() to access the module’s contents just like with other objects.
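A short Python 3 sketch of these built-ins applied to modules. The standard “math” module and a throwaway module named “scratch”, created at runtime with “types.ModuleType”, serve as examples:

```python
import math
import types

# Look up a function by a name that is only known at runtime.
func_name = "sqrt"
if hasattr(math, func_name):
    result = getattr(math, func_name)(16.0)

# getattr() accepts a default that is returned if the attribute is missing.
missing = getattr(math, "no_such_function", None)

# setattr() works on module objects, too; here on a module created at runtime.
scratch = types.ModuleType("scratch")
setattr(scratch, "answer", 42)

# dir() lists the names defined in the module (sorted).
public_names = [n for n in dir(scratch) if not n.startswith("_")]
```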

As a real-world example, consider a program that runs test cases contained inside a user-provided Python module. A test case is any function whose name starts with “Test_”.

# test_runner.py
# Start with:
#    python test_runner.py <module_name>
import sys
if __name__ == "__main__":
    module_name = sys.argv[1]
    test_suite = __import__(module_name)
    print "Running", test_suite.__doc__
    if hasattr(test_suite, "InitTests"):
        init_func = getattr(test_suite, "InitTests")
        init_func()
    for obj_name in dir(test_suite):
        if obj_name.startswith("Test_"):
            test_func = getattr(test_suite, obj_name)
            print "Performing", test_func.__name__, "...",
            print test_func()

This is an example test suite:

# module_to_test.py

"My test suite"

def InitTests():
    print "Initializing..."

def Test_Case1():
    return 5 * 5 == 25

def Test_Case2():
    return -1 ** 0 == 0

This is the output of the program:

> python test_runner.py module_to_test
Running My test suite
Initializing...
Performing Test_Case1 ... True
Performing Test_Case2 ... False
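A variant of the runner could return the results instead of printing them, and treat a test that raises an exception as failed instead of aborting the whole run. This is a Python 3 sketch: the function name “run_tests” is hypothetical, and the test suite is built in memory with “types.ModuleType” so that no extra file is needed:

```python
import types

def run_tests(test_suite):
    # Call InitTests() if the suite defines it.
    init_func = getattr(test_suite, "InitTests", None)
    if init_func is not None:
        init_func()
    results = {}
    for obj_name in dir(test_suite):   # dir() returns the names sorted
        if obj_name.startswith("Test_"):
            test_func = getattr(test_suite, obj_name)
            try:
                results[obj_name] = bool(test_func())
            except Exception:
                # A test that raises counts as failed.
                results[obj_name] = False
    return results

# Build a test suite module in memory instead of importing a file.
suite = types.ModuleType("module_to_test", "My test suite")
suite.Test_Case1 = lambda: 5 * 5 == 25
suite.Test_Case2 = lambda: -1 ** 0 == 0    # -(1 ** 0) == -1, so False
suite.Test_Case3 = lambda: 1 / 0            # raises ZeroDivisionError
results = run_tests(suite)
```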

Memory Leaks

In C++, a memory leak typically occurs because your program forgets about a chunk of memory that it reserved. The memory is never freed although your program doesn’t make any use of it. If this happens too often, memory usage of the program might reach critical levels.

In Python, a memory leak occurs because your program keeps references to unneeded objects.

Python uses reference counting and a garbage collector to prevent most types of memory leaks:

x = [1, 2, 3]
d = {4: x}
y = (x, d)
# Here, the list [1, 2, 3] has ref count 3.
d.clear()
# Now it's down to 2.
x = 0
# Now the tuple y has the only reference left.
del y
# The list [1, 2, 3] has ref count 0. The garbage collector
# can free its associated memory at any time now.
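In CPython, an object is freed as soon as its reference count drops to zero. A “weakref” lets you observe this without keeping the object alive. A minimal Python 3 sketch of the sequence above; the class “Data” is made up because plain lists cannot be weakly referenced:

```python
import weakref

class Data(object):
    pass

x = Data()
d = {4: x}
y = (x, d)
watcher = weakref.ref(x)   # a weak reference does not increase the ref count

d.clear()
x = 0
alive_before_del = watcher() is not None   # the tuple y still holds a reference
del y
alive_after_del = watcher() is not None    # the last reference is gone
```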

In order to produce a memory leak in Python (or rather, to cause undesired memory consumption), you have to accumulate references to objects that you wouldn’t otherwise need. Here’s a hypothetical and somewhat trivial example:

class File:
    def __init__(self, filename):
        self.m_filename = filename
        self.m_file_contents = open(filename).read()

processed_files = []
for fn in filename_list:
    f = File(fn)
    DoSomeImportantProcessing(f)
    processed_files.append(f)

print "Processed files:", [f.m_filename for f in processed_files]

Here, the list “processed_files” keeps references to “File” objects, although only the filenames will be needed after the loop. But the “File” objects also hold references to the file contents, so the peak memory usage of the program is the total size of all processed files. It would be more efficient to store only the filenames in a list “processed_filenames”, like this:

processed_filenames = []
for fn in filename_list:
    f = File(fn)
    DoSomeImportantProcessing(f)
    processed_filenames.append(fn)

print "Processed files:", processed_filenames

The previous example isn’t a real memory leak. It’s just a case of undesired memory consumption, which might be much less obvious in a larger, more complex program.

Python can suffer from real memory leaks, though:

  • Memory leaks in C/C++ extension libraries (either caused by bugs in the libraries or by incorrect usage)
  • Reference cycles involving objects with overloaded “__del__()” methods

Here’s an example of a reference cycle:

Not a memory leak
class X():
    pass

x = X()
y = X()
z = X()
x.next = y
y.next = z
z.next = x
del x
del y
del z

In this case, although there’s a reference cycle, the Python garbage collector is able to break the cycle and all the memory is freed as expected.
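A weak reference makes this visible: after the three names are deleted, reference counting alone cannot free the objects, but an explicit run of the cyclic garbage collector does. A Python 3 sketch of the example above:

```python
import gc
import weakref

class X(object):
    pass

x = X()
y = X()
z = X()
x.next = y
y.next = z
z.next = x
watcher = weakref.ref(x)
del x, y, z

# Each object is still referenced by another member of the cycle, so
# reference counting alone does not free them.
gc.collect()   # run the cyclic garbage collector now
freed = watcher() is None
```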

Memory leak
class X():
    def __del__(self):
        pass

x = X()
y = X()
z = X()
x.next = y
y.next = z
z.next = x
del x
del y
del z
import gc
gc.collect() # let the garbage collector do its work right now
print gc.garbage
# Output:
# [<__main__.X instance>, <__main__.X instance>,
#  <__main__.X instance>]

The objects “x”, “y”, and “z” are not garbage-collected. Python cannot decide which object to delete first, because it doesn’t know whether our implementation of “__del__()” relies on a specific order. Therefore, the memory of the three “X” objects cannot be freed. You can break the cycle manually, as described in the Python docs.

This problem affects your code only if you have cyclic references among your objects and the involved classes implement “__del__()” methods.
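This behavior is specific to Python 2: since Python 3.4 (PEP 442), the garbage collector can also free cycles whose objects define “__del__()”, so the example no longer ends up in “gc.garbage”. Regardless of the version, you can often avoid the cycle altogether by making one of the references weak. A Python 3 sketch with a hypothetical “Node” class whose back-reference to its parent is a “weakref.ref”:

```python
import weakref

class Node(object):
    def __init__(self, parent=None):
        self.children = []
        # Store only a weak reference to the parent, so that parent and
        # child do not form a reference cycle.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # Returns None once the parent has been freed.
        return self._parent() if self._parent is not None else None

    def __del__(self):
        pass   # harmless here, because there is no cycle to break

root = Node()
child = Node(root)
parent_found = child.parent is root
root_watcher = weakref.ref(root)
del root                          # frees root immediately (CPython)
parent_after_del = child.parent
```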

Further information:

Python Training – Part 4 (Thu, 07 Jun 2012)

Part 1 | Part 2 | Part 3 || Part 5

This is part 4 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

More Common Scripting Tasks

Parsing Command-Line Arguments

The command-line arguments to the Python program can be found in the “sys.argv” variable.

When you start a Python program with this command line:

C:\temp> cmd_line.py -x -o "the output.txt" input.txt

The contents of “sys.argv” are as follows:

['C:\\temp\\cmd_line.py', '-x', '-o', 'the output.txt', 'input.txt']

The “optparse” module provides more convenient access to the command-line arguments:

  • You can specify a list of supported arguments along with their data types
  • You can specify default values for omitted arguments
  • The usage screen (“--help” or “-h”) is generated automatically
  • …and much more

As an example, let’s use “optparse” to interpret the arguments of this GNU program:

> head -h
Usage: head [OPTION]... [FILE]...
Print first 10 lines of each FILE to standard output.
With more than one FILE, precede each with a header giving the file name.
With no FILE, or when FILE is -, read standard input.

  -c, --bytes=SIZE         print first SIZE bytes
  -n, --lines=NUMBER       print first NUMBER lines instead of first 10
  -q, --quiet, --silent    never print headers giving file names
  -v, --verbose            always print headers giving file names
      --help               display this help and exit
      --version            output version information and exit

...

Report bugs to <bug-textutils@gnu.org>.

What we see:

  • The program accepts a list of options (using a “-” or “--” prefix) and arguments (the list of files)
  • There are options with and without parameters (“-n” takes the number of lines, while “-v” does not require a parameter)
  • Some options have default values.

Here’s a Python program with a command-line interface like that:

import optparse
import sys

def Main():
    parser = optparse.OptionParser(
        usage="Usage: %prog [OPTION]... [FILE]...",
        version="Version 1.0\nWritten by me.",
        description="Print first 10 lines of each FILE ...",
        epilog="... Report bugs to <bug-textutils@gnu.org>.")
    parser.add_option("-c", "--bytes", type="int",
                      metavar="SIZE",    # Here, the default
                                         # metavar would be "BYTES"
                      help="print first SIZE bytes")
    parser.add_option("-n", "--lines", type="int",
                      metavar="NUMBER", dest="num_lines",
                      help="print first NUMBER lines instead of first 10")
    parser.add_option("-q", "--quiet", "--silent",
                      action="store_true",
                      help="never print headers giving file names")
    parser.add_option("-v", "--verbose", action="store_true",
                      help="always print headers giving file names")

    parser.set_defaults(num_lines=10, quiet=False, verbose=False)
    options, args = parser.parse_args()

    print "-c =", options.bytes
    print "-n =", options.num_lines
    print "-q =", options.quiet
    print "-v =", options.verbose
    print "args =", args

if __name__ == "__main__":
    sys.exit(Main())

How it works:

  • To define the command-line interface, create an “optparse.OptionParser” instance and invoke its methods.
  • The “OptionParser” initializer takes a number of optional keyword arguments for things like version info and help text.
  • You add options using the “add_option” method:
    • The first arguments specify the short and long option strings.
    • Additional keyword arguments specify things like data type and help string.
  • Default values are best set using the “set_defaults” method.
  • The “parse_args” method returns an object that contains the option values as attributes and a list of positional arguments.

Try to invoke the program using the following command lines:

> cmd_line.py -h

> cmd_line.py somefile

> cmd_line.py -n 12 somefile

> cmd_line.py -n this_is_not_a_number

> cmd_line.py --unknown-option

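Since Python 2.7 and 3.2, the standard library also contains “argparse”, the successor to “optparse”, which handles positional arguments as well. The same interface as a Python 3 sketch; the function name “build_parser” is made up, and passing an explicit list to “parse_args()” (instead of letting it read “sys.argv”) makes the parser easy to try out:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Print first 10 lines of each FILE ...",
        epilog="... Report bugs to <bug-textutils@gnu.org>.")
    parser.add_argument("--version", action="version",
                        version="Version 1.0")
    parser.add_argument("-c", "--bytes", type=int, metavar="SIZE",
                        help="print first SIZE bytes")
    parser.add_argument("-n", "--lines", type=int, metavar="NUMBER",
                        dest="num_lines", default=10,
                        help="print first NUMBER lines instead of first 10")
    parser.add_argument("-q", "--quiet", "--silent", action="store_true",
                        help="never print headers giving file names")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="always print headers giving file names")
    # Positional arguments are declared explicitly; "*" means zero or more.
    parser.add_argument("files", nargs="*", metavar="FILE")
    return parser

options = build_parser().parse_args(["-n", "12", "--silent", "somefile"])
```

As with “optparse”, an invalid value such as “-n abc” makes the parser print an error message and exit.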
Reading from INI Files

The “ConfigParser” module can be used to read and write INI files.

Here’s an example INI file:

; Example INI file
[Basic]
quiet=True
lines=10
multiline=This is
 a multi-line value

[Files]
dir=c:\temp
# %(dir)s will be replaced with the value of dir
input=%(dir)s\input.txt

This INI file can be read as follows:

import ConfigParser

if __name__ == "__main__":
    p = ConfigParser.SafeConfigParser()
    p.read(["config.ini"])    # read() can load several INI files at once.
    print "multiline =", p.get("Basic", "multiline")
    print "quiet =", p.getboolean("Basic", "quiet")
    print "lines =", p.getint("Basic", "lines")
    print "dir =", p.get("Files", "dir")
    print "input =", p.get("Files", "input")
    try:
        p.get("Basic", "non-existent")
    except (ConfigParser.NoSectionError,
            ConfigParser.NoOptionError), e:
        print e
    print "Sections:", p.sections()
    print "Items in Basic:", p.items("Basic")
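In Python 3, the module is called “configparser”, and its “ConfigParser” class interpolates “%(name)s” references the way “SafeConfigParser” does in Python 2. A Python 3 sketch that parses the example file above from a string with “read_string()” (available since Python 3.2):

```python
import configparser

INI_TEXT = r"""
; Example INI file
[Basic]
quiet=True
lines=10
multiline=This is
 a multi-line value

[Files]
dir=c:\temp
# %(dir)s will be replaced with the value of dir
input=%(dir)s\input.txt
"""

p = configparser.ConfigParser()
p.read_string(INI_TEXT)

quiet = p.getboolean("Basic", "quiet")
lines = p.getint("Basic", "lines")
multiline = p.get("Basic", "multiline")
input_path = p.get("Files", "input")
# fallback= returns a default instead of raising NoOptionError
missing = p.get("Basic", "non-existent", fallback="default")
```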

Creating and Reading ZIP Files

Use the “zipfile” module to work with ZIP files.

Create a new archive:

The following code creates a new archive with two files:

  • One file is read from a file on disk and stored under a different name in the archive.
  • The other file is constructed directly from a Python string (which could also contain binary data).

import zipfile

z = zipfile.ZipFile("new_archive.zip", "w",
                    compression=zipfile.ZIP_DEFLATED)
z.write("file.txt",             # This file on disk...
        "subdir/t.txt")         # ...is added under this name.
z.writestr("text.txt",          # Name in the archive
           "Specify the contents of the file directly")
z.close()

Add files to an existing archive:

To add files to an existing archive, specify mode “a” when opening the file:

z = zipfile.ZipFile("existing_archive.zip", "a")
z.write(...)    # add files just like in "w" mode
z.close()

Read an existing archive:

Using the “infolist” method, you can retrieve a list of “ZipInfo” objects, one for each file in the archive. Using the “read” method, you can retrieve the contents of a file as a Python string:

z = zipfile.ZipFile("new_archive.zip", "r")
for i in z.infolist():
    print i.filename, i.compress_size, i.file_size, "etc."
    print "File contents:", z.read(i.filename)
z.close()
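In Python 2.7 and 3, “ZipFile” can also be used as a context manager, and it accepts a file-like object instead of a filename. A Python 3 sketch that builds an archive in memory and reads it back, with “io.BytesIO” standing in for a file on disk:

```python
import io
import zipfile

buffer = io.BytesIO()   # an in-memory "file"; a filename works the same way

# "with" closes the archive automatically, even if an exception occurs.
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as z:
    z.writestr("text.txt", "Specify the contents of the file directly")
    z.writestr("subdir/data.bin", b"\x00\x01\x02")

with zipfile.ZipFile(buffer, "r") as z:
    names = [info.filename for info in z.infolist()]
    text = z.read("text.txt").decode("ascii")
    data = z.read("subdir/data.bin")
```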

Note: See also the modules “tarfile”, “gzip”, “bz2”, and “zlib” for other ways of creating archives and compressing data.

Interfacing with C++

There are many levels on which you can use Python and C++ (or other programming languages) together:

  • Interpret binary data (potentially produced by C++)
  • Invoke functions in a C++ DLL from Python
  • Write Python extension modules in C++ (advanced)
  • Embed the Python interpreter in C++ to offer scripting facilities (advanced)

Working with Binary Data

Let’s assume you have a C++ program that writes the following struct to a binary file:

#include <cstdio>
#include <cstring>

struct TestStructure
{
    unsigned char ByteMember;
    signed short ShortMember;
    char StringBuffer[11];
    unsigned long LongMember;
};

void PrintTestStructToFile(const char* filename)
{
    TestStructure t;
    t.ByteMember = 1;
    t.ShortMember = -1;
    strcpy(t.StringBuffer, "abcdefg");
    t.LongMember = 0xcafebabe;

    FILE* f = fopen(filename, "wb");    // "b": binary mode, no newline translation
    fwrite(&t, sizeof(t), 1, f);
    fclose(f);
}

When you open the file in binary mode, you might get a string like this:

'\x01\xcc\xff\xffabcdefg\x00\xcc\xcc\xcc\xcc\xbe\xba\xfe\xca'

That is hard to interpret. We would rather access the individual members of the struct by name.

First, use the “ctypes” module and re-define the struct in Python:

import ctypes
class TestStructure(ctypes.Structure):
    _fields_ = [("ByteMember", ctypes.c_ubyte),
                ("ShortMember", ctypes.c_short),
                ("StringBuffer", ctypes.c_char * 11),
                ("LongMember", ctypes.c_ulong)]

Next, we can read the file into a “ctypes” byte buffer and “cast” it to the struct type:

data = ctypes.create_string_buffer(
    open(filename, "rb").read())

struct = TestStructure.from_address(ctypes.addressof(data))
    # Of course, we can also create an empty (zero-initialized)
    # instance by writing "struct = TestStructure()".
print "ByteMember", struct.ByteMember
print "ShortMember", struct.ShortMember
print "StringBuffer", struct.StringBuffer
print "LongMember", hex(struct.LongMember)

# When initializing the struct from a pointer to a data buffer,
# the buffer must live at least as long as we use the struct.
# Therefore, store a reference to "data" right in "struct".
struct.data = data

We can also save the data back to a binary file from Python:

open(filename, "wb").write(
    ctypes.string_at(ctypes.addressof(struct),
                     ctypes.sizeof(struct)))
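Instead of “from_address()”, which requires keeping the buffer alive, you can use “from_buffer_copy()”, which copies the bytes into a new struct instance; “bytes()” converts a struct back into raw bytes. A round-trip sketch for Python 3 (where the string member takes “bytes”) that needs no file:

```python
import ctypes

class TestStructure(ctypes.Structure):
    _fields_ = [("ByteMember", ctypes.c_ubyte),
                ("ShortMember", ctypes.c_short),
                ("StringBuffer", ctypes.c_char * 11),
                ("LongMember", ctypes.c_ulong)]

original = TestStructure(1, -1, b"abcdefg", 0xcafebabe)
raw = bytes(original)            # what would be written to the file

# from_buffer_copy() copies the bytes, so "raw" may be discarded afterwards.
restored = TestStructure.from_buffer_copy(raw)
```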

See the help for the “ctypes” module for more information.

Note: You can also use the “struct” module for working with binary data.
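With “struct”, you describe the layout with a format string instead. The Python 3 sketch below assumes the layout of the byte dump shown above: little-endian, a 4-byte “unsigned long”, and one padding byte after each of “ByteMember” and “StringBuffer” (written as “x” in the format). The format string has to be adapted if the C++ compiler lays out the struct differently:

```python
import struct

# <   little-endian, standard sizes, no automatic alignment
# B   unsigned char        x    1 padding byte
# h   signed short         11s  char[11]
# x   1 padding byte       I    4-byte unsigned integer
LAYOUT = "<Bxh11sxI"

raw = b"\x01\xcc\xff\xffabcdefg\x00\xcc\xcc\xcc\xcc\xbe\xba\xfe\xca"
byte_member, short_member, string_buffer, long_member = struct.unpack(LAYOUT, raw)
# The C string ends at the first NUL byte.
text = string_buffer.split(b"\x00", 1)[0]
```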

Invoking Functions in a C++ DLL

Let’s assume we have a DLL that exports the following function:

#include <cstdio>

extern "C" __declspec(dllexport)
const char* __stdcall WorkWithFile(
    const char* filename)
{
    printf("Doing something with %s\n", filename);
    return "It worked!";
}

We can invoke it from Python like this:

import ctypes

dll = ctypes.WinDLL("cpp_code.dll")
dll.WorkWithFile.restype = ctypes.c_char_p
print dll.WorkWithFile("some_file.txt")

Things to note:

  • We’re using “ctypes.WinDLL”, because we want to call a “__stdcall” function. (We’d use “ctypes.CDLL” for “__cdecl” functions.)
  • By default, “ctypes” assumes that the return type of the function is an integer. By setting the “restype” attribute of the function wrapper, we can specify the real return type.
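The same mechanism works for “__cdecl” functions in the C runtime library. A Python 3 sketch assuming a POSIX system, where the C library's “strlen()” is loaded with “ctypes.CDLL”; besides “restype”, it sets “argtypes”, so that “ctypes” checks and converts the argument:

```python
import ctypes
import ctypes.util

# find_library() may return None; CDLL(None) then gives access to the
# symbols already linked into the Python process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]   # check and convert the argument
strlen.restype = ctypes.c_size_t      # the default would be a C int

length = strlen(b"some_file.txt")
```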

What if you want to use C++ classes exported from a DLL?

You should compile the C++ code as a Python extension module. See the next section.

(Exporting classes directly from a DLL is generally not a good idea. This approach is not portable across different compilers, or even different versions of the same compiler. Everything you export should be declared as “extern "C"” to avoid problems with name mangling.)

Extending and Embedding Python

Many of the modules in the standard Python library are actually extension modules written in C.

Wrapping up some C++ code as an extension module is a task that can be largely automated using the SWIG program. See my article “Python Extensions in C++ Using SWIG”.

It is also possible to embed the Python interpreter in a C++ program. This is especially useful if you want to provide a simple way for users to write plug-ins for your program, or to provide a built-in scripting language (like VBA for Microsoft Office).

In the mathematics department, we’ve been using the PyCXX C++ library successfully for this task. See http://cxx.sourceforge.net/.

Introduction to GUI Programming

wxPython Demo

There are many GUI toolkits available for Python. In the mathematics department, we’re using wxPython exclusively (http://www.wxpython.org/), which is a binding for the cross-platform wxWidgets library (http://www.wxwidgets.org/). wxPython allows us to create complex, state-of-the-art GUIs relatively easily (HOMER being the most recent example).

Once you have installed wxPython to your local Python installation, you should take a look at the wxPython Demo (usually in Start → Programs → wxPython2.8 Docs Demos and Tools → Run the wxPython DEMO).

In the next few sections, we’ll use wxPython to build some very simple GUIs to make your scripts easier to use for people who don’t know what the command line is. 😉

Dialog Boxes in Command-Line Programs

Sometimes you have a command-line program and just want to ask the user for a filename, pick an item from a list, enter a string, or whatever. This can be done easily.

Add this code to your program:

import wx
import wx.lib.dialogs as dlg
g_app = wx.PySimpleApp()    # This object must be alive as long as
    # you want to open dialogs. Without an app object, the program
    # will crash.

Usability note: Consider adding a command-line or INI file-based interface to your program in addition to the GUI. This way, the program can be run unattended from a script, without having to make the same choices manually each time the program is run.

Displaying Messages

To display a simple message box with an OK button, use this code:

dlg.messageDialog(message="Message", title="Title", aStyle=wx.OK)

You can also display other buttons or add an icon:

result = dlg.messageDialog(
    message="Are you sure?", title="The Tool",
    aStyle=wx.YES | wx.NO | wx.ICON_WARNING)
if result.returned == wx.ID_YES:
    print "As you wish, Master!"

To display a longer text and/or to allow the user to copy the text to the Clipboard, use this:

dlg.scrolledMessageDialog(
    message="This is some long text\n" * 100, title="The Tool")

Asking for Some Text

Example usage:

result = dlg.textEntryDialog(title="Enter something",
                             message="Right here",
                             defaultText="default")
if result.accepted:
    print "You entered", result.text
else:
    print "Cancelled"

Asking for Files and Directories

Asking for files to open:

result = dlg.openFileDialog(title='Open',
    directory=r'c:\temp', filename='x.txt',   # r'': "\t" would be a tab
    wildcard='Text Files (*.txt)|*.txt',
    style=wx.OPEN | wx.MULTIPLE)
if result.accepted:
    print "You selected", result.paths

Asking for a file to save:

result = dlg.saveFileDialog(title='Save',
    directory=r'c:\temp', filename='x.txt',
    wildcard='Text Files (*.txt)|*.txt',
    style=wx.SAVE | wx.OVERWRITE_PROMPT)
if result.accepted:
    print "You selected", result.paths

Asking for a directory:

result = dlg.dirDialog(message='Choose a directory',
                       path=r'c:\temp')
if result.accepted:
    print "You selected", result.path

Offering Multiple Choices

To allow the user to pick a single item:

result = dlg.singleChoiceDialog(message='Choose wisely',
                                title='The Tool',
                                lst=['Blue pill', 'Red pill'])
if result.accepted:
    print "Your choice:", result.selection

To allow the user to pick multiple items:

result = dlg.multipleChoiceDialog(message='Choose',
    title='The Tool', lst=['Cheese', 'Ham', 'Mushrooms'])
if result.accepted:
    print "Your choice:", result.selections

A Minimal GUI Application

Here’s the code to open a main window:

import wx

class MainFrame(wx.Frame):
    pass

class App(wx.App):
    def OnInit(self):
        frame = MainFrame(parent=None,
                          title="The GUI")
        frame.Show()
        return True

if __name__ == "__main__":
    app = App(redirect=False)
    app.MainLoop()

What we see:

  • To create a GUI application, derive a class from “wx.App”, instantiate it, and call its “MainLoop” method.
  • In the “OnInit” method, the App object creates the main frame.
  • We create the App object with “redirect=False”. This means that all messages (including the output of “print” statements) will be printed to the console. This is useful during debugging when you start the program from the console. When you set “redirect=True”, wxPython creates a separate log window to display messages.

Adding Widget Inspector and an Interactive Shell

During development, you’d often wish to peek into the program while it’s running, just like you could in an interactive Python shell. This can be done easily with the “InspectionTool” that wxPython provides.

Let’s add the code to display a button and to open the “InspectionTool” when you click the button:

from wx.lib.inspection import InspectionTool

class MainFrame(wx.Frame):
    def __init__(self, parent, title):
        wx.Frame.__init__(self, parent=parent, title=title)

        inspector_btn = wx.Button(self, -1, "Widget Inspector")
        self.Bind(wx.EVT_BUTTON, self.OnOpenWidgetInspector,
                  inspector_btn)

        self.m_hello = "World"

    def OnOpenWidgetInspector(self, evt):
        if not InspectionTool().initialized:
            InspectionTool().Init()
        InspectionTool().Show(self, True)

wxPython Widget Inspector

What we see:

  • To create a button, create a “wx.Button” object. The first parameter to the initializer is the parent window. The second one is the widget ID (used to distinguish widgets in event handlers). We don’t need a specific ID, so we pass -1 (“wx.ID_ANY”), which lets wxPython assign one. The third parameter is the button label. For the other optional parameters, please see the wxPython Docs.
  • To register an event handler that should be called when the button is pressed, use the “Bind” method with the event type (“wx.EVT_BUTTON”), the method to invoke, and the button that is the source of the event.
  • The event handler method takes a “wx.Event” object as its only parameter (besides “self”). We don’t need the event object in our case.

When you run the application and click the button, the Widget Inspector opens. You can browse the GUI widgets that you created and use the interactive shell to work with the objects.

Working with Sizers

Example GUI

Sizers are used in wxPython to calculate the layout of widgets. Sizers perform the following tasks automatically:

  • Arrange widgets horizontally, vertically, in a grid, etc.
  • Resize widgets when the parent window is resized
  • Adjust the size of the parent to the space requirements of the children

As an example for working with sizers, let’s build a GUI with two buttons and a text box. The buttons should be right-aligned and have a nice border around them. The text box should consume the remaining free space. When the frame is resized, the layout should adapt.

The screenshot to the right shows the desired result.

For this layout, we use two sizers, a vertical one with two compartments and a horizontal one with three:

GUI Sizers

First, we create the widgets normally:

class MainFrame(wx.Frame):
    def __init__(self, parent, title):
        wx.Frame.__init__(self, parent=parent, title=title)
        self.SetBackgroundColour(
            wx.SystemSettings_GetColour(wx.SYS_COLOUR_BTNFACE))

        inspector_btn = wx.Button(self, -1, "Widget Inspector")
        self.Bind(wx.EVT_BUTTON, self.OnOpenWidgetInspector,
                  inspector_btn)

        quit_btn = wx.Button(self, -1, "Quit")

        text_box = wx.TextCtrl(self, -1, style=wx.TE_MULTILINE,
                               size=(500, 300))

Next, we create a horizontal sizer for the buttons:

        horz_sizer = wx.BoxSizer(wx.HORIZONTAL)
        horz_sizer.AddStretchSpacer(prop=1)
        horz_sizer.Add(inspector_btn, proportion=0,
                       flag=wx.RIGHT, border=4)
        horz_sizer.Add(quit_btn, proportion=0)

What we see:

  • To right-align the buttons, we add a “stretch spacer” first. The argument “prop=1” defines the proportion. This will be explained shortly.
  • Next, we add the “Widget Inspector” button and add 4 pixels of free space to its right. The argument “proportion=0” will be explained shortly.
  • Finally, we add the “Quit” button.

What’s the thing about “proportion”?

When you add several widgets to a sizer, the proportions determine what share of the available space each widget takes up. For example, when you add three widgets with proportions 5, 4, and 7, this is the space they’ll take up:

Sizer Proportions
horz_sizer = wx.BoxSizer(wx.HORIZONTAL)
horz_sizer.Add(btn_1,
               proportion=5)
horz_sizer.Add(btn_2,
               proportion=4)
horz_sizer.Add(btn_3,
               proportion=7)

Proportion 0 means “use the minimum required space for the widget.”
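To make the arithmetic concrete: the proportions 5, 4, and 7 add up to 16, so the three widgets get 5/16, 4/16, and 7/16 of the available width. The function below is a simplified model of that distribution in plain Python, not wxPython’s actual algorithm (which, for example, also respects the minimum sizes of stretchable widgets); the name “distribute” is made up:

```python
def distribute(total, proportions, min_sizes=None):
    """Split `total` pixels among widgets by proportion (simplified model).

    Widgets with proportion 0 get their minimum size; the space that is
    left over is shared among the other widgets in the ratio of their
    proportions. Rounding leftovers go to the last stretchable widget.
    """
    if min_sizes is None:
        min_sizes = [0] * len(proportions)
    fixed = sum(m for m, p in zip(min_sizes, proportions) if p == 0)
    free = max(total - fixed, 0)
    prop_sum = sum(proportions)
    sizes = [m if p == 0 else free * p // prop_sum
             for m, p in zip(min_sizes, proportions)]
    if prop_sum:
        last = max(i for i, p in enumerate(proportions) if p > 0)
        sizes[last] += free - sum(s for s, p in zip(sizes, proportions) if p > 0)
    return sizes
```

For example, “distribute(480, [5, 4, 7])” yields widths of 150, 120, and 210 pixels, and a fixed 80-pixel widget next to a stretchable one in a 500-pixel sizer leaves 420 pixels for the latter.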

Next, we create a vertical sizer, add the horizontal sizer with the buttons and the text box, and set the sizer to the frame:

        vert_sizer = wx.BoxSizer(wx.VERTICAL)
        vert_sizer.Add(horz_sizer, proportion=0,
                       flag=wx.EXPAND | wx.ALL, border=4)
        vert_sizer.Add(text_box, proportion=1, flag=wx.EXPAND)

        self.SetSizer(vert_sizer)
        self.Fit()

What we see:

  • We specify the “wx.EXPAND” flag. This will be explained shortly.
  • The “wx.ALL” flag specifies that the border should apply to all sides (it’s a shortcut for “wx.LEFT | wx.TOP | wx.RIGHT | wx.BOTTOM”).

What does “wx.EXPAND” do?

Sizer EXPAND

Without “wx.EXPAND”, a horizontal sizer aligns the widgets in columns, but it does not touch their heights. Similarly, a vertical sizer aligns the widgets in rows, but it does not touch their widths.

When “wx.EXPAND” is specified when adding a widget to a horizontal sizer, the sizer will adjust the height of the widget to the height of the sizer. (The height of the sizer is the maximum height of its children.) Similarly, “wx.EXPAND” tells a vertical sizer to adjust the width of the widget.

Tip: When the sizers do not work as expected, the Widget Inspector might help you find the problem. Select a widget and click the “Highlight” button to check whether it takes up the space that you expected.

Getting Rid of the Console

When you double-click a .py file, a console window opens. This is annoying and useless for GUI applications.

This can be solved in two ways:

  • Rename the .py file to .pyw
  • Run the .py file with “pythonw.exe” instead of “python.exe”

If you still want the output of “print” statements to be visible, pass “redirect=True” to the initializer of the “wx.App” object. A separate window will be opened when a “print” occurs.

Advanced: You can also write your own file-like object (like “StringIO”) that you assign to “sys.stdout” and “sys.stderr” and that appends all texts to a “Log Messages” window in the GUI.
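A minimal Python 3 sketch of such a file-like object. Instead of a real “Log Messages” window, it passes each piece of text to a callback; in a wxPython program, that callback could, for instance, append the text to a “wx.TextCtrl”. The class name “CallbackWriter” is made up:

```python
import sys

class CallbackWriter(object):
    """File-like object that forwards everything written to it to a callback."""

    def __init__(self, callback):
        self.callback = callback

    def write(self, text):
        self.callback(text)

    def flush(self):
        pass   # nothing is buffered

messages = []
old_stdout = sys.stdout
sys.stdout = CallbackWriter(messages.append)
try:
    print("Hello, log window!")
finally:
    sys.stdout = old_stdout   # always restore the real stdout

captured = "".join(messages)
```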

GUI Tools for Creating GUIs

There are tools that let you lay out frames and dialog boxes graphically. Personally, I prefer creating widgets programmatically, because the graphical editors that I tried all have shortcomings. If you want to check for yourself, see XRCed, which comes with the wxPython Demos and Tools.

Homework

Run this command in the Python shell:

>>> import this

Python Training – Part 3 (Thu, 07 Jun 2012)

Part 1 | Part 2 || Part 4 | Part 5

This is part 3 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Exceptions

In Python, exceptions are the primary error handling mechanism. Whether you access an invalid list index, or you open a file that doesn’t exist, or you divide by zero—an exception is raised in all of these cases. If your program doesn’t handle the exception explicitly, a traceback is printed and the program terminates:

Traceback (most recent call last):
  File "C:\temp\x.py", line 9, in <module>
    Main()
  File "C:\temp\x.py", line 7, in Main
    print Divide(5, 0)
  File "C:\temp\x.py", line 3, in Divide
    x = a / b
ZeroDivisionError: integer division or modulo by zero

Basic Usage

To handle exceptions, enclose code that might raise an exception in a “try” / “except” block. To raise an exception when something goes wrong, use the “raise” keyword.

If you are familiar with exception handling in C++, here’s how it maps to Python:

C++:
// Define your own exception class.
class MyException
{
public:
    MyException(const string& msg)
    :   m_msg(msg)
    { }

    string m_msg;
};

// Raise an exception.
void FunctionWithError()
{
    throw MyException("Oops.");
}

// Handle an exception.
void HandleException()
{
    try
    {
        FunctionWithError();
    }
    catch (MyException& e)
    {
        cerr << e.m_msg;
    }
    catch (YourException)
    {
        cerr << "Your exception.";
    }
    catch (...)
    {
        cerr << "Unknown error.";
    }
}

Python:

# Define your own exception class.
class MyException:
    def __init__(self, msg):
        self.m_msg = msg

# Raise an exception.
def FunctionWithError():
    raise MyException("Oops.")

# Handle an exception.
def HandleException():
    try:
        FunctionWithError()
    except MyException, e:
        print e.m_msg
    except YourException:
        print "Your exception."
    except:
        print "Unknown error."

“Exception” Base Class

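Almost all of the exceptions that the Python interpreter or the standard library functions raise are derived from “Exception”. The exceptions to this rule are “SystemExit”, “KeyboardInterrupt”, and “GeneratorExit”, which derive directly from “BaseException” so that a general “except Exception” doesn’t catch them by accident. When you define your own exception classes, you should derive from “Exception” as well.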

For exceptions that don’t need anything besides a message:

class MyException(Exception):
    pass

try:
    raise MyException("Oh no!")
except Exception, e:
        # Catches all exceptions that are derived from Exception.
    print e

For exceptions that need more:

class MyExtendedException(Exception):
    def __init__(self, info):
        # Initialize the base class with our own message.
        Exception.__init__(self, str(info) + "it happens")
        self.m_info = info

try:
    raise MyExtendedException("Sh")
except MyExtendedException, e:
    print e, e.m_info
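The following Python 3 sketch shows that “except Exception” does not catch “SystemExit” (raised by “sys.exit()”), because “SystemExit” derives from “BaseException” rather than from “Exception”. The helper name “classify” is made up:

```python
import sys

class MyException(Exception):
    pass

def classify(func):
    # Report which handler catches the exception raised by func().
    try:
        try:
            func()
        except Exception:
            return "caught by except Exception"
    except BaseException:
        return "only caught by except BaseException"
    return "no exception"

def raise_mine():
    raise MyException("Oh no!")

def call_exit():
    sys.exit(1)   # raises SystemExit

results = [classify(raise_mine), classify(call_exit), classify(lambda: None)]
```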

Catching Multiple Types of Exceptions

The “except” keyword accepts a tuple with any number of exception classes:

try:
    if x:
        raise FirstException()
    else:
        raise SecondException()
except (FirstException, SecondException), e:
    print e

The previous code is equivalent to:

try:
    if x:
        raise FirstException()
    else:
        raise SecondException()
except FirstException, e:
    print e
except SecondException, e:
    print e
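Because the tuple is an ordinary value, you can also build it once and reuse it in several handlers. A small sketch follows; the names “CONVERSION_ERRORS” and “ToInt” are invented for this example:

```python
# int() raises ValueError for malformed strings and TypeError for
# objects it cannot convert at all, such as None.
CONVERSION_ERRORS = (ValueError, TypeError)

def ToInt(value, default=None):
    """Convert value to an int, or return default if that fails."""
    try:
        return int(value)
    except CONVERSION_ERRORS:
        return default

print(ToInt("42"))        # 42
print(ToInt("abc", -1))   # -1
print(ToInt(None, -1))    # -1
```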

“try” / “except” / “else”

To run a code block only if no exception was raised, add an “else” clause to the “try” block:

try:
    print "Don't raise."
except:
    print "We never get here."
else:
    print "This is only run when no exception occurred."
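The point of “else” is to keep the “try” block small. Code in the “else” clause runs only on success, and any exception it raises is not caught by the “except” clauses of the same statement. The following sketch records which clause ran; “ParseAll” is a hypothetical name:

```python
def ParseAll(texts):
    """Return a log entry per input showing which clause handled it."""
    log = []
    for text in texts:
        try:
            value = int(text)          # only the risky call is inside "try"
        except ValueError:
            log.append("except: %r" % (text,))
        else:
            # Runs only if int() succeeded.
            log.append("else: %d" % value)
    return log

print(ParseAll(["1", "x"]))   # ['else: 1', "except: 'x'"]
```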

“try” / “except” / “finally”

To run a code block regardless of whether an exception occurred or not, use “finally”:

def IntermediateFunction(fail):
    try:
        FunctionThatFailsSometimes(fail)
    finally:
        print "We always get here."
        # Hidden homework: See what happens when you
        # add "return" here. (Hint: Does the exception still
        # get through?)

def FunctionThatFailsSometimes(fail):
    if fail:
        print "Raise."
        raise Exception()
    else:
        print "Don't raise."

try:
    print "---Fail"
    IntermediateFunction(True)
except:
    print "The exception still gets through."
finally:
    print '"except" and "finally" can be used together.'

try:
    print "---Success"
    IntermediateFunction(False)
except:
    print "We never get here."
finally:
    print '"except" and "finally" can be used together.'

Output:

---Fail
Raise.
We always get here.
The exception still gets through.
"except" and "finally" can be used together.
---Success
Don't raise.
We always get here.
"except" and "finally" can be used together.
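As for the hidden homework in “IntermediateFunction”: a “return” inside “finally” discards an exception that is on its way out, so the caller never sees it. That is why such a “return” is best avoided. A minimal sketch, with made-up function names:

```python
def SwallowingFunction():
    try:
        raise ValueError("this exception is lost")
    finally:
        # Returning here replaces the pending exception with a return value.
        return "finally won"

def Caller():
    try:
        return SwallowingFunction()
    except ValueError:
        return "the exception got through"

print(Caller())    # finally won
```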

Printing a Traceback

Sometimes you want to handle an exception and still print the same kind of traceback that you would get from the interpreter if you didn’t have a “try” / “except”. Use the “traceback” module for this:

import traceback
try:
    raise Exception()
except:
    traceback.print_exc()    # print to stderr
    f = open("exception.txt", "w")
    traceback.print_exc(file=f)
    f.close()
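Sometimes you need the traceback as a string instead, for example to put it into a log message. “traceback.format_exc()” returns the same text that “print_exc()” would print. A sketch, where “DescribeFailure” and “Fails” are hypothetical helpers:

```python
import traceback

def DescribeFailure(func):
    """Call func(); return the formatted traceback if it raised, else None."""
    try:
        func()
    except Exception:
        return traceback.format_exc()
    return None

def Fails():
    raise ValueError("bad value")

report = DescribeFailure(Fails)
print(report.splitlines()[-1])    # ValueError: bad value
```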

Operator Overloading and Other Magic

A class can define a number of special methods to do things for which you would use operator overloading in C++.

Note: Some of these methods make your class behave more like one of the built-in types, like “list”, “dict”, or “str”. If your class is merely an extension of one of these types, consider deriving from the built-in class, but mind the Liskov Substitution Principle.

Destructor

To perform clean-up when the object is deleted, add a “__del__” method to your class:

class MyClass(object):
    def __del__(self):
        print "Object is deleted."

a = MyClass()
b = {1: a}
print "Removing first reference to object"
del a
print "Removing last reference to object"
b.clear()

# Output:
#    Removing first reference to object
#    Removing last reference to object
#    Object is deleted.

Support “str()” and “repr()”

When you call “str()” and “repr()” on an instance of a user-defined class, you get something like this:

<__main__.MyClass object at 0x12345678>

To end up with something nicer, add “__str__” and “__repr__” methods to your class:

class MyClass(object):
    def __str__(self):
        return "I am a MyClass."

    def __repr__(self):
        return "MyClass()"

x = MyClass()
print str(x)     # prints "I am a MyClass."
print repr(x)    # prints "MyClass()"

Note: If you don’t have a “__str__” method, the “__repr__” method is used for “str()” as well.
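A useful convention is to make “__repr__” return text that looks like the expression that creates the object, and to make “__str__” something friendlier. Containers such as lists use “repr()” for their elements, even when you call “str()” on the container. A sketch with a made-up “Point” class:

```python
class Point(object):
    def __init__(self, x, y):
        self.m_x = x
        self.m_y = y

    def __repr__(self):
        # Looks like the constructor call that would recreate the object.
        return "Point(%r, %r)" % (self.m_x, self.m_y)

    def __str__(self):
        return "(%s, %s)" % (self.m_x, self.m_y)

p = Point(1, 2)
print(str(p))      # (1, 2)
print(repr(p))     # Point(1, 2)
print(str([p]))    # [Point(1, 2)]
```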

If the string representation should be a Unicode string, add a “__unicode__” method:

class MyClass(object):
    def __unicode__(self):
        return u"\xe4\xf6\xfc"

    def __str__(self):
        return "aou"

x = MyClass()
print str(x)     # prints "aou"
print "%s" % x   # prints "aou"
print unicode(x) # prints "äöü"
print u"%s" % x  # prints "äöü"

Support “==”, “!=”, “<”, “<=”, etc.

To support comparison operations, add these methods to your class:

Operator    Method
==          __eq__
!=          __ne__
<           __lt__
<=          __le__
>           __gt__
>=          __ge__

Each of these methods receives the other object as its only parameter and should return True or False:

class MyClass(object):
    def __init__(self, a, b):
        self.m_a = a
        self.m_b = b
    def __eq__(self, other):
        return (self.m_a, self.m_b) == (other.m_a, other.m_b)
    def __ne__(self, other):
        return not self.__eq__(other)
    def __lt__(self, other):
        return (self.m_a, self.m_b) < (other.m_a, other.m_b)
    def __le__(self, other):
        return self.__lt__(other) or self.__eq__(other)
    def __gt__(self, other):
        return (self.m_a, self.m_b) > (other.m_a, other.m_b)
    def __ge__(self, other):
        return self.__gt__(other) or self.__eq__(other)
    def __repr__(self):
        return repr((self.m_a, self.m_b))

a = MyClass(3, 7)
b = MyClass(3, 7)
print a == b   # True
print a != b   # False
print a < b    # False
print a > b    # False
print a >= b   # True

# Comparison is also required for sorting:
ls = [MyClass(3, 7), MyClass(5, 2), MyClass(3, 5)]
ls.sort()
print ls    # prints [(3, 5), (3, 7), (5, 2)]
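Writing all six methods is tedious. Since Python 2.7, the “functools.total_ordering” class decorator fills in “__le__”, “__gt__”, and “__ge__” from “__eq__” and “__lt__”. Python 2 does not derive “!=” from “==”, so “__ne__” is still written out here. A sketch with a hypothetical “Version” class:

```python
import functools

@functools.total_ordering
class Version(object):
    def __init__(self, major, minor):
        self.m_key = (major, minor)

    def __eq__(self, other):
        return self.m_key == other.m_key

    def __ne__(self, other):
        return not self.__eq__(other)

    def __lt__(self, other):
        return self.m_key < other.m_key

    def __repr__(self):
        return "Version(%d, %d)" % self.m_key

versions = [Version(2, 0), Version(1, 10), Version(1, 2)]
versions.sort()
print(versions)    # [Version(1, 2), Version(1, 10), Version(2, 0)]
print(Version(1, 2) >= Version(1, 2))   # True
```

If you only need sorting, you can avoid defining comparisons altogether by passing a function to the “key” parameter of “sort()”.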

Support “[ ]”

To support the subscript operator “[ ]”, add “__getitem__” and “__setitem__” methods. For list-like classes, these methods should receive an integer index and raise an “IndexError” if the index is invalid. For dict-like classes, these methods should receive a key of any type and raise a “KeyError” if the key is unknown.

class Alphabet(object):
    def __getitem__(self, idx):
        if 0 <= idx < 26:
            return chr(idx + ord("A"))
        else:
            raise IndexError("Index out of range")

    def __setitem__(self, idx, value):
        if 0 <= idx < 26:
            print "All %s will be changed to %s" % (
                chr(idx + ord("A")), value)
        else:
            raise IndexError("Index out of range")

x = Alphabet()
print x[3]       # prints "D"
x[2] = "Y"       # prints "All C will be changed to Y"
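A dict-like class works the same way, but it takes keys and raises “KeyError”. The following sketch wraps a dict so that string keys ignore upper/lower case; the class is invented for illustration. It also defines “__contains__”, which makes the “in” operator work:

```python
class CaseInsensitiveDict(object):
    def __init__(self):
        self.__m_data = {}

    def __getitem__(self, key):
        try:
            return self.__m_data[key.lower()]
        except KeyError:
            # Report the key as the caller spelled it.
            raise KeyError(key)

    def __setitem__(self, key, value):
        self.__m_data[key.lower()] = value

    def __contains__(self, key):
        return key.lower() in self.__m_data

d = CaseInsensitiveDict()
d["Content-Type"] = "text/plain"
print(d["content-type"])        # text/plain
print("CONTENT-TYPE" in d)      # True
```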

Support “for” Loops (Iteration)

There are two ways of supporting “for” loops over a sequence:

  • Implement “len()” and “[ ]” with “__len__” and “__getitem__”
  • Implement an iterator with “__iter__”

Using “len()” and “[ ]”

class MyList(object):
    def __len__(self):
        return 5

    def __getitem__(self, idx):
        if 0 <= idx < len(self):
            return idx * 10
        else:
            raise IndexError("Index out of range")

ls = MyList()
for i in ls:
    print i,

# Output:
#    0 10 20 30 40

Using an Iterator

class MyList(object):
    class MyIterator(object):
        def __init__(self, the_str):
            self.__m_the_str = the_str
            self.__m_idx = 0
        def __iter__(self):
            return self
        def next(self):
            if self.__m_idx >= len(self.__m_the_str):
                raise StopIteration()
            else:
                self.__m_idx += 1
                return self.__m_the_str[self.__m_idx - 1]

    def __iter__(self):
        return MyList.MyIterator("Iterate over this")

ls = MyList()
for i in ls:
    print i,

# Output:
#    I t e r a t e  o v e r  t h i s

How it works:

  • The “for” loop calls the “__iter__” method.
  • The “__iter__” method must return an iterator object that implements two methods:
    • “__iter__”, which returns the iterator itself
    • “next”, which is called repeatedly to retrieve the elements, until it raises a “StopIteration” exception.
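Spelled out by hand, a “for” loop does roughly the following. The sketch uses the built-in functions “iter()” and “next()”. “next()” exists since Python 2.6 and calls the iterator's “next” method (“__next__” in Python 3). “ManualFor” is a made-up name:

```python
def ManualFor(iterable):
    """Collect the items of iterable the way a "for" loop visits them."""
    items = []
    it = iter(iterable)           # calls iterable.__iter__()
    while True:
        try:
            item = next(it)       # asks the iterator for the next element
        except StopIteration:
            break                 # the iterator is exhausted
        items.append(item)
    return items

print(ManualFor("abc"))           # ['a', 'b', 'c']
```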

Using an Iterator and a Generator Function

The preceding example can be written much more concisely using a “generator” function:

class MyList(object):
    def __iter__(self):
        for c in "Iterate over this":
            yield c

ls = MyList()
for i in ls:
    print i,

# Output:
#    I t e r a t e  o v e r  t h i s

How it works:

  • The “yield” keyword turns a normal function into a generator function.
  • Calling a generator function does not run its body yet; it returns an iterator (a “generator” object).
  • For each step of the iteration, the function runs until it encounters a “yield”.
  • The value passed to “yield” becomes the value of the current step, and the function is suspended until the next step is requested.
  • The iteration ends when the function exits through an implicit or explicit “return”.
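You can watch this happen by driving a generator by hand with “next()”. In this sketch (“Tracing” is a hypothetical name), the generator appends to a list whenever its body actually runs:

```python
def Tracing(log):
    log.append("started")
    yield 1
    log.append("resumed")
    yield 2
    log.append("finished")     # falling off the end raises StopIteration

log = []
gen = Tracing(log)
print(log)          # [] (calling the function ran none of its body)
print(next(gen))    # 1
print(log)          # ['started']
print(next(gen))    # 2
print(log)          # ['started', 'resumed']
```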

Another generator example:

def OddFibonacci(maximum):
    a, b = 0, 1
    while b <= maximum:
        if (b % 2) == 0:
            yield str(b) + " is even!"
        else:
            yield b
        a, b = b, a + b

for x in OddFibonacci(5):
    print x

# Output:
#    1
#    1
#    2 is even!
#    3
#    5

Calling an Object like a Function (Functors)

To be able to call an object like a function, add a “__call__” method with any number of arguments:

class MyFunctor(object):
    def __init__(self, factor):
        self.__m_factor = factor

    def __call__(self, a):
        return self.__m_factor * a

f = MyFunctor(10)
print f(3)    # prints 30

This particular example can also be written using the “lambda” keyword:

f = lambda a: 10 * a
# This is equivalent to:
#    def f(a):
#        return 10 * a
print f(3)    # prints 30
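A functor really pays off compared to a “lambda” when the callable has to remember something between calls. Here is a sketch of a hypothetical “CallCounter” that wraps any function and counts its calls:

```python
class CallCounter(object):
    def __init__(self, func):
        self.__m_func = func
        self.m_calls = 0

    def __call__(self, *args, **kwargs):
        # Count the call, then forward all arguments to the wrapped function.
        self.m_calls += 1
        return self.__m_func(*args, **kwargs)

square = CallCounter(lambda a: a * a)
print(square(3))        # 9
print(square(4))        # 16
print(square.m_calls)   # 2
```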

More

There are many other special methods that we didn’t talk about. See chapter 3.4, “Special method names,” in the “Python Reference Manual.”

Common Scripting Tasks

Walking a Directory Structure

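The “os.walk()” function walks a directory tree and yields one tuple of the form “(dirpath, dirnames, filenames)” for each directory it visits. It is a generator, so the tuples are produced one at a time as you loop over them. “dirpath” already includes the root directory. Here’s an example: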

import os
root = r"c:\temp"
for dirpath, dirnames, filenames in os.walk(root):
    print dirpath    # already starts with root
    print "  Sub-directories:",
    prefix = "\n   - "
    print prefix + prefix.join(dirnames)
    print "  Files:",
    print prefix + prefix.join(filenames)

  • To list the contents of a single directory, you can also use “os.listdir()”.
  • To list filenames that match a pattern (e.g., “*.txt” or “log????.*”), use “glob.glob()”.

More info: See the docs of the following modules for other filesystem-related functions that you might find useful: “os”, “os.path”, “shutil”, “glob”
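To find files that match a pattern anywhere below a directory, you can combine “os.walk()” with “fnmatch.filter()”. “fnmatch.filter()” applies the same shell-style wildcards as “glob” to a list of names. A sketch, where “FindFiles” is a hypothetical helper:

```python
import fnmatch
import os

def FindFiles(root, pattern):
    """Return the sorted paths of all files below root whose names match pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # dirpath already starts with root, so joining it with the name suffices.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    matches.sort()
    return matches
```

For example, “FindFiles(".", "*.cpp")” would list every “.cpp” file below the current directory.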

Running External Programs

There are several ways of running external programs. The most important ones are:

  • Calling “os.system()” with the command as you would type it on the command line. This function waits for the program to finish and returns its exit code.
  • Using the “subprocess.Popen” class, you can run a program asynchronously (without blocking the calling Python program) and you can communicate with the program via stdin, stdout, and stderr.
  • On Windows only, use “os.startfile()” to open a file with its associated program. For example, to open a Word document in Word, you can write “os.startfile('document1.doc')”. This is like double-clicking the file in Explorer.

Here’s an example of using “subprocess.Popen” to lay out a graph using “dot.exe” (from the Graphviz package):

import subprocess

PROGRAM_PATH = r"dot.exe"

# Pass the command as a list, so no shell or quoting is involved.
p = subprocess.Popen([PROGRAM_PATH, "-T", "plain"],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)
graph = """digraph {
    a -> b
    b -> c
    a -> c
    }
    """
stdout, stderr = p.communicate(graph)

print stdout

Instead of redirecting the output of the program, we can just as well work with temporary files and “os.system()”:

import os
import tempfile

DOT_PATH = r"dot.exe"
EXAMPLE_GRAPH = "digraph { a -> b; b -> c; a -> c }"

def GetTempFilename():
    fh, fname = tempfile.mkstemp(suffix=".tmp", prefix="dot")
    os.close(fh)
    return fname

def LayoutGraph(graph):
    input_file = GetTempFilename()
    output_file = GetTempFilename()

    try:
        open(input_file, "w").write(graph)
        os.system(DOT_PATH + ' -T plain -o "%s" "%s"'
                  % (output_file, input_file))
        return open(output_file).read()
    finally:
        # Delete the temporary files.
        os.remove(input_file)
        os.remove(output_file)

if __name__ == "__main__":
    print LayoutGraph(EXAMPLE_GRAPH)
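Passing the command to “subprocess.Popen” as a list of arguments avoids quoting problems. It also works on Unix, where a single command string is only split into arguments if you pass “shell=True”. To keep the sketch independent of Graphviz, it runs a second Python interpreter (“sys.executable”) as the child program. “RunPython” is a made-up helper:

```python
import subprocess
import sys

def RunPython(code, stdin_text=""):
    """Run code in a child Python process, feed it stdin_text, return its stdout."""
    p = subprocess.Popen([sys.executable, "-c", code],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    # communicate() writes the input, closes stdin and waits for the process.
    out, err = p.communicate(stdin_text.encode("ascii"))
    if p.returncode != 0:
        raise RuntimeError("child failed with exit code %d: %r"
                           % (p.returncode, err))
    return out.decode("ascii")

print(RunPython("import sys; sys.stdout.write(sys.stdin.read().upper())",
                "hello"))   # HELLO
```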

Regular Expressions

Regular expressions facilitate searching for patterns in a string. The syntax appears a bit cryptic at first (which is probably due to it being cryptic), but don’t give up easily. It’s often easier to use a regular expression than to perform the same parsing using basic string operations like “find”, “split”, and slicing.

As an example, let’s assume we have a text file that contains dates in a certain format:

...
Meeting on 2007-09-13. Call Joe at 555-1232-4756.
... See last week's report (2008-02-01). ...
Reservations were made from 2009-1-17
to 2009-2-16.
...

Using a Python script, we’d like to transform it to this:

...
Meeting on Thursday, 13 September 2007. Call Joe at 555-1232-4756.
... See last week's report (Friday, 01 February 2008). ...
Reservations were made from Saturday, 17 January 2009
to Monday, 16 February 2009.
...

Let’s start with a regex that matches only the string “2007-09-13” and use the “re.sub()” function to replace it with “Thursday, 13 September 2007”:

import re

text = """...
Meeting on 2007-09-13. Call Joe at 555-1232-4756.
... See last week's report (2008-02-01). ...
Reservations were made from 2009-1-17
to 2009-2-16.
...
"""

print re.sub(r"2007-09-13", "Thursday, 13 September 2007", text)

Note: You should always use a raw string (r"...") for the pattern string. (Backslashes are used frequently as part of the regular expression syntax. If you don’t use raw strings, you have to escape each backslash, which makes patterns harder to read.)

Instead of hard-coding the replacement string, what we really want is to call a function each time the pattern matches and calculate the replacement string in the function. This is possible by passing a function to “re.sub()”:

...
def ReplaceDate(match):
    return "Thursday, 13 September 2007"
print re.sub(r"2007-09-13", ReplaceDate, text)

The argument “match” to the “ReplaceDate” function is a match object (see “MatchObject” in the documentation of the “re” module). To find out what a match object can do, try this in an interactive shell:

>>> import re
>>> text = "Meeting on 2007-09-13."
>>> m = re.search(r"2007-09-13", text)
>>> m
<_sre.SRE_Match object at 0x01E84058>
>>> dir(_)
['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 'groups', 'span', 'start']
>>> m.group()
'2007-09-13'
>>> m.start(), m.end()
(11, 21)
>>> text[11:21]
'2007-09-13'

Let’s write a better regular expression:

def ReplaceDate(match):
    return repr(match.groups())
print re.sub(r"(\d{4})-(\d{1,2})-(\d{1,2})", ReplaceDate, text)

Output:

...
Meeting on ('2007', '09', '13'). Call Joe at 555-1232-4756.
... See last week's report (('2008', '02', '01')). ...
Reservations were made from ('2009', '1', '17')
to ('2009', '2', '16').
...

Let’s pick apart the regular expression: (\d{4})-(\d{1,2})-(\d{1,2})

  • “\d” matches any decimal digit.
  • Appending {m,n} matches m to n repetitions of the preceding pattern. For example, “\d{1,2}” matches a single digit or two digits, and “\d{4}” matches exactly four digits.
  • Other ways of indicating repetitions are “\d?” (an optional digit), “\d+” (one or more digits), and “\d*” (zero or more digits).
  • The parentheses are used to create groups. A tuple of these groups is returned by the “groups” method of the match object.
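To experiment with the groups, you can compile the pattern once with “re.compile()”. Then use “findall()”, which returns one tuple of group strings per match, or “search()”, which returns the first match object. A sketch:

```python
import re

DATE_PATTERN = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")

text = "Meeting on 2007-09-13. Call Joe at 555-1232-4756."

print(DATE_PATTERN.findall(text))   # [('2007', '09', '13')]

m = DATE_PATTERN.search(text)
print(m.group(0))    # 2007-09-13 (the whole match)
print(m.group(1))    # 2007 (the first group)
print(m.groups())    # ('2007', '09', '13')
```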

What’s missing is some code that converts a tuple like “('2007', '09', '13')” to the string “Thursday, 13 September 2007”. Here’s the final code:

import datetime
import re

text = """...
Meeting on 2007-09-13. Call Joe at 555-1232-4756.
... See last week's report (2008-02-01). ...
Reservations were made from 2009-1-17
to 2009-2-16.
...
"""

def ReplaceDate(match):
    year, month, day = map(int, match.groups())
    date = datetime.date(year, month, day)
    return date.strftime("%A, %d %B %Y")

print re.sub(r"(\d{4})-(\d{1,2})-(\d{1,2})", ReplaceDate, text)

More info: See the docs of the “re” module. An overview of the regular expression syntax can be found in the section “Regular Expression Syntax” in the “Python Library Reference.”
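Groups can also be given names with the “(?P<name>...)” syntax, which makes longer patterns easier to read. The following sketch is a variation on the example, not part of the original exercise. It rewrites the dates with zero-padded months and days; “NormalizeDates” and “ToIsoDate” are made-up names:

```python
import re

DATE_PATTERN = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})")

def ToIsoDate(match):
    # Look up groups by name instead of by position.
    year = int(match.group("year"))
    month = int(match.group("month"))
    day = int(match.group("day"))
    return "%04d-%02d-%02d" % (year, month, day)

def NormalizeDates(text):
    return DATE_PATTERN.sub(ToIsoDate, text)

print(NormalizeDates("from 2009-1-17 to 2009-2-16."))
# from 2009-01-17 to 2009-02-16.
```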

Homework

The homework combines several techniques presented in this handout. The program can be written in fewer than 100 lines (comments included).

Write a program that draws an “#include” graph of some C++ code of your choice:

  • Walk one or more directories that contain your “.cpp” and “.h” files.
  • Open each “.cpp” file and search for “#include” directives (preferably using a regular expression).
  • For each “.cpp” file, store a list of all “.h” files that you find in the “#include” directives. The pairs of “.cpp” and “.h” files are the edges of your graph.
  • Once you have all the edges, write a graph file for “dot.exe” similar to this:
    digraph {
        "main.cpp" -> "helper.h"
        "main.cpp" -> "container.h"
        "helper.cpp" -> "os.h"
        "helper.cpp" -> "helper.h"
        ...
    }
  • Invoke “util\dot\dot.exe -T png -o includes.png the_graph.txt” to draw the include graph.
Python Training – Part 2

Published June 7, 2012.

This is part 2 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Note: The training was based on Python 2.x, because that’s what we were using at the time. I would love to update it to Python 3 at some point. Any help with this would be greatly appreciated.

Namespaces and Scopes

When you define a variable anywhere in a Python function, the variable is added to the local namespace of the function. Unlike in C++, even if the variable is defined inside an “if” statement or in a “for” loop, the variable name is also visible outside the “if” or “for” block:

def SomeFunc(a):
    def InnerFunc():
        return 2 * x    # x is visible here as well
    if a:
        x = a
    else:
        x = 0
    return x + InnerFunc()

If a name is not found in the local scope at runtime, Python searches the scopes of any enclosing functions next (this is how “InnerFunc” above can read “x”). It then tries the global scope (aka module scope, where a module refers to the “.py” file), and finally the built-in scope (containing things like “int()”, “sorted()”, etc.).

the_global = 123

def SomeFunc():
    return the_global

def OtherFunc():
    the_global = 456
    return the_global

def GlobalFunc():
    global the_global
    the_global = 789
    return the_global

def BuggyFunc():
    x = the_global
    the_global = 456
    return x

print the_global    # prints 123
print SomeFunc()    # prints 123
print OtherFunc()   # prints 456
print the_global    # still prints 123
print GlobalFunc()  # prints 789
print the_global    # prints 789
print BuggyFunc()   # UnboundLocalError: local variable
                    # 'the_global' referenced before assignment

Things to note:

  • When you assign to a variable in a function, you are always writing to the local namespace of the function, even if the variable name exists in the global scope as well (see “OtherFunc”).
  • To assign to a variable in the global namespace, use the “global” keyword (see “GlobalFunc”).
  • As soon as you assign to a variable in a function, reading from the variable anywhere in the function accesses the local variable, even in a line that precedes the assignment. Therefore, you receive an error in “BuggyFunc”, because “x = the_global” tries to read the local variable “the_global”, which isn’t assigned yet.
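Between the local and the global scope, Python also searches the scopes of enclosing functions. That is why “InnerFunc” in the first example can read “x”. Here is a sketch of the complete lookup order (local, enclosing, global, built-in), using made-up names:

```python
x = "global"

def Outer():
    x = "enclosing"
    def Inner():
        # No local "x" in Inner, so the enclosing function's "x" is found.
        return x
    return Inner()

def ReadGlobal():
    # No local or enclosing "x", so the module-level "x" is found.
    return x

def UseBuiltin():
    # "len" is not defined in any of the scopes above, so the built-in is used.
    return len("abc")

print(Outer())        # enclosing
print(ReadGlobal())   # global
print(UseBuiltin())   # 3
```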

You can use these built-in functions to access the contents of namespaces:

  • globals(): Returns a dictionary containing the variables in the module scope.
  • locals(): Returns a dictionary containing the variables in the local (class or function) scope. You should not modify the dictionary.
  • dir(): When called without parameters, returns a list of variables in the local scope. When called with an object as the parameter, returns a list of attributes of the object.

You can remove the binding of a name to an object by using the “del” keyword:

>>> the_global = 123
>>> print the_global
123
>>> del the_global
>>> print the_global
NameError: name 'the_global' is not defined

Classes

To define a class in Python, use the “class” keyword. For the methods, use the “def” keyword, just like for functions.

class SomeClass(object):
    """@brief The docstring for the class."""

    def __init__(self, initial_value):
        """@brief This is the constructor, or more precisely,
                the initializer of the class.
        """
        self.m_some_member = initial_value

    def SomeMethod(self, inc):
        self.m_some_member += inc
        return self.m_some_member

# Working with the class
c = SomeClass(100)
print c.SomeMethod(10)    # prints 110
c.m_some_member = 0
print c.m_some_member     # prints 0

What we see:

  • New-style classes are derived from the “object” class.
    • If you write just “class SomeClass:”, you still get a class, but some of the features that we’ll discuss later (such as properties) won’t work. So you should always derive from “object” (or from a class that’s already derived from “object”) if possible.
  • The equivalent to a C++ constructor is the “__init__” method. You can leave it out if you have no fields to initialize. (“__del__” is the opposite.)
  • The first parameter to all methods must be the object instance. By convention, you should name it “self”.
    • This is the equivalent to the implicit “this” parameter in C++.
  • To create fields, assign to attributes of “self”. Typically, this is done in “__init__”, but it can be done anywhere.
    • According to our Python coding guidelines, fields are prefixed with “m_”, but this is just convention.
  • To create instances of the class, call the class like a function with the parameters specified for “__init__”.
  • You can access fields directly from outside the class (“c.m_some_member = …”).

Inheritance

To create a derived class, specify a comma-separated list of base classes when you define the class:

class DerivedClass(BaseA, BaseB):
    pass

The derived class inherits all the attributes of the base classes. When an attribute name appears both in “BaseA” and in “BaseB”, the attribute from “BaseA” has precedence, because it appears first in the list of base classes.

An example demonstrating some aspects of inheritance:

class BaseA(object):
    def __init__(self):
        self.m_member = None

    def MethodA(self):
        print "BaseA.MethodA", self.m_member

    def CommonMethod(self):
        print "BaseA.CommonMethod"

class BaseB(object):
    def MethodB(self):
        print "BaseB.MethodB"

    def CommonMethod(self):
        print "BaseB.CommonMethod"

class DerivedClass(BaseA, BaseB):
    def MethodA(self):
        print "DerivedClass.MethodA"
        self.m_member = "hi!"
        # Call the inherited method.
        BaseA.MethodA(self)

d = DerivedClass()
d.MethodA()
# Output:
#  DerivedClass.MethodA
#  BaseA.MethodA hi!

d.MethodB()
# Output:
#  BaseB.MethodB

d.CommonMethod()
# Output:
#  BaseA.CommonMethod

What we see:

  • “BaseA.__init__” is invoked automatically when you create an instance of “DerivedClass”, because “DerivedClass” doesn’t define its own “__init__” and “BaseA” is the first base class that does.
  • An attribute in a derived class overrides an attribute of the same name in the base class (“DerivedClass.MethodA”).
  • If an attribute appears in more than one base class, the attribute from the class that was specified first in the list of base classes has precedence (“BaseA.CommonMethod”).
  • To invoke the base class implementation of a method, write BaseClass.MethodName and pass “self” as the first parameter explicitly.
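For new-style classes, the search order is recorded in the class attribute “__mro__” (method resolution order). The following sketch reuses the class names from the example above in a stripped-down form:

```python
class BaseA(object):
    def CommonMethod(self):
        return "BaseA.CommonMethod"

class BaseB(object):
    def CommonMethod(self):
        return "BaseB.CommonMethod"

class DerivedClass(BaseA, BaseB):
    pass

# Attribute lookup searches the classes in this order.
print([cls.__name__ for cls in DerivedClass.__mro__])
# ['DerivedClass', 'BaseA', 'BaseB', 'object']
print(DerivedClass().CommonMethod())   # BaseA.CommonMethod
```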

Please note that if you define your own “__init__” in the derived class, or if you have multiple base classes with an “__init__”, you are responsible for invoking the base class implementation of “__init__”:

class BaseClass(object):
    def __init__(self):
        self.m_member = 0

    def SomeMethod(self):
        return self.m_member

class GoodDerivedClass(BaseClass):
    def __init__(self):
        BaseClass.__init__(self)

class BadDerivedClass(BaseClass):
    def __init__(self):
        pass

good = GoodDerivedClass()
print good.SomeMethod()    # prints 0

bad = BadDerivedClass()
print bad.SomeMethod()     # AttributeError: 'BadDerivedClass'
                           # object has no attribute 'm_member'
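An alternative to naming the base class explicitly is the built-in “super()”, which finds the next class in the method resolution order. With new-style classes in Python 2, you have to pass the class and “self” (“super(Derived, self)”); Python 3 accepts this form too. Here is a sketch with a hypothetical “Derived” class that adds its own field:

```python
class BaseClass(object):
    def __init__(self):
        self.m_member = 0

class Derived(BaseClass):
    def __init__(self, extra):
        # Run BaseClass.__init__ first so that m_member exists.
        super(Derived, self).__init__()
        self.m_extra = extra

d = Derived(5)
print(d.m_member)   # 0
print(d.m_extra)    # 5
```

With multiple inheritance, “super()” calls each base class exactly once only if all classes in the hierarchy use it consistently.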

Public, private, protected

Python distinguishes between public and private attributes, although only through a naming rule. Any attribute name prefixed with two underscores becomes private, except for names of the form “__xy__”, which also end with two underscores.

class SomeClass(object):
    def PublicMethod(self):
        self.__m_private_field = "encapsulated"
        return self.__PrivateMethod()

    def __PrivateMethod(self):
        return self.__m_private_field

c = SomeClass()
print c.PublicMethod()    # prints "encapsulated"
print c.__PrivateMethod() # AttributeError: 'SomeClass' object has
                          # no attribute '__PrivateMethod'
print c.__m_private_field # AttributeError: 'SomeClass' object has
                          # no attribute '__m_private_field'

What we see:

  • To make an attribute (method or field) private, prefix it with “__”.
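  • Private attributes can only be accessed from inside the class. (Strictly speaking, Python renames, or “mangles”, them to names like “_SomeClass__PrivateMethod”, so a determined user can still reach them from outside.)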
  • An “AttributeError” exception is raised when you try to access private attributes from outside the class.
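The mechanism behind this is name mangling. Inside a class body, a name such as “__m_private_field” is renamed to “_SomeClass__m_private_field”. Its purpose is to avoid accidental name clashes with derived classes rather than to keep determined users out. A sketch:

```python
class SomeClass(object):
    def __init__(self):
        self.__m_private_field = "encapsulated"

class Derived(SomeClass):
    def __init__(self):
        SomeClass.__init__(self)
        # Mangled to _Derived__m_private_field, so it does not
        # overwrite the base class's field.
        self.__m_private_field = "derived"

d = Derived()
print(sorted(vars(d)))
# ['_Derived__m_private_field', '_SomeClass__m_private_field']
print(d._SomeClass__m_private_field)   # encapsulated
```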

Python does not support protected attributes (i.e., attributes that you can access in derived classes only). By convention, we prefix such attributes with a single underscore, so that users of the class know they’re an implementation detail, but authors of derived classes can still access them:

class BaseClass(object):
    def _ProtectedMethod(self):
        self._m_protected = "protected"

class DerivedClass(BaseClass):
    def PublicMethod(self):
        self._ProtectedMethod()
        print self._m_protected

Properties

Python supports “properties”, i.e., pairs of Get/Set methods that are called transparently when you access an attribute:

class SomeClass(object):
    def __init__(self, initial_value):
        self.__m_read_write_prop = initial_value
        self.__m_read_only_prop = initial_value

    def __GetReadWriteProp(self):
        print "Someone's reading ReadWriteProp"
        return self.__m_read_write_prop

    def __SetReadWriteProp(self, new_value):
        print "Someone's writing ReadWriteProp"
        self.__m_read_write_prop = new_value

    ReadWriteProp = property(fget=__GetReadWriteProp,
                             fset=__SetReadWriteProp)

    def __GetReadOnlyProp(self):
        print "Someone's reading ReadOnlyProp"
        return self.__m_read_only_prop

    ReadOnlyProp = property(fget=__GetReadOnlyProp)

c = SomeClass("initial")
val = c.ReadWriteProp
print "val =", val
# Output:
#  Someone's reading ReadWriteProp
#  val = initial

c.ReadWriteProp = "new"
# Output:
#  Someone's writing ReadWriteProp

val = c.ReadWriteProp
print "val =", val
# Output:
#  Someone's reading ReadWriteProp
#  val = new

val = c.ReadOnlyProp
print "val =", val
# Output:
#  Someone's reading ReadOnlyProp
#  val = initial

c.ReadOnlyProp = "new"    # AttributeError: can't set attribute

Note: For properties to work, the class must be derived from “object”. Otherwise, the property loses its special behavior and becomes a normal attribute as soon as you write “c.ReadWriteProp = 1000”.

Note: Properties are an application of Python’s “descriptor” concept. For more about this and other features of new-style classes, see my article “Introduction to New-Style Classes in Python”.
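Since Python 2.6, you can also create properties with decorator syntax. “@property” turns a method into the getter, and “@<name>.setter” adds the setter. Here is a sketch with a made-up “Temperature” class that stores Celsius and exposes Fahrenheit. Like the class above, it must derive from “object”:

```python
class Temperature(object):
    def __init__(self, celsius):
        self.m_celsius = celsius

    @property
    def Fahrenheit(self):
        # Getter: computed from the stored Celsius value.
        return self.m_celsius * 9.0 / 5.0 + 32.0

    @Fahrenheit.setter
    def Fahrenheit(self, value):
        # Setter: convert back and store Celsius only.
        self.m_celsius = (value - 32.0) * 5.0 / 9.0

t = Temperature(100)
print(t.Fahrenheit)    # 212.0
t.Fahrenheit = 32
print(t.m_celsius)     # 0.0
```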

Static Fields and Static Methods

Python supports static fields and methods:

class InstanceCounter(object):
    s_num_instances = 0

    def __init__(self):
        InstanceCounter.s_num_instances += 1

    def GetNumInstances():    # no "self" here
        return InstanceCounter.s_num_instances

    GetNumInstances = staticmethod(GetNumInstances)

a = InstanceCounter()
b = InstanceCounter()
print InstanceCounter.GetNumInstances()    # prints 2
InstanceCounter.s_num_instances = 100
c = InstanceCounter()
print InstanceCounter.GetNumInstances()    # prints 101

print a.s_num_instances    # prints 101
a.s_num_instances = 5
print a.s_num_instances                    # prints 5
print InstanceCounter.GetNumInstances()    # still prints 101

Things to note:

  • According to our Python coding guidelines, static fields are prefixed with “s_”, but this is just convention.
  • Static methods do not have a “self” parameter, obviously.
  • Static fields and methods can be used both via the class and via an instance.
  • In the line “a.s_num_instances = 5”, an attribute named “s_num_instances” is added to the instance dictionary of “a”. This attribute hides the static field of the same name in “InstanceCounter” when you access it through “a”. The static field of “InstanceCounter” is not changed.
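Since Python 2.4, the line “GetNumInstances = staticmethod(GetNumInstances)” can also be written as a decorator, which keeps the information next to the “def”. The same works for “classmethod”. A sketch of the same class:

```python
class InstanceCounter(object):
    s_num_instances = 0

    def __init__(self):
        InstanceCounter.s_num_instances += 1

    @staticmethod
    def GetNumInstances():
        # Equivalent to GetNumInstances = staticmethod(GetNumInstances).
        return InstanceCounter.s_num_instances

a = InstanceCounter()
b = InstanceCounter()
print(InstanceCounter.GetNumInstances())   # 2
```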

Class Methods

Class methods are similar to static methods (but less frequently used). Like static methods, they don’t receive a “self” parameter, but they receive a “cls” parameter with a reference to the class object:

class SomeClass(object):
    def ClassMethod(cls):
        print cls.__name__

    ClassMethod = classmethod(ClassMethod)

class DerivedClass(SomeClass):
    pass

SomeClass.ClassMethod()    # prints "SomeClass"
c = SomeClass()
c.ClassMethod()            # prints "SomeClass"
DerivedClass.ClassMethod() # prints "DerivedClass"
d = DerivedClass()
d.ClassMethod()            # prints "DerivedClass"
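A common use for class methods is an alternative constructor. Because “cls” is the class the method was called on, a derived class automatically gets instances of itself. Here is a sketch with hypothetical “Shape” and “Circle” classes:

```python
class Shape(object):
    def __init__(self, name):
        self.m_name = name

    @classmethod
    def FromConfig(cls, config):
        # "cls" is Shape or a derived class, depending on the caller.
        return cls(config["name"])

class Circle(Shape):
    pass

c = Circle.FromConfig({"name": "wheel"})
print(type(c).__name__)   # Circle
print(c.m_name)           # wheel
```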

Example – Parsing XML

The following sections walk you through the task of writing a Python program that prints the contents of an XML document. This will give us plenty of opportunity to learn new things about Python programming in general.

Note: You can find the code and example XML documents in the ZIP package for this lesson.

Setting up a ContentHandler

The standard Python library contains an XML parser and modules to access XML documents using the SAX and DOM APIs. We’ll be using the SAX API from the “xml.sax” module. This module contains the function “parse”, which requires a user-defined class with callbacks for handling the various parts of the XML document.

# File: step_1\xml_printer.py
import sys
import xml.sax
import xml.sax.handler

class Printer(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.__m_level = 0
        self.m_num_elements = 0

    def startElement(self, name, attrs):
        # Invoked for each opening tag
        print " " * self.__m_level + name
        self.__m_level += 1
        self.m_num_elements += 1

    def endElement(self, name):
        # Invoked for each closing tag
        self.__m_level -= 1

def Main(filename_or_stream):
    handler = Printer()
    xml.sax.parse(filename_or_stream, handler)
    print handler.m_num_elements, "elements"

if __name__ == "__main__":
    Main(sys.argv[1])  # The name of the XML file must be the
                       # first command-line parameter.

The program produces output like this:

...
     MOB_Object
      MOB_InfoLine
       MOB_Text
      MOB_InfoLine
       MOB_Text
     MOB_Object
      MOB_GDL_ReelSlotGameLine
...
2995 elements

New things in the code:

  • The code in “if __name__ == "__main__":” is executed only when the “.py” file is the main program. When the “.py” file is loaded into another program via the “import” keyword, the code is not run. The importing module can call the “Main()” function later with an XML file of its own choice.
  • “sys.argv” is a list of command-line parameters to the Python program. “sys.argv[0]” contains the path to the program. The remaining elements contain the parameters.
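For quick experiments, “xml.sax.parseString()” parses a document held in memory. The sketch below (the “ElementCounter” class is invented for illustration) also uses the “attrs” parameter of “startElement”, which behaves like a read-only dictionary of the element’s attributes:

```python
import xml.sax
import xml.sax.handler

class ElementCounter(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.m_counts = {}
        self.m_ids = []

    def startElement(self, name, attrs):
        # Count opening tags by name.
        self.m_counts[name] = self.m_counts.get(name, 0) + 1
        # Remember the "id" attribute if the element has one.
        if attrs.get("id") is not None:
            self.m_ids.append(attrs.get("id"))

def CountElements(data):
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler

h = CountElements(b'<A><B id="b1"/><B><C/></B></A>')
# In Python 3, prints [('A', 1), ('B', 2), ('C', 1)]
print(sorted(h.m_counts.items()))
```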

Writing the Unit Test

In this section: In-memory files with the “StringIO” class and redirecting “sys.stdout”.

The program reads its input from a file and prints the output directly to STDOUT. One way of writing the unit test would be:

  • Prepare an XML file “test_input.xml” with test data.
  • Invoke the program from the unit test by running another instance of Python with redirected output:
    import os
    import sys
    …
    os.system(sys.executable + " xml_printer.py "
              "test_input.xml >output.txt")

    This is equivalent to running “xml_printer.py test_input.xml >output.txt” on the command line.

  • Compare the expected results to the results in “output.txt”.

While this approach might have its advantages, we’ll go down a different route:

  • Prepare the XML input in an in-memory file, which we can pass directly to the “Main” function. The “StringIO” class serves as an in-memory file.
  • Redirect STDOUT by setting the “sys.stdout” variable to another “StringIO” object. (The “print” statements in the program go through “sys.stdout” implicitly.)
  • Compare the expected results to the contents of the redirected STDOUT.
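The redirection step can also be packaged as a small reusable helper. The sketch below uses a hypothetical function named “CaptureOutput”. It restores “sys.stdout” in a “finally” clause, so a failing call cannot leave the output redirected. The try/except import is an addition for this example: it lets the same code run under Python 3, where the class lives in the “io” module.

```python
import sys

try:
    from StringIO import StringIO  # Python 2, as used in this training
except ImportError:
    from io import StringIO        # Python 3


def CaptureOutput(func, *args):
    """Call func(*args) with sys.stdout redirected; return everything it printed."""
    old_stdout = sys.stdout
    sys.stdout = StringIO()
    try:
        func(*args)
        # The return value is computed before the "finally" clause runs.
        return sys.stdout.getvalue()
    finally:
        # Restore STDOUT even if func raises an exception.
        sys.stdout = old_stdout
```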

This is our unit test:

# File: step_1\test_xml_printer.py
import StringIO
import sys
import unittest
import xml_printer

class TestXmlPrinter(unittest.TestCase):
    def setUp(self):
        # Redirect STDOUT so that all subsequent "print"
        # statements in the Python program go to a StringIO buffer.
        self.__m_old_stdout = sys.stdout
        sys.stdout = StringIO.StringIO()

    def tearDown(self):
        # Restore STDOUT so that prints go to the screen again.
        sys.stdout = self.__m_old_stdout

    def test_PrintHierarchy(self):
        # Prepare the XML in an in-memory file, i.e., in a
        # StringIO buffer.
        data = StringIO.StringIO(
            """<?xml version="1.0"?>
               <A>
                 <B>
                   <C/>
                 </B>
                 <D>
                   <E/>
                 </D>
               </A>
            """)
        xml_printer.Main(data)
        expected = ("A\n"
                    " B\n"
                    "  C\n"
                    " D\n"
                    "  E\n"
                    "5 elements\n")
        # Compare the expected results to the contents of
        # our redirected STDOUT.
        self.assertEquals(expected,
                          sys.stdout.getvalue())

if __name__ == "__main__":
    unittest.main()

Things to note:

  • The “setUp” method is called before each “test_” method.
  • The “tearDown” method is called after each “test_” method, even if the test fails.
  • We redirect STDOUT by temporarily setting the global variable “sys.stdout” to a “StringIO” buffer, and restoring the original stream afterwards.
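To check the order of the calls yourself, the following sketch runs a deliberately failing test and records which methods were invoked. The class and function names are made up for this example. The test runner's report goes to a buffer so that it does not clutter the console.

```python
import unittest

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO        # Python 3


def RunOrderDemo():
    """Run a deliberately failing test; return (call order, number of failures)."""
    calls = []

    # Defined inside the function so that test runners do not collect it.
    class OrderDemo(unittest.TestCase):
        def setUp(self):
            calls.append("setUp")

        def tearDown(self):
            calls.append("tearDown")

        def test_Fails(self):
            calls.append("test_Fails")
            self.fail("deliberate failure")

    suite = unittest.TestLoader().loadTestsFromTestCase(OrderDemo)
    # Send the runner's report to an in-memory buffer instead of the console.
    result = unittest.TextTestRunner(stream=StringIO(), verbosity=0).run(suite)
    return calls, len(result.failures)
```

“RunOrderDemo()” returns “(['setUp', 'test_Fails', 'tearDown'], 1)”: “tearDown” still runs after the failed test.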

Printing Attribute Values

In this section: Working with Unicode strings.

In the next step, we’ll print the XML attributes for each element. The attributes are passed to the “startElement” method of the “ContentHandler” as an “Attributes” object, which behaves like a dictionary. Try this:

# File: step_2\xml_printer.py
import sys
import xml.sax
import xml.sax.handler

class Printer(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.__m_level = 0
        self.m_num_elements = 0

    def startElement(self, name, attrs):
        # Invoked for each opening tag
        print " " * self.__m_level + name
        self.__m_level += 1
        self.m_num_elements += 1
        for attr_name, attr_value in attrs.items():
            print " " * self.__m_level + "-", attr_name, "=", attr_value
            # This might cause an error. Explanation follows.

    def endElement(self, name):
        # Invoked for each closing tag
        self.__m_level -= 1

def Main(filename_or_stream):
    handler = Printer()
    xml.sax.parse(filename_or_stream, handler)
    print handler.m_num_elements, "elements"

if __name__ == "__main__":
    Main(sys.argv[1])  # The name of the XML file must be the
                       # first command-line parameter.

When you run this program on the command line with the file “scene.xml” from the example ZIP package, the line that prints the attributes fails with this error:

…
  File "C:\Python25\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 0: character maps to <undefined>

The reason for the error is this: The XML parser works with Unicode strings. One of the attribute values contains the € sign (Unicode 0x20ac). This character is stored in a Python string as follows:

>>> u"\u20ac"
… u'\u20ac'
>>> type(_)
… <type 'unicode'>

When you print a string of type “unicode” to STDOUT or to a file, it is converted to an 8-bit string using an encoding. The encoding used by the console (at least on my computer) is cp437. However, codepage 437 does not define the € sign, so the attempt to encode the string results in an error. Try it yourself:

>>> u"\u20ac".encode("cp437")
… UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 0: character maps to <undefined>

Important notes:

  • In PyCrust or another graphical shell, you can “print u'\u20ac'” without problems. The error occurs only on the command line. The reason is that PyCrust uses its own file object for “sys.stdout”, which supports Unicode strings directly.
  • There is no encoding error when you work with normal strings. For example, if you read the € sign from a file in the standard Windows encoding, cp1252, it ends up as the Python string “\x80”. When you print this, no encoding or decoding takes place. However, the wrong character might show up if the target of the print uses a different encoding than cp1252.

To solve the problem, we can either print “repr(attr_value)” instead, or encode the string to ASCII before printing it, replacing all unknown characters with “?”:

>>> u"\u20ac".encode("ascii", "replace")
… '?'

So, the new code for printing the attributes looks like this:

for attr_name, attr_value in attrs.items():
    print " " * self.__m_level + "-", attr_name, "=", \
          attr_value.encode("ascii", "replace")
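“replace” is only one of the standard error handlers that “encode” accepts. The sketch below compares it with “ignore” and “xmlcharrefreplace”. The last one is useful when the output is XML or HTML, because the € sign survives as a numeric character reference. Under Python 3, “encode” returns a “bytes” object, which is why the comments show a “b” prefix.

```python
# A Unicode string containing the euro sign (U+20AC) followed by " 5".
price = u"\u20ac 5"

replaced = price.encode("ascii", "replace")           # b'? 5': unknown characters become "?"
ignored = price.encode("ascii", "ignore")             # b' 5': unknown characters are dropped
escaped = price.encode("ascii", "xmlcharrefreplace")  # b'&#8364; 5': XML character reference
```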

If you need to convert an 8-bit string that contains characters in a certain encoding to a Unicode object, use the “decode” method:

>>> "\x80".decode("cp1252")
… u'\u20ac'
>>> _.encode("utf-8")
… '\xe2\x82\xac'
>>> _.decode("utf-8")
… u'\u20ac'
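The decode/encode pair above is the usual way to convert text from one 8-bit encoding to another: decode to Unicode first, then encode again. Here is a sketch using a hypothetical helper name:

```python
def Recode(data, source_encoding, target_encoding):
    """Convert an 8-bit string from one encoding to another via Unicode."""
    text = data.decode(source_encoding)  # 8-bit string -> Unicode
    return text.encode(target_encoding)  # Unicode -> 8-bit string
```

For example, “Recode(b"\x80", "cp1252", "utf-8")” yields the three UTF-8 bytes of the € sign.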

Writing to a File Using an Encoding

To write Unicode strings to a file using a special encoding, you can use the “open” function from the “codecs” module. This is an example of printing the € sign to a UTF-8-encoded XML file:

import codecs
f = codecs.open("utf8.xml", "w", "utf-8")
print >> f, '<?xml version="1.0" encoding="utf-8"?>'
print >> f, u"<Root>\u20ac</Root>"
f.close()

When you open the resulting file in a hex editor, you can see that the € sign is stored as a sequence of three bytes, as defined by the UTF-8 encoding. When you open the file in a text editor that supports the UTF-8 encoding, the € sign appears correctly.
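Instead of using a hex editor, you can check the bytes from Python by reading the file back in binary mode. This sketch writes the same content to a temporary directory and returns the raw contents. The function name and the temporary location are assumptions made for the example.

```python
import codecs
import os
import shutil
import tempfile


def WriteAndReadBack():
    """Write the euro sign to a UTF-8 file, then return the raw bytes on disk."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "utf8.xml")
    try:
        f = codecs.open(path, "w", "utf-8")
        try:
            f.write(u'<?xml version="1.0" encoding="utf-8"?>\n')
            f.write(u"<Root>\u20ac</Root>\n")
        finally:
            f.close()
        # Open in binary mode so that no decoding takes place.
        f = open(path, "rb")
        try:
            return f.read()
        finally:
            f.close()
    finally:
        shutil.rmtree(directory)
```

In the returned data, the € sign appears as the three bytes “\xe2\x82\xac”.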

Building an Object Tree

In this section: The “setattr” introspection function and more about lists.

Finally, let’s create Python objects from the XML elements. The unit test shows the desired interface of these objects:

# File: step_3\test_xml_objects.py
import unittest
import xml_objects

data = """<?xml version="1.0"?>
<Page>
  <Paragraph align="left">
    This is <Bold>bold</Bold> text.
  </Paragraph>
  <Paragraph align="center">
    <Bold>Bold</Bold> and <Italic>italic</Italic>.
  </Paragraph>
  <Table border="1">
    <Row><Cell>A</Cell><Cell>B</Cell></Row>
  </Table>
</Page>"""

class TestXmlObjects(unittest.TestCase):
    def setUp(self):
        self.__m_root = xml_objects.Load(data)

    def test_ChildNodes(self):
        self.assertEquals(
            set(["Paragraph", "Table"]),
            self.__m_root.GetChildNames())
        self.assertEquals(
            2, len(self.__m_root.GetChildNodes("Paragraph")))
        self.assertEquals(
            1, len(self.__m_root.GetChildNodes("Table")))

    def test_Attributes(self):
        first_para = self.__m_root.GetChildNodes("Paragraph")[0]
        self.assertEquals(
            ["align"],
            first_para.GetAttributeNames())
        self.assertEquals(
            "left", first_para.align)

if __name__ == "__main__":
    unittest.main()

To summarize the interface:

  • The “Load” function returns the object for the root element.
  • The “GetChildNames” method returns a set of the child element names.
  • The “GetChildNodes” method returns a list of the child elements of a given name.
  • The “GetAttributeNames” method returns a list of XML attribute names.
  • The XML attribute values can be queried using normal Python attributes.

This is the code of the program:

# File: step_3\xml_objects.py
import xml.sax
import xml.sax.handler

class Element(object):
    def __init__(self):
        self.__m_child_nodes = []
        self.__m_attributes = {}

    def AddChildNode(self, name, element):
        self.__m_child_nodes.append((name, element))

    def AddAttributes(self, attrs):
        self.__m_attributes.update(attrs)
        for (attr_name,
             attr_value) in self.__m_attributes.iteritems():
            setattr(self, attr_name, attr_value)

    def GetChildNames(self):
        return set([name for name, element in self.__m_child_nodes])

    def GetChildNodes(self, element_name):
        return [element for name, element in self.__m_child_nodes
                if name == element_name]

    def GetAttributeNames(self):
        return self.__m_attributes.keys()

class Loader(xml.sax.handler.ContentHandler):
    def __init__(self):
        self.__m_element_stack = []
        self.__m_root = None

    def GetRoot(self):
        return self.__m_root

    def startElement(self, name, attrs):
        element = Element()
        element.AddAttributes(attrs)
        self.__m_element_stack.append(element)

    def endElement(self, name):
        element = self.__m_element_stack.pop()
        if self.__m_element_stack:
            self.__m_element_stack[-1].AddChildNode(name, element)
        else:
            self.__m_root = element

def Load(xml_string):
    handler = Loader()
    xml.sax.parseString(xml_string, handler)
    return handler.GetRoot()

The code works like this:

  • When a new XML element starts, an “Element” instance is pushed on a stack.
  • The XML attributes are passed to the “AddAttributes” method of the “Element”.
  • In “AddAttributes”, new Python attributes are added to the instance using “setattr”: The code “setattr(x, "y", z)” has the same effect as “x.y = z”. (“getattr” can be used for reading.)
  • When an XML element ends, the “Element” instance is popped off the stack and passed to the “AddChildNode” method of the parent element.
  • When there are no more elements on the stack, we have reached the root.
  • The “GetChildNames” and “GetChildNodes” methods use the list comprehension syntax (“[x for y in ys if expr]”) to return parts of the list “self.__m_child_nodes”.
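“getattr” also accepts a default value, which is useful for optional XML attributes that may or may not be present on an element. Here is a small sketch with a stand-in class:

```python
class Record(object):
    """An empty class, standing in for the Element class above."""


record = Record()
setattr(record, "align", "left")          # same effect as record.align = "left"

align = getattr(record, "align")          # same as record.align
border = getattr(record, "border", None)  # returns the default instead of raising AttributeError
has_border = hasattr(record, "border")    # False: the attribute was never set
```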

More list comprehension examples:

>>> names = ["John", "Frank", "Sue", "Jane"]
>>> numbers = [1, 2, 3, 4, 5, 6, 7, 8]
>>> [n.upper() for n in names if n.startswith("J")]
… ['JOHN', 'JANE']
>>> # Convert two lists to a list of tuples by using "zip".
>>> [x for x in zip(numbers, names)]
… [(1, 'John'), (2, 'Frank'), (3, 'Sue'), (4, 'Jane')]
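A list comprehension is shorthand for a loop that builds a list. This sketch shows the first example above next to the equivalent explicit loop:

```python
names = ["John", "Frank", "Sue", "Jane"]

# The list comprehension from above ...
upper_j = [n.upper() for n in names if n.startswith("J")]

# ... does the same work as this explicit loop.
upper_j_loop = []
for n in names:
    if n.startswith("J"):
        upper_j_loop.append(n.upper())
```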

Homework

Extend “xml_objects.py” so that it handles the text contents of the elements. In the element…

    <Paragraph align="left">
        This is <Bold>bold</Bold> text.
    </Paragraph>

… it should be possible to retrieve the text “This is”, the child element “<Bold>”, and the text “text.” in the correct order.

Write the unit test first to help you find a convenient interface.

Python Training – Part 1

Thu, 07 Jun 2012

This is the course material for a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

The target audience for the trainings was software developers, testers, and mathematicians. Each training session took 3 to 4 hours and consisted mostly of interactive programming, roughly (but not slavishly) following the outline on this page. Depending on the prior programming experience of each group, I made small changes to the trainings as I went along.

Each participant received this tutorial printed as a booklet (12 or 16 pages). Even though I would usually deviate a lot from the exact examples in the handouts, having the handouts frees the participants from taking their own notes and allows them to participate actively in the classroom. I also gave out little exercises at the end of each session for the participants to complete by the following week.

Note: The training was based on Python 2.x, because that’s what we were using at the time. I would love to update it to Python 3 at some point. Any help with this would be greatly appreciated.

About Python

Python is an interpreted, dynamically-typed, object-oriented programming language. It can be used for writing small, throw-away scripts and large, object-oriented applications alike. Some of Python’s features:

  • Built-in high-level data types
  • Garbage collector
  • Comprehensive standard library and a myriad of add-on libraries on the web
  • Easily extensible with libraries written in other programming languages

Getting Started

Installing

To use Python locally, install these programs:

Running

Python source code has the filename extension “.py”. When Python is installed locally, you can double-click a “.py” file to run the program. You can also run “python.exe <py-file>” from the command line.

Interactive Mode

When you start “python.exe” from the command line without parameters, Python runs in interactive mode. In this mode, you can type Python source code directly on the console and see the results immediately.

If you prefer a GUI, you can run the interactive Python shell using PyCrust (from the wxPython Tools) or the PythonWin IDE. Both provide code completion and other useful features.

Editing/Debugging

To edit and debug Python programs, you can use PythonWin.

For developing larger Python programs consisting of many files, I recommend Eclipse and the PyDev plug-in.

Manuals

When you install Python locally, you can find the Python Manuals, including a tutorial, on the Start menu.

In an interactive Python shell, use the built-in “help” function to display help for any Python object. For example, help(str) prints a list of all methods of str objects.

Use PyCrust to receive IntelliSense-style help on available methods and method parameters.

Hello, World!

Write the following code into a file with the extension “.py” or type the code in an interactive Python shell.

The typical “Hello, World!” program is nearly too simple in Python:

print "Hello, World!"

Let’s make it more interesting:

who = raw_input("Who do you want to greet (default is World): ")
if not who:
    who = "World"
else:
    who = who.capitalize() # make the first character upper-case
print who, "is in da house!", # comma at the end prevents newline
print "Hello, " + who + "!"

Things to note:

  • You don’t declare the data type of variables; you just assign a value (“who”)
  • No curly braces for “if” and “else” blocks: The indentation alone determines where a block ends
  • Strings are objects (“capitalize” method, “+” operator overloaded, etc.). As we’ll see later, everything’s an object in Python.

Built-in Data Types

In the following examples, “>>>” denotes lines that you type in the Python shell and “…” denotes the output.

Note: I recommend PyCrust as an interactive shell, because it has some features that PythonWin lacks. You can paste lines starting with “>>>” into PyCrust when you click “Edit” → “Paste Plus”.

Numbers

There are no big surprises. The operators are more or less the same as in C. Python has built-in support for large integers. Here are a few examples of working with numbers:

>>> (12 + 3) * 47
… 705
>>> 0xff
… 255
>>> 32 / 7.0 * 1.3e-6
… 5.9428571428571423e-006
>>> 2**16
… 65536
>>> _ - (1 << 16)
    # in interactive mode, “_” contains the last printed value,
    # i.e., 65536
… 0
>>> 12**45
… 3657261988008837196714082302655030834027437228032L
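One place where the operators are not quite the same as in C is integer division with negative operands. Python's “//” operator (and “/” on two integers in Python 2) rounds toward negative infinity, whereas C99 truncates toward zero. The “%” operator follows the same convention. This is also why the example above divides by “7.0”: with two integer operands, Python 2 would perform integer division. A quick check, written with “//” so that it behaves the same in Python 2 and 3:

```python
# Floor division rounds toward negative infinity.
q1 = 7 // 2        # 3
q2 = -7 // 2       # -4 (C99 gives -3)
# The remainder takes the sign of the divisor, so that
# (a // b) * b + (a % b) == a always holds.
r2 = -7 % 2        # 1 (C99 gives -1)
true_div = 7 / 2.0 # 3.5: a float operand gives true division
```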

Strings

Examples of working with strings:

>>> "Here's a string"
… "Here's a string"
>>> 'This "is" it'
… 'This "is" it'
>>> print "Newline\n\"and\"\ttab"
… Newline
… "and"   tab
>>> "C:\\temp"
… 'C:\\temp'
>>> r"C:\temp"
    # in a raw string ('r' prefix), the backslash is not an escape character
… 'C:\\temp'
>>> """Multi-
line string"""
… 'Multi-\nline string'
>>> '''Multi-
  line string'''
… 'Multi-\n  line string'

You can convert any object to human-readable form by using the “str” function. The “repr” function has a similar purpose: If possible, it returns a string representation that you can later pass to the “eval” function to turn the string back into an object.

>>> str(5)
… '5'
>>> repr(5)
… '5'
>>> str(3.8)
… '3.8'
>>> repr(3.8)
… '3.7999999999999998'
>>> repr({1: 3, 2: 4})
… '{1: 3, 2: 4}'
>>> eval('{1: 3, 2: 4}')
… {1: 3, 2: 4}
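Because “eval” executes arbitrary code, it is risky to use on strings from untrusted sources. For strings that contain only literals, such as the “repr” output of numbers, strings, lists, tuples, and dicts, the standard library's “ast.literal_eval” (available since Python 2.6) is a safer choice. Here is a short sketch:

```python
import ast

data = {1: 3, 2: 4}
text = repr(data)                  # '{1: 3, 2: 4}'
restored = ast.literal_eval(text)  # only literals are evaluated

# Anything that is not a plain literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```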

Some useful string methods and operators:

>>> "Hello, " + "World!"
… 'Hello, World!'
>>> "x" * 10
… 'xxxxxxxxxx'
>>> w = '    word  '
>>> w.strip()
… 'word'
>>> w.lstrip()
… 'word  '
>>> w.rstrip()
… '    word'
>>> s = "Edward Kennedy Ellington"
>>> s[1]
… 'd'
>>> s.startswith("Edw")
… True
>>> "Kennedy" in s
… True
>>> s.lower()
… 'edward kennedy ellington'
>>> s.upper().replace("E", "U")
… 'UDWARD KUNNUDY ULLINGTON'

Using the slice notation, you can access sub-strings:

>>> s = "Hello, World!"
>>> s[1:]    # sub-string starting at index 1
… 'ello, World!'
>>> s[:3]    # sub-string up to but not including index 3
… 'Hel'
>>> s[2:5]
… 'llo'
>>> len(s[2:5]) == 5 - 2
… True
>>> s[-1]    # the last character
… '!'
>>> s[:-2]    # everything except the last two characters
… 'Hello, Worl'

Strings are immutable, i.e., you can’t modify them after they were created. For example, you can’t use the subscript operator “[]” to overwrite characters in the string.

>>> s[1] = 'x'
… TypeError: 'str' object does not support item assignment
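To get a modified string, you build a new one instead. The sketch below shows two common ways: concatenating slices, and editing a list of characters that is then joined back together.

```python
s = "Hello, World!"

# Build a new string from slices around the character to replace.
t = s[:1] + "x" + s[2:]

# Alternatively, edit a list of characters and join it back together.
chars = list(s)
chars[1] = "x"
t2 = "".join(chars)
```

Both “t” and “t2” are “'Hxllo, World!'”, while “s” itself is unchanged.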

The “%” operator provides printf-like functionality:

>>> "%s has the value %i" % ("X", 25)
… 'X has the value 25'
>>> "%s has the value %s" % ("X", 25)
… 'X has the value 25'
>>> val1 = 3
>>> val2 = 8
>>> "%(val1)i and %(val2)i" % locals()
… '3 and 8'
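As with printf in C, the format specifiers also accept a width, a precision, and flags. A few examples:

```python
pi = 3.14159
a = "%5.2f" % pi    # ' 3.14': width 5, two decimals
b = "%03i" % 7      # '007': zero-padded to width 3
c = "%-5s|" % "ab"  # 'ab   |': left-aligned in width 5
d = "%x" % 255      # 'ff': hexadecimal
e = "100%%" % ()    # '100%': "%%" produces a literal percent sign
```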

For scanf-like functionality, you should use regular expressions. See the “re” module in the standard library. We’ll cover this in one of the next lessons.

What we didn’t cover: There’s a separate class for Unicode strings. We’ll get back to this in one of the next lessons.

If you’re interested, you can try it out yourself:

>>> u"äöü"
… u'\xe4\xf6\xfc'
>>> u"äöü".encode("utf-8")
… '\xc3\xa4\xc3\xb6\xc3\xbc'
>>> _.decode("utf-8")
… u'\xe4\xf6\xfc'

Lists

Python has a built-in list data type that can store objects of arbitrary types.

>>> ls = [1, "text", 3, [4, 5]]
>>> ls
… [1, 'text', 3, [4, 5]]
>>> ls[1]
… 'text'
>>> ls[1] = 2
>>> ls
… [1, 2, 3, [4, 5]]
>>> ls[1:3]
… [2, 3]
>>> ls[1:3] = []    # same as del ls[1:3]
>>> ls
… [1, [4, 5]]
>>> ls[1:1] = [4, 3, 5, 2]    # an empty slice inserts before index 1
>>> ls
… [1, 4, 3, 5, 2, [4, 5]]
>>> del ls[-1]
>>> ls
… [1, 4, 3, 5, 2]
>>> ls.sort()
>>> ls
… [1, 2, 3, 4, 5]
>>> ls.reverse()
>>> ls
… [5, 4, 3, 2, 1]
>>> ls.append(11)
>>> ls
… [5, 4, 3, 2, 1, 11]
>>> ls.pop()
… 11
>>> ls
… [5, 4, 3, 2, 1]
>>> ls.extend([11, 12, 13])
>>> ls
… [5, 4, 3, 2, 1, 11, 12, 13]
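Whether slice assignment inserts or replaces depends on the slice on the left side. An empty slice inserts the new elements; a non-empty slice is replaced by them. Here is a short sketch:

```python
ls = [1, 2, 3]
ls[1:1] = ["a", "b"]   # empty slice: inserts before index 1

ls2 = [1, 2, 3]
ls2[1:2] = ["a", "b"]  # one-element slice: replaces ls2[1]
```

Afterwards, “ls” is “[1, 'a', 'b', 2, 3]” and “ls2” is “[1, 'a', 'b', 3]”.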

Tuples

You can think of tuples as fixed-length lists. Here are a few examples:

>>> t = (1, 2, "text")
>>> t
… (1, 2, 'text')
>>> len(t)
… 3
>>> t[0]
… 1
>>> t[1] = 5
… TypeError: 'tuple' object does not support item assignment
>>> a, b, c = t
>>> print a, b, c
… 1 2 text
>>> u = a, b
>>> u
… (1, 2)
>>> empty = ()
>>> empty
… ()
>>> singleton = (1,)
>>> singleton
… (1,)
>>> list(singleton)
… [1]
>>> tuple([1, 2, 3])
… (1, 2, 3)
>>> i, (name, age) = (0, ("John", 4))
>>> print i, name, age
… 0 John 4
>>> # BTW, this works with lists just as well:
>>> i, (name, age) = [1, ["Sue", 27]]
>>> print i, name, age
… 1 Sue 27
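Tuple packing and unpacking also give you a common idiom for swapping two variables without a temporary variable:

```python
a, b = 1, 2
# The right side is packed into the tuple (2, 1) first, then unpacked into a and b.
a, b = b, a
```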

Dictionaries

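Dictionaries are associative containers, i.e., mappings between keys and values. Python dictionaries are implemented as hash tables. The values can be of arbitrary types; the keys must be hashable, which includes numbers, strings, and tuples of hashable objects, but not lists or dicts. Here are a few examples: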

>>> d = {5: 3.2,
>>>      "John": 4,
>>>      (1, 2): [3, 4, 5]}
>>> d
… {5: 3.2, 'John': 4, (1, 2): [3, 4, 5]}
>>> d[5]
… 3.2
>>> d["John"] = "Doe"
>>> d
… {5: 3.2, 'John': 'Doe', (1, 2): [3, 4, 5]}
>>> d["Jane"] = "Doe"
>>> d
… {5: 3.2, 'John': 'Doe', 'Jane': 'Doe', (1, 2): [3, 4, 5]}
>>> d.keys()
… [5, 'John', 'Jane', (1, 2)]
>>> d.values()
… [3.2, 'Doe', 'Doe', [3, 4, 5]]
>>> d.items()
… [(5, 3.2), ('John', 'Doe'), ('Jane', 'Doe'), ((1, 2), [3, 4, 5])]
>>> d.update({5: 4.8, 7: 3})
>>> d
… {5: 4.8, 'John': 'Doe', 7: 3, 'Jane': 'Doe', (1, 2): [3, 4, 5]}
>>> d2 = dict([(5, 3.2), ("John", "Doe")])
>>> d2
… {'John': 'Doe', 5: 3.2}
>>> # Dicts can be used with the “%” operator for strings
>>> "My name is %(John)s" % d2
… 'My name is Doe'
>>> del d2["John"]
>>> d2
… {5: 3.2}
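Looking up a missing key with “d[key]” raises a “KeyError”. The “get” method returns a default value instead, which makes counting occurrences straightforward. Here is a sketch with made-up input:

```python
words = "the cat and the hat and the bat".split()
counts = {}
for w in words:
    # get() returns the default (0) instead of raising KeyError for new keys.
    counts[w] = counts.get(w, 0) + 1
```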

Sets

Sets are similar to lists, but their values are unique and unordered. The “set” data type supports operations such as computing the union, difference, and intersection of sets.

>>> s = set([10, 3, 7, 3, 5, 4, 4])
>>> s
… set([10, 3, 4, 5, 7])
>>> s.add(8)
>>> s
… set([3, 4, 5, 7, 8, 10])
>>> s.difference([7, 10, 4])
… set([3, 5, 8])
>>> s.union([3, "Joe"])
… set([3, 4, 5, 7, 8, 10, 'Joe'])
>>> s.intersection([1, 5, 7])
… set([5, 7])
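The same operations are also available as operators. The operators require both operands to be sets, while the methods accept any iterable (as the list arguments above show):

```python
s = set([3, 4, 5, 7, 8, 10])
t = set([1, 5, 7])

union = s | t   # same as s.union(t)
common = s & t  # same as s.intersection(t)
only_s = s - t  # same as s.difference(t)
either = s ^ t  # elements that are in exactly one of the two sets
```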

Control Structures

If Statements

This is an example of an if-elif-else statement:

if x == 5:
    This()
    AndThat()
elif y > 3 and z != 4:
    Other()
else:
    SomethingElseEntirely()

This, of course, is the same as:

if x == 5:
    This()
    AndThat()
else:
    if y > 3 and z != 4:
        Other()
    else:
        SomethingElseEntirely()

If the Boolean expression gets too long, you might want to introduce a line break. The following attempt, however, would result in a syntax error:

if y > 3
  and z != 4:
    …

You must either use the line continuation character “\”:

if y > 3 \
  and z != 4:
    …

Or enclose the expression in parentheses:

if (y > 3
  and z != 4):
    …

Python does not check the indentation inside parentheses, brackets (as used for lists), and curly braces (as used for dicts), so you can always insert line breaks inside those.

While Loops

Here’s an example for a “while” loop in Python:

x = 1
while x < 100:
    print x
    x *= 3

Note: There is no equivalent to “do-while” loops in Python.
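The usual workaround is a “while True” loop with a “break” at the end, so that the body runs at least once before the condition is checked. The sketch below wraps the loop from the previous example in a hypothetical function:

```python
def CollectPowers(start, limit, factor=3):
    """Emulate a do-while loop: the body runs at least once."""
    values = []
    x = start
    while True:
        values.append(x)    # loop body
        x *= factor
        if not x < limit:   # loop condition, checked at the end
            break
    return values
```

“CollectPowers(1, 100)” returns “[1, 3, 9, 27, 81]”. “CollectPowers(500, 100)” still returns “[500]”, because the body runs once before the condition is checked.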

For Loops

In Python, “for” loops are used exclusively to iterate over sequences. If you want something like “for (int i = 0; i < 10; ++i)”, use the “xrange” function, which returns a sequence of integers:

for i in xrange(10):
    print i,

Output: 0 1 2 3 4 5 6 7 8 9

ls = [6, 1, 5, 3, 7]
for x in ls:
    print x,

for x in sorted(ls):
    print x,

for x in reversed(ls):
    print x,

Output:

6 1 5 3 7
1 3 5 6 7
7 3 5 1 6

d = {1: 10, 3: 30, 5: 50}
for k in d:
    print k,

Output: 1 3 5

d = {1: 10, 3: 30, 5: 50}
for k, v in d.iteritems():
    print "%i = %i" % (k, v)

Output:

1 = 10
3 = 30
5 = 50
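Python 2 dicts do not guarantee any particular iteration order; in the examples above, the keys just happen to come out sorted. When the order matters, sort explicitly, for example by iterating over “sorted(d.items())”:

```python
d = {5: 50, 1: 10, 3: 30}
lines = []
for k, v in sorted(d.items()):  # the (key, value) pairs are sorted by key
    lines.append("%i = %i" % (k, v))
```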

people = [("John", 4),
           ("Sue", 27),
           ("Frank", 15),
           ("Clara", 8)]
for i, (name, age) in enumerate(people):
    print "%i: %s is %i years old" % (i, name, age)

Output:

0: John is 4 years old
1: Sue is 27 years old
2: Frank is 15 years old
3: Clara is 8 years old

The “break” and “continue” keywords known from C exist in Python as well.

people = [("John", 4),
           ("Sue", 27),
           ("Frank", 15),
           ("Clara", 8)]
for i, (name, age) in enumerate(people):
    if age > 25:
        continue
    print "%i: %s is %i years old" % (i, name, age)
    if age == 15:
        break

Output:

0: John is 4 years old
2: Frank is 15 years old

Python supports an “else” clause for loops. Consider this common construct:

found = False
for name, age in people:
    if age > 25:
        found = True
        break
if not found:
    print "No one is older than 25."

This can be written shorter using “for-else”:

for name, age in people:
    if age > 25:
        break
else:
    print "No one is older than 25."

The “else” clause is entered when the iteration continued all the way to the end, i.e., if no “break” was executed.

The “while” loop supports an “else” clause as well.
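As a sketch of “while-else”, here is a hypothetical function that searches a list by index. (Lists already have an “index” method, which raises a “ValueError” instead of returning -1.)

```python
def FindIndex(values, target):
    """Return the index of target in values, or -1 if it is not present."""
    i = 0
    while i < len(values):
        if values[i] == target:
            break
        i += 1
    else:
        # Reached only if the loop condition became false without a break.
        return -1
    return i
```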

Defining Functions

Functions are defined using the “def” keyword. Here’s a simple function taking two parameters:

def Add(a, b):
    return a + b

# Invoking it
print Add(5, 3)    # prints 8
print Add("Hello, ", "World!")    # prints “Hello, World!”
print Add([1, 2], [3, 4])    # prints [1, 2, 3, 4]

Things to note:

  • You don’t declare the data type of the parameters or of the return type
    • The function can be invoked with any two objects that support the “+” operator, similar to how template functions work in C++.
  • The indentation alone determines where the function ends

Note: C++ programmers might wonder whether the lack of explicit data type declarations results in more bugs. My experience is that it’s very rare for a bug in a Python program to be caused by a data type mismatch. It is still a good idea to have a unit-testing suite with high code coverage for your Python code. This way, you can often detect such problems faster than the C++ compiler could parse the header files. :-)

Default parameters are also possible:

def Greet(greeting, who="World"):
    print "%s, %s!" % (greeting, who)

# Invoking it
Greet("Hello")    # prints “Hello, World!”
Greet("Good morning", "Vietnam")    # prints “Good morning, Vietnam!”

You can always specify the parameter names explicitly, in any order, regardless of whether there’s a default value or not:

Greet(who="Forrest", greeting="Run")    # prints “Run, Forrest!”

Another example, with several default parameters:

def Print(a, b=1, c=2, d=3):
    print a, b, c, d
    # If there is no explicit “return”, this is equivalent to
    # return None

# Invoke it:
Print(0)    # prints 0 1 2 3
Print(0, 4, 8)    # prints 0 4 8 3
Print(0, c=5)    # prints 0 1 5 3
Print(0, c=7, 33)    # syntax error “non-keyword arg after keyword arg”

Returning More Than One Value

It is possible to return more than one value. More precisely, you can return a tuple and assign the elements of the tuple to individual variables. There is no magic involved here. We’ve seen all of this in the section about Tuples already.

def GetPerson():
    return "John", 4

name, age = GetPerson()
print name, age
person = GetPerson()
print person
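The same pattern works for any function that naturally produces several results. Here is a small sketch with a hypothetical “MinMax” function:

```python
def MinMax(values):
    """Return the smallest and the largest element as a tuple."""
    return min(values), max(values)


low, high = MinMax([6, 1, 5, 3, 7])  # unpacked into two names
result = MinMax([6, 1, 5, 3, 7])     # or kept as one tuple: (1, 7)
```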

Parameter Passing

When you pass parameters to a function, a reference to the passed object is added to the local scope of the function.

  • When you modify the object in-place, the caller sees the changes, because the local parameter name is a reference to the same object that the caller sees.
  • When you assign a new value to the local parameter name, the object doesn’t change. The parameter name is now a reference to a different object, but this does not affect the contents of the original object.

def Func(a, b, c):
    a += "!!!"
    b.append(5)
    c = []

txt = "???"
list1 = [1, 2, 3]
list2 = [1.0, 2.0]
Func(txt, list1, list2)
print "txt =", txt
print "list1 =", list1
print "list2 =", list2

Output:

txt = ???
list1 = [1, 2, 3, 5]
list2 = [1.0, 2.0]

This is what happens:

  • When the function is invoked, it creates three local names that contain references to the passed objects: a = txt, b = list1, c = list2
  • The line a += “!!!” assigns a new object reference to the local name a. It does not change the string in-place. (Strings cannot be changed in-place at all because they are immutable objects.)
  • The line b.append(5) appends a value to the object pointed to by b. This is the same object that the caller knows by the name list1.
  • The line c = [] assigns a new object reference to the local name c. It does not modify the object that c referred to earlier.
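If a function should change a list that the caller owns, it has to modify the object in-place, for example with slice assignment. For immutable objects such as strings, the function returns the new object instead. Here is a sketch with hypothetical function names:

```python
def ClearRebind(c):
    c = []                # rebinds the local name only; the caller sees no change


def ClearInPlace(c):
    c[:] = []             # slice assignment empties the caller's list


def AppendSuffix(text):
    return text + "!!!"   # strings are immutable, so return the new object


list1 = [1.0, 2.0]
ClearRebind(list1)
after_rebind = list(list1)    # still [1.0, 2.0]
ClearInPlace(list1)
after_in_place = list(list1)  # now []
txt = AppendSuffix("???")     # the caller rebinds its own name to the result
```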

Working with Text Files

The built-in “open” function returns file objects for existing or new files. File objects have “read” and “write” methods. To write to a file, you can also use the “print” statement with the “>>” syntax, which redirects its output to the file object.

This code creates a new file:

f = open("new.txt", "w")
f.write("This is the first line\n")
print >> f, "The second line"
print >> f, "\n".join(["a", "b", "c", "d"])
f.close()

The new file contains these lines:

This is the first line
The second line
a
b
c
d

There are several ways of reading a text file. You can read the entire contents at once as a string:

>>> open("new.txt").read()
… 'This is the first line\nThe second line\na\nb\nc\nd\n'

You can read the entire contents at once as a list of lines:

>>> open("new.txt").readlines()
… ['This is the first line\n', 'The second line\n', 'a\n', 'b\n', 'c\n', 'd\n']

You can iterate over the file object, which reads the lines one by one:

>>> for ln in open("new.txt"):
>>>     print ln.rstrip("\n")    # strip the newline at the end of each line
… This is the first line
… The second line
… a
… b
… c
… d

You can call the “readline” method repeatedly until it returns an empty string:

>>> f = open("new.txt")
>>> while True:
>>>     ln = f.readline()
>>>     if not ln:
>>>         break
>>>     print ln.rstrip("\n")    # strip the newline at the end of each line
… This is the first line
… The second line
… a
… b
… c
… d
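The examples above do not close the files explicitly; in CPython, a file object is closed once it is no longer referenced. The “with” statement closes the file deterministically at the end of the block, even if an exception occurs. It is available since Python 2.6, or in Python 2.5 with “from __future__ import with_statement”. The following sketch writes and reads back a temporary file; the helper name is hypothetical:

```python
import os
import tempfile


def WriteAndReadLines(lines):
    """Write lines to a temporary text file and read them back without newlines."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # the file is reopened by name below
    try:
        with open(path, "w") as f:  # closed automatically at the end of the block
            for ln in lines:
                f.write(ln + "\n")
        with open(path) as f:
            return [ln.rstrip("\n") for ln in f]
    finally:
        os.remove(path)
```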

Using the Standard Library

Python comes with a comprehensive library of modules for various tasks. See the Global Module Index in the Python Manuals for a list of available modules.

This is a list of some of the most commonly used modules and their purpose:

  • os: Operating system interfaces for working with files and directories, accessing environment variables, invoking external programs, etc.
  • sys: Command-line arguments, file objects for STDOUT and STDERR, internal interpreter variables, etc.
  • pprint: Pretty-print Python objects such as lists and dicts
  • StringIO: Provides the StringIO class, which behaves like an in-memory file. Similar to std::stringstream in C++.
  • unittest: Unit-testing framework
  • Many others…

To use any of these modules, use the “import” keyword. Once imported, you can use the classes and functions defined in the module:

>>> import os
>>> help(os)
… help text stripped
>>> os.listdir("c:\\temp")
… ['many', 'files', 'in', 'here']
>>> from os import listdir
>>> listdir("c:\\temp")

We’ll take a closer look at the standard library in the next lessons.
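As a small taste, “pprint.pformat” returns the pretty-printed text instead of printing it, and it sorts the keys of dicts. Structures that do not fit on one line are broken across several lines. Here is a sketch with made-up data:

```python
import pprint

summary = {"name": "scene.xml", "elements": 2995, "tags": ["MOB_Object", "MOB_Text"]}
text = pprint.pformat(summary)  # like pprint.pprint, but returns the string
```

Here, “text” is the single line “{'elements': 2995, 'name': 'scene.xml', 'tags': ['MOB_Object', 'MOB_Text']}”.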
