<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>realmike.org</title>
	<atom:link href="http://realmike.org/blog/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://realmike.org/blog</link>
	<description>Python and C++, GNU/Linux, computer stuff...</description>
	<lastBuildDate>Thu, 27 Sep 2012 20:00:23 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Embedding Python &#8211; Tutorial &#8211; Part 1</title>
		<link>http://realmike.org/blog/2012/07/08/embedding-python-tutorial-part-1/</link>
		<comments>http://realmike.org/blog/2012/07/08/embedding-python-tutorial-part-1/#comments</comments>
		<pubDate>Sun, 08 Jul 2012 14:56:42 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[embedding]]></category>
		<category><![CDATA[europython]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=680</guid>
		<description><![CDATA[This is a follow-up to my talk at EuroPython 2012, &#8220;Supercharging C++ Code with Embedded Python&#8221;. An embedded Python interpreter allows users to extend the functionality of the program by writing Python plug-ins. In this series of tutorials, I will give you step-by-step instructions on how to use the Python/C API to do this. I <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2012/07/08/embedding-python-tutorial-part-1/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<p>This is a follow-up to my talk at EuroPython 2012, &#8220;<a title="Supercharging C++ Code With Embedded Python – EuroPython 2012 Talk" href="http://realmike.org/blog/2012/07/05/supercharging-c-code-with-embedded-python/">Supercharging C++ Code with Embedded Python</a>&#8221;. An embedded Python interpreter allows users to extend the functionality of the program by writing Python plug-ins. In this series of tutorials, I will give you step-by-step instructions on how to use the Python/C API to do this.</p>
<p>I assume that you know how to write and compile C/C++ programs. If you have prior experience with writing Python extension modules, it may be helpful, although it&#8217;s not required. In <a title="Python Extensions In C++ Using SWIG" href="http://realmike.org/blog/2010/07/18/python-extensions-in-cpp-using-swig/">my article about extending Python</a>, you can find instructions for setting up your Makefiles/workspaces when working with the Python/C API.</p>
<h2>The Example Program</h2>
<p>In this part, we&#8217;re going to add Python plug-ins to a simple C++ program. The program reads lines of text from STDIN and outputs them unmodified to STDOUT.</p>
<pre>#include &lt;iostream&gt;
#include &lt;string&gt;

int main(int argc, char* argv[])
{
    std::clog &lt;&lt; "Type lines of text:" &lt;&lt; std::endl;
    std::string input;
    while (true)
    {
        std::getline(std::cin, input);
        if (!std::cin.good())
        {
            break;
        }
        std::cout &lt;&lt; input &lt;&lt; std::endl;
    }
    return 0;
}</pre>
<p>Our goal is to let users write Python plug-ins that modify each line of text before it is printed. Such plug-ins will look something like this:</p>
<pre># elmer_fudd_filter.py
def filterFunc(s):
    return s.replace("r", "w").replace("l", "w")

# shout_filter.py
def filterFunc(s):
    return s.upper()</pre>
<p>To make this work, we will link the program to the Python interpreter and use the Python/C API to import the plug-in module and to invoke the &#8220;filterFunc()&#8221; function inside it.</p>
<p><span id="more-680"></span></p>
<p>As a first step in that direction, we simply initialize the Python interpreter with &#8220;Py_Initialize()&#8221; and shut it down with &#8220;Py_Finalize()&#8221;.</p>
<pre>#include &lt;Python.h&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;

int main(int argc, char* argv[])
{
    Py_Initialize();
    std::clog &lt;&lt; "Type lines of text:" &lt;&lt; std::endl;
    std::string input;
    while (true)
    {
        std::getline(std::cin, input);
        if (!std::cin.good())
        {
            break;
        }
        std::cout &lt;&lt; input &lt;&lt; std::endl;
    }
    Py_Finalize();
    return 0;
}</pre>
<p>According to the <a href="http://docs.python.org/c-api/intro.html#includes">Python documentation</a>, the include directive for &#8220;Python.h&#8221; should appear first in the C/C++ file, before any standard headers. This may seem like an odd requirement, but &#8220;Python.h&#8221; defines some preprocessor symbols that affect the standard headers on some systems. If you don&#8217;t put &#8220;Python.h&#8221; first, you might get compiler warnings like these:</p>
<pre>In file included from /usr/include/python2.7/Python.h:8:0,
                 from program.cpp:3:
/usr/include/python2.7/pyconfig.h:1161:0: warning: "_POSIX_C_SOURCE" redefined [enabled by default]
/usr/include/features.h:164:0: note: this is the location of the previous definition
/usr/include/python2.7/pyconfig.h:1183:0: warning: "_XOPEN_SOURCE" redefined [enabled by default]
/usr/include/features.h:166:0: note: this is the location of the previous definition</pre>
<p>To compile and link the program, you need to have the Python headers and the Python library on your machine. On Windows, the headers are in the &#8220;include&#8221; directory of the Python installation, and the .lib file (an import library for the Python DLL) is in the &#8220;libs&#8221; directory. On GNU/Linux, these files come with the Python development package. On my Ubuntu 12.04 system, I had to install &#8220;python-dev&#8221;: &#8220;sudo apt-get install python-dev&#8221;.</p>
<p>Here&#8217;s an example command line to build the program using GCC:</p>
<pre>g++ -o program program.cpp -I/usr/include/python2.7 -Wall -lpython2.7</pre>
<p>If you are on Windows and using Visual Studio, see my article about extending Python if you need help setting up your project.</p>
<h2>Invoking the Plug-In</h2>
<p>Once this works, we can start adding the code to call the Python plug-in. To find the appropriate Python/C API calls, it always helps to think about the equivalent Python code first. Written in Python, our finished program might look something like this:</p>
<pre>PLUGIN_NAME = "shout_filter"

def CallPlugIn(ln):
    plugin = __import__(PLUGIN_NAME)
    filterFunc = getattr(plugin, "filterFunc")
    args = (ln,)
    ln = filterFunc(*args)
    return ln

while True:
    ln = raw_input()
    if not ln:
        break
    ln = CallPlugIn(ln)
    print ln</pre>
<p>The &#8220;CallPlugIn()&#8221; function imports the &#8220;shout_filter.py&#8221; module, retrieves the &#8220;filterFunc()&#8221; function, and invokes it. The function could certainly be written in a more concise, more Pythonic way. However, it&#8217;s easier to find the corresponding Python/C API calls when the code is broken down into its basic building blocks like &#8220;__import__()&#8221; and &#8220;getattr()&#8221;.</p>
<p>By digging through the <a href="http://docs.python.org/c-api/index.html">Python/C API reference manual</a>, we can find the API calls for each piece of Python code:</p>
<table>
<tbody>
<tr>
<th>Python</th>
<th>Python/C API</th>
</tr>
<tr>
<td><code>__import__</code></td>
<td><code>PyImport_Import()</code></td>
</tr>
<tr>
<td><code>getattr</code></td>
<td><code>PyObject_GetAttrString()</code></td>
</tr>
<tr>
<td><code>args = (ln,)</code></td>
<td><code>Py_BuildValue()</code></td>
</tr>
<tr>
<td><code>filterFunc(*args)</code></td>
<td><code>PyObject_CallObject()</code></td>
</tr>
</tbody>
</table>
<p>Thus, the first attempt at writing the &#8220;CallPlugIn()&#8221; function in C++ looks like this:</p>
<pre>static const char* PLUGIN_NAME = "shout_filter";

std::string CallPlugIn(const std::string&amp; ln)
{
    PyObject* pluginModule = PyImport_Import(PyString_FromString(PLUGIN_NAME));
    PyObject* filterFunc = PyObject_GetAttrString(pluginModule, "filterFunc");
    PyObject* args = Py_BuildValue("(s)", ln.c_str());
    PyObject* result = PyObject_CallObject(filterFunc, args);
    return PyString_AsString(result);
}</pre>
<p>In the main routine, call the function before printing the line of text:</p>
<pre>std::cout &lt;&lt; CallPlugIn(input) &lt;&lt; std::endl;</pre>
<p>In its current form, the &#8220;CallPlugIn()&#8221; function has two major problems:</p>
<ul>
<li>There&#8217;s no error checking. When anything goes wrong (the module can&#8217;t be imported, &#8220;filterFunc()&#8221; raises an exception, etc.), the program will likely crash.</li>
<li>There are memory leaks. We create a number of objects, but we never decrement their reference counts. Eventually, the program will run out of memory.</li>
</ul>
<p>We will fix the leaks later. For now, let&#8217;s at least return an error message if anything goes wrong. Most API calls that return a &#8220;PyObject*&#8221; return a NULL pointer if an error occurred. In that case, it is the caller&#8217;s responsibility to handle or report the error and to clear Python&#8217;s internal error indicator with &#8220;PyErr_Clear()&#8221;. To print the traceback of the last error, use &#8220;PyErr_Print()&#8221;, which has the side effect of also clearing the error indicator. It&#8217;s important to clear the error as soon as possible; otherwise, subsequent Python calls might fail in unexpected ways or give you misleading error messages.</p>
<pre>std::string CallPlugIn(const std::string&amp; ln)
{
    PyObject* pluginModule = PyImport_Import(PyString_FromString(PLUGIN_NAME));
    if (!pluginModule)
    {
        PyErr_Print();
        return "Error importing module";
    }
    PyObject* filterFunc = PyObject_GetAttrString(pluginModule, "filterFunc");
    if (!filterFunc)
    {
        PyErr_Print();
        return "Error retrieving 'filterFunc'";
    }
    PyObject* args = Py_BuildValue("(s)", ln.c_str());
    if (!args)
    {
        PyErr_Print();
        return "Error building args tuple";
    }
    PyObject* result = PyObject_CallObject(filterFunc, args);
    if (!result)
    {
        PyErr_Print();
        return "Error invoking 'filterFunc'";
    }
    const char* cResult = PyString_AsString(result);
    if (!cResult)
    {
        PyErr_Print();
        return "Error converting result to C string";
    }
    return cResult;
}</pre>
<p>Create the file &#8220;shout_filter.py&#8221; in the same directory that you run the program from and add a valid &#8220;filterFunc()&#8221; function:</p>
<pre>def filterFunc(ln):
    return ln.upper()</pre>
<p>When you run the program now, you will most likely get an error: &#8220;Error importing module&#8221;. Why is that?</p>
<p>Normally, when you run a script with the Python executable, modules that sit next to the script are found because the script&#8217;s directory is placed at the front of &#8220;sys.path&#8221;. The current working directory is usually not on the search path. In our case, the import is performed via the API, so there is no script in whose directory Python could search for &#8220;shout_filter.py&#8221;, and the plug-in is not in any of the other directories on &#8220;sys.path&#8221; either. One way of enabling Python to find the plug-in is to add the current working directory to the module search path by doing the equivalent of &#8220;sys.path.append(&#8216;.&#8217;)&#8221; via the API.</p>
<pre>Py_Initialize();
PyObject* sysPath = PySys_GetObject((char*)"path");
PyList_Append(sysPath, PyString_FromString("."));</pre>
<p>Run the program again and type a few lines of text. Everything should work now. You can also try to deliberately introduce errors into the plug-in function and see whether they are caught by the program.</p>
<p>When you think about it, this is quite an achievement: You just added a full-fledged scripting language to your program with an amazingly small amount of code.</p>
<h2>Reference Counting</h2>
<p>As I mentioned, this program leaks memory pretty badly. For example, every time we run &#8220;Py_BuildValue()&#8221;, a tuple and a string object are created, but they are never freed. The reference count (refcount) of the tuple is initially 1 and we never decrement it, so the object remains alive forever. The next time we run &#8220;CallPlugIn()&#8221;, a new object is created.</p>
<p>Whenever you receive a &#8220;PyObject*&#8221; via the Python/C API, you need to figure out whether you are responsible for decrementing its refcount. The API docs distinguish between these cases:</p>
<ul>
<li><strong>New reference.</strong> The refcount has been incremented before the object was returned to the caller. The caller is responsible for decrementing the refcount with &#8220;Py_DECREF()&#8221; when the object is no longer needed.</li>
<li><strong>Borrowed reference.</strong> The refcount has <strong>not</strong> been incremented before the object was returned. For example, the &#8220;PyTuple_GetItem()&#8221; function, which returns an item of a tuple, returns a borrowed reference. You can work with the item normally, at least as long as the item is still in the tuple. When the tuple is destroyed, though, the item may be destroyed too (if its refcount reaches zero). If you need to keep a reference to the item for longer, you are responsible for incrementing the item&#8217;s refcount yourself with &#8220;Py_INCREF()&#8221;.</li>
</ul>
<p>The docs also talk about &#8220;<strong>stolen references</strong>&#8221;. Sometimes when you pass an object to an API, the API will &#8220;steal&#8221; the reference, which means that the API will take care of decrementing the refcount at some point and that the caller must refrain from doing the same.</p>
<p>The &#8220;CallPlugIn()&#8221; function with added reference counting:</p>
<pre>std::string CallPlugIn(const std::string&amp; ln)
{
    PyObject* name = PyString_FromString(PLUGIN_NAME);
    PyObject* pluginModule = PyImport_Import(name);
    Py_DECREF(name);
    if (!pluginModule)
    {
        PyErr_Print();
        return "Error importing module";
    }
    PyObject* filterFunc = PyObject_GetAttrString(pluginModule, "filterFunc");
    Py_DECREF(pluginModule);
    if (!filterFunc)
    {
        PyErr_Print();
        return "Error retrieving 'filterFunc'";
    }
    PyObject* args = Py_BuildValue("(s)", ln.c_str());
    if (!args)
    {
        PyErr_Print();
        Py_DECREF(filterFunc);
        return "Error building args tuple";
    }
    PyObject* resultObj = PyObject_CallObject(filterFunc, args);
    Py_DECREF(filterFunc);
    Py_DECREF(args);
    if (!resultObj)
    {
        PyErr_Print();
        return "Error invoking 'filterFunc'";
    }
    const char* resultStr = PyString_AsString(resultObj);
    if (!resultStr)
    {
        PyErr_Print();
        Py_DECREF(resultObj);
        return "Error converting result to C string";
    }
    std::string result = resultStr;
    Py_DECREF(resultObj);
    return result;
}</pre>
<p>Note that I try to decrement the refcount of each object as soon as possible. For example, after retrieving the &#8220;filterFunc&#8221; callable from the &#8220;pluginModule&#8221; object, we can immediately decrement the refcount of the &#8220;pluginModule&#8221; object. The underlying module will not go away, since its reference count is not zero yet (&#8220;sys.modules&#8221; still holds a reference to it).</p>
<p>We also need to make sure that the refcount is properly decremented even if we leave the function early due to an error. For example, when we fail to build the arguments tuple, we decrement the refcount of the &#8220;filterFunc&#8221; (the only object we have a reference to at that point in the code) before leaving the function.</p>
<p>At the end of the function, we must not decrement the refcount of the &#8220;resultObj&#8221; string object before we have created a copy of the underlying C string. (The pointer returned by &#8220;PyString_AsString()&#8221; is only valid as long as the string object has a refcount greater than zero.)</p>
<p>We created another temporary object when setting up &#8220;sys.path&#8221;. This must be freed as well:</p>
<pre>Py_Initialize();
PyObject* sysPath = PySys_GetObject((char*)"path");
PyObject* curDir = PyString_FromString(".");
PyList_Append(sysPath, curDir);
Py_DECREF(curDir);</pre>
<p>Note that &#8220;Py_DECREF()&#8221; is not called on &#8220;sysPath&#8221; since that one is a borrowed reference.</p>
<p>You might already see the problem with this: It is way too easy to make mistakes. If you forget to call &#8220;Py_DECREF()&#8221;, you have a leak. If you call &#8220;Py_DECREF()&#8221; on a borrowed reference, you probably cause a crash. With error handling mixed in, it&#8217;s very easy to cause both kinds of problems. Using a C++ library that wraps PyObjects and takes care of reference counting solves these issues (mostly). This will be the topic of a future tutorial.</p>
<h2>Debugging Memory Leaks</h2>
<p>Sometimes you will need to debug memory issues whether or not you are using a C++ wrapper. For this, it is very useful to have a debug version of the Python interpreter. On GNU/Linux, you can usually just install it from the repositories. For example, on my Ubuntu 12.04 system, I had to run &#8220;sudo apt-get install python-dbg&#8221;. I then build the program with debug options:</p>
<pre>g++ -o program program.cpp -I/usr/include/python2.7 -Wall -DPy_DEBUG -g -lpython2.7_d</pre>
<p>(Don&#8217;t forget the &#8220;Py_DEBUG&#8221; preprocessor definition when compiling code that links against the debug interpreter. Otherwise, you might see crashes and errors like: &#8220;Fatal Python error: UNREF invalid object&#8221;.)</p>
<p>On Windows, you might have to compile a debug version of the Python interpreter yourself.</p>
<p>One of the things that a debug interpreter allows you to do is query the total reference count of all objects. By calling the function &#8220;sys.gettotalrefcount()&#8221; at different points in your program, you can check whether this number remains stable.</p>
<pre>void PrintTotalRefCount()
{
#ifdef Py_REF_DEBUG
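    // sys.gettotalrefcount() only exists in debug builds of the interpreter.
    PyObject* refCount = PyObject_CallObject(PySys_GetObject((char*)"gettotalrefcount"), NULL);
    if (!refCount)
    {
        PyErr_Print();
        return;
    }
    std::clog &lt;&lt; "total refcount = " &lt;&lt; PyInt_AsSsize_t(refCount) &lt;&lt; std::endl;
    Py_DECREF(refCount);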
#endif
}

...

int main(int argc, char* argv[])
{
    ...
    std::cout &lt;&lt; CallPlugIn(input) &lt;&lt; std::endl;
    PrintTotalRefCount();
    ...
}</pre>
<p>If you try to remove one of the &#8220;Py_DECREF()&#8221; calls in &#8220;CallPlugIn()&#8221;, you will notice that the total reference count goes up after each invocation.</p>
<h2>Possible Improvements</h2>
<p>The &#8220;CallPlugIn()&#8221; function in its current form is slightly inefficient. We don&#8217;t really have to re-import the plug-in module and retrieve the &#8220;filterFunc()&#8221; function each time a line of text needs to be transformed. (It&#8217;s not as bad as it may appear, though. Once the module has been imported, it remains in &#8220;sys.modules&#8221;, so each time we call &#8220;PyImport_Import()&#8221;, we receive a reference to the existing module object.) One possible optimization would be to keep a reference to the &#8220;filterFunc&#8221; object during the entire lifetime of the program. Then, for each invocation of &#8220;CallPlugIn()&#8221;, we&#8217;d merely have to build an arguments tuple and invoke &#8220;filterFunc()&#8221;.</p>
<p>If you implement this optimization, though, try to keep your regular C++ code separate from the parts that require knowledge of the Python/C API. It is nice to use only standard C++ types in the interface of &#8220;CallPlugIn()&#8221; and to keep the PyObjects and Python/C API calls an implementation detail.</p>
<h2>Next Time</h2>
<p>In the next part, we&#8217;ll start with a more complex project. Among other things, we will give the Python plug-ins access to the application classes.</p>
<p>You can download the <a href="http://realmike.org/blog/wp-content/uploads/2012/07/embedding_tutorial_1.zip">complete source code</a> for this tutorial.</p>
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2012/07/08/embedding-python-tutorial-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Supercharging C++ Code With Embedded Python &#8211; EuroPython 2012 Talk</title>
		<link>http://realmike.org/blog/2012/07/05/supercharging-c-code-with-embedded-python/</link>
		<comments>http://realmike.org/blog/2012/07/05/supercharging-c-code-with-embedded-python/#comments</comments>
		<pubDate>Thu, 05 Jul 2012 14:15:13 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[embedding]]></category>
		<category><![CDATA[ep2012]]></category>
		<category><![CDATA[europython]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[talk]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=615</guid>
		<description><![CDATA[This is the talk that I gave at EuroPython 2012 in Florence, Italy. It was a 60-minute talk, so it&#8217;s light on technical details. I am planning to publish follow-up articles that provide step-by-step instructions along with complete code examples. The first part of the tutorial is available. If you want to know when the <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2012/07/05/supercharging-c-code-with-embedded-python/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<p><em>This is the talk that I gave at EuroPython 2012 in Florence, Italy. It was a 60-minute talk, so it&#8217;s light on technical details. I am planning to publish follow-up articles that provide step-by-step instructions along with complete code examples. <strong>The <a title="Embedding Python – Tutorial – Part 1" href="http://realmike.org/blog/2012/07/08/embedding-python-tutorial-part-1/">first part of the tutorial</a> is available.</strong> If you want to know when the next part will become available, <a href="http://realmike.org/blog/feed/">subscribe to the RSS</a> or add me on <a href="https://profiles.google.com/mfoetsch">Google+</a> or on <a href="https://twitter.com/mfoetsch">Twitter</a>.</em></p>
<p><iframe src="http://www.youtube.com/embed/aK8gDUUBMiM" frameborder="0" width="560" height="315"></iframe><br />
(<a href="http://youtu.be/aK8gDUUBMiM">Watch on YouTube</a>.)</p>
<p>You can download the <a href="http://realmike.org/blog/wp-content/uploads/2012/07/slides.pdf">slides in PDF format</a> and the <a href="http://realmike.org/blog/wp-content/uploads/2012/07/unbranded_slides1.zip">slide sources in SVG format.</a> (Please note the <a href="#licensing">licensing restrictions</a>.)</p>
<p><em><strong>BTW, our team is hiring.</strong> If you&#8217;re interested in extending/embedding Python, or just interested in Python in general, you should definitely <a href="http://www.spielo.com/locations/graz">get in touch with us</a>. Benefits of the position include an agile development process, a variety of projects to work on&#8230;and a <em>beach within walking distance.</em></em></p>
<h2>About Me / About SPIELO</h2>
<p>I work as a software architect in the mathematics department at SPIELO International in Graz, Austria.</p>
<p>SPIELO International designs, manufactures and distributes cabinets, games, central systems and associated software for diverse gaming segments, including distributed government-sponsored markets and commercial casino markets.</p>
<p>Our team is responsible for the mathematical game engine that controls all payout-relevant aspects of the game. Part of the engine is an embedded Python interpreter.</p>
<h2>Embedded Python: What is it? When to use it?</h2>
<p>This talk is about embedding the Python interpreter in a C/C++ program and using the Python/C API to run Python scripts inside the program.</p>
<div id="attachment_659" class="wp-caption alignnone" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/embedding.png"><img class="size-medium wp-image-659" title="embedding" src="http://realmike.org/blog/wp-content/uploads/2012/06/embedding-300x173.png" alt="" width="300" height="173" /></a><p class="wp-caption-text">Embedding in a nutshell: Put Python inside your app to run scripts.</p></div>
<p>Here are some examples of what you can do with this technique.</p>
<p><strong>Plug-in/extension language.</strong> This is the &#8220;Microsoft Word macro&#8221; use case, in which users can extend the functionality of the program by writing their own scripts. Let&#8217;s say a user wants to apply random formatting to each word in the text. A simple macro does the trick without requiring you to add a feature that most users don&#8217;t need:</p>
<div id="attachment_660" class="wp-caption alignnone" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/typewriter_plus_code.png"><img class="size-medium wp-image-660 " title="typewriter_plus_code" src="http://realmike.org/blog/wp-content/uploads/2012/06/typewriter_plus_code-300x213.png" alt="" width="300" height="213" /></a><p class="wp-caption-text">Embedding example: Scripting a word processor.</p></div>
<p><span id="more-615"></span></p>
<p><strong>Test automation.</strong> Automated testing exercises a program&#8217;s functionality by following pre-defined steps. That&#8217;s not much different from the macro scenario. Throw in a few &#8220;assert&#8221; statements and you have a test case. For testing purposes, the embedded Python scripts might have access to functionality that would be unsafe or useless for macros.</p>
<p><strong>Game engines.</strong> Video games have been using built-in scripting languages for a long time. While performance-critical parts like graphics, physics, and low-level AI are written in C++, the scripts control things like high-level enemy behavior, map generation, and scripted events. Civilization IV is an example of a game that uses Python for this.</p>
<pre>class Guard(Enemy):
    def OnGettingHit(self, actor):
        self.findCover()
        if self.distanceTo(actor) &lt; self.maxShootingDistance:
            self.shootAt(actor)
        else:
            self.team.setAlarmed(True)</pre>
<p>All of this can be done directly in C++, of course, but there are certain advantages to using an embedded Python interpreter:</p>
<ul>
<li><strong>Ease of use.</strong> You can&#8217;t expect the users of your program or the level designers on your team to write plug-ins in C++, but many of them will be willing to learn a bit of Python to get their jobs done.</li>
<li><strong>Sandboxing.</strong> Plug-ins written in C++ can do pretty much anything on the computer. This may be a security issue. A stripped-down Python interpreter, on the other hand, provides a restricted execution environment for plug-ins.</li>
<li><strong>Flexibility.</strong> Even if you do know C++, you can try new ideas faster inside a Python script. Also, Python&#8217;s reflection capabilities open up new possibilities for automated testing.</li>
</ul>
<h2>Extending Python Recap</h2>
<p><a href="http://docs.python.org/extending/index.html">Extending and embedding Python</a> are closely related. Extending Python means taking existing C/C++ code and making its data types and functions available to Python programs. Whenever a C/C++ library offers &#8220;Python bindings&#8221;, it is a Python extension library that uses the Python/C API to plug into the interpreter.</p>
<p>Here&#8217;s the big picture.</p>
<div id="attachment_661" class="wp-caption alignnone" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/extend_my_c_lib.png"><img class="size-medium wp-image-661" title="extend_my_c_lib" src="http://realmike.org/blog/wp-content/uploads/2012/06/extend_my_c_lib-300x185.png" alt="" width="300" height="185" /></a><p class="wp-caption-text">Using an extension module from Python</p></div>
<p>When the Python code invokes the C++ function &#8220;Sum()&#8221;, three things need to happen:</p>
<ul>
<li>The parameters, 5 and 3.2, need to be converted from Python objects to the &#8220;int&#8221; objects that the C++ code understands.</li>
<li>The parameters need to be passed to the Sum() function and the CPU needs to execute the function.</li>
<li>The return value (8 if you&#8217;ve been paying attention) needs to be converted from an &#8220;int&#8221; to a Python object.</li>
</ul>
<p>The Python program is compiled to Python bytecode and executed by the Python interpreter. The C++ code is compiled to CPU instructions and executed directly by the CPU. Furthermore, what Python sees when we say &#8220;5&#8221; or &#8220;3.2&#8221; is very different from what the compiled C++ code expects to see. These are very distinct (and seemingly incompatible) worlds.</p>
<div id="attachment_663" class="wp-caption alignnone" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/extend_my_c_lib_low.png"><img class="size-medium wp-image-663" title="extend_my_c_lib_low" src="http://realmike.org/blog/wp-content/uploads/2012/06/extend_my_c_lib_low-300x203.png" alt="" width="300" height="203" /></a><p class="wp-caption-text">Incompatible worlds: Python bytecodes vs. CPU instructions</p></div>
<div id="attachment_664" class="wp-caption alignnone" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/extend_my_c_lib_low_2.png"><img class="size-medium wp-image-664" title="extend_my_c_lib_low_2" src="http://realmike.org/blog/wp-content/uploads/2012/06/extend_my_c_lib_low_2-300x192.png" alt="" width="300" height="192" /></a><p class="wp-caption-text">Incompatible worlds: PyObjects vs. C++ data types</p></div>
<p>We said that the Python and C++ worlds are &#8220;seemingly incompatible&#8221;, but they are really the same world. At its lowest level, the Python interpreter is also just a C program. When you use Python&#8217;s built-in data types or the standard library modules, the interpreter calls C functions sooner or later. For example, here&#8217;s the (slightly simplified) C code of the &#8220;math.radians()&#8221; function that converts an angle from degrees to radians:</p>
<pre>static PyObject* math_radians(PyObject* self, PyObject* arg)
{
    double x = <strong>PyFloat_AsDouble</strong>(arg);
    return <strong>PyFloat_FromDouble</strong>(x * PI / 180.0);
}</pre>
<p>When this function is called in the Python code, for example as &#8220;math.radians(3.2)&#8221;, the argument is a Python object. The Python interpreter knows how to invoke the C function, as long as the C function accepts a Python object as the argument and returns a Python object as the result. Internally, the C code uses the Python interpreter&#8217;s &#8220;PyFloat_AsDouble()&#8221; and &#8220;PyFloat_FromDouble()&#8221; functions to convert between Python objects and the C &#8220;double&#8221; data type.</p>
<p>Back to our &#8220;Sum()&#8221; function. For the Python interpreter to be able to invoke it, we need to rewrite it to take &#8220;PyObject&#8221; arguments (in this case a tuple of parameters) and return a &#8220;PyObject&#8221; result. Better yet, instead of rewriting the original &#8220;Sum()&#8221;, we can introduce a wrapper around &#8220;Sum()&#8221; that conforms to the required interface:</p>
<pre>static PyObject* WrapSum(PyObject* self, PyObject* args)
{
    PyObject* oa;
    PyObject* ob;
    <strong>PyArg_UnpackTuple</strong>(args, "Sum", 2, 2, &amp;oa, &amp;ob);
    long a = <strong>PyInt_AsLong</strong>(oa);
    long b = <strong>PyInt_AsLong</strong>(ob);
    long result = <strong>Sum(a, b); // call the original Sum()</strong>
    return <strong>PyInt_FromLong</strong>(result);
}</pre>
<p>To turn this into an actual extension module, a module object needs to be created that contains the function that we defined. The following code is an unabridged example of this:</p>
<pre>static PyMethodDef MyLibMethods[] =
{
    {"Sum", WrapSum, METH_VARARGS, "Calculate the sum of two integers."},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initmy_c_lib(void)
{
    (void)Py_InitModule("my_c_lib", MyLibMethods);
}</pre>
<p>This code is compiled and linked into a DLL/shared object named &#8220;my_c_lib.pyd&#8221; on Windows and &#8220;my_c_lib.so&#8221; on Linux. The resulting library can then be &#8220;imported&#8221; like any other Python module.</p>
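<p>The exact filename suffixes that an interpreter accepts for extension modules vary by platform and version. Python 3, for example, adds ABI tags such as &#8220;.cpython-311-x86_64-linux-gnu.so&#8221;. The following sketch asks the running interpreter for its list. It uses &#8220;importlib.machinery.EXTENSION_SUFFIXES&#8221; on Python 3.3 and later and falls back to the older &#8220;imp.get_suffixes()&#8221; on Python 2. The helper name &#8220;extension_suffixes()&#8221; is made up.</p>

```python
def extension_suffixes():
    """Return the filename suffixes this interpreter accepts for C extensions."""
    try:
        from importlib.machinery import EXTENSION_SUFFIXES  # Python 3.3+
        return list(EXTENSION_SUFFIXES)
    except ImportError:
        import imp  # Python 2 (the imp module was removed in Python 3.12)
        return [suffix for suffix, mode, kind in imp.get_suffixes()
                if kind == imp.C_EXTENSION]

suffixes = extension_suffixes()
```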
<p>To summarize:</p>
<ul>
<li>An extension module is a DLL/shared object.</li>
<li>The functions inside the library take PyObjects as arguments and return PyObjects as their results.</li>
<li>The library uses Python&#8217;s conversion functions to convert between PyObjects and C data types.</li>
<li>&#8220;import&#8221; is used to load an extension module just like any old .py module.</li>
</ul>
<h3><strong>Making Your Life Easier</strong></h3>
<p>Writing extension modules in this way is repetitive, somewhat tedious, and error-prone. (And we haven&#8217;t even seen any error-checking or memory management code yet.) Here are two ways to simplify the process (I am sure there are more):</p>
<ul>
<li><strong>SWIG, the Simplified Wrapper and Interface Generator.</strong> This is a tool that parses your C/C++ header files and automatically generates the code for an extension module. Using SWIG is very easy in simple cases, but it can be a bit fiddly in complex ones. We have been using it successfully for mission-critical projects at SPIELO, and I would recommend it any time.
<ul>
<li>See also the tutorial <a href="http://realmike.org/blog/2010/07/18/python-extensions-in-cpp-using-swig/">Python Extensions In C++ Using SWIG</a> on this site.</li>
</ul>
</li>
<li><strong>Boost.Python</strong> is a library that basically wraps the Python/C API in C++ template classes. Compiling Boost.Python code requires a fairly modern C++ compiler and lots of memory and patience during the compilation process. For various reasons, we decided against using it for the projects at SPIELO, but that&#8217;s not to say you shouldn&#8217;t try it for yourself.</li>
</ul>
<h2>From Extending To Embedding</h2>
<p>With an extension module, it is the Python executable that invokes functions in the C++ code when the .py code requests it. The main program is Python and the extension module merely provides services to the Python interpreter.</p>
<div id="attachment_665" class="wp-caption alignnone" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/extending_vs_embedding.png"><img class="size-medium wp-image-665" title="extending_vs_embedding" src="http://realmike.org/blog/wp-content/uploads/2012/06/extending_vs_embedding-300x165.png" alt="" width="300" height="165" /></a><p class="wp-caption-text">Extending</p></div>
<p>On the other hand, when you embed the Python interpreter in a C++ program, you are using the interpreter as a library. Just as you would use the Expat library to parse XML files, you can use the Python interpreter as a library to execute Python source code (inside .py files or otherwise).</p>
<div id="attachment_666" class="wp-caption alignnone" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/extending_vs_embedding_2.png"><img class="size-medium wp-image-666" title="extending_vs_embedding_2" src="http://realmike.org/blog/wp-content/uploads/2012/06/extending_vs_embedding_2-300x200.png" alt="" width="300" height="200" /></a><p class="wp-caption-text">Embedding</p></div>
<p>Whether you are extending or embedding, you will be using the Python/C API in both cases. Embedding usually involves a fair amount of extending as well. After all, the Python code running inside the embedded interpreter will have to call back into the application to be useful. This means that your program will contain the same kind of wrapper functions that we saw in the earlier &#8220;Sum()&#8221; example for all the objects and functions that you want to make available to the Python plug-ins.</p>
<h3>High-Level Embedding</h3>
<p>As a first step, here&#8217;s the code to initialize the embedded interpreter, execute some Python source code, and shut down the interpreter.</p>
<pre>#include &lt;Python.h&gt;
int main(int argc, char* argv[])
{
    Py_Initialize();
    PyRun_SimpleString("name = raw_input('Who are you? ')\n"
                       "print 'Hi there, %s!' % name\n");
    Py_Finalize();
    return 0;
}</pre>
<p>This isn&#8217;t terribly exciting yet. You can&#8217;t pass any arguments to the Python code and you don&#8217;t receive any results. If you don&#8217;t go beyond this, it would be easier to just run &#8220;python.exe&#8221; as a sub-process.</p>
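<p>For comparison, here is what the sub-process alternative looks like. It is sketched in Python rather than C++ for brevity. The child interpreter runs the code in complete isolation, so the only way to pass data in or out is through command-line arguments and text on the standard streams. That is exactly the limitation that deeper embedding removes. The name &#8220;Mike&#8221; is just sample input.</p>

```python
import subprocess
import sys

# Start a separate interpreter (the same executable that runs this script).
# The argument travels as a string on the command line, and the result
# comes back as raw bytes on stdout that we have to parse ourselves.
code = "import sys; print('Hi there, %s!' % sys.argv[1])"
output = subprocess.check_output([sys.executable, "-c", code, "Mike"])
greeting = output.decode("ascii").strip()
```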
<h3>Simple Plug-In</h3>
<p>Let&#8217;s try a more involved use case. Here&#8217;s a C++ program that lets the user write a Python plug-in to transform a string.</p>
<pre>#include &lt;iostream&gt;
#include &lt;string&gt;

std::string CallPythonPlugIn(const std::string&amp; s);  // implemented below

void program()
{
    std::string input;
    std::cout &lt;&lt; "Enter string to transform: ";
    std::getline(std::cin, input);
    std::string transformed = <strong>CallPythonPlugIn</strong>(input);
    std::cout &lt;&lt; "The transformed string is: " &lt;&lt; transformed &lt;&lt; std::endl;
}</pre>
<p>The magic is supposed to happen inside the &#8220;CallPythonPlugIn()&#8221; function that we&#8217;ll implement in a minute. This function will invoke a function named &#8220;transform()&#8221; in a user-provided Python file:</p>
<pre># Example "plugin.py"
def transform(s):
    return s.replace("e", "u").upper()</pre>
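<p>The plug-in is ordinary Python, so its author can try it out in a plain interpreter before the C++ program ever loads it. Here is a quick check, with the function copied from &#8220;plugin.py&#8221; above:</p>

```python
# Same function as in "plugin.py"
def transform(s):
    return s.replace("e", "u").upper()

# Every "e" becomes "u", then the whole string is upper-cased.
sample = transform("embedded python")  # sample == "UMBUDDUD PYTHON"
```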
<p>With this, the &#8220;CallPythonPlugIn()&#8221; function might look something like this. (For brevity, I left out all of the error checking. In other words, don&#8217;t use the code as is! I will present a more complete implementation in a follow-up article.)</p>
<pre>// WARNING! This code doesn't contain error checks!
std::string CallPythonPlugIn(const std::string&amp; s)
{
    <em>// Import the module "plugin" (from the file "plugin.py")</em>
    PyObject* moduleName = PyString_FromString("plugin");
    PyObject* pluginModule = PyImport_Import(moduleName);
    <em>// Retrieve the "transform()" function from the module.</em>
    PyObject* transformFunc = PyObject_GetAttrString(pluginModule, "transform");
    <em>// Build an argument tuple containing the string.</em>
    PyObject* argsTuple = Py_BuildValue("(s)", s.c_str());
    <em>// Invoke the function, passing the argument tuple.</em>
    PyObject* result = PyObject_CallObject(transformFunc, argsTuple);
    <em>// Convert the result to a std::string.</em>
    std::string resultStr(PyString_AsString(result));
    <em>// Free all temporary Python objects.</em>
    Py_DECREF(moduleName); Py_DECREF(pluginModule); Py_DECREF(transformFunc);
    Py_DECREF(argsTuple); Py_DECREF(result);

    return resultStr;
}</pre>
<p>The &#8220;CallPythonPlugIn()&#8221; function is roughly equivalent to this Python code:</p>
<pre>def CallPythonPlugIn(s):
    pluginModule = __import__("plugin")
    transformFunc = getattr(pluginModule, "transform")
    argsTuple = (s,)
    result = transformFunc(*argsTuple)
    return result</pre>
<p>And that&#8217;s the whole secret of embedding Python: Once you know what you&#8217;d like the Python interpreter to do, it&#8217;s a matter of mapping the Python code to the respective Python/C API functions using the <a href="http://docs.python.org/c-api/">reference docs</a>.</p>
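<p>Every Python/C API call in &#8220;CallPythonPlugIn()&#8221; can fail. In C, each failure shows up as a NULL return value with a Python exception set. The Python equivalent makes it easier to see where those checks belong. This is a sketch, not the more complete implementation promised for the follow-up article. The error messages are illustrative, and the last lines register an in-memory stand-in for &#8220;plugin.py&#8221; only so that the example runs on its own.</p>

```python
import sys
import types

def CallPythonPlugIn(s):
    # PyImport_Import() returns NULL if the module can't be imported.
    try:
        plugin_module = __import__("plugin")
    except ImportError as exc:
        raise RuntimeError("cannot load plug-in: %s" % exc)
    # PyObject_GetAttrString() returns NULL if the attribute is missing.
    transform_func = getattr(plugin_module, "transform", None)
    if not callable(transform_func):
        raise RuntimeError("plug-in has no callable transform()")
    # PyObject_CallObject() returns NULL if transform() raises.
    result = transform_func(s)
    # PyString_AsString() fails unless the result is a string.
    if not isinstance(result, str):
        raise TypeError("transform() must return a string, not %s"
                        % type(result).__name__)
    return result

# In-memory stand-in for "plugin.py" (demo only): __import__() looks in
# sys.modules before it searches the file system.
plugin = types.ModuleType("plugin")
exec("def transform(s):\n    return s.replace('e', 'u').upper()\n", plugin.__dict__)
sys.modules["plugin"] = plugin
```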
<h3>Extending And Embedding Combined</h3>
<p>At some point, you will want to access your C++ functionality from the Python plug-ins. For example, the plug-in might want to call a C++ helper function named &#8220;TransformHelper()&#8221; that the program provides:</p>
<pre>// Example "plugin.py"
<strong>import the_program</strong>
def transform(s):
    return <strong>the_program.TransformHelper(s)</strong>.lower()</pre>
<p>In this case, we don&#8217;t want the module &#8220;the_program&#8221; to be an extension module in a separate shared object. Instead, the module should live in the program itself so that it has access to the program&#8217;s internals.</p>
<p>The C++ function &#8220;TransformHelper()&#8221; needs to be wrapped using the same techniques that we applied to the &#8220;Sum()&#8221; function in an earlier example.</p>
<pre>static PyObject* WrapTransformHelper(PyObject* self, PyObject* arg)
{
    const char* str = PyString_AsString(arg);
    std::string result = TransformHelper(str);  // invoke the C++ function
    return PyString_FromString(result.c_str());
}

// Register the wrapped functions.
static PyMethodDef TheProgramMethods[] =
{
    {"TransformHelper", WrapTransformHelper, METH_O, "Transforms a string."},
    {NULL, NULL, 0, NULL}
};

// Somewhere in your program, after Py_Initialize(), initialize the
// module. This is all that's required to allow the plug-in to run
// "import the_program".
Py_InitModule("the_program", TheProgramMethods);</pre>
<p>With this, the plug-in can invoke the internal functionality of the program. Of course, it is also possible to wrap entire C++ classes and not just functions. This is a topic for a follow-up article.</p>
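<p>Roughly speaking, &#8220;Py_InitModule()&#8221; creates a module object, fills it with the functions listed in the method table, and registers it in &#8220;sys.modules&#8221;, which is the first place &#8220;import&#8221; looks. The following Python sketch does the same thing by hand. The body of &#8220;TransformHelper&#8221; here is a hypothetical stand-in for the program&#8217;s C++ function. The same trick is handy for testing plug-ins outside the host program.</p>

```python
import sys
import types

def TransformHelper(s):
    # Hypothetical stand-in for the C++ TransformHelper().
    return s.strip().title()

# Create and populate the module, like Py_InitModule() does with the
# entries of the PyMethodDef table, and register it for "import".
the_program = types.ModuleType("the_program")
the_program.TransformHelper = TransformHelper
sys.modules["the_program"] = the_program

# Plug-in code, unchanged from "plugin.py" above:
import the_program

def transform(s):
    return the_program.TransformHelper(s).lower()
```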
<h3>Summary</h3>
<p>Embedding involves these tasks:</p>
<ul>
<li>Using the Python/C API to do things that you would normally do in Python. The <a href="http://docs.python.org/c-api/">reference docs</a> help you find the right function to do the job.</li>
<li>Lots of conversions between PyObjects and C/C++ data types, just like with extending.</li>
<li>Wrapping the internal objects of your program so that the embedded Python code has access to them, just like with extending.</li>
</ul>
<h3>C++ Wrappers for the Python/C API</h3>
<p>Using a C++ library that wraps the Python/C API offers several advantages:</p>
<ul>
<li>Integration with C++ data types such as std::string, iostream, etc.</li>
<li>Simplified error handling using exceptions</li>
<li>Avoids memory leaks by taking care of the reference count of PyObjects</li>
</ul>
<p>We have used two libraries in the past:</p>
<ul>
<li>Boost.Python. Makes heavy use of C++ templates.</li>
<li>PyCXX. Simple and straightforward library.</li>
</ul>
<p>At SPIELO, we&#8217;re currently not using any wrapper library for the embedded interpreter. We do, however, use our own code generators to generate large parts of the most repetitive glue code, which reduces the need for C++ wrappers.</p>
<h2>SPIELO Case Study</h2>
<p>In our mathematical game engine, embedded Python allows mathematicians and game designers to define game rules that go beyond the pre-defined building blocks that the engine has to offer. The game rules are encoded in a data file that is interpreted by the engine. At certain points in the game flow, Python plug-ins may be invoked to check for additional winning conditions, award special prizes, change the game flow, etc.</p>
<p>The following sections briefly describe certain decisions we made when adding Python to the engine.</p>
<h3>Why Python?</h3>
<p>Before we added the embedded Python interpreter, our mathematical game engine already had a built-in scripting language. In fact, it was a bytecode interpreter that had to run on an ancient <a href="http://en.wikipedia.org/wiki/Z80">Z80 CPU</a> (one of our target platforms at the time), which meant its functionality was very limited: It had only a single register for storing intermediate results, a 255-byte limit for bytecode programs, no floating-point support, etc.</p>
<p>We had a C-like language on top of the bytecodes. When the limitations of the bytecode interpreter became too much of a burden, we initially considered extending this C-like language with killer features like local variables and sub-routines that would support actual parameter lists and return values.</p>
<p>But how do you design a powerful scripting language that&#8217;s easy to learn, readable, and extensible? The answer is, you use an existing language that gets it right, and Python had a lot going for it in this respect:</p>
<ul>
<li>The engine team had lots of experience with Python from other projects.</li>
<li>Most of the users of the engine had Python experience.</li>
<li>The Python project is mature and has a great community behind it.</li>
<li>The interpreter is light-weight and highly portable and its license fits our needs.</li>
</ul>
<p>Integrating Python into the game engine took us just a few weeks.</p>
<h3>To Fork Or Not To Fork</h3>
<p>Usually, you can embed the Python interpreter that&#8217;s already installed on the user&#8217;s system by dynamically linking your program to the installed Python DLL/shared object. This allows embedded Python scripts to use all packages that are available in the existing Python installation.</p>
<p>For us, on the other hand, it was not an option to rely on the versions of Python that come pre-packaged for our target platforms. First of all, there is not even an official Python port for some of these platforms (for example, Windows CE). Second, as the Python interpreter is directly responsible for evaluating parts of the game rules, it is subject to the same strict regulations as the rest of the game engine. Therefore, we are treating the Python interpreter as an integral part of the engine: We track its source code in the same repository as the rest of the engine and include it in the testing/release process of the engine.</p>
<p>To make Python compile on some platforms, we had to make changes to the source code. We usually just rip out the parts that don&#8217;t work well on all platforms and that we don&#8217;t need anyway. These changes are not suitable for inclusion in mainline Python, so we essentially created a fork. Because of the fork, we are unlikely to upgrade to a newer version of Python any time soon, but it doesn&#8217;t really matter to plug-in authors which version of Python they&#8217;re using.</p>
<h3>Stripping Down And Sandboxing</h3>
<p>The Python standard library gives you access to operating system services, internet protocols, graphical user interfaces, and more. Most of this isn&#8217;t needed or even desired for a plug-in language.</p>
<p>For the mathematical game engine, we only included built-in modules, i.e., modules that are compiled directly into the interpreter and that don&#8217;t require additional .py files to operate. Of those modules, we only included the ones that plug-in authors actually need and that are safe to use. We don&#8217;t include things like networking or operating system support, because there&#8217;s no reason why the mathematical game engine should mess with the OS, open HTTP servers, or the like. Aside from the security concerns, we don&#8217;t want to give plug-in authors (more) opportunities to shoot themselves in the foot.</p>
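<p>A quick way to see which modules are compiled directly into a given interpreter is &#8220;sys.builtin_module_names&#8221;. The sketch below splits a wish list of modules into those that are built in and those that would need extra files. The wish list is hypothetical, and the result depends on how the interpreter was built.</p>

```python
import sys

def split_by_builtin(names):
    """Split module names into (compiled into this interpreter, not compiled in)."""
    builtin = [n for n in names if n in sys.builtin_module_names]
    other = [n for n in names if n not in sys.builtin_module_names]
    return builtin, other

# Hypothetical wish list for a plug-in environment.
wanted = ["sys", "math", "random", "socket"]
builtin, other = split_by_builtin(wanted)
```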
<p>In a Python interpreter that&#8217;s part of, say, a word processor, security is a major concern. You don&#8217;t want a plug-in that&#8217;s contained in an email attachment to be able to change any files, send any network requests, or run any system commands without the user&#8217;s permission. In this case, it&#8217;s a good idea to compile your own Python interpreter and leave out all the dangerous bits.</p>
<h3>Embedded Debugging</h3>
<p>As the saying goes, with great power comes a great danger of bugs.</p>
<p>With a stand-alone Python program, you can step through the code in your favorite Python IDE. With an embedded interpreter, that&#8217;s not possible. Still, we wanted to give our users a nice graphical debugger for their plug-ins, integrated into the game editor much like the VBA Editor is integrated into Microsoft Office.</p>
<p>Our approach uses &#8220;PyEval_SetTrace()&#8221; to register a function that&#8217;s invoked when each source line is executed. This is almost enough to build all kinds of single-stepping (&#8220;Step Into&#8221;, &#8220;Step Over&#8221;, &#8220;Step Out&#8221;) and breakpoints. In addition, you need to be able to retrieve the stack frames (to display a call stack) and to evaluate Python expressions (to display and manipulate variables in the current stack frame and for conditional breakpoints).</p>
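<p>The Python-level counterpart of &#8220;PyEval_SetTrace()&#8221; is &#8220;sys.settrace()&#8221;, which makes it easy to experiment with these ideas before writing any C. The following toy is not the engine&#8217;s debugger. It is a sketch of the three ingredients mentioned above:</p>
<ul>
<li>a per-line trace function that checks breakpoints,</li>
<li>evaluation of a condition in the current frame, and</li>
<li>a call stack obtained by walking the &#8220;f_back&#8221; links.</li>
</ul>
<p>Breakpoints are identified by function name and line offset from the &#8220;def&#8221; line. That is an arbitrary choice that keeps the example independent of file names, and the class name &#8220;MiniDebugger&#8221; is made up.</p>

```python
import sys

def call_stack(frame):
    """Return the function names from the innermost frame outwards."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names

class MiniDebugger(object):
    """Toy breakpoint engine on top of sys.settrace()."""

    def __init__(self, breakpoints):
        # Maps (function name, line offset from the "def" line) to a
        # condition string, or to None for an unconditional breakpoint.
        self.breakpoints = dict(breakpoints)
        self.hits = []  # one (breakpoint, locals, call stack) entry per stop

    def _trace(self, frame, event, arg):
        if event == "line":
            code = frame.f_code
            key = (code.co_name, frame.f_lineno - code.co_firstlineno)
            if key in self.breakpoints:
                condition = self.breakpoints[key]
                local_vars = dict(frame.f_locals)
                # Conditional breakpoint: evaluate the expression in the frame.
                if condition is None or eval(condition, frame.f_globals, local_vars):
                    self.hits.append((key, local_vars, call_stack(frame)))
        # Returning the function keeps "line" events coming for this frame.
        return self._trace

    def run(self, func, *args):
        sys.settrace(self._trace)
        try:
            return func(*args)
        finally:
            sys.settrace(None)

def target(n):
    total = 0
    for i in range(n):
        total += i
    return total

# Stop at "total += i" only when i == 2, and always at "return total".
dbg = MiniDebugger({("target", 3): "i == 2", ("target", 4): None})
result = dbg.run(target, 4)
```

<p>A real debugger would pause at each hit and wait for the user instead of just recording the locals. How it does that (for example, on a separate thread or in the host application&#8217;s event loop) depends entirely on the host program.</p>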
<p>A follow-up article will explain this in more detail.</p>
<h2>Closing Remarks</h2>
<p>Even though Python is awesome, some problems are best solved in C++. But even these C++ programs can be supercharged by adding some Python back in. Hopefully this talk inspires you to build something awesome with an embedded Python interpreter. If you do, I&#8217;d love to hear about it.</p>
<p><a name="licensing"></a><br />
<em>This article as well as the <a href="http://realmike.org/blog/wp-content/uploads/2012/07/unbranded_slides1.zip">slide sources in SVG format</a> are Copyright 2012 Michael Fötsch, licensed under a <em><a href="http://creativecommons.org/licenses/by-sa/3.0/" rel="license">Creative Commons Attribution-ShareAlike 3.0 Unported License</a></em>. The <a href="http://realmike.org/blog/wp-content/uploads/2012/07/slides.pdf">slides in PDF format</a> contain additional material that is Copyright 2012 SPIELO International, All Rights Reserved.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2012/07/05/supercharging-c-code-with-embedded-python/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Python Training &#8211; Part 5</title>
		<link>http://realmike.org/blog/2012/06/07/python-training-part-5/</link>
		<comments>http://realmike.org/blog/2012/06/07/python-training-part-5/#comments</comments>
		<pubDate>Thu, 07 Jun 2012 19:42:22 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=598</guid>
		<description><![CDATA[Part 1 &#124; Part 2 &#124; Part 3 &#124; Part 4 This is part 5 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2012/06/07/python-training-part-5/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<p><a href="http://realmike.org/blog/2012/06/07/python-training-part-1/">Part 1</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-2/">Part 2</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-3/">Part 3</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-4/">Part 4</a></p>
<p><em>This is part 5 of a Python training that I gave while I was working at <a href="http://spielo.com/careers">SPIELO International</a>. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a <a href="http://creativecommons.org/licenses/by-sa/3.0/" rel="license">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>.</em></p>
<p><span id="more-598"></span></p>
<h1><a name="_Ref245645728"></a><a name="_Ref245645746"></a> Module Basics</h1>
<p>Python modules are a way to physically organize your source code. It would be impractical to have the entire source code in a single .py file. You can extract parts of your source code (functions, classes, constants, etc.) to their own .py files and use them in your main program.</p>
<table width="100%" cellspacing="0" cellpadding="7">
<tbody>
<tr valign="TOP">
<td><p><strong>Single program</strong></p>
<p>program.py:</p>
<pre>PI = 3.14

def PiMalDaumen(daumen):
   return daumen * PI

print PiMalDaumen(5) / PI</pre>
</td>
<td><p><strong>Program and module</strong></p>
<p>utility.py:</p>
<pre>PI = 3.14

def PiMalDaumen(daumen):
   return daumen * PI</pre>
<p>program.py:</p>
<pre>import utility
if __name__ == "__main__":
    print (utility.PiMalDaumen(5)
           / utility.PI)</pre>
</td>
</tr>
</tbody>
</table>
<p><strong>Note:</strong> Once a .py file has been imported, a <strong>.pyc</strong> file is created so that the module can be loaded faster the next time it is needed. Modules can also be extensions written in other languages such as C++. On Windows, these extension modules are DLL files with the filename extension <strong>.pyd</strong> (or .dll in older Python versions); on Linux, they are shared objects with the extension <strong>.so</strong>.</p>
<h2><a name="__RefHeading__173_1765578431"></a>What’s the difference between a Python module and a Python program?</h2>
<p>Both Python programs and Python modules are just .py files. The only difference is how they are used; the Python interpreter executes the code in programs and modules <strong>in the same way.</strong></p>
<ul>
<li>A <strong>program</strong> is a .py (or .pyw<sup><a name="sdfootnote1anc" href="#sdfootnote1sym">1</a></sup>) file that you double-click or run with “python program.py”</li>
<li>A <strong>module</strong> is a .py file that you import in your main program or in other modules to reuse the objects (functions, classes, constants, etc.) that it contains</li>
</ul>
<p><a name="sdfootnote1sym" href="#sdfootnote1anc">1</a> The extension .pyw is used if you don’t want a console window to be opened when you double-click the file. Alternatively, you can use the extension .py and run the program with “pythonw program.py” instead of “python program.py”.</p>
<p>To be useful, a program contains some code that is executed right away when the .py file is double-clicked. A module, on the other hand, usually does not do anything at the moment it is imported. It only provides objects that can be used by the importing module later on.</p>
<p>Inside a .py file, you can find out whether you are the main program or a module:</p>
<pre>def SomeFunc(a):
    print "To be used by others who import me or by myself."

if __name__ == "__main__":
    print "I was started as the main program."
    print "Do whatever I'm supposed to do."
else:
    print "I was imported as the module named", __name__</pre>
<p><strong>Note:</strong> You should <strong>always</strong> put the code that should run only when the file is started as the main program inside <span style="font-family: Courier New,monospace;">“if __name__ == "__main__":”</span>. This way, the code is not executed if you ever decide to import your main program as a module into a different program.</p>
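<p>To see that only “__name__” differs, the following sketch executes the same file twice. The first time it runs the file as the main program using the standard “runpy.run_path()” function (Python 2.7 and later). The second time it uses an ordinary import. The file and module name “role_demo” are made up for the example.</p>

```python
import os
import runpy
import sys
import tempfile

# Write a tiny module that records how it was started.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "role_demo.py")
with open(path, "w") as f:
    f.write("ROLE = 'program' if __name__ == '__main__' else 'module'\n")

# Run it as the main program, as "python role_demo.py" would.
as_program = runpy.run_path(path, run_name="__main__")["ROLE"]

# Import it as a module.
sys.path.insert(0, folder)
import role_demo
as_module = role_demo.ROLE
```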
<h2><a name="__RefHeading__175_1765578431"></a>Syntax of “import”</h2>
<p>The “import” keyword can be used in different ways. The following examples will import the module “mymodule.py”, which is shown here:</p>
<pre>PI = 3.14

def SomeFunc():
    return 5

class SomeClass:
    def SomeMethod(self):
        return 10</pre>
<p>There are three forms of import:</p>
<ul>
<li><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;"><strong>import mymodule: </strong></span></span>This is the most basic form. The name “mymodule” now refers to a module object whose attributes are the objects defined at the top level of the imported module:
<pre><strong>import mymodule</strong>
print <strong>mymodule</strong>.PI * <strong>mymodule</strong>.SomeFunc()
obj = <strong>mymodule</strong>.SomeClass()
print obj.SomeMethod()</pre>
</li>
</ul>
<ul>
<li><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;"><strong>import mymodule as m:</strong></span></span> This is similar to the first form, but it assigns a different name to the imported module within the importing module:
<pre><strong>import mymodule as m</strong>
print <strong>m</strong>.PI * <strong>m</strong>.SomeFunc()
obj = <strong>m</strong>.SomeClass()
print obj.SomeMethod()</pre>
<p><strong>Note:</strong> This is equivalent to the following, except that the name “mymodule” itself is not defined in the importing module:</p>
<pre><strong>import mymodule</strong>
<strong>m = mymodule</strong>
...</pre>
</li>
</ul>
<ul>
<li><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;"><strong>from mymodule import SomeFunc:</strong></span></span> You can list the things you want to use from the imported module and use them without prefixing them with the module name:
<pre><strong>from mymodule import PI, SomeFunc</strong>
print PI * SomeFunc()</pre>
<p><strong>Note:</strong> This is equivalent to the following, except that the name “mymodule” itself is not defined in the importing module:</p>
<pre><strong>import mymodule</strong>
<strong>PI = mymodule.PI</strong>
<strong>SomeFunc = mymodule.SomeFunc</strong>
...</pre>
<p><strong>Note:</strong> You could write <span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">“from mymodule import *”</span></span> to import everything from the given module at once. However, I <strong>strongly discourage</strong> you from doing this, because it makes the code <strong>less readable,</strong> and there’s a <strong>danger of conflicts</strong> between objects with the same names coming from different modules.</p></li>
</ul>
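<p>The “PI = mymodule.PI” equivalence has a practical consequence. “from mymodule import PI” copies the current binding into the importing module. Rebinding the attribute in the original module later therefore does not affect the imported name. The following small sketch registers an in-memory stand-in for “mymodule.py” in “sys.modules” so that it runs on its own.</p>

```python
import sys
import types

# Stand-in for mymodule.py (demo only).
mymodule = types.ModuleType("mymodule")
mymodule.PI = 3.14
sys.modules["mymodule"] = mymodule

from mymodule import PI      # binds the local name PI to the float 3.14
import mymodule as m         # binds m to the module object itself

mymodule.PI = 3.14159        # rebind the attribute inside the module

local_pi = PI                # still 3.14: the local binding was copied
module_pi = m.PI             # 3.14159: looked up on the module each time
```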
<h2>Where does Python look for .py files to import?</h2>
<p>When Python encounters code like <span style="font-family: Courier New,monospace;">“import mymodule”</span>, it looks for the module “mymodule” in these places:</p>
<ol>
<li>In the directory that contains the importing module</li>
<li>In all the directories that are contained in the list “sys.path” <strong>at the time the import is executed</strong> (execute <span style="font-family: Courier New,monospace;">“import sys; print sys.path”</span> to see what it contains)</li>
</ol>
<p><strong>Note:</strong> Python does <strong>not</strong> look in the current working directory unless the path “.” (or the empty string, which the interactive interpreter and “python -c” put at the front of “sys.path”) is contained in “sys.path”.</p>
<p>The list “sys.path” is filled in this way:</p>
<ul>
<li>From <strong>hard-coded</strong> paths in the Python interpreter, e.g., “C:\Python25\Lib”, “C:\Python25\Lib\site-packages”, etc.</li>
<li>From paths contained in the environment variable <strong>“PYTHONPATH”</strong> when Python was started</li>
<li>From paths listed in all files with the extension <strong>.pth</strong> found in “C:\Python25” when Python was started</li>
<li>Explicitly by your program by <strong>modifying “sys.path”,</strong> e.g.:
<pre>import sys
# Ensure that modules located in the current working
# directory take precedence over all other directories.
# Note: This refers to the cwd at the time the import will
# be executed, not necessarily the cwd at this very moment.
sys.path.insert(0, ".")

# Add a sub-directory of the current working directory.
# Use the absolute path via os.getcwd() so that it doesn't
# change when we change the cwd later, e.g. via os.chdir().
import os
sys.path.append(os.path.join(os.getcwd(), "subdir"))</pre>
<p><strong>Note:</strong> Fiddling with “sys.path” inside the program is sometimes necessary, but most often it is <strong>not</strong> the right way to do things. Maybe you should organize your modules in packages (see section “Packages”) or extend the search path with a .pth file?</li>
</ul>
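<p>You can try the .pth route without touching the Python installation. The standard “site.addsitedir()” function processes the .pth files of any directory you pass to it, the same way Python processes site-packages at startup. In this sketch, the directory name “plugins” and the file name “extra.pth” are made up. Relative lines in a .pth file are interpreted relative to the directory that contains the file, and lines that name non-existent directories are silently skipped.</p>

```python
import os
import site
import sys
import tempfile

# Create a site directory with a "plugins" sub-directory and a .pth
# file that lists it.
site_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(site_dir, "plugins"))
with open(os.path.join(site_dir, "extra.pth"), "w") as f:
    f.write("# Comment lines are ignored.\n")
    f.write("plugins\n")

# Adds site_dir itself plus every existing directory listed in its .pth files.
site.addsitedir(site_dir)
plugins_dir = os.path.abspath(os.path.join(site_dir, "plugins"))
```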
<h2>Modules are loaded only once</h2>
<p>Each module is loaded only once. If two modules in the same program contain a line “import mymodule”, the module “mymodule” is loaded when the first one of them is executed. The second one receives a reference to the module that’s already loaded.</p>
<table width="100%" cellspacing="0" cellpadding="7">
<tbody>
<tr valign="TOP">
<td><p>mymodule.py:</p>
<pre>print "mymodule"
def X():
    return "X"</pre>
</td>
<td><p>program.py:</p>
<pre>print "program"
import mymodule
import utility
print "program:", mymodule.X()</pre>
</td>
<td><p>utility.py:</p>
<pre>print "utility"
import mymodule
print "utility:", mymodule.X()</pre>
</td>
</tr>
</tbody>
</table>
<p>When you execute program.py, the output is:</p>
<pre>program
mymodule
utility
utility: X
program: X</pre>
<p><strong>Note:</strong> You can use “reload()” to force a module to be re-executed. This might be useful when you expect your modules to be modified while the program is being executed (when writing a debugger, for example), but shouldn’t be necessary otherwise.</p>
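<p>The module cache lives in “sys.modules”. The following sketch writes a throw-away module that counts how often its top-level code runs (the name “counter_mod” is made up). A second “import” hands back the cached module object without running the code again. “reload()”, by contrast, runs the code again inside the existing module object, which is why the counter survives. In Python 3, “reload()” is no longer a built-in; it lives in “importlib” (Python 3.4 and later).</p>

```python
import os
import sys
import tempfile

try:
    from importlib import reload  # Python 3.4+
except ImportError:
    pass  # Python 2: reload() is a built-in function

folder = tempfile.mkdtemp()
with open(os.path.join(folder, "counter_mod.py"), "w") as f:
    # On reload, globals() still holds the previous value of RUNS.
    f.write("RUNS = globals().get('RUNS', 0) + 1\n")
sys.path.insert(0, folder)

import counter_mod
first = counter_mod
import counter_mod            # cached: the file is not executed again
runs_after_second_import = counter_mod.RUNS
same_object = counter_mod is first and sys.modules["counter_mod"] is first

reload(counter_mod)           # executes the file again in the same module
runs_after_reload = counter_mod.RUNS
```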
<h1><a name="__RefHeading__181_1765578431"></a>Packages</h1>
<p>Packages are a way to organize modules in directories.</p>
<table width="100%" cellspacing="0" cellpadding="7">
<tbody>
<tr valign="TOP">
<td><p><strong>Everything in a single directory</strong></p>
<pre>\
  main.py
  database_logic.py
  database_reports.py
  gui_window.py
  gui_button.py
  gui_backend_gtk.py
  gui_backend_win32.py
  gui_backend_osx.py</pre>
</td>
<td><p><strong>Grouped hierarchically</strong></p>
<pre>\
  main.py
  database\
    logic.py
    reports.py
  gui\
    window.py
    button.py 
    backends\
      gtk.py
      win32.py
      osx.py</pre>
</td>
</tr>
</tbody>
</table>
<p>If there were no packages, you might be tempted to add all sub-directories to Python’s module search path:</p>
<pre><span style="color: #ff0000;"><strong>Bad practice</strong></span>
import sys
sys.path += ["database", "gui", "gui\\backends"]
import logic
from window import Window
import gtk</pre>
<p>Packages provide a much cleaner way:</p>
<pre># Import the module database\logic.py
import <strong>database.logic</strong>

# Import "Window" from the module gui\window.py
from <strong>gui.window</strong> import Window

# Import the module gui\backends\gtk.py and assign a shorter name
import <strong>gui.backends.gtk</strong> as backend</pre>
<p>To make this work, all you need to do is place an empty file named “__init__.py” in each directory that should be treated as a package:</p>
<pre>\
  main.py
  <strong>database\</strong>
    <strong>__init__.py</strong>
    logic.py
    reports.py
  <strong>gui\</strong>
    <strong>__init__.py</strong>
    window.py
    button.py
    <strong>backends\</strong>
      <strong>__init__.py</strong>
      gtk.py
      win32.py
      osx.py</pre>
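<p>To try this out, the following sketch builds a cut-down version of the hierarchy in a temporary directory and imports from it with dotted names. Only the directory and module names come from the example above; the module contents are placeholders. Python 3.3 and later can also import directories without “__init__.py” as so-called namespace packages, but Python 2 requires the file.</p>

```python
import os
import sys
import tempfile

# Placeholder contents; an empty string creates an empty __init__.py.
files = {
    "gui/__init__.py": "",
    "gui/window.py": "class Window(object):\n    title = 'untitled'\n",
    "gui/backends/__init__.py": "",
    "gui/backends/gtk.py": "NAME = 'gtk'\n",
}

root = tempfile.mkdtemp()
for rel_path, source in files.items():
    path = os.path.join(root, *rel_path.split("/"))
    if not os.path.isdir(os.path.dirname(path)):
        os.makedirs(os.path.dirname(path))
    with open(path, "w") as f:
        f.write(source)

sys.path.insert(0, root)                 # make the top-level package findable
from gui.window import Window            # gui/window.py
import gui.backends.gtk as backend       # gui/backends/gtk.py
```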
<h2>Intra-Package Imports</h2>
<p>Consider this package hierarchy:</p>
<pre>myapp\
  __init__.py
  database\
    __init__.py
    logic.py
    reports.py
  gui\
    __init__.py
    window.py
    button.py
    backends\
      __init__.py
      gtk.py
      win32.py</pre>
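<p>In Python 2 (but not in Python 3, and not in Python 2 modules that use “from __future__ import absolute_import”), a module inside a package can import modules inside a sub-package by name, relative to the current package:</p>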
<pre># From   myapp\gui\window.py
# Import myapp\gui\backends\gtk.py
import backends.gtk</pre>
<p>A module inside a sub-package can also import modules from other parts of the package hierarchy using <strong>absolute imports:</strong></p>
<pre># From   myapp\gui\backends\gtk.py
# Import myapp\database\reports.py
import myapp.database.reports</pre>
<p style="padding-left: 30px;"><strong>Note:</strong> For this to work, the top-level package “myapp” must be in the module search path (see “sys.path” in section “Module Basics”). If Python can’t find “myapp”, you will get the error “ImportError: No module named myapp”.</p>
<p><strong>Use with care:</strong></p>
<p style="padding-left: 30px;">In Python 2.5 and later, you can also use <strong>relative imports</strong> (“.” refers to the current package, “..” to the parent package, “...” to the grand-parent package, etc.):</p>
<pre style="padding-left: 30px;"># From   myapp\gui\backends\gtk.py
# Import myapp\gui\button.py
from .. import button

# Import myapp\database\reports.py
from ...database import reports</pre>
<p style="padding-left: 30px;">There are many subtleties involving relative imports. They are <strong>not</strong> just a straight translation from filesystem paths to import syntax. For example, this code only works when gtk.py was itself imported from somewhere outside the package using something like <span style="font-family: Courier New,monospace;">“import myapp.gui.backends.gtk”</span>, not when you run “python gtk.py”.</p>
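<p>The following sketch (Python 3.7 or later, because of “capture_output”) builds the “myapp” hierarchy with made-up module contents in a temporary directory and checks both cases: importing gtk.py as “myapp.gui.backends.gtk” resolves the relative imports, while running gtk.py directly as a script fails.</p>

```python
import importlib
import os
import subprocess
import sys
import tempfile

# Hypothetical contents; only gtk.py uses relative imports.
FILES = {
    "myapp/__init__.py": "",
    "myapp/database/__init__.py": "",
    "myapp/database/reports.py": "TITLE = 'Monthly report'\n",
    "myapp/gui/__init__.py": "",
    "myapp/gui/button.py": "LABEL = 'OK'\n",
    "myapp/gui/backends/__init__.py": "",
    "myapp/gui/backends/gtk.py": ("from .. import button\n"
                                  "from ...database import reports\n"),
}

root = tempfile.mkdtemp()
for rel_path, source in FILES.items():
    path = os.path.join(root, *rel_path.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(source)
sys.path.insert(0, root)
importlib.invalidate_caches()

# Works: imported as part of the package, gtk.py knows that ".." means
# "myapp.gui" and "..." means "myapp".
gtk = importlib.import_module("myapp.gui.backends.gtk")
print(gtk.button.LABEL, gtk.reports.TITLE)   # OK Monthly report

# Fails: run as a script, gtk.py becomes "__main__" without a parent
# package, so the relative imports cannot be resolved.
script = os.path.join(root, "myapp", "gui", "backends", "gtk.py")
result = subprocess.run([sys.executable, script], capture_output=True)
print(result.returncode != 0)                 # True
```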
<h1>Modules and Reflection</h1>
<p><a href="http://en.wikipedia.org/wiki/Reflection_%28computer_science%29">Wikipedia</a> says, “reflection is the process by which a computer program can observe and modify its own structure and behavior.” Reflection is a useful and powerful tool and can be used with modules just like with any other object.</p>
<h2><a name="__RefHeading__187_1765578431"></a>Import Modules with Names Determined at Runtime</h2>
<p>You can use the <strong>“__import__()”</strong> function to load a module whose name is only known at runtime. This does not work:</p>
<pre><span style="color: #ff0000;"><strong>Does not work</strong></span>
# We basically want to
#    import mymodule
# but the name "mymodule" is stored in a string variable
module_name = "mymodule"
import module_name as m   # Nope. This actually looks for a file
                          # named "module_name.py".
m.FuncInsideModule()</pre>
<p>This works:</p>
<pre>module_name = "mymodule"
m = <strong>__import__(module_name)</strong>
m.FuncInsideModule()</pre>
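<p>Since Python 2.7, the standard library also offers “importlib.import_module()”, which is usually more convenient for dotted names: “__import__("a.b")” returns the top-level package “a”, whereas “import_module("a.b")” returns the sub-module itself. In this sketch, the standard “json.decoder” module stands in for “mymodule”:</p>

```python
import importlib

module_name = "json.decoder"   # e.g. read from a config file at runtime

# __import__ returns the top-level package for a dotted name...
top = __import__(module_name)
# ...while importlib.import_module returns the module you asked for.
m = importlib.import_module(module_name)

print(top.__name__)   # json
print(m.__name__)     # json.decoder
```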
<p>The previous example isn’t particularly useful yet, so here’s a <strong>real-world example</strong> where “__import__()” can be put to good use.</p>
<p>Consider a program that the user can extend by providing plug-in modules. The user is expected to place the .py files inside the “plugins” package and the program scans the directory at startup and loads the modules.</p>
<pre>\
  program.py
  plugins\
    __init__.py
    colorize.py
    sort.py
    filter.py</pre>
<p>If we wanted to hard-code the plug-in names, we could write something like this:</p>
<pre><span style="color: #ff0000;"><strong>Bad because everything’s hard-coded</strong></span>
# program.py
import plugins.colorize
import plugins.sort
import plugins.filter

plugin_modules = [plugins.colorize,
                  plugins.sort,
                  plugins.filter]
...
for m in plugin_modules:
    data = m.ApplyPlugin(data)</pre>
<p>But having to add “import” statements to the program manually is tedious. We can do better by using “os.listdir()” to scan the directory and the <strong>“__import__()”</strong> function to import the plugins:</p>
<pre># program.py

import os

if __name__ == "__main__":
    plugin_files = os.listdir("plugins")
    plugin_modules = []
    for fn in plugin_files:
        if fn.endswith(".py") and fn != "__init__.py":
            module_name = os.path.splitext(fn)[0]
            import_name = <strong>"plugins.%s" % module_name</strong>
            plugin_modules.append(
                # basically do "from plugins import &lt;module_name&gt;"
                <strong>__import__(import_name, fromlist=[module_name]))</strong>
    ...
    for m in plugin_modules:
        data = m.ApplyPlugin(data)</pre>
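<p>As an alternative sketch (Python 3), the loader below lets “pkgutil.iter_modules()” list the modules in the package’s directory and imports them with “importlib.import_module()”. Because it uses the package’s “__path__” instead of the relative path “plugins”, it does not depend on the current working directory. The plug-in modules created here and the “load_plugins” name are hypothetical.</p>

```python
import importlib
import os
import pkgutil
import sys
import tempfile

def load_plugins(package_name):
    """Import all modules inside a package, sorted by module name."""
    package = importlib.import_module(package_name)
    # iter_modules() yields (finder, name, is_package) for each module
    # found in the package's directories; __init__.py is not included.
    names = sorted(name for _finder, name, _is_pkg
                   in pkgutil.iter_modules(package.__path__))
    return [importlib.import_module(package_name + "." + name)
            for name in names]

# Demo: a throwaway "plugins" package with two hypothetical plug-ins.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "plugins"))
sources = {
    "__init__.py": "",
    "sort.py": "def ApplyPlugin(data):\n    return sorted(data)\n",
    "reverse.py": "def ApplyPlugin(data):\n    return list(reversed(data))\n",
}
for filename, source in sources.items():
    with open(os.path.join(root, "plugins", filename), "w") as f:
        f.write(source)
sys.path.insert(0, root)
importlib.invalidate_caches()

data = [3, 1, 2]
for m in load_plugins("plugins"):   # "reverse" runs before "sort"
    data = m.ApplyPlugin(data)
print(data)   # [1, 2, 3]
```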
<h2><a name="_Ref245450142"></a><a name="_Ref245450211"></a><a name="_Ref245450496"></a> Inspecting Module Contents</h2>
<p>Module objects provide several built-in attributes:</p>
<ul>
<li><strong>__name__:</strong> The module name, as specified in the “import” statement</li>
<li><strong>__file__:</strong> The path to the .py file<sup><a name="sdfootnote1anc" href="#sdfootnote1sym">1</a></sup> from which the module was loaded.
<ul>
<li><strong>Be careful!</strong> This might be a path relative to the current working directory at the time when the import was executed. The current working directory might have changed since then.</li>
</ul>
</li>
<li><strong>__dict__:</strong> A dict that maps the names of all objects (functions, classes, variables, etc.) defined in the module to the objects themselves</li>
<li><strong>__doc__:</strong> The docstring of the module</li>
</ul>
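<p><a name="sdfootnote1sym" href="#sdfootnote1anc">1</a> The file could also be a .pyc or .pyd file, or whatever else the module was loaded from. Built-in modules that are compiled into the interpreter, such as “sys”, have no “__file__” attribute at all.</p>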
<p>The following source code prints some information about the “os” module:</p>
<pre>import os
print "Module os loaded from", os.__file__
print "Docstring:", os.__doc__
print "Contains the following objects:"
for name, obj in os.__dict__.iteritems():
    print name, ":", type(obj)</pre>
<p>This will print something like this:</p>
<pre>Module os loaded from c:\python25\lib\os.pyc
Docstring: OS routines for Mac, NT, or Posix depending on what system we're on.

This exports:
  - all functions from posix, nt, os2, mac, or ce, e.g. unlink, stat, etc.
...

Contains the following objects:
lseek : &lt;type 'builtin_function_or_method'&gt;
O_SEQUENTIAL : &lt;type 'int'&gt;
pathsep : &lt;type 'str'&gt;
execle : &lt;type 'function'&gt;
_Environ : &lt;type 'classobj'&gt;
urandom : &lt;type 'builtin_function_or_method'&gt;
execlp : &lt;type 'function'&gt;
...</pre>
<p>You can also use <strong>dir(),</strong> <strong>getattr(), hasattr(),</strong> and <strong>setattr()</strong> to access the module’s contents just like with other objects.</p>
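<p>Here’s a small sketch that combines these functions with the “inspect” module to classify the public objects of a module. To keep it self-contained, it builds a hypothetical “demo” module at runtime with “types.ModuleType” instead of importing a real file:</p>

```python
import inspect
import types

def describe_module(module):
    """Return (name, kind) pairs for the public names in 'module'."""
    result = []
    for name in dir(module):               # dir() returns the names sorted
        if name.startswith("_"):
            continue                       # skip __doc__, __name__, private names
        obj = getattr(module, name)
        if inspect.isfunction(obj):
            kind = "function"
        elif inspect.isclass(obj):
            kind = "class"
        elif inspect.ismodule(obj):
            kind = "module"
        else:
            kind = type(obj).__name__
        result.append((name, kind))
    return result

# Hypothetical module, built at runtime so the sketch needs no extra files.
demo = types.ModuleType("demo", "A demo module")
exec("import os\n"
     "VERSION = 3\n"
     "class Widget(object):\n"
     "    pass\n"
     "def run():\n"
     "    pass\n", demo.__dict__)

setattr(demo, "DEBUG", True)               # modules accept new attributes
print(hasattr(demo, "run"), demo.__doc__)  # True A demo module
for name, kind in describe_module(demo):
    print(name, ":", kind)
```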
<p>As a <strong>real-world example,</strong> consider a program that runs test cases contained inside a user-provided Python module. A test case is any function whose name starts with “Test_”.</p>
<pre># test_runner.py
# Start with:
#    python test_runner.py &lt;module_name&gt;
import sys
if __name__ == "__main__":
    module_name = sys.argv[1]
    test_suite = <strong>__import__(module_name)</strong>
    print "Running", test_suite.__doc__
    if <strong>hasattr(test_suite, "InitTests")</strong>:
        init_func = <strong>getattr(test_suite, "InitTests")</strong>
        init_func()
    for obj_name in <strong>dir(test_suite)</strong>:
        if obj_name.startswith("Test_"):
            test_func = <strong>getattr(test_suite, obj_name)</strong>
            print "Performing", test_func.__name__, "...",
            print test_func()</pre>
<p>This is an example test suite:</p>
<pre># module_to_test.py

"My test suite"

def InitTests():
    print "Initializing..."

def Test_Case1():
    return 5 * 5 == 25

def Test_Case2():
    return -1 ** 0 == 0</pre>
<p>This is the output of the program:</p>
<pre>&gt; python test_runner.py module_to_test
Running My test suite
Initializing...
Performing Test_Case1 ... True
Performing Test_Case2 ... False</pre>
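<p>A variation of the runner (Python 3) that returns the results instead of printing them is easier to reuse and test. In this sketch, “getattr()” with a default value replaces the separate “hasattr()” check, and an in-memory module stands in for module_to_test.py. Note that Test_Case2 fails because “**” binds more tightly than the unary minus: “-1 ** 0” means “-(1 ** 0)”, which is -1.</p>

```python
import types

def run_tests(test_suite):
    """Run InitTests() if present, then every Test_* function.

    Returns (test_name, result) pairs in dir() order instead of printing.
    """
    init_func = getattr(test_suite, "InitTests", None)   # None if missing
    if init_func is not None:
        init_func()
    results = []
    for obj_name in dir(test_suite):
        if obj_name.startswith("Test_"):
            test_func = getattr(test_suite, obj_name)
            if callable(test_func):                      # skip Test_* variables
                results.append((obj_name, test_func()))
    return results

# In-memory stand-in for module_to_test.py.
suite = types.ModuleType("module_to_test", "My test suite")
exec("calls = []\n"
     "def InitTests():\n"
     "    calls.append('init')\n"
     "def Test_Case1():\n"
     "    return 5 * 5 == 25\n"
     "def Test_Case2():\n"
     "    return -1 ** 0 == 0\n", suite.__dict__)

print("Running", suite.__doc__)
results = run_tests(suite)
for name, passed in results:
    print("Performing", name, "...", passed)
```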
<h1>Memory Leaks</h1>
<p>In C++, a memory leak typically occurs because your program <strong>forgets</strong> about a chunk of memory that it reserved. The memory is never freed although your program doesn’t make any use of it. If this happens too often, memory usage of the program might reach critical levels.</p>
<p>In Python, a memory leak occurs because your program <strong>keeps references</strong> to unneeded objects.</p>
<p>Python uses <strong>reference counting</strong> and a <strong>garbage collector</strong> to prevent most types of memory leaks:</p>
<pre>x = [1, 2, 3]
d = {4: x}
y = (x, d)
# Here, the list [1, 2, 3] has ref count 3.
d.clear()
# Now it's down to 2.
x = 0
# Now the tuple y has the only reference left.
del y
# The list [1, 2, 3] has ref count 0. CPython frees its memory
# immediately; the garbage collector is only needed for reference cycles.</pre>
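<p>You can watch reference counts with “sys.getrefcount()”, which reports one extra reference for its own argument; comparing two counts sidesteps that offset. The second half of this sketch uses a weak reference to show that CPython frees an object as soon as its count drops to zero. A hypothetical “Node” class is used because lists cannot be weakly referenced.</p>

```python
import sys
import weakref

x = [1, 2, 3]
base = sys.getrefcount(x)        # includes the temporary argument reference
d = {4: x}
added = sys.getrefcount(x) - base
print(added)                     # 1: the dict now refers to the list
d.clear()
removed = sys.getrefcount(x) - base
print(removed)                   # 0: back to the original count

class Node(object):
    pass

n = Node()
ref = weakref.ref(n)   # a weak reference does not keep the object alive
del n                  # the last strong reference is gone
print(ref() is None)   # True in CPython: freed at once, no gc.collect() needed
```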
<p>In order to produce a memory leak in Python (or rather, to cause undesired memory consumption), you have to accumulate references to objects that you wouldn’t otherwise need. Here’s a hypothetical and somewhat trivial example:</p>
<pre>class File:
    def __init__(self, filename):
        self.m_filename = filename
        self.m_file_contents = open(filename).read()

processed_files = []
for fn in filename_list:
    f = File(fn)
    DoSomeImportantProcessing(f)
    processed_files.append(f)

print "Processed files:", [f.m_filename for f in processed_files]</pre>
<p>Here, the list “processed_files” keeps references to “File” objects, although only the filenames of the objects will be needed after the loop. However, the “File” objects also hold the file contents, so the peak memory usage of the program grows to at least the total size of all processed files. It would be more efficient to store only the filenames in a list called “processed_filenames”, like this:</p>
<pre><strong>processed_filenames = []</strong>
for fn in filename_list:
    f = File(fn)
    DoSomeImportantProcessing(f)
    <strong>processed_filenames.append(fn)</strong>

print "Processed files:", processed_filenames</pre>
<p>The previous example isn’t a real memory leak. It’s just a case of undesired memory consumption, which can be much harder to spot in a larger, more complex program.</p>
<p>Python <strong>can</strong> suffer from real memory leaks, though:</p>
<ul>
<li>Memory leaks in C/C++ extension libraries (either caused by bugs in the libraries or by incorrect usage)</li>
<li>Reference cycles involving objects whose classes define “__del__()” methods</li>
</ul>
<p>Here’s an example of a reference cycle:</p>
<pre><span style="color: #ff0000;"><strong>Not a memory leak</strong></span>
class X():
    pass

x = X()
y = X()
z = X()
x.next = y
y.next = z
z.next = x
del x
del y
del z</pre>
<p>In this case, although there’s a reference cycle, the Python garbage collector is able to break the cycle and all the memory is freed as expected.</p>
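<p>You can observe this with a weak reference, which doesn’t keep its target alive. In this sketch (CPython), the cycle survives the “del” statement because each object is still referenced by its neighbor, and disappears after an explicit “gc.collect()”. Automatic collection is switched off during the demo so that it cannot run at an unpredictable moment.</p>

```python
import gc
import weakref

class X(object):
    pass

gc.disable()   # prevent an automatic collection in the middle of the demo
try:
    x, y, z = X(), X(), X()
    x.next = y
    y.next = z
    z.next = x               # the cycle: x, y, z, and back to x
    probe = weakref.ref(x)   # observes x without keeping it alive
    del x, y, z
    alive_before = probe() is not None   # still alive: no refcount reached 0
    gc.collect()                         # the collector finds and frees the cycle
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after)   # True False
```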
<pre><span style="color: #ff0000;"><strong>Memory leak</strong></span>
class X():
    def __del__(self):
        pass

x = X()
y = X()
z = X()
x.next = y
y.next = z
z.next = x
del x
del y
del z
import gc
gc.collect() # let the garbage collector do its work right now
print gc.garbage
# Prints something like:
# [&lt;__main__.X instance&gt;, &lt;__main__.X instance&gt;,
#  &lt;__main__.X instance&gt;]</pre>
<p>The objects “x”, “y”, and “z” are not garbage-collected. Python cannot decide which object to delete first, because it doesn’t know whether our implementation of “__del__()” relies on a specific order. Therefore, the memory of the three “X” objects cannot be freed. You can break the cycle manually, as described in the Python docs.</p>
<p>This problem affects your code only if you have cyclic references among your objects and the involved classes implement “__del__()” methods. (Python 3.4 and later can collect such cycles as well; see PEP 442.)</p>
<p><strong>Further information:</strong></p>
<ul>
<li><a href="http://docs.python.org/library/gc.html#gc.garbage">http://docs.python.org/library/gc.html#gc.garbage</a></li>
<li><a href="http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks">http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks</a></li>
<li><a href="http://mg.pov.lt/blog/python-object-graphs.html">http://mg.pov.lt/blog/python-object-graphs.html</a> (describes a tool to debug memory leaks)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2012/06/07/python-training-part-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Training &#8211; Part 4</title>
		<link>http://realmike.org/blog/2012/06/07/python-training-part-4/</link>
		<comments>http://realmike.org/blog/2012/06/07/python-training-part-4/#comments</comments>
		<pubDate>Thu, 07 Jun 2012 18:54:43 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=588</guid>
		<description><![CDATA[Part 1 &#124; Part 2 &#124; Part 3 &#124;&#124; Part 5 This is part 4 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2012/06/07/python-training-part-4/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<p><a href="http://realmike.org/blog/2012/06/07/python-training-part-1/">Part 1</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-2/">Part 2</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-3/">Part 3</a> || <a href="http://realmike.org/blog/2012/06/07/python-training-part-5/">Part 5</a></p>
<p><em>This is part 4 of a Python training that I gave while I was working at <a href="http://spielo.com/careers">SPIELO International</a>. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a <a href="http://creativecommons.org/licenses/by-sa/3.0/" rel="license">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>.</em></p>
<p><span id="more-588"></span></p>
<h1>More Common Scripting Tasks</h1>
<h2><a name="__RefHeading__5_1765578431"></a>Parsing Command-Line Arguments</h2>
<p>The command-line arguments to the Python program can be found in the “sys.argv” variable.</p>
<p>When you start a Python program with this command line:</p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">C:\&gt; temp\cmd_line.py -x -o &#8220;the output.txt&#8221; input.txt</span></span></p>
<p>The contents of “sys.argv” are as follows:</p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">['C:\\temp\\cmd_line.py', '-x', '-o', 'the output.txt', 'input.txt']</span></span></p>
<p>The “optparse” module provides more convenient access to the command-line arguments:</p>
<ul>
<li>You can specify a list of supported arguments along with their data types</li>
<li>You can specify default values for omitted arguments</li>
<li>The usage screen (“--help” or “-h”) is generated automatically</li>
<li>…and much more</li>
</ul>
<p>As an example, let’s use “optparse” to interpret the arguments of this GNU program:</p>
<pre>&gt; head -h
Usage: head [OPTION]... [FILE]...
Print first 10 lines of each FILE to standard output.
With more than one FILE, precede each with a header giving the file name.
With no FILE, or when FILE is -, read standard input.

  -c, --bytes=SIZE         print first SIZE bytes
  -n, --lines=NUMBER       print first NUMBER lines instead of first 10
  -q, --quiet, --silent    never print headers giving file names
  -v, --verbose            always print headers giving file names
      --help               display this help and exit
      --version            output version information and exit

...

Report bugs to &lt;bug-textutils@gnu.org&gt;.</pre>
<p>What we see:</p>
<ul>
<li>The program accepts a list of options (using a “-” or “--” prefix) and arguments (the list of files)</li>
<li>There are options with and without parameters (“-n” takes the number of lines, while “-v” does not require a parameter)</li>
<li>Some options have default values.</li>
</ul>
<p>Here’s a Python program with a command-line interface like that:</p>
<pre>import optparse
import sys

def Main():
    parser = <strong>optparse.OptionParser</strong>(
        usage="Usage: %prog [OPTION]... [FILE]...",
        version="Version 1.0.\nWritten by me.",
        description="Print first 10 lines of each FILE ...",
        epilog="... Report bugs to &lt;bug-textutils@gnu.org&gt;.")
    parser.<strong>add_option</strong>("-c", "--bytes", <strong>type="int"</strong>,
                      metavar="SIZE",    # Here, the default
                                         # metavar would be "BYTES"
                      help="print first SIZE bytes")
    parser.add_option("-n", "--lines", type="int",
                      metavar="NUMBER", dest="num_lines",
                      help="print first NUMBER lines instead of first 10")
    parser.add_option("-q", "--quiet", "--silent",
                      <strong>action="store_true"</strong>,
                      help="never print headers giving file names")
    parser.add_option("-v", "--verbose", action="store_true",
                      help="always print headers giving file names")

    <strong>parser.set_defaults</strong>(num_lines=10, quiet=False, verbose=False)
    <strong>options, args = parser.parse_args()</strong>

    print "-c =", options.bytes
    print "-n =", options.num_lines
    print "-q =", options.quiet
    print "-v =", options.verbose
    print "args =", args

if __name__ == "__main__":
    sys.exit(Main())</pre>
<p>How it works:</p>
<ul>
<li>To define the command-line interface, create an “optparse.OptionParser” instance and invoke its methods.</li>
<li>The “OptionParser” initializer takes a number of optional keyword arguments for things like version info and help text.</li>
<li>You add options using the “add_option” method:
<ul>
<li>The first arguments specify the short and long option strings.</li>
<li>Additional keyword arguments specify things like data type and help string.</li>
</ul>
</li>
<li>Default values are best set using the “set_defaults” method.</li>
<li>The “parse_args” method returns a pair: an object that holds the option values as attributes, and a list of the remaining positional arguments.</li>
</ul>
<p>Try to invoke the program using the following command lines:</p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">&gt; cmd_line.py -h</span></span></p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">&gt; cmd_line.py somefile</span></span></p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">&gt; cmd_line.py –n 12 somefile</span></span></p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">&gt; cmd_line.py –n this_is_not_a_string</span></span></p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">&gt; cmd_line.py &#8211;unknown-option</span></span></p>
<h2><a name="__RefHeading__7_1765578431"></a>Reading from INI Files</h2>
<p>The “ConfigParser” module can be used to read and write INI files.</p>
<p>Here’s an example INI file:</p>
<pre>; Example INI file
<strong>[Basic]</strong>
<strong>quiet</strong>=True
<strong>lines</strong>=10
<strong>multiline</strong>=This is
 a multi-line value

<strong>[Files]</strong>
<strong>dir</strong>=c:\temp
# %(dir)s will be replaced with the value of dir
<strong>input</strong>=%(dir)s\input.txt</pre>
<p>This INI file can be read as follows:</p>
<pre><strong>import ConfigParser</strong>

if __name__ == "__main__":
    <strong>p = ConfigParser.SafeConfigParser()</strong>
    <strong>p.read(["config.ini"])</strong>
        # read() can load several INI files at once.
    print "multiline =", <strong>p.get</strong>("Basic", "multiline")
    print "quiet =", <strong>p.getboolean</strong>("Basic", "quiet")
    print "lines =", <strong>p.getint</strong>("Basic", "lines")
    print "dir =", p.get("Files", "dir")
    print "input =", p.get("Files", "input")
    try:
        p.get("Basic", "non-existent")
    <strong>except (ConfigParser.NoSectionError,</strong>
            <strong>ConfigParser.NoOptionError), e:</strong>
        print e
    print "Sections:", <strong>p.sections()</strong>
    print "Items in Basic:", <strong>p.items</strong>("Basic")</pre>
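<p>In Python 3, the module is named “configparser”, and “SafeConfigParser” was deprecated and then removed in Python 3.12 in favor of “ConfigParser”, which performs the same “%(name)s” interpolation. Its “read_string()” method can parse INI text that isn’t stored in a file. A minimal sketch using the example INI file from above:</p>

```python
import configparser   # named ConfigParser in Python 2

INI_TEXT = """
; Example INI file
[Basic]
quiet=True
lines=10
multiline=This is
 a multi-line value

[Files]
dir=c:\\temp
# %(dir)s will be replaced with the value of dir
input=%(dir)s\\input.txt
"""

p = configparser.ConfigParser()
p.read_string(INI_TEXT)

print(p.getboolean("Basic", "quiet"))    # True
print(p.getint("Basic", "lines"))        # 10
print(p.get("Basic", "multiline"))       # two lines: "This is", "a multi-line value"
print(p.get("Files", "input"))           # c:\temp\input.txt
print(p.get("Basic", "non-existent", fallback="default"))   # default
```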
<h2>Creating and Reading ZIP Files</h2>
<p>Use the “zipfile” module to work with ZIP files.</p>
<p><strong>Create a new archive:</strong></p>
<p>The following code creates a new archive with two files:</p>
<ul>
<li>One file is read from a file on disk and stored under a different name in the archive.</li>
<li>The other file is constructed directly from a Python string (which could also contain binary data).</li>
</ul>
<pre>import zipfile

z = zipfile.ZipFile("new_archive.zip", "w",
                    compression=zipfile.ZIP_DEFLATED)
z.write("file.txt",             # This file on disk...
        "subdir/t.txt")         # ...is added under this name.
z.writestr("text.txt",          # Name in the archive
           "Specify the contents of the file directly")
z.close()</pre>
<p><strong>Add files to an existing archive:</strong></p>
<p>To add files to an existing archive, specify mode “a” when opening the file:</p>
<pre>z = zipfile.ZipFile("existing_archive.zip", "a")
z.write(...</pre>
<p><strong>Read an existing archive:</strong></p>
<p>Using the “infolist” method, you can retrieve a list of “ZipInfo” objects, one for each file in the archive. Using the “read” method, you can retrieve the contents of a file as a Python string:</p>
<pre>z = zipfile.ZipFile("new_archive.zip", "r")
<strong>for i in z.infolist():</strong>
    print i.filename, i.compress_size, i.file_size, "etc."
    print "File contents:", <strong>z.read</strong>(i.filename)
z.close()</pre>
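<p>Because “ZipFile” also accepts a file-like object, you can build and read an archive entirely in memory. This sketch (Python 3) uses “io.BytesIO” and “with” blocks, which close the archive automatically; the file names and contents are made up:</p>

```python
import io
import zipfile

buffer = io.BytesIO()   # in-memory stand-in for a .zip file on disk

with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as z:
    z.writestr("text.txt", "Specify the contents of the file directly")
    z.writestr("subdir/data.bin", bytes(range(256)))   # binary data works too

# Closing the ZipFile finished the archive but left 'buffer' open,
# so we can read it back.
with zipfile.ZipFile(buffer, "r") as z:
    listing = [(i.filename, i.file_size) for i in z.infolist()]
    text = z.read("text.txt").decode("utf-8")

print(listing)
print(text)
```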
<p><strong>Note:</strong> See also the modules “tarfile”, “gzip”, “bz2”, and “zlib” for other ways of creating archives and compressing data.</p>
<h1><a name="__RefHeading__11_1765578431"></a>Interfacing with C++</h1>
<p>There are many levels on which you can use Python and C++ (or other programming languages) together:</p>
<ul>
<li>Interpret binary data (potentially produced by C++)</li>
<li>Invoke functions in a C++ DLL from Python</li>
<li>Write Python extension modules in C++ (advanced)</li>
<li>Embed the Python interpreter in C++ to offer scripting facilities (advanced)</li>
</ul>
<h2><a name="__RefHeading__13_1765578431"></a>Working with Binary Data</h2>
<p>Let’s assume you have a C++ program that writes the following struct to a binary file:</p>
<pre>#include &lt;cstdio&gt;   // fopen, fwrite, fclose
#include &lt;cstring&gt;  // strcpy

struct TestStructure
{
    unsigned char ByteMember;
    signed short ShortMember;
    char StringBuffer[11];
    unsigned long LongMember;
};

void PrintTestStructToFile(const char* filename)
{
    TestStructure t;
    t.ByteMember = 1;
    t.ShortMember = -1;
    strcpy(t.StringBuffer, "abcdefg");
    t.LongMember = 0xcafebabe;

    FILE* f = fopen(filename, "wb");  // "b": no newline translation on Windows
    fwrite(&amp;t, sizeof(t), 1, f);
    fclose(f);
}</pre>
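<p>When you read the file in binary mode, you might get a string like this (the “\xcc” bytes are padding and unused buffer space, which Microsoft’s compiler fills with 0xCC in debug builds):</p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">'\x01\xcc\xff\xffabcdefg\x00\xcc\xcc\xcc\xcc\xbe\xba\xfe\xca'</span></span></p>
<p>That’s hard to read. We’d rather access the individual struct members by name.</p>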
<p>First, use the “ctypes” module and re-define the struct in Python:</p>
<pre>import ctypes
<strong>class TestStructure(ctypes.Structure):</strong>
    <strong>_fields_</strong> = [("ByteMember", ctypes.c_ubyte),
                ("ShortMember", ctypes.c_short),
                ("StringBuffer", <strong>ctypes.c_char * 11</strong>),
                ("LongMember", ctypes.c_ulong)]</pre>
<p>Next, we can read the file into a “ctypes” byte buffer and “cast” it to the struct type:</p>
<pre><strong>data = ctypes.create_string_buffer(</strong>
    open(filename, "rb").read())

<strong>struct = TestStructure.from_address(ctypes.addressof(data))</strong>
    # Of course, we can also create a zero-initialized instance
    # by writing "struct = TestStructure()".
print "ByteMember", struct.ByteMember
print "ShortMember", struct.ShortMember
print "StringBuffer", struct.StringBuffer
print "LongMember", hex(struct.LongMember)

# When initializing the struct from a pointer to a data buffer,
# the buffer must live at least as long as we use the struct.
# Therefore, store a reference to "data" right in "struct".
struct.data = data</pre>
<p>We can also save the data back to a binary file from Python:</p>
<pre>open(filename, "wb").write(
    ctypes.string_at(ctypes.addressof(struct),
                     ctypes.sizeof(struct)))</pre>
<p>See the help for the “ctypes” module for more information.</p>
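<p>If the data is already in a bytes object, “from_buffer_copy()” (available since Python 2.6) is a simpler alternative to “from_address()”: it copies the bytes into a new struct instance, so no separate buffer has to be kept alive. The sketch below (Python 3) assumes a little-endian machine and swaps “c_ulong” for “c_uint32”, because the example bytes come from a compiler where “unsigned long” is 32 bits wide, while “c_ulong” is 64 bits on 64-bit Linux.</p>

```python
import ctypes

class TestStructure(ctypes.Structure):
    # c_uint32 instead of c_ulong: assumes a 32-bit unsigned long on the
    # C++ side (as with Microsoft's compiler).
    _fields_ = [("ByteMember", ctypes.c_ubyte),
                ("ShortMember", ctypes.c_short),
                ("StringBuffer", ctypes.c_char * 11),
                ("LongMember", ctypes.c_uint32)]

# The 20 bytes shown above (assumes a little-endian machine).
raw = b"\x01\xcc\xff\xffabcdefg\x00\xcc\xcc\xcc\xcc\xbe\xba\xfe\xca"

# Copies the bytes into a new instance that owns its memory.
s = TestStructure.from_buffer_copy(raw)
print(s.ByteMember, s.ShortMember, s.StringBuffer, hex(s.LongMember))

# Writing back: the copied padding bytes are preserved, too.
out = ctypes.string_at(ctypes.addressof(s), ctypes.sizeof(s))
```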
<p><strong>Note:</strong> You can also use the “struct” module for working with binary data.</p>
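<p>With “struct”, you describe the layout as a format string instead of a class. This sketch assumes the same 20-byte layout as above: “=” selects standard sizes without automatic alignment in the machine’s byte order (little-endian here), and each “x” is one explicit padding byte. The function name “parse_test_structure” is made up for the example.</p>

```python
import struct

# B: unsigned char, x: pad byte, h: signed short, 11s: char[11],
# x: pad byte, L: 32-bit unsigned long (standard size).
RECORD_FORMAT = "=Bxh11sxL"

def parse_test_structure(raw):
    """Unpack one TestStructure record from a bytes object."""
    byte_member, short_member, string_buffer, long_member = struct.unpack(
        RECORD_FORMAT, raw)
    # Keep only the characters before the terminating NUL byte.
    text = string_buffer.split(b"\x00", 1)[0].decode("ascii")
    return byte_member, short_member, text, long_member

raw = b"\x01\xcc\xff\xffabcdefg\x00\xcc\xcc\xcc\xcc\xbe\xba\xfe\xca"
print(struct.calcsize(RECORD_FORMAT))   # 20
print(parse_test_structure(raw))
```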
<h2><a name="__RefHeading__15_1765578431"></a>Invoking Functions in a C++ DLL</h2>
<p>Let’s assume we have a DLL that exports the following function:</p>
<pre>#include &lt;cstdio&gt;   // printf

extern "C" __declspec(dllexport)
const char* __stdcall WorkWithFile(
    const char* filename)
{
    printf("Doing something with %s\n", filename);
    return "It worked!";
}</pre>
<p>We can invoke it from Python like this:</p>
<pre>import ctypes

dll = ctypes.WinDLL("cpp_code.dll")
dll.WorkWithFile.restype = ctypes.c_char_p
print dll.WorkWithFile("some_file.txt")</pre>
<p>Things to note:</p>
<ul>
<li>We’re using “ctypes.WinDLL”, because we want to call a “__stdcall” function. (We’d use “ctypes.CDLL” for “__cdecl” functions.)</li>
<li>By default, “ctypes” assumes that the return type of the function is an integer. By setting the “restype” attribute of the function wrapper, we can specify the real return type.</li>
</ul>
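<p>The same “restype” rule applies to “ctypes.CDLL”. Since “cpp_code.dll” isn’t available here, this sketch assumes a POSIX system and calls the C library’s “strchr()” instead. Setting “argtypes” as well lets “ctypes” check and convert the arguments.</p>

```python
import ctypes
import ctypes.util

# find_library("c") locates the C library on POSIX systems. If it returns
# None, CDLL(None) gives access to the symbols already loaded into the
# Python process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char* strchr(const char* s, int c)
libc.strchr.argtypes = [ctypes.c_char_p, ctypes.c_int]
libc.strchr.restype = ctypes.c_char_p   # without this, ctypes assumes int

result = libc.strchr(b"some_file.txt", ord("."))
print(result)   # b'.txt'
```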
<p><strong>What if you want to use C++ classes exported from a DLL?</strong></p>
<p>You should compile the C++ code as a Python extension module. See the next section.</p>
<p>(Exporting classes directly from a DLL is generally not a good idea. This approach is not portable across different compilers, or even different versions of the same compiler. Everything you export should be declared as <span style="font-family: Courier New,monospace;">extern "C"</span> to avoid problems with name mangling.)</p>
<h2><a name="__RefHeading__17_1765578431"></a>Extending and Embedding Python</h2>
<p>Many of the modules in the standard Python library are actually extension modules written in C.</p>
<p>Wrapping up some C++ code as an extension module is a task that can be largely automated using the SWIG program. See my article “<a href="http://realmike.org/blog/2010/07/18/python-extensions-in-cpp-using-swig/">Python Extensions in C++ Using SWIG</a>”.</p>
<p>It is also possible to embed the Python interpreter in a C++ program. This is especially useful if you want to provide a simple way for users to write plug-ins for your program, or to provide a built-in scripting language (like VBA for Microsoft Office).</p>
<p>In the mathematics department, we’ve been using the PyCXX C++ library successfully for this task. See <a href="http://cxx.sourceforge.net/">http://cxx.sourceforge.net/</a>.</p>
<h1><a name="__RefHeading__19_1765578431"></a>Introduction to GUI Programming</h1>
<div id="attachment_594" class="wp-caption alignright" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/wxpython_demo.png"><img class="size-medium wp-image-594" title="wxPython Demo" src="http://realmike.org/blog/wp-content/uploads/2012/06/wxpython_demo-300x273.png" alt="" width="300" height="273" /></a><p class="wp-caption-text">wxPython Demo</p></div>
<p>There are many GUI toolkits available for Python. In the mathematics department, we’re using wxPython exclusively (<a href="http://www.wxpython.org/">http://www.wxpython.org/</a>), which is a binding for the cross-platform wxWidgets library (<a href="http://www.wxwidgets.org/">http://www.wxwidgets.org/</a>). wxPython allows us to create complex, state-of-the-art GUIs relatively easily (HOMER being the most recent example).</p>
<p>Once you have installed wxPython into your local Python installation, you should take a look at the wxPython Demo (usually in Start → Programs → wxPython2.8 Docs Demos and Tools → Run the wxPython DEMO).</p>
<p>In the next few sections, we’ll use wxPython to build some very simple GUIs to make your scripts easier to use for people who don’t know what the command line is. <img src='http://realmike.org/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<h2><a name="__RefHeading__21_1765578431"></a>Dialog Boxes in Command-Line Programs</h2>
<p>Sometimes you have a command-line program and just want to ask the user for a filename, pick an item from a list, enter a string, or whatever. This can be done easily.</p>
<p>Add this code to your program:</p>
<pre>import wx
import wx.lib.dialogs as dlg
g_app = wx.PySimpleApp()    # This object must be alive as long as
    # you want to open dialogs. Without an app object, the program
    # will crash.</pre>
<p><strong>Usability note:</strong> Consider adding a command-line or INI file-based interface to your program in addition to the GUI. This way, the program can be run unattended from a script, without having to make the same choices manually each time the program is run.</p>
<h3>Displaying Messages</h3>
<p>To display a simple message box with an OK button, use this code:</p>
<pre>dlg.<strong>messageDialog</strong>(message="Message", title="Title", aStyle=wx.OK)</pre>
<p>You can also display other buttons or add an icon:</p>
<pre>result = dlg.<strong>messageDialog</strong>(
    message="Are you sure?", title="The Tool",
    aStyle=wx.YES | wx.NO | wx.ICON_WARNING)
if result.returned == wx.ID_YES:
    print "As you wish, Master!"</pre>
<p>To display a longer text and/or to allow the user to copy the text to the Clipboard, use this:</p>
<pre>dlg.<strong>scrolledMessageDialog</strong>(
    message="This is some long text\n" * 100, title="The Tool")</pre>
<h3>Asking for Some Text</h3>
<p>Example usage:</p>
<pre>result = dlg.<strong>textEntryDialog</strong>(title="Enter something",
                             message="Right here",
                             defaultText="default")
if result.accepted:
    print "You entered", result.text
else:
    print "Cancelled"</pre>
<h3>Asking for Files and Directories</h3>
<p>Asking for files to open:</p>
<pre>result = dlg.<strong>openFileDialog</strong>(title='Open',
    directory='c:\\temp', filename='x.txt',
    wildcard='Text Files (*.txt)|*.txt',
    style=wx.OPEN | wx.MULTIPLE)
if result.accepted:
    print "You selected", result.paths</pre>
<p>Asking for a file to save:</p>
<pre>result = dlg.<strong>saveFileDialog</strong>(title='Save',
    directory='c:\\temp', filename='x.txt',
    wildcard='Text Files (*.txt)|*.txt',
    style=wx.SAVE | wx.OVERWRITE_PROMPT)
if result.accepted:
    print "You selected", result.paths</pre>
<p>Asking for a directory:</p>
<pre>result = dlg.dirDialog(message='Choose a directory',
                       path='c:\\temp')
if result.accepted:
    print "You selected", result.path</pre>
<h3>Offering Multiple Choices</h3>
<p>To allow the user to pick a single item:</p>
<pre>result = dlg.<strong>singleChoiceDialog</strong>(message='Choose wisely',
                                title='The Tool',
                                lst=['Blue pill', 'Red pill'])
if result.accepted:
    print "Your choice:", result.selection</pre>
<p>To allow the user to pick multiple items:</p>
<pre>result = dlg.<strong>multipleChoiceDialog</strong>(message='Choose',
    title='The Tool', lst=['Cheese', 'Ham', 'Mushrooms'])
if result.accepted:
    print "Your choice:", result.selection</pre>
<h2>A Minimal GUI Application</h2>
<p>Here’s the code to open a main window:</p>
<pre>import wx

class MainFrame(wx.Frame):
    pass

<strong>class App(wx.App):</strong>
    def OnInit(self):
        frame = MainFrame(parent=None,
                          title="The GUI")
        frame.Show()
        return True

if __name__ == "__main__":
    <strong>app = App(redirect=False)</strong>
    <strong>app.MainLoop()</strong></pre>
<p>What we see:</p>
<ul>
<li>To create a GUI application, derive a class from “wx.App”, instantiate it, and call its “MainLoop” method.</li>
<li>In the “OnInit” method, the App object creates the main frame.</li>
<li>We create the App object with “redirect=False”. This means that all messages (including the output of “print” statements) will be printed to the console. This is useful during debugging when you start the program from the console. When you set “redirect=True”, wxPython creates a separate log window to display messages.</li>
</ul>
<h2><a name="__RefHeading__25_1765578431"></a>Adding the Widget Inspector and an Interactive Shell</h2>
<p>During development, you’d often wish to peek into the program while it’s running, just like you could in an interactive Python shell. This can be done easily with the “InspectionTool” that wxPython provides.</p>
<p>Let’s add the code to display a button and to open the “InspectionTool” when you click the button:</p>
<pre>from wx.lib.inspection import InspectionTool

class MainFrame(wx.Frame):
    def __init__(self, parent, title):
        wx.Frame.__init__(self, parent=parent, title=title)

        <strong>inspector_btn = wx.Button(self, -1, "Widget Inspector")</strong>
        <strong>self.Bind(wx.EVT_BUTTON, self.OnOpenWidgetInspector,</strong>
                  <strong>inspector_btn)</strong>

        self.m_hello = "World"

    def OnOpenWidgetInspector(self, evt):
        if not InspectionTool().initialized:
            InspectionTool().Init()
        InspectionTool().Show(self, True)</pre>
<div id="attachment_589" class="wp-caption alignright" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/widget_inspector.png"><img class="size-medium wp-image-589" title="wxPython Widget Inspector" src="http://realmike.org/blog/wp-content/uploads/2012/06/widget_inspector-300x244.png" alt="" width="300" height="244" /></a><p class="wp-caption-text">wxPython Widget Inspector</p></div>
<p>What we see:</p>
<ul>
<li>To create a button, create a “wx.Button” object. The first parameter to the initializer is the parent window. The second one is the window ID, which is used to distinguish widgets in event handlers. We don’t need a particular ID, so we pass -1 (“wx.ID_ANY”), which lets wxPython assign one automatically. The third parameter is the button label. For the other optional parameters, please see the wxPython Docs.</li>
<li>To register an event handler that should be called when the button is pressed, use the “Bind” method with the event type (“wx.EVT_BUTTON”), the method to invoke, and the widget that generates the event.</li>
<li>The event handler method takes a “wx.Event” object (for button clicks, a “wx.CommandEvent”) as its only parameter. We don’t need the event object in our case.</li>
</ul>
<p>When you run the application and click the button, the Widget Inspector opens. You can browse the GUI widgets that you created and use the interactive shell to work with the objects.</p>
<h2>Working with Sizers</h2>
<div id="attachment_590" class="wp-caption alignright" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/gui_screen.png"><img class="size-medium wp-image-590" title="Example GUI" src="http://realmike.org/blog/wp-content/uploads/2012/06/gui_screen-300x211.png" alt="" width="300" height="211" /></a><p class="wp-caption-text">Example GUI</p></div>
<p>Sizers are used in wxPython to calculate the layout of widgets. Sizers perform the following tasks automatically:</p>
<ul>
<li>Arrange widgets horizontally, vertically, in a grid, etc.</li>
<li>Resize widgets when the parent window is resized</li>
<li>Adjust the size of the parent to the space requirements of the children</li>
</ul>
<p>As an example of working with sizers, let’s build a GUI with two buttons and a text box. The buttons should be right-aligned and have a nice border around them. The text box should consume the remaining free space. When the frame is resized, the layout should adapt.</p>
<p>The screenshot to the right shows the desired result.</p>
<p>For this layout, we use two sizers, a vertical one with two compartments and a horizontal one with three:</p>
<div id="attachment_591" class="wp-caption aligncenter" style="width: 658px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/gui_sizers.png"><img class="size-full wp-image-591" title="GUI Sizers" src="http://realmike.org/blog/wp-content/uploads/2012/06/gui_sizers.png" alt="" width="648" height="384" /></a><p class="wp-caption-text">GUI Sizers</p></div>
<p>First, we create the widgets normally:</p>
<pre>class MainFrame(wx.Frame):
    def __init__(self, parent, title):
        wx.Frame.__init__(self, parent=parent, title=title)
        self.SetBackgroundColour(
            wx.SystemSettings_GetColour(wx.SYS_COLOUR_BTNFACE))

        <strong>inspector_btn</strong> = wx.Button(self, -1, "Widget Inspector")
        self.Bind(wx.EVT_BUTTON, self.OnOpenWidgetInspector,
                  inspector_btn)

        <strong>quit_btn</strong> = wx.Button(self, -1, "Quit")

        <strong>text_box</strong> = wx.TextCtrl(self, -1, style=wx.TE_MULTILINE,
                               size=(500, 300))</pre>
<p>Next, we create a horizontal sizer for the buttons:</p>
<pre>horz_sizer = wx.BoxSizer(wx.HORIZONTAL)
horz_sizer.AddStretchSpacer(prop=1)
horz_sizer.Add(inspector_btn, proportion=0,
               flag=wx.RIGHT, border=4)
horz_sizer.Add(quit_btn, proportion=0)</pre>
<p>What we see:</p>
<ul>
<li>To right-align the buttons, we add a “stretch spacer” first. The argument “prop=1” defines the proportion. This will be explained shortly.</li>
<li>Next, we add the “Widget Inspector” button and add 4 pixels of free space to its right. The argument “proportion=0” will be explained shortly.</li>
<li>Finally, we add the “Quit” button.</li>
</ul>
<p><strong>What’s the thing about “proportion”?</strong></p>
<p>When you add several widgets to a sizer, the proportion of each widget defines its share of the available space, relative to the sum of all proportions. For example, when you add three widgets with proportions 5, 4, and 7 (a total of 16), they take up 5/16, 4/16, and 7/16 of the space:</p>
<div id="attachment_592" class="wp-caption aligncenter" style="width: 350px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/sizer_proportion.png"><img class="size-full wp-image-592 " title="Sizer Proportions" src="http://realmike.org/blog/wp-content/uploads/2012/06/sizer_proportion.png" alt="" width="340" height="132" /></a><p class="wp-caption-text">Sizer Proportions</p></div>
<pre>horz_sizer = wx.BoxSizer(wx.HORIZONTAL)
horz_sizer.Add(btn_1,
               proportion=5)
horz_sizer.Add(btn_2,
               proportion=4)
horz_sizer.Add(btn_3,
               proportion=7)</pre>
<p>Proportion 0 means “use the minimum required space for the widget.”</p>
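<p>To make the arithmetic concrete, here is a small pure-Python model of how a box sizer divides space along its main axis. It is a simplified sketch, not wxPython code. It assumes that items with proportion 0 get exactly their minimum size and that the remaining space is split among the other items in the ratio of their proportions. Real sizers also account for borders and for the minimum sizes of stretchable items. The function name “distribute” is hypothetical.</p>

```python
def distribute(total, items):
    """Split `total` pixels among sizer items (simplified model).

    `items` is a list of (min_size, proportion) pairs. Items with
    proportion 0 keep their minimum size; the space left over is shared
    by the other items in the ratio of their proportions.
    """
    fixed = sum(min_size for min_size, prop in items if prop == 0)
    total_prop = sum(prop for min_size, prop in items)
    leftover = max(total - fixed, 0)
    sizes = []
    for min_size, prop in items:
        if prop == 0:
            sizes.append(min_size)
        else:
            # Whole pixels only, so use integer division.
            sizes.append(leftover * prop // total_prop)
    return sizes

# Three buttons with proportions 5, 4, and 7 in a 320-pixel-wide sizer:
print(distribute(320, [(80, 5), (80, 4), (80, 7)]))    # [100, 80, 140]

# The button bar: a stretch spacer (proportion 1) and two buttons with
# proportion 0 and hypothetical widths of 120 and 80 pixels:
print(distribute(500, [(0, 1), (120, 0), (80, 0)]))     # [300, 120, 80]
```

<p>The second call shows why the stretch spacer right-aligns the buttons. It is the only stretchable item, so it absorbs all of the free space.</p>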
<p>Next, we create a vertical sizer, add the horizontal sizer with the buttons and the text box to it, and assign the sizer to the frame:</p>
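<pre>vert_sizer = wx.BoxSizer(wx.VERTICAL)
vert_sizer.Add(horz_sizer, proportion=0,
               flag=wx.EXPAND | wx.ALL, border=4)
vert_sizer.Add(text_box, proportion=1, flag=wx.EXPAND)

# Let the sizer manage the frame's layout, then resize the frame
# to the minimum size that the sizer calculates.
self.SetSizer(vert_sizer)
self.Fit()</pre>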
<p>What we see:</p>
<ul>
<li>We specify the “wx.EXPAND” flag. This will be explained shortly.</li>
<li>The “wx.ALL” flag specifies that the border should apply to all sides (it’s a shortcut for “wx.LEFT | wx.TOP | wx.RIGHT | wx.BOTTOM”).</li>
</ul>
<p><strong>What does “wx.EXPAND” do?</strong></p>
<div id="attachment_593" class="wp-caption alignright" style="width: 425px"><a href="http://realmike.org/blog/wp-content/uploads/2012/06/sizer_expand.png"><img class="size-full wp-image-593" title="Sizer EXPAND" src="http://realmike.org/blog/wp-content/uploads/2012/06/sizer_expand.png" alt="" width="415" height="200" /></a><p class="wp-caption-text">Sizer EXPAND</p></div>
<p>Without “wx.EXPAND”, a horizontal sizer aligns the widgets in <em>columns</em>, but it does not touch their <em>heights</em>. Similarly, a vertical sizer aligns the widgets in <em>rows</em>, but it does not touch their <em>widths</em>.</p>
<p>When “wx.EXPAND” is specified when adding a widget to a horizontal sizer, the sizer will adjust the <em>height</em> of the widget to the height of the sizer. (The height of the sizer is the maximum height of its children.) Similarly, “wx.EXPAND” tells a vertical sizer to adjust the <em>width</em> of the widget.</p>
<p><strong>Tip:</strong> When the sizers do not work as expected, the Widget Inspector might help you find the problem. Select a widget and click the “Highlight” button to check whether it takes up the space that you expected.</p>
<h2><a name="__RefHeading__29_1765578431"></a>Getting Rid of the Console</h2>
<p>On Windows, when you double-click a .py file, a console window opens. For GUI applications, this is annoying and useless.</p>
<p>This can be solved in two ways:</p>
<ul>
<li>Rename the .py file to .pyw</li>
<li>Run the .py file with “pythonw.exe” instead of “python.exe”</li>
</ul>
<p>If you still want the output of “print” statements to be visible, pass “redirect=True” to the initializer of the “wx.App” object. A separate window will be opened when a “print” occurs.</p>
<p><strong>Advanced:</strong> You can also write your own file-like object (like “StringIO”) that you assign to “sys.stdout” and “sys.stderr” and that appends all text to a “Log Messages” window in the GUI.</p>
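<p>Here is a minimal sketch of such an object. It assumes that all that is needed is a “write()” method, plus a no-op “flush()” because some code calls it. The class name “LogRedirector” is hypothetical. In a wxPython program, the callback could be the “AppendText” method of a multi-line “wx.TextCtrl”. In this example, a plain list stands in for the window so that the code runs without a GUI.</p>

```python
import sys

class LogRedirector(object):
    """File-like object that forwards all written text to a callback."""

    def __init__(self, callback):
        # In a GUI, this could be e.g. a text control's AppendText method.
        self.callback = callback

    def write(self, text):
        self.callback(text)

    def flush(self):
        # Nothing is buffered, but callers may still call flush().
        pass

messages = []
old_stdout, old_stderr = sys.stdout, sys.stderr
sys.stdout = sys.stderr = LogRedirector(messages.append)
try:
    sys.stdout.write("Hello from stdout\n")
    sys.stderr.write("Hello from stderr\n")
finally:
    # Always restore the original streams.
    sys.stdout, sys.stderr = old_stdout, old_stderr

print("".join(messages))
```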
<h2><a name="__RefHeading__31_1765578431"></a>GUI Tools for Creating GUIs</h2>
<p>There are tools that let you lay out frames and dialog boxes graphically. Personally, I prefer creating widgets programmatically, because the graphical editors that I tried all have shortcomings. If you want to check for yourself, see XRCed, which comes with the wxPython Demos and Tools.</p>
<h1><a name="__RefHeading__33_1765578431"></a>Homework</h1>
<p>Run this command in the Python shell:</p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">&gt;&gt;&gt; import this</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2012/06/07/python-training-part-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Training &#8211; Part 3</title>
		<link>http://realmike.org/blog/2012/06/07/python-training-part-3/</link>
		<comments>http://realmike.org/blog/2012/06/07/python-training-part-3/#comments</comments>
		<pubDate>Thu, 07 Jun 2012 17:36:28 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=578</guid>
		<description><![CDATA[Part 1 &#124; Part 2 &#124;&#124; Part 4 &#124; Part 5 This is part 3 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2012/06/07/python-training-part-3/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<p><a href="http://realmike.org/blog/2012/06/07/python-training-part-1/">Part 1</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-2/">Part 2</a> || <a href="http://realmike.org/blog/2012/06/07/python-training-part-4/">Part 4</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-5/">Part 5</a></p>
<p><em>This is part 3 of a Python training that I gave while I was working at <a href="http://spielo.com/careers">SPIELO International</a>. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a <a href="http://creativecommons.org/licenses/by-sa/3.0/" rel="license">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>.</em></p>
<p><span id="more-578"></span></p>
<h1>Exceptions</h1>
<p>In Python, exceptions are the primary error handling mechanism. Whether you access an invalid list index, or you open a file that doesn’t exist, or you divide by zero—an exception is raised in all of these cases. If your program doesn’t handle the exception explicitly, a traceback is printed and the program terminates:</p>
<pre>Traceback (most recent call last):
  File "C:\temp\x.py", line 9, in &lt;module&gt;
    Main()
  File "C:\temp\x.py", line 7, in Main
    print Divide(5, 0)
  File "C:\temp\x.py", line 3, in Divide
    x = a / b
ZeroDivisionError: integer division or modulo by zero</pre>
<h2>Basic Usage</h2>
<p>To handle exceptions, enclose code that might raise an exception in a <strong>“try” / “except”</strong> block. To raise an exception when something goes wrong, use the <strong>“raise”</strong> keyword.</p>
<p>If you are familiar with exception handling in C++, here’s how it maps to Python:</p>
<table cellspacing="0" cellpadding="7" width="100%">
<tbody>
<tr valign="TOP">
<td><strong>C++</strong></td>
<td><strong>Python</strong></td>
</tr>
<tr valign="TOP">
<td>
<pre><strong>// Define your own exception class.</strong>
class MyException
{
public:
    MyException(const string&amp; msg)
    :   m_msg(msg)
    { }

    string m_msg;
};

<strong>// Raise an exception.</strong>
void FunctionWithError()
{
    throw MyException("Oops.");
}

<strong>// Handle an exception.</strong>
void HandleException()
{
    try
    {
        FunctionWithError();
    }
    catch (MyException&amp; e)
    {
        cerr &lt;&lt; e.m_msg;
    }
    catch (YourException)
    {
        cerr &lt;&lt; "Your exception."
    }
    catch (...)
    {
        cerr &lt;&lt; "Unknown error."
    }
}</pre>
</td>
<td>
<pre><strong># Define your own exception class.</strong>
class MyException:
    def __init__(self, msg):
        self.m_msg = msg

<strong># Raise an exception.</strong>
def FunctionWithError():
    raise MyException("Oops.")

<strong># Handle an exception.</strong>
def HandleException():
    try:
        FunctionWithError()
    except MyException, e:
        print e.m_msg
    except YourException:
        print "Your exception."
    except:
        print "Unknown error."</pre>
</td>
</tr>
</tbody>
</table>
<h2><a name="__RefHeading__7_1465704042"></a>“Exception” Base Class</h2>
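<p>Nearly all of the exceptions that the Python interpreter or the standard library functions raise are derived from “Exception”. The only built-in exceptions that aren’t are “SystemExit”, “KeyboardInterrupt”, and “GeneratorExit”. They derive directly from “BaseException” (since Python 2.5, and since 2.6 for “GeneratorExit”) so that “except Exception” doesn’t catch them by accident. When you define your own exception classes, you should derive from “Exception” as well.</p>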
<p>For exceptions that don’t need anything besides a message:</p>
<pre>class MyException(Exception):
    pass

try:
    raise MyException("Oh no!")
<strong>except Exception, e:</strong>
    <strong># Catches all exceptions that are derived from Exception.</strong>
    print e</pre>
<p>For exceptions that need more:</p>
<pre>class MyExtendedException(Exception):
    def __init__(self, info):
        # Initialize the base class with our own message.
        Exception.__init__(self, str(info) + "it happens")
        self.m_info = info

try:
    raise MyExtendedException("Sh")
except MyExtendedException, e:
    print e, e.m_info</pre>
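<p>Because exception classes are ordinary classes, they can also derive from each other. A common pattern is to define one base class per module so that callers can handle a whole family of errors with a single “except” clause. The sketch below uses hypothetical class and function names. It writes the handler as “except ConfigError as e”, which works from Python 2.6 on and is the only form that Python 3 accepts.</p>

```python
class ConfigError(Exception):
    """Base class for all errors of a hypothetical config module."""

class MissingKeyError(ConfigError):
    def __init__(self, key):
        ConfigError.__init__(self, "Missing key: %s" % key)
        self.m_key = key

class BadValueError(ConfigError):
    pass

def ReadSetting(settings, key):
    if key not in settings:
        raise MissingKeyError(key)
    value = settings[key]
    if not isinstance(value, int):
        raise BadValueError("Not an integer: %r" % (value,))
    return value

def ReadSettingOrDefault(settings, key, default):
    try:
        return ReadSetting(settings, key)
    except ConfigError as e:
        # Handles MissingKeyError and BadValueError alike.
        print("Using default because of: %s" % e)
        return default

settings = {"width": 640, "height": "tall"}
print(ReadSettingOrDefault(settings, "width", 100))    # 640
print(ReadSettingOrDefault(settings, "height", 480))   # reason, then 480
print(ReadSettingOrDefault(settings, "depth", 32))     # reason, then 32
```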
<h2>Catching Multiple Types of Exceptions</h2>
<p>The “except” keyword accepts a tuple with any number of exception classes:</p>
<pre>try:
    if x:
        raise FirstException()
    else:
        raise SecondException()
except <strong>(FirstException, SecondException)</strong>, e:
    print e</pre>
<p>The previous code is equivalent to:</p>
<pre>try:
    if x:
        raise FirstException()
    else:
        raise SecondException()
except FirstException, e:
    print e
except SecondException, e:
    print e</pre>
<h2>“try” / “except” / “else”</h2>
<p>To run a code block only if no exception was raised, add an “else” clause to the “try” block:</p>
<pre>try:
    print "Don't raise."
except:
    print "We never get here."
else:
    print "This is only run when no exception occurred."</pre>
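<p>Why not simply put that code at the end of the “try” block? Code in the “else” clause is not protected by the “except” clauses, so an error raised there is not mistakenly handled as if it came from the “try” block. In the following sketch, the function name “ParsePort” is hypothetical. A non-numeric string yields None, but an out-of-range number raises “ValueError” to the caller, even though the “except” clause handles “ValueError”:</p>

```python
def ParsePort(text):
    try:
        port = int(text)
    except ValueError:
        # Only a failure of int() ends up here.
        return None
    else:
        # Runs only if int() succeeded. This ValueError is NOT caught
        # by the "except ValueError" clause above.
        if not 0 < port < 65536:
            raise ValueError("Port out of range: %d" % port)
        return port

print(ParsePort("8080"))    # 8080
print(ParsePort("http"))    # None
# ParsePort("70000") raises ValueError: Port out of range: 70000
```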
<h2>“try” / “except” / “finally”</h2>
<p>To run a code block regardless of whether an exception occurred or not, use “finally”:</p>
<pre>def IntermediateFunction(fail):
    try:
        FunctionThatFailsSometimes(fail)
    finally:
        print "We always get here."
        # <strong>Hidden homework:</strong> See what happens when you
        # add "return" here. (Hint: Does the exception still
        # get through?)

def FunctionThatFailsSometimes(fail):
    if fail:
        print "Raise."
        raise Exception()
    else:
        print "Don't raise."

try:
    print "---Fail"
    IntermediateFunction(True)
except:
    print "The exception still gets through."
finally:
    print '"except" and "finally" can be used together.'

try:
    print "---Success"
    IntermediateFunction(False)
except:
    print "We never get here."
finally:
    print '"except" and "finally" can be used together.'</pre>
<p><strong>Output:</strong></p>
<pre>---Fail
Raise.
We always get here.
The exception still gets through.
"except" and "finally" can be used together.
---Success
Don't raise.
We always get here.
"except" and "finally" can be used together.</pre>
<h2>Printing a Traceback</h2>
<p>Sometimes you want to handle an exception and still print the same kind of traceback that you would get from the interpreter if you didn’t have a “try” / “except”. Use the “traceback” module for this:</p>
<pre>import traceback
try:
    raise Exception()
except:
    traceback.print_exc()    # print to stderr
    f = open("exception.txt", "w")
    traceback.print_exc(file=f)
    f.close()</pre>
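<p>If you need the traceback as a string instead, for example to show it in a log window or an error dialog, “traceback.format_exc()” returns the same text that “print_exc()” prints. Here is a minimal sketch with hypothetical function names:</p>

```python
import traceback

def Divide(a, b):
    return a / b

def SafeDivide(a, b):
    """Return (result, error_text); error_text is None on success."""
    try:
        return Divide(a, b), None
    except ZeroDivisionError:
        # format_exc() returns the traceback of the exception that is
        # currently being handled, as one multi-line string.
        return None, traceback.format_exc()

result, error_text = SafeDivide(5, 0)
if error_text is not None:
    print("Division failed:")
    print(error_text)
```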
<h1>Operator Overloading and Other Magic</h1>
<p>A class can define a number of special methods to do things for which you would use operator overloading in C++.</p>
<p><strong>Note:</strong> Some of these methods make your class behave more like one of the built-in types, like “list”, “dict”, or “str”. If your class is merely an extension of one of these types, consider deriving from the built-in class, but mind the <a href="http://en.wikipedia.org/wiki/Liskov_substitution_principle">Liskov Substitution Principle</a>.</p>
<h2><a name="__RefHeading__19_1465704042"></a>Destructor</h2>
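<p>To perform clean-up when the object is destroyed, add a “__del__” method to your class. In CPython, this happens as soon as the last reference to the object goes away, not when you “del” one of several names that refer to it:</p>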
<pre>class MyClass(object):
    def __del__(self):
        print "Object is deleted."

a = MyClass()
b = {1: a}
print "Removing first reference to object"
del a
print "Removing last reference to object"
b.clear()

# Output:
#    Removing first reference to object
#    Removing last reference to object
#    Object is deleted.</pre>
<h2>Support “str()” and “repr()”</h2>
<p>When you call “str()” or “repr()” on an instance of a user-defined class that doesn’t define the methods described below, you get something like this:</p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">&lt;__main__.MyClass object at 0x12345678&gt;</span></span></p>
<p>To end up with something nicer, add “__str__” and “__repr__” methods to your class:</p>
<pre>class MyClass(object):
    def __str__(self):
        return "I am a MyClass."

    def __repr__(self):
        return "MyClass()"

x = MyClass()
print str(x)     # prints "I am a MyClass."
print repr(x)    # prints "MyClass()"</pre>
<p><strong>Note:</strong> If you don’t have a “__str__” method, the “__repr__” method is used for “str()” as well.</p>
<p>If the string representation should be a Unicode string, add a “__unicode__” method:</p>
<pre>class MyClass(object):
    def __unicode__(self):
        return u"\xe4\xf6\xfc"

    def __str__(self):
        return "aou"

x = MyClass()
print str(x)     # prints "aou"
print "%s" % x   # prints "aou"
print unicode(x) # prints "äöü"
print u"%s" % x  # prints "äöü"</pre>
<h2>Support “==”, “!=”, “&lt;”, “&lt;=”, etc.</h2>
<p>To support comparison operations, add these methods to your class:</p>
<table cellspacing="0" cellpadding="7">
<tbody>
<tr valign="TOP">
<td bgcolor="#e6e6e6"><strong>Operator</strong></td>
<td bgcolor="#e6e6e6"><strong>Method</strong></td>
</tr>
<tr valign="TOP">
<td>==</td>
<td>__eq__</td>
</tr>
<tr valign="TOP">
<td bgcolor="#e6e6e6">!=</td>
<td bgcolor="#e6e6e6">__ne__</td>
</tr>
<tr valign="TOP">
<td>&lt;</td>
<td>__lt__</td>
</tr>
<tr valign="TOP">
<td bgcolor="#e6e6e6">&lt;=</td>
<td bgcolor="#e6e6e6">__le__</td>
</tr>
<tr valign="TOP">
<td>&gt;</td>
<td>__gt__</td>
</tr>
<tr valign="TOP">
<td bgcolor="#e6e6e6">&gt;=</td>
<td bgcolor="#e6e6e6">__ge__</td>
</tr>
</tbody>
</table>
<p>Each of these methods receives the other object as its only parameter and should return True or False:</p>
<pre>class MyClass(object):
    def __init__(self, a, b):
        self.m_a = a
        self.m_b = b
    def __eq__(self, other):
        return (self.m_a, self.m_b) == (other.m_a, other.m_b)
    def __ne__(self, other):
        return not self.__eq__(other)
    def __lt__(self, other):
        return (self.m_a, self.m_b) &lt; (other.m_a, other.m_b)
    def __le__(self, other):
        return self.__lt__(other) or self.__eq__(other)
    def __gt__(self, other):
        return (self.m_a, self.m_b) &gt; (other.m_a, other.m_b)
    def __ge__(self, other):
        return self.__gt__(other) or self.__eq__(other)
    def __repr__(self):
        return repr((self.m_a, self.m_b))

a = MyClass(3, 7)
b = MyClass(3, 7)
print a == b   # True
print a != b   # False
print a &lt; b    # False
print a &gt; b    # False
print a &gt;= b   # True

# Comparison is also required for sorting:
ls = [MyClass(3, 7), MyClass(5, 2), MyClass(3, 5)]
ls.sort()
print ls    # prints [(3, 5), (3, 7), (5, 2)]</pre>
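<p>Writing all six methods is repetitive. From Python 2.7 on, the “functools.total_ordering” class decorator can supply “__le__”, “__gt__”, and “__ge__” if you define “__eq__” and “__lt__”. Python 2 does not derive “__ne__” from “__eq__”, so the sketch still defines it. The class “Version” is a hypothetical example:</p>

```python
import functools

@functools.total_ordering
class Version(object):
    def __init__(self, major, minor):
        self.m_major = major
        self.m_minor = minor

    def _key(self):
        # Compare like tuples: first the major, then the minor number.
        return (self.m_major, self.m_minor)

    def __eq__(self, other):
        return self._key() == other._key()

    def __ne__(self, other):
        return not self.__eq__(other)

    def __lt__(self, other):
        return self._key() < other._key()

    def __repr__(self):
        return "Version(%d, %d)" % (self.m_major, self.m_minor)

versions = [Version(2, 7), Version(2, 5), Version(3, 1)]
versions.sort()
print(versions)                         # [Version(2, 5), Version(2, 7), Version(3, 1)]
print(Version(2, 7) >= Version(2, 5))   # True
```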
<h2>Support “[ ]”</h2>
<p>To support the subscript operator “[ ]”, add “__getitem__” and “__setitem__” methods. For list-like classes, these methods should receive an integer index and raise an “IndexError” if the index is invalid. For dict-like classes, these methods should receive a key of any type and raise a “KeyError” if the key is unknown.</p>
<pre>class Alphabet(object):
    def __getitem__(self, idx):
        if 0 &lt;= idx &lt; 26:
            return chr(idx + ord("A"))
        else:
            raise IndexError("Index out of range")

    def __setitem__(self, idx, value):
        if 0 &lt;= idx &lt; 26:
            print "All %s will be changed to %s" % (
                chr(idx + ord("A")), value)
        else:
            raise IndexError("Index out of range")

x = Alphabet()
print x[3]       # prints "D"
x[2] = "Y"       # prints "All C will be changed to Y"</pre>
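<p>The example above is list-like. For a dict-like class, here is a sketch of a hypothetical case-insensitive dictionary that stores its items in a regular “dict” under lower-cased keys. It raises “KeyError” for unknown keys and also implements “__contains__”, which makes the “in” operator work:</p>

```python
class CaseInsensitiveDict(object):
    def __init__(self):
        self.__m_items = {}

    def __getitem__(self, key):
        try:
            return self.__m_items[key.lower()]
        except KeyError:
            # Report the key the way the caller spelled it.
            raise KeyError(key)

    def __setitem__(self, key, value):
        self.__m_items[key.lower()] = value

    def __contains__(self, key):
        return key.lower() in self.__m_items

headers = CaseInsensitiveDict()
headers["Content-Type"] = "text/html"
print(headers["content-type"])    # prints "text/html"
print("CONTENT-TYPE" in headers)  # prints True
```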
<h2>Support “for” Loops (Iteration)</h2>
<p>There are two ways of supporting “for” loops over a sequence:</p>
<ul>
<li>Implement “[ ]” with “__getitem__”, which must raise “IndexError” after the last element (implementing “len()” with “__len__” is optional but useful)</li>
<li>Implement an iterator with “__iter__”</li>
</ul>
<h3>Using “len()” and “[ ]”</h3>
<pre>class MyList(object):
    def __len__(self):
        return 5

    def __getitem__(self, idx):
        if 0 &lt;= idx &lt; len(self):
            return idx * 10
        else:
            raise IndexError("Index out of range")

ls = MyList()
for i in ls:
    print i,

# Output:
#    0 10 20 30 40</pre>
<h3>Using an Iterator</h3>
<pre>class MyList(object):
    class MyIterator(object):
        def __init__(self, the_str):
            self.__m_the_str = the_str
            self.__m_idx = 0
        <strong>def __iter__(self):</strong>
            return self
        <strong>def next(self):</strong>
            if self.__m_idx &gt;= len(self.__m_the_str):
                <strong>raise StopIteration()</strong>
            else:
                self.__m_idx += 1
                return self.__m_the_str[self.__m_idx - 1]

    <strong>def __iter__(self):</strong>
        return MyList.MyIterator("Iterate over this")

ls = MyList()
for i in ls:
    print i,

# Output:
#    I t e r a t e  o v e r  t h i s</pre>
<p>How it works:</p>
<ul>
<li>The “for” loop calls the “__iter__” method.</li>
<li>The “__iter__” method must return an iterator object that implements two methods:
<ul>
<li>“__iter__”, which returns the iterator itself</li>
<li>“next”, which is called repeatedly to retrieve the elements, until it raises a “StopIteration” exception.</li>
</ul>
</li>
</ul>
<h3>Using an Iterator and a Generator Function</h3>
<p>The preceding example can be written much more concisely using a “generator” function:</p>
<pre>class MyList(object):
    <strong>def __iter__(self):</strong>
        for c in "Iterate over this":
            <strong>yield c</strong>

ls = MyList()
for i in ls:
    print i,

# Output:
#    I t e r a t e  o v e r  t h i s</pre>
<p>How it works:</p>
<ul>
<li>The “yield” keyword turns a normal function into a generator function.</li>
<li>When the generator function is called, it doesn’t execute its body yet. Instead, it returns an iterator (a “generator object”).</li>
<li>For each step of the iteration, the function executes until it encounters a “yield”.</li>
<li>The value passed to “yield” becomes the value of the current step.</li>
<li>The iteration continues until the function exits through an implicit or explicit “return”.</li>
</ul>
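<p>You can watch these steps by driving a generator by hand with the built-in “next()” function, which is available since Python 2.6. Each call resumes the function until the next “yield”, and “StopIteration” signals the end, just as with the iterator class above. “CountDown” is a hypothetical example:</p>

```python
def CountDown(n):
    print("Starting")
    while n > 0:
        yield n
        n -= 1
    print("Done")

gen = CountDown(2)    # Nothing is printed yet: the body hasn't started.
print(next(gen))      # prints "Starting", then 2
print(next(gen))      # prints 1
try:
    next(gen)         # prints "Done", then the function returns...
except StopIteration:
    print("...and the iteration ends with StopIteration.")
```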
<p>Another generator example:</p>
<pre>def OddFibonacci(maximum):
    a, b = 0, 1
    while b &lt;= maximum:
        if (b % 2) == 0:
            yield str(b) + " is even!"
        else:
            yield b
        a, b = b, a + b

for x in OddFibonacci(5):
    print x

# Output:
#    1
#    1
#    2 is even!
#    3
#    5</pre>
<h2>Calling an Object like a Function (Functors)</h2>
<p>To be able to call an object like a function, add a “__call__” method with any number of arguments:</p>
<pre>class MyFunctor(object):
    def __init__(self, factor):
        self.__m_factor = factor

    def __call__(self, a):
        return self.__m_factor * a

f = MyFunctor(10)
print f(3)    # prints 30</pre>
<p>This particular example can also be written using the “lambda” keyword:</p>
<pre><strong>f = lambda a: 10 * a</strong>
    # This is equivalent to:
    #    def f(a):
    #        return 10 * a
print f(3)    # prints 30</pre>
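<p>The lambda hard-codes the factor 10. To keep the factor as a parameter, as the functor does, a function can return a nested function that remembers the factor (a closure). Here is a sketch with the hypothetical name “MakeMultiplier”:</p>

```python
def MakeMultiplier(factor):
    def Multiply(a):
        # "factor" is remembered from the call to MakeMultiplier.
        return factor * a
    # (Equivalently: return lambda a: factor * a)
    return Multiply

f = MakeMultiplier(10)
g = MakeMultiplier(3)
print(f(3))    # prints 30
print(g(3))    # prints 9
```

<p>A functor class is usually the better fit when the object needs more state or additional methods.</p>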
<h2>More</h2>
<p>There are many other special methods that we didn’t talk about. See chapter 3.4, “Special method names,” in the “Python Reference Manual.”</p>
<h1><a name="__RefHeading__33_1465704042"></a>Common Scripting Tasks</h1>
<h2><a name="__RefHeading__35_1465704042"></a>Walking a Directory Structure</h2>
<p>The “os.walk()” function walks a directory tree and returns a generator that yields one tuple of the form <span style="font-family: Courier New,monospace;">“(dirpath, dirnames, filenames)”</span> for each directory. Here’s an example:</p>
<pre>import os
root = r"c:\temp"
for dirpath, dirnames, filenames in os.walk(root):
    print dirpath    # dirpath already starts with root
    print "  Sub-directories:",
    prefix = "\n   - "
    print prefix + prefix.join(dirnames)
    print "  Files:",
    print prefix + prefix.join(filenames)</pre>
<ul>
<li>To list the contents of a single directory, you can also use “os.listdir()”.</li>
<li>To list filenames that match a pattern (e.g., “*.txt” or “log????.*”), use “glob.glob()”.</li>
</ul>
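<p>These functions can also be combined. In the following sketch, “FindFiles” is a hypothetical helper name. The function walks a tree with “os.walk()” and filters each directory’s file names with “fnmatch.filter()”, which uses the same shell-style wildcards as “glob.glob()”. Because “filenames” contains only bare names, each one is joined with “dirpath”:</p>

```python
import fnmatch
import os

def FindFiles(root, pattern):
    """Return the paths of all files below root whose names match pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            # filenames holds bare names; join them with their directory.
            matches.append(os.path.join(dirpath, name))
    return matches

# Example: for path in FindFiles(r"c:\temp", "*.txt"): print(path)
```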
<p><strong>More info:</strong> See the docs of the following modules for other filesystem-related functions that you might find useful: “os”, “os.path”, “shutil”, “glob”</p>
<h2><a name="__RefHeading__37_1465704042"></a>Running External Programs</h2>
<p>There are several ways of running external programs. The most important ones are:</p>
<ul>
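<li>Calling <strong>“os.system()”</strong> with the command as you would type it on the command line. This function waits for the program to finish and returns its exit status. (On Windows, this is the exit code returned by the shell. On Unix, it is encoded in the same format as the result of “os.wait()”.)</li>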
<li>Using the <strong>“subprocess.Popen”</strong> class, you can run a program asynchronously (without blocking the calling Python program) and you can communicate with the program via stdin, stdout, and stderr.</li>
<li>On Windows, use “os.startfile()” to open a file with its associated program. For example, to open a Word document in Word, you can write <span style="font-family: Courier New,monospace;">“os.startfile('document1.doc')”.</span> This is like double-clicking the file in Explorer.</li>
</ul>
<p>Here’s an example of using “subprocess.Popen” to lay out a graph using “dot.exe” (from the <a href="http://www.graphviz.org/">Graphviz</a> package):</p>
<pre>import subprocess

PROGRAM_PATH = r"dot.exe"

<strong>p = subprocess.Popen([PROGRAM_PATH, "-T", "plain"],</strong>
                     <strong>stdin=subprocess.PIPE,</strong>
                     <strong>stdout=subprocess.PIPE)</strong>
graph = """digraph {
    a -&gt; b
    b -&gt; c
    a -&gt; c
    }
    """
<strong>stdout, stderr = p.communicate(graph)</strong>

print stdout</pre>
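<p>If Graphviz isn’t installed, you can try the same pattern with any program that reads stdin and writes stdout. The sketch below starts the running Python interpreter (“sys.executable”) with a one-line script that converts its input to upper case. “RunFilter” is a hypothetical name. Passing the command as a list avoids quoting problems. “universal_newlines=True” makes the pipes exchange text instead of bytes, which matters on Python 3:</p>

```python
import subprocess
import sys

CHILD_SCRIPT = "import sys; sys.stdout.write(sys.stdin.read().upper())"

def RunFilter(text):
    # Start the child process without waiting for it to finish.
    p = subprocess.Popen([sys.executable, "-c", CHILD_SCRIPT],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         universal_newlines=True)
    # communicate() sends the input, closes stdin, reads all output,
    # and waits for the child to exit.
    stdout, stderr = p.communicate(text)
    if p.returncode != 0:
        raise RuntimeError("child exited with code %d" % p.returncode)
    return stdout

print(RunFilter("hello, dot\n"))    # prints HELLO, DOT
```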
<p>Instead of redirecting the output of the program, we can just as well work with temporary files and “os.system()”:</p>
<pre>import os
import tempfile

DOT_PATH = r"dot.exe"
EXAMPLE_GRAPH = "digraph { a -&gt; b; b -&gt; c; a -&gt; c }"

def GetTempFilename():
    fh, fname = tempfile.mkstemp(suffix=".tmp", prefix="dot")
    os.close(fh)
    return fname

def LayoutGraph(graph):
    input_file = GetTempFilename()
    output_file = GetTempFilename()

    try:
        open(input_file, "w").write(graph)
        os.system(DOT_PATH + ' -T plain -o "%s" "%s"'
                  % (output_file, input_file))
        return open(output_file).read()
    finally:
        # Delete the temporary files.
        os.remove(input_file)
        os.remove(output_file)

if __name__ == "__main__":
    print LayoutGraph(EXAMPLE_GRAPH)</pre>
<h2>Regular Expressions</h2>
<p>Regular expressions facilitate searching for patterns in a string. The syntax appears a bit cryptic at first (which is probably due to it <em>being</em> cryptic), but don’t give up easily. It’s often easier to use a regular expression than to perform the same parsing using basic string operations like “find”, “split”, and slicing.</p>
<p>As an example, let’s assume we have a text file that contains dates in a certain format:</p>
<pre>...
Meeting on <strong>2007-09-13</strong>. Call Joe at 555-1232-4756.
... See last week's report (<strong>2008-02-01</strong>). ...
Reservations were made from <strong>2009-1-17</strong>
to <strong>2009-2-16</strong>.
...</pre>
<p>Using a Python script, we’d like to transform it to this:</p>
<pre>...
Meeting on <strong>Thursday, 13 September 2007</strong>. Call Joe at 555-1232-4756.
... See last week's report (<strong>Friday, 01 February 2008</strong>). ...
Reservations were made from <strong>Saturday, 17 January 2009</strong>
to <strong>Monday, 16 February 2009</strong>.
...</pre>
<p>Let’s start with a regex that matches only the string “2007-09-13” and use the “re.sub()” function to replace it with “Thursday, 13 September 2007”:</p>
<pre>import re

text = """...
Meeting on 2007-09-13. Call Joe at 555-1232-4756.
... See last week's report (2008-02-01). ...
Reservations were made from 2009-1-17
to 2009-2-16.
...
"""

print re.sub(r"2007-09-13", "Thursday, 13 September 2007", text)</pre>
<p><strong>Note:</strong> You should <em>always</em> use a raw string (r"…") for the pattern string. (Backslashes are used frequently as part of the regular expression syntax. If you don’t use raw strings, you have to escape each backslash, which makes patterns harder to read.)</p>
<p>Instead of hard-coding the replacement string, what we really want is to call a function each time the pattern matches and calculate the replacement string in the function. This is possible by passing a function to “re.sub()”:</p>
<pre>...
def ReplaceDate(match):
    return "Thursday, 13 September 2007"
print re.sub(r"2007-09-13", ReplaceDate, text)</pre>
<p>The argument “match” to the “ReplaceDate” function is a “re.MatchObject” instance. To find out what a match object can do, try this in an interactive shell:</p>
<pre>&gt;&gt;&gt; text = "Meeting on 2007-09-13."
&gt;&gt;&gt; m = re.search(r"2007-09-13", text)
&gt;&gt;&gt; m
… &lt;_sre.SRE_Match object at 0x01E84058&gt;
&gt;&gt;&gt; dir(_)
… ['__copy__', '__deepcopy__', 'end', 'expand', 'group', 'groupdict', 'groups', 'span', 'start']
&gt;&gt;&gt; m.group()
… '2007-09-13'
&gt;&gt;&gt; m.start(), m.end()
… 11, 21
&gt;&gt;&gt; text[11:21]
… '2007-09-13'</pre>
<p>Let’s write a better regular expression:</p>
<pre>def ReplaceDate(match):
    return repr(match.groups())
print re.sub(r"(\d{4})-(\d{1,2})-(\d{1,2})", ReplaceDate, text)</pre>
<p><strong>Output:</strong></p>
<pre>...
Meeting on ('2007', '09', '13'). Call Joe at 555-1232-4756.
... See last week's report (('2008', '02', '01')). ...
Reservations were made from ('2009', '1', '17')
to ('2009', '2', '16').
...</pre>
<p>Let’s pick apart the regular expression: (\d{4})-(\d{1,2})-(\d{1,2})</p>
<ul>
<li>“\d” matches any decimal digit.</li>
<li>Appending {m,n} matches m to n repetitions of the preceding pattern. For example, “\d{1,2}” matches a single digit or two digits.</li>
<li>Other ways of indicating repetitions are “\d?” (an optional digit), “\d+” (one or more digits), and “\d*” (zero or more digits).</li>
<li>The parentheses are used to create groups. A tuple of these groups is returned by the “groups” method of the match object.</li>
</ul>
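<p>To see what the groups contain, you can try the pattern on its own. This sketch uses the same pattern on a made-up sample text:</p>

```python
import re

DATE_RE = r"(\d{4})-(\d{1,2})-(\d{1,2})"
text = "from 2009-1-17 to 2009-2-16"

m = re.search(DATE_RE, text)
whole = m.group()      # the complete match
parts = m.groups()     # tuple of all groups
year = m.group(1)      # groups are numbered from 1
day_span = m.span(3)   # (start, end) of the third group in "text"

# With groups in the pattern, re.findall returns one tuple per match.
all_dates = re.findall(DATE_RE, text)
```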
<p>What’s missing is some code that converts a tuple like <span style="font-family: Courier New,monospace;">“('2007', '09', '13')”</span> to the string “Thursday, 13 September 2007”. Here’s the final code:</p>
<pre>import datetime
import re

text = """...
Meeting on 2007-09-13. Call Joe at 555-1232-4756.
... See last week's report (2008-02-01). ...
Reservations were made from 2009-1-17
to 2009-2-16.
...
"""

def ReplaceDate(match):
    year, month, day = map(int, match.groups())
    date = datetime.date(year, month, day)
    return date.strftime("%A, %d %B %Y")

print re.sub(r"(\d{4})-(\d{1,2})-(\d{1,2})", ReplaceDate, text)</pre>
<p><strong>More info:</strong> See the docs of the “re” module. An overview of the regular expression syntax can be found in the section “<a href="http://docs.python.org/library/re.html">Regular Expression Syntax</a>” in the “Python Library Reference.”</p>
<h1>Homework</h1>
<p>The homework combines several techniques presented in this handout. The program can be written in fewer than 100 lines (comments included).</p>
<p>Write a program that draws an “#include&#8221; graph of some C++ code of your choice:</p>
<ul>
<li>Walk one or more directories that contain your “.cpp” and “.h” files.</li>
<li>Open each “.cpp” file and search for “#include” directives (preferably using a regular expression).</li>
<li>For each “.cpp” file, store a list of all “.h” files that you find in the “#include” directives. The pairs of “.cpp” and “.h” files are the edges of your graph.</li>
<li>Once you have all the edges, write a graph file for “dot.exe” similar to this:
<pre>digraph {
    "main.cpp" -&gt; "helper.h"
    "main.cpp" -&gt; "container.h"
    "helper.cpp" -&gt; "os.h"
    "helper.cpp" -&gt; "helper.h"
    ...
}</pre>
</li>
<li>Invoke <span style="font-family: Courier New,monospace;">“\util\dot\dot.exe -T png -o includes.png the_graph.txt”</span> to draw the include graph.<br />
<a href="http://realmike.org/blog/wp-content/uploads/2012/06/include_graph.png"><img class="size-full wp-image-581 alignnone" title="include_graph" src="http://realmike.org/blog/wp-content/uploads/2012/06/include_graph.png" alt="" width="341" height="155" /></a></li>
</ul>
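<p>If you get stuck, here is a partial sketch of two of the steps: finding the headers and writing the dot file. It assumes that project headers are included with quotes (<span style="font-family: Courier New,monospace;">#include "helper.h"</span>). The names “FindHeaders” and “MakeDotGraph” are hypothetical. Walking the directories (for example with “os.walk”) and invoking dot.exe are left to you:</p>

```python
import re

# Matches quoted includes such as: #include "helper.h"
# (Angle-bracket includes of system headers are ignored on purpose.)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def FindHeaders(source_text):
    """Return the ".h" files named in the #include directives of source_text."""
    return [name for name in INCLUDE_RE.findall(source_text)
            if name.endswith(".h")]


def MakeDotGraph(edges):
    """Turn (cpp_file, header) pairs into the text of a dot graph file."""
    lines = ["digraph {"]
    for cpp_file, header in edges:
        lines.append('    "%s" -> "%s"' % (cpp_file, header))
    lines.append("}")
    return "\n".join(lines) + "\n"


source = '#include "helper.h"\n#include "container.h"\nint main() { return 0; }\n'
edges = [("main.cpp", header) for header in FindHeaders(source)]
dot_text = MakeDotGraph(edges)
```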
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2012/06/07/python-training-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Training &#8211; Part 2</title>
		<link>http://realmike.org/blog/2012/06/07/python-training-part-2/</link>
		<comments>http://realmike.org/blog/2012/06/07/python-training-part-2/#comments</comments>
		<pubDate>Thu, 07 Jun 2012 16:47:40 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=566</guid>
		<description><![CDATA[Part 1 &#124;&#124; Part 3 &#124; Part 4 &#124; Part 5 This is part 2 of a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2012/06/07/python-training-part-2/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<p><a href="http://realmike.org/blog/2012/06/07/python-training-part-1/">Part 1</a> || <a href="http://realmike.org/blog/2012/06/07/python-training-part-3/">Part 3</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-4/">Part 4</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-5/">Part 5</a></p>
<p><em>This is part 2 of a Python training that I gave while I was working at <a href="http://spielo.com/careers">SPIELO International</a>. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a <a href="http://creativecommons.org/licenses/by-sa/3.0/" rel="license">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>.</em></p>
<p><span id="more-566"></span></p>
<p><em><strong>Note:</strong> The training was based on Python 2.x, because that&#8217;s what we were using at the time. I would love to update it to Python 3 at some point. Any help with this would be greatly appreciated.</em></p>
<h1>Namespaces and Scopes</h1>
<p>When you define a variable anywhere in a Python function, the variable is added to the local namespace of the function. Unlike in C++, even if the variable is defined inside an “if” statement or in a “for” loop, the variable name is also visible outside the “if” or “for” block:</p>
<pre>def SomeFunc(a):
    def InnerFunc():
        return 2 * x    # x is visible here as well
    if a:
        x = a
    else:
        x = 0
    return x + InnerFunc()</pre>
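<p>The same applies to “for” loops. In this small sketch (“LastIndex” is a made-up name), the loop variable is still visible after the loop. If the loop body never runs, the name is never bound, and reading it raises an “UnboundLocalError”:</p>

```python
def LastIndex(items):
    for i in range(len(items)):
        pass
    # Unlike in C++, "i" is still bound after the loop has finished.
    return i
```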
<p>If a name is not found in the local scope at runtime, Python next searches the scopes of any enclosing functions (this is how “InnerFunc” above sees “x”). After that, it tries the global scope (aka module scope, where a module refers to the “.py” file) and finally the built-in scope (containing things like “int()”, “sorted()”, etc.).</p>
<pre>the_global = 123

def SomeFunc():
    return the_global

def OtherFunc():
    the_global = 456
    return the_global

def GlobalFunc():
    global the_global
    the_global = 789
    return the_global

def BuggyFunc():
    x = the_global
    the_global = 456
    return x

print the_global    # prints 123
print SomeFunc()    # prints 123
print OtherFunc()   # prints 456
print the_global    # still prints 123
print GlobalFunc()  # prints 789
print the_global    # prints 789
print BuggyFunc()   # UnboundLocalError: local variable
                    # 'the_global' referenced before assignment</pre>
<p>Things to note:</p>
<ul>
<li>When you assign to a variable in a function, you are always writing to the local namespace of the function, even if the variable name exists in the global scope as well (see “OtherFunc”).</li>
<li>To assign to a variable in the global namespace, use the “global” keyword (see “GlobalFunc”).</li>
<li>As soon as you assign to a variable in a function, reading from the variable anywhere in the function accesses the local variable, even in a line that precedes the assignment. Therefore, you receive an error in “BuggyFunc”, because <span style="font-family: Courier New,monospace;">“x = the_global”</span> tries to read the local variable “the_global”, which isn’t assigned yet.</li>
</ul>
<p>You can use these built-in functions to access the contents of namespaces:</p>
<ul>
<li><strong>globals():</strong> Returns a dictionary containing the variables in the module scope.</li>
<li><strong>locals():</strong> Returns a dictionary containing the variables in the local (class or function) scope. You should not modify the dictionary.</li>
<li><strong>dir():</strong> When called without parameters, returns a list of variables in the local scope. When called with an object as the parameter, returns a list of attributes of the object.</li>
</ul>
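<p>A short sketch of these functions in use. “ShowNamespaces” is a made-up name, and the function only reads the dictionaries without modifying them:</p>

```python
the_global = 123


def ShowNamespaces(param):
    local_var = param * 2
    # locals() maps the names of this call's local variables to their values.
    local_names = sorted(locals())
    # globals() is the namespace of the module that defines this function.
    global_value = globals()["the_global"]
    return local_names, local_var, global_value


# dir(obj) lists the attributes of an object, e.g. the methods of a string.
string_attrs = dir("abc")
```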
<p>You can remove the binding of a name to an object by using the <strong>“del”</strong> keyword:</p>
<pre>&gt;&gt;&gt; the_global = 123
&gt;&gt;&gt; print the_global
… 123
&gt;&gt;&gt; del the_global
&gt;&gt;&gt; print the_global
… NameError: name 'the_global' is not defined</pre>
<h1>Classes</h1>
<p>To define a class in Python, use the “class” keyword. For the methods, use the “def” keyword, just like for functions.</p>
<pre>class SomeClass(<strong>object</strong>):
    """@brief The docstring for the class."""

    def <strong>__init__</strong>(<strong>self</strong>, initial_value):
        """@brief This is the constructor, or more precisely,
                the initializer of the class.
        """
        <strong>self.m_some_member</strong> = initial_value

    def SomeMethod(self, inc):
        self.m_some_member += inc
        return self.m_some_member

# Working with the class
c = SomeClass(100)
print c.SomeMethod(10)    # prints 110
c.m_some_member = 0
print c.m_some_member     # prints 0</pre>
<p>What we see:</p>
<ul>
<li>New-style classes are derived from the <strong>“object”</strong> class.
<ul>
<li>If you write just <span style="font-family: Courier New,monospace;">“class SomeClass:”</span>, you still get a class, but some of the features that we’ll discuss later (such as properties) won’t work. So you should always derive from “object” (or from a class that’s already derived from “object”) if possible.</li>
</ul>
</li>
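<li>The equivalent of a C++ constructor is the <strong>“__init__”</strong> method. You can leave it out if you have no fields to initialize. (“__del__” is roughly the counterpart of a destructor. Unlike in C++, however, Python does not guarantee when, or even whether, it is called.)</li>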
<li>The first parameter to all methods must be the object instance. By convention, you should name it <strong>“self”.</strong>
<ul>
<li>This is the equivalent of the implicit “this” parameter in C++.</li>
</ul>
</li>
<li>To create <strong>fields</strong>, assign to attributes of “self”. Typically, this is done in “__init__”, but it can be done anywhere.
<ul>
<li>According to our Python coding guidelines, fields are prefixed with “m_”, but this is just convention.</li>
</ul>
</li>
<li>To create <strong>instances</strong> of the class, call the class like a function with the parameters specified for “__init__”.</li>
<li>You can access fields directly from outside the class (<span style="font-family: Courier New,monospace;">“c.m_some_member = …”</span>).</li>
</ul>
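<p>To make the role of “self” concrete: calling a method through an instance is the same as calling the function stored in the class and passing the instance as the first argument. A short sketch, reusing “SomeClass” from above:</p>

```python
class SomeClass(object):
    def __init__(self, initial_value):
        self.m_some_member = initial_value

    def SomeMethod(self, inc):
        self.m_some_member += inc
        return self.m_some_member


c = SomeClass(100)
via_instance = c.SomeMethod(10)          # "self" is passed implicitly
via_class = SomeClass.SomeMethod(c, 10)  # same call with "self" passed explicitly
c.m_added_later = "fields can be created from anywhere"
```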
<h2><a name="__RefHeading__7_1886880643"></a>Inheritance</h2>
<p>To create a derived class, specify a comma-separated list of base classes when you define the class:</p>
<pre>class DerivedClass(BaseA, BaseB):</pre>
<p>The derived class inherits all the attributes of the base classes. When an attribute name appears both in “BaseA” and in “BaseB”, the attribute from “BaseA” has precedence, because it appears first in the list of base classes.</p>
<p>An example demonstrating some aspects of inheritance:</p>
<pre>class BaseA(object):
    def <strong>__init__</strong>(self):
        self.m_member = None

    def <strong>MethodA</strong>(self):
        print "BaseA.MethodA", <strong>self.m_member</strong>

    def <strong>CommonMethod</strong>(self):
        print "BaseA.CommonMethod"

class BaseB(object):
    def MethodB(self):
        print "BaseB.MethodB"

    def <strong>CommonMethod</strong>(self):
        print "BaseB.CommonMethod"

class DerivedClass(<strong>BaseA, BaseB</strong>):
    def <strong>MethodA</strong>(self):
        print "DerivedClass.MethodA"
        <strong>self.m_member</strong> = "hi!"
        # Call the inherited method
        <strong>BaseA.MethodA(self)</strong>

d = DerivedClass()
d.MethodA()
# Output:
#  DerivedClass.MethodA
#  BaseA.MethodA hi!

d.MethodB()
# Output:
#  BaseB.MethodB

d.CommonMethod()
# Output:
#  BaseA.CommonMethod</pre>
<p>What we see:</p>
<ul>
<li>“BaseA.__init__” is invoked automatically when you create an instance of “DerivedClass”, because “DerivedClass” does not define its own “__init__” and inherits the one from “BaseA”.</li>
<li>An attribute in a derived class overrides an attribute of the same name in the base class (“DerivedClass.MethodA”).</li>
<li>If an attribute appears in more than one base class, the attribute from the class that was specified first in the list of base classes has precedence (“BaseA.CommonMethod”).</li>
<li>To invoke the base class implementation of a method, write <span style="font-family: Courier New,monospace;">“<em>BaseClass</em>.<em>MethodName</em>”</span> and pass “self” as the first parameter explicitly.</li>
</ul>
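<p>For new-style classes, the order in which base classes are searched is recorded in the class attribute “__mro__” (method resolution order). Here is a minimal sketch with stripped-down versions of the classes above:</p>

```python
class BaseA(object):
    def CommonMethod(self):
        return "BaseA.CommonMethod"


class BaseB(object):
    def CommonMethod(self):
        return "BaseB.CommonMethod"


class DerivedClass(BaseA, BaseB):
    pass


# __mro__ lists the classes in the order in which attributes are looked up.
lookup_order = [cls.__name__ for cls in DerivedClass.__mro__]
result = DerivedClass().CommonMethod()
```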
<p>Please note that if you define your own “__init__” in the derived class, or if you have multiple base classes with an “__init__”, you are responsible for invoking the base class implementation of “__init__”:</p>
<pre>class BaseClass(object):
    def __init__(self):
        self.m_member = 0

    def SomeMethod(self):
        return self.m_member

class GoodDerivedClass(BaseClass):
    <strong>def __init__(self):</strong>
        <strong>BaseClass.__init__(self)</strong>

class BadDerivedClass(BaseClass):
    <strong>def __init__(self):</strong>
        <strong>pass</strong>

good = GoodDerivedClass()
print good.SomeMethod()    # prints 0

bad = BadDerivedClass()
print bad.SomeMethod()     # AttributeError: 'BadDerivedClass'
                           # object has no attribute 'm_member'</pre>
<h2>Public, private, protected</h2>
<p>Python distinguishes between <strong>public</strong> and <strong>private</strong> attributes. Any attribute name prefixed with two underscores becomes private (except for names of the form “__xy__”).</p>
<pre>class SomeClass(object):
    def PublicMethod(self):
        <strong>self.__m_private_field</strong> = "encapsulated"
        return self.__PrivateMethod()

    def <strong>__PrivateMethod</strong>(self):
        return self.__m_private_field

c = SomeClass()
print c.PublicMethod()    # prints "encapsulated"
print c.__PrivateMethod() # <strong>AttributeError</strong>: 'SomeClass' object has
                          # no attribute '__PrivateMethod'
print c.__m_private_field # <strong>AttributeError</strong>: 'SomeClass' object has
                          # no attribute '__m_private_field'</pre>
<p>What we see:</p>
<ul>
<li>To make an attribute (method or field) private, prefix it with “__”.</li>
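<li>Private attributes are meant to be accessed only from inside the class. Python implements this through name mangling: inside the class, “__PrivateMethod” is renamed to “_SomeClass__PrivateMethod”.</li>
<li>Therefore, an “AttributeError” exception is raised when you try to access “c.__PrivateMethod” from outside the class. The mangled name still works, so this protects against accidents, not against deliberate access.</li>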
</ul>
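<p>You can observe the renaming directly. This sketch repeats the class from above and inspects the instance with “vars()”. It only illustrates the mechanism; code outside the class should not rely on the mangled names:</p>

```python
class SomeClass(object):
    def PublicMethod(self):
        self.__m_private_field = "encapsulated"
        return self.__PrivateMethod()

    def __PrivateMethod(self):
        return self.__m_private_field


c = SomeClass()
result = c.PublicMethod()
# The instance dictionary shows the mangled name of the "private" field.
field_names = list(vars(c))
# The mangled names can still be used from outside the class.
field_value = c._SomeClass__m_private_field
method_result = c._SomeClass__PrivateMethod()
```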
<p>Python does not support <strong>protected</strong> attributes (i.e., attributes that you can access in derived classes only). By convention, we prefix such attributes with a single underscore, so that users of the class know they’re an implementation detail, but authors of derived classes can still access them:</p>
<pre>class BaseClass(object):
    def <strong>_ProtectedMethod</strong>(self):
        <strong>self._m_protected</strong> = "protected"

class DerivedClass(BaseClass):
    def PublicMethod(self):
        <strong>self._ProtectedMethod()</strong>
        print <strong>self._m_protected</strong></pre>
<h2>Properties</h2>
<p>Python supports “properties”, i.e., pairs of Get/Set methods that are called transparently when you access an attribute:</p>
<pre>class SomeClass(object):
    def __init__(self, initial_value):
        self.__m_read_write_prop = initial_value
        self.__m_read_only_prop = initial_value

    def <strong>__GetReadWriteProp</strong>(self):
        print "Someone's reading ReadWriteProp"
        return self.__m_read_write_prop

    def <strong>__SetReadWriteProp</strong>(self, new_value):
        print "Someone's writing ReadWriteProp"
        self.__m_read_write_prop = new_value

    <strong>ReadWriteProp = property(fget=__GetReadWriteProp,</strong>
                             <strong>fset=__SetReadWriteProp)</strong>

    def <strong>__GetReadOnlyProp</strong>(self):
        print "Someone's reading ReadOnlyProp"
        return self.__m_read_only_prop

    <strong>ReadOnlyProp = property(fget=__GetReadOnlyProp)</strong>

c = SomeClass("initial")
print "val =", c.ReadWriteProp
# Output:
#  Someone's reading ReadWriteProp
#  val = initial

c.ReadWriteProp = "new"
# Output:
#  Someone's writing ReadWriteProp

print "val =", c.ReadWriteProp
# Output:
#  Someone's reading ReadWriteProp
#  val = new

print "val =", c.ReadOnlyProp
# Output:
#  Someone's reading ReadOnlyProp
#  val = initial

c.ReadOnlyProp = "new"    # AttributeError: can't set attribute</pre>
<p><strong>Note:</strong> For properties to work, the class must be derived from “object”. Otherwise, the property loses its special behavior and becomes a normal attribute as soon as you write <span style="font-family: Courier New,monospace;">“c.ReadWriteProp = 1000”</span>.</p>
<p><strong>Note:</strong> Properties are an application of Python’s “descriptor” concept. For more about this and other features of new-style classes, see my article “<a href="http://realmike.org/blog/2010/07/18/introduction-to-new-style-classes-in-python/">Introduction to New-Style Classes in Python</a>”.</p>
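<p>Since Python 2.6, the same properties can also be written with decorator syntax. Here “@property” wraps the getter and “@ReadWriteProp.setter” attaches the setter. This sketch is equivalent to the class above, without the print statements:</p>

```python
class SomeClass(object):
    def __init__(self, initial_value):
        self.__m_read_write_prop = initial_value
        self.__m_read_only_prop = initial_value

    @property
    def ReadWriteProp(self):
        # Getter: runs when you read c.ReadWriteProp
        return self.__m_read_write_prop

    @ReadWriteProp.setter
    def ReadWriteProp(self, new_value):
        # Setter: runs when you assign to c.ReadWriteProp
        self.__m_read_write_prop = new_value

    @property
    def ReadOnlyProp(self):
        # No setter is defined, so assignments raise AttributeError.
        return self.__m_read_only_prop


c = SomeClass("initial")
c.ReadWriteProp = "new"
```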
<h2>Static Fields and Static Methods</h2>
<p>Python supports static fields and methods:</p>
<pre>class InstanceCounter(object):
    <strong>s_num_instances</strong> = 0

    def __init__(self):
        <strong>InstanceCounter.s_num_instances</strong> += 1

    def <strong>GetNumInstances</strong>():    # no "self" here
        return <strong>InstanceCounter.s_num_instances</strong>

    <strong>GetNumInstances = staticmethod(GetNumInstances)</strong>

a = InstanceCounter()
b = InstanceCounter()
print InstanceCounter.GetNumInstances()    # prints 2
InstanceCounter.s_num_instances = 100
c = InstanceCounter()
print InstanceCounter.GetNumInstances()    # prints 101

print a.s_num_instances    # prints 101
<strong>a.s_num_instances = 5</strong>
print a.s_num_instances                    # prints 5
print InstanceCounter.GetNumInstances()    # still prints 101</pre>
<p>Things to note:</p>
<ul>
<li>According to our Python coding guidelines, static fields are prefixed with “s_”, but this is just convention.</li>
<li>Static methods do not have a “self” parameter, obviously.</li>
<li>Static fields and methods can be used both via the class and via an instance.</li>
<li>In the line <span style="font-family: Courier New,monospace;">“a.s_num_instances = 5”</span>, an attribute named “s_num_instances” is added to the instance dictionary of “a” (“a.__dict__”). This attribute hides the static field of the same name in “InstanceCounter” when you access it through “a”. The static field of “InstanceCounter” is not changed.</li>
</ul>
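<p>You can make the hiding visible by inspecting “__dict__” and then deleting the instance attribute again. Here is a minimal sketch, reduced to the counter field:</p>

```python
class InstanceCounter(object):
    s_num_instances = 0

    def __init__(self):
        InstanceCounter.s_num_instances += 1


a = InstanceCounter()
b = InstanceCounter()
a.s_num_instances = 5                  # stored in a.__dict__, not in the class
shadowed = (a.s_num_instances, b.s_num_instances)
instance_dict = dict(a.__dict__)
del a.s_num_instances                  # remove the instance attribute again
restored = a.s_num_instances           # found in the class once more
```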
<h2><a name="__RefHeading__15_1886880643"></a>Class Methods</h2>
<p>Class methods are similar to static methods (but less frequently used). Like static methods, they don’t receive a “self” parameter, but they receive a “cls” parameter with a reference to the class object:</p>
<pre>class SomeClass(object):
    def ClassMethod(cls):
        print cls.__name__

    ClassMethod = classmethod(ClassMethod)

class DerivedClass(SomeClass):
    pass

SomeClass.ClassMethod()    # prints "SomeClass"
c = SomeClass()
c.ClassMethod()            # prints "SomeClass"
DerivedClass.ClassMethod() # prints "DerivedClass"
d = DerivedClass()
d.ClassMethod()            # prints "DerivedClass"</pre>
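<p>A common reason to use a class method instead of a static method is an alternative constructor. Because it receives “cls”, it creates instances of whatever class it is called on. The following is a hypothetical sketch; “Shape”, “Circle” and “FromDict” are made-up names:</p>

```python
class Shape(object):
    def __init__(self, name):
        self.m_name = name

    def FromDict(cls, config):
        # "cls" is the class the method was called on, so a call through a
        # derived class creates an instance of the derived class.
        return cls(config["name"])

    FromDict = classmethod(FromDict)


class Circle(Shape):
    pass


shape = Shape.FromDict({"name": "generic"})
circle = Circle.FromDict({"name": "unit circle"})
```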
<h1>Example – Parsing XML</h1>
<p>The following sections walk you through the task of writing a Python program that prints the contents of an XML document. This will give us plenty of opportunity to learn new things about Python programming in general.</p>
<p><strong>Note: </strong>You can find the code and example XML documents in the <a href="http://realmike.org/blog/wp-content/uploads/2012/06/all_sessions_code.zip">ZIP package</a> for this lesson.</p>
<h2><a name="__RefHeading__19_1886880643"></a>Setting up a ContentHandler</h2>
<p>The standard Python library contains an XML parser and modules to access XML documents using the SAX and DOM APIs. We’ll be using the SAX API from the “xml.sax” module. This module contains the function “parse”, which requires an instance of a user-defined class (derived from “xml.sax.handler.ContentHandler”) with callbacks for handling the various parts of the XML document.</p>
<pre># File: step_1\xml_printer.py
import sys
import xml.sax
import xml.sax.handler

class Printer(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.__m_level = 0
        self.m_num_elements = 0

    def startElement(self, name, attrs):
        # Invoked for each opening tag
        print " " * self.__m_level + name
        self.__m_level += 1
        self.m_num_elements += 1

    def endElement(self, name):
        # Invoked for each closing tag
        self.__m_level -= 1

def Main(filename_or_stream):
    handler = Printer()
    xml.sax.parse(filename_or_stream, handler)
    print handler.m_num_elements, "elements"

if __name__ == "__main__":
    Main(sys.argv[1])  # The name of the XML file must be the
                       # first command-line parameter.</pre>
<p>The program produces output like this:</p>
<pre>...
     MOB_Object
      MOB_InfoLine
       MOB_Text
      MOB_InfoLine
       MOB_Text
     MOB_Object
      MOB_GDL_ReelSlotGameLine
...
2995 elements</pre>
<p>New things in the code:</p>
<ul>
<li>The code in <span style="font-family: Courier New,monospace;">“if __name__ == &#8220;__main__&#8221;”</span> is executed only when the “.py” file is the main program. When the “.py” file is loaded into another program via the “import” keyword, the code is not run. The importing module can call the “Main()” function later with an XML file of its own choice.</li>
<li>“sys.argv” is a list of command-line parameters to the Python program. “sys.argv[0]” contains the path to the program. The remaining elements contain the parameters.</li>
</ul>
<h2><a name="__RefHeading__21_1886880643"></a>Writing the Unit Test</h2>
<p><strong>In this section:</strong> In-memory files with the “StringIO” class and redirecting “sys.stdout”.</p>
<p>The program reads its input from a file and prints the output directly to STDOUT. One way of writing the unit test would be:</p>
<ul>
<li>Prepare an XML file “test_input.xml” with test data.</li>
<li>Invoke the program from the unit test by running another instance of Python with redirected output:
<pre>import os
import sys
…
os.system(sys.executable + " xml_printer.py "
          "test_input.xml &gt;output.txt")</pre>
<p>This is equivalent to running “xml_printer.py test_input.xml &gt;output.txt” on the command line.</p></li>
<li>Compare the expected results to the results in “output.txt”.</li>
</ul>
<p>While this approach might have its advantages, we’ll go down a different route:</p>
<ul>
<li>Prepare the XML input in an <strong>in-memory file</strong>, which we can pass directly to the “Main” function. The <strong>“StringIO”</strong> class serves as an in-memory file.</li>
<li><strong>Redirect STDOUT</strong> by setting the “sys.stdout” variable to another “StringIO” object. (The “print” statements in the program go through “sys.stdout” implicitly.)</li>
<li>Compare the expected results to the contents of the redirected STDOUT.</li>
</ul>
<p>This is our unit test:</p>
<pre># File: step_1\test_xml_printer.py
import StringIO
import sys
import unittest
import xml_printer

class TestXmlPrinter(unittest.TestCase):
    def <strong>setUp</strong>(self):
        # Redirect STDOUT so that all subsequent "print"
        # statements in the Python program go to a StringIO buffer.
        self.__m_old_stdout = sys.stdout
        <strong>sys.stdout = StringIO.StringIO()</strong>

    def <strong>tearDown</strong>(self):
        # Restore STDOUT so that prints go to the screen again.
        sys.stdout = self.__m_old_stdout

    def test_PrintHierarchy(self):
        # Prepare the XML in an in-memory file, i.e., in a
        # StringIO buffer.
        <strong>data = StringIO.StringIO(</strong>
            """&lt;?xml version="1.0"?&gt;
               &lt;A&gt;
                 &lt;B&gt;
                   &lt;C/&gt;
                 &lt;/B&gt;
                 &lt;D&gt;
                   &lt;E/&gt;
                 &lt;/D&gt;
               &lt;/A&gt;
            """)
        xml_printer.Main(data)
        expected = ("A\n"
                    " B\n"
                    "  C\n"
                    " D\n"
                    "  E\n"
                    "5 elements\n")
        # Compare the expected results to the contents of
        # our redirected STDOUT.
        self.assertEquals(expected,
                          <strong>sys.stdout.getvalue()</strong>)

if __name__ == "__main__":
    unittest.main()</pre>
<p>Things to note:</p>
<ul>
<li>The “setUp&#8221; method is called before each “test_” method.</li>
<li>The “tearDown&#8221; method is called after each “test_” method, even if the test fails.</li>
<li>We redirect STDOUT by temporarily setting the global variable “sys.stdout” to a “StringIO” buffer, and restoring the original stream afterwards.</li>
</ul>
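<p>The same redirection can be wrapped in a reusable helper outside of “unittest”. The following is a sketch under these assumptions: “CaptureOutput” and “Greet” are hypothetical names, and the import fallback lets it run under Python 2 (“StringIO” module) and Python 3 (“io” module):</p>

```python
import sys

try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3


def CaptureOutput(func, *args):
    """Call func(*args) and return everything it printed to STDOUT."""
    old_stdout = sys.stdout
    sys.stdout = StringIO()
    try:
        func(*args)
        return sys.stdout.getvalue()
    finally:
        # Like tearDown, restore STDOUT even if func raises an exception.
        sys.stdout = old_stdout


def Greet(name):
    print("Hello, " + name)


captured = CaptureOutput(Greet, "World")
```

<p>The “try/finally” plays the role of “tearDown”: the original stream is restored no matter how the function exits.</p>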
<h2><a name="__RefHeading__23_1886880643"></a>Printing Attribute Values</h2>
<p><strong>In this section:</strong> Working with Unicode strings.</p>
<p>In the next step, we’ll print the XML attributes for each element. The attributes are passed to the “startElement” method of the “ContentHandler” as a dictionary-like “Attributes” object. Try this:</p>
<pre># File: step_2\xml_printer.py
import sys
import xml.sax
import xml.sax.handler

class Printer(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.__m_level = 0
        self.m_num_elements = 0

    def startElement(self, name, attrs):
        # Invoked for each opening tag
        print " " * self.__m_level + name
        self.__m_level += 1
        self.m_num_elements += 1
        <strong>for attr_name, attr_value in attrs.items():</strong>
            <strong>print " " * self.__m_level + " -", attr_name, "=", attr_value</strong>
            <strong># This might cause an error. Explanation follows.</strong>

    def endElement(self, name):
        # Invoked for each closing tag
        self.__m_level -= 1

def Main(filename_or_stream):
    handler = Printer()
    xml.sax.parse(filename_or_stream, handler)
    print handler.m_num_elements, "elements"

if __name__ == "__main__":
    Main(sys.argv[1])  # The name of the XML file must be the
                       # first command-line parameter.</pre>
<p>When you run this program on the command line with the file “scene.xml” from the example ZIP package, you receive this error in the “print” statement that prints the attribute values:</p>
<pre>…
  File "C:\Python25\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 0: character maps to &lt;undefined&gt;</pre>
<p>The reason for the error is this: The XML parser works with Unicode strings. One of the attribute values contains the € sign (Unicode code point U+20AC). This character is stored in a Python Unicode string as follows:</p>
<pre>&gt;&gt;&gt; u"\u20ac"
… u'\u20ac'
&gt;&gt;&gt; type(_)
… &lt;type 'unicode'&gt;</pre>
<p>When you print a string of type “unicode” to STDOUT or to a file, it is converted to an 8-bit string using an encoding. The encoding used by the console (at least on my computer) is cp437. However, codepage 437 does not define the € sign, so the attempt to encode the string results in an error. Try it yourself:</p>
<pre>&gt;&gt;&gt; u"\u20ac".encode("cp437")
… UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 0: character maps to &lt;undefined&gt;</pre>
<p>Important notes:</p>
<ul>
<li>In PyCrust or another graphical shell, you can <span style="font-family: Courier New,monospace;">“print u'\u20ac'”</span> without problems. The error occurs only on the command line. The reason is that PyCrust uses its own file object for “sys.stdout”, which supports Unicode strings directly.</li>
<li>There is no encoding error when you work with normal strings. For example, if you read the € sign from a file in the standard Windows encoding, cp1252, it ends up as the Python string “\x80”. When you print this, no encoding or decoding takes place. However, the wrong character might show up if the target of the print uses a different encoding than cp1252.</li>
</ul>
<p>To solve the problem, we could either print “repr(attr_value)” instead, or encode the string to ASCII before printing it, replacing all unknown characters with “?”:</p>
<pre>&gt;&gt;&gt; u"\u20ac".encode("ascii", "replace")
… '?'</pre>
<p>So, the new code for printing the attributes looks like this:</p>
<pre>for attr_name, attr_value in attrs.items():
    print " " * self.__m_level + " -", attr_name, "=", \
          attr_value.encode("ascii", "replace")</pre>
<p>If you need to convert an 8-bit string that contains characters in a certain encoding to a Unicode object, use the “decode” method:</p>
<pre>&gt;&gt;&gt; "\x80".decode("cp1252")
… u'\u20ac'
&gt;&gt;&gt; _.encode("utf-8")
… '\xe2\x82\xac'
&gt;&gt;&gt; _.decode("utf-8")
… u'\u20ac'</pre>
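<p>The following sketch collects the conversions from this section in one place. It runs under Python 2.6+ and Python 3.3+, which both accept the b and u string prefixes:</p>

```python
euro = u"\u20ac"

utf8_bytes = euro.encode("utf-8")               # three bytes
cp1252_bytes = euro.encode("cp1252")            # one byte
ascii_bytes = euro.encode("ascii", "replace")   # "?" replaces the unknown character

# decode() turns bytes in a known encoding back into a Unicode string.
round_trip = utf8_bytes.decode("utf-8")

try:
    euro.encode("cp437")   # codepage 437 has no euro sign
    cp437_failed = False
except UnicodeEncodeError:
    cp437_failed = True
```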
<h3>Writing to a File Using an Encoding</h3>
<p>To write Unicode strings to a file using a special encoding, you can use the “open” function from the “codecs” module. This is an example of printing the € sign to a UTF-8-encoded XML file:</p>
<pre>import codecs
f = codecs.open("utf8.xml", "w", "utf-8")
print &gt;&gt; f, '&lt;?xml version="1.0" encoding="utf-8"?&gt;'
print &gt;&gt; f, u"&lt;Root&gt;\u20ac&lt;/Root&gt;"
f.close()</pre>
<p>When you open the resulting file in a hex editor, you can see that the € sign is stored as a sequence of three bytes, as defined by the UTF-8 encoding. When you open the file in a text editor that supports the UTF-8 encoding, the € sign appears correctly.</p>
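<p>To check the three-byte claim without a hex editor, you can read the file back in binary mode. This sketch uses “io.open” (available since Python 2.6, and the same as the built-in “open” in Python 3) instead of “codecs.open”. The file name and its content are arbitrary:</p>

```python
import io
import os
import shutil
import tempfile

temp_dir = tempfile.mkdtemp()
path = os.path.join(temp_dir, "utf8.txt")

# io.open encodes every Unicode string written to the file as UTF-8.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Price: \u20ac\n")

# Reading in binary mode shows the raw bytes that ended up on disk.
with io.open(path, "rb") as f:
    raw = f.read()

shutil.rmtree(temp_dir)
```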
<h2><a name="__RefHeading__25_1886880643"></a>Building an Object Tree</h2>
<p><strong>In this section:</strong> The “setattr” introspection function and more about lists.</p>
<p>Finally, let’s create Python objects from the XML elements. The unit test shows the desired interface of these objects:</p>
<pre># File: step_3\test_xml_objects.py
import unittest
import xml_objects

data = """&lt;?xml version="1.0"?&gt;
&lt;Page&gt;
  &lt;Paragraph align="left"&gt;
    This is &lt;Bold&gt;bold&lt;/Bold&gt; text.
  &lt;/Paragraph&gt;
  &lt;Paragraph align="center"&gt;
    &lt;Bold&gt;Bold&lt;/Bold&gt; and &lt;Italic&gt;italic&lt;/Italic&gt;.
  &lt;/Paragraph&gt;
  &lt;Table border="1"&gt;
    &lt;Row&gt;&lt;Cell&gt;A&lt;/Cell&gt;&lt;Cell&gt;B&lt;/Cell&gt;&lt;/Row&gt;
  &lt;/Table&gt;
&lt;/Page&gt;"""

class TestXmlObjects(unittest.TestCase):
    def setUp(self):
        <strong>self.__m_root = xml_objects.Load(data)</strong>
    def test_ChildNodes(self):
        self.assertEquals(
            set(["Paragraph", "Table"]),
            <strong>self.__m_root.GetChildNames())</strong>
        self.assertEquals(
            2, len(<strong>self.__m_root.GetChildNodes("Paragraph")</strong>))
        self.assertEquals(
            1, len(self.__m_root.GetChildNodes("Table")))

    def test_Attributes(self):
        first_para = self.__m_root.GetChildNodes("Paragraph")[0]
        self.assertEquals(
            ["align"],
            <strong>first_para.GetAttributeNames()</strong>)
        self.assertEquals(
            "left", <strong>first_para.align</strong>)

if __name__ == "__main__":
    unittest.main()</pre>
<p>To summarize the interface:</p>
<ul>
<li>The “Load” function returns the object for the root element.</li>
<li>The “GetChildNames” method returns a set of the child element names.</li>
<li>The “GetChildNodes” method returns a list of the child elements of a given name.</li>
<li>The “GetAttributeNames” method returns a list of XML attribute names.</li>
<li>The XML attribute values can be queried using normal Python attributes.</li>
</ul>
<p>This is the code of the program:</p>
<pre># File: step_3\xml_objects.py
import xml.sax
import xml.sax.handler

class Element(object):
    def __init__(self):
        self.__m_child_nodes = []
        self.__m_attributes = {}

    def AddChildNode(self, name, element):
        self.__m_child_nodes.append((name, element))

    def AddAttributes(self, attrs):
        self.__m_attributes.update(attrs)
        for (attr_name,
             attr_value) in self.__m_attributes.iteritems():
            <strong>setattr(self, attr_name, attr_value)</strong>

    def GetChildNames(self):
        <strong>return set([name for name, element in self.__m_child_nodes])</strong>

    def GetChildNodes(self, element_name):
        <strong>return [element for name, element in self.__m_child_nodes</strong>
                <strong>if name == element_name]</strong>

    def GetAttributeNames(self):
        return self.__m_attributes.keys()

class Loader(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.__m_element_stack = []
        self.__m_root = None

    def GetRoot(self):
        return self.__m_root

    def startElement(self, name, attrs):
        element = Element()
        element.AddAttributes(attrs)
        <strong>self.__m_element_stack.append(element)</strong>

    def endElement(self, name):
        <strong>element = self.__m_element_stack.pop()</strong>
        if self.__m_element_stack:
            self.__m_element_stack[-1].AddChildNode(name, element)
        else:
            self.__m_root = element

def Load(xml_string):
    handler = Loader()
    xml.sax.parseString(xml_string, handler)
    return handler.GetRoot()</pre>
<p>The code works like this:</p>
<ul>
<li>When a new XML element starts, an “Element” instance is pushed on a stack.</li>
<li>The XML attributes are passed to the “AddAttributes” method of the “Element”.</li>
<li>In “AddAttributes”, new Python attributes are added to the instance using <strong>“setattr”:</strong> The code <span style="font-family: Courier New,monospace;">setattr(x, "y", z)</span> has the same effect as <span style="font-family: Courier New,monospace;">x.y = z</span>. (“getattr” can be used for reading.)</li>
<li>When an XML element ends, the “Element” instance is popped off the stack and passed to the “AddChildNode” method of the parent element.</li>
<li>When the stack is empty after an element has been popped, that element is the root element.</li>
<li>The “GetChildNames” and “GetChildNodes” methods use the <strong>list comprehension</strong> syntax (<span style="font-family: Courier New,monospace;"><strong>“[expr for x in xs if condition]”</strong></span>) to return parts of the list “self.__m_child_nodes”.</li>
</ul>
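<p>The following standalone sketch shows the “setattr”/“getattr” equivalence outside of the XML loader. The class “Record” and the helper “make_record” are hypothetical and not part of “xml_objects.py”. It also shows one caveat that matters for XML: an attribute name that isn’t a valid Python identifier (such as “data-id”) can be set with “setattr”, but only read back with “getattr”:</p>

```python
class Record(object):
    """Hypothetical empty class used to demonstrate setattr/getattr."""
    pass

def make_record(attrs):
    # One Python attribute per dict entry, similar to Element.AddAttributes.
    record = Record()
    for name, value in attrs.items():
        setattr(record, name, value)
    return record

r = make_record({"align": "left", "border": "1", "data-id": "7"})
align = r.align                      # same as getattr(r, "align")
border = getattr(r, "border")        # name chosen at run time
data_id = getattr(r, "data-id")      # r.data-id would be a subtraction
color = getattr(r, "color", None)    # default for a missing attribute
```
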
<p>More list comprehension examples:</p>
<pre>&gt;&gt;&gt; names = ["John", "Frank", "Sue", "Jane"]
&gt;&gt;&gt; numbers = [1, 2, 3, 4, 5, 6, 7, 8]
&gt;&gt;&gt; [n.upper() for n in names if n.startswith("J")]
… ['JOHN', 'JANE']
&gt;&gt;&gt; # Convert two lists to a list of tuples by using "zip".
&gt;&gt;&gt; [x for x in <strong>zip(numbers, names)</strong>]
… [(1, 'John'), (2, 'Frank'), (3, 'Sue'), (4, 'Jane')]</pre>
<h1>Homework</h1>
<p>Extend “xml_objects.py” so that it handles the text contents of the elements. In the element…</p>
<pre>    &lt;Paragraph align="left"&gt;
        This is &lt;Bold&gt;bold&lt;/Bold&gt; text.
    &lt;/Paragraph&gt;</pre>
<p>… it should be possible to retrieve the text “This is”, the child element “&lt;Bold&gt;”, and the text “text.” in the correct order.</p>
<p>Write the unit test first to help you find a convenient interface.</p>
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2012/06/07/python-training-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Training &#8211; Part 1</title>
		<link>http://realmike.org/blog/2012/06/07/python-training-part-1/</link>
		<comments>http://realmike.org/blog/2012/06/07/python-training-part-1/#comments</comments>
		<pubDate>Thu, 07 Jun 2012 15:40:01 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=551</guid>
		<description><![CDATA[Part 2 &#124; Part 3 &#124; Part 4 &#124; Part 5 This is the course material for a Python training that I gave while I was working at SPIELO International. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a Creative Commons Attribution-ShareAlike 3.0 <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2012/06/07/python-training-part-1/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<p><a href="http://realmike.org/blog/2012/06/07/python-training-part-2/">Part 2</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-3/">Part 3</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-4/">Part 4</a> | <a href="http://realmike.org/blog/2012/06/07/python-training-part-5/">Part 5</a></p>
<p><em>This is the course material for a Python training that I gave while I was working at <a href="http://spielo.com/careers">SPIELO International</a>. My manager kindly gave me permission to publish the material. The material is Copyright © 2008-2011 SPIELO International, licensed under a <a href="http://creativecommons.org/licenses/by-sa/3.0/" rel="license">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>.</em></p>
<p><em>The target audience for the trainings was software developers, testers, and mathematicians. Each training session took 3 to 4 hours and consisted mostly of interactive programming, roughly (but not slavishly) following the outline on this page. Depending on the prior programming experience of each group, I made small changes to the trainings as I went along.</em></p>
<p><em>Each participant received this tutorial printed as a booklet (12 or 16 pages). Even though I usually deviated a lot from the exact examples in the handouts, having the handouts freed the participants from taking their own notes and allowed them to participate actively in the classroom. I also gave out little exercises at the end of each session for the participants to complete by the following week.</em></p>
<p><span id="more-551"></span></p>
<p><em><strong>Note:</strong> The training was based on Python 2.x, because that&#8217;s what we were using at the time. I would love to update it to Python 3 at some point. Any help with this would be greatly appreciated.</em></p>
<h1>About Python</h1>
<p>Python is an <strong>interpreted</strong>, <strong>dynamically-typed</strong>, <strong>object-oriented</strong> programming language. It can be used for writing small, throw-away scripts and large, object-oriented applications alike. Some of Python’s features:</p>
<ul>
<li>Built-in high-level data types</li>
<li>Garbage collector</li>
<li>Comprehensive standard library and a myriad of add-on libraries on the web</li>
<li>Easily extensible with libraries written in other programming languages</li>
</ul>
<h1>Getting Started</h1>
<h2><a name="__RefHeading__38_1139887251"></a>Installing</h2>
<p>To use Python locally, install these programs:</p>
<ul>
<li>Standard Python from <a href="http://www.python.org/">http://www.python.org/</a></li>
<li>wxPython (a library for GUI programming) and the wxPython Tools and Demos (includes PyCrust, a graphical interactive Python shell) from <a href="http://www.wxpython.org/">http://www.wxpython.org/</a></li>
<li>Python Windows extensions (the PythonWin IDE and libraries for using the Win API from Python) from <a href="http://sourceforge.net/projects/pywin32/">http://sourceforge.net/projects/pywin32/</a></li>
</ul>
<h2>Running</h2>
<p>Python source code has the filename extension “.py”. When Python is installed locally, you can double-click a “.py” file to run the program. You can also run “python.exe &lt;py-file&gt;” from the command line.</p>
<h2><a name="__RefHeading__42_1139887251"></a>Interactive Mode</h2>
<p>When you start “python.exe” from the command line without parameters, Python runs in interactive mode. In this mode, you can type Python source code directly on the console and see the results immediately.</p>
<p>If you prefer a GUI, you can run the interactive Python shell using PyCrust (from the wxPython Tools) or the PythonWin IDE. Both provide code completion and other useful features.</p>
<h2><a name="__RefHeading__44_1139887251"></a>Editing/Debugging</h2>
<p>To edit and debug Python programs, you can use PythonWin.</p>
<p>For developing larger Python programs consisting of many files, I recommend Eclipse and the <a href="http://pydev.org/">PyDev</a> plug-in.</p>
<h2><a name="__RefHeading__46_1139887251"></a>Manuals</h2>
<p>When you install Python locally, you can find the Python Manuals, including a tutorial, on the Start menu.</p>
<p>In an interactive Python shell, use the built-in “help” function to display help for any Python object. For example, <span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">help(str)</span></span> prints a list of all methods of <span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">str</span></span> objects.</p>
<p>Use PyCrust to receive IntelliSense-style help on available methods and method parameters.</p>
<h1>Hello, World!</h1>
<p>Write the following code into a file with the extension “.py” or type the code in an interactive Python shell.</p>
<p>The typical “Hello, World!” program is nearly too simple in Python:</p>
<p><span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">print &#8220;Hello, World!&#8221;</span></span></p>
<p>Let’s make it more interesting:</p>
<pre>who = raw_input("Who do you want to greet (default is World): ")
if not who:
    who = "World"
else:
    who = who.capitalize() # upper-case the first character, lower-case the rest
print who, "is in da house!", # comma at the end prevents newline
print "Hello, " + who + "!"</pre>
<p>Things to note:</p>
<ul>
<li>You don’t declare the data type of variables; you just assign a value (“who”)</li>
<li>No curly braces for “if” and “else” blocks: The indentation alone determines where a block ends</li>
<li>Strings are objects (“capitalize” method, “+” operator overloaded, etc.). As we’ll see later, everything’s an object in Python.</li>
</ul>
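<p>The greeting logic can be pulled into a function that doesn’t read from the keyboard, which makes it easy to try out and to test. This is a sketch; the function name “greeting” is made up, and the body mirrors the program above:</p>

```python
def greeting(who):
    # An empty string counts as false, just like in "if not who".
    if not who:
        who = "World"
    else:
        # capitalize() upper-cases the first character and lower-cases the rest.
        who = who.capitalize()
    return "Hello, " + who + "!"
```

<p>For example, <span style="font-family: Courier New,monospace;">greeting("aLICE")</span> returns “Hello, Alice!”, and an empty string falls back to “World”.</p>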
<h1>Built-in Data Types</h1>
<p>In the following examples, “&gt;&gt;&gt;” denotes lines that you type in the Python shell and “…” denotes the output.</p>
<p><strong>Note:</strong> I recommend PyCrust as an interactive shell, because it has some features that PythonWin lacks. You can paste lines starting with “&gt;&gt;&gt;” into PyCrust when you click “Edit” → “Paste Plus”.</p>
<h2><a name="__RefHeading__52_1139887251"></a>Numbers</h2>
<p>There are no big surprises. The operators are more or less the same as in C. Python has built-in support for large integers. Here are a few examples of working with numbers:</p>
<pre>&gt;&gt;&gt; (12 + 3) * 47
… 705
&gt;&gt;&gt; 0xff
… 255
&gt;&gt;&gt; 32 / 7.0 * 1.3e-6
… 5.9428571428571423e-006
&gt;&gt;&gt; 2**16
… 65536
&gt;&gt;&gt; _ - (1 &lt;&lt; 16)    # in interactive mode, "_" contains the last printed value, i.e., 65536
… 0
&gt;&gt;&gt; 12**45
… 3657261988008837196714082302655030834027437228032L</pre>
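<p>Two details of integer arithmetic differ from C: integers never overflow (they grow as needed), and the floor-division operator “//” and the “%” operator round toward negative infinity, whereas C (since C99) truncates toward zero. The sketch below uses “//” instead of “/”, because “/” on two integers performs integer division in Python 2 but true division in Python 3. The function name “int_facts” is only for illustration:</p>

```python
def int_facts():
    # Arbitrary-precision integers: 2**100 is computed exactly.
    big = 2 ** 100
    # Floor division rounds toward negative infinity: -7 // 2 is -4 (C gives -3).
    q = -7 // 2
    # The remainder takes the sign of the divisor, so q * 2 + r == -7.
    r = -7 % 2
    return big, q, r
```
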
<h2>Strings</h2>
<p>Examples of working with strings:</p>
<pre>&gt;&gt;&gt; "Here's a string"
… "Here's a string"
&gt;&gt;&gt; 'This "is" it'
… 'This "is" it'
&gt;&gt;&gt; print "Newline\n\"and\"\ttab"
… Newline
… "and"   tab
&gt;&gt;&gt; "C:\\temp"
… 'C:\\temp'
&gt;&gt;&gt; r"C:\temp"
    # in a raw string ('r' prefix), the backslash is not an escape character
… 'C:\\temp'
&gt;&gt;&gt; """Multi-
line string"""
… 'Multi-\nline string'
&gt;&gt;&gt; '''Multi-
  line string'''
… 'Multi-\n  line string'</pre>
<p>You can convert any object to human-readable form by using the “str” function. The “repr” function has a similar purpose: If possible, it returns a string representation that you can later pass to the “eval” function to turn the string back into an object.</p>
<pre>&gt;&gt;&gt; str(5)
… '5'
&gt;&gt;&gt; repr(5)
… '5'
&gt;&gt;&gt; str(3.8)
… '3.8'
&gt;&gt;&gt; repr(3.8)
… '3.7999999999999998'
&gt;&gt;&gt; repr({1: 3, 2: 4})
… '{1: 3, 2: 4}'
&gt;&gt;&gt; eval('{1: 3, 2: 4}')
… {1: 3, 2: 4}</pre>
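<p>For built-in types such as numbers, strings, lists, tuples, and dicts, <span style="font-family: Courier New,monospace;">eval(repr(x))</span> gives back an object equal to “x”. The helper “round_trip” below is a made-up name for a minimal sketch of that round trip. Keep in mind that “eval” executes arbitrary code, so don’t use it on untrusted input:</p>

```python
def round_trip(obj):
    # repr() produces Python source text for the object ...
    text = repr(obj)
    # ... and eval() turns that text back into an equal object.
    return eval(text)

samples = [5, 3.8, "text", [1, "two", (3, 4)], {1: 3, 2: 4}]
results = [round_trip(x) for x in samples]
```
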
<p>Some useful string methods and operators:</p>
<pre>&gt;&gt;&gt; "Hello, " + "World!"
… 'Hello, World!'
&gt;&gt;&gt; "x" * 10
… 'xxxxxxxxxx'
&gt;&gt;&gt; w = '    word  '
&gt;&gt;&gt; w.strip()
… 'word'
&gt;&gt;&gt; w.lstrip()
… 'word  '
&gt;&gt;&gt; w.rstrip()
… '    word'
&gt;&gt;&gt; s = "Edward Kennedy Ellington"
&gt;&gt;&gt; s[1]
… 'd'
&gt;&gt;&gt; s.startswith("Edw")
… True
&gt;&gt;&gt; "Kennedy" in s
… True
&gt;&gt;&gt; s.lower()
… 'edward kennedy ellington'
&gt;&gt;&gt; s.upper().replace("E", "U")
… 'UDWARD KUNNUDY ULLINGTON'</pre>
<p>Using the slice notation, you can access sub-strings:</p>
<pre>&gt;&gt;&gt; s = "Hello, World!"
&gt;&gt;&gt; s[1:]    # sub-string starting at index 1
… 'ello, World!'
&gt;&gt;&gt; s[:3]    # sub-string up to but not including index 3
… 'Hel'
&gt;&gt;&gt; s[2:5]
… 'llo'
&gt;&gt;&gt; len(s[2:5]) == 5 - 2
… True
&gt;&gt;&gt; s[-1]    # the last character
… '!'
&gt;&gt;&gt; s[:-2]    # everything except the last two characters
… 'Hello, Worl'</pre>
<p>Strings are immutable, i.e., you can’t modify them after they were created. For example, you can’t use the subscript operator “[]” to overwrite characters in the string.</p>
<pre>&gt;&gt;&gt; s[1] = 'x'
… TypeError: 'str' object does not support item assignment</pre>
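<p>Because strings can’t be changed in place, you build a new string instead, for example from slices. A minimal sketch; the function name “replace_char” is hypothetical:</p>

```python
def replace_char(s, index, new_char):
    # Slicing and "+" create a new string; the original is left unchanged.
    return s[:index] + new_char + s[index + 1:]

s = "Hello, World!"
t = replace_char(s, 1, "x")
```
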
<p>The “%” operator provides printf-like functionality:</p>
<pre>&gt;&gt;&gt; "%s has the value %i" % ("X", 25)
… 'X has the value 25'
&gt;&gt;&gt; "%s has the value %s" % ("X", 25)
… 'X has the value 25'
&gt;&gt;&gt; val1 = 3
&gt;&gt;&gt; val2 = 8
&gt;&gt;&gt; "%(val1)i and %(val2)i" % locals()
… '3 and 8'</pre>
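<p>As with C’s printf, the conversion specifiers accept flags and field widths. The sketch below (the function names are illustrative) formats a fixed-width row and uses a dict with the “%(key)s” form:</p>

```python
def describe(name, age):
    # "%-6s" left-justifies the string in a 6-character field;
    # "%3i" right-justifies the integer in a 3-character field.
    return "%-6s|%3i" % (name, age)

def describe_dict(person):
    # With a dict on the right-hand side, "%(name)s" looks up person["name"].
    return "%(name)s is %(age)i years old" % person
```
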
<p>For scanf-like functionality, you should use regular expressions. See the “re” module in the standard library. We’ll cover this in one of the next lessons.</p>
<p><strong>What we didn’t cover:</strong> There’s a separate class for Unicode strings. We’ll get back to this in one of the next lessons.</p>
<p>If you’re interested, you can try it out yourself:</p>
<pre>&gt;&gt;&gt; u"äöü"
… u'\xe4\xf6\xfc'
&gt;&gt;&gt; u"äöü".encode("utf-8")
… '\xc3\xa4\xc3\xb6\xc3\xbc'
&gt;&gt;&gt; _.decode("utf-8")
… u'\xe4\xf6\xfc'</pre>
<h2>Lists</h2>
<p>Python has a built-in list data type that can store objects of arbitrary types.</p>
<pre>&gt;&gt;&gt; ls = [1, "text", 3, [4, 5]]
&gt;&gt;&gt; ls
… [1, 'text', 3, [4, 5]]
&gt;&gt;&gt; ls[1]
… 'text'
&gt;&gt;&gt; ls[1] = 2
&gt;&gt;&gt; ls
… [1, 2, 3, [4, 5]]
&gt;&gt;&gt; ls[1:3]
… [2, 3]
&gt;&gt;&gt; ls[1:3] = []    # same as del ls[1:3]
&gt;&gt;&gt; ls
… [1, [4, 5]]
&gt;&gt;&gt; ls[1:1] = [4, 3, 5, 2]    # insert before index 1
&gt;&gt;&gt; ls
… [1, 4, 3, 5, 2, [4, 5]]
&gt;&gt;&gt; del ls[-1]
&gt;&gt;&gt; ls
… [1, 4, 3, 5, 2]
&gt;&gt;&gt; ls.sort()
&gt;&gt;&gt; ls
… [1, 2, 3, 4, 5]
&gt;&gt;&gt; ls.reverse()
&gt;&gt;&gt; ls
… [5, 4, 3, 2, 1]
&gt;&gt;&gt; ls.append(11)
&gt;&gt;&gt; ls
… [5, 4, 3, 2, 1, 11]
&gt;&gt;&gt; ls.pop()
… 11
&gt;&gt;&gt; ls
… [5, 4, 3, 2, 1]
&gt;&gt;&gt; ls.extend([11, 12, 13])
&gt;&gt;&gt; ls
… [5, 4, 3, 2, 1, 11, 12, 13]</pre>
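<p>Slice assignment replaces, inserts, or deletes depending on the slice you assign to. This sketch contrasts the three cases; “ls[:]” is a full slice, which makes a shallow copy so that each case starts from the same list:</p>

```python
ls = [1, [4, 5]]

replaced = ls[:]
replaced[1:2] = [4, 3, 5, 2]   # the one-element slice is replaced by four elements

inserted = ls[:]
inserted[1:1] = [4, 3, 5, 2]   # an empty slice at index 1: pure insertion

removed = [1, 2, 3, [4, 5]]
removed[1:3] = []              # assigning an empty list deletes the slice
```
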
<h2>Tuples</h2>
<p>You can think of tuples as fixed-length, immutable lists. Here are a few examples:</p>
<pre>&gt;&gt;&gt; t = (1, 2, "text")
&gt;&gt;&gt; t
… (1, 2, 'text')
&gt;&gt;&gt; len(t)
… 3
&gt;&gt;&gt; t[0]
… 1
&gt;&gt;&gt; t[1] = 5
… TypeError: 'tuple' object does not support item assignment
&gt;&gt;&gt; a, b, c = t
&gt;&gt;&gt; print a, b, c
… 1 2 text
&gt;&gt;&gt; u = a, b
&gt;&gt;&gt; u
… (1, 2)
&gt;&gt;&gt; empty = ()
&gt;&gt;&gt; empty
… ()
&gt;&gt;&gt; singleton = (1,)
&gt;&gt;&gt; singleton
… (1,)
&gt;&gt;&gt; list(singleton)
… [1]
&gt;&gt;&gt; tuple([1, 2, 3])
… (1, 2, 3)
&gt;&gt;&gt; i, (name, age) = (0, ("John", 4))
&gt;&gt;&gt; print i, name, age
… 0 John 4
&gt;&gt;&gt; # BTW, this works with lists just as well:
&gt;&gt;&gt; i, (name, age) = [1, ["Sue", 27]]
&gt;&gt;&gt; print i, name, age
… 1 Sue 27</pre>
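<p>Tuple packing and unpacking also let you swap variables, or update several variables at once, without a temporary variable. The right-hand side is evaluated completely before any name is rebound. A minimal sketch with illustrative function names:</p>

```python
def swap(a, b):
    # The right-hand side packs (b, a) into a tuple, which is then unpacked.
    a, b = b, a
    return a, b

def fibonacci(n):
    """Return the first n Fibonacci numbers."""
    result = []
    a, b = 0, 1
    for _ in range(n):
        result.append(a)
        # Both values on the right are computed before a and b change.
        a, b = b, a + b
    return result
```
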
<h2>Dictionaries</h2>
<p>Dictionaries are associative containers, i.e., mappings between keys and values. Python dictionaries are implemented as hash tables. Keys can be any hashable object, such as numbers, strings, or tuples of those; values can be objects of arbitrary types. Here are a few examples:</p>
<pre>&gt;&gt;&gt; d = {5: 3.2,
&gt;&gt;&gt;      "John": 4,
&gt;&gt;&gt;      (1, 2): [3, 4, 5]}
&gt;&gt;&gt; d
… {5: 3.2, 'John': 4, (1, 2): [3, 4, 5]}
&gt;&gt;&gt; d[5]
… 3.2
&gt;&gt;&gt; d["John"] = "Doe"
&gt;&gt;&gt; d
… {5: 3.2, 'John': 'Doe', (1, 2): [3, 4, 5]}
&gt;&gt;&gt; d["Jane"] = "Doe"
&gt;&gt;&gt; d
… {5: 3.2, 'John': 'Doe', 'Jane': 'Doe', (1, 2): [3, 4, 5]}
&gt;&gt;&gt; d.keys()
… [5, 'John', 'Jane', (1, 2)]
&gt;&gt;&gt; d.values()
… [3.2, 'Doe', 'Doe', [3, 4, 5]]
&gt;&gt;&gt; d.items()
… [(5, 3.2), ('John', 'Doe'), ('Jane', 'Doe'), ((1, 2), [3, 4, 5])]
&gt;&gt;&gt; d.update({5: 4.8, 7: 3})
&gt;&gt;&gt; d
… {5: 4.8, 'John': 'Doe', 7: 3, 'Jane': 'Doe', (1, 2): [3, 4, 5]}
&gt;&gt;&gt; d2 = dict([(5, 3.2), ("John", "Doe")])
&gt;&gt;&gt; d2
… {'John': 'Doe', 5: 3.2}
&gt;&gt;&gt; # Dicts can be used with the “%” operator for strings
&gt;&gt;&gt; "My name is %(John)s" % d2
… 'My name is Doe'
&gt;&gt;&gt; del d2["John"]
&gt;&gt;&gt; d2
… {5: 3.2}</pre>
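<p>Because “dict” accepts any sequence of key/value pairs, it combines well with “zip”. The “get” method, which returns a default value for missing keys, is handy for counting. Both functions below are illustrative sketches:</p>

```python
def build_phone_book(names, numbers):
    # zip pairs the lists element-wise; dict() turns the pairs into a mapping.
    return dict(zip(names, numbers))

def count_words(words):
    counts = {}
    for w in words:
        # get() returns the default (0) when the key isn't in the dict yet.
        counts[w] = counts.get(w, 0) + 1
    return counts
```
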
<h2>Sets</h2>
<p>Sets are similar to lists, but their values are unique and unordered. The “set” data type supports operations such as computing the union, difference, and intersection of sets.</p>
<pre>&gt;&gt;&gt; s = set([10, 3, 7, 3, 5, 4, 4])
&gt;&gt;&gt; s
… set([10, 3, 4, 5, 7])
&gt;&gt;&gt; s.add(8)
&gt;&gt;&gt; s
… set([3, 4, 5, 7, 8, 10])
&gt;&gt;&gt; s.difference([7, 10, 4])
… set([3, 5, 8])
&gt;&gt;&gt; s.union([3, "Joe"])
… set([3, 4, 5, 7, 8, 10, 'Joe'])
&gt;&gt;&gt; s.intersection([1, 5, 7])
… set([5, 7])</pre>
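<p>A common use of sets is removing duplicates from a list. Since sets are unordered, you sort the result if you need a predictable order. A sketch with made-up function names:</p>

```python
def unique_sorted(items):
    # A set drops duplicates; sorted() returns the values as an ordered list.
    return sorted(set(items))

def common_and_only_first(a, b):
    sa, sb = set(a), set(b)
    # Elements in both sequences, and elements only in the first one.
    return sa.intersection(sb), sa.difference(sb)
```
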
<h1>Control Structures</h1>
<h2><a name="__RefHeading__66_1139887251"></a>If Statements</h2>
<p>This is an example of an if-elif-else statement:</p>
<pre>if x == 5:
    This()
    AndThat()
elif y &gt; 3 and z != 4:
    Other()
else:
    SomethingElseEntirely()</pre>
<p>This, of course, is the same as:</p>
<pre>if x == 5:
    This()
    AndThat()
else:
    if y &gt; 3 and z != 4:
        Other()
    else:
        SomethingElseEntirely()</pre>
<p>If the Boolean expression gets too long, you might want to introduce a line break. The following attempt, however, would result in a syntax error:</p>
<pre>if y &gt; 3
  and z != 4:
    …</pre>
<p>You must either use the line continuation character “\”:</p>
<pre>if y &gt; 3 \
  and z != 4:
    …</pre>
<p>Or enclose the expression in parentheses:</p>
<pre>if (y &gt; 3
  and z != 4):
    …</pre>
<p>Python does not check the indentation inside parentheses, brackets (as used for lists), and curly braces (as used for dicts), so you can always insert line breaks inside those.</p>
<h2>While Loops</h2>
<p>Here’s an example of a “while” loop in Python:</p>
<pre>x = 1
while x &lt; 100:
    print x
    x *= 3</pre>
<p><strong>Note:</strong> There is no equivalent to “do-while” loops in Python.</p>
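<p>The usual substitute for “do-while” is an endless “while True” loop with the condition checked at the end of the body, so the body always runs at least once. A minimal sketch; the function name is made up, and the tripling mirrors the loop above:</p>

```python
def powers_of_three_below(limit):
    """Collect 1, 3, 9, ... below limit; 1 is always included."""
    values = []
    x = 1
    while True:
        values.append(x)     # the body runs at least once, like do { ... }
        x *= 3
        if not x < limit:    # the "while (condition);" part of do-while
            break
    return values
```
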
<h2>For Loops</h2>
<p>In Python, “for” loops are used exclusively to iterate over sequences. If you want something like “for (int i = 0; i &lt; 10; ++i)”, use the “xrange” function, which returns a sequence of integers:</p>
<pre>for i in xrange(10):
    print i,</pre>
<p><strong>Output:</strong> 0 1 2 3 4 5 6 7 8 9</p>
<pre>ls = [6, 1, 5, 3, 7]
for x in ls:
    print x,

for x in sorted(ls):
    print x,

for x in reversed(ls):
    print x,</pre>
<p><strong>Output:</strong></p>
<blockquote><p>6 1 5 3 7<br />
1 3 5 6 7<br />
7 3 5 1 6</p></blockquote>
<pre>d = {1: 10, 3: 30, 5: 50}
for k in d:
    print k,</pre>
<p><strong>Output:</strong> 1 3 5</p>
<pre>d = {1: 10, 3: 30, 5: 50}
for k, v in d.iteritems():
    print "%i = %i" % (k, v)</pre>
<p><strong>Output:</strong></p>
<blockquote><p>1 = 10<br />
3 = 30<br />
5 = 50</p></blockquote>
<pre>people = [("John", 4),
           ("Sue", 27),
           ("Frank", 15),
           ("Clara", 8)]
for i, (name, age) in enumerate(people):
    print "%i: %s is %i years old" % (i, name, age)</pre>
<p><strong>Output:</strong></p>
<blockquote><p>0: John is 4 years old<br />
1: Sue is 27 years old<br />
2: Frank is 15 years old<br />
3: Clara is 8 years old</p></blockquote>
<p>The “break” and “continue” keywords known from C exist in Python as well.</p>
<pre>people = [("John", 4),
           ("Sue", 27),
           ("Frank", 15),
           ("Clara", 8)]
for i, (name, age) in enumerate(people):
    if age &gt; 25:
        continue
    print "%i: %s is %i years old" % (i, name, age)
    if age == 15:
        break</pre>
<p><strong>Output:</strong></p>
<blockquote><p>0: John is 4 years old<br />
2: Frank is 15 years old</p></blockquote>
<p>Python supports an “else” clause for loops. Consider this common construct:</p>
<pre>found = False
for name, age in people:
    if age &gt; 25:
        found = True
        break
if not found:
    print "No one is older than 25."</pre>
<p>This can be written more concisely using “for-else”:</p>
<pre>for name, age in people:
    if age &gt; 25:
        break
else:
    print "No one is older than 25."</pre>
<p>The “else” clause is executed when the loop runs all the way to the end, i.e., when no “break” was executed.</p>
<p>The “while” loop supports an “else” clause as well.</p>
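<p>Inside a function, “for-else” makes a compact search. This sketch (the function name and the sample data are illustrative) returns the first person older than the limit, or “None” if the loop finishes without a “break”:</p>

```python
people = [("John", 4), ("Sue", 27), ("Frank", 15), ("Clara", 8)]

def first_older_than(people, limit):
    for name, age in people:
        if age > limit:
            break
    else:
        # Only reached when the loop ran to the end without "break".
        return None
    return name
```
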
<h1><a name="__RefHeading__72_1139887251"></a>Defining Functions</h1>
<p>Functions are defined using the “def” keyword. Here’s a simple function taking two parameters:</p>
<pre>def Add(a, b):
    return a + b

# Invoking it
print Add(5, 3)    # prints 8
print Add("Hello, ", "World!")    # prints “Hello, World!”
print Add([1, 2], [3, 4])    # prints [1, 2, 3, 4]</pre>
<p>Things to note:</p>
<ul>
<li>You don’t declare the data type of the parameters or of the return type
<ul>
<li>The function can be invoked with any two objects that support the “+” operator, similar to how template functions work in C++.</li>
</ul>
</li>
<li>The indentation alone determines where the function ends</li>
</ul>
<p><strong>Note:</strong> C++ programmers might wonder whether the lack of explicit data type declarations results in more bugs. My experience is that it’s very rare for a bug in a Python program to be caused by a data type mismatch. It is still a good idea to have a unit-testing suite with high code coverage for your Python code. This way, you can often detect such problems faster than the C++ compiler could parse the header files. <span style="font-family: Wingdings;"> <img src='http://realmike.org/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </span></p>
<p>Default parameters are also possible:</p>
<pre>def Greet(greeting, who="World"):
    print "%s, %s!" % (greeting, who)

# Invoking it
Greet("Hello")    # prints “Hello, World!”
Greet("Good morning", "Vietnam")    # prints “Good morning, Vietnam!”</pre>
<p>You can always specify the parameter names explicitly, in any order, regardless of whether there’s a default value or not:</p>
<pre>Greet(who="Forrest", greeting="Run")    # prints “Run, Forrest!”</pre>
<p>Another example, with several default parameters:</p>
<pre>def Print(a, b=1, c=2, d=3):
    print a, b, c, d
    # If there is no explicit “return”, this is equivalent to
    # return None

# Invoke it:
Print(0)    # prints 0 1 2 3
Print(0, 4, 8 )    # prints 0 4 8 3
Print(0, c=5)    # prints 0 1 5 3
Print(0, c=7, 33)    # syntax error “non-keyword arg after keyword arg”</pre>
<h2>Returning More Than One Value</h2>
<p>It is possible to return more than one value. More precisely, you can return a tuple and assign the elements of the tuple to individual variables. There is no magic involved here. We’ve seen all of this in the section about Tuples already.</p>
<pre>def GetPerson():
    return "John", 4

name, age = GetPerson()
print name, age
person = GetPerson()
print person</pre>
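<p>Another small sketch of the same idea, with a made-up function name: the caller can unpack the returned tuple or keep it as a whole.</p>

```python
def min_max(values):
    # "return a, b" actually returns the tuple (a, b).
    return min(values), max(values)

lowest, highest = min_max([6, 1, 5, 3, 7])   # unpack into two names
pair = min_max([6, 1, 5, 3, 7])              # or keep the tuple
```
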
<h2>Parameter Passing</h2>
<p>When you pass parameters to a function, a reference to the passed object is added to the local scope of the function.</p>
<ul>
<li>When you modify the object in-place, the caller sees the changes, because the local parameter name is a reference to the same object that the caller sees.</li>
<li>When you assign a new value to the local parameter name, the object doesn’t change. The parameter name is now a reference to a different object, but this does not affect the contents of the original object.</li>
</ul>
<pre>def Func(a, b, c):
    a += "!!!"
    b.append(5)
    c = []

txt = "???"
list1 = [1, 2, 3]
list2 = [1.0, 2.0]
Func(txt, list1, list2)
print "txt =", txt
print "list1 =", list1
print "list2 =", list2</pre>
<p><strong>Output:</strong></p>
<blockquote><p>txt = ???<br />
list1 = [1, 2, 3, 5]<br />
list2 = [1.0, 2.0]</p></blockquote>
<p>This is what happens:</p>
<ul>
<li>When the function is invoked, it creates three local names that contain references to the passed objects: <span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">a = txt, b = list1, c = list2</span></span></li>
<li>The line <span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">a += &#8220;!!!&#8221;</span></span> assigns a new object reference to the local name <em>a</em>. It does not change the string in-place. (Strings cannot be changed in-place at all because they are <em>immutable objects</em>.)</li>
<li>The line <span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">b.append(5)</span></span> appends a value to the object pointed to by <em>b</em>. This is the same object that the caller knows by the name <em>list1</em>.</li>
<li>The line <span style="font-family: Courier New,monospace;"><span style="font-size: x-small;">c = []</span></span> assigns a new object reference to the local name <em>c</em>. It does not modify the object that c referred to earlier.</li>
</ul>
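<p>The difference between modifying an object and rebinding a name can be checked in isolation. The functions “modify” and “rebind” in this sketch are hypothetical:</p>

```python
def modify(items):
    items.append(99)    # in-place change: the caller sees it

def rebind(items):
    items = [0]         # new local binding: the caller's list is untouched
    return items

original = [1, 2]
modify(original)            # original is now [1, 2, 99]
result = rebind(original)   # original is unchanged; result is a new list
```
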
<h1>Working with Text Files</h1>
<p>The built-in “open” function returns a file object for an existing or a new file. File objects have “read” and “write” methods. For writing to a file, you can also use the “print” statement with the “&gt;&gt;” syntax.</p>
<p>This code creates a new file:</p>
<pre>f = open("new.txt", "w")
f.write("This is the first line\n")
print &gt;&gt; f, "The second line"
print &gt;&gt; f, "\n".join(["a", "b", "c", "d"])
f.close()</pre>
<p>The new file contains these lines:</p>
<pre>This is the first line
The second line
a
b
c
d</pre>
<p>There are several ways of reading a text file. You can read the entire contents at once as a string:</p>
<pre>&gt;&gt;&gt; open("new.txt").read()
… 'This is the first line\nThe second line\na\nb\nc\nd\n'</pre>
<p>You can read the entire contents at once as a list of lines:</p>
<pre>&gt;&gt;&gt; open("new.txt").readlines()
… ['This is the first line\n', 'The second line\n', 'a\n', 'b\n', 'c\n', 'd\n']</pre>
<p>You can iterate over the file object, which reads the lines one by one:</p>
<pre>&gt;&gt;&gt; for ln in open("new.txt"):
&gt;&gt;&gt;     print ln.rstrip("\n")    # strip the newline at the end of each line
… This is the first line
… The second line
… a
… b
… c
… d</pre>
<p>You can call the “readline” method repeatedly until it returns an empty string:</p>
<pre>&gt;&gt;&gt; f = open("new.txt")
&gt;&gt;&gt; while True:
&gt;&gt;&gt;     ln = f.readline()
&gt;&gt;&gt;     if not ln:
&gt;&gt;&gt;         break
&gt;&gt;&gt;     print ln.rstrip("\n")    # strip the newline at the end of each line
… This is the first line
… The second line
… a
… b
… c
… d</pre>
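<p>Since Python 2.6, the “with” statement closes a file automatically at the end of the block, even if an exception occurs. This sketch is a variant of the examples above. It writes to a temporary file created with “tempfile.mkstemp” instead of “new.txt”, and the function name is made up:</p>

```python
import os
import tempfile

def write_and_read_lines(lines):
    # mkstemp returns an OS-level file handle and the path of a new file.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # "with" closes the file when the block is left.
        with open(path, "w") as f:
            for ln in lines:
                f.write(ln + "\n")
        with open(path) as f:
            # Strip the newline at the end of each line, as above.
            return [ln.rstrip("\n") for ln in f]
    finally:
        os.remove(path)
```
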
<h1>Using the Standard Library</h1>
<p>Python comes with a comprehensive library of modules for various tasks. See the Global Module Index in the Python Manuals for a list of available modules.</p>
<p>This is a list of some of the most commonly used modules and their purpose:</p>
<ul>
<li>os: Operating system interfaces for working with files and directories, accessing environment variables, invoking external programs, etc.</li>
<li>sys: Command-line arguments, file objects for STDOUT and STDERR, internal interpreter variables, etc.</li>
<li>pprint: Pretty-print Python objects such as lists and dicts</li>
<li>StringIO: Provides the StringIO class, which behaves like an in-memory file. Similar to std::stringstream in C++.</li>
<li>unittest: Unit-testing framework</li>
<li>Many others…</li>
</ul>
<p>To use any of these modules, use the “import” keyword. Once imported, you can use the classes and functions defined in the module:</p>
<pre>&gt;&gt;&gt; import os
&gt;&gt;&gt; help(os)
… help text stripped
&gt;&gt;&gt; os.listdir("c:\\temp")
… ['many', 'files', 'in', 'here']
&gt;&gt;&gt; from os import listdir
&gt;&gt;&gt; listdir("c:\\temp")</pre>
<p>We’ll take a closer look at the standard library in the next lessons.</p>
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2012/06/07/python-training-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>EuroPython 2011 Notes</title>
		<link>http://realmike.org/blog/2011/07/10/europython-2011-notes/</link>
		<comments>http://realmike.org/blog/2011/07/10/europython-2011-notes/#comments</comments>
		<pubDate>Sun, 10 Jul 2011 13:05:42 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[europython]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=520</guid>
		<description><![CDATA[These are some of my notes from EuroPython 2011. I mostly collect projects and tools that are of immediate interest to me in my own work. If you&#8217;re interested in the complete list of talks and would like to download the slides or watch the talks on video, you can find those on the EuroPython <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2011/07/10/europython-2011-notes/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<p><em>These are some of my notes from <a href="http://ep2011.europython.eu/">EuroPython 2011</a>. I mostly collect projects and tools that are of immediate interest to me in my own work</em>.<em> If you&#8217;re interested in the complete list of talks and would like to download the slides or watch the talks on video, you can find those on the <a href="http://ep2011.europython.eu/p3/schedule/ep2011/list/">EuroPython talks page</a>. Also, Julie Pichon has <a href="http://www.jpichon.net/tag/europython/">summaries of some interesting talks</a> on her site.<br />
</em></p>
<div id="attachment_532" class="wp-caption alignright" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2011/07/europython_bag.jpg"><img class="size-medium wp-image-532" title="EuroPython Bag" src="http://realmike.org/blog/wp-content/uploads/2011/07/europython_bag-300x254.jpg" alt="" width="300" height="254" /></a><p class="wp-caption-text">EuroPython Bag</p></div>
<p><strong>Python Environment</strong></p>
<p>At our company, we deploy non-interactive Python tools to our internal users. Until now, we have deployed these tools into the users&#8217; local Python installations. Problems arise when the package versions that our tools require are different from the versions that the user has installed, or when updates to our tools require new dependencies to be installed. To avoid such problems, the &#8220;virtualenv&#8221; package allows you to keep separate Python environments for different tools. The &#8220;pip&#8221; package can then be used to automatically install dependencies into these environments. Alex Clemesha has an article up on his blog about the <a href="http://www.clemesha.org/blog/modern-python-hacker-tools-virtualenv-fabric-pip">Tools of the Modern Python Hacker: Virtualenv, Fabric and Pip</a>.</p>
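<p>As a rough sketch of the per-tool environment idea, the snippet below uses the standard-library <code>venv</code> module (Python 3.3 and later), which covers much of what the virtualenv package does, and then runs pip from inside the new environment. The function name <code>create_tool_env</code> and its arguments are hypothetical; this is not our actual deployment code.</p>
<pre><code>import os
import subprocess
import venv


def create_tool_env(env_dir, requirements=()):
    """Create an isolated environment for one tool and install its dependencies.

    Uses the standard-library venv module instead of the virtualenv package.
    Returns the path of the environment's own Python interpreter.
    """
    # pip is only bootstrapped (via ensurepip) when there is something to install
    venv.create(env_dir, with_pip=bool(requirements))
    if os.name == "nt":
        python = os.path.join(env_dir, "Scripts", "python.exe")
    else:
        python = os.path.join(env_dir, "bin", "python")
    if requirements:
        # Install into this environment only, leaving the user's Python untouched
        subprocess.check_call([python, "-m", "pip", "install"] + list(requirements))
    return python
</code></pre>
<p>For example, <code>create_tool_env("envs/report_tool", ["nose"])</code> (a made-up path) would give one tool its own interpreter. The tool is then started with the returned Python, so its dependencies never touch the user&#8217;s installation.</p>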
<p>Another useful addition to the Python toolbox is the &#8220;nose&#8221; unit-testing package, which extends the Python &#8220;unittest&#8221; module. It has advanced support for test fixtures, generated test cases, running test batteries, and more. More at the <a href="http://somethingaboutorange.com/mrl/projects/nose/">&#8220;nose&#8221; project homepage</a>.</p>
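<p>To illustrate the &#8220;generated test cases&#8221; feature: nose collects plain functions whose names start with <code>test</code>. When such a function is a generator that yields a callable followed by its arguments, nose runs each yielded tuple and reports it as a separate test. The functions below are a made-up example. Running <code>nosetests</code> on a module containing them would report three tests.</p>
<pre><code>def check_square(n, expected):
    assert n * n == expected


def test_squares():
    # A generator test: nose runs every yielded (function, args...) tuple
    # as its own test case and reports each one separately.
    for n, expected in [(0, 0), (2, 4), (3, 9)]:
        yield check_square, n, expected
</code></pre>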
<p><span id="more-520"></span><strong>JavaScript</strong></p>
<p>As a language, I find JavaScript pretty&#8230;awful. But it can be used to build pretty cool stuff for the web, so it&#8217;s worth learning. That&#8217;s why I attended Jonathan Fine&#8217;s &#8220;<a href="http://ep2011.europython.eu/conference/talks/javascript-for-python-programmers">JavaScript for Python Programmers</a>&#8221; training.</p>
<p>One of the learning resources that Jonathan recommended was &#8220;<a href="http://www.youtube.com/watch?v=hQVTIJBZook">JavaScript: The Good Parts</a>&#8221; (on YouTube):</p>
<p><object width="480" height="390"><param name="movie" value="http://www.youtube.com/v/hQVTIJBZook?version=3&amp;hl=en_US" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed type="application/x-shockwave-flash" width="480" height="390" src="http://www.youtube.com/v/hQVTIJBZook?version=3&amp;hl=en_US" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p><strong>Network Programming</strong></p>
<p>For the grid computing system at our company, we rely mostly on the standard XML-RPC modules in Python. All the high-level logic, such as message queues, failure handling, and authentication, is custom-built on top of that.</p>
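<p>For readers who haven&#8217;t used them, here is a minimal sketch of what the standard XML-RPC modules provide: a server that exposes a Python function, and a client proxy that calls it over HTTP. It uses the Python 3 module names <code>xmlrpc.server</code> and <code>xmlrpc.client</code> (in Python 2 they are <code>SimpleXMLRPCServer</code> and <code>xmlrpclib</code>). The <code>add</code> function is only a placeholder for a real job.</p>
<pre><code>import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """A trivial remote procedure, standing in for a real grid job."""
    return a + b


# Port 0 asks the OS for any free port; logRequests=False silences the access log
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
proxy = ServerProxy("http://%s:%d/" % (host, port))
result = proxy.add(2, 3)  # marshalled to XML, sent over HTTP, executed remotely
print(result)             # 5

server.shutdown()
server.server_close()
</code></pre>
<p>Queues, retries, and authentication all have to be layered on top of plain calls like this one.</p>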
<p>Unsurprisingly, there is a wealth of existing libraries that might make our lives easier by sparing us from reinventing the wheel. (While adding their own layers of complexity, I&#8217;m sure.)</p>
<ul>
<li><a href="http://django-rest-framework.org/">Django REST Framework</a>—A library for adding a <a href="http://en.wikipedia.org/wiki/Representational_State_Transfer">RESTful</a> API to a Django application.</li>
<li><a href="http://celeryproject.org/">Celery</a>—An asynchronous task queue based on distributed message passing.</li>
<li><a href="http://www.zeromq.org/">ZeroMQ</a>—A &#8220;socket library that acts as a concurrency framework.&#8221;</li>
<li><a href="http://www.rabbitmq.com/">RabbitMQ</a>—Message-oriented middleware. <a href="http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/">Jason Williams describes</a> what it is and what it can be used for.</li>
<li><a href="http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol">AMQP</a> (Advanced Message Queuing Protocol)—The protocol that RabbitMQ implements.</li>
<li><a href="http://www.gevent.org/">gevent</a>—&#8220;A coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.&#8221; Geez, I can&#8217;t even begin to understand what this means&#8230;</li>
<li><a href="http://twistedmatrix.com/">Twisted</a>—An event-driven network engine. (<a href="http://ep2011.europython.eu/conference/talks/asynchronous-programming-with-twisted">Twisted training</a> by Orestis Markou.)</li>
</ul>
<p>Alex Clemesha&#8217;s article &#8220;<a href="http://www.clemesha.org/blog/realtime-web-apps-python-django-orbited-twisted">Real-world real-time web apps with Python equals Django + Orbited + Twisted</a>&#8221; describes how some of these fit in the big picture—and adds even more buzzwords and acronyms for good measure.</p>
<p>Some of these libraries come with implementations for the architectural patterns described in the book &#8220;<a href="http://www.cse.wustl.edu/~schmidt/POSA/POSA2/">Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects</a>.&#8221;</p>
<p><strong>Little Tidbits</strong></p>
<ul>
<li><a href="http://www.lag.net/paramiko/">Paramiko</a>—An SSH client/server library written in Python (using PyCrypto). Jesse Noller has an <a href="http://jessenoller.com/2009/02/05/ssh-programming-with-paramiko-completely-different/">introduction</a> on his site.</li>
<li><a href="http://kivy.org/">Kivy</a>—A &#8220;library for rapid development of applications that make use of innovative user interfaces, such as multi-touch apps.&#8221; For his lightning talk, <a href="http://txzone.net/2011/06/kivy-at-europython-lightning-explanation/">Mathieu Virbel</a> used a really cool presentation tool, <a href="http://github.com/tito/presemt">PreseMT</a>, that is itself based on Kivy. (<a href="http://youtu.be/SwYnUhWx0FY">Watch on YouTube</a>)<br />
<object width="480" height="390"><param name="movie" value="http://www.youtube.com/v/SwYnUhWx0FY?version=3&amp;hl=en_US" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed type="application/x-shockwave-flash" width="480" height="390" src="http://www.youtube.com/v/SwYnUhWx0FY?version=3&amp;hl=en_US" allowscriptaccess="always" allowfullscreen="true"></embed></object></li>
<li><a href="http://www.tarsnap.com/scrypt.html">scrypt</a> key derivation function—This can be used to calculate password hashes in a way that makes brute-force attacks way harder than using traditional MD5 or SHA hashes. <a href="http://pypi.python.org/pypi/scrypt/0.5.1">Python bindings</a> exist.</li>
<li><a href="http://tk0miya.bitbucket.org/blockdiag/build/html/index.html">blockdiag</a>—A block diagram image generator. It uses an input file format that&#8217;s similar to GraphViz.</li>
</ul>
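<p>Since Python 3.6, the standard library exposes scrypt as <code>hashlib.scrypt</code> (when Python is built against OpenSSL 1.1 or later), so separate bindings aren&#8217;t strictly required there. Below is a minimal sketch of hashing and verifying a password. The parameters <code>n=2**14, r=8, p=1</code> are commonly cited values for interactive logins and are an assumption here, not a recommendation.</p>
<pre><code>import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Derive a 32-byte key from a password with scrypt; returns (salt, key)."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every stored password
    # Cost parameters are assumed values; tune them to your hardware
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                         n=2 ** 14, r=8, p=1, dklen=32)
    return salt, key


def verify_password(password, salt, expected_key):
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)  # constant-time comparison
</code></pre>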
<p><em><strong>See you at EuroPython 2012!</strong></em></p>
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2011/07/10/europython-2011-notes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Continued Fractions for Representing Real Numbers</title>
		<link>http://realmike.org/blog/2011/07/09/continued-fractions-for-representing-real-numbers/</link>
		<comments>http://realmike.org/blog/2011/07/09/continued-fractions-for-representing-real-numbers/#comments</comments>
		<pubDate>Sat, 09 Jul 2011 16:36:27 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[europython]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=507</guid>
		<description><![CDATA[This is something I learned at EuroPython 2011. I think it came up in a lightning talk by Alex Martelli, but I don&#8217;t recall exactly. Continued fractions are a representation of real numbers that allows for arbitrary-precision arithmetic. If you&#8217;ve worked with floating-point numbers in Python (or most any other programming language, for that matter), <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2011/07/09/continued-fractions-for-representing-real-numbers/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<p><em>This is something I learned at EuroPython 2011. I think it came up in a lightning talk by Alex Martelli, but I don&#8217;t recall exactly.</em></p>
<p>Continued fractions are a representation of real numbers that allows for arbitrary-precision arithmetic.</p>
<p>If you&#8217;ve worked with floating-point numbers in Python (or most any other programming language, for that matter), you are aware of precision problems like this:</p>
<pre><code>x = 1.0 / 3.0
total = 0.0
for i in range(300):
    total += x
print(repr(total))
print(total == 100.0)
</code></pre>
<p>This prints 99.99999999999966 rather than 100.0, so the comparison prints False. The reason is that the IEEE 754 floating-point representation of 1/3 isn&#8217;t exact. This initially small error, compounded by the rounding in each addition, accumulates over the 300 additions.</p>
<p><span id="more-507"></span>There is a different representation for real numbers: continued fractions.</p>
<ul>
<li>Consider the number PI = <strong>3</strong>.141592653589793.</li>
<li>This can also be written as: <strong>3</strong> + 1 / <strong>7</strong>.062513305931052</li>
<li>This can be written as: <strong>3</strong> + 1 / (<strong>7</strong> + 1 / <strong>15</strong>.996594406684103)</li>
<li>This can be written as: <strong>3</strong> + 1 / (<strong>7</strong> + 1 / (<strong>15</strong> + 1 / <strong>1</strong>.0034172310150002))</li>
<li>This can be written as: <strong>3</strong> + 1 / (<strong>7</strong> + 1 / (<strong>15</strong> + 1 / (<strong>1</strong> + 1 / <strong>292</strong>.6345908750162)))</li>
<li>This can be written as: <strong>3</strong> + 1 / (<strong>7</strong> + 1 / (<strong>15</strong> + 1 / (<strong>1</strong> + 1 / (<strong>292</strong> + 1 / <strong>1</strong>.5758184357354204))))</li>
</ul>
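<p>The expansion above is mechanical: take the integer part, then continue with the reciprocal of what is left. Here is a small sketch of that loop, using exact <code>fractions.Fraction</code> arithmetic (Python 3) so that no rounding creeps in. It also includes the reverse step, which folds a list of terms back into a single fraction. The function names are hypothetical and unrelated to the &#8220;cf&#8221; module.</p>
<pre><code>import math
from fractions import Fraction


def cf_terms(x, max_terms=20):
    """Continued-fraction terms [a0, a1, a2, ...] of the rational number x.

    Each step splits off the integer part and continues with the reciprocal
    of the remainder, just like the expansion of PI above.
    """
    x = Fraction(x)  # exact: a float converts to its exact binary value
    terms = []
    while len(terms) != max_terms:
        a = math.floor(x)
        terms.append(a)
        remainder = x - a
        if remainder == 0:
            break  # rational numbers have finite expansions
        x = 1 / remainder
    return terms


def cf_value(terms):
    """Evaluate a finite continued fraction [a0; a1, ..., an] exactly."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value


print(cf_terms(math.pi, max_terms=6))  # [3, 7, 15, 1, 292, 1]
print(cf_value([3, 7, 15, 1]))         # 355/113
</code></pre>
<p>Truncating the list gives the familiar approximations of PI: [3, 7] is 22/7, and [3, 7, 15, 1] is 355/113.</p>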
<p>This can be stored efficiently in some form of list: PI = [3, 7, 15, 1, 292, 1, ...]. The <a href="http://sun.aei.polsl.pl/~mciura/software/cf.py">&#8220;cf&#8221; module</a> is a Python implementation of continued fractions by Marcin Ciura that uses lazy evaluation. Here&#8217;s the previous example rewritten using continued fractions:</p>
<pre><code>from cf import cf
x = cf(1.0) / cf(3.0)
total = cf(0.0)
for i in range(300):
    total += x
print(repr(float(total)))    # prints 100.0
print(total == 100.0)
</code></pre>
<p>The mathematical background is explained on <a href="http://en.wikipedia.org/wiki/Continued_fraction">Wikipedia</a> and in this <a href="http://perl.plover.com/classes/cftalk/">presentation by Mark Jason Dominus</a>.</p>
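<p>For comparison, Python&#8217;s standard <code>fractions</code> module also gets this particular example exactly right. It does so with plain rational arithmetic (a numerator and a denominator), which is a different technique from continued fractions. Unlike a lazily evaluated continued fraction, it can only hold rational numbers such as 1/3 exactly, not irrational ones like PI.</p>
<pre><code>from fractions import Fraction

x = Fraction(1, 3)      # stored exactly as numerator 1, denominator 3
total = Fraction(0)
for i in range(300):
    total += x
print(total)            # 100
print(total == 100.0)   # True
</code></pre>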
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2011/07/09/continued-fractions-for-representing-real-numbers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Live Desktop Streaming via DLNA on GNU/Linux</title>
		<link>http://realmike.org/blog/2011/02/09/live-desktop-streaming-via-dlna-on-gnulinux/</link>
		<comments>http://realmike.org/blog/2011/02/09/live-desktop-streaming-via-dlna-on-gnulinux/#comments</comments>
		<pubDate>Wed, 09 Feb 2011 20:38:39 +0000</pubDate>
		<dc:creator>Michael Fötsch</dc:creator>
				<category><![CDATA[GNU/Linux]]></category>
		<category><![CDATA[dlna]]></category>
		<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[fuse]]></category>
		<category><![CDATA[matroska]]></category>
		<category><![CDATA[mediatomb]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[screencasting]]></category>
		<category><![CDATA[streaming]]></category>

		<guid isPermaLink="false">http://realmike.org/blog/?p=450</guid>
		<description><![CDATA[TWiT and the Ubuntu terminal on a TV set via DLNA Many modern TVs (and set-top boxes, gaming consoles, etc.) support DLNA streaming. Suppose you have a PC that stores all your music, downloaded podcasts, video podcasts, photos, and so on. You can run some DLNA media server software on your PC and stream your <span style="color:#777"> . . . &#8594; <a href="http://realmike.org/blog/2011/02/09/live-desktop-streaming-via-dlna-on-gnulinux/">Read More</a></span>]]></description>
				<content:encoded><![CDATA[<div id="attachment_459" class="wp-caption alignright" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2011/02/twit_on_tv.jpg"><img class="size-medium wp-image-459" title="twit_on_tv" src="http://realmike.org/blog/wp-content/uploads/2011/02/twit_on_tv-300x173.jpg" alt="" width="300" height="173" /></a><p class="wp-caption-text">TWiT and the Ubuntu terminal on a TV set via DLNA</p></div>
<p>Many modern TVs (and set-top boxes, gaming consoles, etc.) support <a href="http://www.dlna.org/digital_living/getting_started/">DLNA</a> streaming. Suppose you have a PC that stores all your music, downloaded podcasts, video podcasts, photos, and so on. You can run some DLNA media server software on your PC and stream your entire media collection to your TV over your home network. No more carrying around USB sticks; it&#8217;s all in your home cloud.</p>
<p>On GNU/Linux, I am using <a href="http://mediatomb.cc/">MediaTomb</a> as my DLNA server. It&#8217;s nothing fancy (it&#8217;s a file server, after all), and it just works.</p>
<p>Okay, this takes care of media files stored on your PC. But can we do more? Is it possible to stream a live capture of your desktop to the TV?</p>
<p>Let&#8217;s say you&#8217;re watching a Flash video in your browser, and there&#8217;s no way to download the video file. Or, you&#8217;re watching a live event being streamed via Flash or whatever. It would be kinda cool to be able to stream that to your TV via DLNA. And it&#8217;s possible—not trivial, mind you, but I&#8217;ve seen it working at least once&#8230;</p>
<p><span id="more-450"></span></p>
<p>The same approach that&#8217;s taken here for live streaming might also be useful for on-the-fly transcoding (e.g., an .ogg file needs to be transcoded to .vob for the player to be able to read it).</p>
<p>(I should mention that something like this isn&#8217;t unheard of. In fact, my Philips TV came with the Windows-only(!) WiFi MediaConnect software to do desktop streaming. I have never seen it in action, because I don&#8217;t use Windows. Of course, Philips based the TV&#8217;s firmware on the Linux kernel, like so many other manufacturers do. But, also unsurprisingly, they use it because it&#8217;s &#8220;free as in beer&#8221;, not because they care about users&#8217; freedom. The fact that I could still get all this to work on GNU/Linux is thanks to the invaluable work of many free software projects: MediaTomb, <a href="http://www.ffmpeg.org/">FFmpeg</a>, <a href="http://www.matroska.org/index.html">Matroska</a>, <a href="http://fuse.sourceforge.net/">FUSE</a>, <a href="http://python.org/">Python</a>, to name just a few.)</p>
<p><strong>What you can find here</strong></p>
<p>I wrote a couple of scripts that allow you to capture your desktop and stream it to a DLNA-capable player. To use the code as-is, your player must support the Matroska (.mkv) file format.</p>
<p>I used the scripts in conjunction with MediaTomb, but other media servers should work just as well.</p>
<p>The scripts are very rough at the edges, and if you are afraid of the command line or of reading Python code, you shouldn&#8217;t attempt to use them.</p>
<p><a href="http://realmike.org/blog/wp-content/uploads/2011/02/dlna_live_streaming.zip">Download the scripts</a></p>
<p>See <a href="#usage">Usage Instructions</a> below.</p>
<p><strong>What is missing / </strong><strong>Invitation to contributors</strong></p>
<p>As I mentioned, the scripts only work with devices that can play Matroska files via DLNA right now. The basic ideas and concepts should apply equally well to MPEG-2 and other formats. I&#8217;m not sure about MPEG-4 files, though, and I would appreciate feedback from someone more familiar with the format.</p>
<p>The scripts could definitely use better error checking, a nicer command-line interface, and lots of testing. If anyone&#8217;s interested in helping with that, please contact me.</p>
<p><a name="usage"><strong>Usage Instructions</strong></a></p>
<ul>
<li>Make <a href="#dlna_fuse_config">some changes to dlna_fuse.py</a> to specify the temporary file, control the captured display region, set the output format, etc.</li>
<li>Configure MediaTomb, as <a href="#mediatomb_config">described below</a>.</li>
<li>Mount dlna_fuse.py with &#8220;python dlna_fuse.py -s -f fuse_mnt&#8221;. This automatically starts the capture.</li>
<li>At this point, add the file &#8220;fuse_mnt/a/fuse_live.mkv&#8221; to MediaTomb&#8217;s database. (You only have to do this once.)</li>
<li>Start playback.</li>
<li>As this is more of a proof-of-concept than a polished tool, please read on below and let me know if you have any feedback.</li>
</ul>
<p><strong>The Basic Idea</strong></p>
<p>First off, how do we capture the desktop? Someone named Verb3k explains this in &#8220;<a href="http://verb3k.wordpress.com/2010/01/26/how-to-do-proper-screencasts-on-linux/">How to do Proper Screencasts on Linux Using FFmpeg</a>&#8221;. Here&#8217;s an example:</p>
<blockquote><p>ffmpeg -f alsa -ac 2 -i pulse -f x11grab -r 20 -s 1024x576 -i :0.0+128,224 -acodec ac3 -ac 1 -vcodec libx264 -vpre fast -threads 0 -f matroska ~/Videos/capture.mkv</p></blockquote>
<p>This command line takes sound from PulseAudio and screen images from X11 (at 20 fps) and combines them into a Matroska file using the H.264 codec for video and AC3 for audio. It grabs a rectangular area of 1024×576 pixels, 128 pixels from the left edge of the screen and 224 pixels from the top edge of the screen.</p>
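<p>The offsets in that command follow from simple arithmetic. Here is a small sketch (the helper names are hypothetical) that centers a 16:9 rectangle of a given width on the screen and builds the matching x11grab options. For the 1280×1024 screen used later in this post, a 1024-pixel-wide capture is 576 pixels high and starts at 128,224.</p>
<pre><code>def centered_region(screen_w, screen_h, capture_w, aspect=(16, 9)):
    """Width, height, and top-left offset of a centered capture rectangle."""
    capture_h = capture_w * aspect[1] // aspect[0]
    x = (screen_w - capture_w) // 2
    y = (screen_h - capture_h) // 2
    return capture_w, capture_h, x, y


def x11grab_args(display, region, fps):
    """ffmpeg input options for grabbing `region` of an X11 display."""
    w, h, x, y = region
    return ["-f", "x11grab", "-r", str(fps),
            "-s", "%dx%d" % (w, h), "-i", "%s+%d,%d" % (display, x, y)]


region = centered_region(1280, 1024, 1024)
print(region)  # (1024, 576, 128, 224)
print(" ".join(x11grab_args(":0.0", region, 20)))
# -f x11grab -r 20 -s 1024x576 -i :0.0+128,224
</code></pre>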
<p>Now, what happens when we have MediaTomb serve up the file capture.mkv to the player while the file is still being captured? If you are luckier than I was, it might just work, and you&#8217;re done. Maybe you can find some other combination of video codec, audio codec, and container file format that your player likes better. (Before attempting to do live streaming, you should have ffmpeg convert an existing video file, and finish the conversion before starting playback, in order to find a format that your player understands.)</p>
<p>Starting the playback while the capture was still in progress didn&#8217;t work for me. When I started playback too soon after capturing had begun, the player would simply tell me the file was unplayable. When I waited a few minutes, the player would play up until the point where I had started playback. That is, when I started playback after having captured for five minutes, playback would stop after five minutes, even if the captured file contained 10 minutes at that point.</p>
<p>You probably have an idea already why this might have failed. Let&#8217;s take a look at what the file contents might look like while the capture is in progress.</p>
<div id="attachment_456" class="wp-caption aligncenter" style="width: 635px"><a href="http://realmike.org/blog/wp-content/uploads/2011/02/live_file_format.png"><img class="size-full wp-image-456" title="live_file_format" src="http://realmike.org/blog/wp-content/uploads/2011/02/live_file_format.png" alt="" width="625" height="401" /></a><p class="wp-caption-text">A video file after 2 minutes of capturing, after 5 minutes, after it is complete</p></div>
<p>In the figure, you can see a hypothetical file format that stores the data length and video duration in the front, then the video data, then a table containing seek information and other stuff that&#8217;s only known after encoding has finished. In this example, the encoder updates the duration field periodically while encoding is in progress, but it leaves the data size field blank until it has finished.</p>
<p>This hypothetical case shows many things that can go wrong when playback starts in the middle of encoding (which, in the case of live streaming, basically means at any time):</p>
<ul>
<li>The player might encounter the &#8220;unknown&#8221; size field and decide that the file is broken.</li>
<li>Or, it takes the &#8220;unknown&#8221; size as an indication that it should seek to the end of the file and determine the size itself (which breaks, because the file has no end yet).</li>
<li>The player might read the duration of 2:00 min. and never look at it again—playback will simply stop after 2:00 min., no matter what happens to the file in the meantime.</li>
<li>The player might know that there&#8217;s supposed to be a seek table, a list of keyframes, a checksum, etc., at the end and fail when it tries to read it.</li>
<li>&lt;endless other complications&gt;</li>
</ul>
<p>The scripts that I wrote try to circumvent these issues in two steps:</p>
<ul>
<li>Modify the file that ffmpeg produces during the capture so that the file appears to be a regular, albeit very, very long, video. Give the player all the information that it needs right away, so that it does not try to seek (through the media server) to various places in the file, searching for the information.
<ul>
<li>For Matroska files, this is what &#8220;matroska_live_filter.py&#8221; does. Hopefully, it will be possible to write filters for other container file formats in the future.</li>
</ul>
</li>
<li>Intercept calls to the filesystem, so that when the player (through the media server) tries to access parts of the file that don&#8217;t exist yet, we can wait for ffmpeg to produce them (if it&#8217;s just video data for 5 seconds in the future, for example), or come up with fake data.
<ul>
<li>This is what &#8220;dlna_fuse.py&#8221; does. This is a virtual (FUSE) filesystem that simply waits for ffmpeg to produce more data when the media server tries to prefetch more data than is available in the file.</li>
</ul>
</li>
</ul>
<p>Matroska is a container format that lends itself well to this kind of interception. Matroska has <a href="http://www.matroska.org/technical/streaming/index.html">streaming support</a> (i.e., it defines what should go into the headers so that players know it&#8217;s a live stream), but unfortunately, my particular player didn&#8217;t care much.</p>
<p>There are other container formats where I&#8217;m not sure that such a thing is possible. In MP4, for example, there are atoms like &#8220;stsz&#8221; (&#8220;sample table sizes&#8221;) and &#8220;stss&#8221; (&#8220;sample table sync samples&#8221;) that seem to go <em>before</em> the video stream and that contain information about the encoded sizes of frames—I&#8217;m not sure there&#8217;s a way to fake this data without waiting for the encoding to finish. If you are familiar with the MP4 or QuickTime formats and have an idea, please leave a comment!</p>
<p><strong>Filtering Matroska for Live Streaming</strong></p>
<p>The <a href="http://www.matroska.org/technical/streaming/index.html">Matroska specification</a> points out that a live stream is designated by setting the &#8220;Segment&#8221; size to &#8220;unknown&#8221;. ffmpeg does this, but it didn&#8217;t convince my TV to treat the file as a live stream. Instead, I ended up simply setting the size to a very large value and setting the duration of the video to 100 hours.</p>
<p>In addition, I suppress &#8220;SeekHead&#8221; elements (pointers to other sections of the file) and &#8220;Cues&#8221; elements (pointers to keyframes for fast-forwarding). This isn&#8217;t strictly necessary; ffmpeg only produces these elements when encoding finishes (which it never does with a live capture). However, this functionality was quite handy when testing out the DLNA streaming with existing .mkv files.</p>
<p>As a final hack, I output 128 KB of &#8220;Void&#8221; data after each &#8220;Cluster&#8221; (which appears to be a block of ~5 seconds of audio/video data). The &#8220;Void&#8221; data doesn&#8217;t serve any purpose other than being able to send data to the player when it requests some. The player pre-fetches data. Without the &#8220;Void&#8221; blocks, there is sometimes not as much data as it requests, because ffmpeg hasn&#8217;t produced it yet. If the requested data can&#8217;t be delivered fast enough, though, the player appears to give up. By producing the &#8220;Void&#8221; data, there is always enough data to satisfy the player, even though the data doesn&#8217;t contain anything useful.—At least, that&#8217;s what I think is happening&#8230;</p>
<p>All this is done in the Python script &#8220;matroska_live_filter.py&#8221;.</p>
<p>Usage:</p>
<ul>
<li><code>python matroska_live_filter.py &lt;mkv-filename&gt;</code> : Output a pretty-printed tree of the Matroska file structure. (Similar to the mkvinfo tool from the <a href="http://www.bunkus.org/videotools/mkvtoolnix/">mkvtoolnix</a> package.)</li>
<li><code>python matroska_live_filter.py -</code> : Read a Matroska file from stdin (for example, the output of ffmpeg writing to stdout) and write the modified Matroska file to stdout on the fly, i.e., write data as soon as it becomes available instead of waiting for the input to end.</li>
</ul>
<p>As an example, to filter an ffmpeg-produced live stream, write this:</p>
<blockquote><p>ffmpeg -f alsa -ac 2 -i pulse -f x11grab -r 30 -s 1024x768 -i :0.0 -acodec pcm_s16le -vcodec libx264 -vpre fast -threads 0 -f matroska - | python matroska_live_filter.py - &gt;~/Videos/filtered_live.mkv</p></blockquote>
<p>(I tried to use <a href="http://www.matroska.org/downloads/linux.html">libebml and libmatroska</a> in C++ first. However, documentation was hard to come by, and the code wasn&#8217;t quite self-explanatory. I found a Matroska tag reader written in Python by Johannes Sasongko and built the filter based on that.)</p>
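<p>Matroska files are built from EBML elements. Each element consists of an ID, a size encoded as a variable-length integer (&#8220;vint&#8221;), and the data. The sketch below is not the code of matroska_live_filter.py. It is a minimal Python 3 illustration of the encoding involved:</p>
<ul>
<li>The position of the first 1 bit in a vint gives its length in bytes.</li>
<li>A size whose data bits are all ones means &#8220;unknown&#8221;, which is the live-stream marker mentioned above.</li>
<li>A Void element (ID 0xEC) is just an ID, a size, and filler bytes.</li>
</ul>
<pre><code>VOID_ID = bytes.fromhex("ec")  # EBML Void element ID


def encode_vint(value, length=8):
    """Encode a value as a `length`-byte EBML variable-length integer."""
    marker = 2 ** (7 * length)  # the single 1 bit that tells a reader the length
    if value not in range(marker):
        raise ValueError("value does not fit in %d bytes" % length)
    return (marker + value).to_bytes(length, "big")


def decode_vint(data, pos=0):
    """Decode the vint starting at data[pos]; returns (value, length in bytes)."""
    first = data[pos]
    if first == 0:
        raise ValueError("vint longer than 8 bytes")
    length = 9 - first.bit_length()  # number of leading zero bits, plus one
    raw = int.from_bytes(data[pos:pos + length], "big")
    return raw - 2 ** (7 * length), length


# All data bits set to 1 is reserved to mean "unknown size" (e.g. a live Segment)
UNKNOWN_SIZE = encode_vint(2 ** 56 - 1)


def void_element(payload_size):
    """Void element: ID, 8-byte size, then payload_size filler bytes."""
    return VOID_ID + encode_vint(payload_size) + bytes(payload_size)


print(UNKNOWN_SIZE.hex())                  # 01ffffffffffffff
print(len(void_element(128 * 1024)))       # 131081
print(decode_vint(bytes.fromhex("4002")))  # (2, 2)
</code></pre>
<p>With an 8-byte size field, a 128 KB Void element takes 131,081 bytes in the file: 1 byte of ID, 8 bytes of size, and 131,072 bytes of filler.</p>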
<p><strong>FUSE Filesystem to Fool the Media Server</strong></p>
<p>When you&#8217;re using GNOME, chances are you&#8217;re using FUSE filesystems already. When you use &#8220;Places&#8221;→&#8220;Connect to Server&#8221; to connect to an FTP server, for example, the remote server appears as a local folder in ~/.gvfs. This is a virtual filesystem that uses FUSE.</p>
<p>For the DLNA streaming, I decided to write a FUSE filesystem in Python. This filesystem would appear to MediaTomb as a regular directory containing a Matroska video file. Whenever MediaTomb would access the file or read parts of it, my Python code could intercept these calls and do its magic.</p>
<p>When the filesystem is mounted (i.e., when the Python script is started), the desktop capture is started and redirected to a temporary file. When MediaTomb (or any other program) requests some part of the file, the script can check whether there&#8217;s enough data in the file. If yes, it simply returns the data. If not, it blocks briefly until ffmpeg has written enough data. If the player tries to read too far ahead, this might indicate that the file isn&#8217;t suitable for live streaming yet, and the script will log an error. (This shouldn&#8217;t happen for Matroska files anymore, but it will be useful when trying to add support for more container formats later.)</p>
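<p>Stripped of the FUSE plumbing, the blocking read described above could look roughly like the following. It is a hypothetical helper, not the code in dlna_fuse.py. It polls the size of the temporary file that ffmpeg is writing and returns the requested bytes once they exist. If the writer falls too far behind, it gives up with a <code>TimeoutError</code> (Python 3). A FUSE read handler would call something like it with the offset and size that the media server asked for.</p>
<pre><code>import os
import time


def has_enough(available, needed):
    # True when at least `needed` bytes are available
    return max(available, needed) == available


def read_when_available(path, offset, size, timeout=5.0, poll=0.05):
    """Return `size` bytes at `offset` from a file that is still being written.

    Polls the file size until the requested range exists, then reads it.
    Raises TimeoutError if the data does not show up within `timeout` seconds.
    """
    end = offset + size
    for _ in range(max(1, int(timeout / poll))):
        if has_enough(os.path.getsize(path), end):
            with open(path, "rb") as f:
                f.seek(offset)
                return f.read(size)
        time.sleep(poll)
    raise TimeoutError("%d bytes requested, only %d written so far"
                       % (end, os.path.getsize(path)))
</code></pre>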
<p><a name="dlna_fuse_config">The FUSE filesystem is in &#8220;dlna_fuse.py&#8221;. It requires the Python bindings for FUSE (package &#8220;python-fuse&#8221; on Ubuntu).</a></p>
<p>You should make a few changes to the file to adapt it to your needs:</p>
<ul>
<li>Change the variable TEMP_FILE. While ffmpeg captures the desktop, the resulting video is not kept in memory but written to this file. This means that you need some free space on your hard disk while watching live streams. Of course, the whole purpose of the FUSE filesystem is that the file doesn&#8217;t need to exist physically. At a later point, I will change the FUSE script to keep only a part of the ffmpeg output in memory, and discard older parts once the player has read them. For now, the temporary file is used as a buffer.</li>
<li>Change the ffmpeg command line. The current command corresponds roughly to this:
<ul>
<li><code>MONITOR=$(pactl list | grep -A2 '^Source #' | grep 'Name: .*\.monitor$' | awk '{print $NF}' | tail -n1)</code><br />
<code>parec -d "$MONITOR" | ffmpeg -f s16le -ac 2 -ar 44100 -i - -f x11grab -r 20 -s 1024x576 -i :0.0+128,224 -acodec ac3 -ac 1 -vcodec libx264 -vpre medium -threads 0 -f matroska - | python matroska_live_filter.py - &gt; ~/Videos/live.mkv</code></li>
</ul>
</li>
<li>Instead of using &#8220;ffmpeg -f alsa -i pulse&#8221;, which produced crackling noises every now and then, I use &#8220;parec&#8221; (PulseAudio recorder) to capture the audio. The part &#8220;-f s16le -ac 2 -ar 44100&#8221; is the format that parec produces (at least for me): 44.1 kHz, 16-bit stereo. &#8220;-r 20&#8221; instructs ffmpeg to capture at 20 fps. I chose &#8220;-s 1024x576 -i :0.0+128,224&#8221; to capture a 1024-pixel-wide rectangle with an aspect ratio of 16:9 at the center of my screen, which is 1280×1024. You can change this to whatever suits you (as long as your computer can encode it fast enough). &#8220;-acodec ac3 -ac 1&#8221; converts the audio to the AC3 codec in mono (the TV had problems with stereo AC3 streams). &#8220;-vcodec libx264 -vpre medium&#8221; uses the &#8220;medium&#8221; profile for the H.264 encoding. &#8220;-vpre&#8221; can also be &#8220;fast&#8221;, &#8220;ultrafast&#8221;, &#8220;lossless_ultrafast&#8221; and lots of others; you need to experiment to find an encoding profile that provides good quality, yet doesn&#8217;t overwhelm your CPU or network.</li>
<li><strong>Note:</strong> If &#8220;parec&#8221; doesn&#8217;t record anything, open the &#8220;PulseAudio Volume Control&#8221; (installed with &#8220;sudo apt-get install pavucontrol&#8221;) and make sure that on the &#8220;Input Devices&#8221; tab, the device named &#8220;Monitor of Internal Audio Analog Stereo&#8221; isn&#8217;t muted.</li>
<li>Make a directory for the mount point and mount the filesystem:
<ul>
<li><code>mkdir fuse_mnt<br />
python dlna_fuse.py -s -f fuse_mnt</code></li>
<li>&#8220;-s&#8221; means single-threaded (just in case my implementation isn&#8217;t entirely thread-safe), &#8220;-f&#8221; means foreground (so that you can see log output on stdout).</li>
</ul>
</li>
<li>To test it, you can point Nautilus (assuming you use GNOME) at the fuse_mnt directory and play the file fuse_live.mkv that you find in there using a player of your choice.</li>
<li>Add the file &#8220;fuse_mnt/a/fuse_live.mkv&#8221; to MediaTomb&#8217;s database.</li>
<li><strong>Note: </strong>I start &#8220;mediatomb&#8221; manually from a terminal, which works just fine. The MediaTomb service that&#8217;s started automatically during boot, on the other hand, can&#8217;t see the file &#8220;fuse_live.mkv&#8221; due to permission problems&#8211;I&#8217;m not sure why.</li>
<li>To stop the FUSE filesystem, run &#8220;sudo umount fuse_mnt&#8221;. If this doesn&#8217;t work, you can also kill the dlna_fuse.py process:
<ul>
<li><code>ps aux | grep dlna_fuse.py<br />
kill -9 &lt;pid&gt;</code></li>
</ul>
</li>
<li>If starting dlna_fuse.py fails with &#8220;bad mount point: Transport endpoint is not connected&#8221;, make sure the process has been killed and run &#8220;sudo umount fuse_mnt&#8221; again.</li>
</ul>
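<p>The shell pipeline in the list above (parec into ffmpeg into the filter, redirected to the temporary file) can also be started from Python with <code>subprocess.Popen</code>. Each process&#8217;s stdout is fed into the next process&#8217;s stdin. The sketch below is an assumption about how this could be wired, not an excerpt from dlna_fuse.py. Each command is an argument list, one argument per item, exactly as you would otherwise type it in the shell.</p>
<pre><code>import subprocess


def start_pipeline(commands, output_path):
    """Start a chain of processes, like a shell pipeline whose final output
    is redirected to output_path. Returns the list of Popen objects."""
    procs = []
    prev_stdout = None
    with open(output_path, "wb") as out:
        for i, cmd in enumerate(commands):
            last = i == len(commands) - 1
            proc = subprocess.Popen(cmd, stdin=prev_stdout,
                                    stdout=out if last else subprocess.PIPE)
            if prev_stdout is not None:
                # Close our copy so the upstream process gets SIGPIPE
                # if the downstream one exits
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
    # The last child keeps its own handle to output_path after we close ours
    return procs
</code></pre>
<p>For example, you might call <code>start_pipeline([["parec", "-d", monitor], ffmpeg_args, ["python", "matroska_live_filter.py", "-"]], TEMP_FILE)</code>. Here <code>monitor</code> and <code>ffmpeg_args</code> are hypothetical variables holding the PulseAudio source name and the ffmpeg argument list.</p>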
<p><a name="mediatomb_config"><strong>Configuring MediaTomb</strong></a></p>
<div id="attachment_457" class="wp-caption alignright" style="width: 310px"><a href="http://realmike.org/blog/wp-content/uploads/2011/02/mediatomb_add.png"><img class="size-medium wp-image-457" title="mediatomb_add" src="http://realmike.org/blog/wp-content/uploads/2011/02/mediatomb_add-300x175.png" alt="" width="300" height="175" /></a><p class="wp-caption-text">MediaTomb web UI: Ugly but functional. And ugly.</p></div>
<p>Install MediaTomb via your package manager (package &#8220;mediatomb&#8221; in Ubuntu). In Ubuntu, MediaTomb is started automatically as a service. The configuration file is in /etc/mediatomb/config.xml.</p>
<p>I prefer to start mediatomb manually whenever I need it and place the configuration file in ~/.mediatomb/config.xml.</p>
<p>Provided that the config.xml contains <code>&lt;ui enabled="yes"&gt;</code>, you can open the MediaTomb GUI in a web browser at http://localhost:49152/ (or a subsequent port number). Once the FUSE filesystem is up and running, add the file fuse_mnt/a/fuse_live.mkv to the database. At this point, you should be able to find it and play it back on your DLNA player. (Of course, it can&#8217;t hurt to try a regular file first to check whether it works at all.)</p>
<p><strong>How well does it work?</strong></p>
<p>The image quality is astonishing, even with the &#8220;fast&#8221; encoding profile that I am using. The fonts and window details are very crisp. You have to look very closely to see the typical MPEG compression artifacts, if you can see them at all.</p>
<p>I am not trying to watch HD movies this way. I mostly use this for the <a href="http://live.twit.tv/">TWiT live stream</a> and similar talking-heads programs, so the 20 fps that my PC can deliver is good enough.</p>
<p>The latency is currently however long it takes me to start the FUSE filesystem (which automatically starts the capture), walk over to the TV, and start playback there. I think that if I delayed the capture until the file is actually accessed, I could reduce the latency to just a few seconds (although giving the encoder a head start does seem like a good idea). Maybe at some point the latency can be reduced far enough to remote-control the PC with almost instant feedback. We&#8217;ll see.</p>
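<p>The idea of delaying the capture until the file is first read can be sketched as a small wrapper that starts the encoder on the first read request and reuses it afterwards. This is a hypothetical Python 3 sketch, not how dlna_fuse.py is actually written: start_capture stands in for whatever launches the FFmpeg process, and on_read would be called from the filesystem&#8217;s read handler. The lock is there because a FUSE filesystem may serve reads from several threads at once.</p>

```python
import threading


class LazyCapture:
    """Start an expensive capture process only when it is first needed."""

    def __init__(self, start_capture):
        # start_capture: hypothetical callable that launches the encoder
        # (e.g. an FFmpeg subprocess) and returns a handle to it.
        self._start_capture = start_capture
        self._handle = None
        self._lock = threading.Lock()  # reads may arrive on several threads

    @property
    def started(self):
        return self._handle is not None

    def on_read(self):
        """Call from the read handler; starts the capture on the first call."""
        with self._lock:
            if self._handle is None:
                self._handle = self._start_capture()
            return self._handle
```

<p>Keeping a short head start would then mean starting the capture on the first open() rather than the first read(), at the cost of a few seconds of extra latency.</p>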
<p>If the video hangs, try the obvious things: Reduce the frame rate, make the captured screen area smaller, lower the bitrate, etc.</p>
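<p>These knobs map directly onto FFmpeg options. The following Python 3 sketch builds an x11grab capture command from them; the function name, defaults, and output settings are assumptions for illustration, not the command dlna_fuse.py uses. The option spellings (-framerate, -video_size, -c:v, -preset, -b:v) are those of current FFmpeg releases; FFmpeg 0.6 spelled several of them differently.</p>

```python
def capture_command(fps=20, size=(1280, 720), offset=(0, 0),
                    bitrate_kbit=2000, display=":0.0", output="out.mkv"):
    """Build an FFmpeg x11grab command line as an argument list.

    A lower fps, a smaller capture area, or a lower bitrate each reduce
    the load on the encoder or the network.
    """
    width, height = size
    x, y = offset
    return [
        "ffmpeg",
        "-f", "x11grab",
        "-framerate", str(fps),                      # capture frame rate
        "-video_size", "%dx%d" % (width, height),    # captured screen area
        "-i", "%s+%d,%d" % (display, x, y),          # display and top-left corner
        "-c:v", "libx264",
        "-preset", "fast",                           # x264 speed/quality preset
        "-b:v", "%dk" % bitrate_kbit,                # target video bitrate
        "-f", "matroska",
        output,
    ]
```

<p>For example, subprocess.run(capture_command(fps=15, size=(1024, 576), bitrate_kbit=1500)) would capture a smaller area at a lower frame rate and bitrate; change one parameter at a time to see which one fixes the stutter.</p>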
<p><strong>Tested Hardware and Software</strong></p>
<p>I tried all this on Ubuntu 10.10 (Maverick) with MediaTomb 0.12.1, FFmpeg 0.6, and Python 2.6.</p>
<p>The TV is a Philips PFL 7605H/12 with firmware 000.140.025.000. (As far as I can tell, models 8605 and 9705 use the same firmware, so they might work as well.)</p>
<p>If any of you can successfully replicate this on other player models and brands (or even if you can&#8217;t), please leave a comment.</p>
<p><strong>Request for Comments</strong></p>
<p>Again, this is a proof of concept. I hope to expand and improve the scripts in the future. If you have any questions, suggestions, or comments, please leave a comment below or contact me via <a href="http://identi.ca/mfoetsch">identi.ca/mfoetsch</a>, <a href="http://twitter.com/mfoetsch">twitter.com/mfoetsch</a>, or e-mail.</p>
<p style="text-align: right;"><a href="http://realmike.org/blog/wp-content/uploads/2011/02/live_file_format.odg">Figure source</a></p>
]]></content:encoded>
			<wfw:commentRss>http://realmike.org/blog/2011/02/09/live-desktop-streaming-via-dlna-on-gnulinux/feed/</wfw:commentRss>
		<slash:comments>32</slash:comments>
		</item>
	</channel>
</rss>

