A Question-and-Answer System for Python
In Python, variables that are only referenced inside a function are implicitly global. If a variable is assigned a value anywhere within the function's body, it's assumed to be local unless explicitly declared as global.
Though a bit surprising at first, a moment's consideration explains this. On one hand, requiring global for assigned variables provides a bar against unintended side-effects. On the other hand, if global were required for all global references, you'd be using global all the time. You'd have to declare as global every reference to a built-in function or to a component of an imported module. This clutter would defeat the usefulness of the global declaration for identifying side-effects.
It can be a surprise to get the UnboundLocalError in previously working code when it is modified by adding an assignment statement somewhere in the body of a function.
This code:
>>> x = 10
>>> def bar():
...     print(x)
...
>>> bar()
10
works, but this code:
>>> x = 10
>>> def foo():
...     print(x)
...     x += 1
results in an UnboundLocalError:
>>> foo()
Traceback (most recent call last):
  ...
UnboundLocalError: local variable 'x' referenced before assignment
This is because when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope. Since the last statement in foo assigns a new value to x, the compiler recognizes it as a local variable. Consequently, when the earlier print(x) attempts to print the uninitialized local variable, an error results.
In the example above you can access the outer scope variable by declaring it global:
>>> x = 10
>>> def foobar():
...     global x
...     print(x)
...     x += 1
...
>>> foobar()
10
This explicit declaration is required in order to remind you that (unlike the superficially analogous situation with class and instance variables) you are actually modifying the value of the variable in the outer scope:
>>> print(x) 11
You can do a similar thing in a nested scope using the nonlocal keyword:
>>> def foo():
...     x = 10
...     def bar():
...         nonlocal x
...         print(x)
...         x += 1
...     bar()
...     print(x)
...
>>> foo()
10
11
Python sequences are indexed with positive and negative numbers. For positive numbers, 0 is the first index, 1 is the second index, and so forth. For negative indices, -1 is the last index, -2 is the penultimate (next to last) index, and so forth. Think of seq[-n] as the same as seq[len(seq)-n].
Using negative indices can be very convenient. For example, S[:-1] is all of the string except for its last character, which is useful for removing the trailing newline from a string.
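To illustrate, here is a minimal sketch of the equivalence (the sample string is arbitrary):

```python
seq = "hello\n"

# seq[-n] refers to the same element as seq[len(seq) - n]
assert seq[-1] == seq[len(seq) - 1]   # '\n', the trailing newline

# S[:-1] drops the last character, handy for stripping a trailing newline
stripped = seq[:-1]
```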
Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++. It is also usable as an extension language for applications that need a programmable interface. Finally, Python is portable: it runs on many Unix variants, on the Mac, and on Windows 2000 and later.
To find out more, start with The Python Tutorial. The Beginner's Guide to Python links to other introductory tutorials and resources for learning Python.
The Python Software Foundation is an independent non-profit organization that holds the copyright on Python versions 2.1 and newer. The PSF's mission is to advance open source technology related to the Python programming language and to publicize the use of Python. The PSF's home page is at https://www.python.org/psf/.
Donations to the PSF are tax-exempt in the US. If you use Python and find it helpful, please contribute via the PSF donation page.
A class is the particular object type created by executing a class statement. Class objects are used as templates to create instance objects, which embody both the data (attributes) and code (methods) specific to a datatype.
A class can be based on one or more other classes, called its base class(es). It then inherits the attributes and methods of its base classes. This allows an object model to be successively refined by inheritance. You might have a generic Mailbox class that provides basic accessor methods for a mailbox, and subclasses such as MboxMailbox, MaildirMailbox, and OutlookMailbox that handle various specific mailbox formats.
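As a sketch of that idea (the method names here are illustrative, not part of any real library):

```python
class Mailbox:
    """Generic mailbox providing basic accessor methods."""

    def __init__(self):
        self._messages = []

    def add(self, message):
        self._messages.append(message)

    def messages(self):
        return list(self._messages)


class MboxMailbox(Mailbox):
    """Refinement for the mbox format; real parsing logic is elided."""

    def messages(self):
        # A real subclass would parse "From " separators here; this
        # sketch simply reuses the storage inherited from Mailbox.
        return super().messages()
```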
Python is a high-level general-purpose programming language that can be applied to many different classes of problems.
The language comes with a large standard library that covers areas such as string processing (regular expressions, Unicode, calculating differences between files), Internet protocols (HTTP, FTP, SMTP, XML-RPC, POP, IMAP, CGI programming), software engineering (unit testing, logging, profiling, parsing Python code), and operating system interfaces (system calls, filesystems, TCP/IP sockets). Look at the table of contents for The Python Standard Library to get an idea of what's available. A wide variety of third-party extensions are also available. Consult the Python Package Index to find packages of interest to you.
Delegation is an object-oriented technique (also called a design pattern). Let's say you have an object x and want to change the behaviour of just one of its methods. You can create a new class that provides a new implementation of the method you're interested in changing and delegates all other methods to the corresponding method of x.
Python programmers can easily implement delegation. For example, the following implements a class that behaves like a file but converts all written data to uppercase:
class UpperOut:

    def __init__(self, outfile):
        self._outfile = outfile

    def write(self, s):
        self._outfile.write(s.upper())

    def __getattr__(self, name):
        return getattr(self._outfile, name)
Here the UpperOut class redefines the write() method to convert the argument string to uppercase before calling the underlying self._outfile.write() method. All other methods are delegated to the underlying self._outfile object. The delegation is accomplished via the __getattr__ method; consult the language reference for more information about controlling attribute access.
Note that for more general cases delegation can get trickier. When attributes must be set as well as retrieved, the class must define a __setattr__() method too, and it must do so carefully. The basic implementation of __setattr__() is roughly equivalent to the following:
class X:
    ...
    def __setattr__(self, name, value):
        self.__dict__[name] = value
    ...
Most __setattr__() implementations must modify self.__dict__ to store local state for self without causing an infinite recursion.
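A sketch of such a wrapper, extending the UpperOut idea so attribute assignment is also delegated; the underscore convention used here to separate local state from delegated attributes is an illustrative choice, not a fixed rule:

```python
import io

class UpperOutWritable:
    def __init__(self, outfile):
        # Write through __dict__ directly so __setattr__ isn't triggered
        # before the wrapper is fully initialized.
        self.__dict__['_outfile'] = outfile

    def write(self, s):
        self._outfile.write(s.upper())

    def __getattr__(self, name):
        # Called only when normal lookup fails: delegate reads.
        return getattr(self._outfile, name)

    def __setattr__(self, name, value):
        if name.startswith('_'):
            self.__dict__[name] = value            # local state, no recursion
        else:
            setattr(self._outfile, name, value)    # delegate writes

buf = io.StringIO()
out = UpperOutWritable(buf)
out.write("hello")
```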
A method is a function on some object x that you normally call as x.name(arguments...). Methods are defined as functions inside the class definition:
class C:
    def meth(self, arg):
        return arg * 2 + self.attribute
Parameters are defined by the names that appear in a function definition, whereas arguments are the values actually passed to a function when calling it. Parameters define what kinds of arguments a function can accept. For example, given the function definition:
def func(foo, bar=None, **kwargs):
    pass
foo, bar and kwargs are parameters of func. However, when calling func, for example:
func(42, bar=314, extra=somevar)
the values 42, 314, and somevar are arguments.
Python is a programming language. It's used for many different applications. It's used in some high schools and colleges as an introductory programming language because Python is easy to learn, but it's also used by professional software developers at places such as Google, NASA, and Lucasfilm Ltd.
If you wish to learn more about Python, start with the Beginner's Guide to Python.
Comma is not an operator in Python. Consider this session:
>>> "a" in "b", "a"
(False, 'a')
Since the comma is not an operator, but a separator between expressions, the above is evaluated as if you had entered:
("a" in "b"), "a"
not:
"a" in ("b", "a")
The same is true of the various assignment operators (=, += etc.). They are not truly operators but syntactic delimiters in assignment statements.
The Python project's infrastructure is located all over the world. www.python.org is graciously hosted by Rackspace, with CDN caching provided by Fastly. Upfront Systems hosts bugs.python.org. Many other Python services like the Wiki are hosted by Oregon State University Open Source Lab.
The canonical way to share information across modules within a single program is to create a special module (often called config or cfg). Just import the config module in all modules of your application; the module then becomes available as a global name. Because there is only one instance of each module, any changes made to the module object get reflected everywhere. For example:
config.py:
x = 0   # Default value of the 'x' configuration setting
mod.py:
import config
config.x = 1
main.py:
import config
import mod

print(config.x)
Note that using a module is also the basis for implementing the Singleton design pattern, for the same reason.
Yes, there is. The syntax is as follows:
[on_true] if [expression] else [on_false]

x, y = 50, 25
small = x if x < y else y
Before this syntax was introduced in Python 2.5, a common idiom was to use logical operators:
[expression] and [on_true] or [on_false]
However, this idiom is unsafe, as it can give wrong results when on_true has a false boolean value. Therefore, it is always better to use the ... if ... else ... form.
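A small demonstration of why the old idiom fails when on_true is falsy:

```python
x = 1

# Old idiom: (x == 1) is true, "and 0" yields 0, which is falsy,
# so "or 99" takes over and the result is 99 -- not the intended 0.
old_result = (x == 1) and 0 or 99

# Conditional expression: returns the intended 0.
new_result = 0 if x == 1 else 99
```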
If you can't find a source file for a module it may be a built-in or dynamically loaded module implemented in C, C++ or other compiled language. In this case you may not have the source file or it may be something like mathmodule.c, somewhere in a C source directory (not on the Python Path).
There are (at least) three kinds of modules in Python:
modules written in Python (.py);
modules written in C and dynamically loaded (.dll, .pyd, .so, .sl, etc);
modules written in C and linked with the interpreter; to get a list of these, type:
import sys
print(sys.builtin_module_names)
For Unix variants: The standard Python source distribution comes with a curses module in the Modules subdirectory, though it's not compiled by default. (Note that this is not available in the Windows distribution; there is no curses module for Windows.)
The curses module supports basic curses features as well as many additional functions from ncurses and SYSV curses such as colour, alternative character set support, pads, and mouse support. This means the module isn't compatible with operating systems that only have BSD curses, but there don't seem to be any currently maintained OSes that fall into this category.
For Windows: use the consolelib module.
The atexit module provides a register function that is similar to C's onexit().
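A minimal sketch of registering an exit handler (the handler name and message are arbitrary):

```python
import atexit

def goodbye(name):
    # Runs during normal interpreter shutdown; handlers registered
    # later run earlier (LIFO order).
    print("Goodbye,", name)

atexit.register(goodbye, "world")
```

Note that atexit.register returns the function itself, so it can also be used as a decorator on a no-argument function.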
Yes.
The pdb module is a simple but adequate console-mode debugger for Python. It is part of the standard Python library, and is documented in the Library Reference Manual. You can also write your own debugger by using the code for pdb as an example.
The IDLE interactive development environment, which is part of the standard Python distribution (normally available as Tools/scripts/idle), includes a graphical debugger.
PythonWin is a Python IDE that includes a GUI debugger based on pdb. The Pythonwin debugger colors breakpoints and has quite a few cool features such as debugging non-Pythonwin programs. Pythonwin is available as part of the Python for Windows Extensions project and as a part of the ActivePython distribution (see https://www.activestate.com/activepython).
Boa Constructor is an IDE and GUI builder that uses wxWidgets. It offers visual frame creation and manipulation, an object inspector, many views on the source like object browsers, inheritance hierarchies, doc string generated html documentation, an advanced debugger, integrated help, and Zope support.
Eric is an IDE built on PyQt and the Scintilla editing component.
Pydb is a version of the standard Python debugger pdb, modified for use with DDD (Data Display Debugger), a popular graphical debugger front end. Pydb can be found at http://bashdb.sourceforge.net/pydb/ and DDD can be found at https://www.gnu.org/software/ddd.
There are a number of commercial Python IDEs that include graphical debuggers. They include:
See the chapters titled Internet Protocols and Support and Internet Data Handling in the Library Reference Manual. Python has many modules that will help you build server-side and client-side web systems.
A summary of available frameworks is maintained by Paul Boddie at https://wiki.python.org/moin/WebProgramming.
Cameron Laird maintains a useful set of pages about Python web technologies at http://phaseit.net/claird/comp.lang.python/web_python.
Very stable. New, stable releases have been coming out roughly every 6 to 18 months since 1991, and this seems likely to continue. Currently there are usually around 18 months between major releases.
The developers issue "bugfix" releases of older versions, so the stability of existing releases gradually improves. Bugfix releases, indicated by a third component of the version number (e.g. 2.5.3, 2.6.2), are managed for stability; only fixes for known problems are included in a bugfix release, and it's guaranteed that interfaces will remain the same throughout a series of bugfix releases.
The latest stable releases can always be found on the Python download page. There are two recommended production-ready versions at this point in time, because at the moment there are two branches of stable releases: 2.x and 3.x. Python 3.x may be less useful than 2.x, since currently there is more third party software available for Python 2 than for Python 3. Python 2 code will generally not run unchanged in Python 3.
Users are often surprised by results like this:
>>> 1.2 - 1.0
0.19999999999999996
and think it is a bug in Python. It's not. This has little to do with Python, and much more to do with how the underlying platform handles floating-point numbers.
The float type in CPython uses a C double for storage. A float object's value is stored in binary floating-point with a fixed precision (typically 53 bits) and Python uses C operations, which in turn rely on the hardware implementation in the processor, to perform floating-point operations. This means that as far as floating-point operations are concerned, Python behaves like many popular languages including C and Java.
Many numbers that can be written easily in decimal notation cannot be expressed exactly in binary floating-point. For example, after:
>>> x = 1.2
the value stored for x is a (very good) approximation to the decimal value 1.2, but is not exactly equal to it. On a typical machine, the actual stored value is:
1.0011001100110011001100110011001100110011001100110011 (binary)
which is exactly:
1.1999999999999999555910790149937383830547332763671875 (decimal)
The typical precision of 53 bits provides Python floats with 15-16 decimal digits of accuracy.
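A quick way to observe this, using the same kind of subtraction shown above:

```python
import math

# The binary approximations of 1.2 and 1.0 make the difference
# miss 0.2 by a tiny amount.
diff = 1.2 - 1.0
exact = (diff == 0.2)            # False on typical hardware

# math.isclose is the idiomatic tolerant comparison
close = math.isclose(diff, 0.2)
```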
For a fuller explanation, please see the floating point arithmetic chapter in the Python tutorial.
There are several advantages.
One is performance: knowing that a string is immutable means we can allocate space for it at creation time, and the storage requirements are fixed and unchanging. This is also one of the reasons for the distinction between tuples and lists.
Another advantage is that strings in Python are considered as "elemental" as numbers. No amount of activity will change the value 8 to anything else, and in Python, no amount of activity will change the string "eight" to anything else.
A try/except block is extremely efficient if no exceptions are raised. Actually catching an exception is expensive. In versions of Python prior to 2.0 it was common to use this idiom:
try:
    value = mydict[key]
except KeyError:
    mydict[key] = getvalue(key)
    value = mydict[key]
This only made sense when you expected the dict to have the key almost all the time. If that wasn't the case, you coded it like this:
if key in mydict:
    value = mydict[key]
else:
    value = mydict[key] = getvalue(key)
For this specific case, you could also use value = dict.setdefault(key, getvalue(key)), but only if the getvalue() call is cheap enough, because it is evaluated in all cases.
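For defaults that are expensive to compute, collections.defaultdict defers the work to a factory that runs only on a miss; a quick comparison:

```python
from collections import defaultdict

cache = {}
# setdefault evaluates its second argument on every call,
# even when "k" is already present.
value = cache.setdefault("k", len("expensive"))

# defaultdict calls the factory only for missing keys.
counts = defaultdict(int)
counts["a"] += 1          # int() supplies the 0 starting value
```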
Not as such.
For simple input parsing, the easiest approach is usually to split the line into whitespace-delimited words using the split() method of string objects and then convert decimal strings to numeric values using int() or float(). split() supports an optional "sep" parameter which is useful if the line uses something other than whitespace as a separator.
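A short sketch of both cases:

```python
line = "3 4.5 spam"
fields = line.split()            # whitespace-delimited words
n = int(fields[0])               # decimal string -> int
x = float(fields[1])             # decimal string -> float

csv_line = "3,4.5,spam"
parts = csv_line.split(",")      # explicit separator
```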
For more complicated input parsing, regular expressions are more powerful than C's sscanf() and better suited for the task.
You can do this easily enough with a sequence of if... elif... elif... else. There have been some proposals for switch statement syntax, but there is no consensus (yet) on whether and how to do range tests. See PEP 275 for complete details and the current status.
For cases where you need to choose from a very large number of possibilities, you can create a dictionary mapping case values to functions to call. For example:
def function_1(...):
    ...

functions = {'a': function_1,
             'b': function_2,
             'c': self.method_1, ...}

func = functions[value]
func()
For calling methods on objects, you can simplify yet further by using the getattr() built-in to retrieve methods with a particular name:
def visit_a(self, ...):
    ...
...

def dispatch(self, value):
    method_name = 'visit_' + str(value)
    method = getattr(self, method_name)
    method()
It's suggested that you use a prefix for the method names, such as visit_ in this example. Without such a prefix, if values are coming from an untrusted source, an attacker would be able to call any method on your object.
Yes, .pyd files are dll's, but there are a few differences. If you have a DLL named foo.pyd, then it must have a function PyInit_foo(). You can then write Python "import foo", and Python will search for foo.pyd (as well as foo.py and foo.pyc) and if it finds it, will attempt to call PyInit_foo() to initialize it. You do not link your .exe with foo.lib, as that would cause Windows to require the DLL to be present.
Note that the search path for foo.pyd is PYTHONPATH, not the same as the path that Windows uses to search for foo.dll. Also, foo.pyd need not be present to run your program, whereas if you linked your program with a dll, the dll is required. Of course, foo.pyd is required if you want to say import foo. In a DLL, linkage is declared in the source code with __declspec(dllexport). In a .pyd, linkage is defined in a list of available functions.
Lists and tuples, while similar in many respects, are generally used in fundamentally different ways. Tuples can be thought of as being similar to Pascal records or C structs; they're small collections of related data which may be of different types which are operated on as a group. For example, a Cartesian coordinate is appropriately represented as a tuple of two or three numbers.
Lists, on the other hand, are more like arrays in other languages. They tend to hold a varying number of objects all of which have the same type and which are operated on one-by-one. For example, os.listdir('.') returns a list of strings representing the files in the current directory. Functions which operate on this output would generally not break if you added another file or two to the directory.
Tuples are immutable, meaning that once a tuple has been created, you can't replace any of its elements with a new value. Lists are mutable, meaning that you can always change a list's elements. Only immutable elements can be used as dictionary keys, and hence only tuples and not lists can be used as keys.
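For example:

```python
point = (2, 3)
labels = {point: "corner"}        # a tuple works as a dictionary key

try:
    labels[[2, 3]] = "oops"       # a list is mutable, hence unhashable
except TypeError as exc:
    key_error = exc
```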
You can use exceptions to provide a "structured goto" that even works across function calls. Many feel that exceptions can conveniently emulate all reasonable uses of the "go" or "goto" constructs of C, Fortran, and other languages. For example:
class label(Exception): pass  # declare a label

try:
    ...
    if condition: raise label()  # goto label
    ...
except label:  # where to goto
    pass
...
This doesn't allow you to jump into the middle of a loop, but that's usually considered an abuse of goto anyway. Use sparingly.
A global interpreter lock (GIL) is used internally to ensure that only one thread runs in the Python VM at a time. In general, Python switches between threads only between bytecode instructions; how frequently it switches can be set via sys.setswitchinterval(). Each bytecode instruction, and therefore all the C implementation code reached from each instruction, is atomic from the point of view of a Python program.
In theory, this means an exact accounting requires an exact understanding of the PVM bytecode implementation. In practice, it means that operations on shared variables of built-in data types (ints, lists, dicts, etc) that "look atomic" really are.
For example, the following operations are all atomic (L, L1, L2 are lists, D, D1, D2 are dicts, x, y are objects, i, j are ints):
L.append(x)
L1.extend(L2)
x = L[i]
x = L.pop()
L1[i:j] = L2
L.sort()
x = y
x.field = y
D[x] = y
D1.update(D2)
D.keys()
These aren't:
i = i+1
L.append(L[-1])
L[i] = L[j]
D[x] = D[x] + 1
Operations that replace other objects may invoke those other objects' __del__() method when their reference count reaches zero, and that can affect things. This is especially true for the mass updates to dictionaries and lists. When in doubt, use a mutex!
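A sketch of guarding a non-atomic update (i = i + 1 on a shared counter) with a mutex:

```python
import threading

counter = 0
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        # counter += 1 is a read-modify-write, not atomic; guard it.
        with lock:
            counter += 1

threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```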
The global interpreter lock (GIL) is often seen as a hindrance to Python's deployment on high-end multiprocessor server machines, because a multi-threaded Python program effectively only uses one CPU, due to the insistence that (almost) all Python code can only run while the GIL is held.
Back in the days of Python 1.5, Greg Stein actually implemented a comprehensive patch set (the "free threading" patches) that removed the GIL and replaced it with fine-grained locking. Adam Olsen recently did a similar experiment in his python-safethread project. Unfortunately, both experiments exhibited a sharp drop in single-thread performance (at least 30% slower), due to the amount of fine-grained locking necessary to compensate for the removal of the GIL.
This doesn't mean that you can't make good use of Python on multi-CPU machines! You just have to be creative with dividing the work up between multiple processes rather than multiple threads. The ProcessPoolExecutor class in the new concurrent.futures module provides an easy way of doing so; the multiprocessing module provides a lower-level API in case you want more control over dispatching of tasks.
Judicious use of C extensions will also help; if you use a C extension to perform a time-consuming task, the extension can release the GIL while the thread of execution is in the C code and allow other threads to get some work done. Some standard library modules such as zlib and hashlib already do this.
It has been suggested that the GIL should be a per-interpreter-state lock rather than truly global; interpreters then wouldn't be able to share objects. Unfortunately, this isn't likely to happen either. It would be a tremendous amount of work, because many object implementations currently have global state. For example, small integers and short strings are cached; these caches would have to be moved to the interpreter state. Other object types have their own free list; these free lists would have to be moved to the interpreter state. And so on.
And I doubt that it can even be done in finite time, because the same problem exists for 3rd party extensions. It is likely that 3rd party extensions are being written at a faster rate than you can convert them to store all their global state in the interpreter state.
And finally, once you have multiple interpreters not sharing any state, what have you gained over running each interpreter in a separate process?
There are a number of alternatives to writing your own C extensions, depending on what you're trying to do.
Cython and its relative Pyrex are compilers that accept a slightly modified form of Python and generate the corresponding C code. Cython and Pyrex make it possible to write an extension without having to learn Python's C API.
If you need to interface to some C or C++ library for which no Python extension currently exists, you can try wrapping the library's data types and functions with a tool such as SWIG. SIP, CXX, Boost, or Weave are also alternatives for wrapping C++ libraries.
In general, don't use from modulename import *. Doing so clutters the importer's namespace, and makes it much harder for linters to detect undefined names.
Import modules at the top of a file. Doing so makes it clear what other modules your code requires and avoids questions of whether the module name is in scope. Using one import per line makes it easy to add and delete module imports, but using multiple imports per line uses less screen space.
It's good practice if you import modules in the following order: sys, os, getopt, re.
It is sometimes necessary to move imports to a function or class to avoid problems with circular imports. Gordon McMillan says:
Circular imports are fine where both modules use the "import <module>" form of import. They fail when the 2nd module wants to grab a name out of the first ("from module import name") and the import is at the top level. That's because names in the 1st are not yet available, because the first module is busy importing the 2nd.
In this case, if the second module is only used in one function, then the import can easily be moved into that function. By the time the import is called, the first module will have finished initializing, and the second module can do its import.
It may also be necessary to move imports out of the top level of code if some of the modules are platform-specific. In that case, it may not even be possible to import all of the modules at the top of the file. In this case, importing the correct modules in the corresponding platform-specific code is a good option.
Only move imports into a local scope, such as inside a function definition, if it's necessary to solve a problem such as avoiding a circular import, or if you are trying to reduce the initialization time of a module. This technique is especially helpful if many of the imports are unnecessary depending on how the program executes. You may also want to move imports into a function if the modules are only ever used in that function. Note that loading a module the first time may be expensive because of the one-time initialization of the module, but loading a module multiple times is virtually free, costing only a couple of dictionary lookups. Even if the module name has gone out of scope, the module is probably available in sys.modules.
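A small sketch of a function-local import and the sys.modules cache:

```python
import sys

def to_json(data):
    # Only the first call pays the module initialization cost;
    # subsequent calls find json already in sys.modules.
    import json
    return json.dumps(data)

encoded = to_json({"a": 1})
cached = "json" in sys.modules     # stays cached after the call returns
```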
Self is merely a conventional name for the first argument of a method. A method defined as meth(self, a, b, c) should be called as x.meth(a, b, c) for some instance x of the class in which the definition occurs; the called method will think it is called as meth(x, a, b, c).
See also Why must 'self' be used explicitly in method definitions and calls?.
str and bytes objects are immutable, therefore concatenating many strings together is inefficient as each concatenation creates a new object. In the general case, the total runtime cost is quadratic in the total string length.
To accumulate many str objects, the recommended idiom is to place them into a list and call str.join() at the end:
chunks = []
for s in my_strings:
    chunks.append(s)
result = ''.join(chunks)
(another reasonably efficient idiom is to use io.StringIO)
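The io.StringIO variant looks like this:

```python
import io

buf = io.StringIO()
for s in ["spam", " and ", "eggs"]:
    buf.write(s)                  # appends without quadratic copying
result = buf.getvalue()
```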
To accumulate many bytes objects, the recommended idiom is to extend a bytearray object using in-place concatenation (the += operator):
result = bytearray()
for b in my_bytes_objects:
    result += b