This page of Python warts is inspired by A. M. Kuchling's formerly popular Python Warts page. It lists some issues that I consider shortcomings of the Python programming language and its standard library. Unlike the original warts page, I don't limit myself to "warts", but also cover missing functionality. I also focus on shortcomings in the standard library. Unfortunately, the standard library is a very inconsistent and sometimes rather "un-pythonic" compilation of functions and classes from various and quite diverse sources. Nonetheless I am hopeful that it is possible to improve the standard library gradually.
I plan to add new warts to this page whenever I come across them, and to remove warts that have been fixed or about which I have changed my opinion. So this page is a constant work in progress. Please send any comments to srittau@jroger.in-berlin.de.
len() vs. a length property

To get the number of elements in a collection in Python you have to use the len() function. I would prefer a length property on collection classes. You can simulate this behaviour in custom classes by using the following code:

    class Custom(list):
        length = property(list.__len__)
        ...

I don't recommend using this approach in actual production code, though.
Stick to idiomatic Python (i.e. len).
Python currently supports two kinds of strings: Unicode strings and traditional strings. Traditional strings have no defined character set. These need to go. Actually Python 3.0 will unify these two types and provide an additional bytes type.
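As a rough illustration of the Python 3 model: text strings are always Unicode, and converting to or from bytes names a character set explicitly. The sample word and the choice of UTF-8 are arbitrary assumptions for the sake of the example.

```python
# Python 3 semantics: str is always Unicode text, bytes is raw data.
text = "Gr\u00fc\u00dfe"        # five characters, written with escapes
data = text.encode("utf-8")     # explicit character set -> bytes

print(type(text).__name__, len(text))   # str 5
print(type(data).__name__, len(data))   # bytes 7

# Decoding needs the character set again; bytes carry no encoding of their own.
assert data.decode("utf-8") == text
```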
Python 2.4 introduced the concept of function and method decorators. I think class decorators should be added as well. There are several use cases, for example the possibility to flag classes in a special way.
PEP 3129 proposes class decorators and has been accepted. Python 3.0 will contain them.
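A minimal sketch of the "flagging" use case, using the class decorator syntax from PEP 3129. The decorator name flagged and the attribute is_flagged are made up for this example.

```python
def flagged(cls):
    """Hypothetical class decorator that marks a class with a flag attribute."""
    cls.is_flagged = True
    return cls  # a class decorator must return the class (or a replacement)


@flagged
class Plugin(object):
    pass


class Plain(object):
    pass


print(getattr(Plugin, "is_flagged", False))  # True
print(getattr(Plain, "is_flagged", False))   # False
```

Without the decorator syntax the same effect needs a separate `Plugin = flagged(Plugin)` statement after the class body, which is easy to overlook.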
range() vs. xrange()

Python does not have a traditional "for" statement that consists of a starting statement, a loop iteration statement, and a condition. Instead it has a for statement that iterates over the elements of an "iterable" object. If you need to iterate over a sequence of numbers, Python offers the built-in range function. range returns a list of integers and can be used to emulate the traditional for statement:

    print range(5, 8) # prints "[5, 6, 7]"
    for i in range(5, 8):
        pass

xrange provides an alternative. While range always returns a list, xrange returns an "xrange object". This does not construct the list in memory, but produces the next integer on each iteration. Nowadays both are rather outdated and should be replaced by an implementation that returns a generator object. Actually Python 3.0 will unify xrange and range along these lines (as outlined in PEP 3100).
Another problem is the API of these two functions: They return an interval of values, starting with the first (optional) argument inclusive and ending with the second argument exclusive. So range(0,5) will return the list [0,1,2,3,4]. While I understand the original reason for that ("for i in range(0,5)" emulates the C construct "for (i=0; i<5; i++)"), I find it unintuitive and stumble across it regularly. Instead I would propose an API where the first (optional) argument is still the starting index, but the second argument is the number of elements. For the most common case, i.e. where the start equals 0, such an API would match the range API. It would also match the C idiom above. And it is consistent and makes sense.
A Python implementation of such a function (let's call it seq()) would look like this:

    def seq(start, length=None, step=1):
        if length is None:
            length = start
            start = 0
        current = 0
        while current < length:
            yield start + current * step
            current += 1

If you really need a sequence, you can always use the list() function:

    list(seq(3,4,2)) # yields [3, 5, 7, 9]

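To make the difference between the two APIs concrete, here is a small side-by-side comparison. The seq() definition is repeated so the snippet runs on its own, and range is wrapped in list() so the output is the same whether range returns a list or a lazy object.

```python
def seq(start, length=None, step=1):
    """Yield `length` numbers beginning at `start`, spaced by `step`."""
    if length is None:
        length = start
        start = 0
    current = 0
    while current < length:
        yield start + current * step
        current += 1


# One argument: both APIs agree.
print(list(range(5)))      # [0, 1, 2, 3, 4]
print(list(seq(5)))        # [0, 1, 2, 3, 4]

# Two arguments: range takes an exclusive end, seq takes a count.
print(list(range(5, 8)))   # [5, 6, 7]
print(list(seq(5, 3)))     # [5, 6, 7]
```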
Alternatively I would like to see syntactic sugar for ranges of numbers, like this:

    for i in 4..7:
        pass

Many if not most modules in the standard library still use old-style classes. Here is an incomplete list of modules:

unittest

Since Python 3.0 does not support old-style classes, these will all be fixed.
Some modules and packages use MixedCase, some use lowercase. The Python naming conventions allow both. This will be addressed in Python 3000. See PEP 3108 for details.
The Python naming conventions allow both MixedCase and lowercase to be used as method names. I would prefer that only one style be used: lowercase. Actually the Style Guide for Python Code (PEP 8) now requires just that.
Module-level function names are a different beast. In my opinion, function names should be lowercase, except for factory functions. Factory functions can use CamelCase if they are named as if they were a class. For example, the threading module contains the functions Lock and RLock.
A list of affected modules:

imaplib
modulefinder
unittest

PyUnit (the unittest module) currently uses inheritance and reflection to define tests. For example a testing class may look like this:

    class FooTest(unittest.TestCase):
        def setUp(self):
            ...
        def test_foo(self):
            self.assert_(...)

The unit test module will automatically run all tests that are contained in a test class in methods whose names start with test. Each test class must be derived from unittest.TestCase. This is similar to how JUnit worked up until version 3. But JUnit version 4 now uses a decorator syntax, as does NUnit, the xUnit framework for C#. I think PyUnit should follow suit. The code above could then look like this:

    class FooTest(object):
        @unittest.before
        def initialize(self):
            ...
        @unittest.test
        def foo(self):
            unittest.assert_(...)

This would get rid of the use of "magic" method names that have a big potential for subtle bugs. It has the additional benefit of getting rid of the ugly mixed case method names setUp and tearDown.
PyUnit (the unittest module) contains the assertEqual method in its TestCase class. This compares the two parameters passed in for equality and reports a failure with a meaningful error message if they differ. This works well, unless you try to compare two long strings that differ only slightly. Then it is very hard to find the mismatch, since the output looks like this:
AssertionError: 'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut elit. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Maecenas sit amet elit ac urna tincidunt sagittis. Maecenas felis urna, pulvinar nec, porta vulputate, ultrices non, magna. Praesent pretium felis quis est. Maecenas condimentum leo et ante. Suspendisse potenti. Maecenas vel dui at ipsum pretium blandit. Aliquam vitae diam.' != 'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut elit. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Maecenas sit amet elit ac urna tincidunt sagittis. Maecenas felis urna, pulvinar nec, porta vulputate, ultrices non, magna. Praesent pretium quis est. Maecenas condimentum leo et ante. Suspendisse potenti. Maecenas vel dui at ipsum pretium blandit. Aliquam vitae diam.'
Spot the difference? Well, this is the comparable output from JUnit 3:
junit.framework.ComparisonFailure: expected:<...felis ...> but was:<......>
You can see quite clearly that the word "felis" has gone missing. I wish PyUnit had a similar output, with a slight modification: JUnit sometimes has output like this:
junit.framework.ComparisonFailure: expected:<... ...> but was:<......>
This also makes it hard to find the exact place where a space is missing. A little more context would be useful:
AssertionError: expected:<...gen sta...> but was:<...gen sta...>
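A minimal sketch of how such a message could be built: strip the common prefix and suffix, then keep a few characters of context on each side. The function name describe_difference and its context parameter are hypothetical and not part of unittest.

```python
def describe_difference(expected, actual, context=7):
    """Return a short message showing only the region where two strings differ.

    `context` is the number of unchanged characters kept on each side.
    Returns None if the strings are equal.
    """
    if expected == actual:
        return None
    limit = min(len(expected), len(actual))
    # Length of the common prefix.
    prefix = 0
    while prefix < limit and expected[prefix] == actual[prefix]:
        prefix += 1
    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and expected[-1 - suffix] == actual[-1 - suffix]):
        suffix += 1
    start = max(prefix - context, 0)

    def excerpt(s):
        end = min(len(s) - suffix + context, len(s))
        head = "..." if start > 0 else ""
        tail = "..." if end < len(s) else ""
        return head + s[start:end] + tail

    return "expected:<%s> but was:<%s>" % (excerpt(expected), excerpt(actual))


print(describe_difference("Praesent pretium felis quis est.",
                          "Praesent pretium quis est."))
```

A test method could pass the returned message as the failure message of its assertion, so that the full strings never need to be printed.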
Other xUnit variants have the ability to "turn off" single tests. This is for example useful if you have a test that you know won't pass for reasons outside of your control. In this case it is better to disable that particular test, but still get it listed in the output. If there are disabled tests you usually get a "yellow bar", instead of the usual "green bar". This indicates that while all tests passed, there were disabled tests. So you are still encouraged to fix these problems.
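Later versions of unittest (Python 2.7 and 3.1 onward) provide skip decorators that cover this case. A minimal sketch follows, with made-up test names and a made-up skip reason.

```python
import unittest


class ExternalServiceTest(unittest.TestCase):
    @unittest.skip("depends on a service outside our control")
    def test_remote_lookup(self):
        self.fail("never executed while the test is skipped")

    def test_local_logic(self):
        self.assertEqual(1 + 1, 2)


# Run the tests programmatically and inspect the result object.
suite = unittest.TestLoader().loadTestsFromTestCase(ExternalServiceTest)
result = unittest.TestResult()
suite.run(result)

# Skipped tests are listed separately and do not count as failures.
for skipped_test, reason in result.skipped:
    print("skipped:", skipped_test.id(), "-", reason)
print("successful:", result.wasSuccessful())
```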
Both re.match and re.search (as well as their method counterparts on regular expression objects) return None to indicate that a particular expression was not found in a string. Otherwise they return a match object. This leads to the following common idiom:

    m = re.match(...)
    if m:
        # read matches from m

I think these functions should either raise an exception or return a null match object if the string doesn't match. The first solution follows the Python idiom "it's easier to ask for forgiveness than permission." It would allow code that is clearer about its intentions and keeps control flow out of the task at hand:

    try:
        m = re.match(...)
        # read matches from m
    except re.NotMatchedError:
        pass

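The exception-raising variant can be approximated today with a thin wrapper. This is only a sketch: NotMatchedError and match_or_raise are hypothetical names, and the re module defines neither. The pattern and input string are arbitrary examples.

```python
import re


class NotMatchedError(Exception):
    """Hypothetical exception; the re module does not define it."""


def match_or_raise(pattern, string, flags=0):
    """Like re.match, but raise NotMatchedError instead of returning None."""
    m = re.match(pattern, string, flags)
    if m is None:
        raise NotMatchedError("%r does not match %r" % (string, pattern))
    return m


try:
    m = match_or_raise(r"(\d+)-(\d+)", "10-20")
    low, high = int(m.group(1)), int(m.group(2))
except NotMatchedError:
    low = high = None

print(low, high)  # 10 20
```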
The other solution uses a common (and underused) object-oriented pattern: the Null Object. On the one hand it still allows the traditional conditional check, but on the other hand it might make it possible to avoid control flow statements completely, by having a sane null match object.
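One possible shape for such a null match object, again as a sketch. NullMatch and safe_match are hypothetical names, and only group() and groups() are covered here. A real design would have to decide what every other match method returns.

```python
import re


class NullMatch(object):
    """Hypothetical stand-in returned when a pattern does not match."""

    def group(self, *indices):
        # Mirror the shape of a real match: one value, or a tuple for several.
        if len(indices) > 1:
            return tuple(None for _ in indices)
        return None

    def groups(self, default=None):
        return ()

    def __bool__(self):       # truth test in Python 3
        return False

    __nonzero__ = __bool__    # truth test in Python 2


def safe_match(pattern, string, flags=0):
    """Like re.match, but return a NullMatch instead of None."""
    return re.match(pattern, string, flags) or NullMatch()


# No conditional needed: a failed match simply yields None for the group.
print(safe_match(r"(\w+)@", "user@example.org").group(1))  # user
print(safe_match(r"(\w+)@", "no at sign").group(1))        # None

# The traditional check keeps working, because NullMatch is falsy.
if not safe_match(r"(\w+)@", "no at sign"):
    print("no match")
```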
Last update: 2008-07-27