Performant Python

There are many ways to improve Python's performance, such as Cython or writing C extension modules and calling them from Python. These are invasive approaches that require a specific setup or build step. However, there are also a number of ways to improve Python's performance that are non-invasive and require little effort.

Python has been designed to be user-friendly and intuitive, so simpler constructs are presented as defaults. Sometimes performance is needed, though, and Python has many performant constructs that provide significant improvements in memory usage and execution time.

Memory

Python uses a lot of memory but there are alternatives to default data structures that can reduce memory usage.

Counting memory in Python is complicated: there are a number of ways to do it, and they often produce different results, even if only by a few bytes. As a general rule, I will try to measure the exclusive memory of an object.

I am going to use the objsize library to count the exclusive memory used by an item; for generators I will use a custom script and sys.getsizeof, since objsize does not work with generators.

Tuples

Tuples are immutable lists.

('hello', 'world')

Tuples take a bit less memory than lists. With more elements, the relative memory savings become smaller:

Graph of memory occupied by Python tuples and lists
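As a quick sanity check, a sketch with sys.getsizeof (not the objsize measurement used for the graphs; it only counts the shallow size of the container, not its elements):

```python
import sys

t = ('hello', 'world', 'foo')
l = ['hello', 'world', 'foo']

# Shallow sizes of the containers themselves, not of the strings inside;
# the exact numbers vary between CPython versions, but the tuple is smaller
print(sys.getsizeof(t))
print(sys.getsizeof(l))
```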

Tuples can also replace dictionaries; in fact, dict() can produce a dictionary from a tuple of key-value pairs, in case one needs that transformation:

t = (('company', 'Strangemachines'), ('city', 'Amsterdam'))
dict(t) # {'company': 'Strangemachines', 'city': 'Amsterdam'}

Tuples provide significant savings when replacing dictionaries:

Graph of memory occupied by Python tuples and dictionaries

Generators

Generators are lazy lists. Instead of computing the entire list and putting it in memory, they produce one item at a time, and only when the item is requested. Working with generators is more complicated than working with regular lists, because they can't be indexed or sliced.

g = (x for x in data)
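If some form of slicing is needed anyway, itertools.islice can emulate it on a generator; note that it consumes the items it yields. A small sketch:

```python
from itertools import islice

g = (x * x for x in range(10))

# Take the first three items; this consumes them from the generator
first_three = list(islice(g, 3))
print(first_three)  # [0, 1, 4]
```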

Generators are smaller than large tuples, but bigger than small tuples:

Graph of memory occupied by Python generators and tuples
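This makes sense once you consider that a generator stores its iteration state, not its items, so its size does not grow with the data. A sketch using sys.getsizeof:

```python
import sys

small = (x for x in range(10))
large = (x for x in range(10_000_000))

# Both generators store the same state machine, so they have the same size
print(sys.getsizeof(small) == sys.getsizeof(large))
```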

Slots

Slots are a mechanism that reduces the memory used by class instances. Slots disable setting dynamic properties on instances and they must be declared when defining the class.

class Users:
    __slots__ = ('username', 'email')

    def __init__(self, username, email):
        self.username = username
        self.email = email

Graph of memory occupied by Python class instances and slotted class instances

Comparing the memory for small classes, we can see that a slotted class will use at least 50% less memory. Larger classes have smaller savings, but still around half.
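The savings come from the fact that a slotted instance has no per-instance __dict__; attributes live in fixed slots instead. A sketch showing the difference (PlainUser and SlottedUser are illustrative names):

```python
class PlainUser:
    def __init__(self, username, email):
        self.username = username
        self.email = email


class SlottedUser:
    __slots__ = ('username', 'email')

    def __init__(self, username, email):
        self.username = username
        self.email = email


plain = PlainUser('jane', 'jane@example.com')
slotted = SlottedUser('jane', 'jane@example.com')

print(hasattr(plain, '__dict__'))    # True: every instance carries a dict
print(hasattr(slotted, '__dict__'))  # False: attributes live in fixed slots

# Setting an undeclared attribute fails on the slotted instance:
# slotted.age = 30  ->  AttributeError
```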

Named tuples

Named tuples provide a way to create objects without declaring a class:

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(0, 1) # we can access p.x and p.y

Many Python performance articles encourage using namedtuples for objects created on the fly; at Strangemachines we know enough metaprogramming and maths to know better: a namedtuple is smaller than a class, but a slotted class is smaller than a namedtuple.

Graph of memory occupied by namedtuples, classes and slotted classes

Since namedtuple is used to create classes inline, we'll need to use type to create a slotted class inline if we want to replace a namedtuple:

def init(self, x, y):
    self.x = x
    self.y = y


Point = type('Point', (), {'__slots__': ('x', 'y'), '__init__': init})
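A quick check that the dynamically built class behaves like a hand-written slotted class:

```python
def init(self, x, y):
    self.x = x
    self.y = y


# Build the class at runtime: name, bases, and class attributes
Point = type('Point', (), {'__slots__': ('x', 'y'), '__init__': init})

p = Point(0, 1)
print(p.x, p.y)                # 0 1
print(hasattr(p, '__dict__'))  # False: no per-instance dict, as with any slotted class
```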

Iteration time

In Python it's often possible to save iteration time by using alternatives to for.

As a note of caution, notice that for in Python is not slow by itself. It becomes slower in cases where there is an optimized alternative for that specific type of iteration.

For example, calling append to add elements to a list will make for slower than map, but the decrease in performance comes from the additional function calls inside the for loop.
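A minimal timeit sketch of that comparison (the numbers depend on the machine and interpreter, so here I only check that both approaches produce the same elements):

```python
from timeit import timeit

data = list(range(1000))


def with_for():
    result = []
    for i in data:
        result.append(i + 1)
    return result


def with_map():
    # Lazy: nothing is computed until the map object is consumed
    return map(lambda x: x + 1, data)


print(timeit(with_for, number=1000))
print(timeit(with_map, number=1000))
print(list(with_map()) == with_for())  # same elements either way
```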

Tuples

Iterating over lists is slightly faster than over tuples.

Scatter graph of time taken to iterate Python tuples and lists

Generators

Iterating over generators is much faster than over lists.

Scatter graph of time taken to iterate Python lists and generators

Map

Map is a function that applies a function to the elements of an iterable and produces a new iterable with the results of the function calls.

def plus_one(x):
    return x + 1

map(plus_one, [1, 2, 3, 4, 5]) # map object; list() of it -> [2, 3, 4, 5, 6]

With for it would look like this:

result = []

for i in [1, 2, 3, 4, 5]:
    result.append(i+1)

Map is faster than for, but only if the map object is not converted to a list (the conversion probably adds the cost of an extra function call):

Scatter graph of time taken to modify a list by map and for

Filter

Filter is a function that applies a function to the elements of an iterable, producing a new iterable that contains only the elements for which the function returned a truthy value.

def even(number):
    if number % 2 == 0:
        return True

filter(even, [1, 2, 3, 4, 5]) # filter object; list() of it -> [2, 4]

Again the equivalent with for:

result = []
for i in [1, 2, 3, 4, 5]:
    if i % 2 == 0:
        result.append(i)

Scatter graph of time taken to modify a list by filter and for

List Comprehensions

List comprehensions are a mechanism to process an iterable and create a list at the same time. Common cases include filtering the elements of an iterable or applying a function to them.

[city.name for city in cities]

List comprehensions are slower than map and filter, and using generator expressions instead of list comprehensions does not change the outcome:

Scatter graph of time taken to modify a list by map and list comprehension

Scatter graph of time taken to modify a list by filter and list comprehension

Even with a lambda function, map is faster:

Scatter graph of time taken to modify a list by map plus lambda and list comprehension

Dictionary comprehensions

Comprehensions also work on dictionaries:

{city.name: city for city in cities}

Dictionary comprehensions are slower than map and filter as well:

Scatter graph of time taken to modify a dictionary by map and dictionary comprehension

Scatter graph of time taken to modify a dictionary by filter and dictionary comprehension
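For completeness, here is the map-based equivalent of the dictionary comprehension above: dict() accepts an iterable of key-value pairs, so map can feed it lazily. City and cities are made-up examples:

```python
from collections import namedtuple

City = namedtuple('City', ['name', 'population'])
cities = [City('Amsterdam', 900_000), City('Utrecht', 360_000)]

# dict() consumes an iterable of (key, value) pairs
by_name = dict(map(lambda city: (city.name, city), cities))
print(by_name['Amsterdam'].population)  # 900000
```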

Conclusion

I know it's a lot of graphs, so I'll summarize them:

  • Tuples save a bit of memory when replacing lists
  • Tuples save lots of memory when replacing dictionaries
  • Generators only save memory compared with large tuples, but are faster to iterate
  • Slotted classes save a lot of memory
  • Namedtuples are better than classes but worse than slotted classes
  • Map and filter are sometimes faster than comprehensions