
Fast Python with multiprocessing

In this post I want to highlight three patterns for writing very fast Python code:

Processes and queues

You create one process per task, and each process sends its result back to the parent through a shared queue.

from multiprocessing import Process, Queue
from random import randint

def do_stuff(queue, table):
    """some irrelevant, long computation"""
    print(f"Doing table {table}")
    result = sum([randint(0, 40) for i in range(1_000_000)])
    # 4) When we are done, we send the result through the queue
    queue.put(result)

def main():
    # 0) The queue will be our communication channel
    queue = Queue()

    # 1) Create our list of processes
    tables = [i for i in range(40)]
    processes = []
    for table in tables:
        process = Process(target=do_stuff, args=(queue, table))
        processes.append((process, queue))

    # 2) we run them all
    for process, queue in processes:
        process.start()

    out = []
    for process, queue in processes:
        # 3) queue.get() blocks until one of the processes sends a result
        # 5) We now have a result we can append to a global state
        result = queue.get()
        out.append(result)
        # Join only after reading, so no process is left waiting to flush its result
        process.join()
    print(f"{len(out)} results: {out}")

if __name__ == "__main__":
    main()
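
One thing to keep in mind with a shared queue: results arrive in the order the workers finish, not the order they were started. Here is a minimal sketch of one way to handle that, assuming each worker sends a `(table, result)` pair so the parent can match results to inputs. The names `slow_sum`, `worker` and `run` are hypothetical, and `slow_sum` is only a deterministic stand-in for the real computation.

```python
from multiprocessing import Process, Queue


def slow_sum(table):
    """Hypothetical stand-in for a long computation."""
    return sum(i * table for i in range(100_000))


def worker(queue, table):
    # Send the input along with the result so the parent can match them up
    queue.put((table, slow_sum(table)))


def run(tables):
    queue = Queue()
    processes = [Process(target=worker, args=(queue, t)) for t in tables]
    for process in processes:
        process.start()

    # Drain the queue first: one get() per process, in completion order
    results = dict(queue.get() for _ in processes)

    # Join once the queue is drained, so no worker is still waiting
    # to flush its result when we wait for it
    for process in processes:
        process.join()
    return results


if __name__ == "__main__":
    results = run(range(8))
    for table in sorted(results):
        print(table, results[table])
```

Keying the results by input means the parent does not depend on the order in which the processes happen to finish.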

Process pool

A process pool lets you control how many processes are created, and it distributes the work among them for you.

from multiprocessing import Pool
import multiprocessing
from random import randint

def do_stuff(table):
    """some irrelevant, long computation"""
    print(f"Doing table {table}")
    return sum([randint(0, 40) for i in range(1_000_000)])

def main():
    n_cores = multiprocessing.cpu_count()

    # Generate a list of some input data
    tables = [i for i in range(20)]

    # Create the process pool, with half the available cores (at least one)
    with Pool(max(1, n_cores // 2)) as pool:
        # same as map(do_stuff, tables), except
        # it’s using CPU processes and
        # splits the workload (tables) in pieces of size 'chunksize'.
        results = set(pool.map(do_stuff, tables, chunksize=1))
    print(f'{len(results)} results:')
    for result in results:
        print(result)

if __name__ == '__main__':
    main()
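
To make `chunksize` a bit more concrete, here is a small sketch of how a list of inputs gets grouped into batches before being handed to workers. It imitates the idea rather than reproducing the pool's internals, and `split_into_chunks` is a hypothetical helper, not part of `multiprocessing`.

```python
from itertools import islice


def split_into_chunks(items, chunksize):
    """Hypothetical helper: yield successive tuples of at most `chunksize` items."""
    iterator = iter(items)
    while True:
        chunk = tuple(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    tables = list(range(10))
    for size in (1, 4):
        # Each tuple is one unit of work handed to a worker
        print(size, list(split_into_chunks(tables, size)))
```

With `chunksize=1`, every input is dispatched on its own, which balances the load well when each task is long. Larger chunks mean fewer round trips between the parent and the workers, which tends to help when there are many small tasks.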

Much faster?

Taking advantage of multiple processes can only get you so far. At some point, you may have to reach for a faster language, at least for the core algorithm that takes most of the execution time. You can have Python spawn the processes for you, and use an FFI (foreign function interface) to call an implementation of the algorithm written in a fast language.

Then you get the best of both worlds.

I wrote about this in another post.

Here is the interesting bit: Python for the CLI, Rust for the algorithm, and FFI to call Rust from Python.
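
To give a feel for what an FFI call looks like from the Python side, here is a minimal sketch using the standard `ctypes` module to call `strlen` from the C library. It is not the Rust setup from the other post. It assumes a POSIX system (Linux or macOS), where `ctypes.CDLL(None)` exposes the symbols already loaded into the Python process, and `c_strlen` is a hypothetical wrapper name. A Rust library compiled as a C-compatible shared library could be loaded in a similar way by passing its path to `ctypes.CDLL`.

```python
import ctypes

# Assumption: POSIX system. CDLL(None) gives access to the symbols already
# loaded in the current process, which includes the C standard library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C implementation of strlen on a bytes object."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"multiprocessing"))
```

Declaring `argtypes` and `restype` up front matters: without them, `ctypes` guesses how to convert arguments and return values, which is a common source of subtle bugs.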

I also wrote a bit before about the methodology I used to make a simple Monte Carlo simulation (for another context) much faster using Rust.