Celery: Get Your Task Together

Published on Mar 07, 2016 by Sep Dehpour

Why Distributed Task Queues? 

  • Offload long-running jobs to background processes. Example: video conversion.
  • Offload large numbers of small jobs to background processes. Example: a commenting system.
  • Keep track of jobs: monitor them and restart them automatically.
  • Schedule jobs.
  • Replace cron jobs. Cron can run a job at most once a minute, which is a limitation, and it doesn’t have the best interface.

Message Queue vs. Task Queue 

  • A message queue provides the basic functionality of passing, holding, and delivering messages. Examples: Redis, RabbitMQ.
  • A task queue manages work to be done and is considered a type of message queue. Example: Celery.
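
To make the distinction concrete, here is a minimal sketch using only the Python standard library. It is not how Celery or any real broker is implemented; the names `TASKS`, `task`, `send_task`, and `run_next_task` are hypothetical and exist only for this illustration. The message queue just carries opaque strings, while the task-queue layer on top decides that each string names a function to run and its arguments.

```python
import json
import queue

# A message queue only moves opaque messages from producers to consumers.
message_queue = queue.Queue()

# A task queue adds meaning: each message names a function and its arguments.
TASKS = {}  # hypothetical registry: task name -> callable


def task(func):
    """Register a function so that a worker can look it up by name."""
    TASKS[func.__name__] = func
    return func


@task
def add(x, y):
    return x + y


def send_task(name, *args):
    """Producer side: serialize the call and put it on the message queue."""
    message_queue.put(json.dumps({"task": name, "args": args}))


def run_next_task():
    """Worker side: take one message, decode it, and run the named task."""
    message = json.loads(message_queue.get())
    return TASKS[message["task"]](*message["args"])


send_task("add", 4, 4)
print(run_next_task())  # 8
```

Because the message is plain JSON rather than a Python object, the producer and the worker only need to agree on the task name and argument format.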

Distributed Task Queues in Python

Based on Popularity 

Solution            GitHub stars   Downloads/mo
Celery              4600           400,000
RQ (Redis Queue)    2600           40,000
⤿ Django RQ         428            13,000
Huey                824            3,000
MrQ (Mr. Q)         340            5,000
Taskmaster          346            1,000

Who uses them? Everyone. 

Solution            Users
Celery              Instagram, Mozilla, Truecar
RQ (Redis Queue)    ?
⤿ Django RQ         ?
Huey                ?
MrQ (Mr. Q)         Pricing Assistant (creator)
Taskmaster          Disqus (creator)

Celery Architecture 

[Figure: Celery architecture. Image from the parallel programming in Python book.]

RQ Architecture 

Client ⥂ Redis ⥄ Worker
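
The diagram above can be read as: the client and the worker never talk to each other directly; both only read from and write to Redis. The sketch below imitates that flow with an in-memory stand-in. `FakeRedis`, `enqueue`, and `work_once` are hypothetical names made up for this illustration, not RQ's API, and the real RQ stores a serialized reference to the function rather than the function object itself.

```python
import uuid


class FakeRedis:
    """In-memory stand-in for a Redis server (hypothetical, illustration only)."""

    def __init__(self):
        self.pending = []  # job ids waiting for a worker (the broker role)
        self.jobs = {}     # job id -> job record (the result backend role)


def enqueue(store, func, *args):
    """Client: write the job to the store and return its id."""
    job_id = str(uuid.uuid4())
    store.jobs[job_id] = {
        "func": func,
        "args": args,
        "status": "queued",
        "result": None,
    }
    store.pending.append(job_id)
    return job_id


def work_once(store):
    """Worker: pop the oldest job, run it, and write the result back."""
    if not store.pending:
        return False
    job = store.jobs[store.pending.pop(0)]
    job["result"] = job["func"](*job["args"])
    job["status"] = "finished"
    return True


store = FakeRedis()
job_id = enqueue(store, len, "hello")
print(store.jobs[job_id]["result"])  # None: no worker has run yet
work_once(store)
print(store.jobs[job_id]["result"])  # 5
```

The same store plays both roles here, which mirrors how RQ uses Redis as both message broker and result backend.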

Celery Task Example 

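import time

from celery import Celery

# backend='rpc://' is an assumption added so that result.ready() has a result store to query
app = Celery('tasks', broker='amqp://guest@localhost//', backend='rpc://')

@app.task
def add(x, y):
    time.sleep(5 * 60)  # simulate a five-minute job
    return x + y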

------------------

>>> from tasks import add
>>> result = add.delay(4, 4)
>>> result.ready()
False
5 minutes later:
>>> result.ready()
True
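
If you don't have a broker at hand, the same "submit now, ask later" pattern can be tried with the standard library. This is only an analogy, assuming a thread pool in place of a Celery worker: `executor.submit` plays the role of `add.delay`, and `future.done()` plays the role of `result.ready()`. The `finish` event is a hypothetical stand-in for the five-minute sleep so that the example runs instantly.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

finish = threading.Event()  # hypothetical stand-in for "the long job is done"


def add(x, y):
    finish.wait(timeout=5)  # simulate slow work until the event is set
    return x + y


executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(add, 4, 4)   # plays the role of add.delay(4, 4)
ready_before = future.done()          # plays the role of result.ready()
finish.set()                          # let the "long" job complete
value = future.result(timeout=5)      # block until the work is finished
executor.shutdown()
print(ready_before, value, future.done())  # False 8 True
```

Unlike this sketch, Celery runs the task in a separate worker process, possibly on another machine, and the caller learns about completion through the result backend.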

RQ Task example 

import time

from redis import Redis
from rq import Queue
from somewhere import count_words_at_url

# Tell RQ what Redis connection to use
redis_conn = Redis()
# no args implies the default queue
q = Queue(connection=redis_conn)

# Delay execution of count_words_at_url('http://nvie.com')
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print(job.result)   # => None

# Now, wait a while, until the worker is finished
time.sleep(2)
print(job.result)   # => 889

Monitoring 

  • Celery: Flower
  • RQ: RQ Dashboard

Celery monitoring: Flower 

[Figure: Flower monitoring for Celery, Workers view]

[Figure: Flower monitoring for Celery, Tasks view]

[Figure: Flower monitoring for Celery, CPU usage]

RQ Monitoring: RQ Dashboard 

[Figure: RQ monitoring]

MRQ Monitoring: MRQ Dashboard 

[Figure: MRQ monitoring]

Celery vs. RQ - overview 

Feature              Celery                                             RQ
Complexity of code   Very complicated                                   Easy to understand
Documentation        Takes a while to read                              Simple
Monitoring           Flower                                             RQ Dashboard
Message brokers      RabbitMQ, Redis, MongoDB                           Redis
Result backends      RabbitMQ, Redis, Memcached, MongoDB, Cassandra,…   Redis
Concurrency          Master-slave processes                             supervisord + fork
Scheduler            Celery beat                                        3rd party
Language             Can send tasks from one language to another        Only Python
Subtasks             Can create tasks within tasks                      Nope
Django support       Built-in                                           django-rq

Why use RQ? 

It all comes down to simplicity. And… MEMORY LEAKS.

Redis Memory Leak 

[Figure: Redis memory leak]

  • Celery has memory leak issues.
  • Some memory leaks can happen with older broker libraries, e.g. librabbitmq.
  • The Celery monitor (Flower) has a huge memory leak.
  • RQ offers less, but its memory leaks should be much smaller than Celery’s (I have not verified this myself).

Why use Celery? 

Use Celery when:

  • You need a broker or result backend other than Redis. RQ limits you to Redis for both.
  • You can’t tolerate dropped messages. Redis can drop messages, although it will pick them up later. If that bothers you, you can’t use RQ.
  • You need the features. Celery is far more feature-rich and flexible than RQ.
  • You never really need to know the magic behind the scenes in Celery.

Complexity 

How complex is RQ:

[Figure: RQ complexity]

How complex is Celery:

[Figure: Celery complexity]

And in case you were wondering…

How complex is Django:

[Figure: Django complexity]

So if you can understand Django source code, you should be able to understand Celery’s too.