We love designing and developing websites, but what really drives us is solving problems and cultivating strong relationships with our clients
Django aggregation tutorial
By : shabda
One of the new and most awaited features with Django 1.1 was aggregation. As usual, Django comes with a very comprehensive documentation for this. Here, I have tried to put this in how-to form.
Jump to howtos or Get source on Github.
Essentially, aggregations are nothing but a way to perform an operation on group of rows. In databases,
they are represented by operators as sum, avg etc.
To do these operations Django added two new methods to querysets.
aggregateannotate
When you are have a queryset you can do two operations on it,
- Operate over the rowset to get a single value from it. (Such as sum of all salaries in the rowset)
- Operate over the rowset to get a value for each row in the rowset via some related table.
The thing to notice is that option 1, will create one row from rowset, while option 2 will not change the number of rows in the rowset. If you are into analogies, you can think that option 1 is like a reduce and option 2 is like a map.
In sql terms, aggregate is a operation(SUM, AVG, MIN, MAX), without a group by, while annotate is a operation with a group by on rowset_table.id. (Unless explicitly overriden).
Ok enough talk, on to some actual work. Here is a fictional models.py representing a HRMS application. We will use this to see how to use aggreagtion to solve some common problems.
from django.db import models
class Department(models.Model):
dept_name = models.CharField(max_length = 100)
established_on = models.DateField()
def __unicode__(self):
return self.dept_name
class Level(models.Model):
level_name = models.CharField(max_length = 100)
pay_min = models.PositiveIntegerField()
pay_max = models.PositiveIntegerField()
def __unicode__(self):
return self.level_name
class Employee(models.Model):
emp_name = models.CharField(max_length = 100)
department = models.ForeignKey(Department)
level = models.ForeignKey(Level)
reports_to = models.ForeignKey('self', null=True, blank=True)
pay = models.PositiveIntegerField()
joined_on = models.DateField()
class Leave(models.Model):
employee = models.ForeignKey(Employee)
leave_day = models.DateField()
"""
#Populate DB, so we can do some meaningful queries.
#Create Dept, Levels manually.
#Get the names file from http://dl.getdropbox.com/u/271935/djaggregations/names.pickle
#Or the whole sqlite database from http://dl.getdropbox.com/u/271935/djaggregations/bata.db
import random
from datetime import timedelta, date
import pickle
names = pickle.load(file('/home/shabda/names.pickle'))
for i in range(1000):
emp = Employee()
emp.name = random.choice(names)
emp.department = random.choice(list(Department.objects.all()))
emp.level = random.choice(Level.objects.all())
try: emp.reports_to = random.choice(list(Employee.objects.filter(department=emp.department)))
except:pass
emp.pay = random.randint(emp.level.pay_min, emp.level.pay_max)
emp.joined_on = emp.department.established_on + timedelta(days = random.randint(0, 200))
emp.save()
"""
"""
employees = list(Employees.objects.all())
for i in range(100):
employee = random.choice(employees)
leave = Leave(employee = employee)
leave.leave_day = date.today() - timedelta(days = random.randint(0, 365))
leave.save()
"""
Find the total number of employees.
In sql you might want to do something like,
select count(id) from hrms_employee
Which becomes,
Employee.objects.all().aggregate(total=Count('id'))
If fact doing a connection.queries.pop() shows the exact query.
SELECT COUNT("hrms_employee"."id") AS "total" FROM "hrms_employee"
But wait, we have a builtin method already for that, Employee.objects.all().count(), so lets try something else.
Find the total pay of employees.
The CEO wants to find out what is the total salary expediture, this also converts
the queryset to a single value, so we want to .aggregate here.
Employee.objects.all().aggregate(total_payment=Sum('pay'))
Gives you the total amount you are paying to your employees.
Find the total number of employees, per department.
Here we want a value per row in queryset, so we need to use aggregate here. Also, there would be one aggregated value per dpeartment, so we need to annotate Department queryset.
Department.objects.all().annotate(Count('employee'))
If you are only interested in name of department and employee count for it, you can do,
Department.objects.values('dept_name').annotate(Count('employee'))
The sql is
SELECT "hrms_department"."dept_name", COUNT("hrms_employee"."id") AS "employee__count" FROM "hrms_department" LEFT OUTER JOIN "hrms_employee" ON ("hrms_department"."id" = "hrms_employee"."department_id") GROUP BY "hrms_department"."dept_name"
Find the total number of employees, for a specific department.
Here you could use either of .annotate or .aggregate,
Department.objects.filter(dept_name='Sales').values('dept_name').annotate(Count('employee'))
Department.objects.filter(dept_name='Sales').aggregate(Count('employee'))
If you see the SQLs, you will see that .annotate did a group by, while the .aggregate
did not, but as there was only one row, group by had no effect.
Find the total number of employees, per department, per level
This time, we can not annotate either Department model, or the Level model, as we
need to group by both department and level. So we will annotate on Employee
Employee.objects.values('department__dept_name', 'level__level_name').annotate(Count('id'))
This leads to the sql,
SELECT "hrms_department"."dept_name", "hrms_level"."level_name", COUNT("hrms_employee"."id") AS "id__count" FROM "hrms_employee" INNER JOIN "hrms_department" ON ("hrms_employee"."department_id" = "hrms_department"."id") INNER JOIN "hrms_level" ON ("hrms_employee"."level_id" = "hrms_level"."id") GROUP BY "hrms_department"."dept_name", "hrms_level"."level_name
Which combination of Employee and Deparments employes the most people
We can order on the annotated fields, so the last query is modified to,
Employee.objects.values('department__dept_name', 'level__level_name').annotate(employee_count = Count('id')).order_by('-employee_count')[:1]
Which employee name is the most common.
We can want to group by emp_name, so emp_name is added to values. After that we order on the annotated field
and get the first element, to get the most common name.
Employee.objects.values('emp_name').annotate(name_count=Count('id')).order_by('-name_count')[:1]
This was a overview of how django annotations work. These remove a whole class of queries for which you had to use custom sql queries in the past.
Resources
Want to build a Django app? Talk to us
Comments
I have made a translation to pt_BR, put it here: http://artigos.waltercruz.com/agregacao_django_1_1/
@walter: Thanks very much. I would love to see more translations. All the content here is CC-BY-SA licensed. I should put up a notice for that somewhere.
Hi..Thanks for an article on Django.Presently I am working on a couple of Database projexts and really it was a tough task for me to perform aggregation.Now i found a solution for it..
I have a question, what if I want to get an aggregate that is a result of an arithmetic operation on two columns?
eg. sql would be
select foo*bar from buz
do I have to have a custom sql or is there some way to do this using the api? (some sort of Buz.objects.aggregate('foo*bar') )
In the last query, I think the Count should be with emp_name, not with id.
Great article!
Hi Shabda, thanks for the great tutorial! I have one scenario that i can't seem to solve with one of the examples presented, i posted it on stackoverflow here. Any chance i could persuade you to have a look at it?
Thanks!
Martin
Reactions
Reddit/p: Django aggregation tutorial http://bit.ly/9BzOs
This comment was originally posted on Twitter
Django aggregation tutorial http://su.pr/2VqY8V
This comment was originally posted on Twitter
http://bit.ly/nqIDs New blog post
This comment was originally posted on Twitter
StartupNews: Django aggregation tutorial http://bit.ly/hWS54
This comment was originally posted on Twitter
Django aggregation tutorial http://bit.ly/5Q5Fy
This comment was originally posted on Twitter
@twoodham May I suggest http://bit.ly/5Q5Fy
This comment was originally posted on Twitter
@apgwoz Employee.objects.values(’name’).annotate(Count(’id’)) -> select count(id) from emp group by name http://bit.ly/nqIDs
This comment was originally posted on Twitter
Django aggregation tutorial http://bit.ly/CLtMh django
This comment was originally posted on Twitter
- How to use pep8.py to write better Django code
- Screencast: Django Tutorial Part 1
- How and why to use pyflakes to write better Python
- Getting started with South for Django DB migrations
- A brief overview of Vagrant
- Writing jQuery plugins using Coffeescript
- Behind the Scenes: Request to Response
- Using SQLite Database with Android
- Haml for Django developers
- Coffeescript for Python programmers
- rails
- django
- linkroundup
- django opinion
- opinion
- business
- API
- appengine
- python
- satire
- startup
- Uncategorized
- marketing
- personal
- rambling
- search
- interviews
- seo-interviews
- 5startupideas
- ideas
- seo
- tips
- forms
- paypal
- utilities
- datetime
- web2.0
- Amazon
- algorithms
- presentations
- products
- pinax
- satchmo
- ecommerce
- microsoft
- yahoo
- book
- tutorial
- models
- aggreagtion
- meta
- India
- apps
- about
- CSS
- Design
- wordpress
- test slug
- vim
- urls
- reviews
- javascript
- xmpp
- emacs
- Typography
- Grid Theory
- Color Theory
- iphone
- android
- titanium
- mobile applications
- CSS3
- Browser Compatibility
- mobile
- jobs
- lamson
- django setup
- files
- upload
- jsTree
- hierarchical view
- web page
- Treeview
- coffeescript
- request
- response
- South
- django south
- django migration
- --fake
- screencasts
- February 2012
- January 2012
- December 2011
- October 2011
- September 2011
- July 2011
- June 2011
- April 2011
- February 2011
- January 2011
- December 2010
- November 2010
- October 2010
- September 2010
- June 2010
- April 2010
- March 2010
- January 2010
- December 2009
- November 2009
- October 2009
- September 2009
- August 2009
- July 2009
- June 2009
- April 2009
- March 2009
- February 2009
- November 2008
- October 2008
- June 2008
- May 2008
- April 2008
Thank you very much for the detailed overview! I've been looking forward to aggregation because of some challenges in a project towards the beginning of this year. This is definitely a great new feature for Django.