您现在的位置是:网站首页> 编程资料编程资料

Django中QuerySet查询优化之prefetch_related详解_python_

2023-05-25 355人已围观

简介 Django中QuerySet查询优化之prefetch_related详解_python_

前言

在数据库有外键的时候,使用 select_related() 和 prefetch_related() 可以很好的减少数据库请求的次数,从而提高性能。本文通过一个简单的例子详解这两个函数的作用。虽然QuerySet的文档中已经详细说明了,但本文试图从QuerySet触发的SQL语句来分析工作方式,从而进一步了解Django具体的运作方式。

实例背景

假定一个个人信息系统,需要记录系统中各个人的故乡、居住地、以及到过的城市。数据库设计如下:

models.py 内容:

from django.db import models class Province(models.Model): name = models.CharField(max_length=10) def __unicode__(self): return self.name class City(models.Model): name = models.CharField(max_length=5) province = models.ForeignKey(Province) def __unicode__(self): return self.name class Person(models.Model): firstname = models.CharField(max_length=10) lastname = models.CharField(max_length=10) visitation = models.ManyToManyField(City, related\_name = "visitor") hometown = models.ForeignKey(City, related\_name = "birth") living = models.ForeignKey(City, related\_name = "citizen") def __unicode__(self): return self.firstname + self.lastname 

PS:

注1:创建的app名为“QSOptimize”

注2:为了简化起见,qsoptimize_province 表中只有2条数据:湖北省和广东省,qsoptimize_city表中只有三条数据:武汉市、十堰市和广州市  

prefetch_related()

对于多对多字段(ManyToManyField)和一对多字段,可以使用prefetch_related()来进行优化。或许你会说,没有一个叫OneToManyField的东西啊。实际上 ,ForeignKey就是一个多对一的字段,而被ForeignKey关联的字段就是一对多字段了。

作用和方法

prefetch_related()和select_related()的设计目的很相似,都是为了减少SQL查询的数量,但是实现的方式不一样。后者是通过JOIN语句,在SQL查询内解决问题。但是对于多对多关系,使用SQL语句解决就显得有些不太明智,因为JOIN得到的表将会很长,会导致SQL语句运行时间的增加和内存占用的增加。若有n个对象,每个对象的多对多字段对应Mi条,就会生成Σ(n)Mi 行的结果表。

prefetch_related()的解决方法是,分别查询每个表,然后用Python处理他们之间的关系。继续以上边的例子进行说明,如果我们要获得张三所有去过的城市,使用prefetch_related()应该是这么做:

>>> zhangs = Person.objects.prefetch_related('visitation').get(firstname=u"张",lastname=u"三") >>> for city in zhangs.visitation.all() : ... print city ... 

上述代码触发的SQL查询如下:

SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`, `QSOptimize_person`.`lastname`, `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`  FROM `QSOptimize_person`  WHERE (`QSOptimize_person`.`lastname` = '三'  AND `QSOptimize_person`.`firstname` = '张');  SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `QSOptimize_city`.`id`,  `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`  FROM `QSOptimize_city`  INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`) WHERE `QSOptimize_person_visitation`.`person_id` IN (1);

第一条SQL查询仅仅是获取张三的Person对象,第二条比较关键,它选取关系表QSOptimize_person_visitation中person_id为张三的行,然后和city表内联(INNER JOIN 也叫等值连接)得到结果表。

+----+-----------+----------+-------------+-----------+
| id | firstname | lastname | hometown_id | living_id |
+----+-----------+----------+-------------+-----------+
|  1 | 张        | 三       |           3 |         1 |
+----+-----------+----------+-------------+-----------+
1 row in set (0.00 sec)

+-----------------------+----+-----------+-------------+ | _prefetch_related_val | id | name | province_id | +-----------------------+----+-----------+-------------+ | 1 | 1 | 武汉市 | 1 | | 1 | 2 | 广州市 | 2 | | 1 | 3 | 十堰市 | 1 | +-----------------------+----+-----------+-------------+ 3 rows in set (0.00 sec) 

显然张三武汉、广州、十堰都去过。

又或者,我们要获得湖北的所有城市名,可以这样:

>>> hb = Province.objects.prefetch_related('city_set').get(name__iexact=u"湖北省") >>> for city in hb.city_set.all(): ... city.name ... 

触发的SQL查询:

SELECT `QSOptimize_province`.`id`, `QSOptimize_province`.`name`  FROM `QSOptimize_province`  WHERE `QSOptimize_province`.`name` LIKE '湖北省' ; SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`  FROM `QSOptimize_city`  WHERE `QSOptimize_city`.`province_id` IN (1);

得到的表:

+----+-----------+ | id | name      | +----+-----------+ |  1 | 湖北省    | +----+-----------+ 1 row in set (0.00 sec) +----+-----------+-------------+ | id | name      | province_id | +----+-----------+-------------+ |  1 | 武汉市    |           1 | |  3 | 十堰市    |           1 | +----+-----------+-------------+ 2 rows in set (0.00 sec)

使用方法

*lookups 参数

prefetch_related()在Django < 1.7 只有这一种用法。和select_related()一样,prefetch_related()也支持深度查询,例如要获得所有姓张的人去过的省:

>>> zhangs = Person.objects.prefetch_related('visitation__province').filter(firstname__iexact=u'张') >>> for i in zhangs: ... for city in i.visitation.all(): ... print city.province ... 

触发的SQL:

SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`,  `QSOptimize_person`.`lastname`, `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`  FROM `QSOptimize_person`  WHERE `QSOptimize_person`.`firstname` LIKE '张' ; SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id` FROM `QSOptimize_city`  INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`) WHERE `QSOptimize_person_visitation`.`person_id` IN (1, 4); SELECT `QSOptimize_province`.`id`, `QSOptimize_province`.`name`  FROM `QSOptimize_province`  WHERE `QSOptimize_province`.`id` IN (1, 2);

获得的结果:

+----+-----------+----------+-------------+-----------+ | id | firstname | lastname | hometown_id | living_id | +----+-----------+----------+-------------+-----------+ |  1 | 张        | 三       |           3 |         1 | |  4 | 张        | 六       |           2 |         2 | +----+-----------+----------+-------------+-----------+ 2 rows in set (0.00 sec) +-----------------------+----+-----------+-------------+ | _prefetch_related_val | id | name      | province_id | +-----------------------+----+-----------+-------------+ |                     1 |  1 | 武汉市    |           1 | |                     1 |  2 | 广州市    |           2 | |                     4 |  2 | 广州市    |           2 | |                     1 |  3 | 十堰市    |           1 | +-----------------------+----+-----------+-------------+ 4 rows in set (0.00 sec) +----+-----------+ | id | name      | +----+-----------+ |  1 | 湖北省    | |  2 | 广东省    | +----+-----------+ 2 rows in set (0.00 sec)

值得一提的是,链式prefetch_related会将这些查询添加起来,就像1.7中的select_related那样。

要注意的是,在使用QuerySet的时候,一旦在链式操作中改变了数据库请求,之前用prefetch_related缓存的数据将会被忽略掉。这会导致Django重新请求数据库来获得相应的数据,从而造成性能问题。这里提到的改变数据库请求指各种filter()、exclude()等等最终会改变SQL代码的操作。而all()并不会改变最终的数据库请求,因此是不会导致重新请求数据库的。

举个例子,要获取所有人访问过的城市中带有“市”字的城市,这样做会导致大量的SQL查询:

plist = Person.objects.prefetch_related('visitation') [p.visitation.filter(name__icontains=u"市") for p in plist] 

因为数据库中有4人,导致了2+4次SQL查询:

SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`, `QSOptimize_person`.`lastname`,  `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`  FROM `QSOptimize_person`; SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`  FROM `QSOptimize_city`  INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`) WHERE `QSOptimize_person_visitation`.`person_id` IN (1, 2, 3, 4); SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`  FROM `QSOptimize_city`  INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)  WHERE(`QSOptimize_person_visitation`.`person_id` = 1  AND `QSOptimize_city`.`name` LIKE '%市%' ); SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`  FROM `QSOptimize_city`  INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)  WHERE (`QSOptimize_person_visitation`.`person_id` = 2  AND `QSOptimize_city`.`name` LIKE '%市%' );  SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`  FROM `QSOptimize_city` INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)  WHERE (`QSOptimize_person_visitation`.`person_id` = 3  AND `QSOptimize_city`.`name` LIKE '%市%' ); SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`  FROM `QSOptimize_city`  INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)  WHERE (`QSOptimize_person_visitation`.`person_id` = 4  AND `QSOptimi
                
                

-六神源码网