Query planner does not use the filter to limit computation in an expensive join



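I am trying to solve a performance problem with a query in PostgreSQL.

General concept being modelled: software licences that are purchased and assigned. I think I have stripped out everything else that was modelled, so what remains is very similar to a standard hotel room booking system, except that it is normal for the current assignment (the hotel booking) to have no known end date.

Purpose of the query: this is a view that assembles the information needed to display a licence together with the information from its purchase. When the application queries the view it supplies tLicences.id, so a single row is returned.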

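Concepts remaining in the query, relative to the hotel analogy:

  • Some licence agreements restrict how quickly software can be reassigned; this has been hard-coded as 1 day.
  • In theory, a licence can have a past and a current assignment that overlap. This should not happen, and the application guards against it, but if people make the mistake in the real world, the application does allow that mistake to be entered into the system. This is obviously unlike a normal hotel system, where the current occupant would object if a guest walked into the wrong room.

The nested SELECT aliased purchase_quantities_assignnments is a view in the database (inlined here for convenience). Ideally, any fix for my performance problem would avoid folding a modified copy of this view into the query, so that the view can continue to exist as it is and be used by other queries.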

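Problem

If I query this view (the query below) with WHERE tLicences.id = 19, the result takes a long time to produce. In particular, it appears to generate the entire set for periodsOfAvailability_start (slow) and only then perform the join; I base this on EXPLAIN ANALYZE showing the GroupAggregate returning 10 rows (the number of purchases). I feel the query planner should be able to work out that tLicences.purchase_id can be used to significantly reduce how much of periodsOfAvailability_start has to be generated.

However, if I run the query with WHERE tLicences.id = 19 AND tLicences.purchase_id = ? [where ? is that licence's purchase ID], it runs as expected and generates only the part of periodsOfAvailability_start that belongs to that purchase ID; I base this on EXPLAIN ANALYZE showing the GroupAggregate returning 1 row (the single purchase the licence belongs to).

Query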

SELECT *
FROM test.tPurchases AS tPurchases
INNER JOIN test.tLicences
    ON tLicences.purchase_id = tPurchases.id
LEFT JOIN (
    SELECT
        purchase_id,
        SUM(
            CASE
                assignment_newer_id IS NOT null
                WHEN true THEN 1
                WHEN false THEN 0
            END
        ) AS prchs_quantity_assigned,
        SUM(
            CASE
                assignment_newer_id IS null AND
                current_timestamp
                    BETWEEN licence_availability_start AND
                        licence_availability_end
                WHEN true THEN 1
                WHEN false THEN 0
            END
        ) AS prchs_quantity_notAssignedAndCanBeAssigned,
        SUM(
            CASE
                assignment_newer_id IS null AND
                current_timestamp < licence_availability_start
                WHEN true THEN 1
                WHEN false THEN 0
            END
        ) AS prchs_quantity_notAssignedAndCannotBeAssigned
    FROM (
        SELECT
            tPurchases.id AS purchase_id,
            tPurchases.date_ AS purchase_date,
            tLicences.id AS licence_id,
            GREATEST(
                tPurchases.date_,
                older.end_,
                older.start + '1 day'::interval
            ) AS licence_availability_start,
            CASE
                WHEN newer.id IS NULL THEN 'infinity'
                ELSE newer.start - '1 day'::interval
            END AS licence_availability_end,
            COALESCE(newer.start, 'infinity') AS licence_availability_uninstallBy,
            older.id AS assignment_older_id,
            older.start AS assignment_older_start,
            older.end_ AS assignment_older_end,
            newer.id AS assignment_newer_id,
            newer.start AS assignment_newer_start,
            newer.end_ AS assignment_newer_end
        FROM test.tLicences
        INNER JOIN test.tPurchases
            ON tPurchases.id = tLicences.purchase_id
        LEFT JOIN test.tAssignments AS older
            ON (
                NOT older.deleted AND
                older.licence_id = tLicences.id
            )
        LEFT JOIN test.tAssignments AS newer
            ON (
                NOT newer.deleted AND
                newer.id <> older.id AND
                newer.licence_id = older.licence_id
            )
        WHERE
            NOT tLicences.deleted
        UNION
        SELECT
            tPurchases.id AS purchase_id,
            tPurchases.date_ AS purchase_date,
            tLicences.id AS licence_id,
            tPurchases.date_ AS licence_availability_start,
            oldest.start - '1 day'::interval AS licence_availability_end,
            oldest.start AS licence_availability_uninstallBy,
            null AS assignment_older_id,
            null AS assignment_older_start,
            null AS assignment_older_end,
            oldest.id AS assignment_newer_id,
            oldest.start AS assignment_newer_start,
            oldest.end_ AS assignment_newer_end
        FROM test.tLicences
        INNER JOIN test.tPurchases
            ON tPurchases.id = tLicences.purchase_id
        INNER JOIN test.tAssignments AS oldest
            ON oldest.licence_id = tLicences.id
        WHERE
            NOT tLicences.deleted AND
            NOT oldest.deleted
    ) AS periodsOfAvailability_start
    WHERE
        (assignment_newer_id IS null OR assignment_newer_end IS null)
    GROUP BY purchase_id
) AS purchase_quantities_assignnments
    ON
        purchase_quantities_assignnments.purchase_id = tPurchases.id
WHERE
    tLicences.id = 19 /* [Unexpected behaviour] The full set for "purchase_quantities_assignnments" is generated */
    --tLicences.id = 19 AND tLicences.purchase_id = ? /* [Expected behaviour] Only the single relevant row for "purchase_quantities_assignnments" appears to be generated */
    --tLicences.id = 19 AND tPurchases.id = ? /* [Expected behaviour] Only the single relevant row for "purchase_quantities_assignnments" appears to be generated */
    --tLicences.purchase_id = ? /* [Expected behaviour] Only the single relevant row for "purchase_quantities_assignnments" appears to be generated. Note: This is a different query *result* than the others */

Question: Is there some way to fix this without having to supply tLicences.purchase_id?

Database version: PostgreSQL 9.0


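SQL to create the schema and tables, and to populate the tables:

This is a bit long because I wanted a volume of data similar to the real thing. If run time is a problem, the number of licences (30000) and assignments (100000) can be reduced.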

CREATE SCHEMA test;
CREATE TABLE test.tPurchases (
    id                  serial      not null,
    date_               date        not null,               
    /* … */
    deleted             boolean     not null    DEFAULT false,
    PRIMARY KEY (id)
);
CREATE TABLE test.tLicences (
    id                  serial      not null,
    purchase_id         integer     not null,
    /* … */
    deleted             boolean     not null    DEFAULT false,
    PRIMARY KEY (id),
    FOREIGN KEY (purchase_id)
        REFERENCES test.tPurchases (id)
        ON UPDATE RESTRICT
        ON DELETE RESTRICT
);
CREATE INDEX ON test.tLicences(purchase_id);
CREATE TABLE test.tAssignments (
    id                  serial      not null,
    licence_id          integer     not null,
    start               date        not null,
    end_                date,
    /* … */
    deleted             boolean     not null    DEFAULT false,
    PRIMARY KEY (id),
    FOREIGN KEY (licence_id)
        REFERENCES test.tLicences (id)
        ON UPDATE RESTRICT
        ON DELETE RESTRICT,
    CHECK (start <= end_)
);
CREATE INDEX ON test.tAssignments(licence_id);

INSERT INTO test.tPurchases(id, date_)
    SELECT
        id,
        '2000-01-01'::timestamp +  random() * '1 year'::interval AS date_
    FROM generate_series(1, 10) AS id
;
INSERT INTO test.tLicences(id, purchase_id, deleted)
    SELECT
        id,
        trunc(random() * 10 + 1) AS purchase_id,
        (random() > 0.99) AS deleted
    FROM generate_series(1, 30000) AS id
;
INSERT INTO test.tAssignments(id, licence_id, start, end_, deleted)
    SELECT
        assignments.id,
        assignments.licence_id,
        tPurchases.date_ + ((rank * 20 + random() * 10) || ' days')::interval AS start,
        CASE
            assignments.rank = max(assignments.rank) OVER (PARTITION BY assignments.licence_id) AND
            random() > 0.5
            WHEN true THEN null
            ELSE tPurchases.date_ + ((rank * 20 + 10 + random() * 10) || ' days')::interval
        END AS end_,
        (random() > 0.95) AS deleted
    FROM (
        SELECT
            assignments.id,
            assignments.licence_id,
            rank() OVER (PARTITION BY assignments.licence_id ORDER BY assignments.id) AS rank
        FROM (
            SELECT
                id,
                trunc(random() * 30000 + 1) AS licence_id
            FROM generate_series(1, 100000) AS id
        ) AS assignments
    ) AS assignments
    INNER JOIN test.tLicences
        ON tLicences.id = assignments.licence_id
    INNER JOIN test.tPurchases
        ON tPurchases.id = tLicences.purchase_id
;
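
To reproduce the behaviour on this test data without the full query, a reduced sketch may help. The aggregate below (a plain count per purchase, under the hypothetical alias per_purchase) only stands in for purchase_quantities_assignnments; it is an assumption that the planner treats it the same way as the real view, so compare the row count reported for the aggregate node in each plan.

```sql
-- Refresh planner statistics after the bulk inserts
ANALYZE test.tPurchases;
ANALYZE test.tLicences;
ANALYZE test.tAssignments;

-- Variant 1: filter on the licence only
EXPLAIN ANALYZE
SELECT *
FROM test.tPurchases AS tPurchases
INNER JOIN test.tLicences
    ON tLicences.purchase_id = tPurchases.id
LEFT JOIN (
    -- Simplified stand-in for the real aggregating view (assumption)
    SELECT purchase_id, count(*) AS licence_count
    FROM test.tLicences
    WHERE NOT deleted
    GROUP BY purchase_id
) AS per_purchase
    ON per_purchase.purchase_id = tPurchases.id
WHERE tLicences.id = 19;

-- Variant 2: also supply the purchase ID as a literal.
-- Replace 1 with the result of:
--   SELECT purchase_id FROM test.tLicences WHERE id = 19;
EXPLAIN ANALYZE
SELECT *
FROM test.tPurchases AS tPurchases
INNER JOIN test.tLicences
    ON tLicences.purchase_id = tPurchases.id
LEFT JOIN (
    SELECT purchase_id, count(*) AS licence_count
    FROM test.tLicences
    WHERE NOT deleted
    GROUP BY purchase_id
) AS per_purchase
    ON per_purchase.purchase_id = tPurchases.id
WHERE tLicences.id = 19 AND tPurchases.id = 1;
```

Because the test data is generated with random(), the purchase ID for licence 19 differs between runs, which is why the literal in the second variant has to be looked up first.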

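You may need to refresh the table statistics (for example with ANALYZE), but you can usually force this kind of optimisation with CTEs, since in 9.0 each CTE is evaluated on its own rather than being merged into the outer query. Here I have also moved all of your subqueries into CTEs to make the structure clearer: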

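WITH myPurchases AS
(
  -- Only the purchase that owns licence 19; the CTEs below join to this
  SELECT tPurchases.*
  FROM test.tPurchases AS tPurchases
  INNER JOIN test.tLicences ON tLicences.purchase_id = tPurchases.id
  WHERE tLicences.id = 19
), periodsOfAvailability_start AS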
(
  SELECT
      tPurchases.id AS purchase_id,
      tPurchases.date_ AS purchase_date,
      tLicences.id AS licence_id,
      GREATEST(tPurchases.date_, older.end_, older.start + '1 day'::interval) AS licence_availability_start,
      CASE WHEN newer.id IS NULL THEN 'infinity' ELSE newer.start - '1 day'::interval END AS licence_availability_end,
      COALESCE(newer.start, 'infinity') AS licence_availability_uninstallBy,
      older.id AS assignment_older_id,
      older.start AS assignment_older_start,
      older.end_ AS assignment_older_end,
      newer.id AS assignment_newer_id,
      newer.start AS assignment_newer_start,
      newer.end_ AS assignment_newer_end
  FROM test.tLicences
  INNER JOIN myPurchases AS tPurchases ON tPurchases.id = tLicences.purchase_id
  LEFT JOIN test.tAssignments AS older ON (NOT older.deleted AND older.licence_id = tLicences.id)
  LEFT JOIN test.tAssignments AS newer ON (NOT newer.deleted AND newer.id <> older.id AND newer.licence_id = older.licence_id)
  WHERE NOT tLicences.deleted
  UNION
  SELECT
      tPurchases.id AS purchase_id,
      tPurchases.date_ AS purchase_date,
      tLicences.id AS licence_id,
      tPurchases.date_ AS licence_availability_start,
      oldest.start - '1 day'::interval AS licence_availability_end,
      oldest.start AS licence_availability_uninstallBy,
      null AS assignment_older_id,
      null AS assignment_older_start,
      null AS assignment_older_end,
      oldest.id AS assignment_newer_id,
      oldest.start AS assignment_newer_start,
      oldest.end_ AS assignment_newer_end
  FROM test.tLicences
  INNER JOIN myPurchases AS tPurchases ON tPurchases.id = tLicences.purchase_id
  INNER JOIN test.tAssignments AS oldest ON oldest.licence_id = tLicences.id
  WHERE NOT tLicences.deleted AND NOT oldest.deleted
), purchase_quantities_assignnments AS
(
  SELECT
    purchase_id,
    SUM(CASE WHEN assignment_newer_id IS NOT null THEN 1 ELSE 0 END) AS prchs_quantity_assigned,
    SUM(CASE WHEN assignment_newer_id IS null AND current_timestamp BETWEEN licence_availability_start AND licence_availability_end THEN 1 ELSE 0 END) AS prchs_quantity_notAssignedAndCanBeAssigned,
    SUM(CASE WHEN assignment_newer_id IS null AND current_timestamp < licence_availability_start THEN 1 ELSE 0 END) AS prchs_quantity_notAssignedAndCannotBeAssigned
  FROM periodsOfAvailability_start
  WHERE assignment_newer_id IS null OR assignment_newer_end IS null
  GROUP BY purchase_id
)
SELECT *
FROM myPurchases AS tPurchases
INNER JOIN test.tLicences ON tLicences.purchase_id = tPurchases.id
LEFT JOIN purchase_quantities_assignnments ON purchase_quantities_assignnments.purchase_id = tPurchases.id
WHERE tLicences.id = 19  -- return only the requested licence, not every licence of the purchase
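
An alternative that leaves the view untouched, and may be worth testing, is to look up the purchase ID in an uncorrelated scalar subquery, so that the planner can treat it as a value known before the join runs. This is only a sketch: it assumes the nested SELECT from the question exists as a view named test.purchase_quantities_assignnments (the schema is a guess, and pqa is a hypothetical alias), and whether 9.0 actually pushes the condition down into the aggregate should be confirmed with EXPLAIN ANALYZE.

```sql
-- Assumes the view test.purchase_quantities_assignnments exists
-- (the question inlines its definition; the schema name is an assumption).
EXPLAIN ANALYZE
SELECT *
FROM test.tPurchases AS tPurchases
INNER JOIN test.tLicences
    ON tLicences.purchase_id = tPurchases.id
LEFT JOIN test.purchase_quantities_assignnments AS pqa
    ON pqa.purchase_id = tPurchases.id
WHERE
    tLicences.id = 19
    -- Uncorrelated scalar subquery: does not depend on the outer rows,
    -- so it can be evaluated once before the join
    AND tPurchases.id = (
        SELECT l.purchase_id
        FROM test.tLicences AS l
        WHERE l.id = 19
    );
```

If the GroupAggregate node in the resulting plan reports 1 row rather than 10, the restriction reached the view; the application still supplies only the licence ID.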
