我有一个非常大的结果集,其中包含近 2 GB 的产品数据,分布在多个表中,每个表总共约有 500,000 条记录。我需要处理每条记录以导出到一组文件。
以下内容将使服务器在尝试保存结果集时崩溃,因此我不得不切换到仅创建一个查询以仅获取与查询结果匹配的每条记录的主 ID,然后对每个主 ID 执行第二个查询以获取该单个产品。由于所有这些辅助查询,这是非常低效和数据库密集型的结果。
下面是导致它崩溃的查询和代码。我怎么能不做呢?
$query =
"SELECT SQL_NO_CACHE SQL_BIG_RESULT
products.*,
inventory.*,
pricing.*,
markets.*
FROM
products,
categories,
markets,
pricing,
inventory
WHERE
products.catid = categories.id AND
markets.id = products.marketid AND
pricing.productid = products.id AND
inventory.productid = products.id AND
inventory.all_stock > 0 AND
products.sale = 'Y' AND
categories.active = 'Y' AND
inventory.last_update > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)
GROUP BY
products.id";
$Db = new DbConnector();
$r = $Db->query($query); // !Never gets past this point!
while ($product = $r->fetch(PDO::FETCH_ASSOC)) {
// Stuff gets done here.
}
查询是否仅在数据库服务器上运行?如果是这样,瓶颈很可能是与您的 Web 服务器以及与您的数据库服务器的通信。如果你正在拉取大量的数据,或者你被迫运行大量的查询(如果你必须对检索到的每个id运行额外的查询),我建议使用STORED PROCEDURE(mysql称它们为"例程")。你可以从这里开始:http://net.tutsplus.com/tutorials/an-introduction-to-stored-procedures/
你不就是把id字段放到一个临时表中,然后"水合"并批量处理整行吗?
首先是只有 id 的临时表:
CREATE TEMPORARY TABLE tempy
SELECT SQL_NO_CACHE SQL_BIG_RESULT
products.id AS product_id,
inventory.id AS inventory_id,
pricing.id AS pricing_id,
markets.id AS markets_id
FROM
products,
categories,
markets,
pricing,
inventory
WHERE
products.catid = categories.id AND
markets.id = products.marketid AND
pricing.productid = products.id AND
inventory.productid = products.id AND
inventory.all_stock > 0 AND
products.sale = 'Y' AND
categories.active = 'Y' AND
inventory.last_update > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)
GROUP BY
products.id
重复此查询,直到所有内容都得到处理,但在每一步中递增 OFFSET 值:
SELECT SQL_NO_CACHE SQL_BIG_RESULT
products.*,
inventory.*,
pricing.*,
markets.*
FROM
( SELECT *
FROM tempy
LIMIT 1000 -- slice size
OFFSET 1000*123 -- slice number
ORDER BY whatever.you.want
) AS t,
products,
inventory,
pricing,
markets
WHERE
products.id = t.products_id
inventory.id = t.inventory_id
pricing.id = t.pricing_id
markets.id = t.markets_id