Is there a better/faster way to do these multiple inserts using PHP and MariaDB?

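So I'm trying to build a small database to track high scores on our league's pinball machines. I have a users table, which is just an auto-increment id column and a column with their email address. Then I have a games table, which is an auto-increment id column and the name of the machine. Since this is a many-to-many relationship, I created a third table called scores, which has user_id, game_id, and score as columns.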

Edit: including the code that reads the file:

$file = fopen('scores.txt', 'r') or die("Unable to open file.");
// Loop through the file line by line
$line_number = 1;
while (($line = fgets($file)) !== false) {
    // Reset flags
    $email_exists = 0;
    $game_exists = 0;
    if (isset($email_id)) unset($email_id);
    if (isset($game_id)) unset($game_id);
    echo ($line_number . " ");
    // Split the line into components
    $line = rtrim($line);
    $array = explode(",", $line, 3);
    $email = strtolower($array[0]);
    $game = $array[1];
    $score = $array[2];
    // Look up the user; insert a new row if the email is not known yet
    $stmt = $db->prepare("SELECT email_id FROM users WHERE email_address = ?");
    $stmt->execute(array($email));
    if ($stmt->rowCount() < 1) {
        $stmt = $db->prepare("INSERT INTO users (email_address) VALUES (?)");
        $stmt->execute(array($email));
        $email_id = $db->lastInsertId();
    } else {
        $row = $stmt->fetch();
        $email_id = $row['email_id'];
        $email_exists = 1;
    }

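I use similar code to check whether the game is already listed in the database. I need the id of the game and of the email address for the third part, which checks whether the user already has a score for that game and, if so, whether the new score is higher.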

    if ($email_exists == 0 || $game_exists == 0) {
        // New user or game added to DB - no prior score can exist
        $stmt = $db->prepare("INSERT INTO scores VALUES (?,?,?)");
        $stmt->execute(array($email_id, $game_id, $score));
    } else {
        $stmt = $db->prepare("SELECT * FROM scores WHERE email_id = ? AND game_id = ?");
        $stmt->execute(array($email_id, $game_id));
        if ($stmt->rowCount() == 0) {
            // No previous score for this game
            $stmt = $db->prepare("INSERT INTO scores VALUES (?,?,?)");
            $stmt->execute(array($email_id, $game_id, $score));
        } else {
            // Previous score exists
            $row = $stmt->fetch();
            if ($score > $row['score']) {
                // New score is higher
                $stmt = $db->prepare("UPDATE scores SET score = ? "
                    . "WHERE email_id = ? AND game_id = ?");
                $stmt->execute(array($score, $email_id, $game_id));
            }
            // No action needed if new score is lower
        }
    }
} // end of the while loop over the file

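The code seems to work fine, but it is slow. Also, after a few thousand records the script seems to time out, or something else goes wrong. Is there a better way to approach this?

I tried recoding it in Python, but that was even slower, and it didn't even seem to insert rows into the database. I barely know Python, which probably didn't help.

I'm considering building an array of the items that need inserting and then inserting 100 rows at a time or so, but I need the ids for the scores join table. I'm also thinking about using unique constraints in the database, and I'm trying to work out how to rewrite the insert code to use them to prevent duplicate email addresses or games.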

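Create a stored procedure that takes email, game_id, and score as parameters, and let the procedure do all the SQL work for you. Your PHP code then reduces to a single loop that calls the procedure. The result should be faster and easier to maintain: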

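-- DELIMITER is only needed when running this from the mysql/mariadb command-line client
DELIMITER //
create procedure sp_add_email_score (
    in_email varchar(320),
    in_game_id int,
    in_score int
)
begin
    declare v_email_id int;
    -- Look up the user; v_email_id stays NULL if the email is unknown
    select email_id into v_email_id
    from users
    where email_address = in_email;
    if (v_email_id is null) then
        INSERT INTO users (email_address) VALUES (in_email);
        set v_email_id = LAST_INSERT_ID();
    end if;
    -- Relies on a unique or primary key on scores (email_id, game_id)
    INSERT INTO scores (email_id, game_id, score)
    VALUES (v_email_id, in_game_id, in_score)
    ON DUPLICATE KEY UPDATE score = if(in_score > score, in_score, score);
end //
DELIMITER ;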

If the loop is still too slow, then something else is causing the slowdown.
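As a rough sketch of what that single loop could look like: this assumes a connected PDO instance in exception mode, the sp_add_email_score procedure from this answer, and a CSV whose second column already holds a numeric game id (if the file holds machine names instead, a name-to-id lookup is still needed). The helper name importScores is made up for illustration.

```php
<?php
// Hypothetical helper: feeds every CSV row to sp_add_email_score.
// Assumes each row is "email,game_id,score".
function importScores(PDO $db, string $path): int
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Unable to open $path");
    }

    // Prepare once, execute many times.
    $call = $db->prepare('CALL sp_add_email_score(?, ?, ?)');

    $rows = 0;
    $db->beginTransaction(); // one commit for the whole file instead of one per row
    try {
        while (($row = fgetcsv($fp)) !== false) {
            if (count($row) < 3) {
                continue; // skip blank or malformed lines
            }
            [$email, $gameId, $score] = $row;
            $call->execute([strtolower(trim($email)), (int) $gameId, (int) $score]);
            $call->closeCursor(); // release the CALL result before the next execute
            $rows++;
        }
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    } finally {
        fclose($fp);
    }
    return $rows; // number of rows sent to the procedure
}
```

Wrapping the loop in one transaction is an extra assumption on top of the answer; with InnoDB tables it usually avoids a disk flush per row, but it is worth measuring on your own data.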

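There is a lot of room for improvement here. When it comes to database speed, your main goal should usually be to reduce the number of round trips to the database server.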

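First, you are running the email-to-id query for every CSV row, which is unnecessary. At most you should run it once per user and cache the result. Better yet, do it just once for the whole set and read it all into an in-memory array. Something like: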

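// No parameters, so a plain query() is enough
$stmt = $db->query('SELECT email_address, email_id FROM users');
// Index the fetched rows by email address: email_address => email_id
$idMap = array_column($stmt->fetchAll(PDO::FETCH_ASSOC), 'email_id', 'email_address');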

This will give you an array like this:

[
    'foo@bar.com' => 1,
    'baz@bar.com' => 2,
]

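Do this once at the start of the script and keep it in memory for the duration. With it you can look up the id for a given email instantly, which removes 7,999 hits to the database. You are essentially trading memory for CPU and disk time. If you come across an email that is not yet in the array, you can insert it and add it to the array.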

Next, move the prepare() calls outside of the loop. This removes at least 3 * 7999 hits to the database, possibly as many as 5 * 7999.
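The two suggestions above (an in-memory id map plus statements prepared once) might be combined along these lines. This is only a sketch: the emailId helper is a made-up name, and an in-memory SQLite database stands in for MariaDB so the snippet runs without a server, which assumes the pdo_sqlite extension is available. Against MariaDB only the DSN and the table definition would differ.

```php
<?php
// Stand-in database (assumption: pdo_sqlite is installed).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (
    email_id INTEGER PRIMARY KEY AUTOINCREMENT,
    email_address TEXT NOT NULL UNIQUE
)');
$db->exec("INSERT INTO users (email_address) VALUES ('foo@bar.com'), ('baz@bar.com')");

// One query up front: email_address => email_id for every known user.
$idMap = $db->query('SELECT email_address, email_id FROM users')
            ->fetchAll(PDO::FETCH_KEY_PAIR);

// Prepared once, outside any loop.
$insertUser = $db->prepare('INSERT INTO users (email_address) VALUES (?)');

// Hypothetical helper: returns the id for $email and touches the database
// only when the email is not in the cache yet.
function emailId(string $email, array &$idMap, PDO $db, PDOStatement $insertUser): int
{
    $email = strtolower(trim($email));
    if (!isset($idMap[$email])) {
        $insertUser->execute([$email]);
        $idMap[$email] = (int) $db->lastInsertId();
    }
    return (int) $idMap[$email];
}

$first  = emailId('new@bar.com', $idMap, $db, $insertUser); // cache miss: one INSERT
$second = emailId('NEW@bar.com', $idMap, $db, $insertUser); // cache hit: no query
```

The second call returns the same id as the first without touching the database, because the address is lower-cased before the lookup, just as the question's code does with strtolower().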

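Next, use fgetcsv() instead of explode(), since it is simpler and handles quoting properly. And process the whole CSV before doing a single insert. It is silly to generate that much database traffic if you are going to discard most of the records. So work out the top scores first, and only hit the database with those: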

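$top = [];
$fp = fopen('scores.txt', 'r');
while (($row = fgetcsv($fp)) !== false) {
    if (count($row) < 3) {
        continue; // skip blank or malformed lines
    }
    [$email, $gameId, $score] = $row;
    $score = (int) $score;
    // Keep only the best score seen so far for this user and game
    if ($score > ($top[$email][$gameId] ?? 0)) {
        $top[$email][$gameId] = $score;
    }
}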

Given the input file

foo@bar.com,g1,3
foo@bar.com,g1,1
foo@bar.com,g2,2
baz@bar.com,g1,4
baz@bar.com,g2,5
baz@bar.com,g2,6

this produces an array of each user's top scores:

Array
(
    [foo@bar.com] => Array
        (
            [g1] => 3
            [g2] => 2
        )
    [baz@bar.com] => Array
        (
            [g1] => 4
            [g2] => 6
        )
)

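You can then iterate over that array and perform inserts/updates based on those records only. This saves you two queries for every redundant CSV row.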

foreach ($top as $email => $scores) {
    foreach ($scores as $gameId => $score) {
        // INSERT INTO scores ($idMap[$email], $gameId, $score)
    }
}
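One way the placeholder comment in that loop could be filled in is with a batched upsert, which also covers the "100 rows at a time" idea from the question. The sketch below only builds the SQL and the parameter list, so it runs without a database. The helper name buildScoreUpsert, the sample ids, and the chunk size are illustrative assumptions, and the statement relies on scores having a unique or primary key on (email_id, game_id).

```php
<?php
// Hypothetical helper: builds one multi-row upsert for a chunk of rows.
// Each row is [email_id, game_id, score].
function buildScoreUpsert(array $rows): array
{
    $placeholders = implode(', ', array_fill(0, count($rows), '(?, ?, ?)'));
    $sql = 'INSERT INTO scores (email_id, game_id, score) VALUES ' . $placeholders
         . ' ON DUPLICATE KEY UPDATE score = GREATEST(score, VALUES(score))';
    // Flatten [[a, b, c], [d, e, f]] into [a, b, c, d, e, f] for execute().
    $params = array_merge(...array_map('array_values', $rows));
    return [$sql, $params];
}

// Made-up sample data in the shape of $idMap and $top from the answer.
$idMap = ['foo@bar.com' => 1, 'baz@bar.com' => 2];
$top   = ['foo@bar.com' => [10 => 3, 11 => 2], 'baz@bar.com' => [10 => 4, 11 => 6]];

// Flatten the nested array into one row per (user, game) pair.
$rows = [];
foreach ($top as $email => $scores) {
    foreach ($scores as $gameId => $score) {
        $rows[] = [$idMap[$email], $gameId, $score];
    }
}

// 100 rows per statement; the size is a tunable guess, not a measured optimum.
foreach (array_chunk($rows, 100) as $chunk) {
    [$sql, $params] = buildScoreUpsert($chunk);
    // With a live connection: $db->prepare($sql)->execute($params);
}
```

GREATEST(score, VALUES(score)) keeps whichever score is higher, so an existing higher score in the table is never overwritten by a lower one from the file.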
