C# - SQL - Reading every row in a database, datamining the row, then saving results in another database - how to improve speed

My problem: the code works fine, but it is far too slow for the number of rows it has to process.

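What I'm doing: I run a COUNT(*) to get the total number of rows (~58,000 last night) and use it as the bound of a loop that does the following for each row: pull two columns of data from that row, then datamine one of those columns for text patterns.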
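A minimal sketch of that per-message counting step, assuming each of the 43 result values is the number of case-insensitive matches of one pattern. Only "ai" comes from the question; "pattern2" and "pattern3" are hypothetical placeholders for the omitted ones, and PatternCounter is an illustrative name. Each Regex is built once and reused, because the static Regex.Matches(string, string, RegexOptions) helper keeps only Regex.CacheSize patterns cached (15 by default), so cycling through ~43 patterns for every message would likely force them to be re-parsed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Counts case-insensitive matches of each pattern in one message.
// Only "ai" is from the question; the other patterns are hypothetical placeholders.
public static class PatternCounter
{
    // Built once and reused for every message instead of relying on the
    // static Regex cache, which holds only Regex.CacheSize entries (15 by default).
    private static readonly Regex[] Patterns =
        new[] { "ai", "pattern2", "pattern3" }
            .Select(p => new Regex(p, RegexOptions.IgnoreCase | RegexOptions.Compiled))
            .ToArray();

    // Returns one count per pattern, in the same order as Patterns.
    public static int[] Count(string message)
    {
        var counts = new int[Patterns.Length];
        for (int k = 0; k < Patterns.Length; k++)
        {
            counts[k] = Patterns[k].Matches(message).Count;
        }
        return counts;
    }
}
```

The sum of the returned counts plays the role of the question's azCheck: if it is 0, the database work for that message can be skipped.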

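When that is done, I search a second table to see whether the person, looked up by username, already exists. If they do, I update their row; if not, I add a new one.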
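This find-then-insert-or-update decision can also be pushed into a single MySQL statement with INSERT ... ON DUPLICATE KEY UPDATE, replacing the separate SELECT, INSERT and UPDATE round trips. This is a suggested alternative, not what the question's code does. The sketch below (UpsertSql is an illustrative name) only builds the SQL text; it assumes a UNIQUE index or primary key on users.username, and any count columns other than aiCount would be your own.

```csharp
using System;
using System.Linq;

// Builds one MySQL statement that inserts a new user row or, if the username
// already exists, adds the new counts to the stored ones.
// ASSUMPTION: users.username has a UNIQUE index (or is the primary key);
// without it, ON DUPLICATE KEY UPDATE never fires and duplicates are inserted.
public static class UpsertSql
{
    // countColumns must come from a fixed list in code, never from user input,
    // because the names are concatenated into the SQL text.
    public static string Build(string[] countColumns)
    {
        string columns = string.Join(", ", countColumns);
        string values = string.Join(", ", countColumns.Select(c => "@" + c));
        string updates = string.Join(", ",
            countColumns.Select(c => $"{c} = {c} + VALUES({c})"));
        return $"INSERT INTO users (username, {columns}) " +
               $"VALUES (@username, {values}) " +
               $"ON DUPLICATE KEY UPDATE {updates}";
    }
}
```

Execute the result with @username plus one parameter per count column. On MySQL 8.0.20 and later, VALUES() in the UPDATE clause is deprecated in favour of a row alias, but it is still accepted.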

There are 44 columns of data: one is the name, and the other 43 store the values from my datamining results.

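In about 8 hours it processed 26,500 of the ~58,000 rows that were in the table when it started (over the same period the table grew to ~100,000 rows, but I'm not worried about that).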
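For scale, the reported figures work out to roughly one second per row:

$$
\frac{26\,500\ \text{rows}}{8 \times 3600\ \text{s}} = \frac{26\,500\ \text{rows}}{28\,800\ \text{s}} \approx 0.92\ \text{rows/s},
\qquad
\frac{28\,800\ \text{s}}{26\,500\ \text{rows}} \approx 1.09\ \text{s/row}
$$

At that rate the ~100,000 rows now in the table would take about 100,000 × 1.09 s ≈ 109,000 s, or roughly 30 hours, so any fixed cost paid once per row is multiplied about 100,000 times.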

Is there a better way to speed up the reads and writes?

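Part of the code follows. I removed many of the int declarations and Regex.Matches calls, since they are copies of the first one with a different match value.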

azCheck records whether the message contains anything we are looking for; if it is still 0, we don't need to bother with the last part of the code.

using (new MySqlConnection(ConnectiongString))
{
    using (MySqlCommand cmd = new MySqlCommand("select count(*) from messages", connection))
    {
        using (MySqlDataReader reader = cmd.ExecuteReader())
        {
            StringBuilder sb = new StringBuilder();
            while (reader.Read())
            {
                sb.Append(reader.GetInt32(0).ToString());
            }
            total_messages = int.Parse(sb.ToString());
        }
    }
}
Console.WriteLine(total_messages.ToString());
connection.Close();
for (int i = 1; i <= total_messages; i++)
{
    connection.Open();
    using (new MySqlConnection(ConnectiongString))
    {
        using (MySqlCommand cmd = new MySqlCommand("select * from messages WHERE id=" + i + "", connection))
        {
            using (MySqlDataReader reader = cmd.ExecuteReader())
            {
                StringBuilder sb = new StringBuilder();
                while (reader.Read())
                {
                    username = reader["username"].ToString();
                    message = reader["message"].ToString();
                }
            }
        }
    }
    connection.Close();
    Console.Write("r{0}   ", i);
    int aiCount = 0;
    aiCount += Regex.Matches(message, "ai", RegexOptions.IgnoreCase).Count;
    azCheck += aiCount;
    //There are ~42 of the regex.matches after the first one.
    MySqlCommand cmd1 = connection.CreateCommand();
    connection.Open();
    cmd1.CommandText = "SELECT username FROM users";
    cmd1.CommandType = CommandType.Text;
    cmd1.Connection = connection;
    MySqlDataReader dr = cmd1.ExecuteReader();
    while (dr.Read())
    {
        if (dr[0].ToString() == username)
        {
            check++;
        }
    }
    connection.Close();
    if (check == 0)
    {
        MySqlConnection connection2 = new MySqlConnection(ConnectiongString);
        connection2.Open();
        try
        {
            MySqlCommand cmd2 = connection2.CreateCommand();
            cmd2.CommandText = "INSERT INTO users (username,aiCount) VALUES (@username,@aiCount)";
            cmd2.Parameters.AddWithValue("@username", username);
            cmd2.Parameters.AddWithValue("@aiCount", aiCount);
            cmd2.ExecuteNonQuery();
            connection2.Close();
        }
        catch (Exception)
        {
            throw;
        }
    }
    else
    {
        int aiCount_old = 0;
        if (azCheck > 0)
        {
            //Here we are taking the existing values from this users row,
            //which we then add the new values from above and save.
            MySqlConnection connection4 = new MySqlConnection(ConnectiongString);
            connection4.Open();
            try
            {
                MySqlCommand cmd2 = connection4.CreateCommand();
                cmd2.CommandType = CommandType.Text;
                cmd2.CommandText = "SELECT * from users WHERE username = @username";
                cmd2.Parameters.AddWithValue("@username", username);
                MySqlDataReader reader = cmd2.ExecuteReader();
                while (reader.Read())
                {
                    aiCount_old = Convert.ToInt32(reader["aiCount"].ToString());
                }
            }
            catch (Exception)
            {
                throw;
            }
            connection4.Close();
            aiCount += aiCount_old;
            MySqlConnection connection5 = new MySqlConnection(ConnectiongString);
            connection5.Open();
            try
            {
                MySqlCommand cmd4 = connection5.CreateCommand();
                cmd4.CommandType = CommandType.Text;
                cmd4.CommandText = "UPDATE users SET aiCount = @aiCount WHERE LOWER(LTRIM(RTRIM(username))) = @username";
                cmd4.Parameters.AddWithValue("@username", username.Trim().ToLower());
                cmd4.Parameters.AddWithValue("@aiCount", aiCount.ToString());
                cmd4.ExecuteNonQuery();
                Console.WriteLine("User updated.");
            }
            catch (Exception ex)
            {
                throw;
            }
            connection5.Close();
        }
    }
}

There are several inefficiencies I can see at a glance.

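You are opening and closing your connection over and over. This is probably your biggest bottleneck. Open the connection once, close it once when all of the processing is done, and you will likely see a large performance improvement.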

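You also create several different connection objects (connection2, connection4, connection5), each opened and closed separately; reusing a single connection would further reduce the number of times you open and close connections.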

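You also seem to misunderstand how "using" works with connection objects. I see using (new MySqlConnection(ConnectiongString)), but that code is useless: the new connection object is never assigned to a variable, so nothing inside the block can use it (the block's code works on the outer connection instead), and it is simply disposed when the block ends.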
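To make the difference concrete without a database, the sketch below swaps in a hypothetical LoggingConnection class (not part of MySql.Data) that only records when it is created, opened and disposed. The unnamed object in the first using block is disposed at the end of the block but is never opened or used; all the work happens on the outer connection, which that using block never disposes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for MySqlConnection that just logs what happens to it,
// so the example runs without a database.
public sealed class LoggingConnection : IDisposable
{
    public static readonly List<string> Log = new List<string>();
    private readonly string _name;

    public LoggingConnection(string name)
    {
        _name = name;
        Log.Add("create " + name);
    }

    public void Open() => Log.Add("open " + _name);
    public void Dispose() => Log.Add("dispose " + _name);
}

public static class UsingDemo
{
    public static void Run()
    {
        var connection = new LoggingConnection("outer");

        // Pattern from the question: this object is created and later disposed,
        // but it has no name, so the body can only use the outer connection.
        using (new LoggingConnection("unnamed"))
        {
            connection.Open();
        }

        // Idiomatic pattern: the using statement declares the variable the body
        // works with, and disposes that same object when the block ends.
        using (var named = new LoggingConnection("named"))
        {
            named.Open();
        }
    }
}
```

After Run(), the log reads: create outer, create unnamed, open outer, dispose unnamed, create named, open named, dispose named. The outer connection is opened but never disposed.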

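Since you are processing everything sequentially, use connection as your connection object in every case: open it only at the start of processing, close it when processing is complete, and then dispose of it (which is what the using statement is for).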

This change alone could cut the processing time by an order of magnitude.

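Note: if you need to run updates or other queries while a data reader is open, the data reader needs a separate connection of its own, because a MySQL connection cannot execute other commands while a reader is open on it.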
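One way to follow both pieces of advice is to read every row through a single query and a single reader, accumulate the results in memory, and write only after the reader is closed, so no second connection is needed. The sketch below is a further suggestion beyond the answer above, with illustrative names. It works against any IDataReader; only the "ai" pattern comes from the question, and the trimmed, case-insensitive username key loosely mirrors the question's LOWER(LTRIM(RTRIM(username))) comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Text.RegularExpressions;

// Reads every (username, message) row through ONE reader and accumulates
// the "ai" match count per user in memory. Nothing else runs on the
// connection while the reader is open; the totals are written afterwards.
public static class MessageAggregator
{
    private static readonly Regex Ai =
        new Regex("ai", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static Dictionary<string, int> Aggregate(IDataReader reader)
    {
        // Case-insensitive keys; usernames are also trimmed below.
        var totals = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        int userCol = reader.GetOrdinal("username");
        int msgCol = reader.GetOrdinal("message");

        while (reader.Read())
        {
            string user = reader.GetString(userCol).Trim();
            string message = reader.IsDBNull(msgCol) ? "" : reader.GetString(msgCol);
            int aiCount = Ai.Matches(message).Count;

            // Adds to the running total; a user seen for the first time starts at 0.
            totals.TryGetValue(user, out int old);
            totals[user] = old + aiCount;
        }
        return totals;
    }
}
```

With MySql.Data, the reader would come from one SELECT username, message FROM messages command on the single open connection. Once that reader is disposed, the same connection is free to run the writes, one statement per user instead of several per message. Reading the table in one pass also avoids assuming that id runs from 1 to COUNT(*) without gaps. Memory use grows with the number of distinct users, since the dictionary holds one entry per user.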
