Why does a second SQL statement break my loop?

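I have an sqlite table with a column of file names. Some file names duplicate those of other files, so I want to iterate over every row, search the column for similar entries, and print the results to the console.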

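A print(row[0]) shows that the first half of my findDupes loop works and iterates over every row. Things get strange when I issue another sqlite statement to find similar entries and print the output. Instead of continuing, the loop prints only the first entry.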

I'm not an SQL expert, so I don't know what I'm doing wrong. Any help would be appreciated. Thanks.

def getFiles():
    dirs = os.listdir(path)
    for files in dirs:
        c.execute('INSERT INTO myTable(files) VALUES(?)', (files,))
def findDupes():
    row = c.execute('select files from myTable order by files')
    while True:
        row = c.fetchone()
        if row == None:
            break
        c.execute('select files from myTable where files like ?',(row[0]+'%',))
        dupe = c.fetchone()
        print (dupe[0])

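First, your code doesn't reveal what c is: is it a connection object or a cursor? (Either can be used to execute statements, but a cursor is usually preferable.) And why is it global?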

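Assuming it is a cursor object, what happens is that on your first pass through the loop the second call to c.execute resets the query, so the next time c.fetchone is called, sqlite is looking at the results of select files from myTable where files like ? rather than the original ordered listing.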
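The reset can be reproduced in isolation. The following is only a minimal sketch: the in-memory database, the three sample file names, and the function name demo_cursor_reset are assumptions made for illustration, not part of the original code.

```python
import sqlite3


def demo_cursor_reset():
    # Hypothetical in-memory table with made-up file names.
    conn = sqlite3.connect(":memory:")
    c = conn.cursor()
    c.execute("CREATE TABLE myTable(files TEXT)")
    c.executemany(
        "INSERT INTO myTable(files) VALUES (?)",
        [("a.txt",), ("b.txt",), ("c.txt",)],
    )

    # Outer query: three rows are pending on the cursor.
    c.execute("SELECT files FROM myTable ORDER BY files")
    first = c.fetchone()

    # Re-using the same cursor discards the two rows still pending above.
    c.execute("SELECT files FROM myTable WHERE files LIKE ?", (first[0] + "%",))
    match = c.fetchone()

    # This fetch reads from the LIKE query, which has no rows left.
    leftover = c.fetchone()

    conn.close()
    return first, match, leftover
```

Here leftover is None even though b.txt and c.txt were never read, which is why the loop in the question stops after the first entry.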

One way to solve this is to use multiple cursors: one to iterate over the file names and another to do the duplicate lookup.

def findDupes(conn):  # pass in your database connection object here
    file_curs = conn.cursor()  # iterates over every file name
    dup_curs = conn.cursor()   # runs the lookup without disturbing file_curs
    file_curs.execute('select files from myTable order by files')
    while True:
        row = file_curs.fetchone()
        if row is None:
            break
        dup_curs.execute('select files from myTable where files like ?', (row[0] + '%',))
        dupe = dup_curs.fetchone()
        print(dupe[0])

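Note that you can do the de-duplication entirely in SQL (see, for example, "Deleting duplicate rows from sqlite database"), but if you are new to SQL you may want to stick with the above.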
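As a rough illustration of pushing the prefix search itself into SQL, the sketch below joins the table to itself. It is one possible approach rather than the method from the linked question, and it assumes myTable is an ordinary rowid table. The function names and sample file names are hypothetical. Note that any % or _ inside a file name is treated as a LIKE wildcard.

```python
import sqlite3


def find_prefix_matches(conn):
    # Pair each file name with every *other* row whose name starts with it.
    query = """
        SELECT a.files, b.files
        FROM myTable AS a
        JOIN myTable AS b
          ON b.files LIKE a.files || '%'
         AND a.rowid <> b.rowid
        ORDER BY a.files, b.files
    """
    return conn.execute(query).fetchall()


def demo_prefix_matches():
    # Hypothetical sample data in an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable(files TEXT)")
    conn.executemany(
        "INSERT INTO myTable(files) VALUES (?)",
        [("report.txt",), ("report.txt.bak",), ("notes.txt",)],
    )
    pairs = find_prefix_matches(conn)
    conn.close()
    return pairs
```

With this sample data the only pair returned is report.txt with report.txt.bak. A row never matches itself because of the rowid comparison.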

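You need to change how you use c. The first time the loop in findDupes() runs, it fetches one row of the file list. Then, within the same iteration, you call c.execute() for the likeness select. When the second iteration happens, the first c.fetchone() fetches a row from your likeness query rather than from the original ordered all-files query you issued outside the loop.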

Use different variables or cursors for the two queries.

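Your problem is that inside the loop you call row = c.fetchone() every time, which returns a row from the most recently executed query. On the second pass through the loop, that is the result of c.execute('select files from myTable where files like ?',(row[0]+'%',)), from which one row has already been fetched (so with your current code you are actually setting row to the second result of that query, which I think probably returns None and breaks the loop).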

Try this:

def findDupes():
    c.execute('select files from myTable order by files')
    rows = c.fetchall()
    for row in rows:
        c.execute('select files from myTable where files like ?',(row[0]+'%',))
        dupe = c.fetchone()
        print (dupe[0])

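You can solve this another way by having the database server do the counting for you. That way you run only one query, instead of the inefficient approach of fetching all the files and then checking each one individually: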

def find_dupes():
    rows = c.execute('SELECT files, COUNT(*) FROM myTable GROUP BY files HAVING COUNT(*) > 1')
    return [row[0] for row in rows]

dupes = find_dupes()
print('\n'.join(dupes))
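One thing worth checking before adopting the grouped query: GROUP BY only collapses identical values, so it reports exact duplicates and not the prefix matches that the LIKE lookup in the question finds. A small sketch with an in-memory database makes the difference visible. The function names and sample file names are assumptions for illustration.

```python
import sqlite3


def count_exact_duplicates(conn):
    # Map each file name that occurs more than once to its number of rows.
    rows = conn.execute(
        "SELECT files, COUNT(*) FROM myTable "
        "GROUP BY files HAVING COUNT(*) > 1 ORDER BY files"
    )
    return {name: count for name, count in rows}


def demo_exact_duplicates():
    # Hypothetical sample data: one exact duplicate and one prefix match.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable(files TEXT)")
    conn.executemany(
        "INSERT INTO myTable(files) VALUES (?)",
        [("a.txt",), ("a.txt",), ("a.txt.bak",), ("b.txt",)],
    )
    result = count_exact_duplicates(conn)
    conn.close()
    return result
```

In this sample, a.txt is reported with a count of 2, while a.txt.bak is not reported because it only shares a prefix with a.txt.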
