Python: sorting a single CSV file by multiple fields and removing rows that repeat consecutively in the file; how can I do this with sqlite?



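Hi, I have the code below for processing a CSV file that contains multiple rows per user and has to be sorted by several columns and by date. If columns 2, 3 and 5 (zero-based, as in the code) of the current row are the same as in the previous row, the current row is removed as a duplicate. The output is written to two files, one for the non-duplicates and one for the duplicates.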

Here are some entries from the file:

shift1,2021-02-14 06:35:00,J,P2,***USER16-J-P2,USER16
shift1,2021-02-15 07:35:00,J9,P2,***USER16-J9-P2,USER16
shift1,2021-02-17 06:35:00,J,P3,***USER16-J-P3,USER16
shift1,2021-02-18 07:35:00,J9,P2,***USER16-J9-P2,USER16
shift1,2021-02-19 06:35:00,J,P1,***USER16-J-P1,USER16
shift1,2021-02-22 07:35:00,J9,P2,***USER16-J9-P2,USER16
shift1,2021-02-23 07:35:00,J9,P2,***USER16-J9-P2,USER16
shift1,2021-02-25 06:35:00,J,P3,***USER16-J-P3,USER16
shift1,2021-02-26 06:35:00,J,P3,***USER16-J-P3,USER16
shift1,2021-02-27 06:35:00,J,P2,***USER16-J-P2,USER16
...
shift1,2021-02-17 07:35:00,J9,P3,***USER23-J9-P3,USER23
shift1,2021-02-18 07:35:00,J9,P3,***USER23-J9-P3,USER23
shift1,2021-02-19 06:35:00,J,P1,***USER23-J-P1,USER23
shift1,2021-02-19 22:55:00,N,P1,***USER23-N-P1,USER23
shift1,2021-02-21 06:35:00,J,P3,***USER23-J-P3,USER23
shift1,2021-02-22 22:55:00,N,P2,***USER23-N-P2,USER23
shift1,2021-02-23 22:55:00,N,P2,***USER23-N-P2,USER23
shift1,2021-02-24 22:55:00,N,P2,***USER23-N-P2,USER23
shift1,2021-02-26 07:35:00,J9,P2,***USER23-J9-P2,USER23

The result looks like this:

shift1,2021-02-14 06:35:00,J,P2,***USER16-J-P2,USER16
shift1,2021-02-15 07:35:00,J9,P2,***USER16-J9-P2,USER16
shift1,2021-02-17 06:35:00,J,P3,***USER16-J-P3,USER16
shift1,2021-02-18 07:35:00,J9,P2,***USER16-J9-P2,USER16
shift1,2021-02-19 06:35:00,J,P1,***USER16-J-P1,USER16
shift1,2021-02-22 07:35:00,J9,P2,***USER16-J9-P2,USER16
shift1,2021-02-25 06:35:00,J,P3,***USER16-J-P3,USER16
shift1,2021-02-27 06:35:00,J,P2,***USER16-J-P2,USER16
...
shift1,2021-02-15 14:35:00,S,P1,***USER23-S-P1,USER23
shift1,2021-02-17 07:35:00,J9,P3,***USER23-J9-P3,USER23
shift1,2021-02-19 06:35:00,J,P1,***USER23-J-P1,USER23
shift1,2021-02-19 22:55:00,N,P1,***USER23-N-P1,USER23
shift1,2021-02-21 06:35:00,J,P3,***USER23-J-P3,USER23
shift1,2021-02-22 22:55:00,N,P2,***USER23-N-P2,USER23
shift1,2021-02-26 07:35:00,J9,P2,***USER23-J9-P2,USER23

The code is as follows:

import csv
entries = []
last_entry = [None, None, None]
check = [None, None, None]
duplicate_entries = []
with open('test.txt', 'r') as my_file:
    for line in my_file:
        columns = line.strip().split(',')
        check[0] = columns[2]
        check[1] = columns[3]
        check[2] = columns[5]
        if check != last_entry:
            if columns[2] not in entries:
                last_entry[0] = columns[2]
            if columns[3] not in entries:
                last_entry[1] = columns[3]
            if columns[5] not in entries:
                last_entry[2] = columns[5]
            if columns[1] not in entries:
                entries.append(columns)
        else:
            duplicate_entries.append(columns)
# writes entries to
with open('test_out.txt', 'w') as out_csv_file:
    text_out = csv.writer(out_csv_file, delimiter=",")
    for result in entries:
        text_out.writerow(result)

# writing out duplicates from duplicate_entries
with open('test_dups.txt', 'w') as out_dups_file:
    text_out = csv.writer(out_dups_file, delimiter=",")
    for result in duplicate_entries:
        text_out.writerow(result)

Since I am very new to Python and to programming, I would like to know how this can be improved, and how to do the same thing with sqlite3 in Python.

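If you only need to check whether the previous row has the same values in certain columns, the following approach is probably sufficient. You can use Python's operator.itemgetter() to extract the values you need to compare. You can also read your existing file with csv.reader() instead of using split(','):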

from operator import itemgetter
import csv
entries = []
duplicate_entries = []
last_entry = [None, None, None]
req_cols = itemgetter(2, 3, 5)
with open('v1.txt', 'r') as f_input:
    csv_input = csv.reader(f_input)

    for row in csv_input:
        cur_entry = req_cols(row)

        if cur_entry != last_entry:
            last_entry = cur_entry
            entries.append(row)
        else:
            duplicate_entries.append(row)
# writes entries to
with open('test_v2.txt', 'w', newline='') as out_csv_file:
    text_out = csv.writer(out_csv_file)
    text_out.writerows(entries)
# writing out duplicates from duplicate_entries
with open('test_dups.txt', 'w', newline='') as out_dups_file:
    text_out = csv.writer(out_dups_file)
    text_out.writerows(duplicate_entries)

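You should also add the newline='' argument when opening the output files, to avoid extra blank lines in the output (see the csv module documentation). In addition, the writer object's writerows() method can be used to write a list of rows in a single call.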
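
For the sqlite3 part of the question, one possible approach is to load the rows into an in-memory table and let a window function compare each row with the previous one. The sketch below is only an illustration under several assumptions: the table name shifts and the column names (shift, ts, code, post, label, usr) are made up here, the sort order (user, then timestamp) is inferred from the sample data, and the LAG() window function requires SQLite 3.25 or newer.

```python
import sqlite3


def split_consecutive_duplicates(rows):
    """Return (kept, duplicates) for rows of six string fields.

    Table and column names are hypothetical; they are not in the original post.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shifts "
        "(shift TEXT, ts TEXT, code TEXT, post TEXT, label TEXT, usr TEXT)"
    )
    conn.executemany("INSERT INTO shifts VALUES (?, ?, ?, ?, ?, ?)", rows)

    # Within each user, order by timestamp and flag a row when columns 2 and 3
    # (code, post) match the previous row. Partitioning by usr means column 5
    # is equal by construction. The first row of each user has no previous
    # row, so LAG() yields NULL and the CASE falls through to 0.
    query = """
        SELECT shift, ts, code, post, label, usr,
               CASE WHEN code = LAG(code) OVER w AND post = LAG(post) OVER w
                    THEN 1 ELSE 0 END AS is_dup
        FROM shifts
        WINDOW w AS (PARTITION BY usr ORDER BY ts, rowid)
        ORDER BY usr, ts, rowid
    """
    kept, duplicates = [], []
    for *fields, is_dup in conn.execute(query):
        (duplicates if is_dup else kept).append(list(fields))
    conn.close()
    return kept, duplicates
```

The rows can come straight from csv.reader(), and the two returned lists can be written out with writerows() in the same way as in the answer's code.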

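Neither code listing actually sorts the rows; both assume the file is already in order. If it is not, a sort step could run before the duplicate check. This is a minimal sketch, assuming the intended order is user (column 5) and then timestamp (column 1), which is only inferred from the sample data; sort_rows is a hypothetical helper name.

```python
from operator import itemgetter


def sort_rows(rows):
    """Sort rows by user (column 5), then by timestamp (column 1)."""
    # The timestamps look like 'YYYY-MM-DD HH:MM:SS', so plain string
    # comparison already puts them in chronological order.
    return sorted(rows, key=itemgetter(5, 1))
```

Passing list(csv_input) through such a function before the comparison loop would keep the rest of the code unchanged.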