How to filter a data file with open in Python and create a new file



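I have a huge CSV file that I am trying to filter using open.

I know I could use FINDSTR on the command line, but I would like to use Python to create a new, filtered file, or alternatively to get a pandas DataFrame as output.

Here is my code: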

outfile = open('my_file2.csv', 'a')
with open('my_file1.csv', 'r') as f:
    for lines in f:
        if '31/10/2018' in lines:
            print(lines)
        outfile.write(lines)

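The problem is that the resulting output file is identical to the input file, with no filtering applied (the file size is the same).

Thanks, everyone.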

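The problem with your code is the indentation of the last line. It should be inside the if statement, so that only lines containing '31/10/2018' are written.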

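outfile = open('my_file2.csv', 'a')
with open('my_file1.csv', 'r') as f:
    for lines in f:
        if '31/10/2018' in lines:
            print(lines)
            # Indented under the if, so only matching lines are written
            outfile.write(lines)
# Close the output file so buffered data is flushed to disk
outfile.close()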
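
One side effect of a plain substring filter is that the header row is dropped, since it does not contain the date. The following is a minimal sketch of the same filter using the standard csv module, assuming the first line of the file is a header; the function name filter_rows and its parameters are hypothetical, chosen for illustration only.

```python
import csv

def filter_rows(src, dst, needle="31/10/2018"):
    """Copy the header plus every row with `needle` in any field from src to dst.

    Assumes the first line of src is a header row. Returns the number of rows kept.
    """
    kept = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)  # None if the file is empty
        if header is not None:
            writer.writerow(header)
        for row in reader:
            # Same substring test as the loop above, applied field by field
            if any(needle in field for field in row):
                writer.writerow(row)
                kept += 1
    return kept
```

Calling filter_rows('my_file1.csv', 'my_file2.csv') would overwrite my_file2.csv rather than append to it, which avoids duplicated rows when the script is run more than once.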

To filter with pandas and create a DataFrame, do the following:

import pandas as pd
import datetime
# I assume here that the date is in a separate column, named 'Date'
# dayfirst=True so that dates like 31/10/2018 are read as day/month/year
df = pd.read_csv('my_file1.csv', parse_dates=['Date'], dayfirst=True)
# Filter on October 31st 2018
df_filter = df[df['Date'].dt.date == datetime.date(2018, 10, 31)]
# Output to csv
df_filter.to_csv('my_file2.csv', index=False)

(For very large CSV files, take a look at the chunksize parameter of pd.read_csv().)
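
As a rough sketch of what chunked reading could look like: with chunksize set, pd.read_csv() yields DataFrames of at most that many rows, so the whole file never has to fit in memory. This assumes, as above, a column named 'Date' holding text such as 31/10/2018; the function name filter_csv_in_chunks and the default chunk size are hypothetical.

```python
import pandas as pd

def filter_csv_in_chunks(src, dst, date_str="31/10/2018",
                         date_col="Date", chunksize=100_000):
    """Stream src in chunks and write rows whose date_col equals date_str to dst."""
    first = True
    # dtype=str keeps the date column as plain text, so no date parsing is needed
    for chunk in pd.read_csv(src, chunksize=chunksize, dtype={date_col: str}):
        matches = chunk[chunk[date_col] == date_str]
        # First chunk creates the file with a header; later chunks append without one
        matches.to_csv(dst, mode="w" if first else "a", header=first, index=False)
        first = False
```

Comparing the column as text avoids parsing every date in a huge file; if the column also contains times, the comparison would need to be adapted.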

To use with open(....) as f:, you can do the following:

import pandas as pd
filtered_list = []
with open('my_file1.csv', 'r') as f:
    for lines in f:
        if '31/10/2018' in lines:
            print(lines)
            # Strip the trailing newline, then split line by comma into list
            line_data = lines.rstrip('\n').split(',')
            filtered_list.append(line_data)
# Convert to dataframe and export as csv
df = pd.DataFrame(filtered_list)
# The columns have no names here, so skip the numeric header row
df.to_csv('my_file2.csv', index=False, header=False)
