Looping over a list of DataFrames and writing each element to a new .csv file on disk



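I have a list of DataFrames and am trying to export each one to a folder on disk using the pandas DataFrame.to_csv method. However, only the last item in the list is written to disk as a .csv file. Please see the code below: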

import pandas as pd
import os
import datetime
from pathlib import Path
CSV_Folder = Path(r'C:/PA_Boundaries/Tests')
Output = r'C:/PA_Boundaries/test_output'
today = datetime.date.today()
date = today.strftime('%Y%m%d')
try:
    dfs = []
    for file in os.listdir(CSV_Folder):
        df = pd.read_csv(CSV_Folder / file)
        dfs.append(df)

    new_dfs = []
    for df in dfs:
        new_df = pd.DataFrame()
        new_df['Original Addr string'] = df['StreetConc']
        new_df['Addr #'] = df['AddNum']
        new_df['Prefix'] = df['StPreDir']
        new_df['Street Name'] = df['StName']
        new_df['StreetType'] = df['StType']
        new_df['Suffix'] = df['StDir']
        new_df['Multi-Unit'] = ''
        new_df['City'] = df['City']
        new_df['Zip Code'] = df['PostCode']
        new_df['4'] = df['PostalExt']
        new_df['County'] = df['CountyID']
        new_df['Addr Type'] = ''
        new_df['Precint Part Name'] = ''
        new_df['Lat'] = df['X']
        new_df['Long'] = df['Y']

        replaced_address_names = []
        for index, row in new_df.iterrows():
            new_row = row['Original Addr string'].replace(',', ' ')
            replaced_address_names.append(new_row)

        new_df['Original Addr string'] = replaced_address_names

        county_id = df.iloc[0, 37]

        new_dfs.append(new_df)

    for i in range(len(new_dfs)):
        new_dfs[i].to_csv(f'{Output}ADDR_{county_id}_{date}.csv', index=False)
except FileNotFoundError:
    print(f'{file} not found in {CSV_Folder}')
except PermissionError:
    print('Check syntax of paths')
else:
    print('Process Complete')

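new_dfs contains the correct number of DataFrames. However, when I loop over the new list and call .to_csv on each item, only the last item in the list is written to disk.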

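The problem is the way you name the exported files. Once the loop has finished, county_id holds only the last value assigned to it, that is, the county_id of the last df iterated.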

Because each exported file is named {Output}ADDR_{county_id}_{date}.csv, every export gets the same county_id and the same date. In other words, each write overwrites the previous file.
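The collision can be reproduced without pandas. The following is only a minimal sketch: the ID values and the date string are made up for illustration and do not come from the original data.

```python
# Hypothetical stand-ins for the county IDs read from each input file.
county_ids_seen = ["001", "003", "005"]
date = "20240101"  # placeholder date string

# First pass: county_id is overwritten on every iteration.
for county_id in county_ids_seen:
    pass

# Second pass: county_id still holds the last value, so every name is identical.
colliding = [f"ADDR_{county_id}_{date}.csv" for _ in county_ids_seen]

# Keeping one ID per frame gives one distinct name per frame.
distinct = [f"ADDR_{cid}_{date}.csv" for cid in county_ids_seen]

print(len(set(colliding)))  # number of unique file names when the ID is reused
print(len(set(distinct)))   # number of unique file names with one ID per frame
```

Three writes to a single file name leave only one file on disk, which matches the behavior described in the question.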

To avoid this, you can create a new list called county_ids and use it in the final loop to give each saved file its own name. This would be your resulting code:

import pandas as pd
import os
import datetime
from pathlib import Path
CSV_Folder = Path(r'C:/PA_Boundaries/Tests')
Output = r'C:/PA_Boundaries/test_output/'  # trailing slash so the file name is not fused with the folder name
today = datetime.date.today()
date = today.strftime('%Y%m%d')
try:
    dfs = []
    for file in os.listdir(CSV_Folder):
        df = pd.read_csv(CSV_Folder / file)
        dfs.append(df)

    new_dfs, county_ids = [], []
    for df in dfs:
        new_df = pd.DataFrame()
        new_df['Original Addr string'] = df['StreetConc']
        new_df['Addr #'] = df['AddNum']
        new_df['Prefix'] = df['StPreDir']
        new_df['Street Name'] = df['StName']
        new_df['StreetType'] = df['StType']
        new_df['Suffix'] = df['StDir']
        new_df['Multi-Unit'] = ''
        new_df['City'] = df['City']
        new_df['Zip Code'] = df['PostCode']
        new_df['4'] = df['PostalExt']
        new_df['County'] = df['CountyID']
        new_df['Addr Type'] = ''
        new_df['Precint Part Name'] = ''
        new_df['Lat'] = df['X']
        new_df['Long'] = df['Y']

        replaced_address_names = []
        for index, row in new_df.iterrows():
            new_row = row['Original Addr string'].replace(',', ' ')
            replaced_address_names.append(new_row)

        new_df['Original Addr string'] = replaced_address_names

        # Keep one county ID per frame, in the same order as new_dfs.
        county_ids.append(df.iloc[0, 37])
        new_dfs.append(new_df)

    for i in range(len(new_dfs)):
        # Index into county_ids so each frame gets its own file name.
        new_dfs[i].to_csv(f'{Output}ADDR_{county_ids[i]}_{date}.csv', index=False)
except FileNotFoundError:
    print(f'{file} not found in {CSV_Folder}')
except PermissionError:
    print('Check syntax of paths')
else:
    print('Process Complete')
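One possible variation on the final loop, offered only as a sketch: zip can pair each frame with its ID, and pathlib can join the folder and file name. An f-string such as f'{Output}ADDR_...' is plain concatenation, so a folder string without a trailing slash gets fused with the file name; joining with Path avoids that. The helper name build_output_path and the placeholder values below are hypothetical.

```python
from pathlib import Path


def build_output_path(output_dir, county_id, date_str):
    """Return output_dir / 'ADDR_<county_id>_<date_str>.csv' as a Path."""
    return Path(output_dir) / f"ADDR_{county_id}_{date_str}.csv"


# Placeholder values standing in for new_dfs and county_ids.
frames = ["frame_a", "frame_b"]
ids = [17, 42]

# zip pairs each frame with its own ID, so no index bookkeeping is needed.
targets = [
    build_output_path("test_output", cid, "20240101")
    for _, cid in zip(frames, ids)
]
print([t.name for t in targets])
```

With real DataFrames, the loop body would call frame.to_csv(target, index=False) for each pair.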

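Obviously I can't test this, so if you run it, some lines may need tweaking. However, I would write the code as shown below. Basically, I would call a function to do the replacement, since each file is opened and then written straight back out.

If you can get it working, it will probably be faster and read better, since there are fewer lines.

Example: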

import pandas as pd
import os
import datetime
from pathlib import Path
CSV_Folder = Path(r'C:/PA_Boundaries/Tests')
Output = r'C:/PA_Boundaries/test_output/'
today = datetime.date.today()
date = today.strftime('%Y%m%d')
def updateFrame(f):
    new_df = pd.DataFrame()
    new_df['Original Addr string'] = f['StreetConc']
    new_df['Addr #'] = f['AddNum']
    new_df['Prefix'] = f['StPreDir']
    new_df['Street Name'] = f['StName']
    new_df['StreetType'] = f['StType']
    new_df['Suffix'] = f['StDir']
    new_df['Multi-Unit'] = ''
    new_df['City'] = f['City']
    new_df['Zip Code'] = f['PostCode']
    new_df['4'] = f['PostalExt']
    new_df['County'] = f['CountyID']
    new_df['Addr Type'] = ''
    new_df['Precint Part Name'] = ''
    new_df['Lat'] = f['X']
    new_df['Long'] = f['Y']

    # better way to replace without looping the rows...
    new_df['Original Addr string'] = new_df['Original Addr string'].str.replace(',', ' ')

    return new_df

for file in os.listdir(CSV_Folder):
    working_file = str(CSV_Folder) + '/' + file
    if working_file.endswith('.csv'):
        try:
            df = pd.read_csv(working_file)
            county_id = str(df.iloc[0, 37])
            # the function returns a frame so you can treat it as such...
            updateFrame(df).to_csv(f'{Output}ADDR_{county_id}_{date}.csv', index=False)

        except FileNotFoundError:
            print(f'{file} not found in {CSV_Folder}')
        except PermissionError:
            print('Check syntax of paths')
        else:
            print('Process Complete')
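The read-transform-write approach can be checked end to end on throwaway data. This is a rough sketch under several assumptions, not the poster's code: COLUMN_MAP covers only three of the columns, the county ID is read from the CountyID column by name (assuming that is what position 37 holds), and the names reshape and export_all are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical subset of the column mapping (source name -> output name).
COLUMN_MAP = {"StreetConc": "Original Addr string", "City": "City", "CountyID": "County"}


def reshape(df):
    """Select and rename the mapped columns, then replace commas in the address."""
    out = df[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)
    out["Original Addr string"] = out["Original Addr string"].str.replace(",", " ", regex=False)
    return out


def export_all(in_dir, out_dir, date_str):
    """Write one reshaped CSV per input CSV and return the paths written."""
    written = []
    for csv_path in sorted(Path(in_dir).glob("*.csv")):
        df = pd.read_csv(csv_path)
        county_id = df["CountyID"].iloc[0]  # assumption: by name instead of position 37
        target = Path(out_dir) / f"ADDR_{county_id}_{date_str}.csv"
        reshape(df).to_csv(target, index=False)
        written.append(target)
    return written


# Small self-check with two made-up input files in temporary folders.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for cid in (1, 2):
        sample = pd.DataFrame({"StreetConc": ["1 Main St, Apt 2"], "City": ["Erie"], "CountyID": [cid]})
        sample.to_csv(Path(src) / f"in_{cid}.csv", index=False)
    written_names = [p.name for p in export_all(src, dst, "20240101")]
    first_address = pd.read_csv(Path(dst) / written_names[0])["Original Addr string"].iloc[0]

print(written_names)
```

Because the file name is built inside the same loop iteration that reads the frame, each input file produces its own output file, provided the county IDs differ between inputs.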