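I have multiple CSV files in which the case column starts from 0.
I want to concatenate them so that each file's case values start at the previous file's last case value + 1.
I know I can write a for loop that reads each CSV file and, on each iteration, adds the last value to the case column.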
import pandas as pd
# List of file names
file_list = ['file1.csv', 'file2.csv', 'file3.csv']
# Read the first file and store it in a DataFrame
df = pd.read_csv(file_list[0])
# Get the last value of the column that you want to continue
last_value = df.iloc[-1]['column_name']
# Loop through the remaining files
for file in file_list[1:]:
    # Read the file into a DataFrame
    df_temp = pd.read_csv(file)
    # Continue the last value from the previous file in the current file
    df_temp['column_name'] += last_value + 1
    last_value = df_temp.iloc[-1]['column_name']
    # Concatenate the current file with the main DataFrame
    df = pd.concat([df, df_temp])
Is it possible to use something like pd.concat(map(pd.read_csv, file_list)) directly?
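It is more efficient to adjust each file's data, append it to a Python list, and concatenate once at the end, rather than concatenating on every iteration: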
import pandas as pd
# List of file names
file_list = ['file1.csv', 'file2.csv', 'file3.csv']
# Collect the adjusted DataFrames here; last_value is the offset for the next file
data = []
last_value = 0
# Loop through files
for file in file_list:
    # Read the file into a DataFrame
    df_temp = pd.read_csv(file)
    # Continue the last value from the previous file in the current file
    df_temp['column_name'] += last_value
    last_value = df_temp.iloc[-1]['column_name'] + 1
    # different here: append the data
    data.append(df_temp)
# Concatenate once, after the loop
df = pd.concat(data)
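To check the loop above without touching the disk, here is a minimal sketch that runs the same offset logic on in-memory CSV text. The helper name concat_with_running_offset, the sample data, and the use of io.StringIO in place of real file paths are assumptions for illustration. It also assumes every file is non-empty and that the last row of each file holds its largest case value.

```python
import io
import pandas as pd


def concat_with_running_offset(sources, column="case"):
    """Read each CSV source, shift `column` by a running offset, concatenate once."""
    frames = []
    offset = 0
    for src in sources:
        df_temp = pd.read_csv(src)
        # Shift this file's values so they continue after the previous file
        df_temp[column] += offset
        # The next file starts one past the last value of this file
        offset = df_temp[column].iloc[-1] + 1
        frames.append(df_temp)
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data standing in for file1.csv, file2.csv, file3.csv
csv_a = "case,value\n0,a\n0,b\n1,c\n"
csv_b = "case,value\n0,d\n1,e\n1,f\n"
csv_c = "case,value\n0,g\n"

result = concat_with_running_offset([io.StringIO(t) for t in (csv_a, csv_b, csv_c)])
print(result["case"].tolist())  # [0, 0, 1, 2, 3, 3, 4]
```

Passing ignore_index=True is a choice made here so that the combined frame gets a fresh 0-based index; drop it if you want to keep each file's original row labels.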
You can try the map function you mentioned:
import pandas as pd
file_list = ['dum1.csv', 'dum2.csv', 'dum3.csv']
# Concatenate the CSV files into a single data frame
df_concatenated = pd.concat(map(pd.read_csv, file_list))
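However, this will not update the column values, so you have to update them either beforehand or afterwards. I'm not quite sure about the structure of your df, but you can try: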
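import pandas as pd
file_list = ['dum1.csv', 'dum2.csv', 'dum3.csv']
# Running offset for the case values; it must be advanced after every file
case_counter = 0
frames = []
for df in map(pd.read_csv, file_list):
    # Shift this file's case values by the current offset
    df = df.assign(case=df['case'] + case_counter)
    # The next file starts one past this file's last case value
    case_counter = df['case'].iloc[-1] + 1
    frames.append(df)
# Concatenate CSV files into a single data frame
df_concat = pd.concat(frames)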
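The "update beforehand" idea can also be written without a mutable counter, by computing every file's offset up front. The following is only a sketch under assumptions: the in-memory sample texts stand in for dum1.csv, dum2.csv and so on, the names sizes and offsets are made up here, and each file's span is taken as max(case) + 1 rather than the last row's value.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the CSV files
texts = [
    "case,value\n0,a\n0,b\n1,c\n",
    "case,value\n0,d\n1,e\n1,f\n",
    "case,value\n0,g\n",
]
frames = [pd.read_csv(io.StringIO(t)) for t in texts]

# Number of case values each file occupies (assumes case starts at 0 in every file)
sizes = pd.Series([f["case"].max() + 1 for f in frames])
# Offset of file i is the total size of files 0..i-1
offsets = sizes.cumsum().shift(fill_value=0)

df_concat = pd.concat(
    [f.assign(case=f["case"] + off) for f, off in zip(frames, offsets)],
    ignore_index=True,
)
print(offsets.tolist())            # [0, 2, 4]
print(df_concat["case"].tolist())  # [0, 0, 1, 2, 3, 3, 4]
```

Because the offsets depend only on each file's own maximum, the list comprehension no longer relies on state that changes between iterations.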
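Alternatively, you can concatenate the DataFrames with map and then renumber the case column:
df = pd.concat(map(pd.read_csv, file_list), ignore_index=True)
df['case'] = range(df.shape[0])
This gives the concatenated DataFrame a fresh 0-based index and overwrites the case column with consecutive integers from 0 to the number of rows minus 1, so it only fits data in which every row is its own case. Dropping the old index, rather than calling reset_index() and renaming the resulting index column to case, avoids ending up with two columns named case when the files already contain one. The column is assigned directly on the existing DataFrame, so no additional DataFrame has to be created.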
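Renumbering with range gives every row its own number, which is not what you want if one case can span several rows. As a hedged alternative (a different technique from the one above, not something the question specifies), the sketch below tags each row with a hypothetical source column and numbers each (source, case) pair with groupby(...).ngroup(). The sample data is made up, and the sketch assumes the case values inside each file are already in ascending order.

```python
import io
import pandas as pd

# Hypothetical sample data: some cases span more than one row
texts = [
    "case,value\n0,a\n0,b\n1,c\n",
    "case,value\n0,d\n1,e\n1,f\n",
]
# Tag every row with the position of the file it came from
frames = [pd.read_csv(io.StringIO(t)).assign(source=i) for i, t in enumerate(texts)]
df = pd.concat(frames, ignore_index=True)

# Rows sharing the same (source, case) pair receive the same new number
df["case"] = df.groupby(["source", "case"], sort=False).ngroup()
df = df.drop(columns="source")
print(df["case"].tolist())  # [0, 0, 1, 2, 3, 3]
```

Note that this numbers the distinct cases consecutively, so gaps in a file's original case values are closed rather than preserved.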