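I am writing a script that stores file paths in sublists. Suppose I have 400 file paths in a list, and each path follows the naming pattern Ci_whatever.csv, so my list of paths looks like this:
pathlist=[C1_01.csv,C1_02.csv,...,Cn_01.csv,Cn_02.csv]
I want to end up with the paths grouped by their Ci prefix, in a list pathlistf:
pathlistf=[[C1_01.csv,C1_02.csv,...],[C2_01.csv,C2_02.csv,...],...,[Cn_01.csv,Cn_02.csv,...]]
I don't know how to regroup the paths this way.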
Hello again, I have run into a problem that is very similar to the previous case. Suppose I have the following list of paths:
path=[case1_Qxxx_cap1_whatever.csv,case1_Qxxx_cap1_whatever2.csv,case1_Qxxx_cap1_whatever3.csv,....,case1_Qxxx_cap2_whatever.csv,case1_Qxxx_cap2_whatever2.csv,case1_Qxxx_cap2_whatever3.csv,case2_Qxxx_cap1_whatever.csv,case2_Qxxx_cap1_whatever2.csv,...,case2_Qxxx_cap2_whatever.csv,case2_Qxxx_cap2_whatever2.csv]
I want this:
pathf=[[[case1_Qxxx_cap1_whatever.csv,case1_Qxxx_cap1_whatever2.csv,...],[case1_Qxxx_cap2_whatever.csv,case1_Qxxx_cap2_whatever2.csv,...]],[[case2_Qxxx_cap1_whatever.csv,case2_Qxxx_cap1_whatever2.csv,...],[case2_Qxxx_cap2_whatever.csv,case2_Qxxx_cap2_whatever2.csv,...]]]
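If pathlist is sorted first, you can group it with itertools.groupby, as in the following code. Note that the key a[:2] assumes the Ci prefix is exactly two characters long.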
from itertools import groupby
pathlist=['Cn_01.csv', 'C1_02.csv', 'C9_01.csv', 'C9_02.csv', 'Ca_01.csv', 'C9_03.csv', 'Ca_02.csv', 'C1_01.csv', 'Cn_02.csv']
pathlist.sort()
groupedfilenames = (list(g) for _, g in groupby(pathlist, key=lambda a: a[:2]))
print(list(groupedfilenames))
Output:
[['C1_01.csv', 'C1_02.csv'], ['C9_01.csv', 'C9_02.csv', 'C9_03.csv'], ['Ca_01.csv', 'Ca_02.csv'], ['Cn_01.csv', 'Cn_02.csv']]
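With 400 files the prefix will probably grow past two characters (C10, C11, ...), where both a[:2] and a plain sort() stop working as intended. Here is a minimal sketch of the same groupby idea, assuming the prefix is everything before the first underscore. The function name group_by_prefix and its helpers are made up for illustration; they are not part of the answer above.

```python
import re
from itertools import groupby


def group_by_prefix(paths):
    """Group file names by the text before the first underscore."""

    def prefix(name):
        # 'C10_01.csv' -> 'C10'
        return name.split("_", 1)[0]

    def natural(text):
        # Split into alternating non-digit and digit runs and turn the
        # digit runs into ints, so that 'C2' sorts before 'C10'.
        return [int(tok) if tok.isdigit() else tok
                for tok in re.split(r"(\d+)", text)]

    def sort_key(name):
        p = prefix(name)
        # Sorting on the prefix first keeps every group contiguous,
        # which groupby requires.
        return (natural(p), p, name)

    ordered = sorted(paths, key=sort_key)
    return [list(group) for _, group in groupby(ordered, key=prefix)]


if __name__ == "__main__":
    sample = ["C10_01.csv", "C2_02.csv", "C1_01.csv",
              "C2_01.csv", "C10_02.csv", "C1_02.csv"]
    print(group_by_prefix(sample))
```

If the list holds full paths rather than bare file names, the prefix would have to be taken from the file name part only, since directory names may contain underscores too.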
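One approach is to build a dictionary that uses the Ci part as the key and the list of file names starting with that Ci as the value. For example, take pathlist = ['C1_01.csv','C1_02.csv', 'C2_01.csv' , 'C3_01.csv', 'C2_02.csv']; we then build a dictionary that stores
{'C1': ['C1_01.csv', 'C1_02.csv'], 'C2': ['C2_01.csv', 'C2_02.csv'], 'C3': ['C3_01.csv']}
Here is the code: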
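pathlist = ['C1_01.csv','C1_02.csv', 'C2_01.csv' , 'C3_01.csv', 'C2_02.csv']
d = {}
for path in pathlist:
    # The first two characters (the Ci part) are the dictionary key
    if path[:2] not in d:
        d[path[:2]] = [path]
    else:
        d[path[:2]].append(path)
# Collect the grouped lists into the final nested list
pathlistf = []
for key in d:
    pathlistf.append(d[key])
print(pathlistf)
# Output (Python 3.7+, where dicts keep insertion order):
# [['C1_01.csv', 'C1_02.csv'], ['C2_01.csv', 'C2_02.csv'], ['C3_01.csv']]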
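The dictionary approach can also be written without the if/else branch by using collections.defaultdict. The sketch below is only a variant of the idea, not the answerer's code: the name group_paths and the sep parameter are assumptions, and the key is taken as everything before the first separator rather than the first two characters.

```python
from collections import defaultdict


def group_paths(paths, sep="_"):
    """Group paths by the part before the first `sep`, keeping input order."""
    groups = defaultdict(list)  # a missing key starts out as an empty list
    for path in paths:
        key = path.split(sep, 1)[0]  # 'C1_01.csv' -> 'C1'
        groups[key].append(path)
    # Dicts keep insertion order (Python 3.7+), so groups appear in the
    # order in which each prefix was first seen.
    return list(groups.values())


if __name__ == "__main__":
    pathlist = ['C1_01.csv', 'C1_02.csv', 'C2_01.csv', 'C3_01.csv', 'C2_02.csv']
    print(group_paths(pathlist))
```

Unlike groupby, this does not need the input to be sorted first.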
Hope this solves the problem. Feel free to ask any questions.
Does this function help you?
Best!
import traceback
import pandas as pd
from typing import List

def organize_paths(path_lst: List[str], sep: str):
    """
    Method to create a DataFrame from csv_path items
    Args:
        path_lst (List[str]): list with paths in string format
        sep (str): separator for split method
    Returns:
        df (DataFrame): matrix with the splitting info
    """
    lst = list()
    print(f'(SUCCESS organize_paths) -> starting process -> input: ({path_lst},{sep})')
    try:
        for item in path_lst:
            # e.g. 'C1_01.csv' -> ['C1', '01.csv']
            calc_data = item.split(sep)
            # One row per file; the prefix becomes the column name
            lst.append({calc_data[0]: calc_data[1]})
        df = pd.DataFrame(lst)
        print(f'(SUCCESS organize_paths) -> finishing process -> input: ({path_lst},{sep}) -> output sample: {df.head().to_dict()}')
        return df
    except Exception:
        # On failure the traceback is printed and the function returns None
        print(f'(ERROR organize_paths) -> finishing process -> input: ({path_lst},{sep}) -> exception: {traceback.format_exc()}')

pathlist=['C1_01.csv','C1_02.csv','C2_01.csv','C2_02.csv']
df = organize_paths(path_lst=pathlist,sep='_')
# display(df) to validate calculations
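
For the follow-up question (grouping first by case and then by cap), the same dictionary idea can be nested one level deeper. This is only a sketch under assumptions: it treats the file names as underscore-separated fields with the case in field 0 and the cap in field 2, as in case1_Qxxx_cap1_whatever.csv. The name nest_paths and its parameters are hypothetical.

```python
from collections import defaultdict


def nest_paths(paths, sep="_", outer_field=0, inner_field=2):
    """Group paths by one field, then group each of those groups by another."""
    # Outer key (e.g. 'case1') -> inner key (e.g. 'cap1') -> list of paths
    nested = defaultdict(lambda: defaultdict(list))
    for path in paths:
        fields = path.split(sep)
        nested[fields[outer_field]][fields[inner_field]].append(path)
    # Drop the keys and keep only the nested lists, in first-seen order
    return [list(inner.values()) for inner in nested.values()]


if __name__ == "__main__":
    path = [
        "case1_Qxxx_cap1_a.csv",
        "case1_Qxxx_cap2_a.csv",
        "case2_Qxxx_cap1_a.csv",
        "case1_Qxxx_cap1_b.csv",
        "case2_Qxxx_cap2_a.csv",
    ]
    print(nest_paths(path))
```

A name with fewer fields than expected raises IndexError here, which is probably what you want when one of the 400 files does not follow the pattern.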