I have a bunch of file names like these:
tif_files = av_v5_1983_001.tif, av_v5_1983_002.tif, av_v5_1983_003.tif...av_v5_1984_001.tif, av_v5_1984_002.tif...av_v5_2021_001.tif, av_v5_2021_002.tif
However, their order is not guaranteed.
I want to sort them by name so that files from the same year end up together. When I do this
sorted(tif_files, key=lambda x:x.split('_')[-1][:-4])
I get the following result:
av_v5_1983_001.tif, av_v5_1984_001.tif, av_v5_1985_001.tif...av_v5_2021_001.tif
But I want this:
av_v5_1983_001.tif, av_v5_1983_002.tif, av_v5_1983_003.tif...av_v5_1984_001.tif, av_v5_1984_002.tif...av_v5_2021_001.tif, av_v5_2021_002.tif
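Use the slice [2:] as the key instead. For a name such as av_v5_1984_001.tif it gives ['1984', '001.tif'], so the year is compared first and the index second: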
tif_files = ['av_v5_1983_001.tif', 'av_v5_1983_002.tif', 'av_v5_1983_003.tif',
             'av_v5_1984_001.tif', 'av_v5_1984_002.tif', 'av_v5_2021_001.tif', 'av_v5_2021_002.tif']
sorted(tif_files, key=lambda x: x.split('_')[2:])
# ['av_v5_1983_001.tif',
# 'av_v5_1983_002.tif',
# 'av_v5_1983_003.tif',
# 'av_v5_1984_001.tif',
# 'av_v5_1984_002.tif',
# 'av_v5_2021_001.tif',
# 'av_v5_2021_002.tif']
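The reason this works is that Python compares lists element by element, so the year string decides the order and the index part only breaks ties. A minimal sketch of that behaviour, where the helper name year_and_index and the sample names are made up for illustration:

```python
# Hypothetical helper; it returns the same slice used as the sort key above.
def year_and_index(name):
    return name.split('_')[2:]

tif_files = ['av_v5_1984_001.tif', 'av_v5_1983_002.tif', 'av_v5_1983_001.tif']

# Each key is a [year, index] list; lists compare element by element.
keys = [year_and_index(name) for name in tif_files]
ordered = sorted(tif_files, key=year_and_index)
print(keys)
print(ordered)
```

Note that the comparison is still a string comparison, so it relies on the year and index being zero-padded to a fixed width.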
If you have v1, v2, ... v5, and so on, you also need to take the version number into account, like this:
tif_files = ['av_v1_1983_001.tif', 'av_v5_1983_002.tif', 'av_v6_1983_002.tif','av_v5_1984_001.tif', 'av_v5_1984_002.tif', 'av_v4_2021_001.tif','av_v5_2021_001.tif', 'av_v5_2021_002.tif', 'av_v4_1984_002.tif']
sorted(tif_files, key=lambda x: [x.split('_')[2:], x.split('_')[1]])
Output:
['av_v1_1983_001.tif',
'av_v5_1983_002.tif',
'av_v6_1983_002.tif',
'av_v5_1984_001.tif',
'av_v4_1984_002.tif',
'av_v5_1984_002.tif',
'av_v4_2021_001.tif',
'av_v5_2021_001.tif',
'av_v5_2021_002.tif']
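One caveat with this key: the version is compared as a string, so 'v10' would sort before 'v2'. If versions can reach two digits, a sketch like the following converts each part to an integer first. The regular expression and the helper name sort_key are assumptions based on the av_v<version>_<year>_<index>.tif pattern in the question, not part of the original answer.

```python
import re

# Assumed pattern: <prefix>_v<version>_<year>_<index>.tif
PATTERN = re.compile(r'_v(\d+)_(\d+)_(\d+)\.tif$')

def sort_key(name):
    match = PATTERN.search(name)
    if match is None:
        raise ValueError(f'unexpected file name: {name}')
    version, year, index = (int(part) for part in match.groups())
    # Year first, then index, then version, mirroring the key above.
    return (year, index, version)

tif_files = ['av_v10_1983_001.tif', 'av_v2_1983_001.tif', 'av_v5_1982_003.tif']
ordered = sorted(tif_files, key=sort_key)
print(ordered)
```

Raising on an unexpected name is a design choice for the sketch; silently skipping such files may suit other cases better.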
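Your key sorts by the 00x index rather than by the year, because x.split('_')[-1] yields 001.tif and so on. Make the year the primary sort key and the index the secondary one. Since sorted is stable, sort by the index first and then by the year:
# Secondary key first: the 00x index
by_index = sorted(tif_files, key=lambda x: x.split('_')[-1][:-4])
# Primary key last: the year; files within a year keep their index order
sorted(by_index, key=lambda x: x.split('_')[2])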
As long as your naming convention stays consistent, you should be able to sort purely alphanumerically. So the following code should work:
sorted(tif_files)
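This works because every name shares the same prefix and the year and index are zero-padded to a fixed width, so string order matches numeric order. A quick check on made-up sample names, which also shows how a second prefix (the invented 'bv') changes the result:

```python
same_prefix = ['av_v5_1984_001.tif', 'av_v5_1983_002.tif', 'av_v5_1983_001.tif']
# With one shared prefix, plain sorting already orders by year, then index.
print(sorted(same_prefix))

# 'bv' is a made-up second prefix: plain sorting now groups by prefix first.
mixed_prefix = same_prefix + ['bv_v5_1982_001.tif']
plain = sorted(mixed_prefix)
by_year = sorted(mixed_prefix, key=lambda name: name.split('_')[2:])
print(plain)
print(by_year)
```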
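If you want to sort by the last two numbers in the file name while ignoring the prefix, you need something a little fancier that splits out those numbers and lets you sort on them. You could use code like this: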
import pandas as pd
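# The last field looks like "001.tif", so strip the extension before converting to int
tif_files_list = [[xx, int(xx.split("_")[2]), int(xx.split("_")[3].split(".")[0])] for xx in tif_files]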
tif_files_frame = pd.DataFrame(tif_files_list, columns=["Name", "Primary Index", "Secondary Index"])
tif_files_frame_ordered = tif_files_frame.sort_values(["Primary Index", "Secondary Index"], axis=0)
tif_files_ordered = tif_files_frame_ordered["Name"].tolist()
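This splits the numbers in each name into separate columns of a pandas DataFrame and sorts the rows on those columns, after which you can pull out the ordered name column on its own.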
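If pulling in pandas only for sorting feels heavy, the same year-then-index ordering can be sketched with the built-in sorted and an integer tuple key. The helper name numeric_key is made up here, and the sketch assumes the year and index are the last two underscore-separated fields before the extension.

```python
def numeric_key(name):
    # Drop the extension, then take the last two underscore-separated fields.
    stem = name.rsplit('.', 1)[0]
    year, index = stem.split('_')[-2:]
    return int(year), int(index)

# Sample names without zero padding, where string comparison would misorder 9 and 10.
tif_files = ['av_v5_1984_2.tif', 'av_v5_1983_10.tif', 'av_v5_1983_9.tif']
tif_files_ordered = sorted(tif_files, key=numeric_key)
print(tif_files_ordered)
```

Converting to int matters mainly when the numbers are not zero-padded; with padded names the string comparison gives the same order.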
If key returns a tuple of two values, sorted will order by the first value and use the second to break ties. See: https://stackoverflow.com/a/5292332/9532450
tif_files = [
    "hea_der_1983_002.tif",
    "hea_der_1983_001.tif",
    "hea_der_1984_002.tif",
    "hea_der_1984_001.tif",
]
def parse(filename: str) -> tuple[str, str]:
    split = filename.split("_")
    return split[2], split[3]
sort = sorted(tif_files, key=parse)
print(sort)
Output: ['hea_der_1983_001.tif', 'hea_der_1983_002.tif', 'hea_der_1984_001.tif', 'hea_der_1984_002.tif']
Right-click the folder and click Sort by >> Name.