Grouping file names into lists by multiple categories



Given a directory containing the following files:

pcasvm_dataset_window_blackman_nperseg_4096_distance_1_speed_25k
pcasvm_dataset_window_blackman_nperseg_4096_distance_2_speed_25k
pcasvm_dataset_window_blackman_nperseg_8192_distance_1_speed_100k
pcasvm_dataset_window_blackman_nperseg_16384_distance_1_speed_200k
pcasvm_dataset_window_hamming_nperseg_4096_distance_1_speed_25k
pcasvm_dataset_window_hamming_nperseg_8192_distance_5_speed_25k
pcasvm_dataset_window_hann_nperseg_4096_distance_1_speed_25k
...

I can read these in with the following list comprehension:

datasets = [d for d in os.listdir('path/to/dir')]

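However, what I want to do is analyze these datasets in groups, grouped by:

window (e.g. blackman, hann) and nperseg (e.g. 8192, 4096, etc.)

The question is how to achieve this as quickly as possible, given that the real directory contains a large number of datasets.

Would a dictionary be ideal?

For example: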

{
    "blackman": {
        4096: [file1, file2, file3],
        8192: [...],
        ...
    },
    ...
}

If I understand correctly, you can parse the file names with re and group them with dict.setdefault:

import re

file_names = [
    "pcasvm_dataset_window_blackman_nperseg_4096_distance_1_speed_25k",
    "pcasvm_dataset_window_blackman_nperseg_4096_distance_2_speed_25k",
    "pcasvm_dataset_window_blackman_nperseg_8192_distance_1_speed_100k",
    "pcasvm_dataset_window_blackman_nperseg_16384_distance_1_speed_200k",
    "pcasvm_dataset_window_hamming_nperseg_4096_distance_1_speed_25k",
    "pcasvm_dataset_window_hamming_nperseg_8192_distance_5_speed_25k",
    "pcasvm_dataset_window_hann_nperseg_4096_distance_1_speed_25k",
]

# Capture the window name (group 1) and the nperseg value (group 2).
pat = re.compile(r"window_([^_]+)_nperseg_([^_]+)")

out = {}
for name in file_names:
    m = pat.search(name)
    if m:
        # Create the nested dict and list on first use, then append the name.
        out.setdefault(m.group(1), {}).setdefault(m.group(2), []).append(name)

print(out)

This prints (reformatted here for readability):

{
    "blackman": {
        "4096": [
            "pcasvm_dataset_window_blackman_nperseg_4096_distance_1_speed_25k",
            "pcasvm_dataset_window_blackman_nperseg_4096_distance_2_speed_25k",
        ],
        "8192": [
            "pcasvm_dataset_window_blackman_nperseg_8192_distance_1_speed_100k"
        ],
        "16384": [
            "pcasvm_dataset_window_blackman_nperseg_16384_distance_1_speed_200k"
        ],
    },
    "hamming": {
        "4096": [
            "pcasvm_dataset_window_hamming_nperseg_4096_distance_1_speed_25k"
        ],
        "8192": [
            "pcasvm_dataset_window_hamming_nperseg_8192_distance_5_speed_25k"
        ],
    },
    "hann": {
        "4096": ["pcasvm_dataset_window_hann_nperseg_4096_distance_1_speed_25k"]
    },
}
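
Note that the nperseg keys in that result are strings ("4096"), whereas the structure sketched in the question uses integers. If integer keys are preferred, one possible variant is sketched below using collections.defaultdict instead of dict.setdefault. The function name group_by_window_and_nperseg is a hypothetical helper, and the pattern is slightly adapted to match only digits for nperseg so that int() cannot fail; both are assumptions rather than part of the original answer.

```python
import re
from collections import defaultdict

# Adapted pattern (assumption): nperseg is matched as digits only.
PATTERN = re.compile(r"window_([^_]+)_nperseg_(\d+)")


def group_by_window_and_nperseg(names):
    """Group file names into {window: {nperseg (int): [names]}}."""
    groups = defaultdict(lambda: defaultdict(list))
    for name in names:
        match = PATTERN.search(name)
        if match is None:
            continue  # skip names that do not follow the naming scheme
        window = match.group(1)
        nperseg = int(match.group(2))
        groups[window][nperseg].append(name)
    # Return plain dicts so a mistyped key raises KeyError
    # instead of silently creating an empty group.
    return {window: dict(by_nperseg) for window, by_nperseg in groups.items()}


file_names = [
    "pcasvm_dataset_window_blackman_nperseg_4096_distance_1_speed_25k",
    "pcasvm_dataset_window_blackman_nperseg_4096_distance_2_speed_25k",
    "pcasvm_dataset_window_blackman_nperseg_8192_distance_1_speed_100k",
    "pcasvm_dataset_window_blackman_nperseg_16384_distance_1_speed_200k",
    "pcasvm_dataset_window_hamming_nperseg_4096_distance_1_speed_25k",
    "pcasvm_dataset_window_hamming_nperseg_8192_distance_5_speed_25k",
    "pcasvm_dataset_window_hann_nperseg_4096_distance_1_speed_25k",
]

grouped = group_by_window_and_nperseg(file_names)

# Walk the groups in a stable order, e.g. to run one analysis per group.
for window in sorted(grouped):
    for nperseg in sorted(grouped[window]):
        print(window, nperseg, len(grouped[window][nperseg]))
```

Either way, the grouping is a single pass over the file names with one regex search each, so it should scale linearly with the number of files. With integer keys, sorted() orders the nperseg values numerically (4096, 8192, 16384) rather than lexically.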
