Given a directory containing the following files:
pcasvm_dataset_window_blackman_nperseg_4096_distance_1_speed_25k
pcasvm_dataset_window_blackman_nperseg_4096_distance_2_speed_25k
pcasvm_dataset_window_blackman_nperseg_8192_distance_1_speed_100k
pcasvm_dataset_window_blackman_nperseg_16384_distance_1_speed_200k
pcasvm_dataset_window_hamming_nperseg_4096_distance_1_speed_25k
pcasvm_dataset_window_hamming_nperseg_8192_distance_5_speed_25k
pcasvm_dataset_window_hann_nperseg_4096_distance_1_speed_25k
...
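I can read these in with the following comprehension: datasets = [d for d in os.listdir('path/to/dir')]
However, what I want to do is analyse these datasets in groups, grouped by window (e.g. blackman, hann) and by nperseg (e.g. 8192, 4096, etc.).
The question is how to do this as quickly as possible, given that the real set of datasets is large.
Is a dictionary ideal?
For example: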
dict(
    blackman: dict(
        4096: [file1, file2, file3],
        8192: [..., ],
        ...
    ),
    ...
)
If I understand correctly, you can use re to parse the file names and dict.setdefault to group them:
import re

file_names = [
    "pcasvm_dataset_window_blackman_nperseg_4096_distance_1_speed_25k",
    "pcasvm_dataset_window_blackman_nperseg_4096_distance_2_speed_25k",
    "pcasvm_dataset_window_blackman_nperseg_8192_distance_1_speed_100k",
    "pcasvm_dataset_window_blackman_nperseg_16384_distance_1_speed_200k",
    "pcasvm_dataset_window_hamming_nperseg_4096_distance_1_speed_25k",
    "pcasvm_dataset_window_hamming_nperseg_8192_distance_5_speed_25k",
    "pcasvm_dataset_window_hann_nperseg_4096_distance_1_speed_25k",
]

# Capture the window name (group 1) and the nperseg value (group 2).
pat = re.compile(r"window_([^_]+)_nperseg_([^_]+)")

out = {}
for name in file_names:
    m = pat.search(name)
    if m:
        # Create the inner dict and list on first use, then append the name.
        out.setdefault(m.group(1), {}).setdefault(m.group(2), []).append(name)

print(out)
Prints (formatted here for readability):
{
    "blackman": {
        "4096": [
            "pcasvm_dataset_window_blackman_nperseg_4096_distance_1_speed_25k",
            "pcasvm_dataset_window_blackman_nperseg_4096_distance_2_speed_25k",
        ],
        "8192": [
            "pcasvm_dataset_window_blackman_nperseg_8192_distance_1_speed_100k"
        ],
        "16384": [
            "pcasvm_dataset_window_blackman_nperseg_16384_distance_1_speed_200k"
        ],
    },
    "hamming": {
        "4096": [
            "pcasvm_dataset_window_hamming_nperseg_4096_distance_1_speed_25k"
        ],
        "8192": [
            "pcasvm_dataset_window_hamming_nperseg_8192_distance_5_speed_25k"
        ],
    },
    "hann": {
        "4096": ["pcasvm_dataset_window_hann_nperseg_4096_distance_1_speed_25k"]
    },
}
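Note that the nperseg keys in this result are strings ("4096"), whereas the structure sketched in the question uses integers. If integer keys are preferred, one possible variation is shown below. It is only a sketch: the helper name group_datasets is hypothetical, the pattern assumes nperseg is always written as plain digits, and collections.defaultdict is swapped in for dict.setdefault purely as a matter of taste.

```python
import re
from collections import defaultdict

# Same idea as the pattern above, but nperseg is restricted to digits
# (an assumption about the naming scheme) so it can be converted to int.
PATTERN = re.compile(r"window_([^_]+)_nperseg_(\d+)")


def group_datasets(names):
    """Group names as {window: {nperseg: [names]}} (hypothetical helper)."""
    groups = defaultdict(lambda: defaultdict(list))
    for name in names:
        m = PATTERN.search(name)
        if m is None:
            continue  # skip names that do not follow the naming scheme
        window, nperseg = m.group(1), int(m.group(2))
        groups[window][nperseg].append(name)
    # Return plain dicts so that a lookup of a missing key raises KeyError
    # instead of silently creating an empty entry.
    return {window: dict(by_nperseg) for window, by_nperseg in groups.items()}


names = [
    "pcasvm_dataset_window_blackman_nperseg_4096_distance_1_speed_25k",
    "pcasvm_dataset_window_blackman_nperseg_4096_distance_2_speed_25k",
    "pcasvm_dataset_window_hann_nperseg_4096_distance_1_speed_25k",
]
grouped = group_datasets(names)
print(sorted(grouped))                 # ['blackman', 'hann']
print(len(grouped["blackman"][4096]))  # 2
```

In practice the list of names would come from os.listdir('path/to/dir'), as in the question. The grouping is a single pass over the names with one regex search each, so it should scale linearly with the number of datasets.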