Is there a more Pythonic way to find duplicate folders and return their paths as a list of tuples?



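I implemented this to find duplicates in a folder whose subfolder names follow the format "NAME_(ID)". The program finds duplicates where NAME is the same and ID differs, and puts them into a list of tuples with full paths so they can be merged later.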

While this approach works well (the folder count will never be >1000), I can't help wondering whether there is a more Pythonic, and possibly faster, way.

import collections
import os

def getFolderNumber(f_name):
    return f_name.split("_")[-1]

def discoverDuplicates(path):
    '''
    Discovers duplicate folders
    '''
    # Gets the list of full folder names
    folders = os.listdir(path)
    # Gets the list of folder ID numbers, in the same order as folders
    folder_nums = [getFolderNumber(f) for f in folders]
    # Collects the folder ID numbers that occur more than once
    dupes = [item for item, count in collections.Counter(folder_nums).items() if count > 1]
    # Gets the indices for the duplicates in folder_nums, which is of equal length and ordering to folders
    f_indices = [i for i, x in enumerate(folder_nums) if x in dupes]
    # Gets all items in folders that match the indices from above. Sorts this list using the ID number
    pre_zip = sorted([folders[x] for x in f_indices], key=lambda x: getFolderNumber(x).lstrip("(").rstrip(")"))
    # Packages together duplicates into a list of lists (assumes each ID occurs exactly twice)
    iterator = iter(pre_zip)
    pre_join = list(map(list, zip(iterator, iterator)))
    # Lambda for joining paths
    path_builder = lambda x: os.path.join(path, x)
    # Joins the final paths within each pair and returns them as a list of tuples
    return [tuple(map(path_builder, x)) for x in pre_join]

Here is an example input:

Input: ListOfFolders (List) containing [NAME1_(459), NAME2_(459), NameN_(ID_N), ...]

Output: [('path/to/NAME1_(459)', 'path/to/NAME2_(459)'), (duplicate pair 2), ...]

Would something like this do the job?

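from collections import defaultdict

def group_duplicates(folders_names):
    dups_dict = defaultdict(list)
    for folder_name in folders_names:
        # Everything before the last underscore is the NAME part (a str, so it can be a dict key)
        folder_name_prefix = folder_name.rsplit('_', 1)[0]
        dups_dict[folder_name_prefix].append(folder_name)
    # One tuple per NAME, holding every folder name that shares it
    return [tuple(dups) for dups in dups_dict.values()]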

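It can arguably be considered more "Pythonic", since it is very simple and straightforward (simple is better than complex).
One could squeeze it into a single line with some nested comprehensions, but it would end up being less readable.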
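To tie the grouping idea back to the output the question asks for, here is a minimal sketch that returns full paths and keeps only names that occur more than once. It assumes folder names follow the NAME_(ID) format and that grouping is done on the NAME part; `find_duplicate_paths` and its optional `folder_names` parameter are hypothetical names introduced only for this illustration.

```python
import os
from collections import defaultdict


def find_duplicate_paths(path, folder_names=None):
    """Return tuples of full paths for folders that share the same NAME.

    Hypothetical helper: assumes names look like NAME_(ID).
    """
    if folder_names is None:
        # Only consider directories directly under `path`
        folder_names = [entry.name for entry in os.scandir(path) if entry.is_dir()]
    groups = defaultdict(list)
    for name in folder_names:
        # Group on everything before the last underscore
        prefix = name.rsplit("_", 1)[0]
        groups[prefix].append(os.path.join(path, name))
    # Drop names that occur only once; sort each group for a stable order
    return [tuple(sorted(paths)) for paths in groups.values() if len(paths) > 1]


# Passing the names explicitly avoids touching the file system
print(find_duplicate_paths("root", ["A_(1)", "A_(2)", "B_(3)"]))
```

Unlike a pairwise `zip`, this handles a NAME that appears three or more times, since each group is simply as long as it needs to be.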

As for speed, it will run quite fast, so as long as it isn't the bottleneck of your program, you shouldn't worry about it too much.
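If you want to check the speed yourself, a rough sketch with `timeit` on synthetic data might look like this. The 1000 generated names and the helper name `group_by_prefix` are assumptions made for the measurement, not part of the original code, and the timing will vary by machine.

```python
import timeit
from collections import defaultdict


def group_by_prefix(names):
    """Group folder names by the part before the last underscore."""
    groups = defaultdict(list)
    for name in names:
        groups[name.rsplit("_", 1)[0]].append(name)
    return [tuple(group) for group in groups.values()]


# Synthetic data: 1000 names, each NAME appearing twice with a different ID
names = [f"NAME{i % 500}_({i})" for i in range(1000)]

# Average over 100 runs to smooth out noise
runs = 100
seconds = timeit.timeit(lambda: group_by_prefix(names), number=runs)
print(f"{seconds / runs * 1e3:.3f} ms per call")
```

The grouping is a single pass over the names, so the cost grows linearly with the number of folders.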
