I have a deep subfolder structure like this:
a/b/file1.txt
a/b/file1.doc
a/b/file2.txt
a/b/file2.doc
a/c/file3.txt
a/c/file3.doc
a/c/d/file4.txt
a/c/d/file4.doc
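I want to extract all pairs of .txt and .doc files (for example, into a list of tuples), where the file names are the same and only the file type differs.
The best I have come up with so far is the following, which looks inefficient: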
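import os

pairs = []
for root, dirs, files in os.walk(path):
    # os.walk already visits every subfolder, so no isdir()/listdir() check is needed
    file_list = files
    file_list_copy = file_list.copy()
    # for each file in file_list of type .txt
    for txt_name in file_list:
        stem, ext = os.path.splitext(txt_name)
        if ext != ".txt":
            continue
        # find the .doc of the same name in file_list_copy
        for doc_name in file_list_copy:
            if doc_name == stem + ".doc":
                # add the two to a tuple and append it to the list
                pairs.append((os.path.join(root, txt_name), os.path.join(root, doc_name)))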
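For comparison, the nested search can be avoided by putting each folder's file names into a set and checking membership. This is only a minimal sketch, assuming both files of a pair always sit in the same folder; the function name `find_pairs` and its parameters are made up for illustration.

```python
import os


def find_pairs(root_dir, first_ext=".txt", second_ext=".doc"):
    """Return sorted (txt_path, doc_path) tuples for files that share a folder and a base name."""
    pairs = []
    for folder, _dirs, files in os.walk(root_dir):
        names = set(files)  # set membership is O(1), so no inner loop is needed
        for name in files:
            stem, ext = os.path.splitext(name)
            partner = stem + second_ext
            if ext == first_ext and partner in names:
                pairs.append((os.path.join(folder, name),
                              os.path.join(folder, partner)))
    return sorted(pairs)
```

Calling `find_pairs("a")` on the tree from the question should give one tuple per matching pair, with both paths pointing into the same subfolder. A .txt file without a .doc partner is simply skipped.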
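Probably not the most efficient, but it works:
Use a shell command to move each file type into its own folder (run it once for the txt extension and once for doc, to create 2 folders):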
find /path-to-files-root/ -type f -name '*.txt' -exec mv -i {} /new-path-to-files/txt/ \;
Then I ran:
import fnmatch
import os
from os.path import isfile, join

def get_all_files(path, pattern):
    # see https://stackoverflow.com/questions/17282887/getting-files-with-same-name-irrespective-of-their-extension
    # collect the names of all files under path that match the pattern
    datafiles = []
    for root, dirs, files in os.walk(path):
        for file in fnmatch.filter(files, pattern):
            datafiles.append(file)
    return datafiles

# txt_path and doc_path are the two folders created by the find/mv step
txt_files = [f for f in os.listdir(txt_path) if isfile(join(txt_path, f))]
doc_files = [f for f in os.listdir(doc_path) if isfile(join(doc_path, f))]
for i, txt_file in enumerate(txt_files):
    filename = os.path.splitext(txt_file)[0]
    # look up the .doc with the same base name (stored in matches so doc_files is not overwritten)
    matches = get_all_files(doc_path, '{0}.doc'.format(filename))
    if len(matches) == 1:
        doc_file = matches[0]
        # do something with txt_file and doc_file
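Since the loop above walks doc_path once for every .txt file, one possible refinement is to index the .doc names a single time and look each .txt up in that index. The following is a rough sketch under the assumption that both folders are flat (no subfolders), as they are after the mv step; `pair_by_stem` is a hypothetical helper name, not part of the original answer.

```python
import os


def pair_by_stem(txt_dir, doc_dir):
    """Pair each .txt in txt_dir with the .doc of the same base name in doc_dir."""
    # Index the .doc files once: base name -> file name.
    docs = {}
    for name in os.listdir(doc_dir):
        stem, ext = os.path.splitext(name)
        if ext == ".doc" and os.path.isfile(os.path.join(doc_dir, name)):
            docs[stem] = name

    pairs = []
    for name in sorted(os.listdir(txt_dir)):
        stem, ext = os.path.splitext(name)
        # Keep only .txt files that have a .doc partner in the index.
        if ext == ".txt" and stem in docs:
            pairs.append((name, docs[stem]))
    return pairs
```

The result is a list of (txt_file, doc_file) name tuples, so the "do something" step can iterate over it directly, for example `for txt_file, doc_file in pair_by_stem(txt_path, doc_path): ...`.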