Comparing files in two directories



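I compare every file in one directory against every file in a second directory. If the similarity is greater than 90%, I want to move on to the next iteration of the outer loop and delete the matched file from the second directory, so that the next file from the first directory is not compared against a file that has already been matched.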

Here is what I tried:

```python
for i for i in sorted_files:
    for j in sorted_github_files:
        #pdb.set_trace()
        with open(f'./files/{i}') as f1:
            try:
                text1 = f1.read()
            except:
                pass
        with open(f'./github_files/{j}') as f2:
            try:
                text2 = f2.read()
            except:
                pass
        m = SequenceMatcher(None, text1, text2)
        print("file1:", i, "file2:", j)
        if m.ratio() > 0.90:
            os.remove(f'./github_files/{j}')
            break
```

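I know I can't change the list I'm iterating over once the loop is running, which is why I get a file-not-found error. I don't want to use a try/except block. Any ideas are welcome.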
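One way to get the behavior described above without try/except is to remember which files from the second directory have already matched and skip them in later iterations, deferring any deletion until both loops are done. The sketch below assumes the file contents have already been read into dicts mapping a file name to its text; `match_once` is a hypothetical helper name, not part of any library.

```python
from difflib import SequenceMatcher


def match_once(first, second, threshold=0.90):
    """Pair each text in `first` with at most one unused text in `second`.

    Hypothetical helper: `first` and `second` map a file name to its contents.
    Returns {first_name: second_name} for every pair whose similarity ratio
    exceeds `threshold`; each second_name is used at most once.
    """
    used = set()  # names from `second` that have already matched
    pairs = {}
    for name_a in sorted(first):
        for name_b in sorted(second):
            if name_b in used:
                continue  # skip it instead of opening a file that may be gone
            ratio = SequenceMatcher(None, first[name_a], second[name_b]).ratio()
            if ratio > threshold:
                pairs[name_a] = name_b
                used.add(name_b)
                break  # continue with the next file from the first directory
    return pairs


if __name__ == "__main__":
    first = {"a.txt": "hello world", "b.txt": "hello world"}
    second = {"x.txt": "hello world", "y.txt": "hello world!"}
    print(match_once(first, second))  # → {'a.txt': 'x.txt', 'b.txt': 'y.txt'}
```

Because nothing is deleted inside the loops, the matched files (the values of the returned dict) can be removed afterwards with `os.remove` without ever touching a path that no longer exists.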

A few things to point out:

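  • Always provide a minimal reproducible example.
  • Your first for loop doesn't work because you wrote `for i for i ..`.
  • If you want to go through the files in list1 (sorted_files) first, read each of those files outside the second (inner) loop.
  • I would collect the files whose match ratio exceeds 0.90 in a new list and delete them only afterwards, so that the items you are iterating over do not change during iteration.
  • You can find the test data I created and used here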
```python
import os
from difflib import SequenceMatcher

# define your two folders, full paths
first_path = os.path.abspath(r"C:\Users\XYZ\Desktop\testfolder\a")
second_path = os.path.abspath(r"C:\Users\XYZ\Desktop\testfolder\b")

# get files from folder
first_path_files = os.listdir(first_path)
second_path_files = os.listdir(second_path)

# join path and filenames
first_folder = [os.path.join(first_path, f) for f in first_path_files]
second_folder = [os.path.join(second_path, f) for f in second_path_files]

# empty list for matching results
matched_files = []

# iterate over the files in the first folder
for file_one in first_folder:
    # read file content
    with open(file_one, "r") as f:
        file_one_text = f.read()

    # iterate over the files in the second folder
    for file_two in second_folder:
        # read file content
        with open(file_two, "r") as f:
            file_two_text = f.read()

        # match the two file contents
        match = SequenceMatcher(None, file_one_text, file_two_text)
        ratio = match.ratio()
        if ratio > 0.90:
            print(f"Match found ({ratio}): '{file_one}' | '{file_two}'")
            # TODO: decide whether you want to remove files from the first or the second folder
            matched_files.append(file_two)  # I delete files from the second folder

# remove duplicates from the resulting list
matched_files = list(set(matched_files))

# remove the files only after both loops have finished
for f in matched_files:
    print(f"Removing file: {f}")
    os.remove(f)
```
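The same idea can be packaged as a function that reads every file only once, skips second-folder files that have already matched (the behavior the question asked for), and still deletes only after the loops finish. This is a sketch under a few assumptions: the helper names `read_texts` and `remove_near_duplicates` are hypothetical, sub-directories are ignored, and files are assumed to be UTF-8 text (the code above uses the platform default encoding instead).

```python
import os
from difflib import SequenceMatcher


def read_texts(folder):
    """Return {full_path: contents} for every regular file in `folder`."""
    texts = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):  # skip sub-directories
            with open(path, "r", encoding="utf-8") as f:  # assumes UTF-8 text files
                texts[path] = f.read()
    return texts


def remove_near_duplicates(first_path, second_path, threshold=0.90):
    """Delete files in `second_path` that are more than `threshold` similar
    to some file in `first_path`.

    Each second-folder file is matched at most once, and nothing is deleted
    until both loops have finished. Returns the list of removed paths.
    """
    first_texts = read_texts(first_path)
    second_texts = read_texts(second_path)  # each file is read only once

    matched = []  # second-folder paths to delete after the loops
    for text_one in first_texts.values():
        for path_two, text_two in second_texts.items():
            if path_two in matched:
                continue  # already matched by an earlier file: skip it
            if SequenceMatcher(None, text_one, text_two).ratio() > threshold:
                matched.append(path_two)
                break  # go to the next file in the first folder

    for path in matched:
        os.remove(path)
    return matched
```

Usage would look like `remove_near_duplicates(first_path, second_path)` with the two folder paths defined above. Reading both folders up front costs memory proportional to the total file size, but avoids re-reading every second-folder file once per first-folder file.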
