How do I write scraped lines to a txt file and check for duplicates before writing, in Python?

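I am scraping public contact emails from user channel IDs based on my keywords. Because I scrape a large number of channel IDs at once, some channel IDs repeat, and so the emails repeat as well. Before writing them line by line to my text file, I need to check each email for duplicates and skip it if it already exists in the file.
I would also be grateful if you could show me how to remove whitespace. I already have code for that, but it works only some of the time, and somehow it writes empty lines containing spaces.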

My code, which writes all emails line by line:

with open("scraped_emails.txt", 'a') as f:
    for email in cleanEmail:
        f.write(email.replace(" ", "") + '\n')

You can add an if statement that checks whether the email you want to append is already in the file, like this:

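cleanEmail = ['a@b.com', '   glennbz@veriznon.net    ', 'x@yy.ul']
# 'r+' opens for reading and writing; the file must already exist
with open("scraped_emails.txt", 'r+') as f:
    # after read() the file position is at the end, so writes are appended
    emails = set(f.read().split())
    for email in cleanEmail:
        email = email.strip()
        # skip empty strings and addresses that are already in the file
        if email and email not in emails:
            f.write(email + '\n')
            emails.add(email)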

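Note that I added the strip() method, which solves the whitespace problem by removing leading and trailing whitespace. Unlike replace(" ", ""), strip() also removes tabs and newline characters at both ends.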

# Output 
a@b.com
glennbz@veriznon.net
x@yy.ul
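
The same check can be wrapped in a small helper, shown here as a minimal sketch rather than a definitive implementation. The function name `append_new_emails` is hypothetical, and lower-casing addresses before comparing is an assumption about what should count as a duplicate. Comparing against a set of stripped lines, rather than against the raw file text, avoids false matches (for example `a@b.com` inside `aa@b.com`) and also catches duplicates within the new batch itself.

```python
from pathlib import Path


def append_new_emails(path, emails):
    """Append emails that are not yet in the file; return the ones written.

    Hypothetical helper: the name and the case-insensitive comparison
    are assumptions, not part of the original code.
    """
    file_path = Path(path)
    seen = set()
    if file_path.exists():
        with file_path.open("r", encoding="utf-8") as f:
            # one address per line; ignore blank or whitespace-only lines
            seen = {line.strip().lower() for line in f if line.strip()}

    written = []
    with file_path.open("a", encoding="utf-8") as f:
        for raw in emails:
            email = raw.strip()
            key = email.lower()  # assumption: compare case-insensitively
            if not email or key in seen:
                continue  # skip empty strings and duplicates
            f.write(email + "\n")
            seen.add(key)
            written.append(email)
    return written
```

Called as `append_new_emails("scraped_emails.txt", cleanEmail)`, it creates the file if it is missing and can be run repeatedly without writing the same address twice.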

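If I understand correctly, you want to clean up your file scraped_emails.txt by removing duplicates and stripping whitespace from the emails? I would do it in two steps: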

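1. Parse all emails from scraped_emails.txt, strip the whitespace, and store them in a set (which keeps only unique values).
2. Overwrite the existing file with the cleaned values. If you are unsure about this, write to a different file first and check the result.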

clean_emails = set()
file_name = "scraped_emails.txt"
# initial reading of emails
print(f"Reading {file_name} to clean emails ..")
initial_line_counter = 0
with open(file_name, "r") as f_in:
    for line in f_in:
        # remember input lines, just for statistics
        initial_line_counter += 1
        # strips newlines and whitespaces
        cleaned_email = line.rstrip("\n").strip()

        # you mentioned empty lines - this prevents adding of empty strings to your set
        if cleaned_email:
            clean_emails.add(cleaned_email)
# opening the file with the attribute mode="w" overwrites existing files
with open(file_name, "w") as f_out:
    for email in clean_emails:
        f_out.write(f"{email}\n")
print(f"Reduced {initial_line_counter} to {len(clean_emails)} cleaned email addresses")
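
A set does not remember the order in which emails were read, so the rewritten file may come out shuffled. If the original order matters, one possible variant is sketched below. The function name `dedupe_file` is hypothetical, and writing to a temporary file before swapping it in is an assumption about how cautious you want to be when overwriting.

```python
import os
import tempfile


def dedupe_file(path):
    """Rewrite `path` with stripped, unique, non-empty lines in first-seen order.

    Hypothetical helper. Returns (lines_read, lines_kept).
    """
    with open(path, "r", encoding="utf-8") as f_in:
        lines = f_in.readlines()

    # dict keys keep insertion order (Python 3.7+), so the first occurrence wins
    unique = dict.fromkeys(s for s in (line.strip() for line in lines) if s)

    # write to a temporary file in the same directory, then swap it in,
    # so the original stays untouched if writing fails part-way
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f_out:
        f_out.writelines(f"{email}\n" for email in unique)
    os.replace(tmp_path, path)
    return len(lines), len(unique)
```

Note that `tempfile.mkstemp` creates the file readable and writable only by the current user, so the replaced file ends up with those permissions.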

You can test this with a scraped_emails.txt that contains the following:

some_mail1@yahoo.com
some_mail2@yahoo.com
some_mail3@yahoo.com
some_mail4@yahoo.com
some_mail5@yahoo.com
some_mail6@yahoo.com
some_mail@y7ahoo.com  
some_mail8@yahoo.com  
some_mail9@yahoo.com
some_mail9@yahoo.com
some_mail9@yahoo.com
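
As a quick sanity check, the same strip-and-deduplicate logic can be applied to this sample in memory, without touching any file. This is only a sketch: the variable names are illustrative, and the two entries with trailing spaces are written out explicitly so that the whitespace is visible.

```python
sample_lines = [
    "some_mail1@yahoo.com\n",
    "some_mail2@yahoo.com\n",
    "some_mail3@yahoo.com\n",
    "some_mail4@yahoo.com\n",
    "some_mail5@yahoo.com\n",
    "some_mail6@yahoo.com\n",
    "some_mail@y7ahoo.com  \n",  # trailing spaces
    "some_mail8@yahoo.com  \n",  # trailing spaces
    "some_mail9@yahoo.com\n",
    "some_mail9@yahoo.com\n",
    "some_mail9@yahoo.com\n",
]

# strip each line and keep only non-empty, unique values
unique = {line.strip() for line in sample_lines if line.strip()}

print(f"Reduced {len(sample_lines)} to {len(unique)} cleaned email addresses")
# prints: Reduced 11 to 9 cleaned email addresses
```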
