How do I write scraped lines to a txt file and check for duplicates before writing in Python?


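  1. I am scraping the public contact emails of users from their channel IDs, based on my keywords. Some channel IDs are repeated, so the emails are repeated too, and I scrape a large number of channel IDs at once. Before I write the emails line by line to my text file, I need to check for duplicates and skip any email that already exists in the file.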

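  2. I would also be grateful if you could show me how to remove whitespace. The code I already have works sometimes and not others, and somehow it writes blank lines that contain spaces.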

My code, which writes all emails line by line:

with open("scraped_emails.txt", 'a') as f:
    for email in cleanEmail:
        f.write(email.replace(" ", "") + '\n')

You can add an if statement that checks whether the email you want to append is already in the file, like this:

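cleanEmail = ['a@b.com', '   glennbz@veriznon.net    ', 'x@yy.ul']
# 'a+' creates the file if it is missing; seek(0) lets us read what is already there
with open("scraped_emails.txt", 'a+') as f:
    f.seek(0)
    # a set of whole lines gives exact matches instead of substring matches
    emails = {line.strip() for line in f}
    for email in cleanEmail:
        email = email.strip()
        if email and email not in emails:
            f.write(email + '\n')
            emails.add(email)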

Note that I added the strip() method, which fixes the whitespace problem by removing leading and trailing whitespace.

# Output 
a@b.com
glennbz@veriznon.net
x@yy.ul
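
The same idea can be wrapped in a small reusable function. The following is only a sketch: the name append_new_emails and its parameters are hypothetical and not from the original answer, and it assumes the existing file, if any, ends with a newline. It removes all whitespace inside each address (which covers the replace(" ", "") from the question as well as strip()) and skips anything that ends up empty, so no blank lines are written.

```python
from pathlib import Path


def append_new_emails(path, candidates):
    """Append addresses from candidates that are not already in the file.

    Hypothetical helper: returns the list of addresses actually written.
    """
    file_path = Path(path)

    # Load existing addresses into a set for exact, fast membership tests.
    seen = set()
    if file_path.exists():
        with file_path.open("r", encoding="utf-8") as f:
            seen = {line.strip() for line in f if line.strip()}

    written = []
    with file_path.open("a", encoding="utf-8") as f:
        for raw in candidates:
            # "".join(raw.split()) drops every whitespace character,
            # including tabs and newlines, not only leading/trailing spaces.
            email = "".join(raw.split())
            # Skip empty results and anything already seen in this run or in the file.
            if email and email not in seen:
                f.write(email + "\n")
                seen.add(email)
                written.append(email)
    return written
```

Calling append_new_emails("scraped_emails.txt", cleanEmail) after each scraping batch should then add only the addresses that are new, both relative to the file and within the batch itself.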

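If I understand correctly, you want to clean up your file scraped_emails.txt by removing duplicates and stripping whitespace from the emails? I would do it in two steps:

  1. Parse all emails from scraped_emails.txt, strip the whitespace, and store them in a set (which keeps only unique values)
  2. Overwrite the existing file with the cleaned values. If you are unsure about this, write to a different file first and check the result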

clean_emails = set()
file_name = "scraped_emails.txt"
# initial reading of emails
print(f"Reading {file_name} to clean emails ..")
initial_line_counter = 0
with open(file_name, "r") as f_in:
    for line in f_in:
        # remember input lines, just for statistics
        initial_line_counter += 1
        # strips newlines and whitespaces
        cleaned_email = line.rstrip("\n").strip()

        # you mentioned empty lines - this prevents adding of empty strings to your set
        if cleaned_email:
            clean_emails.add(cleaned_email)
# opening the file with the attribute mode="w" overwrites existing files
with open(file_name, "w") as f_out:
    for email in clean_emails:
        f_out.write(f"{email}\n")
print(f"Reduced {initial_line_counter} to {len(clean_emails)} cleaned email addresses")
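
A set does not keep the original line order. If the order matters, one possible variant (a sketch, not part of the original answer) is to deduplicate with dict.fromkeys, which keeps the first occurrence of each address, and to write to a separate file as suggested in step 2. The function name clean_email_file and the src and dst parameters are hypothetical.

```python
def clean_email_file(src, dst):
    """Copy unique, whitespace-stripped addresses from src to dst, keeping order.

    Hypothetical helper: returns the cleaned list of addresses.
    """
    with open(src, "r", encoding="utf-8") as f_in:
        stripped = (line.strip() for line in f_in)
        # dict.fromkeys keeps the first occurrence of each key in insertion
        # order; the "if email" filter drops blank and whitespace-only lines.
        unique = list(dict.fromkeys(email for email in stripped if email))

    # Write the result to dst so that src stays untouched until it is checked.
    with open(dst, "w", encoding="utf-8") as f_out:
        f_out.writelines(email + "\n" for email in unique)
    return unique
```

For example, clean_email_file("scraped_emails.txt", "scraped_emails_clean.txt") would leave the input file as it is; the output file name here is only an illustration.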

You can test it with a scraped_emails.txt that contains the following:

some_mail1@yahoo.com
some_mail2@yahoo.com
some_mail3@yahoo.com
some_mail4@yahoo.com
some_mail5@yahoo.com
some_mail6@yahoo.com
some_mail7@yahoo.com  
some_mail8@yahoo.com  
some_mail9@yahoo.com
some_mail9@yahoo.com
some_mail9@yahoo.com
