imap_tools takes a long time to scrape links from emails



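I use imap_tools to grab links from emails. The emails are very small, with only a little text, a few images and so on. There aren't many of them either: about 20 to 40 a day.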

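When a new email arrives, it takes 10 to 25 seconds to scrape the link, which seems long. I'd like it to take less than 2 seconds; speed matters here.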

NB: this is a shared mailbox, so I can't simply fetch unseen messages, because other users have often already opened an email before the scraper gets to it.

Can anyone see what the problem is?

```python
import datetime
import re
import time

import pandas as pd
from imap_tools import MailBox, AND

from config import email, password

yahoo_imap_server = "imap.mail.yahoo.com"  # IMAP host (not SMTP)
data = {
    'today': datetime.date.today(),
    'uids': set(),  # UIDs already processed today
}

while True:
    try:
        # Connects and logs in again on every pass; logout happens on leaving "with"
        with MailBox(yahoo_imap_server).login(email, password, 'INBOX') as client:
            today = datetime.date.today()
            if data['today'] != today:
                # New day: forget the UIDs processed yesterday
                data['today'] = today
                data['uids'] = set()
            # mark_seen=False: don't flag messages as read in the shared mailbox
            msgs = client.fetch(AND(date_gte=today), mark_seen=False)
            for msg in msgs:
                if msg.date.date() != today or msg.uid in data['uids']:
                    continue
                links = []
                mail = msg.html
                if 'order' in mail and 'cancel' not in mail:
                    # [^\s]+ matches everything up to the next whitespace character
                    for url in re.findall(r'(https?://[^\s]+)', mail):
                        if 'pick' in url:
                            # Strip quotes and anything after a '<' or '>'
                            link = url.replace('"', '')
                            link = link.replace('<', '>').split('>')[0]
                            print(link)
                            links.append(link)
                            break
                data['uids'].add(msg.uid)
                scr_links = pd.DataFrame({'Links': links})
                scr_links.to_csv('Links.csv', mode='a', header=False, index=False)
                time.sleep(0.5)
        time.sleep(5)
    except Exception as e:
        print(e)
        print('sleeping for 5 sec')
        time.sleep(5)
```
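To see which step accounts for the 10 to 25 seconds (connecting and logging in, the search, or downloading the messages), one option is to time each step separately. This is a minimal standard-library sketch; the `timed` helper is a hypothetical name, and the commented usage assumes the `MailBox`, `AND`, `email` and `password` names from the question's code.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Store the wall-clock duration of the `with` body in timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the body raises, so a slow failing login still shows up
        timings[label] = time.perf_counter() - start


# Hypothetical usage inside the polling loop:
#
# timings = {}
# with timed("login", timings):
#     client = MailBox("imap.mail.yahoo.com").login(email, password, "INBOX")
# with timed("fetch", timings):
#     # fetch() returns a generator; list() forces the download inside the block
#     msgs = list(client.fetch(AND(date_gte=datetime.date.today()), mark_seen=False))
# print(timings)
```

Printing `timings` on each pass shows whether the time goes into logging in or into fetching, which narrows down where to look.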

I think this is the mail server throttling you, which shows up as timeouts.

Try looking into IMAP IDLE.

imap_tools supports IDLE since version 0.51.0:

https://github.com/ikvk/imap_tools/releases/tag/v0.51.0
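A minimal sketch of an IDLE-based watcher, assuming an imap_tools release that provides `mailbox.idle.wait(timeout=...)` and a server that advertises the IDLE capability. Instead of logging in every few seconds, it keeps one connection open and blocks in IDLE until the server reports a change. After each wake-up it runs a UID-only search (no message bodies) and downloads just the messages it has not processed yet, with `mark_seen=False` so other users of the shared mailbox still see them as unread. The names `imap_since`, `process_new`, `idle_loop`, `main` and the `cycles` parameter are hypothetical. The search keys are raw IMAP strings, so imap_tools is only imported inside `main()`.

```python
import datetime

# IMAP SEARCH dates use English month abbreviations, e.g. 05-Jan-2024
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_since(day):
    """Build a raw IMAP search key such as 'SINCE 05-Jan-2024'."""
    return f"SINCE {day.day:02d}-{_MONTHS[day.month - 1]}-{day.year}"


def process_new(mailbox, processed_uids, on_message):
    """Download only today's messages whose UIDs have not been handled yet."""
    # uids() runs a UID SEARCH and returns UID strings without fetching bodies
    uids = mailbox.uids(imap_since(datetime.date.today()))
    new = [uid for uid in uids if uid not in processed_uids]
    if not new:
        return
    # mark_seen=False leaves the \Seen flag alone for the other mailbox users
    for msg in mailbox.fetch("UID " + ",".join(new), mark_seen=False):
        on_message(msg)
        processed_uids.add(msg.uid)


def idle_loop(mailbox, processed_uids, on_message, timeout=60, cycles=None):
    """Wait in IMAP IDLE and process new mail as soon as the server reports it.

    cycles=None runs forever; a small number is only useful for testing.
    """
    process_new(mailbox, processed_uids, on_message)  # mail already waiting
    n = 0
    while cycles is None or n < cycles:
        n += 1
        # Blocks until the server pushes an update or `timeout` seconds pass
        mailbox.idle.wait(timeout=timeout)
        # Also re-check after a timeout, in case an update arrived while we
        # were outside IDLE (for example during the previous fetch)
        process_new(mailbox, processed_uids, on_message)


def main():
    # Assumes imap_tools with IDLE support and the question's config module
    from imap_tools import MailBox
    from config import email, password

    with MailBox("imap.mail.yahoo.com").login(email, password, "INBOX") as mailbox:
        idle_loop(mailbox, set(), lambda msg: print(msg.uid, msg.subject))
```

Call `main()` to start watching. A long-lived connection can still be dropped by the server, so real code would wrap `main()` in a reconnect loop.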
