For loop not saving to DataFrame



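This is my first question, so I hope the way I have written it is clear.

I am searching through a large list of emails. If an email is found on Google (I am in Germany, hence the German in the strings), I want to update the email_validity column in the DataFrame to reflect that, but it is not saving. It prints correctly, but when I check afterwards, the values from the iterations have not been stored.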

#  Script googling emails
import time

import pyautogui
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# df is an existing DataFrame with 'email' and 'domain_validity' columns
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://google.de/search?q="Nicolas Cage"')
pyautogui.press('tab', presses=4)
pyautogui.press('enter')
df['email_validity'] = None
for email, domain_validity, email_validity in zip(df['email'], df['domain_validity'], df['email_validity']):
    if domain_validity == True:
        try:
            driver.get(f'https://google.de/search?q="{email}" after:1990')
            time.sleep(3)     # loading url
            # pyautogui.hotkey('escape', presses=2)
            time.sleep(2)
            if 'die alle deine Suchbegriffe enthalten' not in driver.page_source and 'übereinstimmenden Dokumente gefunden' not in driver.page_source and 'Es wurden keine Ergebnisse gefunden' not in driver.page_source:
                email_validity = True
                print(email_validity)
            elif 'not a robot' in driver.page_source:
                print('help me!')
                input("write anything, and press enter:")
            else:
                email_validity = False
                print(email_validity)
        except:
            print(email)
    else:
        email_validity = domain_validity

driver.close()
print('completed')
df.head()

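You are never updating df inside the loop. The variables email, domain_validity and email_validity only hold copies of the values from the tuples returned by zip(). Assigning to them does not modify the DataFrame.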
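A minimal sketch of that behaviour, assuming a tiny made-up DataFrame with placeholder addresses (not the asker's data). Assigning to the loop variable rebinds a local name and leaves the column untouched:

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({"email": ["a@example.com", "b@example.com"]})
df["email_validity"] = None

# Rebinding the loop variable changes only the local name, not the column
for email, email_validity in zip(df["email"], df["email_validity"]):
    email_validity = True

# Every entry in the column is still missing
unchanged = bool(df["email_validity"].isna().all())
print(unchanged)
```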

df.at

You need to use df.at at the end of each iteration to update the DataFrame.
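# enumerate() yields positions 0, 1, 2, ...; these match the labels that
# df.at expects only when df has the default RangeIndex
for index, email in enumerate(df['email']):
    email_validity = None
    # the rest of your code
    df.at[index, 'email_validity'] = email_validity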
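The enumerate() counter lines up with df.at only when the DataFrame has the default 0..n-1 index. As a hedged sketch of a variant that works with any index, the loop can iterate over df.index itself. The function looks_valid and the sample data below are hypothetical stand-ins for the Google lookup, not part of the original code:

```python
import pandas as pd


def looks_valid(email):
    # Hypothetical stand-in for the Selenium/Google check
    return email.endswith("@example.com")


df = pd.DataFrame(
    {
        "email": ["a@example.com", "b@invalid.test", "c@example.com"],
        "domain_validity": [True, True, False],
    },
    index=[10, 20, 30],  # deliberately not 0, 1, 2
)
df["email_validity"] = None

# Iterate over the real index labels so df.at writes to the matching row
for label, email, domain_ok in zip(df.index, df["email"], df["domain_validity"]):
    df.at[label, "email_validity"] = looks_valid(email) if domain_ok else False

result = df["email_validity"].tolist()
print(result)
```

Writing through df.at inside the loop is what makes the values persist; the choice between enumerate() and df.index only matters once the frame has been filtered, sorted or otherwise re-indexed.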
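df.apply()

You can also extract the email validity check into a separate function and apply it across the whole DataFrame with apply() instead of a loop. The if domain_validity == True: check can then be removed from the function and moved into the lambda passed to apply().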

This may be less straightforward for you, because the 'not a robot' case has to be handled and still return a value.

def check_email_validity(email):
    try:
        driver.get(f'https://google.de/search?q="{email}" after:1990')
        time.sleep(3)     # loading url
        # pyautogui.hotkey('escape', presses=2)
        time.sleep(2)
        if 'die alle deine Suchbegriffe enthalten' not in driver.page_source and 'übereinstimmenden Dokumente gefunden' not in driver.page_source and 'Es wurden keine Ergebnisse gefunden' not in driver.page_source:
            return True
        elif 'not a robot' in driver.page_source:
            print('help me!')
            input("write anything, and press enter:")
            # !!!!!!!!!!! This will need to return something
        else:
            return False
    except:
        print(email)
        return None

df['email_validity'] = df.apply(lambda x: check_email_validity(x['email']) if x['domain_validity'] else False, axis=1)
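One possible way to make the 'not a robot' branch return something is to retry the lookup after the CAPTCHA has been solved. The sketch below is an assumption, not the original author's method: classify_page, check_with_retry, fetch_page, wait_for_user and max_attempts are hypothetical names, and the page fetch is passed in as a function so the logic can be run without a browser. It also checks for the CAPTCHA text first, because a CAPTCHA page contains none of the no-results strings and would otherwise be classified as a hit.

```python
NO_RESULT_MARKERS = (
    "die alle deine Suchbegriffe enthalten",
    "übereinstimmenden Dokumente gefunden",
    "Es wurden keine Ergebnisse gefunden",
)


def classify_page(page_source):
    """Return True/False for a result page, or None if a CAPTCHA is shown."""
    if "not a robot" in page_source:
        return None
    return not any(marker in page_source for marker in NO_RESULT_MARKERS)


def check_with_retry(email, fetch_page, wait_for_user=input, max_attempts=3):
    """Fetch and classify the page, retrying after each CAPTCHA (hypothetical helper)."""
    for _ in range(max_attempts):
        verdict = classify_page(fetch_page(email))
        if verdict is not None:
            return verdict
        wait_for_user("Solve the CAPTCHA, then press enter: ")
    return None  # still blocked after max_attempts


# Demo with fake pages instead of a real browser
fake_pages = iter(["Please confirm you are not a robot", "<html>ten results</html>"])
verdict = check_with_retry(
    "a@example.com",
    fetch_page=lambda email: next(fake_pages),
    wait_for_user=lambda prompt: None,
)
print(verdict)
```

With Selenium, fetch_page could be a small function that calls driver.get() for the search URL, sleeps, and returns driver.page_source.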
