How do I scrape multiple messages?



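I am trying to scrape multiple WhatsApp messages from the same date with the code below. However, it only gives me the first message for that date (4/21/2022). For example:

The required output is:

Hey there (message 1)

How are you? (message 2)

WBU? (message 3)

The actual output is:

Hey there (message 1)

Hey there (message 1)

Hey there (message 1)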

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Build the date string in month/day/year form, e.g. 5/1/2022
day = input("Enter date: ")
month = input("Enter month: ")
year = input("Enter year: ")
date = month + "/" + day + "/" + year

driver = webdriver.Chrome()
driver.get("https://web.whatsapp.com/")
# Wait up to 60 seconds for the WhatsApp Web page to load
WebDriverWait(driver, 60).until(
    EC.text_to_be_present_in_element(
        (By.CLASS_NAME, '_1vjYt'), 'WhatsApp Web'
    )
)

# Read one contact name per line
listContact = []
with open('cont.txt', 'r') as f:
    for line in f:
        line = line.replace('\n', '')
        listContact.append(line)

for contact in listContact:
    driver.implicitly_wait(10)
    hotel = driver.find_element(By.XPATH, '//span[@title="{}"]'.format(contact))
    hotel.click()
    driver.implicitly_wait(10)
    # This loop is where the same message gets printed over and over
    while (driver.find_element(
            By.CSS_SELECTOR, 'div[data-pre-plain-text*="{}"]'.format(date))):
        messages = driver.find_element(
            By.CSS_SELECTOR, 'div[data-pre-plain-text*="{}"]'.format(date))
        print(messages.text)

The HTML is:

<div class="_2jGOb copyable-text" data-pre-plain-text="[2:39 PM, 5/1/2022] Joseph: ">
    <div class="_1Gy50">
        <span dir="ltr" class="i0jNr selectable-text copyable-text">
            <span>
                Hey, there
            </span>
        </span>
    </div>
</div>
<div class="_2jGOb copyable-text" data-pre-plain-text="[2:40 PM, 5/1/2022] Joseph: ">
    <div class="_1Gy50">
        <span dir="ltr" class="i0jNr selectable-text copyable-text">
            <span>
                How are you?
            </span>
        </span>
    </div>
</div>
<div class="_2jGOb copyable-text" data-pre-plain-text="[2:39 PM, 5/1/2022] Joseph: ">
    <div class="_1Gy50">
        <span dir="ltr" class="i0jNr selectable-text copyable-text">
            <span>
                WBU?
            </span>
        </span>
    </div>
</div>

The final while loop is better rewritten as:

# find_elements (plural) returns a list of every matching element
elements = driver.find_elements(
    By.CSS_SELECTOR, 'div[data-pre-plain-text*="{}"]'.format(date))
for e in elements:
    print(e.text)

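You get the same output because every pass through the while loop body starts a new, independent search, and find_element returns the first matching element each time. The loop condition is also never false: find_element either returns an element, which is truthy, or raises NoSuchElementException, so the loop does not end on its own.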
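The difference can be reproduced without a browser. The sketch below uses a hypothetical FakePage class as a stand-in for the driver (it is not part of Selenium, and its methods take no locator arguments); it only mimics the "first match" versus "all matches" behavior.

```python
class FakePage:
    """Hypothetical stand-in for a Selenium driver; not part of Selenium."""

    def __init__(self, messages):
        self._messages = list(messages)

    def find_element(self):
        # Mimics Selenium's find_element: always the first match.
        return self._messages[0]

    def find_elements(self):
        # Mimics Selenium's find_elements: every match, in document order.
        return list(self._messages)


page = FakePage(["Hey there", "How are you?", "WBU?"])

# Repeating the single-element lookup never moves past the first message.
repeated = [page.find_element() for _ in range(3)]

# Fetching all matches once and iterating visits each message.
all_messages = [m for m in page.find_elements()]

print(repeated)
print(all_messages)
```

The first list repeats the first message three times, which is the symptom in the question; the second list contains each message once.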

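find_element (without the s at the end) always finds only the first matching element on the page, no matter how many times you call it.

You have to use find_elements (with the s at the end) to get all matching elements, and then loop over them with a for loop:

css = 'div[data-pre-plain-text*="{}"]'.format(date)
elements = driver.find_elements(By.CSS_SELECTOR, css)
for e in elements:
    print(e.text)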
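One thing to watch with the *= selector is that it is a substring match, so a date such as 5/1/2022 would also match a prefix containing 15/1/2022. A minimal sketch of a stricter check follows. It assumes the data-pre-plain-text value always has the shape shown in the question's HTML ("[time, date] sender: "); the names PREFIX_RE, parse_prefix and is_from_date are made up for this example.

```python
import re

# Assumed prefix shape, taken from the question's HTML:
# "[2:39 PM, 5/1/2022] Joseph: "
PREFIX_RE = re.compile(
    r"^\[(?P<time>[^,\]]+),\s*(?P<date>[^\]]+)\]\s*(?P<sender>.*?):\s*$"
)


def parse_prefix(prefix):
    """Return (time, date, sender), or None if the prefix has another shape."""
    match = PREFIX_RE.match(prefix)
    if match is None:
        return None
    return (
        match.group("time").strip(),
        match.group("date").strip(),
        match.group("sender").strip(),
    )


def is_from_date(prefix, date):
    """True only when the date part of the prefix equals `date` exactly."""
    parsed = parse_prefix(prefix)
    return parsed is not None and parsed[1] == date


print(parse_prefix("[2:39 PM, 5/1/2022] Joseph: "))
```

With Selenium, the prefix for each element in the loop above could be read with e.get_attribute("data-pre-plain-text") and passed to is_from_date before printing e.text. The prefix format may differ by locale or WhatsApp Web version, so the pattern should be checked against the actual page.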
