Scraping a hidden phone number



I have been struggling to extract the phone number without using Selenium to click the "afficher le number" button.

Here is the URL: https://www.mubawab.ma/fr/a/7469776/beau-terrain-%C3%A0-la-vente-%C3%A0-hay-izihar-superficie-68-m%C2%B2-

Here is the code I tried:

import re
import requests
from bs4 import BeautifulSoup
url = "https://www.mubawab.ma/fr/a/7469776/beau-terrain-%C3%A0-la-vente-%C3%A0-hay-izihar-superficie-68-m%C2%B2-"
phone_url = "https://www.mubawab.ma/jSpBT9/gAEhoRFWpm8vGww==', 'adPage"
# try to extract the ad id from the page URL
ad_id = re.search(r"(\d+)\.htm", url).group(1)
# request the phone endpoint and parse the response
html_text = requests.get(phone_url.format(ad_id)).text
soup = BeautifulSoup(html_text, "html.parser")
# pull the argument passed to getTrackingPhone(...)
phone = re.search(r"getTrackingPhone\((.*?)\)", html_text).group(1)
print(soup.select_one(".texto").get_text(strip=True), phone)
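
A side note on the attempt above: it likely fails before any request is sent. The ad-id pattern looks for digits followed by ".htm", but this URL has no such suffix, so `re.search` returns `None` and `.group(1)` raises `AttributeError`. Also, `phone_url` contains no `{}` placeholder, so `.format(ad_id)` leaves it unchanged. The sketch below only demonstrates the regex part; the `/a/(\d+)/` pattern is an assumption based on this single URL, not a documented URL scheme of the site.

```python
import re

url = "https://www.mubawab.ma/fr/a/7469776/beau-terrain-%C3%A0-la-vente-%C3%A0-hay-izihar-superficie-68-m%C2%B2-"

# The original pattern expects "<digits>.htm", which this URL does not contain.
old_match = re.search(r"(\d+)\.htm", url)
print(old_match)  # None, so calling .group(1) on it raises AttributeError

# Assumed alternative: take the digits that follow "/a/" in the path.
new_match = re.search(r"/a/(\d+)/", url)
ad_id = new_match.group(1) if new_match else None
print(ad_id)  # 7469776
```

Getting the ad id this way does not by itself reveal the phone number; it only removes the first crash.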

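In this case you need Selenium, because it is hard to work out how the payload is encoded, and doing so would take many times longer. The most likely string: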

YR3gCzHEBrHR63YyPD95vui5tCyoyGZZRCtdUTrrJtw=

is converted to:

ᣢ㡒䄬ീ嬤㠰℠尯〴䀶̨ۀ⪠嘡ਣ䰧〪ိ䁇䁦㗠߆ྠ㎁㠤怬Ⱡ⧓iⴠ祬ö~删ങ校屵䀠瀤槨‣㏰׏⏠᪠ӠѴ㢠ზ5ⵝ䯭涇䰧ࠢ⬠ӕ倠㓡Ġ༠ⲠǠË䜕₈Ф纾㾚ુ圪$㛀Ś⵬R儨⒗Ᏼာ挥狩⬕䐠⮀㚐䈦޳ݕҊ冑懖咏࠳⧜性ᘂ㙻ⓔዠ佊摾妤໫䕖勩ᬕᣱ⋍῅庰䶬䟦䝱௅凸潹㈠䕪౤⠥㡃忭夠㭍㞹慳ၭ"☷ᦞ䂢䠷Р睢᭍ୀ㌵׃ऄ〢㝒桾ᾠ☡犱ⶼᔨᔔᕢ㕒⣢ℰ䝐⒡ڹ䐫㋜㸩啒㄄ᾼ昂纙ઽ瘲Ⲙẻˇ帠୏湧᥂၍令偱夦䡮ऀᕚદ慢爼⠖䧜傉䤴夬݅䡯兰摍䨳0㢔仦摔䈤沥冠汈᠂ᕢń⠀㥰䣖ѵဠ幸栠

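Perhaps the answer is obvious to some, but I will provide my own Selenium version. Don't forget to download the webdriver for your browser (for example Chrome) and specify its path in the code.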

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
path = "/path/to/chromedriver"  # placeholder: set this to your chromedriver binary
# Selenium 4 style; on Selenium 3 pass the path directly: webdriver.Chrome(path)
driver = webdriver.Chrome(service=Service(path))
driver.get('https://www.mubawab.ma/fr/a/7469776/beau-terrain-%C3%A0-la-vente-%C3%A0-hay-izihar-superficie-68-m%C2%B2-')
# read the JavaScript stored in the button's onclick attribute
script = BeautifulSoup(driver.page_source, 'lxml').find('div', class_='hide-phone-number-box').get('onclick')
elem = driver.find_element(By.CLASS_NAME, 'hide-phone-number-box')
# run that JavaScript directly instead of clicking the button
driver.execute_script(script, elem)
timeout = 5
phone = None  # stays None if the number never appears
try:
    # wait until the element holding the phone number is in the DOM
    element_present = EC.presence_of_element_located((By.CLASS_NAME, 'phoneText'))
    WebDriverWait(driver, timeout).until(element_present)
    phone = BeautifulSoup(driver.page_source, 'lxml').find('p', class_='phoneText').getText()
except TimeoutException:
    print("Timed out waiting for page to load")
print(phone)

Output:

+212 6 27 47 75 46

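I have found a solution to my own problem that does not use Selenium. You cannot get the phone number with requests, because the page uses JavaScript to build the phone number. But you can use requests_html to render the JavaScript and get the phone number: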

from requests_html import HTMLSession
url = "https://www.mubawab.ma/fr/a/7469776/beau-terrain-%C3%A0-la-vente-%C3%A0-hay-izihar-superficie-68-m%C2%B2-"
session = HTMLSession()
r = session.get(url)
# get the onclick code from the button
onclick = r.html.xpath('//*[@id="stickyDiv"]/div[2]/div[1]/div')[0].attrs['onclick']
# wrap the onclick code in an arrow function ({{ and }} are literal braces in an f-string)
script = f"() => {{{onclick}}}"
# render the page and run the script
r.html.render(sleep=1, timeout=20, script=script)
# get the phone number
phone_number = r.html.xpath('//*[@id="response"]/p')[0].text
print(phone_number)

Output:

06 27 47 75 46
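
The two approaches print the same number in different formats ("+212 6 27 47 75 46" and "06 27 47 75 46"). If the scraped values need to be compared or stored consistently, a small helper along these lines may be enough. This is only a sketch: the function name `normalize_moroccan_number` is hypothetical, and it assumes every input is a Moroccan number written either with the 212 country code or with a leading 0.

```python
import re


def normalize_moroccan_number(raw: str) -> str:
    """Return the number as +212 followed by the national digits.

    Assumption: the input is a Moroccan number, written either with the
    212 country code or with a leading trunk 0.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if digits.startswith("212"):
        national = digits[3:]        # drop the country code
    elif digits.startswith("0"):
        national = digits[1:]        # drop the leading trunk 0
    else:
        national = digits
    return "+212" + national


selenium_result = normalize_moroccan_number("+212 6 27 47 75 46")
requests_html_result = normalize_moroccan_number("06 27 47 75 46")
print(selenium_result)       # +212627477546
print(requests_html_result)  # +212627477546
```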
