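I want to scrape some open data, but I keep getting an error saying the element has no text attribute value. Before this, when I tested on Google, I could easily get the text by class name this way. I also tested it with XPath. I know that "nobr" appears multiple times in the HTML, but that is not the problem; normally the element can be reached via XPath.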
driver.get('https://www.gelbeseiten.de/Suche/dm-drogerie%20markt/Bundesweit')
time.sleep(3)
plz = driver.find_element_by_class_name("nobr").text
plzx = driver.find_element_by_xpath("/html/body/div[2]/div[2]/div/div/div[1]/div/div/div/div[2]/div/article[59]/a/address/p[1]/span").text
print(plzx)
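The only problem is getting the text contained in the element. I want to loop over the page elements and use them to fill these columns of a pandas DataFrame: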
# This worked when I scraped just one value per page
product_titles = driver.find_elements_by_class_name('nobr')
for title in product_titles:
    print(title.text)
# I want to save the scraped data later
df = pd.DataFrame([[name, plz, street, city, number]], columns=['business', 'plz', 'street', 'city', 'number'])
This is the part of the page I want to extract information from:
<a href="https://www.gelbeseiten.de/gsbiz/cf5182f8-e6ba-4846-a1f2-0d179feb68c4" data-realid="cf5182f8-e6ba-4846-a1f2-0d179feb68c4" data-tnid="162004776014" target="_self">
<div class="mod-hervorhebung">
</div>
<h2 data-wipe-name="Titel">dm-drogerie markt GmbH + Co. KG</h2>
<p class="d-inline-block mod-Treffer--besteBranche">
Drogeriewaren
</p>
<div class="bewertungen-anker">
<div class="mod mod-Stars mod-Stars--" title="5.0/5" data-float="5,0">
<span class="mod-Stars__text" style="width: 100%;">5.0</span>
</div>
<span>5.0</span>
<span>(2)</span>
</div>
<address class="mod mod-AdresseKompakt">
<p data-wipe-name="Adresse">
Geisenheimer Str. 70,
<span class="nobr">
65385
Rüdesheim am Rhein
</span>
</p>
<p class="mod-AdresseKompakt__phoneNumber" data-hochgestellt-position="end" data-wipe-name="Kontaktdaten">06722 40 63 70</p>
</address>
<div class="oeffnungszeit_kompakt__zustandsinfo--geoeffnet">
<span>Geöffnet</span>,
<span class="nobr">schließt um 20:00</span>
</div>
</a>
<div class="mod mod-Aktionsleistekompakt">
<div class="mod mod-gsSlider mod-gsSlider--noneOnWhite">
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="left" data-show="false" data-wipe='{"listener":"click","name":"Trefferliste: Slider-Pfeil-links"}'></span>
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="right" data-show="false" data-wipe='{"listener":"click","name":"Trefferliste: Slider-Pfeil-rechts"}'></span>
<div class="mod-gsSlider__slider">
<span class="contains-icon-route gs-btn" data-wipe='{"listener":"click", "name":"Trefferliste Navigation-Button", "id":"162004776014"}' data-parameters='{"partner": "googlemaps", "searchquery": "Geisenheimer%20Str%2070%2065385%20R%C3%BCdesheim%20am%20Rhein"}' data-target="_blank">Route</span>
<a class="contains-icon-details gs-btn" rel="noopener" href="https://www.gelbeseiten.de/gsbiz/cf5182f8-e6ba-4846-a1f2-0d179feb68c4" data-wipe='{"listener": "mouseup", "name": "Trefferliste Actionbutton Mehr Details", "id": "162004776014", "synchron": false}' data-isneededpromise="false" data-cookieinfo="cf5182f8-e6ba-4846-a1f2-0d179feb68c4=162004776014">Mehr Details</a>
<div class="mod-gsSlider__spacer"></div>
</div>
</div>
</div>
You can try:
element = driver.find_element_by_class_name('nobr')
plz = element.get_attribute('innerText')
or directly:
plz = driver.find_element_by_class_name('nobr').get_attribute('innerText')
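This works because the text is exposed as the innerText property of the HTML element.
Because find_elements_by_class_name returns a list, you cannot call get_attribute on the result directly. You need to write a for loop, for example: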
product_titles = driver.find_elements_by_class_name('nobr')
for title in product_titles:
    print(title.get_attribute("innerText"))
    # print(title.text)
Use whichever one works for you.
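The two variants can also be combined into one small helper. This is only a sketch: `element_text` is a hypothetical name, not part of Selenium, and it assumes the object passed in behaves like a Selenium WebElement (a `.text` attribute and a `get_attribute()` method). It tries `.text` first, which only reports text that is rendered on screen, and falls back to the `innerText` and `textContent` DOM properties when that comes back empty.

```python
def element_text(element):
    """Return an element's text, even when `.text` is empty.

    `element` is assumed to be a Selenium WebElement (or anything with the
    same `.text` / `get_attribute()` interface).
    """
    text = element.text
    if not text or not text.strip():
        # Fall back to the DOM properties, which also cover non-visible text.
        text = (
            element.get_attribute("innerText")
            or element.get_attribute("textContent")
            or ""
        )
    # Collapse line breaks and indentation inherited from the HTML source.
    return " ".join(text.split())
```

Inside the loop above this would be used as `print(element_text(title))`.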
Last edit: the same thing with XPath:
product_titles = driver.find_elements_by_xpath("//span[contains(@class, 'nobr')]")
for title in product_titles:
    print(title.text)
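Once the text is available, it still has to be split into the columns from the question. Below is a rough sketch of one way to do that. It assumes the address text always has the shape "street and house number, five-digit postal code, city", as in the HTML snippet in the question; `parse_address` and `ADDRESS_RE` are hypothetical names, and real listings may need a more forgiving pattern.

```python
import re

# Assumed layout: "<street and number>, <5-digit postal code> <city>"
ADDRESS_RE = re.compile(r"^(?P<street>.*?),\s*(?P<plz>\d{5})\s+(?P<city>.+)$")


def parse_address(raw):
    """Split raw address text into street, plz and city.

    Returns None when the text does not match the assumed layout.
    """
    # Normalize the whitespace and line breaks that come from the markup.
    text = " ".join(raw.split())
    match = ADDRESS_RE.match(text)
    if match is None:
        return None
    return {
        "street": match.group("street").strip(),
        "plz": match.group("plz"),
        "city": match.group("city").strip(),
    }
```

The input would be the innerText of the `<p data-wipe-name="Adresse">` element of each result. Collecting the returned dicts in a list (adding the business name and phone number per entry) gives something that can be passed straight to `pd.DataFrame(rows)`.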