Web scraping - cannot find element text (data-wipe-name)



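I want to scrape some open data, but I keep getting an error saying the element has no text attribute value. When I tested this approach on Google earlier, I could easily get the text by class name. I also tested it with XPath. I know the "nobr" class appears multiple times in the HTML; that is not the problem, because normally I can work around it with an XPath.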

driver.get('https://www.gelbeseiten.de/Suche/dm-drogerie%20markt/Bundesweit')
time.sleep(3)
plz = driver.find_element_by_class_name("nobr").text
plzx = driver.find_element_by_xpath("/html/body/div[2]/div[2]/div/div/div[1]/div/div/div/div[2]/div/article[59]/a/address/p[1]/span").text
print(plzx)

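The only problem is finding the text contained in the element. I want to loop over the page elements and use them to fill these columns in a pandas DataFrame.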

# This worked when I scraped just one value per page
product_titles = driver.find_elements_by_class_name('nobr')
for title in product_titles:
    print(title.text)

# I want to save the scraped data later
df = pd.DataFrame([[name, plz, street, city, number]], columns=['business', 'plz', 'street', 'city', 'number'])


This is the part I want to scrape the information from:


<a href="https://www.gelbeseiten.de/gsbiz/cf5182f8-e6ba-4846-a1f2-0d179feb68c4" data-realid="cf5182f8-e6ba-4846-a1f2-0d179feb68c4" data-tnid="162004776014" target="_self">

<div class="mod-hervorhebung">
</div>

<h2 data-wipe-name="Titel">dm-drogerie markt GmbH + Co. KG</h2>
<p class="d-inline-block mod-Treffer--besteBranche">
Drogeriewaren
</p>

<div class="bewertungen-anker">
<div class="mod mod-Stars mod-Stars--" title="5.0/5" data-float="5,0">
<span class="mod-Stars__text" style="width: 100%;">5.0</span>
</div>
<span>5.0</span>
<span>(2)</span>
</div>
<address class="mod mod-AdresseKompakt">
<p data-wipe-name="Adresse">
Geisenheimer&nbsp;Str.&nbsp;70, 
<span class="nobr">
65385
Rüdesheim&nbsp;am&nbsp;Rhein
</span>
</p>
<p class="mod-AdresseKompakt__phoneNumber" data-hochgestellt-position="end" data-wipe-name="Kontaktdaten">06722 40 63 70</p>
</address>

<div class="oeffnungszeit_kompakt__zustandsinfo--geoeffnet">
<span>Geöffnet</span>, 
<span class="nobr">schließt um 20:00</span>
</div>
</a>
<div class="mod mod-Aktionsleistekompakt">
<div class="mod mod-gsSlider mod-gsSlider--noneOnWhite">
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="left" data-show="false" data-wipe="{&quot;listener&quot;:&quot;click&quot;,&quot;name&quot;:&quot;Trefferliste: Slider-Pfeil-links&quot;}"></span>
<span class="mod-gsSlider__arrow mod-gsSlider__arrow--arrow" data-direction="right" data-show="false" data-wipe="{&quot;listener&quot;:&quot;click&quot;,&quot;name&quot;:&quot;Trefferliste: Slider-Pfeil-rechts&quot;}"></span>
<div class="mod-gsSlider__slider">
<span class="contains-icon-route gs-btn" data-wipe="{&quot;listener&quot;:&quot;click&quot;, &quot;name&quot;:&quot;Trefferliste Navigation-Button&quot;, &quot;id&quot;:&quot;162004776014&quot;}" data-parameters="{&quot;partner&quot;: &quot;googlemaps&quot;, &quot;searchquery&quot;: &quot;Geisenheimer%20Str%2070%2065385%20R%C3%BCdesheim%20am%20Rhein&quot;}" data-target="_blank">Route</span>
<a class="contains-icon-details gs-btn" rel="noopener" href="https://www.gelbeseiten.de/gsbiz/cf5182f8-e6ba-4846-a1f2-0d179feb68c4" data-wipe="{&quot;listener&quot;: &quot;mouseup&quot;, &quot;name&quot;: &quot;Trefferliste Actionbutton Mehr Details&quot;, &quot;id&quot;: &quot;162004776014&quot;, &quot;synchron&quot;: false}" data-isneededpromise="false" data-cookieinfo="cf5182f8-e6ba-4846-a1f2-0d179feb68c4=162004776014">Mehr Details</a>
<div class="mod-gsSlider__spacer"></div>
</div>
</div>

</div>

You can try:

element = driver.find_element_by_class_name('nobr')
plz = element.get_attribute('innerText')

Or directly:

plz = driver.find_element_by_class_name('nobr').get_attribute('innerText')

This works because the text is available as the innerText property of the HTML element.
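One possible explanation, offered as an assumption rather than a diagnosis of this page: Selenium's .text returns only text that is rendered as visible, so a hidden or off-screen element can yield an empty string while innerText or textContent still holds the value. A minimal sketch of a fallback helper follows; element_text is a hypothetical name, and FakeElement exists only so the snippet runs without a browser.

```python
def element_text(element):
    """Return the element's text, falling back to DOM properties if .text is empty."""
    text = (element.text or "").strip()
    if text:
        return text
    # Try the DOM properties in turn; textContent ignores CSS visibility entirely.
    for name in ("innerText", "textContent"):
        value = element.get_attribute(name)
        if value and value.strip():
            return value.strip()
    return ""


class FakeElement:
    """Stand-in for a Selenium WebElement (assumption: only .text and get_attribute are used)."""

    def __init__(self, text, attributes):
        self.text = text
        self._attributes = attributes

    def get_attribute(self, name):
        return self._attributes.get(name)


hidden = FakeElement("", {"innerText": " 65385 Rüdesheim am Rhein "})
print(element_text(hidden))
```

With a real driver, the same helper could be called as element_text(driver.find_element_by_class_name('nobr')).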

Because find_elements_by_class_name returns a list, you cannot call get_attribute on the result directly. You need a for loop, for example:

product_titles = driver.find_elements_by_class_name('nobr')
for title in product_titles:
    print(title.get_attribute("innerText"))
    # print(title.text)

Use whichever works for you.
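Since the goal in the question is one table row per search hit, it may help to read every field relative to its own hit instead of collecting all nobr elements on the page. The following is only a sketch: extract_entry, SELECTORS, FakeArticle and FakeNode are hypothetical names, the CSS selectors are inferred from the HTML snippet in the question, and the fakes stand in for Selenium elements so the snippet runs without a browser.

```python
# CSS selectors inferred from the HTML in the question (assumption: the real page matches it).
SELECTORS = {
    "business": "h2[data-wipe-name='Titel']",
    "address": "p[data-wipe-name='Adresse']",
    "number": "p[data-wipe-name='Kontaktdaten']",
}


def extract_entry(article):
    """Build one row (a dict) from a single search hit."""
    entry = {}
    for column, selector in SELECTORS.items():
        # The plural finder returns an empty list instead of raising when nothing matches.
        nodes = article.find_elements_by_css_selector(selector)
        if nodes:
            raw = nodes[0].get_attribute("innerText") or ""
            # split()/join collapses newlines and non-breaking spaces into single spaces.
            entry[column] = " ".join(raw.split())
        else:
            entry[column] = None
    return entry


class FakeNode:
    """Stand-in for a child WebElement; only get_attribute('innerText') is modelled."""

    def __init__(self, inner_text):
        self._inner_text = inner_text

    def get_attribute(self, name):
        return self._inner_text if name == "innerText" else None


class FakeArticle:
    """Stand-in for one search-hit element, keyed by CSS selector."""

    def __init__(self, texts):
        self._texts = texts

    def find_elements_by_css_selector(self, selector):
        return [FakeNode(self._texts[selector])] if selector in self._texts else []


article = FakeArticle({
    "h2[data-wipe-name='Titel']": "dm-drogerie markt GmbH + Co. KG",
    "p[data-wipe-name='Adresse']": "Geisenheimer\xa0Str.\xa070, \n65385\nRüdesheim\xa0am\xa0Rhein\n",
})
print(extract_entry(article))
```

With a real driver, the rows could come from something like [extract_entry(a) for a in driver.find_elements_by_tag_name('article')], where the article tag is taken from the XPath in the question; pd.DataFrame(rows) accepts such a list of dicts.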

Last edit: the same with XPath:

product_titles = driver.find_elements_by_xpath("//span[contains(@class, 'nobr')]")
for title in product_titles:
    print(title.text)
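Note that in the HTML from the question the nobr class is used both for the postal code and city and for the opening hours ("schließt um 20:00"), so a loop over all matches returns both kinds of text. A small sketch of how the address entries might be separated and split into the plz and city columns; split_plz_city is a hypothetical helper, and the five-digit pattern assumes German postal codes.

```python
import re

# Five-digit German postal code followed by the city name.
PLZ_CITY = re.compile(r"^(\d{5})\s+(.+)$")


def split_plz_city(raw):
    """Return (plz, city) for an address span, or None for any other nobr text."""
    # Collapse newlines and non-breaking spaces into single regular spaces.
    normalized = " ".join(raw.split())
    match = PLZ_CITY.match(normalized)
    if match is None:
        return None
    return match.group(1), match.group(2)


samples = ["\n65385\nRüdesheim\xa0am\xa0Rhein\n", "schließt um 20:00"]
for raw in samples:
    print(split_plz_city(raw))
```

Inside the loop above, split_plz_city(title.text) could be used to skip the entries that return None.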
