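I am building a web scraper that needs to open the tab of every item whose pin icon is filled in. For example, each page I need to open has div class="class-selector-item-pinned" in its source code.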
<dropdown-content max-width="800" min-width="450" no-padding="" vertical-offset="0" dir="ltr" dropdown-content="" style="--dropdown-verticaloffset:0px;" opened=""><div class="classselector-wrapper" aria-live="assertive">
<div id="classSelectorId" class="placeholder placeholder-live" aria-live="assertive">
<div class="2_7_615 2_8_459 body-compact">
<ul class="datalist vui-list">
<li class="datalist-item datalist-item-actionable datalist-simpleitem vui-selected" id="2_9_421" data-actionid="2_11_656">
<div class="datalist-item-content" title="Class 1">
<div class="class-selector-item class-selector-item-pinned" data-org-unit-id="12345">
<div class="2_160_610 class-selector-item-name">
<a class="link datalist-item-actioncontrol" id="2_11_656" href="/abc/home/12345">Class1</a>
</div>
<span id="2_10_630" data-active-id="2_161_292" data-inactive-id="2_162_883"><button-icon icon="tier1:pin-filled" id="2_161_292" onclick="O("__g2",3)();" text="Un-pin "Class 1"" dir="ltr" type="button"></button-icon>
<button-icon icon="tier1:pin-hollow" class="hidden" id="2_162_883" onclick="O("__g2",4)();" text="Pin "Class 1"" dir="ltr" type="button"></button-icon>
</span></div>
</div>
<div class="clear"></div>
</li>
<li class="datalist-item datalist-item-actionable datalist-simpleitem vui-selected" id="2_12_929" data-actionid="2_14_114">
<div class="datalist-item-content" title="Class 2">
<div class="class-selector-item class-selector-item-pinned" data-org-unit-id="23456">
<div class="2_160_610 class-selector-item-name">
<a class="link datalist-item-actioncontrol" id="2_14_114" href="/abc/home/23456">Class 2</a>
</div>
<span id="2_13_229" data-active-id="2_163_477" data-inactive-id="2_164_80"><button-icon icon="tier1:pin-filled" id="2_163_477" onclick="O("__g2",5)();" text="Un-pin "Class 2"" dir="ltr" type="button"></button-icon>
<button-icon icon="tier1:pin-hollow" class="hidden" id="2_164_80" onclick="O("__g2",6)();" text="Pin "Class 2"" dir="ltr" type="button"></button-icon>
</span></div>
</div>
<div class="clear"></div>
</li>
<li class="datalist-item datalist-item-actionable datalist-simpleitem vui-selected" id="2_15_372" data-actionid="2_17_26">
<div class="datalist-item-content" title="Class 3">
<div class="class-selector-item class-selector-item-pinned" data-org-unit-id="34567">
<div class="2_160_610 class-selector-item-name">
<a class="link datalist-item-actioncontrol" id="2_17_26" href="/abc/home/34567">Class 3</a>
</div>
<span id="2_16_595" data-active-id="2_165_349" data-inactive-id="2_166_873"><button-icon icon="tier1:pin-filled" id="2_165_349" onclick="O("__g2",7)();" text="Un-pin "Class 3"" dir="ltr" type="button"></button-icon>
<button-icon icon="tier1:pin-hollow" class="hidden" id="2_166_873" onclick="O("__g2",8)();" text="Pin "Class 3"" dir="ltr" type="button"></button-icon>
</span></div>
</div>
<div class="clear"></div>
</li>
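I need the web scraper to find all elements that have "class-selector-item-pinned" and then take the value of data-org-unit-id. For example, in this case the list would return [12345, 23456, 34567].
The source lines I am referring to are: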
<div class="class-selector-item-pinned" data-org-unit-id="12345">
<div class="class-selector-item-pinned" data-org-unit-id="23456">
<div class="class-selector-item-pinned" data-org-unit-id="34567">
This is what I have so far; the list comes back empty.
# Get the list of unit IDs
courseString = 'https://example.com/abc/p/home'
listOfUnitID = []
links = [elem.get_attribute("data-org-unit-id") for elem in driver.find_elements_by_class_name("class-selector-item-pinned")]
# Filter None values out of the list
res = []
for val in links:
    if val is not None:
        res.append(val)
print(res)
# Keep only the class entries
for i in res:
    if courseString in i:
        listOfUnitID.append(i)
print(listOfUnitID)
Looking at your HTML, the div you actually want to scrape is this one:
<div class="class-selector-item class-selector-item-pinned" data-org-unit-id="34567">
...
...
not these:
<div class="class-selector-item-pinned" data-org-unit-id="12345">
<div class="class-selector-item-pinned" data-org-unit-id="23456">
<div class="class-selector-item-pinned" data-org-unit-id="34567">
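The div you are targeting carries multiple classes, which is likely why your code returns nothing. .find_elements_by_class_name only accepts a single class name.
You can try .find_elements_by_css_selector('css_selector') instead, like this: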
links = [elem.get_attribute("data-org-unit-id") for elem in driver.find_elements_by_css_selector(".class-selector-item.class-selector-item-pinned")]
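If you are on Selenium 4.3 or newer, the find_elements_by_* helpers have been removed, so the same lookup goes through driver.find_elements with a locator strategy. Below is a minimal sketch of that, assuming driver is an already-started WebDriver with the dropdown open; the helper name pinned_unit_ids and the two constants are hypothetical names introduced here for illustration, not part of your code or of Selenium.

```python
from typing import List

# Same string as selenium.webdriver.common.by.By.CSS_SELECTOR; spelled out
# here so the helper can be defined without importing Selenium.
CSS_SELECTOR = "css selector"

# Compound selector: matches elements that carry BOTH classes.
PINNED_ITEM = ".class-selector-item.class-selector-item-pinned"


def pinned_unit_ids(driver) -> List[str]:
    """Collect data-org-unit-id from every pinned item, skipping missing values.

    `driver` is assumed to be a Selenium WebDriver (hypothetical helper).
    """
    elements = driver.find_elements(CSS_SELECTOR, PINNED_ITEM)
    ids = (elem.get_attribute("data-org-unit-id") for elem in elements)
    # get_attribute returns None when the attribute is absent.
    return [unit_id for unit_id in ids if unit_id is not None]
```

You would call it as listOfUnitID = pinned_unit_ids(driver). Note that the values are bare IDs such as "12345", so a later check like courseString in i would still filter all of them out. If the list is still empty after this change, it may be worth checking that the dropdown has actually been rendered before the lookup runs.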