How to find all elements within a specific div ID with Beautiful Soup and Selenium



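Hi everyone, I'm trying to scrape LinkedIn profile information. I have the page source below. My problem is that I know how to get a section's contents by its ID, but the section ID changes every time the page is refreshed.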

<section id="ember443" class="artdeco-card ember-view relative break-words pb3 mt2 " tabindex="-1"><!---->
<div id="experience" class="pv-profile-card-anchor"></div>
<!---->
<div class="pvs-list__outer-container">
<!---->    <ul class="pvs-list
ph5 display-flex flex-row flex-wrap
">
<li class="artdeco-list__item pvs-list__item--line-separated pvs-list__item--one-column">
<!----><div class="pvs-entity
pvs-entity--padded pvs-list__item--no-padding-when-nested

">
<div>
<a data-field="experience_company_logo" class="optional-action-target-wrapper 
display-flex" target="_self" href="https://www.linkedin.com/company/22316561/">
<div class="ivm-image-view-model  pvs-entity__image ">
<div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex

">
</div>
</div>
</a>
</div>
<div class="display-flex flex-column full-width align-self-center">
<div class="display-flex flex-row justify-space-between">
<div class="
display-flex flex-column full-width">

<div class="display-flex align-items-center">
<span class="mr1 t-bold">
<span aria-hidden="true"><!---->CEO &amp; Founder<!----></span><span class="visually-hidden"><!---->CEO &amp; Founder<!----></span>
</span>
<!----><!----><!---->        </div>
<span class="t-14 t-normal">
<span aria-hidden="true"><!---->Runa<!----></span><span class="visually-hidden"><!---->Runa<!----></span>
</span>
<span class="t-14 t-normal t-black--light">
<span aria-hidden="true"><!---->Jan 2018 - Present · 4 yrs 10 mos<!----></span><span class="visually-hidden"><!---->Jan 2018 - Present · 4 yrs 10 mos<!----></span>
</span>
<span class="t-14 t-normal t-black--light">
<span aria-hidden="true"><!---->Mexico City Area, Mexico<!----></span><span class="visually-hidden"><!---->Mexico City Area, Mexico<!----></span>
</span>

I have managed to get all the sections with this class using:

experiences = soup.find_all("section", {"class": "artdeco-card ember-view relative break-words pb3 mt2"})

However, I need the text in the section that contains the div with id "experience". I tried:

div = soup.find_all(id="experience")

But it only gives me that tag and nothing else. Any ideas on how I can get the text in the specific "experience" section? Thanks in advance.

Well, there isn't any text inside the div with id "experience"; the data you want comes after it. So maybe try something like:

expAnchor = soup.find(id="experience")
if expAnchor:  # avoid an AttributeError in case expAnchor is None
    expContainer = expAnchor.find_next('div', {"class": "pvs-list__outer-container"})
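
To see why the anchor lookup alone comes back empty, here is a minimal sketch that runs the same idea against a trimmed-down copy of the markup from the question. The shortened HTML string and the names `anchor`, `anchor_text`, `container` and `container_text` are illustrative assumptions, not part of the original page or answer.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the LinkedIn markup in the question (assumption).
html = """
<section id="ember443" class="artdeco-card">
  <div id="experience" class="pv-profile-card-anchor"></div>
  <div class="pvs-list__outer-container">
    <span aria-hidden="true">CEO &amp; Founder</span>
  </div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# The anchor div is empty, so its own text is an empty string.
anchor = soup.find(id="experience")
anchor_text = anchor.get_text(strip=True) if anchor else ""

# find_next() walks forward in document order to the first matching div.
container = anchor.find_next("div", class_="pvs-list__outer-container") if anchor else None
container_text = container.get_text(strip=True) if container else ""

print(repr(anchor_text))
print(container_text)
```

Because the lookup starts from the stable `experience` id rather than the generated `ember…` id, it should not depend on the section ID that changes on every refresh.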

Alternatively, you can use a CSS selector and get it in a single call, like:

expContainer = soup.select_one('#experience ~ div.pvs-list__outer-container')
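
Once the container is found, the individual fields still have to be pulled out of it. The following is a rough sketch, assuming the visible text sits in the `span` elements marked `aria-hidden="true"` as in the question's markup; the shortened HTML and the `fields` list are hypothetical. With Selenium, the HTML string would typically come from `driver.page_source` instead.

```python
from bs4 import BeautifulSoup

# Shortened version of the question's markup (assumption, for illustration only).
html = """
<section id="ember443">
  <div id="experience" class="pv-profile-card-anchor"></div>
  <div class="pvs-list__outer-container">
    <span class="mr1 t-bold">
      <span aria-hidden="true">CEO &amp; Founder</span><span class="visually-hidden">CEO &amp; Founder</span>
    </span>
    <span class="t-14 t-normal">
      <span aria-hidden="true">Runa</span><span class="visually-hidden">Runa</span>
    </span>
  </div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# '~' is the general sibling combinator: a later sibling of #experience.
container = soup.select_one("#experience ~ div.pvs-list__outer-container")

fields = []
if container:
    # Take only the aria-hidden spans so each value is collected once,
    # skipping the duplicated "visually-hidden" copies.
    for span in container.select('span[aria-hidden="true"]'):
        fields.append(span.get_text(strip=True))

print(fields)
```

Filtering on `aria-hidden="true"` is one possible choice; selecting `span.visually-hidden` instead should give the same values for this markup.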
