Parsing the DOM with BeautifulSoup

Here is the HTML:

<p class="date range"> "March 2014 to Present"
  <span class="duration"> (1 year 9 months) </span>
  <span class="location"> California </span>
</p>
<p class="date range"> "2009 - 2013"
  <span class="location"> Country </span>
</p>
<p class="date range"> "2007 - 2008"
  <span class="location"> Country </span>
</p>

My code:

data = soup.find(id="profile-experience")
for li in data.find_all("p", class_="date-range"):
  print(li.get_text())

What I get:

March 2014 – Present(1 year 9 months)California
2009 – 2013Country
2007 – 2008Country

I only want the date range, so that the output looks like this:

March 2014-Present
2009-2013
2007-2008

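I don't know how to parse the data, because there is no space between the second date and "Country".

Also, how can I get the date range without pulling in any of the child elements?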

The idea is to get the first text node from each p element that has both the date and range classes:

for date_range in soup.select("p.date.range"):
    print(date_range.find(text=True).strip())

Prints:

"March 2014 to Present"
"2009 - 2013"
"2007 - 2008"

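The extracted strings still carry the literal quotes and the original separator, while the question asks for the form "March 2014-Present". Below is a minimal sketch of one way to post-process each text node using only the standard library. It assumes the separator is always the word "to", a hyphen, or an en dash; normalize_date_range is a hypothetical helper name, not part of BeautifulSoup.

```python
import re


def normalize_date_range(raw):
    """Hypothetical helper: tidy one extracted text node into 'start-end' form."""
    # Drop surrounding whitespace and the literal double quotes seen in the sample HTML.
    text = raw.strip().strip('"').strip()
    # Assumption: the first "to", "-" or en dash (with any spaces around it)
    # is the separator between the two dates; replace it with a plain hyphen.
    return re.sub(r"\s*(?:\bto\b|[-\u2013])\s*", "-", text, count=1)


# Sample inputs copied from the printed output above.
samples = ['"March 2014 to Present"', '"2009 - 2013"', '"2007 - 2008"']
for sample in samples:
    print(normalize_date_range(sample))
```

In the loop from the answer, this would be applied to each extracted string, for example normalize_date_range(date_range.find(text=True)).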