Python data extraction problem with BeautifulSoup, Selenium, and a for loop



I need to loop over a 'ul' tag, as shown in my script below. Can someone tell me how to do it? Thank you very much for your time.

Here is my code:

As I said, I want to loop over a UL tag that contains at least 10 LI tags and extract the text of each LI tag, but I can't find a way to loop inside the UL tag.

page_source = driver.page_source
soup = BeautifulSoup(page_source, features='html.parser')
searchResCon = soup.find('div', {'class':'search-results-container'})
followerCol = searchResCon.find('div', {'class':'ph0 pv2 artdeco-card mb2'})
searchList = followerCol.find('ul', {'class':'reusable-search__entity-result-list list-style-none'})
singleCon = searchList.find('li', {'class':'reusable-search__result-container'})
# I want to loop inside the 'ul' tag, which is stored in the searchList variable.
# That 'ul' tag has at least 10 'li' tags, and I want to iterate over them.
for li in searchList:
    # These are the fields I collect, by name and variable, from the 'li' tags inside the 'ul' tag.
    name = singleCon.find('span', {'aria-hidden':'true'}).get_text().strip()
    title = singleCon.find('div', {'class':'entity-result__primary-subtitle t-14 t-black t-normal'}).get_text().strip()
    location = singleCon.find('div', {'class':'entity-result__secondary-subtitle t-14 t-normal'}).get_text().strip()
    hashtag = singleCon.find('p', {'class':'entity-result__summary entity-result__summary--2-lines t-12 t-black--light mb1'}).get_text().strip()
    follower = singleCon.find('span', {'class':'entity-result__simple-insight-text entity-result__simple-insight-text--small'}).get_text().strip()
    # I have a list called contactsInfo and I append all of the information to it.
    contactsInfo.append('-' * 30)
    contactsInfo.append('\n')
    contactsInfo.append('-' * 30)
    contactsInfo.append('\n')
    contactsInfo.append(f'Name: {name}')
    contactsInfo.append('\n')
    contactsInfo.append(f'Title: {title}')
    contactsInfo.append('\n')
    contactsInfo.append(f'Location: {location}')
    contactsInfo.append('\n')
    contactsInfo.append(f'Hashtag: {hashtag}')
    contactsInfo.append('\n')
    contactsInfo.append(f'Follower & Mutual: {follower}')
    contactsInfo.append('\n')

When I call find_all on the searchList variable instead, BeautifulSoup throws this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [242], in <cell line: 8>()
6 singleCon = searchList.find_all('li', {'class':'reusable-search__result-container'})
8 for li in singleCon:
---> 10     name = singleCon.find('span', {'aria-hidden':'true'}).get_text().strip()
11     title = singleCon.find('div', {'class':'entity-result__primary-subtitle t-14 t-black t-normal'}).get_text().strip()
12     location = singleCon.find('div', {'class':'entity-result__secondary-subtitle t-14 t-normal'}).get_text().strip()
File ~/Desktop/linkedin/emv/lib/python3.10/site-packages/bs4/element.py:2289, in ResultSet.__getattr__(self, key)
2287 def __getattr__(self, key):
2288     """Raise a helpful exception to explain a common code fix."""
-> 2289     raise AttributeError(
2290         "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
2291     )
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Thank you very much for your time.

By the way, I am using Python 3.10 and Jupyter Notebook.

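The code and the error message don't quite match. Judging by the message, the code you actually ran calls find_all and then loops over the result.

Inside the loop, use li to find your information instead of singleCon. singleCon is still the whole ResultSet you are iterating over, while li is the single element for the current iteration: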

singleCon = searchList.find_all('li', {'class':'reusable-search__result-container'})
for li in singleCon:
    name = li.find('span', {'aria-hidden':'true'}).get_text().strip()
    title = li.find('div', {'class':'entity-result__primary-subtitle t-14 t-black t-normal'}).get_text().strip()
    location = li.find('div', {'class':'entity-result__secondary-subtitle t-14 t-normal'}).get_text().strip()
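
If you want to try the pattern without a live browser session, here is a minimal sketch. It makes some assumptions: SAMPLE_HTML is a made-up, simplified stand-in for driver.page_source, and text_or_default and extract_contacts are hypothetical helper names, not part of BeautifulSoup. It also matches on a single class name per tag, which BeautifulSoup compares against each of the tag's classes, and it returns an empty string when a field is missing rather than raising AttributeError on None.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for driver.page_source.
SAMPLE_HTML = """
<ul class="reusable-search__entity-result-list list-style-none">
  <li class="reusable-search__result-container">
    <span aria-hidden="true"> Ada Lovelace </span>
    <div class="entity-result__primary-subtitle t-14 t-black t-normal">Analyst</div>
    <div class="entity-result__secondary-subtitle t-14 t-normal">London</div>
  </li>
  <li class="reusable-search__result-container">
    <span aria-hidden="true">Alan Turing</span>
    <div class="entity-result__primary-subtitle t-14 t-black t-normal">Mathematician</div>
  </li>
</ul>
"""


def text_or_default(parent, name, attrs, default=""):
    """Return the stripped text of the first matching tag, or default if none."""
    tag = parent.find(name, attrs)
    return tag.get_text().strip() if tag is not None else default


def extract_contacts(html):
    """Collect one dict per result 'li' inside the result 'ul'."""
    soup = BeautifulSoup(html, "html.parser")
    search_list = soup.find("ul", {"class": "reusable-search__entity-result-list"})
    if search_list is None:
        return []
    contacts = []
    # find_all returns a ResultSet (a list of tags); each li is a single tag.
    for li in search_list.find_all("li", {"class": "reusable-search__result-container"}):
        contacts.append({
            "name": text_or_default(li, "span", {"aria-hidden": "true"}),
            "title": text_or_default(li, "div", {"class": "entity-result__primary-subtitle"}),
            "location": text_or_default(li, "div", {"class": "entity-result__secondary-subtitle"}),
        })
    return contacts


contacts = extract_contacts(SAMPLE_HTML)
for contact in contacts:
    print(contact)
```

The second sample item has no location on purpose, to show that a missing field yields an empty string instead of stopping the loop. Whether that default suits your data is a judgment call; you may prefer to skip incomplete items.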
