Is there a way to get a webpage description without the ellipses using BeautifulSoup?



Using bs4, I found a way to get a webpage's title and description, but the description comes back chopped up by ellipses, like this: "Everything you need to know about how-to adopt a cat, bringing your new cat... Browse cat breeds and learn about the many cats available for adoption on..."

Here is my code:

import requests
from bs4 import BeautifulSoup

def get_data(search):
    headers = {"user-agent": get_headers()}  # get_headers just returns browser-style request headers stored in a file
    response = requests.get("https://www.google.com/search", params={"q": search}, headers=headers)

    results = page_info(response.content)
    return results

def page_info(page):
    soup = BeautifulSoup(page, "lxml")
    names = soup.find_all("h3", class_="LC20lb DKV0Md")
    links = soup.find_all("cite", class_="iUh30 Zu0yb qLRx3b tjvcx")
    descs = soup.find_all("div", class_="VwiC3b yXK7lf MUxGbd yDYNvb lyLwlc")
    parsed_data = [{"title": title.text, "link": link.text.replace(" › ", "/"), "desc": desc.text}
                   for title, link, desc in zip(names, links, descs)]
    return parsed_data

Is there any way to get the description without the "..."? Thanks.

Edit: to be clearer, I mean getting the full description, or at least one that is more complete than what is returned.

You can split that string and then build a new string from the pieces:

disc = "Everything you need to know about how-to adopt a cat, bringing your new cat... Browse cat breeds and learn about the many cats available for adoption on..."
#this splits the description by '...' and returns array
splited = disc.split('...')
#now join the elements in array and make another string out of it
final = ''.join(splited)
# you can change this line to final = ' '.join(splited) and add space between those splited strings
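
A quick check of what the snippet above produces on the description from the question (output shown in the comments):

print(splited)
# ['Everything you need to know about how-to adopt a cat, bringing your new cat',
#  ' Browse cat breeds and learn about the many cats available for adoption on', '']
print(final)
# Everything you need to know about how-to adopt a cat, bringing your new cat Browse cat breeds and learn about the many cats available for adoption on

Note that the last element is an empty string because the description ends with '...', so joining with ' ' would also leave a trailing space and a double space before "Browse".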

You can use the string's .replace() method to remove (or replace) any part of a string.

Example:

original_str = "abc ... 123... abc"
corrected_str = original_str.replace('...', '')
print(original_str)
print(corrected_str)

Output:

abc ... 123... abc
abc  123 abc
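
For instance, the same replace can be dropped straight into the page_info function from the question (just a sketch reusing the question's code; it only strips the literal '...' from whatever snippet Google returns, it does not recover the truncated text):

from bs4 import BeautifulSoup

def page_info(page):
    soup = BeautifulSoup(page, "lxml")
    names = soup.find_all("h3", class_="LC20lb DKV0Md")
    links = soup.find_all("cite", class_="iUh30 Zu0yb qLRx3b tjvcx")
    descs = soup.find_all("div", class_="VwiC3b yXK7lf MUxGbd yDYNvb lyLwlc")
    # remove the ellipses (and any leftover surrounding whitespace) from each snippet
    return [{"title": title.text,
             "link": link.text.replace(" › ", "/"),
             "desc": desc.text.replace('...', '').strip()}
            for title, link, desc in zip(names, links, descs)]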