Does Beautiful Soup have a function that strips all whitespace?

I am new to Python. I am trying to scrape the website https://nl.soccerway.com/, and I am using Beautiful Soup for the scraping.

The only problem is that when I scrape the team names, each extracted name comes back with whitespace on its left and right.

How can I remove it? I know many people have asked this before, but I cannot get it to work.

Second question: how do I extract the href title from a td? See the HTML code provided. The club name is Perugia.

Searched Google
Searched Stack Overflow

Perugia

import requests
from bs4 import BeautifulSoup


def main():
    url = 'https://nl.soccerway.com/'
    soup = get_page(url)
    # get_page returns None when the request fails, so check before parsing.
    if soup is not None:
        get_detail_data(soup)


def get_page(url):
    response = requests.get(url)
    if not response.ok:
        print('response code is:', response.status_code)
        return None
    return BeautifulSoup(response.text, 'lxml')


def get_detail_data(soup):
    minutes = ""
    score = ""
    TeamA = ""
    TeamB = ""
    table_data = soup.find('table', class_='table-container')

    try:
        # table_data is None when the table is missing, which raises AttributeError.
        for tr in table_data.find_all('td', class_='minute visible'):
            minutes = tr.text
            print(minutes)
    except AttributeError:
        pass
    try:
        for tr in soup.find_all('td', class_='team team-a'):
            # tr.text keeps the surrounding whitespace; this is the problem line.
            TeamA = tr.text
            print(TeamA)
    except AttributeError:
        pass


if __name__ == '__main__':
    main()

You can use the get_text(strip=True) method from Beautiful Soup:

tr.get_text(strip=True)
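
As a minimal sketch of what this does, the snippet below parses a made-up table cell and compares `.text` with `get_text(strip=True)`. The HTML string and the `html.parser` backend are assumptions chosen so the example runs offline; they are not the real soccerway markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates a team cell padded with whitespace.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="team team-a">
        Perugia
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
cell = soup.find("td", class_="team team-a")

raw_name = cell.text                    # keeps the surrounding newlines and spaces
clean_name = cell.get_text(strip=True)  # leading and trailing whitespace removed

print(repr(raw_name))
print(repr(clean_name))
```

In the question's loop, this would mean replacing `tr.text` with `tr.get_text(strip=True)`.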

Use the strip() method to remove leading and trailing whitespace. In your case that would be:

TeamA = tr.text.strip()

To get the href attribute, use the pattern tag['attribute']. In your case that would be:

href = tr.a['href']
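
A slightly more defensive sketch follows, assuming the team cell wraps the name in a link that carries both `href` and `title` attributes. The sample HTML, the path in it, and the helper name `link_details` are hypothetical; the original HTML from the question is not available here. Using `.get()` returns `None` for a missing attribute instead of raising `KeyError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page may structure the cell differently.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="team team-a">
      <a href="/teams/italy/perugia/" title="Perugia">
        Perugia
      </a>
    </td>
  </tr>
</table>
"""


def link_details(cell):
    """Return (href, title, text) for the first link in a cell, or None if there is no link."""
    link = cell.find("a")
    if link is None:
        return None
    # .get() yields None when the attribute is absent, unlike link["href"].
    return link.get("href"), link.get("title"), link.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
cell = soup.find("td", class_="team team-a")
details = link_details(cell)
print(details)
```

If the club name is stored in the link's `title` attribute, the second element of the tuple is the value the question asks for.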
