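Web scraping in Python, but the classes have varying class names (also discord.py)

I am trying to scrape match details from "flashscore.com", for example "https://www.flashscore.com/match/hEyZ9RHu/#match-summary", and then reuse them in a Discord embed. Basically all of this is done through a Discord chat bot, but that isn't important right now. I am trying to get a player's full name, but it doesn't seem to work.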

@client.command(pass_context=True)
async def analize(ctx, link):
    if ctx.guild is not None:
        await ctx.send("Scraping...")
        html = requests.get(link)
        soup = BeautifulSoup(html.text, "html.parser")
        name = soup.find_all('a', class_=re.compile("participantName___"))
        print(name)

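is my code, but all it prints is []. I am very new to Python, web scraping, and programming in general, so please forgive me if I don't understand something.

But why doesn't this work? Thanks.

You can try using a SoupStrainer object, which filters out any unnecessary HTML: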

from bs4 import BeautifulSoup, SoupStrainer

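You just need to pass one extra argument, parse_only, to your BeautifulSoup object; it should hold your SoupStrainer object. You can then loop over all the a tag elements and check whether any of their classes contains "participantName". If one does, you can collect its href, which contains the participant's name.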

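participants = []
strainer = SoupStrainer("a")
soup = BeautifulSoup(html.text, "html.parser", parse_only=strainer)
for elem in soup.find_all("a"):
    try:
        class_ = elem["class"]  # list of class names on this tag
    except KeyError:
        continue  # skip <a> tags that have no class attribute
    # the class names carry a suffix, so match by substring on each one
    if any("participantName" in c for c in class_):
        participants.append(elem["href"])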
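To make the idea above easier to try out, here is a minimal sketch that runs against an inline HTML string instead of the live page. The sample markup, the class suffix "___3lRDM", the href values, and the helper name find_participants are all made up for illustration; the real page may look different, and this only works if the names are actually present in the HTML that requests receives.

```python
import re
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample markup; the class suffix and hrefs are invented for this sketch.
SAMPLE_HTML = """
<div>
  <a class="participantName___3lRDM overline" href="/player/yusipov/abc/">Yusipov T.</a>
  <a class="other___9xZ" href="/news/">News</a>
  <a class="participantName___3lRDM" href="/player/marshalov/def/">Marshalov V.</a>
  <a href="/no-class/">No class</a>
</div>
"""

def find_participants(html_text):
    """Return (text, href) pairs for <a> tags whose class starts with 'participantName'."""
    # Parse only <a> tags to keep the tree small.
    strainer = SoupStrainer("a")
    soup = BeautifulSoup(html_text, "html.parser", parse_only=strainer)
    # A compiled regex passed as class_ is tested against each class name on the tag.
    links = soup.find_all("a", class_=re.compile(r"^participantName"))
    return [(a.get_text(strip=True), a.get("href")) for a in links]

participants = find_participants(SAMPLE_HTML)
print(participants)
```

If the same function returns an empty list for html.text from the real site, the matching tags are probably not in the downloaded HTML at all, which would point to the page being filled in later by JavaScript rather than to a problem with the class matching.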

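I think what you want, rather than the whole page, is the HTML head. A quick inspection shows that the names are listed there under <meta name="og:title" content="Timofey Yusipov - Vladimir Marshalov 0:3">. You could try the requests.head() function and print the headers, but note that it only returns the HTTP response headers; the <meta> tag itself is part of the HTML <head> element, which arrives in the body of a normal requests.get() response. Once you see a specific pattern, use BeautifulSoup to parse it, or even just loop over the meta tags looking for the "og:title" one.

I'm new to HTML, so I'm not sure how to phrase that last part.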
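The "og:title" idea can be sketched roughly like this. It is only an illustration on an inline HTML string that copies the meta tag quoted above; the helper names extract_og_title and split_players are hypothetical, and the "Player A - Player B score" layout is an assumption based on that single example.

```python
from bs4 import BeautifulSoup

# Inline sample built from the meta tag quoted in the answer; the rest is filler.
SAMPLE_HTML = """<html><head>
<title>Match</title>
<meta name="og:title" content="Timofey Yusipov - Vladimir Marshalov 0:3">
</head><body></body></html>"""

def extract_og_title(html_text):
    """Return the content of the og:title meta tag, or None if it is missing."""
    soup = BeautifulSoup(html_text, "html.parser")
    # attrs={...} is used because "name" is also the first positional parameter of find().
    tag = soup.find("meta", attrs={"name": "og:title"})
    if tag is None:
        # Many sites use property="og:title" instead, so check that as a fallback.
        tag = soup.find("meta", attrs={"property": "og:title"})
    return tag.get("content") if tag is not None else None

def split_players(og_title):
    """Split 'Player A - Player B score' into (home, away, score). Assumed format."""
    names_part, _, score = og_title.rpartition(" ")
    home, _, away = names_part.partition(" - ")
    return home, away, score

og_title = extract_og_title(SAMPLE_HTML)
players = split_players(og_title)
print(og_title)
print(players)
```

In the bot you would pass html.text from requests.get(link) to extract_og_title instead of SAMPLE_HTML, and handle the None case before splitting.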
