How to get content from a web page using Python

The page is at the URL https://www.example.com

<html>
<body>
<button id="button1" onclick=func1()>
<button id="button2" onclick=func2()>
</body>
<script>
function func1(){
open("/doubt?s=AAAB_BCCCDD");
}
function func2(){
open("/doubt?s=AABB_CCDDEE");
}
//something like that, it is working ....
</script>
</html>

AAAB_BCCCDD and AABB_CCDDEE are both tokens.

I want to get the first token on the page using Python.
My Python code:

import requests

r = requests.get("https://www.example.com")
s = r.text
if "/doubt?s=" in s:
    # After this I can't work out what to do ...
    # I want to get the first token here as a variable
    pass

Please help me.

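Typically, after fetching a site's raw text content, you would first parse the HTML with a library such as BeautifulSoup. It builds a Document Object Model (DOM) tree, which you can then query for the elements you need.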
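To illustrate what parsing and querying looks like, here is a minimal sketch. It uses the standard library's html.parser as a stand-in for BeautifulSoup so that it runs without extra installs; with BeautifulSoup the equivalent query would be along the lines of soup.find_all("button"). The SAMPLE_HTML string is a tidied copy of the markup in the question, and ButtonCollector is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Sample markup modeled on the page in the question (assumed and tidied:
# attribute values are quoted and the buttons are closed).
SAMPLE_HTML = """
<html>
<body>
<button id="button1" onclick="func1()"></button>
<button id="button2" onclick="func2()"></button>
</body>
<script>
function func1(){ open("/doubt?s=AAAB_BCCCDD"); }
function func2(){ open("/doubt?s=AABB_CCDDEE"); }
</script>
</html>
"""


class ButtonCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every <button> tag."""

    def __init__(self):
        super().__init__()
        self.buttons = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "button":
            self.buttons.append(dict(attrs))


collector = ButtonCollector()
collector.feed(SAMPLE_HTML)
collector.close()

# The parser exposes the handler names, not the tokens inside the functions.
for button in collector.buttons:
    print(button.get("id"), button.get("onclick"))
```

The query finds both buttons and their onclick attributes, but the attributes only name the handler functions; the tokens themselves sit inside the script text.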

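However, an HTML parser does not read or interpret JavaScript code. For your problem, you can instead use a regular expression to extract the information you need from the raw text.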

Example:

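import re
import requests

r = requests.get("https://www.example.com")
s = r.text
# Capture the run of word characters (letters, digits, underscore) after "/doubt?s="
pattern = re.compile(r'/doubt\?s=(?P<token>\w+)')
# With a single group in the pattern, findall returns the captured tokens
matches = pattern.findall(s)
if len(matches) > 0:
    # The first match is the first token on the page
    print(matches[0])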
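Since only the first token is needed, a variant with re.search may be slightly tidier, because it stops at the first match and makes the "no token found" case explicit. This is a sketch under assumptions: first_token is a hypothetical helper name, and the sample string stands in for the downloaded page text so the snippet runs offline.

```python
import re

# Same pattern as in the example above
TOKEN_PATTERN = re.compile(r"/doubt\?s=(?P<token>\w+)")


def first_token(text):
    """Hypothetical helper: return the first token in text, or None if absent."""
    match = TOKEN_PATTERN.search(text)
    return match.group("token") if match else None


# Stand-in for r.text, modeled on the script in the question
sample = 'open("/doubt?s=AAAB_BCCCDD"); open("/doubt?s=AABB_CCDDEE");'
token = first_token(sample)
print(token)
```

With a real page you would pass r.text to first_token and check the result for None before using it.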
