Python regex for finding strings in HTML and writing them to a text file

I'm trying to find 4 kinds of strings in the parsed HTML of a YouTube link and write them to a text file.

import requests
url = 'https://www.youtube.com/watch?something'
r = requests.get(url)
r.text

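I already have regexes for 3 of the strings, but I can't seem to solve #3. Also, how do I get only the first match for #4?

Also, I thought I knew how to use regex in Python, but I can't figure it out. How do I do this in PYTHON?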

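Here are the kinds of strings I need to find:

1. "ownerChannelName":"(regex: alphanumeric characters, including spaces)","uploadDate"

That is, a regex for the text between "ownerChannelName":" and ","uploadDate", without the parentheses.

Solution:

"ownerChannelName":"[a-zA-Z0-9_\-\s]*","uploadDate"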

2. subscribed":true},"navigationEndpoint":{

Solution: exactly the same string, no regex needed.
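In Python, the first two items could be handled roughly as follows. This is a minimal sketch, not a definitive implementation: `sample_html`, `CHANNEL_RE`, `LITERAL` and `find_channel_name` are hypothetical names, the sample fragment is made up, and `[^"]*` is swapped in for the character class above so that channel names containing punctuation still match. The key name in `LITERAL` is an assumption about the fixed string from item 2.

```python
import re

# Hypothetical page fragment; real YouTube markup may differ.
sample_html = (
    '{"ownerChannelName":"Some Channel 01","uploadDate":"2021-01-01",'
    '"subscribed":true},"navigationEndpoint":{"x":1}'
)

# The parentheses capture only the name; [^"]* accepts any character except
# a double quote, so punctuation in channel names still matches.
CHANNEL_RE = re.compile(r'"ownerChannelName":"([^"]*)","uploadDate"')

# A fixed string needs no regex; the key name here is an assumption.
LITERAL = 'subscribed":true},"navigationEndpoint":{'


def find_channel_name(html):
    """Return the channel name, or None when the pattern is absent."""
    match = CHANNEL_RE.search(html)
    return match.group(1) if match else None


print(find_channel_name(sample_html))
print(LITERAL in sample_html)
```

Using a capture group means only the channel name is returned, rather than the whole matched text including the two surrounding keys.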

3. <script >
ytcfg.set({"INNERTUBE_CONTEXT":{"client

</script>

I need everything between the script tags.

This is what I've got so far...

ytcfg.set({"INNERTUBE_CONTEXT":{"client":{[a-zA-Z0-9:".%{},/()_-]*"}}});

But it doesn't work. Is it even possible to get everything between these two things:

ytcfg.set({"INNERTUBE_CONTEXT":{"client":{ and

"}}});

4. videoplayback?expire=10 digits\u0026ei=22 alphanumeric characters\u0026ip=xx.xx.xx.xx\

Solution:

videoplayback\?expire=\d{10}\\u0026ei=[a-zA-Z0-9_-]{22}\\u0026ip=\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b\\

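---But how do I get only the first one on the page?

Finally, can I put everything into one function (that writes to a text file) and apply it to every URL fetched by chromedriver?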

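import random
import time

# driver is assumed to be an existing Selenium chromedriver instance.
with open("url.txt") as f:
    for line in f:
        if line.rstrip():
            url = line.strip()
            print(url)
            driver.get(url)
            time.sleep(random.randint(180, 350))
            # *** Do the function here ***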

So, what should the function look like? Sorry, my head is killing me. :)
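One possible shape for such a function is sketched below. It is only an illustration under stated assumptions: `PATTERNS`, `extract_fields` and `append_fields` are hypothetical names, the output format (one `name: value` line per field) is an arbitrary choice, and the patterns are the ones from items 1 and 4 with `[^"]*` substituted for the channel-name character class.

```python
import re

# Patterns based on items 1 and 4; the dict keys are hypothetical labels.
PATTERNS = {
    "channel": re.compile(r'"ownerChannelName":"([^"]*)","uploadDate"'),
    "videoplayback": re.compile(
        r"videoplayback\?expire=\d{10}\\u0026ei=[a-zA-Z0-9_-]{22}"
        r"\\u0026ip=\d{1,3}(?:\.\d{1,3}){3}"
    ),
}


def extract_fields(html):
    """Return the first match of each pattern (None when not found)."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(html)  # search() stops at the first match
        if match is None:
            found[name] = None
        else:
            # Prefer the capture group when the pattern defines one.
            found[name] = match.group(1) if pattern.groups else match.group()
    return found


def append_fields(html, out_path):
    """Append one 'name: value' line per field to the file at out_path."""
    fields = extract_fields(html)
    with open(out_path, "a", encoding="utf-8") as out:
        for name, value in fields.items():
            out.write(f"{name}: {value}\n")
    return fields
```

Inside the loop, a call such as `append_fields(driver.page_source, "results.txt")` placed after `driver.get(url)` would be one way to apply it to each page; the file name `results.txt` is only a placeholder.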

Use re.search; it returns only the first match.

import re

# \\u0026 matches the literal text \u0026 as it appears in the page source.
regex = r"videoplayback\?expire=\d{10}\\u0026ei=[a-zA-Z0-9_-]{22}\\u0026ip=\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b\\"
test_str = ("videoplayback?expire=0123456789\\u0026ei=0123456789abcdefghin22\\u0026ip=10.1.1.1\\\n"
            "  videoplayback?expire=0123456789\\u0026ei=0123456789abcdefghin22\\u0026ip=10.1.1.2\\\n"
            "   videoplayback?expire=0123456789\\u0026ei=0123456789abcdefghin22\\u0026ip=10.1.1.3\\\n"
            "    videoplayback?expire=0123456789\\u0026ei=0123456789abcdefghin22\\u0026ip=10.1.1.4\\\n"
            "     videoplayback?expire=0123456789\\u0026ei=0123456789abcdefghin22\\u0026ip=10.1.1.5\\")
# re.search stops at the first match, unlike re.findall, which returns all of them.
matches = re.search(regex, test_str)
if matches:
    print(matches.group())
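The same re.search approach could probably also cover the third item from the question, capturing everything between two fixed markers. The sketch below assumes the markers appear literally in the page source; `sample_html`, `START`, `END` and `BETWEEN_RE` are hypothetical names, and the sample content is made up. It relies on `re.escape` to treat the markers as plain text, a non-greedy `(.*?)` to stop at the first closing marker, and `re.DOTALL` so that `.` also matches newlines.

```python
import re

# Hypothetical stand-in for r.text; real pages are far larger.
sample_html = (
    '<script >\n'
    'ytcfg.set({"INNERTUBE_CONTEXT":{"client":{"hl":"en",\n'
    '"gl":"US"}}});\n'
    '</script>'
)

# re.escape makes the literal markers safe to embed in a pattern.
START = re.escape('ytcfg.set({"INNERTUBE_CONTEXT":{"client":{')
END = re.escape('"}}});')

# (.*?) is non-greedy, so it stops at the first END marker;
# re.DOTALL lets "." cross line breaks.
BETWEEN_RE = re.compile(START + r"(.*?)" + END, re.DOTALL)

match = BETWEEN_RE.search(sample_html)
if match:
    print(match.group(1))
```

Listing every allowed character in a class, as in the attempt from the question, tends to break as soon as the page contains a character that was not anticipated; matching "anything, as little as possible" between the markers avoids that.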
