Counting how often a specific word appears on a web page with Python

I tried this:

import requests
c = requests.get('https://www.uniberg.com/referenzen.html').text
c.count('Programmierung')

But the output shows 2 occurrences, when in fact there are none.

I also tried this:

a=requests.get('https://www.uniberg.com/index.html').text.count('Mitarbeiter')

But it also counts words like Mitarbeiterphilosophie, which I don't want. Can someone find a way to improve this or suggest another approach?

As of today, https://www.uniberg.com/referenzen.html contains 2 occurrences of Programmierung.

I think you need to check the HTML source code rather than what the browser renders.

The word Programmierung is in a part of the HTML that is hidden by this CSS:

section .detail {
    display: none;
}

For the second point:

Try this (using a regex):

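import re
import requests
# \W matches one non-word character on each side, so Mitarbeiterphilosophie is not counted
len(re.findall(r'\WMitarbeiter\W', requests.get('https://www.uniberg.com/index.html').text))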
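A possible variation, as a rough sketch: use \b word boundaries instead of requiring a non-word character on each side, so that a match at the very start or end of the text is counted too. The helper name count_whole_word and the sample string are made up for illustration; they are not from the page in question.

```python
import re

def count_whole_word(text, word):
    # \b is a zero-width word boundary, so it also matches at the
    # start and end of the text; re.escape guards against regex
    # metacharacters in the search word.
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, text))

# Hypothetical sample text, not taken from the real page.
sample = "Mitarbeiter und Mitarbeiterphilosophie: Unsere Mitarbeiter"
print(count_whole_word(sample, "Mitarbeiter"))
```

On this sample the longer word Mitarbeiterphilosophie is skipped and both standalone occurrences are counted. To use it on the live page, pass requests.get(url).text as the first argument.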

Use a regular expression for the whole-word match.

requests.get(URL) returns the entire web page (view it with Ctrl+U in Google Chrome, or just download the page with wget), not only what the web browser renders. That is why the count comes out as 2.
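If the goal is to count only text a visitor would actually see, one option is to parse the HTML and skip the hidden parts before counting. Below is a minimal sketch using the standard library's html.parser. It does not evaluate CSS; it simply assumes that elements with the class "detail" are hidden (as the CSS rule quoted in this thread suggests) and that the markup is reasonably well-formed. The names VisibleTextParser and count_visible, and the sample HTML, are hypothetical.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class VisibleTextParser(HTMLParser):
    def __init__(self, hidden_classes=("detail",)):
        super().__init__()
        self.hidden_classes = set(hidden_classes)  # assumed to be display: none
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        hidden = tag in ("script", "style") or self.hidden_classes & set(classes)
        if self.skip_depth or hidden:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)

def count_visible(html, word):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return len(re.findall(rf"\b{re.escape(word)}\b", text))

# Hypothetical markup imitating a hidden "detail" block.
html = ('<section><h2>Projekt</h2><div class="detail"><p>Programmierung<br>'
        'und Programmierung</p></div><p>Beratung</p></section>')
print(html.count("Programmierung"), count_visible(html, "Programmierung"))
```

On the sample markup the raw string count finds both hidden occurrences, while count_visible finds none. For a real page a full HTML parser plus actual CSS evaluation (or a headless browser) would be more reliable; this only illustrates the idea.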
