Text from a website is displayed as gibberish instead of Hebrew

I'm trying to fetch a string from a website. I use the requests module to send a GET request.

import requests

text = requests.get("http://example.com")  # send a GET request to the website
print text.text  # print the decoded response body

However, for some reason the text comes out as gibberish instead of Hebrew:

<div>
<p>×©×¨×ª</p>
</div>

When I sniff the traffic with Fiddler or view the website in my browser, I see the Hebrew text:

<div>
<p>שרת</p>
</div>

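By the way, the HTML contains a meta tag that declares the encoding as utf-8. I tried encoding the text to utf-8, but it is still gibberish. I tried decoding it as utf-8, but that raises a UnicodeEncodeError exception. I declared in the first line of my script that I'm using utf-8. The problem also occurs when I send the request with the built-in urllib module.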

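I have read the Unicode HOWTO, but I still couldn't fix it. I have also read many threads here (about the UnicodeEncodeError exception, and about why Hebrew turns into gibberish in Python), but none of them solved my problem.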

I'm using Python 2.7.9 on a Windows machine, and I run my script in Python IDLE.

Thanks in advance.

The server does not declare the encoding correctly, so the UTF-8 bytes end up decoded as Latin-1:

>>> print u'×©×¨×ª'.encode('latin-1').decode('utf-8')
שרת
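
To see where the gibberish comes from, the round trip can be replayed without any network access. This is only a sketch: the variable names are mine, and it assumes the page body really is UTF-8 that was decoded as Latin-1, which is what the one-liner above suggests.

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too

# The Hebrew word from the question, written with escapes so the
# source file's own encoding does not matter.
hebrew = u'\u05e9\u05e8\u05ea'

# Bytes the server presumably sends: the word encoded as UTF-8.
raw = hebrew.encode('utf-8')

# Decoding those bytes as Latin-1 maps every byte to one character,
# which produces the gibberish shown in the question.
garbled = raw.decode('latin-1')

# Reversing the wrong step recovers the original bytes, which can
# then be decoded with the right codec.
repaired = garbled.encode('latin-1').decode('utf-8')

print(repr(garbled))
print(repaired == hebrew)
```

This probably also explains the UnicodeEncodeError: in Python 2, calling .decode('utf-8') on a unicode object first encodes it implicitly with the ASCII codec, and that step fails on non-ASCII characters.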

Set text.encoding before accessing text.text.

import requests

text = requests.get("http://example.com")  # send a GET request to the website
text.encoding = 'utf-8'  # correct the page encoding before reading .text
print text.text  # .text is now decoded as UTF-8
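
If you would rather not depend on what the server declares, another option is to decode the raw bytes yourself. The helper below is a hypothetical sketch, not part of requests: decode_body and its declared parameter are names I made up. It assumes you pass it the undecoded body, which would be text.content with requests, or the result of .read() with urllib on Python 2.

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too


def decode_body(raw_bytes, declared='latin-1'):
    """Hypothetical helper: try strict UTF-8 first, then the declared codec.

    UTF-8 decoding is strict, so when it succeeds on non-ASCII input
    the bytes were very likely UTF-8 to begin with.
    """
    try:
        return raw_bytes.decode('utf-8')
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to what the server declared,
        # replacing anything that still cannot be decoded.
        return raw_bytes.decode(declared, 'replace')


# Stand-in for a response body; no network request is made here.
body = u'<p>\u05e9\u05e8\u05ea</p>'.encode('utf-8')
print(decode_body(body) == u'<p>\u05e9\u05e8\u05ea</p>')
```

Trying UTF-8 before the declared codec is deliberate: Latin-1 accepts any byte sequence, so trying it first would never fail and would silently return the same gibberish.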
