Python: Percent-encoded URL is not decoded correctly by the website when using requests

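I am trying to scrape some German sentences from Glosbe.com. The request URL contains some UTF-8 characters. When the request is made, the website does not convert the percent-encoded characters back to UTF-8 characters. The requested URL should look like this: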

https://glosbe.com/de/hu/abkühlen

But the URL requested from the website is not converted back to UTF-8, and the word that gets searched for is this:

https://glosbe.com/de/hu/abk%C3%BChlen/
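For reference, the two URLs above differ in notation only where the `ü` is concerned: `%C3%BC` is the percent-encoded form of the UTF-8 bytes for `ü`. A minimal sketch, using only the standard library, that confirms the round trip:

```python
from urllib.parse import quote, unquote

word = "abkühlen"

# quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
encoded = quote(word)
print(encoded)  # abk%C3%BChlen

# unquote() reverses the encoding and restores the original text.
decoded = unquote(encoded)
print(decoded)  # abkühlen
```

So the encoding step itself is behaving as expected here.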

The code I am using:

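import urllib.parse

import requests
from bs4 import BeautifulSoup

def beautifulSoapPrepare(sourceLang, destLang, phrase):
    headers = {
        'User-Agent': 'My User Agent 1.0',
        'From': 'youremail@domain.example'  # This is another valid field
    }
    # Percent-encode the phrase; note the trailing "/" appended after it
    url = "https://glosbe.com/" + sourceLang + "/" + destLang + "/" + urllib.parse.quote(phrase) + "/"
    # "lxml" is passed positionally here, so requests treats it as `params`
    r = requests.get(url, "lxml", headers=headers)
    soup = BeautifulSoup(r.content, features="lxml")
    return soup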

The image here shows the problem.

Can you help me solve this? I want the website to search for the German word abkühlen, not abk%C3%BChlen.

Solution: the problem was in the URL. Once I removed the trailing slash at the end of the URL, it worked.

Before:

url="https://glosbe.com/"+sourceLang+"/"+destLang+"/"+urllib.parse.quote(phrase)+"/"

After:

url="https://glosbe.com/"+sourceLang+"/"+destLang+"/"+urllib.parse.quote(phrase)
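One way to keep the trailing slash from creeping back in is to build the URL in a small helper. This is only a sketch: `build_glosbe_url` and `BASE_URL` are hypothetical names that are not part of the original code, and it assumes the site keeps the `/<source>/<dest>/<phrase>` layout shown above.

```python
from urllib.parse import quote

BASE_URL = "https://glosbe.com"  # assumed base, taken from the URLs above


def build_glosbe_url(source_lang, dest_lang, phrase):
    """Return the lookup URL for a phrase, with no trailing slash."""
    # Percent-encode the phrase the same way the original code does.
    encoded_phrase = quote(phrase)
    return f"{BASE_URL}/{source_lang}/{dest_lang}/{encoded_phrase}"


print(build_glosbe_url("de", "hu", "abkühlen"))
# https://glosbe.com/de/hu/abk%C3%BChlen
```

The returned string can then be passed to `requests.get` in place of the hand-concatenated `url`.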

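If your end goal is to get the translations for the specific word you are looking up, the following code will give you that (you can eventually put it in a class or a function, whatever you want :)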

import requests
from bs4 import BeautifulSoup as bs

url = 'https://glosbe.com/de/hu/'
word = 'abkühlen'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
# No trailing slash after the word
r = requests.get(url + word, headers=headers)
soup = bs(r.text, 'html.parser')
# Select every <h3> element with the class "translation"
translations = soup.select('h3.translation')
for t in translations:
    print(t.get_text(strip=True))

Output printed in the terminal:

lehűl
hűtés
lehűt
hűvös
hűtés
előhűtés
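Note that the snippet above passes the raw word without calling `urllib.parse.quote`: Requests percent-encodes non-ASCII characters in the URL itself before sending it. As a rough, stdlib-only illustration of what the resulting URL looks like (an approximation for this example, not Requests' actual implementation):

```python
from urllib.parse import quote

url = "https://glosbe.com/de/hu/"
word = "abkühlen"

# Keep ":" and "/" intact so only the non-ASCII character gets encoded.
wire_url = quote(url + word, safe=":/")
print(wire_url)  # https://glosbe.com/de/hu/abk%C3%BChlen
```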

The Requests documentation is available at https://requests.readthedocs.io/en/latest/

The BeautifulSoup documentation is at: https://beautiful-soup-4.readthedocs.io/en/latest/index.html
