Information missing from scraped Google Translate page, using Python

I want to scrape the Google Translate website and get the translated text from it using Python 3.

Here is my code:

from bs4 import BeautifulSoup as soup
from urllib.request import Request, urlopen

my_url = "https://translate.google.com/#en/es/I%20am%20Animikh%20Aich"
# Send a browser-like User-Agent header
req = Request(my_url, headers={'User-Agent': 'Mozilla/5.0'})
# Download the raw HTML returned by the server (no JavaScript is run)
with urlopen(req) as uClient:
    page_html = uClient.read()
html = soup(page_html, 'html5lib')
print(html)
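The URL above packs the source language, target language and text into the part after '#'. If you want to build it from plain text instead of percent-encoding it by hand, a small helper along these lines should work. translate_url is a hypothetical name, and the #src/dst/text layout is assumed from the URL in the question, not taken from any documented Google format.

```python
from urllib.parse import quote


def translate_url(text, src="en", dst="es"):
    """Hypothetical helper: build a URL in the #src/dst/text form used in the question."""
    # safe="" also percent-encodes "/", so a slash in the text cannot split the fragment
    return "https://translate.google.com/#{}/{}/{}".format(src, dst, quote(text, safe=""))


print(translate_url("I am Animikh Aich"))
# https://translate.google.com/#en/es/I%20am%20Animikh%20Aich
```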

Unfortunately, I can't find the information I need in the parsed page. Chrome's "Inspect" view shows the translated text inside:

 <span id="result_box" class="short_text" lang="es"><span class="">Yo soy Animikh Aich</span></span>

However, when I search for it in the parsed HTML, this is all I find:

<span class="short_text" id="result_box"></span>

I have tried parsing with html5lib, lxml and html.parser, but I can't find a solution. Please help me solve this.
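Since the same empty span comes back whichever parser you use, the parser is not the problem. One way to check this is to feed both snippets above to BeautifulSoup. This sketch uses the built-in 'html.parser' backend, and result_text is only an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Markup copied from the question: what urllib receives vs. what Chrome shows
served = '<span class="short_text" id="result_box"></span>'
rendered = ('<span id="result_box" class="short_text" lang="es">'
            '<span class="">Yo soy Animikh Aich</span></span>')


def result_text(html):
    """Return the text inside the element with id="result_box", or '' if it is missing."""
    box = BeautifulSoup(html, 'html.parser').find(id='result_box')
    return box.get_text() if box is not None else ''


print(repr(result_text(served)))    # ''
print(repr(result_text(rendered)))  # 'Yo soy Animikh Aich'
```

The text only exists in the second snippet, which is the DOM after the browser has run the page's scripts.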

You can use a dedicated Python library, goslate:

import goslate
gs = goslate.Goslate()
print(gs.translate('I am Animikh Aich', 'es'))

Output:

Yo soy Animikh Aich

Try the following to get the content you want:

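from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://translate.google.com/#en/es/I%20am%20Animikh%20Aich")
# Wait up to 10 s for JavaScript to insert the translated text
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "#result_box span")))
# page_source is the DOM after JavaScript has run, not the raw server response
soup = BeautifulSoup(driver.page_source, 'html5lib')
item = soup.select_one("#result_box span").text
print(item)
driver.quit()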

Output:

Yo soy Animikh Aich

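JavaScript modifies the HTML after the page has loaded. urllib cannot run JavaScript, so it only ever sees the empty result_box that the server sends. To get the data you need, you have to use a tool that drives a real browser, such as Selenium.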
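You can also see part of the reason without any network access: urllib keeps everything after '#' on the client side and requests only the path before it. The sketch below builds the same Request as the question (nothing is sent) and inspects the attributes urllib fills in. In other words, this URL fetches the bare Google Translate start page, and it is the page's JavaScript that reads the fragment and fills in the translation.

```python
from urllib.request import Request

url = "https://translate.google.com/#en/es/I%20am%20Animikh%20Aich"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

# The part after '#' stays on the client and is never sent to the server
print(req.fragment)  # en/es/I%20am%20Animikh%20Aich
# The path that urllib puts in the HTTP request line
print(req.selector)  # /
print(req.host)      # translate.google.com
```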

See this link for installation instructions and a demo.
