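I want to translate the text in one column of my dataframe in order to harmonize the data. The column contains Chinese, English, French, German, Spanish, etc., and I want all of the text in English. I tried several things using the googletrans API:
1) Naively trying: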
from googletrans import Translator
translator = Translator()
df["translated"] = df.apply(lambda row: translator.translate(row['name']).text, axis=1)
Out: JSONDecodeError: ('Expecting value: line 1 column 1 (char 0)', 'occurred at index 1816997')
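2) Reinitializing the API on every iteration. Following the question "googletrans API error - Expecting value: line 1 column 1 (char 0)", I ran the following code, but I still get an error: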
import copy
from googletrans import Translator

translatedList = []
for index, row in df.iterrows():
    # REINITIALIZE THE API
    translator = Translator()
    newrow = copy.deepcopy(row)
    try:
        # translate the 'name' column
        translated = translator.translate(row['name'], dest='en')
        newrow['translated'] = translated.text
    except Exception as e:
        # print the error and skip this row
        print(str(e))
        continue
    translatedList.append(newrow)
Out: Expecting value: line 1 column 1 (char 0)
3) I also tried to get around the Google API limit by changing my IP.
Test using a VPN: does not work
import json
import random
import subprocess
import time

import googletrans

listofservers = ["South Africa", "Egypt" , "Australia", "New Zealand", "South Korea", "Singapore", "Taiwan", "Vietnam", "Hong Kong", "Indonesia", "Thailand", "Japan", "Malaysia", "United Kingdom", "Netherlands", "Germany", "France", "Belgium", "Switzerland", "Sweden","Spain","Denmark", "Italy", "Norway", "Austria", "Romania", "Czech Republic", "Luxembourg", "Poland", "Finland", "Hungary", "Latvia", "Russia", "Iceland", "Bulgaria", "Croatia", "Moldova", "Portugal", "Albania", "Ireland", "Slovakia","Ukraine", "Cyprus", "Estonia", "Georgia", "Greece", "Serbia", "Slovenia", "Azerbaijan", "Bosnia and Herzegovina", "Macedonia","India", 'Turkey', 'Israel', 'United Arab Emirates', 'United States', 'Canada','Mexico'
,"Brazil", "Costa Rica", "Argentina", "Chile"]
def SelectServer(l):
    return random.choice(l)

def translate_text(text, dest_language="en"):
    # Used to translate using the googletrans library
    translator = googletrans.Translator()
    try:
        translation = translator.translate(text=text, dest=dest_language)
    except json.decoder.JSONDecodeError:
        # api call restriction: disconnect, pick a new server, reconnect, retry
        print("exception !! déconection du VPN ")
        process = subprocess.Popen(["nordvpn", "-d"], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        process.wait()
        time.sleep(5)
        srv = SelectServer(listofservers)
        print("sélection du serveur : " + srv + " et connexion")
        process = subprocess.Popen(["nordvpn", "-c", "-g", srv], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        process.wait()
        time.sleep(60)
        return translate_text(text=text, dest_language=dest_language)
    return translation.text
Out : ConnectionError: HTTPSConnectionPool(host='translate.google.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000027016006488>: Failed to establish a new connection: [WinError 10060]
I would really appreciate your help,
Chris.
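I was processing XML files for translation and got the error "JSONDecodeError: Expecting value: line 1 column 1 (char 0)". When I searched for this error, I found that some special characters cannot be translated. In my case, characters such as & were the problem. If your text contains special characters, paste it into the Google Translate website and see whether it produces an error.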
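As a rough sketch of cleaning such text before sending it, assuming the troublesome characters are HTML/XML entities like `&amp;`: the helper name `clean_for_translation` is made up for illustration, and `html.unescape` comes from the Python standard library. Whether this is enough depends on which characters actually trigger the error in your data.

```python
import html
import re

def clean_for_translation(text):
    """Decode entities such as &amp; and normalize whitespace before translating."""
    decoded = html.unescape(text)            # "&amp;" -> "&", "&nbsp;" -> "\xa0"
    decoded = decoded.replace("\xa0", " ")   # non-breaking space -> ordinary space
    return re.sub(r"\s+", " ", decoded).strip()
```

You could apply it to the column first, for example `df["name"].map(clean_for_translation)`, and translate the cleaned values.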
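Another cause of this error can be too many requests or exceeding the character limit. If you pass a list instead of a string, every element of the list becomes a separate translation request. If you send too many requests, Google temporarily bans your IP.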
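One way to cut down the number of requests, sketched here under the assumption that the column contains many repeated values, is to translate each distinct value only once. `translate_unique` and `translate_one` are hypothetical names; `translate_one` stands for any callable that wraps your translation call, such as `lambda s: translator.translate(s, dest="en").text`.

```python
def translate_unique(values, translate_one):
    """Translate each distinct value once and map the result back to every position."""
    cache = {}
    for value in values:
        if value not in cache:
            # Only the first occurrence of a value triggers a request
            cache[value] = translate_one(value)
    return [cache[value] for value in values]
```

With a dataframe this could be used as `df["translated"] = translate_unique(df["name"].tolist(), translate_one)`.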
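I collected the texts in a single string variable. I put [text number] at the start of each text, separated the texts with \n, and then sent the whole string for translation. Like:
[1]First Text\n
[2]SecondText\n
[3]Third Text\n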
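Google Translate can translate 10,000 characters in a single request, so I limited the string variable to 10,000 characters. In addition, I added a 100-second timer between requests to avoid the ban. It worked for me.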
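The batching described above could look roughly like this. It is only a sketch: the function names are made up, `translate_batch` is a placeholder for whatever call sends one string to the translator, the 10,000 and 100 values are the figures quoted in this answer rather than verified limits, and it assumes the `[n]` markers come back unchanged from the translation.

```python
import re
import time

MAX_CHARS = 10000     # per-request limit quoted in this answer (assumption)
PAUSE_SECONDS = 100   # delay between requests quoted in this answer

def pack_batches(texts, max_chars=MAX_CHARS):
    """Group texts into numbered, newline-separated strings of at most max_chars.

    A single text longer than max_chars still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for number, text in enumerate(texts, start=1):
        # Newlines inside a text would break the one-line-per-text format
        line = f"[{number}]" + text.replace("\n", " ")
        # +1 accounts for the newline joining this line to the previous one
        if current and size + len(line) + 1 > max_chars:
            batches.append("\n".join(current))
            current, size = [], 0
        size += len(line) + (1 if current else 0)
        current.append(line)
    if current:
        batches.append("\n".join(current))
    return batches

def unpack_batch(translated):
    """Recover {number: text} from a translated batch string."""
    result = {}
    for line in translated.splitlines():
        match = re.match(r"\s*\[(\d+)\]\s*(.*)", line)
        if match:
            result[int(match.group(1))] = match.group(2)
    return result

def translate_all(texts, translate_batch, pause=PAUSE_SECONDS, max_chars=MAX_CHARS):
    """Translate texts batch by batch, sleeping between requests."""
    translated = {}
    for i, batch in enumerate(pack_batches(texts, max_chars)):
        if i > 0:
            time.sleep(pause)  # wait before every request except the first
        translated.update(unpack_batch(translate_batch(batch)))
    # Fall back to the original text if its marker was lost in translation
    return [translated.get(n, t) for n, t in enumerate(texts, start=1)]
```

For example, `translate_all(df["name"].tolist(), lambda s: translator.translate(s, dest="en").text)` would send a few large requests instead of one request per row.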
P.S. I tried using a proxy to get around the ban, but it did not work for me. Connecting through my phone's mobile hotspot did work.