Character encoding of the response differs from the manually downloaded file



I am trying to use the following Google Translate API endpoint to translate text in my application: https://clients5.google.com/translate_a/t?client=dict-chrome-ex&sl=auto&tl=en&q=

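When I click the link, it downloads a text file. Opening that file shows all the information I need, apparently correctly formatted (sentences[0].trans = "test" looks the same as if I had typed the word "test" by hand).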

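However, when I request the file with WWW in C#, with requests.get in Python, or through Postman, the "trans" value comes back as a garbled string instead (the full response is shown below).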

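I have tried converting it to a bunch of different encodings, but none of them gives the correct value. I also don't understand why the English parts of the full response are correct, while the translation that is supposed to be English comes out wrong, as does the Russian part showing the original text.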

No matter which encoding I try in C# (UTF-7, UTF-8, UTF-16, UTF-16 BE), the text I get never converts back to "test".

Am I missing something here?

The code that makes the request, the result of downloading the file manually, and the result of running the code are shown below:

Code:

import json
import requests

text = "контрольная работа"
lang = "en"
url = f"https://clients5.google.com/translate_a/t?client=dict-chrome-ex&sl=auto&tl={lang}&q={text}"
url = url.replace(" ", "%20")  # escape the space by hand
res = requests.get(url)  # sent with the default python-requests User-Agent
res = res.text
jres = json.loads(res)
translation = jres["sentences"][0]["trans"]
print(res, end="\n\n")
print("\t", translation)

Manual download (clicking the link in Chrome downloads the file):

{
"sentences": [
{
"trans": "test",
"orig": "контрольная работа",
"backend": 10
},
{
"src_translit": "kontrol'naya rabota"
}
],
"dict": [
{
"pos": "noun",
"terms": [
"test"
],
"entry": [
{
"word": "test",
"reverse_translation": [
"тест",
"испытание",
"анализ",
"проверка",
"критерий",
"контрольная работа"
],
"score": 0.18498141
}
],
"base_form": "контрольная работа",
"pos_enum": 1
}
],
"src": "ru",
"alternative_translations": [
{
"src_phrase": "контрольная работа",
"alternative": [
{
"word_postproc": "test",
"score": 1000,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
10
]
},
{
"word_postproc": "test work",
"score": 0,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
3
]
}
],
"srcunicodeoffsets": [
{
"begin": 0,
"end": 18
}
],
"raw_src_segment": "контрольная работа",
"start_pos": 0,
"end_pos": 0
}
],
"confidence": 1,
"ld_result": {
"srclangs": [
"ru"
],
"srclangs_confidences": [
1
],
"extended_srclangs": [
"ru"
]
},
"target_inflections": [
{
"written_form": "test",
"features": {
"number": 2
}
},
{
"written_form": "tests",
"features": {
"number": 1
}
}
]
}

Requesting the file with WWW in C# (Unity engine on .NET Framework 3.5, from before WWW was deprecated) or with requests in Python:

{
"sentences": [
{
"trans": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð ± отР°",
"orig": "ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð°Ñ  работа",
"backend": 3,
"translation_engine_debug_info": [
{
"model_tracking": {
"checkpoint_md5": "ef4a126affdcc2d3c84e987e2d0fb6b1",
"launch_doc": "tea_GermanicB_afdaislbnosvfyyiiw_en_2020q2.md"
}
}
]
}
],
"src": "is",
"alternative_translations": [
{
"src_phrase": "ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð°Ñ  работа",
"alternative": [
{
"word_postproc": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð ± отР°",
"score": 0,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
3
]
},
{
"word_postproc": "ÐºÐ¾Ð½Ñ‚Ñ € оР»ÑŒÐ½Ð ° Ñ Ñ € Ð ° Ð °",
"score": 0,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
8
]
}
],
"srcunicodeoffsets": [
{
"begin": 0,
"end": 35
}
],
"raw_src_segment": "ÐºÐ¾Ð½Ñ‚Ñ€Ð¾Ð»ÑŒÐ½Ð°Ñ  работа",
"start_pos": 0,
"end_pos": 0
}
],
"confidence": 1,
"ld_result": {
"srclangs": [
"is"
],
"srclangs_confidences": [
1
],
"extended_srclangs": [
"is"
]
}
}
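The garbled response above looks consistent with the server reading each UTF-8 byte of the query as a separate character: "srcunicodeoffsets" ends at 35 here but at 18 in the manual download, and the text was detected as "is" rather than "ru". A minimal sketch of that effect follows. It assumes a one-byte-per-character decoding and uses latin-1 as a stand-in, because Python's latin-1 codec maps all 256 byte values; the exact single-byte encoding the server applied is not known from the response.

```python
text = "контрольная работа"

# UTF-8 stores each Cyrillic letter in two bytes and the space in one.
utf8_bytes = text.encode("utf-8")

# Decoding those bytes one byte per character (latin-1 is an assumed
# stand-in) produces the kind of mojibake seen in "orig".
misread = utf8_bytes.decode("latin-1")

print(len(text))        # 18, matches "end": 18 in the manual download
print(len(utf8_bytes))  # 35, matches "end": 35 in the garbled response
print(len(misread))     # 35

# The echoed source text can be reversed on the client side ...
print(misread.encode("latin-1").decode("utf-8") == text)  # True
# ... but "trans" cannot be turned back into "test", because the server
# translated the misread text, not the Russian phrase.
```

If that reading is right, it would explain why no re-encoding on the C# or Python side yields "test": the damage happens before translation, not while decoding the response.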

Since it works directly in Chrome, I added a Chrome User-Agent header, and it works correctly:

import json
import requests

url = 'https://clients5.google.com/translate_a/t'
# Let requests build and percent-encode the query string.
params = {'client': 'dict-chrome-ex',
          'sl': 'auto',
          'tl': 'en',
          'q': 'контрольная работа'}
# Present the request as coming from Chrome.
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'}
r = requests.get(url, params=params, headers=headers)
jres = r.json()
# ensure_ascii=False prints the Cyrillic text instead of \uXXXX escapes.
print(json.dumps(jres, indent=2, ensure_ascii=False))

Output:

{
"sentences": [
{
"trans": "test",
"orig": "контрольная работа",
"backend": 10
},
{
"src_translit": "kontrol'naya rabota"
}
],
"dict": [
{
"pos": "noun",
"terms": [
"test"
],
"entry": [
{
"word": "test",
"reverse_translation": [
"тест",
"испытание",
"анализ",
"проверка",
"критерий",
"контрольная работа"
],
"score": 0.18498141
}
],
"base_form": "контрольная работа",
"pos_enum": 1
}
],
"src": "ru",
"alternative_translations": [
{
"src_phrase": "контрольная работа",
"alternative": [
{
"word_postproc": "test",
"score": 1000,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
10
]
},
{
"word_postproc": "control work",
"score": 0,
"has_preceding_space": true,
"attach_to_next_token": false,
"backends": [
3
]
}
],
"srcunicodeoffsets": [
{
"begin": 0,
"end": 18
}
],
"raw_src_segment": "контрольная работа",
"start_pos": 0,
"end_pos": 0
}
],
"confidence": 1,
"ld_result": {
"srclangs": [
"ru"
],
"srclangs_confidences": [
1
],
"extended_srclangs": [
"ru"
]
},
"target_inflections": [
{
"written_form": "test",
"features": {
"number": 2
}
},
{
"written_form": "tests",
"features": {
"number": 1
}
}
]
}
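Because this endpoint is undocumented, it may be worth checking each response before trusting "trans". A rough sketch, assuming single-sentence input and the response shape shown above: compare the "orig" field that the server echoes back with the text that was sent. The helper name query_was_misread and the two sample dictionaries are made up for illustration; the samples only mimic the relevant fields of the two responses on this page.

```python
def query_was_misread(response: dict, text: str) -> bool:
    """Return True if the first sentence's "orig" does not echo the submitted text."""
    sentences = response.get("sentences", [])
    if not sentences:
        return True
    return sentences[0].get("orig") != text


text = "контрольная работа"

# Trimmed-down stand-in for the response received with the Chrome User-Agent.
good = {"sentences": [{"trans": "test", "orig": text, "backend": 10}], "src": "ru"}

# Stand-in for the garbled response: "orig" holds the text misread byte by byte.
bad = {
    "sentences": [{"orig": text.encode("utf-8").decode("latin-1"), "backend": 3}],
    "src": "is",
}

print(query_was_misread(good, text))  # False
print(query_was_misread(bad, text))   # True
```

A check like this only detects the problem; the fix is still the User-Agent header from the code above.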
