Translating specific fields of a JSON file



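I want to translate the "text" fields in the flights domain of the Taskmaster-2 dataset. It is a deeply nested JSON file. How can I do this with Google Cloud Translation?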

Example (English to Bengali):

Original JSON file:

[ {
"conversation_id": "dlg-00100680-00e0-40fe-8321-6d81b21bfc4f",
"instruction_id": "flight-12",
"utterances": [
{
"index": 0,
"speaker": "USER",
"text": "Hello. I'd like to find a round trip commercial airline flight from San Francisco to Denver.",
"segments": [
{
"start_index": 26,
"end_index": 36,
"text": "round trip",
"annotations": [
{
"name": "flight_search.type"
}
]
},

Output JSON file:

[ {
"conversation_id": "dlg-00100680-00e0-40fe-8321-6d81b21bfc4f",
"instruction_id": "flight-12",
"utterances": [
{
"index": 0,
"speaker": "USER",
"text": "হ্যালো. আমি সান ফ্রান্সিসকো থেকে ডেনভার পর্যন্ত একটি রাউন্ড ট্রিপ বাণিজ্যিক এয়ারলাইন ফ্লাইট খুঁজতে চাই।",
"segments": [
{
"start_index": 26,
"end_index": 36,
"text": "রাউন্ড ট্রিপ",
"annotations": [
{
"name": "flight_search.type"
}
]
},

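I extracted a few lines of data from flights.json and used the Python code below to translate them from English to Japanese with the Google Cloud Translation API. You can also check the list of languages the API supports.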
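Before translating a large file, it may help to confirm that the target language code is accepted. The sketch below assumes the `translate_v2` client's `get_languages()` call, which returns a list of dictionaries with a "language" key; the helper name `is_supported` is hypothetical and only inspects such a list, so it can be tried without credentials.

```python
from typing import Iterable, Mapping


def is_supported(languages: Iterable[Mapping[str, str]], code: str) -> bool:
    """Return True if `code` appears as a "language" entry in the list."""
    return any(entry.get("language") == code for entry in languages)


# Usage with the real client (needs Google Cloud credentials):
#   from google.cloud import translate_v2 as translate
#   client = translate.Client()
#   languages = client.get_languages()
#   if not is_supported(languages, "ja"):
#       raise ValueError("target language is not supported")
```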

test.json:

[
{
"conversation_id": "dlg-00100680-00e0-40fe-8321-6d81b21bfc4f",
"instruction_id": "flight-12",
"utterances": [
{
"index": 0,
"speaker": "USER",
"text": "Hello. I'd like to find a round trip commercial airline flight from San Francisco to Denver.",
"segments": [
{
"start_index": 26,
"end_index": 36,
"text": "round trip",
"annotations": [
{
"name": "flight_search.type"
}
]
},
{
"start_index": 68,
"end_index": 81,
"text": "San Francisco",
"annotations": [
{
"name": "flight_search.origin"
}
]
},
{
"start_index": 85,
"end_index": 91,
"text": "Denver",
"annotations": [
{
"name": "flight_search.destination1"
}
]
}
]
},
{
"index": 1,
"speaker": "ASSISTANT",
"text": "Hello, how can I help you?"
},
{
"index": 2,
"speaker": "ASSISTANT",
"text": "San Francisco to Denver, got it.",
"segments": [
{
"start_index": 0,
"end_index": 13,
"text": "San Francisco",
"annotations": [
{
"name": "flight_search.origin"
}
]
},
{
"start_index": 17,
"end_index": 23,
"text": "Denver",
"annotations": [
{
"name": "flight_search.destination1"
}
]
}
]
}
]
}
]

Code:

import json
from google.cloud import translate_v2 as translate

target = "ja"  # target language code (Japanese)
translate_client = translate.Client()

# Load the extracted conversations
with open("test.json", encoding="utf-8") as f:
    data = json.load(f)

for conv in data:
    for utt in conv["utterances"]:
        # Translate the utterance text
        utt["text"] = translate_client.translate(utt["text"], target_language=target)["translatedText"]
        # Segments are optional, so translate them only when present
        for seg in utt.get("segments", []):
            seg["text"] = translate_client.translate(seg["text"], target_language=target)["translatedText"]

# ensure_ascii=False keeps the Japanese characters readable instead of \u escapes
print(json.dumps(data, indent=2, ensure_ascii=False))
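The loops above are tied to the utterances/segments layout and send one request per string. As an alternative sketch, the "text" fields can be collected recursively at any depth, de-duplicated, and translated in batches. The names `collect_text_holders`, `translate_fields`, and `translate_batch` are hypothetical, and the batch size of 100 is an assumption chosen to stay well under per-request limits. The translator is passed in as a function so the traversal can be tried without calling the API.

```python
from typing import Any, Callable, Dict, List


def collect_text_holders(node: Any, key: str = "text") -> List[dict]:
    """Return every dict, at any depth, that holds a string under `key`."""
    found: List[dict] = []
    if isinstance(node, dict):
        if isinstance(node.get(key), str):
            found.append(node)
        for value in node.values():
            found.extend(collect_text_holders(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text_holders(item, key))
    return found


def translate_fields(
    data: Any,
    translate_batch: Callable[[List[str]], List[str]],
    key: str = "text",
    batch_size: int = 100,
) -> Any:
    """Translate every `key` field in place, sending each distinct string once."""
    holders = collect_text_holders(data, key)
    # dict.fromkeys removes duplicates while keeping first-seen order
    unique = list(dict.fromkeys(holder[key] for holder in holders))
    translated: Dict[str, str] = {}
    for i in range(0, len(unique), batch_size):
        chunk = unique[i:i + batch_size]
        for source, result in zip(chunk, translate_batch(chunk)):
            translated[source] = result
    for holder in holders:
        holder[key] = translated[holder[key]]
    return data


# Possible adapter for the Google client (needs credentials). format_="text"
# asks for plain text rather than HTML-escaped output:
#   def google_batch(texts):
#       results = translate_client.translate(
#           texts, target_language="ja", format_="text")
#       return [r["translatedText"] for r in results]
#   translate_fields(data, google_batch)
```

Repeated strings such as "San Francisco" and "Denver" are sent only once this way. The output shown next is from the loop-based script, not from this sketch.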

Output:

[
{
"conversation_id": "dlg-00100680-00e0-40fe-8321-6d81b21bfc4f",
"instruction_id": "flight-12",
"utterances": [
{
"index": 0,
"speaker": "USER",
"text": "こんにちは。サンフランシスコからデンバーまでの民間航空会社の往復便を探したいのですが。",
"segments": [
{
"start_index": 26,
"end_index": 36,
"text": "往復",
"annotations": [
{
"name": "flight_search.type"
}
]
},
{
"start_index": 68,
"end_index": 81,
"text": "サンフランシスコ",
"annotations": [
{
"name": "flight_search.origin"
}
]
},
{
"start_index": 85,
"end_index": 91,
"text": "デンバー",
"annotations": [
{
"name": "flight_search.destination1"
}
]
}
]
},
{
"index": 1,
"speaker": "ASSISTANT",
"text": "こんにちは、どうすればいいですか?"
},
{
"index": 2,
"speaker": "ASSISTANT",
"text": "サンフランシスコからデンバーへ、了解。",
"segments": [
{
"start_index": 0,
"end_index": 13,
"text": "サンフランシスコ",
"annotations": [
{
"name": "flight_search.origin"
}
]
},
{
"start_index": 17,
"end_index": 23,
"text": "デンバー",
"annotations": [
{
"name": "flight_search.destination1"
}
]
}
]
}
]
}
]
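Note that start_index and end_index in the output still describe the English text. If the offsets matter downstream, they could be recomputed after translation. The sketch below assumes the offsets are character positions with an exclusive end (as the English sample suggests) and that the separately translated segment appears verbatim in the translated utterance, which is not guaranteed. Segments that cannot be found get None for both offsets. The function names are hypothetical.

```python
from typing import List


def realign_segment(utterance_text: str, segment: dict) -> bool:
    """Point the segment offsets at its first occurrence in the utterance.

    Returns False, and sets both offsets to None, when the segment text
    does not occur in the utterance.
    """
    start = utterance_text.find(segment["text"])
    if start == -1:
        segment["start_index"] = None
        segment["end_index"] = None
        return False
    segment["start_index"] = start
    segment["end_index"] = start + len(segment["text"])  # exclusive end
    return True


def realign_conversations(data: List[dict]) -> int:
    """Realign every segment in place and return how many could not be matched."""
    unmatched = 0
    for conv in data:
        for utt in conv.get("utterances", []):
            for seg in utt.get("segments", []):
                if not realign_segment(utt["text"], seg):
                    unmatched += 1
    return unmatched
```

Only the first occurrence is used, so an utterance that mentions the same phrase twice would need extra handling.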
