How can I replace several words in Python when the word order may change?

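I want to build a small homemade translation tool that only translates a specific list of sentences. I have learned to use the replace() method, but my main problem is that I am translating from English to Spanish, so two problems arise:

- The word order is often reversed.

- Sometimes a group of words is translated as a single word, and sometimes a single word has to be translated as two or more.

I know how to translate word by word, but that is not enough to solve this problem. In this particular case, I think I have to translate whole chunks of words at once.

How can I do that?

I know how to translate word by word.

I can define two lists: in the first I put the original English words to translate, and in the other I put the corresponding Spanish words.

Then I take the input text, split it, and use two for loops to check whether any of the words are present. If they are, I swap them for the Spanish version.

After that, I use the join method to put a space between the words and get the final result.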

a = (["Is", "this", "the", "most", "violent", "show"])
b = (["Es", "este", "el", "más", "violento", "show"])
text = "Is this the most violent show?"
text2 = text.split()
# Compare every English word against every word of the input.
for i in range(len(a)):
    for j in range(len(text2)):
        if a[i] == text2[j]:
            text2[j] = b[i]
print("Final text is:", " ".join(text2))

The output is:

Final text is: Es este el más violento show?

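The resulting order is wrong: "más violento show" sounds odd in Spanish, and it should be "show más violento".

What I would like to learn is how to put phrases like this into the arrays: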

a = (["most violent show"])
b= (["show más violento"])

But in that case I cannot use split, and I am a bit lost as to how to do this.

How about a simpler solution using replace and a mapping:

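t = {'aa': 'dd', 'bbb': 'eee', 'c c c': 'f f f'}
v = 'dd eee zz f f f'
output = v
# Python 3: dict.items() replaces the Python 2 iteritems().
for a, b in t.items():
    # Swap each value b back to its key a, which gives the result shown below.
    output = output.replace(b, a)
print(output)
# 'aa bbb zz c c c'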

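This is actually a fairly complex problem (if you let it be)! At the time of writing, some of the other answers are perfectly fine for this specific example, so if they work for you, please mark one of them as the accepted answer.

First of all, you should use a dictionary for this. It really is a "dictionary": you look something up (the key) and get a definition (the value).

The hard part is matching the pieces of the input phrase that need translating in order to build the translated output. Our general algorithm: go through every English keyword or phrase and translate it into Spanish.

There are a few problems:

1. You will be translating as you go, which means that if your translations contain words that could be both English and Spanish, you can end up with nonsensical translations.
2. English keys may be substrings of other keys, for example: "most" -> "más", "most violent show" -> "show más violento".
3. You need to take letter case into account when matching.

I'm not going to bother with 3, since it is outside the scope of the question and would take too long. Solving 2 is the easiest: when reading the dictionary's keys, sort them by length, longest first. Solving 1 is much harder: when looking at the "translation in progress", you need to know which terms have already been translated.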
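To make problem 2 concrete, here is a small added sketch. The two-entry dictionary and the helper name translate_in_order are made up for the demonstration; it only shows why the order in which keys are tried matters.

```python
# Hypothetical mini-dictionary; the entries are illustrative only.
translation_dict = {
    "most": "más",
    "most violent show": "show más violento",
}


def translate_in_order(text, keys):
    """Apply str.replace once per key, in the order the keys are given."""
    for key in keys:
        text = text.replace(key, translation_dict[key])
    return text


phrase = "the most violent show"

# Shortest key first: "most" is replaced before the longer phrase can match.
short_first = translate_in_order(phrase, sorted(translation_dict, key=len))
# Longest key first: the whole phrase is matched and reordered.
long_first = translate_in_order(
    phrase, sorted(translation_dict, key=len, reverse=True)
)

print(short_first)  # the más violent show
print(long_first)   # the show más violento
```

Trying the longest keys first fixes this case, but it does nothing about problem 1, because each replace call still rescans text that has already been translated.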

So, a complicated but thorough solution is outlined below:

translation_dict = {
    "is": "es",
    "this": "este",
    "the": "el",
    "most violent show": "show más violento",
}
input_phrase = "Is this the most violent show?"
translations = []
# Force the translation to be lower-case.
input_phrase = input_phrase.lower()
# Try the longest keys first so that phrases win over the words inside them.
for key in sorted(translation_dict.keys(), key=lambda phrase: -len(phrase)):
    spanish_translation = translation_dict[key]
    # Code will assume all keys are lower-case.
    if key in input_phrase:
        # Swap the key for a numbered placeholder such as {0}, so the
        # translated text is never scanned again by later keys.
        input_phrase = input_phrase.replace(key, "{{{}}}".format(len(translations)))
        translations.append(spanish_translation)
# Fill the placeholders with the collected translations.
print(input_phrase.format(*translations))
# es este el show más violento?
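A different technique that addresses problems 1 and 2 in a single pass is to compile all keys into one regular expression and let re.sub substitute each match. This is an added sketch of that alternative, not the placeholder approach shown above. It assumes every key is lower-case and starts and ends with a word character (so that \b works), and the function name translate is made up here.

```python
import re

# Same entries as the dictionary above; keys are assumed to be lower-case.
translation_dict = {
    "is": "es",
    "this": "este",
    "the": "el",
    "most violent show": "show más violento",
}

# Longest keys first, so the alternation prefers whole phrases.
keys = sorted(translation_dict, key=len, reverse=True)
# \b keeps "is" from matching inside "this"; re.escape guards special characters.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(key) for key in keys) + r")\b",
    re.IGNORECASE,
)


def translate(text):
    """Replace every dictionary phrase in one left-to-right pass."""
    return pattern.sub(lambda m: translation_dict[m.group(0).lower()], text)


print(translate("Is this the most violent show?"))
# es este el show más violento?
```

Because re.sub never rescans the text it has just inserted, a Spanish word that happens to also be an English key is left alone. The IGNORECASE flag only makes the matching case-insensitive; the output still comes back lower-case, as in the code above.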

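If you know the maximum number of words in any phrase you translate, there are more sophisticated solutions: iterate over n-grams with n <= m, where m is the largest group of words you expect to translate. You would first iterate over the n-grams for the largest size m, look each one up in the translation dictionary, and then decrement n by 1 until you are iterating over individual words.

For example, with m = 3 and the input "This is a test string.", you would try to translate the following English phrases.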

"This is a"
"is a test"
"a test string"
"this is"
"is a"
"a test"
"test string"
"this"
"is"
"a"
"test"
"string"

This has a performance advantage for huge translation dictionaries. I would show it, but this answer is already complicated enough.
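That said, for anyone curious, a rough sketch of the n-gram idea might look like the following. The function name translate_ngrams, the MAX_WORDS constant, and the dictionary contents are assumptions made for illustration, and the sketch assumes punctuation has already been stripped from the input.

```python
# Hypothetical dictionary; keys are lower-case phrases of at most MAX_WORDS words.
translation_dict = {
    "is": "es",
    "this": "este",
    "the": "el",
    "most violent show": "show más violento",
}
MAX_WORDS = 3  # the value called m in the description above


def translate_ngrams(text, max_words=MAX_WORDS):
    """Translate the longest known n-grams first, then shorter ones."""
    words = text.lower().split()
    # None marks a word that has not been translated yet.
    result = [None] * len(words)
    for n in range(max_words, 0, -1):
        for start in range(len(words) - n + 1):
            window = range(start, start + n)
            if any(result[i] is not None for i in window):
                continue  # part of this n-gram was already translated
            translation = translation_dict.get(" ".join(words[start:start + n]))
            if translation is not None:
                result[start] = translation
                for i in window[1:]:
                    result[i] = ""  # consumed by the phrase that begins at start
    # Untranslated words are kept as they are; consumed slots are dropped.
    pieces = [words[i] if r is None else r for i, r in enumerate(result)]
    return " ".join(piece for piece in pieces if piece)


print(translate_ngrams("Is this the most violent show"))
# es este el show más violento
```

The number of dictionary lookups here grows with the length of the input times m, not with the size of the dictionary, which is where the advantage for large dictionaries comes from.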

I think you can achieve what you need with the string replace method:

a = ("Is", "this", "the", "most violent show")
b = ("Es", "este", "el", "show más violento")
text = "Is this the most violent show?"
# enumerate yields the index of each English entry, used to pick its translation.
for i, elem in enumerate(a):
    text = text.replace(elem, b[i])
print(text)
# Es este el show más violento?

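Also note that the lists in your code are wrapped in redundant parentheses; plain tuples (or plain lists) are enough.

Note that Caspar Wylie's solution is a neater way of doing this, using a dictionary instead.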
