Iterating over a phrase to get a fixed-size context without using padding



I'm trying to get a fixed-size context (window_size) for each word without using any kind of padding. For example:

sentence = ["hi", "my", "name", "is", "Francesco", "and", "i", "play", "drums"]
length = 9
window_size = 2

window_size can range from 1 to 5, and the context must have the same size for every target word. In this example, the desired output is:

[("hi", ["my","name","is","Francesco"]), ("my", ["hi","name","is","Francesco"]), ("name", ["hi","my","is","Francesco"]),
("is", ["my","name","Francesco","and"]), ("Francesco", ["name","is","and","i"]), ("and", ["is","Francesco","i","play"]), ("i", ["Francesco","and","play","drums"]), ("play", ["Francesco","and","i","drums"]), ("drums", ["Francesco","and","i","play"])]
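Reading the desired output, the rule appears to be: each target gets the 2w other words of a (2w + 1)-word window that is centered on it where possible and slid inward at the edges. Written out, with 0-based positions i, sentence length n, words x_j and window size w (notation introduced here, not taken from the question):

$$
s_i = \min\bigl(\max(i - w,\ 0),\ n - 2w - 1\bigr),
\qquad
\text{context}(i) = \bigl(x_j : s_i \le j \le s_i + 2w,\ j \ne i\bigr)
$$

For example, with n = 9 and w = 2, "play" (i = 7) gives s_7 = min(5, 4) = 4, so its context is x_4, x_5, x_6, x_8 = "Francesco", "and", "i", "drums", which matches the desired output. The formula only works when n ≥ 2w + 1, so without padding the 9-word sentence supports window sizes up to 4; window_size = 5 would need at least 11 words.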

I'm stuck. Maybe this is a recursion problem? Can anyone help me?

```python
import copy
values = []
word_ids = [1,2,3,4,5,6,7,8,9]
possible = copy.deepcopy(word_ids)
popping = copy.deepcopy(word_ids)
boundary = 2
for index, num in enumerate(possible):
    popping.pop(index)

    values.append((num, popping[max(index - boundary -1 , 0):index + boundary + 1]))
    popping = copy.deepcopy(word_ids)
print(values)
```

In the code above, boundary plays the role of window_size.

The code above prints:

[(1, [2, 3, 4]), (2, [1, 3, 4, 5]), (3, [1, 2, 4, 5, 6]), (4, [1, 2, 3, 5, 6, 7]), (5, [2, 3, 4, 6, 7, 8]), (6, [3, 4, 5, 7, 8, 9]), (7, [4, 5, 6, 8, 9]), (8, [5, 6, 7, 9]), (9, [6, 7, 8])]

As you can see, the contexts are not all the same size.
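The sizes differ for two reasons: the `- 1` in the start index makes interior contexts 2 * boundary + 2 words long, and near the edges the slice is cut off instead of being shifted inward. A minimal sketch of a fix that keeps the first attempt's remove-then-slice idea, assuming len(word_ids) >= 2 * boundary + 1 (the `width` variable and the clamping expression are additions, not part of the original code):

```python
# Remove the target first, then take a 2*boundary-wide slice of the remaining
# words whose start is shifted (not truncated) at the edges.
# Assumes len(word_ids) >= 2 * boundary + 1.
word_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9]
boundary = 2
width = 2 * boundary  # number of context words per target

values = []
for index, num in enumerate(word_ids):
    popping = word_ids[:index] + word_ids[index + 1:]  # all words except the target
    # Center the slice on the target's old position, then clamp it into range.
    start = min(max(index - boundary, 0), len(popping) - width)
    values.append((num, popping[start:start + width]))

print(values)
# [(1, [2, 3, 4, 5]), (2, [1, 3, 4, 5]), (3, [1, 2, 4, 5]), (4, [2, 3, 5, 6]),
#  (5, [3, 4, 6, 7]), (6, [4, 5, 7, 8]), (7, [5, 6, 8, 9]), (8, [5, 6, 7, 9]),
#  (9, [5, 6, 7, 8])]
```

Because the target is already removed, positions before it are unchanged and positions after it move down by one, so `popping[index - boundary:index + boundary]` is exactly the centered context; the clamp only matters near the edges.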

Try shrinking the window on one side and growing it on the other when you reach an edge:

```python
import copy

values = []
word_ids = [1,2,3,4,5,6,7]
possible = copy.deepcopy(word_ids)
length = len(word_ids)
window_left = 4
window_right = 4
for index, element in enumerate(word_ids):
    ctrl = True
    half = False
    # For targets in the second half, only the right edge is checked
    if index >= length//2:
        half = True
    while (ctrl):
        if (index - window_left < 0) and half == False:
            # Too close to the left edge: move one slot from the left side to the right
            window_left = window_left - 1
            window_right = window_right + 1
        elif half==True and (index + window_right  > length - 1):
            # Too close to the right edge: move one slot from the right side to the left
            window_left = window_left + 1
            window_right = window_right - 1
        else:
            # Window fits: store it (target still included) and reset the sizes
            values.append((element, possible[max(0,index - window_left): index + window_right + 1 ]))
            window_right = 4
            window_left = 4
            ctrl = False
# then remove the target word
res = []
for element in values:
    element[1].remove(element[0])
    res.append((element[0], element[1]))
print(res)
```
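The while loop above can be replaced by computing the clamped window start directly. The sketch below is one way to do that; the function name `fixed_context` is hypothetical, and the fallback for sentences shorter than 2 * window_size + 1 words (return all other words) is an assumption chosen to match what the second attempt does for the 7-word example with a window of 4.

```python
def fixed_context(sentence, window_size):
    """Return (word, context) pairs where every context has the same length.

    A (2 * window_size + 1)-word window is centered on each word and slid
    back inside the sentence at the edges instead of being padded. If the
    sentence is too short, the window is the whole sentence (assumption).
    """
    n = len(sentence)
    span = min(2 * window_size + 1, n)  # window width including the target
    pairs = []
    for i, word in enumerate(sentence):
        # Ideal start is i - window_size; clamp it to [0, n - span].
        start = min(max(i - window_size, 0), n - span)
        window = sentence[start:start + span]
        # Drop the target by position (not by value) so repeated words are safe.
        context = window[:i - start] + window[i - start + 1:]
        pairs.append((word, context))
    return pairs


sentence = ["hi", "my", "name", "is", "Francesco", "and", "i", "play", "drums"]
print(fixed_context(sentence, 2))
print(fixed_context([1, 2, 3, 4, 5, 6, 7], 4))
```

Removing the target by position is a deliberate choice: `list.remove` deletes the first matching value, so in a sentence with repeated words the second attempt's `element[1].remove(element[0])` can delete the wrong occurrence.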
