For each word in the elements of a list, return the indexes of the list elements where the word occurs



I have a list like this, where the number at the start of each element's string happens to be that element's index:

list = [" ","1- make your choice", "2- put something and make", "3- make something happens", "4- giulio took his choice so make","5- make your choice", "6- put something and make", "7- make something happens", "8- giulio took his choice so make","9- make your choice", "10- put something and make", "11- make something happens", "12- giulio took his choice so make"]

For every word in the list's elements, I want to return the indexes of the list elements in which that word occurs:

for x in list:
    ....

I mean something like this:

position_of_word_in_all_elements_list = set("make": 1,2,3,4,5,6,7,8,9,10,11,12)    
position_of_word_in_all_elements_list = set("your": 1,5,9)
position_of_word_in_all_elements_list = set("giulio":4,8,12)

Any suggestions?

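This finds every space-separated token in the input, including ones like "1-". Filtering the entries you don't want out of the result shouldn't be a big problem, though: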

# find the set of all words (sequences separated by a space) in input
s = set(" ".join(list).split(" "))
# for each word go through input and add index to the 
# list if word is in the element. output list into a dict with
# the word as a key
res = dict((key, [ i for i, value in enumerate(list) if key in value.split(" ")]) for key in s)

{'': [0], 'and': [2, 6, 10], '8-': [8], '11-': [11], '6-': [6], 'something': [2, 3, 6, 7, 10, 11], 'your': [1, 5, 9], 'happens': [3, 7, 11], 'giulio': [4, 8, 12], 'make': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], '4-': [4], '2-': [2], 'his': [4, 8, 12], '9-': [9], '10-': [10], '7-': [7], '12-': [12], 'took': [4, 8, 12], 'put': [2, 6, 10], 'choice': [1, 4, 5, 8, 9, 12], '5-': [5], 'so': [4, 8, 12], '3-': [3], '1-': [1]}
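As a rough sketch of the filtering step mentioned above: the snippet below rebuilds the same dictionary and then drops the keys that are probably not wanted. The names `lines`, `prefix` and `filtered` are made up for this illustration, and the rule for what counts as "unwanted" (the empty string and numeric prefixes such as "1-") is an assumption, not something the question specifies.

```python
import re

# Sample input copied from the question (renamed so it does not shadow the built-in list).
lines = [" ", "1- make your choice", "2- put something and make",
         "3- make something happens", "4- giulio took his choice so make",
         "5- make your choice", "6- put something and make",
         "7- make something happens", "8- giulio took his choice so make",
         "9- make your choice", "10- put something and make",
         "11- make something happens", "12- giulio took his choice so make"]

# Same construction as above: every space-separated token becomes a key.
words = set(" ".join(lines).split(" "))
res = {key: [i for i, value in enumerate(lines) if key in value.split(" ")]
       for key in words}

# Assumption: unwanted keys are the empty string and tokens like "1-" or "12-".
prefix = re.compile(r"^\d+-$")
filtered = {key: idxs for key, idxs in res.items()
            if key and not prefix.match(key)}

print(filtered["your"])
print(filtered["giulio"])
```

If other tokens should be excluded as well (punctuation, stop words), the condition in the last comprehension is the place to extend.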

First, rename your list so that it doesn't shadow the Python built-in `list`. Then:

>>> from collections import defaultdict
>>> li = [" ","1- make your choice", "2- put something and make", "3- make something happens", "4- giulio took his choice so make","5- make your choice", "6- put something and make", "7- make something happens", "8- giulio took his choice so make","9- make your choice", "10- put something and make", "11- make something happens", "12- giulio took his choice so make"]
>>> dd = defaultdict(list)
>>> for l in li:
        try: # this is ugly hack to skip the " " value
            index,words = l.split('-')
        except ValueError:
            continue
        word_list = words.strip().split()
        for word in word_list:
            dd[word].append(index)
>>> dd['make']
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']

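What defaultdict does: as long as the key (in this case, the word) already exists in the dictionary, it works like a normal dict. If the key does not exist, it creates the key with a default value, in our case an empty list, as specified when you declare it with dd = defaultdict(list). I'm not the best at explaining, so I suggest reading up on defaultdict elsewhere if this isn't clear :)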
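To make that behaviour concrete, here is a small sketch comparing `defaultdict(list)` with a plain dict. The key "make" and the values are only illustrative.

```python
from collections import defaultdict

dd = defaultdict(list)

# Accessing a missing key creates it with an empty list (the result of list()),
# so append works on the very first access.
dd["make"].append("1")
dd["make"].append("2")

# A plain dict needs the key initialised explicitly, for example with setdefault.
plain = {}
plain.setdefault("make", []).append("1")
plain.setdefault("make", []).append("2")

print(dict(dd))
print(plain == dict(dd))
```

Note that merely reading a missing key from a defaultdict also inserts it, which can matter if you later iterate over the keys.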

@Oleg wrote a great, geeky solution. For this problem I came up with the following simple approach.

def findIndex(st, lis):
    positions = []
    # enumerate keeps the index in step with every element, matched or not
    for j, x in enumerate(lis):
        if st in x:
            positions.append(j)
    return positions

>>> findIndex('your', list)
[1, 5, 9]

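I need to use the number in the string to get the ID, and I have a solution for that... but, as you remember, I have to get all the IDs for every word in the elements.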

lst = [" ","1- make your choice", "2- put something and make", "3- make something happens", 
"4- giulio took his choice so make","5- make your choice", "6- put something and make", 
"7- make something happens", "8- giulio took his choice so make","9- make your choice", 
"10- put something and make", "11- make something happens", "12- giulio took his choice so make"]
diczio = {} 
abc = " ".join(lst).split(" ")
for x in lst:
    element = x
    for t in abc:
        if len(element) > 0:
            if t in element:
                xs = element.find("-")
                aw = element[0:xs]
                aw = int(aw)
                wer = set()
                wer.add(aw)
                diczio[t] = [wer]
print diczio

The problem is that I only get one ID for each word. I put them in a set (I mean wer = set()), but I need all of the IDs for every word:

1- For example, for the word 'your' I only get the ID of the last element in which the word occurs:

'your': [set(['9'])]

But I need:

'your': [set([1,5,9])]

2- The ID 9 is a string inside the set and I need it as an int, but if I try to convert aw to an int, I get an error:

aw = int(aw)
Error:

ValueError: invalid literal for int() with base 10: ''

Any suggestions?
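Both symptoms can be traced to the loop above. For the " " element, `find("-")` returns -1, so `element[0:xs]` is an empty string and `int('')` raises the ValueError. Separately, `diczio[t] = [wer]` assigns a fresh one-element set on every pass, so only the last ID survives. A minimal sketch of one possible fix follows, assuming every real entry has the form "<number>- <words>"; the names `prefix`, `sep`, `rest` and `idx` are introduced only for this example.

```python
lst = [" ", "1- make your choice", "2- put something and make",
       "3- make something happens", "4- giulio took his choice so make",
       "5- make your choice", "6- put something and make",
       "7- make something happens", "8- giulio took his choice so make",
       "9- make your choice", "10- put something and make",
       "11- make something happens", "12- giulio took his choice so make"]

diczio = {}
for element in lst:
    # partition splits at the first "-"; sep is "" when there is no "-" at all.
    prefix, sep, rest = element.partition("-")
    if not sep:
        # Elements such as " " carry no numeric ID; skipping them avoids int("").
        continue
    idx = int(prefix)
    for word in rest.split():
        # setdefault creates the set once per word, so IDs accumulate
        # instead of being overwritten on each iteration.
        diczio.setdefault(word, set()).add(idx)

print(sorted(diczio["your"]))
print(sorted(diczio["giulio"]))
```

Unlike the original `t in element` test, this matches whole words rather than substrings, which may or may not be what is wanted.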
