I have a list like this, where the first number in each element's string happens to be that element's index:
list = [" ","1- make your choice", "2- put something and make", "3- make something happens", "4- giulio took his choice so make","5- make your choice", "6- put something and make", "7- make something happens", "8- giulio took his choice so make","9- make your choice", "10- put something and make", "11- make something happens", "12- giulio took his choice so make"]
For each word, I want to return the indices of the list elements in which that word appears:
for x in list:
....
I mean something like this:
position_of_word_in_all_elements_list = set("make": 1,2,3,4,5,6,7,8,9,10,11,12)
position_of_word_in_all_elements_list = set("your": 1,5,9)
position_of_word_in_all_elements_list = set("giulio":4,8,12)
Any suggestions?
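This finds the occurrences of every string in the input, including "1-" and so on. But filtering unwanted entries out of the result shouldn't be a big problem: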
# find the set of all words (sequences separated by a space) in input
s = set(" ".join(list).split(" "))
# for each word go through input and add index to the
# list if word is in the element. output list into a dict with
# the word as a key
res = dict((key, [ i for i, value in enumerate(list) if key in value.split(" ")]) for key in s)
{'': [0], 'and': [2, 6, 10], '8-': [8], '11-': [11], '6-': [6], 'something': [2, 3, 6, 7, 10, 11], 'your': [1, 5, 9], 'happens': [3, 7, 11], 'giulio': [4, 8, 12], 'make': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], '4-': [4], '2-': [2], 'his': [4, 8, 12], '9-': [9], '10-': [10], '7-': [7], '12-': [12], 'took': [4, 8, 12], 'put': [2, 6, 10], 'choice': [1, 4, 5, 8, 9, 12], '5-': [5], 'so': [4, 8, 12], '3-': [3], '1-': [1]}
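As a rough sketch of the filtering step mentioned above, the unwanted keys can be dropped with a dict comprehension. The names `words`, `unwanted`, and `filtered` are illustrative, the list is shortened, and the pattern assumes that "unwanted" means only the empty string and numeric prefixes such as "1-":

```python
import re

# Shortened copy of the input, renamed so it does not shadow the built-in list.
words = [" ", "1- make your choice", "2- put something and make",
         "3- make something happens", "4- giulio took his choice so make"]

# Build the word -> indices mapping the same way as in the answer above.
vocabulary = set(" ".join(words).split(" "))
res = {key: [i for i, value in enumerate(words) if key in value.split(" ")]
       for key in vocabulary}

# Assumption: unwanted keys are the empty string and prefixes like "1-".
unwanted = re.compile(r"^(\d+-)?$")
filtered = {key: idx for key, idx in res.items() if not unwanted.match(key)}

print(filtered["make"])    # [1, 2, 3, 4]
print("1-" in filtered)    # False
```

If other tokens should be excluded as well, only the pattern needs to change.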
First, rename your list so that it doesn't shadow the Python built-in list type:
>>> from collections import defaultdict
>>> li = [" ","1- make your choice", "2- put something and make", "3- make something happens", "4- giulio took his choice so make","5- make your choice", "6- put something and make", "7- make something happens", "8- giulio took his choice so make","9- make your choice", "10- put something and make", "11- make something happens", "12- giulio took his choice so make"]
>>> dd = defaultdict(list)
>>> for l in li:
...     try:  # this is an ugly hack to skip the " " value
...         index, words = l.split('-')
...     except ValueError:
...         continue
...     word_list = words.strip().split()
...     for word in word_list:
...         dd[word].append(index)
...
>>> dd['make']
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
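What defaultdict does: as long as the key (the word, in this case) exists in the dictionary, it works like a normal dict. If the key does not exist, it creates the key with a default value, which in our case is an empty list, as specified when you declare it with dd = defaultdict(list). I'm not the best at explaining, so I suggest reading about defaultdict elsewhere if this isn't clear :)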
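A small illustration of that behaviour may help. It uses only `collections.defaultdict` from the standard library; the keys are arbitrary examples:

```python
from collections import defaultdict

dd = defaultdict(list)

# "make" does not exist yet; indexing it creates an empty list first,
# so append works without any explicit check.
dd["make"].append(1)
dd["make"].append(2)

print(dd["make"])      # [1, 2]
print("your" in dd)    # False: a membership test does not create the key
print(dd["your"])      # []: plain indexing does create it
print("your" in dd)    # True
```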
@Oleg wrote a great, nerdy solution. For this problem I came up with the following simple approach.
def findIndex(st, lis):
    # collect the position of every element that contains st
    positions = []
    j = 0
    for x in lis:
        if st in x:
            positions.append(j)
        j += 1
    return positions
>>> findIndex('your', list)
[1, 5, 9]
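One thing to keep in mind with this approach is that `st in x` is a substring test, so 'your' would also match an element containing 'yours'. A possible whole-word variant, sketched with a hypothetical helper name `find_word_indices` and a made-up sample element ("3- remake yours") to show the difference:

```python
def find_word_indices(word, items):
    """Return indices of the items that contain `word` as a whole token."""
    return [i for i, item in enumerate(items) if word in item.split()]

# Hypothetical sample data; the last element is invented for illustration.
sample = [" ", "1- make your choice", "2- put something and make",
          "3- remake yours"]

print(find_word_indices("your", sample))                   # [1]
print([i for i, s in enumerate(sample) if "your" in s])    # [1, 3]
```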
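I need to use the number in the string to get the ID, and I have a solution for that... but as you remember, I have to get all the IDs for every word in the elements.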
lst = [" ","1- make your choice", "2- put something and make", "3- make something happens",
"4- giulio took his choice so make","5- make your choice", "6- put something and make",
"7- make something happens", "8- giulio took his choice so make","9- make your choice",
"10- put something and make", "11- make something happens", "12- giulio took his choice so make"]
diczio = {}
abc = " ".join(lst).split(" ")
for x in lst:
    element = x
    for t in abc:
        if len(element) > 0:
            if t in element:
                xs = element.find("-")
                aw = element[0:xs]
                aw = int(aw)
                wer = set()
                wer.add(aw)
                diczio[t] = [wer]
print diczio
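The problem is that I only get one ID for each word. I put them in a set (I mean wer = set()), but I need all the IDs for each word:
1- For example, for the word 'your', I only get the ID of the last article in which the word appears: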
'your': [set(['9'])]
But I need:
'your': [set([1,5,9])]
2- The ID 9 is a string in the set and I need it as an int, but if I try to convert aw to an int, I get an error:
aw = int(aw)
Error: ValueError: invalid literal for int() with base 10: ''
Any suggestions?
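Both symptoms seem to follow from the loop itself. A new set is created on every match and diczio[t] is overwritten, so only the last ID survives. The " " element has no "-", so find() returns -1, the slice is empty, and int('') raises the ValueError. One possible fix, sketched in Python 3 with a shortened list and a hypothetical name `ids_by_word`, and assuming every real element starts with "N-":

```python
from collections import defaultdict

lst = [" ", "1- make your choice", "2- put something and make",
       "3- make something happens", "4- giulio took his choice so make",
       "5- make your choice"]

ids_by_word = defaultdict(set)   # one shared set per word, never recreated
for element in lst:
    prefix, sep, text = element.partition("-")
    # Skip elements without a numeric "N-" prefix, such as " ";
    # calling int() on an empty prefix is what raises the ValueError.
    if not sep or not prefix.strip().isdigit():
        continue
    for word in text.split():
        ids_by_word[word].add(int(prefix))

print(sorted(ids_by_word["your"]))   # [1, 5]
```

Because the sets live in the dictionary rather than inside the loop body, each word accumulates every ID instead of keeping only the last one.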