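I have a large number of sentences from which I want to extract sub-sentences that start with certain word combinations. For example, I want to extract the sentence fragments that begin with "what does" or "what is", etc. (basically dropping the words that appear before the word pair). Both the sentences and the word pairs are stored in a DataFrame: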
   Sentence                                     First2
0  If this is a string what does it say?        can I
1  And this is a string, should it say more?    should it
2  This is yet another string.                  what does
3  etc. etc.                                    etc. etc
The result I want from the example above is:
0 what does it say?
1 should it say more?
2
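The most obvious solution below (at least to me) does not work. It checks every sentence r against only the first word pair in b, and never against the other entries of b.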
a = df['Sentence']
b = df['First2']
# The function seems to loop over all r's but only over the first b:
def func(r):
    for x in b:
        if x in r:
            s = r[r.index(x):]
            return s
        else:
            return ''
df['Segments'] = a.apply(func)
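It seems that looping over two DataFrame columns at the same time in this way does not work. Is there a more efficient and effective way to do this?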
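        else:
            return ''
These lines mean that func returns immediately if the first comparison is not a match. That is probably why the code does not return the matches you expect.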
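To make the early return visible, here is a minimal sketch that copies the control flow of the original function but records which word pairs actually get compared. The helper name pairs_tried and the plain list pairs are made up for this illustration; they stand in for func and the First2 column.

```python
# Hypothetical stand-in for the First2 column.
pairs = ["can I", "should it", "what does"]

def pairs_tried(sentence):
    """Mimic the original if/else inside the loop and record each pair compared."""
    tried = []
    for pair in pairs:
        tried.append(pair)
        if pair in sentence:
            return tried
        else:
            # The original `return ''` sits here, so the loop ends
            # after the very first pair that does not match.
            return tried
    return tried

print(pairs_tried("If this is a string what does it say?"))  # ['can I']
```

Both branches return during the first iteration, so only the first word pair is ever compared, whatever the sentence is.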
Sample working code:
# Try every word pair; return '' only after none of them matched:
def func(sentence, first_twos=b):
    for first_two in first_twos:
        if first_two in sentence:
            s = sentence[sentence.index(first_two):]
            return s
    return ''
df['Segments'] = a.apply(func)
And the output:
df:
{
'First2': ['can I', 'should it', 'what does'],
'Segments': ['what does it say? ', 'should it say more?', ''],
'Sentence': ['If this is a string what does it say? ', 'And this is a string, should it say more?', 'This is yet another string. ' ]
}
You can easily loop over two things at once with zip(iterator, iterator_foo).
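As a rough illustration of the zip suggestion, the sketch below walks two plain lists in lockstep. The lists sentences and first_twos are assumptions that stand in for the two DataFrame columns.

```python
# Hypothetical stand-ins for df['Sentence'] and df['First2'].
sentences = [
    "If this is a string what does it say?",
    "And this is a string, should it say more?",
    "This is yet another string.",
]
first_twos = ["can I", "should it", "what does"]

# zip yields (sentences[0], first_twos[0]), (sentences[1], first_twos[1]), ...
row_wise = []
for sentence, first_two in zip(sentences, first_twos):
    if first_two in sentence:
        row_wise.append(sentence[sentence.index(first_two):])
    else:
        row_wise.append('')

print(row_wise)  # ['', 'should it say more?', '']
```

Note that zip pairs each sentence only with the word pair from the same row. It therefore does not give the result asked for in the question, where every word pair has to be tried against every sentence.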
My question is answered by the following code:
def func(r):
    for i in b:
        if i in r:
            q = r[r.index(i):]
            return q
    return ''
df['Segments'] = a.apply(func)
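The solution was pointed out here by 卢大明 (only the last line differs from his). The problem was in the last two lines of the original function:
        else:
            return ''
These caused the function to return prematurely. 卢大明's answer is better than the answer to the possible duplicate question "Python loop only executes once", which caused other problems, as explained in my reply to Wii. (So I am not sure mine really is a duplicate.)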
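On the question of a more efficient way: one possible alternative, not taken from any of the answers here, is to compile all word pairs into a single regular expression so that each sentence is scanned once. This is only a sketch; the names pairs and segment are invented for the example.

```python
import re

# Hypothetical stand-in for the First2 column.
pairs = ["can I", "should it", "what does"]

# re.escape protects any regex metacharacters that a word pair might contain.
pattern = re.compile("|".join(re.escape(p) for p in pairs))

def segment(sentence):
    """Return the sentence from the leftmost matching word pair onward, else ''."""
    match = pattern.search(sentence)
    return sentence[match.start():] if match else ''

print(segment("If this is a string what does it say?"))  # what does it say?
```

The behavior differs slightly from the loop version: the loop returns the segment for the first pair in list order that occurs in the sentence, while the regular expression returns the segment starting at the leftmost occurrence of any pair. With pandas, a function like this could be passed to a.apply in the same way as func.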