How to select a substring based on the presence of a word pair? Python



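I have a large number of sentences from which I want to extract the sub-sentences that begin with certain word combinations. For example, I want to extract the sentence segments that begin with "what does" or "what is", etc. (essentially, removing the words that come before the word pair in the sentence). Both the sentences and the word pairs are stored in a DataFrame: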

'Sentence'                                    'First2'                                    
0  If this is a string what does it say?      0 can I    
1  And this is a string, should it say more?  1 should it    
2  This is yet another string.                2 what does
3  etc. etc.                                  3 etc. etc

The result I want from the example above is:

0 what does it say?
1 should it say more?
2

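The most obvious solution below (at least to me) doesn't work. It loops over all the sentences r, but only with the first word pair in b, not the other ones: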

a = df['Sentence']
b = df['First2']
#The function seems to loop over all r's but only over the first b:
def func(r):
    for x in b:
        if x in r:
            s = r[r.index(x):]
            return s
        else:
            return ''
df['Segments'] = a.apply(func)

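It seems that looping over the two DataFrame columns at the same time like this doesn't work. Is there a better or more efficient way to do this?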

我相信你的代码中有一个错误。

else:
    return ''

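This means that if the first comparison is not a match, "func" returns immediately instead of trying the remaining word pairs. That is probably why the code doesn't return any matches.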
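To make the early return concrete, here is a small plain-Python sketch that runs both versions on the first sample sentence. The names func_buggy and func_fixed are hypothetical, and the word pairs are passed in as a list instead of being read from the DataFrame.

```python
def func_buggy(sentence, pairs):
    for pair in pairs:
        if pair in sentence:
            return sentence[sentence.index(pair):]
        else:
            # Exits after checking only the first pair, match or not
            return ''

def func_fixed(sentence, pairs):
    for pair in pairs:
        if pair in sentence:
            return sentence[sentence.index(pair):]
    # Only reached once every pair has been tried without a match
    return ''

pairs = ['can I', 'should it', 'what does']
sentence = 'If this is a string what does it say?'
print(repr(func_buggy(sentence, pairs)))  # ''
print(repr(func_fixed(sentence, pairs)))  # 'what does it say?'
```

The only change is moving `return ''` out of the loop, so it runs after all pairs have been checked.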

Here is a working example:

# Try every word pair; only give up after none of them matched
def func(sentence, first_twos=b):
    for first_two in first_twos:
        if first_two in sentence:
            s = sentence[sentence.index(first_two):]
            return s
    return ''
df['Segments'] = a.apply(func)

And the output:

df:   
{   
'First2': ['can I', 'should it', 'what does'],   
'Segments': ['what does it say? ', 'should it say more?', ''],   
'Sentence': ['If this is a string what does it say? ', 'And this is a string, should it say more?', 'This is yet another string.  '  ]  
} 
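On the question of a more efficient approach, one option is to replace the Python-level loop over the word pairs with a single compiled regular expression that matches any of them. This is a sketch, not the answer's method: make_segmenter is a hypothetical helper, and the pairs are escaped with re.escape so that a pair like "etc. etc" is matched literally. Note one difference in behavior: the regex cuts at the leftmost pair that occurs in the sentence, while the loop cuts at the first pair in b's order that occurs. With one pair per sentence, as in the example, both give the same result.

```python
import re

def make_segmenter(pairs):
    """Return a function that slices a sentence from the leftmost word pair.

    Hypothetical helper; pairs may be any iterable of strings.
    """
    pairs = list(pairs)
    if not pairs:
        # An empty alternation would match at position 0, so handle it explicitly
        return lambda sentence: ''
    # Escape each pair so '.', '?', etc. are matched literally
    pattern = re.compile('|'.join(re.escape(p) for p in pairs))

    def segment(sentence):
        match = pattern.search(sentence)
        # Slice from where the match starts; '' if no pair occurs
        return sentence[match.start():] if match else ''

    return segment

segment = make_segmenter(['can I', 'should it', 'what does'])
print(segment('If this is a string what does it say?'))  # what does it say?
```

With pandas this could be used as `df['Segments'] = df['Sentence'].apply(make_segmenter(df['First2']))`. The same pattern, wrapped in one capture group as `'((?:' + alternation + ').*)'`, could also be passed to `Series.str.extract(..., expand=False)`, followed by `.fillna('')` for the rows without a match.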

You can easily loop over two things at once with zip(iterator, iterator_foo).
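A quick sketch of what zip does here, using the sample data as plain lists (an assumption for illustration). zip pairs items by position, so each sentence is compared only with the word pair on the same row. In the sample data the pair "what does" that matches sentence 0 sits on row 2, so a row-wise zip misses it; this approach fits only when each row's pair is meant for that row's sentence.

```python
sentences = [
    'If this is a string what does it say?',
    'And this is a string, should it say more?',
    'This is yet another string.',
]
pairs = ['can I', 'should it', 'what does']

# zip() yields (sentences[0], pairs[0]), (sentences[1], pairs[1]), ...
row_wise = [
    s[s.index(p):] if p in s else ''
    for s, p in zip(sentences, pairs)
]
print(row_wise)  # ['', 'should it say more?', '']
```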

My question is answered by the following code:

def func(r):
    for i in b:
        if i in r:
            q = r[r.index(i):]
            return q
    return ''
df['Segments'] = a.apply(func)
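The same "first matching pair wins, otherwise empty string" logic can also be written with next() and a generator expression, whose second argument is the default returned when nothing matches. This is only a more compact equivalent of the loop above; first_segment is a hypothetical name, and pairs are checked in list order just as in the loop.

```python
def first_segment(sentence, pairs):
    """Slice from the first pair (in list order) found in sentence, else ''."""
    return next((sentence[sentence.index(p):] for p in pairs if p in sentence), '')

pairs = ['can I', 'should it', 'what does']
print(repr(first_segment('If this is a string what does it say?', pairs)))  # 'what does it say?'
print(repr(first_segment('This is yet another string.', pairs)))            # ''
```

Applied to the DataFrame, this would be `df['Segments'] = a.apply(lambda s: first_segment(s, b))`.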

The solution was pointed out here by 卢大明 (only the last line differs from his). The problem was in the last two lines of the original code:

else:
    return ''  

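This caused the function to return too early. Is 卢大明's answer better than the answers to the possible duplicate question about a Python loop that only executes once? Those answers caused other problems, as I explained in my reply to Wii (so I'm not sure mine really is a duplicate).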
