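I'm trying to apply textacy's "extract.subject_verb_object_triples" function to the text in my dataset. However, the code I wrote is very slow and uses a lot of memory. Is there a more efficient implementation?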

import spacy
import textacy

def extract_SVO(text):
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

tuples_list = []
sp500news['title'].apply(extract_SVO)
print(tuples_list)

Sample data (sp500news)

    date_publish         title
0   2013-05-14 17:17:05  Italy will not dismantle Montis labour reform  minister
1   2014-05-09 20:15:57  Exclusive US agency FinCEN rejected veterans in bid to hire lawyers
4   2018-07-19 10:29:54  Xis campaign to draw people back to graying rural China faces uphill battle
6   2012-04-17 21:02:54  Romney begins to win over conservatives
8   2012-12-12 20:17:56  Oregon mall shooting survivor in serious condition
9   2018-11-08 10:51:49  Polands PGNiG to sign another deal for LNG supplies from US CEO
11  2013-08-25 07:13:31  Australias opposition leader pledges stronger economy if elected PM
12  2015-01-09 00:54:17  New York shifts into Code Blue to get homeless off frigid streets
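
To make the question reproducible, the sample rows above can be rebuilt as a pandas DataFrame. This is a sketch assuming pandas is available; it includes only the two columns shown and keeps the original (non-contiguous) index labels.

```python
import pandas as pd

# Sample rows from the question, keeping the original index labels
sp500news = pd.DataFrame(
    {
        "date_publish": pd.to_datetime([
            "2013-05-14 17:17:05",
            "2014-05-09 20:15:57",
            "2018-07-19 10:29:54",
            "2012-04-17 21:02:54",
            "2012-12-12 20:17:56",
            "2018-11-08 10:51:49",
            "2013-08-25 07:13:31",
            "2015-01-09 00:54:17",
        ]),
        "title": [
            "Italy will not dismantle Montis labour reform  minister",
            "Exclusive US agency FinCEN rejected veterans in bid to hire lawyers",
            "Xis campaign to draw people back to graying rural China faces uphill battle",
            "Romney begins to win over conservatives",
            "Oregon mall shooting survivor in serious condition",
            "Polands PGNiG to sign another deal for LNG supplies from US CEO",
            "Australias opposition leader pledges stronger economy if elected PM",
            "New York shifts into Code Blue to get homeless off frigid streets",
        ],
    },
    index=[0, 1, 4, 6, 8, 9, 11, 12],
)

print(sp500news)
```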

This should speed things up:

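import spacy
import textacy

# Load the model once, outside the function
nlp = spacy.load('en_core_web_sm')

def extract_SVO(doc):
    # subject_verb_object_triples returns a generator, which is truthy even
    # when it yields nothing, so convert it to a list before testing it
    tuples_to_list = list(textacy.extract.subject_verb_object_triples(doc))
    if tuples_to_list:
        tuples_list.append(tuples_to_list)

tuples_list = []
# Parse each title once; keep the parsed Docs separate from the text column
docs = sp500news['title'].apply(nlp)
_ = docs.apply(extract_SVO)
print(tuples_list)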

Explanation

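In the OP's code, nlp = spacy.load('en_core_web_sm') is called inside the function, so the model is reloaded for every row. I think this is the biggest bottleneck. Moving the call out of the function, so the model is loaded only once, should speed things up.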
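
A further step, not part of the original answer: once the model is loaded a single time, spaCy's nlp.pipe can stream the titles through the pipeline in batches, which is usually faster than calling nlp on each row via apply. The sketch below takes nlp and the extractor as arguments so it can be tried without a model; extract_svo_batched and batch_size=256 are hypothetical choices. The commented usage assumes spaCy, textacy and en_core_web_sm are installed, and it assumes that disabling the ner component is safe because the SVO extraction relies on the dependency parse rather than on named entities.

```python
from typing import Any, Callable, Iterable, List


def extract_svo_batched(
    texts: Iterable[str],
    nlp: Any,
    extract_fn: Callable[[Any], Iterable[Any]],
    batch_size: int = 256,
) -> List[list]:
    """Parse texts in batches and collect the non-empty lists of triples.

    extract_svo_batched is an illustrative helper, not part of spaCy or textacy.
    """
    results = []
    # nlp.pipe yields one Doc per input text, processing the texts in batches
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # The extractor may return a generator, so materialize it before testing
        triples = list(extract_fn(doc))
        if triples:
            results.append(triples)
    return results


# Example usage with the real libraries (assumes spaCy, textacy and the
# en_core_web_sm model are installed; disabling "ner" is an assumption):
#   import spacy
#   import textacy
#   nlp = spacy.load("en_core_web_sm", disable=["ner"])
#   tuples_list = extract_svo_batched(
#       sp500news["title"], nlp, textacy.extract.subject_verb_object_triples)
```

Passing nlp and the extractor in as parameters keeps the batching logic independent of a particular model, and only the lists of triples are kept rather than a Series holding every parsed Doc.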

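Also, the triples are appended only when at least one was found. Because subject_verb_object_triples returns a generator, and a generator object is truthy even when it yields nothing, the result has to be converted to a list before checking whether it is empty.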
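
The generator pitfall can be shown in plain Python. In this sketch, triples_generator is a hypothetical stand-in for a generator function such as textacy.extract.subject_verb_object_triples.

```python
def triples_generator(items):
    # Hypothetical stand-in for a generator function such as
    # textacy.extract.subject_verb_object_triples; it may yield nothing
    for item in items:
        yield item


empty = triples_generator([])

# The generator object itself is truthy, even though it will yield nothing
print(bool(empty))  # True

# Converting to a list first gives a value whose truthiness reflects its contents
triples = list(empty)
print(bool(triples))  # False
```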

More efficient implementation of textacy/spaCy 'subject_verb_object_triples'
