How do I lemmatize a DataFrame column with the stanza library in Python?



My current DataFrame is:

# required libraries
import pandas as pd
dict_noticia = {
    'nome_adm': ['CC Brasil',
                 'ABC Futuro Esporte',
                 'Tabuao'],
    'noticia': ["['folha', 'paulo', 'https', 'east', 'amazonaws', 'multclipp', 'arquivos', 'noticias', 'pdf', 'jpg', 'mônica', 'bergamo', 'longo', 'tempo']",
                "['coluna', 'estadão']",
                "['flamengo', 'futebol','melhor','campeao','é']"]
}

df = pd.DataFrame(dict_noticia)
df

I need a new column containing the lemmas of the "noticia" column. The script below raises an error:

import stanza
nlp_stanza = stanza.Pipeline(lang='pt', processors='tokenize,mwt,pos,lemma')
def f_lematizacao_stanza(df,column_name,new_column_name):
    df[new_column_name] = df[column_name].apply(lambda x: ([w.lemma_ for w in nlp_stanza(row)]))
    return df
f_lematizacao_stanza(data,'noticia','noticia_lema')

NameError: name 'row' is not defined

How can I fix this?

Thanks in advance.

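You never defined the variable row. The lambda's argument is x, so pass x to the pipeline. Also note that stanza exposes words through doc.sentences and sentence.words, and the attribute is lemma (lemma_ is spaCy's name):

def f_lematizacao_stanza(df, column_name, new_column_name):
    # Run the stanza pipeline on each cell and collect the lemma of every word
    df[new_column_name] = df[column_name].apply(
        lambda x: [w.lemma for s in nlp_stanza(x).sentences for w in s.words])
    return df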
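
One more thing to watch for: each "noticia" cell in the question is a string that only looks like a list, so passing it straight to stanza would also tokenize the brackets, quotes, and commas. Below is a minimal sketch of one way to handle that, assuming you want to parse each cell with ast.literal_eval and join the tokens with spaces before lemmatizing. The names parse_tokens, lemmatize_column, and make_stanza_lemmatizer are hypothetical helpers, not part of stanza or pandas, and the stanza part assumes the Portuguese model has already been downloaded.

```python
import ast

import pandas as pd


def parse_tokens(cell):
    """Turn a cell like "['coluna', 'estadão']" into a real list of strings."""
    return ast.literal_eval(cell)


def lemmatize_column(df, column_name, new_column_name, lemmatize):
    """Add new_column_name holding the lemmas for each row of column_name.

    `lemmatize` is any callable that maps a text string to a list of lemmas.
    """
    df[new_column_name] = df[column_name].apply(
        lambda cell: lemmatize(" ".join(parse_tokens(cell)))
    )
    return df


def make_stanza_lemmatizer(lang="pt"):
    """Build a stanza-backed lemmatizer (stanza is imported only when called)."""
    import stanza

    nlp = stanza.Pipeline(lang=lang, processors="tokenize,mwt,pos,lemma")
    # stanza keeps words under doc.sentences[i].words, each with a .lemma attribute
    return lambda text: [w.lemma for s in nlp(text).sentences for w in s.words]


# Small frame in the same shape as the question's data
df_demo = pd.DataFrame({
    "nome_adm": ["ABC Futuro Esporte", "Tabuao"],
    "noticia": ["['coluna', 'estadão']", "['flamengo', 'futebol']"],
})

# With stanza installed and the model available, the call would look like:
# df_demo = lemmatize_column(df_demo, "noticia", "noticia_lema", make_stanza_lemmatizer())
```

Passing the lemmatizer in as an argument keeps the pandas logic separate from the stanza pipeline, so you can try the column handling with a simple stand-in function before loading the model.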
