ValueError: setting an array element with a sequence (assigning a list to a newly created column)



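I have this code, which basically reads a CSV file, looks in one column of that file (a free-text string) for words that also appear in a keyword dictionary I created, and then returns the matching words as a list.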

import pandas as pd
import re
from nltk.tokenize.treebank import TreebankWordDetokenizer
from langdetect import detect
from sentiment_analysis_spanish import sentiment_analysis
from textblob import TextBlob
import unidecode

df1 = pd.read_csv('TFG1.csv', encoding='utf8')

def find_all_words(words, sentence):
    # Split the sentence into word tokens, then keep the dictionary words that occur in it
    all_words = re.findall(r'\w+', sentence)
    words_found = []
    for word in words:
        if word in all_words:
            words_found.append(word)
    return "Words found:", words_found.__len__(), " The words are:", words_found

english_dic = ['sage', 'selection']
spanish_dic = ['grupo', 'bien']
df1["Reescribe aquí / Rewrite here"].apply(unidecode.unidecode)  # to remove accents
TreebankWordDetokenizer().detokenize(df1["Reescribe aquí / Rewrite here"])
i = 1
f = 0
df1["Words count"] = 0
df1["Words found"] = None
for rows in [x.lower() for x in df1["Reescribe aquí / Rewrite here"]]:
    if detect(rows) == 'en':
        df1["Words count"].iloc[f] = find_all_words(english_dic, rows)[1]
        df1["Words found"].iloc[f] = find_all_words(english_dic, rows)[3]
        print(i, "-", rows, find_all_words(english_dic, rows))
    elif detect(rows) == 'es':
        df1["Words count"].iloc[f] = find_all_words(spanish_dic, rows)[1]
        df1["Words found"].iloc[f] = find_all_words(spanish_dic, rows)[3]
        print(i, "-", rows, find_all_words(spanish_dic, rows))
    f += 1
    i += 1

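The function find_all_words() returns two things: the number of words found and the words themselves. For example, if the dictionary is english_dic=['sage', 'selection'] and the text contains both words, it returns: Words found: 2 The words are: ['sage', 'selection']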
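To make the return value concrete, here is a minimal sketch of the same function called on a sample sentence. The sentence "sage selection is good" is an assumption chosen for illustration. The function actually returns a 4-element tuple, which is why the question's code reads the count at index 1 and the list at index 3.

```python
import re

def find_all_words(words, sentence):
    # Tokenize into word characters, then keep the dictionary words present in the sentence
    all_words = re.findall(r'\w+', sentence)
    words_found = [word for word in words if word in all_words]
    return "Words found:", len(words_found), " The words are:", words_found

# Hypothetical sample text, not taken from the asker's CSV
result = find_all_words(['sage', 'selection'], "sage selection is good")
count = result[1]   # number of dictionary words found
found = result[3]   # the matching words, as a list
print(count, found)
```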

However, when I try to assign the list to a new column, it gives me an error.

df1["Words found"].iloc[f]=find_all_words(english_dic, rows)[3]

It raises this error: ValueError: setting an array element with a sequence
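As a rough illustration of where this message comes from, the sketch below uses plain NumPy arrays rather than the DataFrame from the question (the array names are hypothetical). A single slot of a numeric array cannot hold a list, while a slot of an object array can.

```python
import numpy as np

# A numeric array: each slot holds exactly one number
int_slots = np.zeros(3, dtype=int)
raised = False
try:
    int_slots[0] = [1, 2]  # a list does not fit into one integer slot
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# An object array: each slot holds a reference to any Python object
object_slots = np.empty(3, dtype=object)
object_slots[0] = [1, 2]  # the whole list is stored as one element
print(object_slots[0])
```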

Do you have a solution for this?

Set the type of the 'Words found' column to object before you try to initialize it:

...
df1['Words found'] = df1['Words found'].astype(object)
df1['Words found'] = None

Edit: if this is the first time 'Words found' is referenced, create the column first:

df1['Words found'] = None
df1['Words found'] = df1['Words found'].astype(object)
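Putting it together, a minimal sketch of storing one list per row could look like the following. It assumes a small stand-in DataFrame with a hypothetical "text" column instead of TFG1.csv, and it writes each cell with .at, which is a swapped-in alternative to the chained df1["Words found"].iloc[f] = ... assignment in the question, not something the answer above prescribes.

```python
import pandas as pd

# Stand-in data; the real code reads TFG1.csv
df = pd.DataFrame({"text": ["sage selection is good", "el grupo va bien"]})

# Create the column first, then make sure it has object dtype
df["Words found"] = None
df["Words found"] = df["Words found"].astype(object)

# .at sets a single cell by row label and column name,
# so the list is stored whole in that one cell
df.at[0, "Words found"] = ["sage", "selection"]
df.at[1, "Words found"] = ["grupo", "bien"]

df["Words count"] = df["Words found"].map(len)
print(df)
```

Addressing the cell in a single step (row label and column name together) also avoids chained indexing, which pandas may apply to a temporary copy instead of the original frame.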
