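TypeError: string indices must be integers; how do I fix this in my code?

I am trying to iterate over every cell in the 'Text' column of a CSV and create a new column named 'Type', which will hold the text type predicted by Multinomial Naive Bayes.

Here is the code: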

from sklearn.naive_bayes import MultinomialNB
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
dataset = pd.read_csv("Test.csv", encoding='latin-1')
clf = MultinomialNB()
cv = CountVectorizer()

for row in dataset:
    text= row['Text']
    data = cv.transform([text]).toarray()
    output = clf.predict(data)
    dataset['Type']=dataset[output]

Here is my error:

text= row['Text']
TypeError: string indices must be integers

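The way you are iterating over the DataFrame rows is incorrect. Here,

for row in dataset:

does not yield rows. Iterating over a DataFrame directly yields its column labels, which are usually strings. So in text= row['Text'], row is a string such as 'Text', and the code tries to index that string with 'Text'. String indices can only be integers, hence the error.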

e.g.:
text = "abc"
print(text[0])      # Output: 'a'
print(text['abc'])  # TypeError: string indices must be integers
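
To see this on a DataFrame, here is a minimal sketch. The two-row frame and its 'Id' column are made up for illustration and merely stand in for Test.csv:

```python
import pandas as pd

# Hypothetical frame standing in for the contents of Test.csv
dataset = pd.DataFrame({"Text": ["hello world", "buy now"], "Id": [1, 2]})

# A plain for-loop over a DataFrame walks its column labels
labels = [row for row in dataset]
print(labels)  # ['Text', 'Id']

# Each item is a str, so row['Text'] would index a string with a string
label_types = [type(row).__name__ for row in dataset]
print(label_types)  # ['str', 'str']
```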

So the correct way to iterate over the rows and extract the required column value is:

for index, row in dataset.iterrows():
    text = row["Text"]

For more on the iterrows function, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html
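
Beyond the loop itself, the posted code calls cv.transform and clf.predict without fitting either object first, which would likely raise a NotFittedError once the TypeError is gone. The sketch below shows one way the whole task might look. The training data, its labels, and the in-memory stand-in for Test.csv are all assumptions, since the question shows none of them. It also predicts the whole column in one call, so no row loop is needed:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labeled training data (not part of the original question)
train = pd.DataFrame({
    "Text": ["win a free prize now", "free cash prize",
             "meeting at noon", "lunch meeting tomorrow"],
    "Type": ["spam", "spam", "ham", "ham"],
})

# Hypothetical stand-in for pd.read_csv("Test.csv", encoding="latin-1")
dataset = pd.DataFrame({"Text": ["free prize", "meeting tomorrow"]})

cv = CountVectorizer()
clf = MultinomialNB()

# Fit the vectorizer and the classifier before transforming or predicting
X_train = cv.fit_transform(train["Text"])
clf.fit(X_train, train["Type"])

# Vectorize the whole 'Text' column and predict every row in one call
dataset["Type"] = clf.predict(cv.transform(dataset["Text"]))
print(dataset)
```

If a per-row loop is still preferred, the same fitted cv and clf can be used inside dataset.iterrows(), collecting each prediction in a list and assigning the list to dataset['Type'] after the loop.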
