Applying NLTK stop words to a DataFrame raises an error


Reviews                                               Label
0   Bromwell High is a cartoon comedy. It ran at t...   Positive
1   Homelessness (or Houselessness as George Carli...   Positive
2   Brilliant over-acting by Lesley Ann Warren. Be...   Positive

Above is my DataFrame, with the columns Reviews and Label. When I run the code below:

import nltk
nltk.download('stopwords')  # downloads/updates the stop word list
from nltk.corpus import stopwords
stop = stopwords.words('english')
final_without_stopwords = final[['Reviews','Label']].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)])).str.replace('[^\w\s]','')
print(final_without_stopwords)

Result:

KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3077             try:
-> 3078                 return self._engine.get_loc(key)
3079             except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('Reviews', 'Label')
During handling of the above exception, another exception occurred:
KeyError                                  Traceback (most recent call last)
<ipython-input-52-cb4ca290db84> in <module>()
5 #final['Reviews'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop_words)]))
6 
----> 7 final_without_stopwords = final['Reviews','Label'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)])).str.replace('[^\w\s]','')
8 print(final_without_stopwords)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2686             return self._getitem_multilevel(key)
2687         else:
-> 2688             return self._getitem_column(key)
2689 
2690     def _getitem_column(self, key):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2693         # get column
2694         if self.columns.is_unique:
-> 2695             return self._get_item_cache(key)
2696 
2697         # duplicate columns & possible reduce dimensionality
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
2487         res = cache.get(item)
2488         if res is None:
-> 2489             values = self._data.get(item)
2490             res = self._box_item_values(item, values)
2491             cache[item] = res
~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
4113 
4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
4116             else:
4117                 indexer = np.arange(len(self.items))[isna(self.items)]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3078                 return self._engine.get_loc(key)
3079             except KeyError:
-> 3080                 return self._engine.get_loc(self._maybe_cast_indexer(key))
3081 
3082         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: ('Reviews', 'Label')

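What I actually want is to remove stop words from a DataFrame that has only two columns. When I run this code on a single column (Reviews) it works fine, but when I run it on both columns (Reviews and Label) it raises an error. Any suggestions on how to make the code handle both columns?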

To apply a function element-wise to a DataFrame, use applymap:

A simplified example:

import pandas as pd
stop = set(['a','the','i','is'])
df = pd.DataFrame( {'sentence1':['i am a boy','i am a girl'],
'sentence2':['Bromwell High is a cartoon comedy','i am a girl']})
df[['sentence1','sentence2']].applymap(lambda x: ' '.join(i for i in x.split() if i not in stop))

sentence1    sentence2
0   am boy       Bromwell High cartoon comedy
1   am girl      am girl
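
A note on versions: in pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map. The sketch below is one way to get the same result without depending on either name, by calling Series.map on each column through DataFrame.apply. The helper name drop_stop_words is made up for this illustration; the data and stop set are the ones from the example above.

```python
import pandas as pd

stop = {'a', 'the', 'i', 'is'}
df = pd.DataFrame({
    'sentence1': ['i am a boy', 'i am a girl'],
    'sentence2': ['Bromwell High is a cartoon comedy', 'i am a girl'],
})

def drop_stop_words(text):
    """Return text with every whitespace-separated stop word removed."""
    return ' '.join(word for word in text.split() if word not in stop)

# DataFrame.apply hands over one column (a Series) at a time;
# Series.map then calls the helper on each cell of that column.
cleaned = df[['sentence1', 'sentence2']].apply(lambda col: col.map(drop_stop_words))
print(cleaned)
```

Using a set for the stop words keeps each membership test cheap, which matters once the reviews get long.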

If you want to assign the values without stop words back into the DataFrame, use:

df[['sentence1','sentence2']] = df[['sentence1','sentence2']].applymap(lambda x: ' '.join(i for i in x.split() if i not in stop))
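
As for the KeyError in the question: final['Reviews','Label'] passes the tuple ('Reviews', 'Label') as a single column label, and no column has that name. Selecting several columns takes a list, final[['Reviews','Label']]. Here is a rough sketch of the whole thing applied to a frame shaped like the one in the question. The two rows are shortened stand-ins for the real data, the tiny stop set replaces stopwords.words('english') so the snippet runs without an NLTK download, and the helper name clean is made up. It also strips punctuation before dropping stop words, which is a choice of this sketch rather than something the original code did.

```python
import re
import pandas as pd

# Shortened stand-in rows; the real frame in the question is larger.
final = pd.DataFrame({
    'Reviews': ['Bromwell High is a cartoon comedy.',
                'Brilliant over-acting by Lesley Ann Warren.'],
    'Label': ['Positive', 'Positive'],
})

# Illustrative stop set; set(stopwords.words('english')) could be used instead.
stop = {'is', 'a', 'by', 'the'}

# A tuple key is looked up as ONE column label, which does not exist here.
try:
    final['Reviews', 'Label']
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

def clean(text):
    """Strip punctuation, then drop stop words."""
    text = re.sub(r'[^\w\s]', '', text)
    return ' '.join(word for word in text.split() if word not in stop)

# A list of labels selects both columns; each cell is cleaned and written back.
cols = ['Reviews', 'Label']
final[cols] = final[cols].apply(lambda col: col.map(clean))
print(final)
```

The Label column passes through unchanged here because 'Positive' contains neither punctuation nor a stop word, so cleaning only the Reviews column would give the same frame.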
