I have a CSV file:
df=pd.read_csv(Path(os.getcwd()) / 'all_files.csv', sep=',', on_bad_lines='skip', index_col=False, dtype='unicode')
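When a path is built by string concatenation, the separator between the directory and the file name has to be included explicitly. Otherwise the file name is glued onto the last directory name. The following is a minimal sketch that assumes the CSV sits in the current working directory. It shows that pathlib's / operator inserts the separator itself.

```python
import os
from pathlib import Path

cwd = os.getcwd()

# String concatenation without a separator glues the file name
# onto the last directory name (e.g. ".../projectall_files.csv")
glued = Path(cwd + "all_files.csv")

# The / operator on Path objects inserts the platform's separator
joined = Path(cwd) / "all_files.csv"

print(glued)
print(joined)
```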
The column:
column=input("Column:")
The word:
word=input("Word:")
I want to filter the CSV file like this:
df2=df[(df[column].dropna().str.contains(word.lower()))]
But when I enter the column ЄДРПОУ(Гр.8), I get this error:
Warning (from warnings module):
  File "C:pythonpythonFilterExcelFiles.py", line 35
    df2=df[(df[column].dropna().str.contains(word.lower()))]
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Traceback (most recent call last):
  File "C:pythonpythonFilterExcelFiles.py", line 51, in <module>
    s()
  File "C:pythonpythonFilterExcelFiles.py", line 35, in s
    df2=df[(df[column].dropna().str.contains(word.lower()))]
  File "C:\Users\Станислав\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3496, in __getitem__
    return self._getitem_bool_array(key)
  File "C:\Users\Станислав\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3549, in _getitem_bool_array
    key = check_bool_indexer(self.index, key)
  File "C:\Users\Станислав\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 2383, in check_bool_indexer
    raise IndexingError(
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
I would also like to apply lower() to df[column].
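You drop the NaNs inside the indexer, which can make the mask shorter than the DataFrame. pandas then cannot align the mask with the DataFrame's index, which causes the boolean indexing error.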
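To see why dropping the NaNs breaks the indexing, here is a minimal reproduction with hypothetical data: a three-row column with one missing value. The mask built after dropna() only has two labels, so indexing with it fails. pandas also emits the same "Boolean Series key will be reindexed" UserWarning first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the middle row has a missing value (NaN)
df = pd.DataFrame({"name": ["Alpha", np.nan, "beta"]})

# dropna() removes label 1, so the mask only has labels 0 and 2
mask = df["name"].dropna().str.contains("a")
print(len(df), len(mask))    # 3 2
print(mask.index.tolist())   # [0, 2]

error_name = None
try:
    df[mask]  # pandas cannot align the shorter mask with df's index
except Exception as exc:
    error_name = type(exc).__name__
print(error_name)            # IndexingError
```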
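Don't drop them. Pass na=False so that str.contains returns False for the NaN cells. The mask then keeps the same length as df:
df2 = df[df[column].str.contains(word.lower(), na=False)]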
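A minimal runnable sketch of filtering without dropna(), assuming hypothetical sample data and fixed stand-ins for the two input() calls. It uses case=False rather than lowercasing by hand, and regex=False so that the word is matched literally instead of as a regular expression. Both choices are assumptions about what the search should do.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value, like a column read with dtype='unicode'
df = pd.DataFrame({"name": ["Alpha", np.nan, "BETA", "bed", "cab"]})
column, word = "name", "Be"   # stand-ins for the two input() calls

# case=False: match regardless of upper/lower case
# na=False: missing cells become False instead of NaN, so the mask stays boolean
# regex=False: treat word as a literal substring, not a regular expression
mask = df[column].str.contains(word, case=False, na=False, regex=False)
df2 = df[mask]
print(df2.index.tolist())   # [2, 3]
```

Lowercasing both sides, as in df[column].str.lower().str.contains(word.lower(), na=False), gives the same result. case=False just avoids building the extra lowercased Series.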
Alternatively, if an operation returns NaN, you can fill those values with False:
df2 = df[df[column].str.contains(word.lower()).fillna(False)]
I searched for an answer and came across a similar post that may contain the solution to your problem.
According to that post, the error is caused by Python's default encoding, which in Python 2 is usually ascii. You can check the encoding with:
import sys
sys.getdefaultencoding()
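To solve your problem, change it to UTF-8 with the following code. This only works in Python 2. In Python 3, reload is no longer a builtin and sys.setdefaultencoding no longer exists: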
import sys
reload(sys) # Note this line is essential for the change
sys.setdefaultencoding('utf-8')
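The traceback in the question comes from Python 3.10. On Python 3 the default string encoding is already fixed to UTF-8 and cannot be changed at runtime, as this small check shows. If non-ASCII column names such as ЄДРПОУ(Гр.8) cause trouble in Python 3, the relevant setting is the file encoding passed to pd.read_csv through its encoding parameter, not the interpreter default.

```python
import sys

# Python 3: the default encoding is always UTF-8
print(sys.getdefaultencoding())            # utf-8

# The Python 2 setdefaultencoding hook does not exist in Python 3
print(hasattr(sys, "setdefaultencoding"))  # False
```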
Credit for the original solution goes to @jochietoch.