熊猫中的规范化错误,Jupyter Notebook



我是Python和Tensorflow的新手。我创建了一个使用 Pandas 和现有 CSV 文件进行简单分类的示例。但是为了规范化CSV中的列,我收到以下错误。

调试 IDE:Jupyter 笔记本

import pandas as pd
patients = pd.read_csv("../npushpakaran/TENSORFLOW/Tensorflow-Bootcamp-master/02-TensorFlow-Basics/pima-indians-diabetes.csv")
patients.columns
Index(['Number_pregnant', 'Glucose_concentration', 'Blood_pressure', 'Triceps',
'Insulin', 'BMI', 'Pedigree', 'Age', 'Class', 'Group'],
dtype='object')
cols_to_norm =['Number_pregnant', 'Glucose_concentration', 'Blood_pressure', 'Triceps',
'Insulin', 'BMI', 'Pedigree', 'Age', 'Class', 'Group'] 

patients[cols_to_norm] = patients[cols_to_norm].apply(lambda x: (x- x.min())/(x.max()-x.min()))

在最后一行中,我收到以下错误。

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py in na_op(x, y)
1008         try:
-> 1009             result = expressions.evaluate(op, str_rep, x, y, **eval_kwargs)
1010         except TypeError:
~/anaconda3/lib/python3.6/site-packages/pandas/core/computation/expressions.py in evaluate(op, op_str, a, b, use_numexpr, **eval_kwargs)
204     if use_numexpr:
--> 205         return _evaluate(op, op_str, a, b, **eval_kwargs)
206     return _evaluate_standard(op, op_str, a, b)
~/anaconda3/lib/python3.6/site-packages/pandas/core/computation/expressions.py in _evaluate_numexpr(op, op_str, a, b, truediv, reversed, **eval_kwargs)
119     if result is None:
--> 120         result = _evaluate_standard(op, op_str, a, b)
121 
~/anaconda3/lib/python3.6/site-packages/pandas/core/computation/expressions.py in _evaluate_standard(op, op_str, a, b, **eval_kwargs)
64     with np.errstate(all='ignore'):
---> 65         return op(a, b)
66 
TypeError: unsupported operand type(s) for -: 'str' and 'str'
During handling of the above exception, another exception occurred:
TypeError                                 Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py in safe_na_op(lvalues, rvalues)
1029             with np.errstate(all='ignore'):
-> 1030                 return na_op(lvalues, rvalues)
1031         except Exception:
~/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py in na_op(x, y)
1019                 mask = notna(x)
-> 1020                 result[mask] = op(x[mask], y)
1021 
TypeError: unsupported operand type(s) for -: 'str' and 'str'
During handling of the above exception, another exception occurred:
TypeError                                 Traceback (most recent call last)
<ipython-input-22-44ad2490d2ae> in <module>()
----> 1 patients[cols_to_norm] = patients[cols_to_norm].apply(lambda x: (x- x.min())/(x.max()-x.min()))
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
6002                          args=args,
6003                          kwds=kwds)
-> 6004         return op.get_result()
6005 
6006     def applymap(self, func):
~/anaconda3/lib/python3.6/site-packages/pandas/core/apply.py in get_result(self)
316                                       *self.args, **self.kwds)
317 
--> 318         return super(FrameRowApply, self).get_result()
319 
320     def apply_broadcast(self):
~/anaconda3/lib/python3.6/site-packages/pandas/core/apply.py in get_result(self)
140             return self.apply_raw()
141 
--> 142         return self.apply_standard()
143 
144     def apply_empty_result(self):
~/anaconda3/lib/python3.6/site-packages/pandas/core/apply.py in apply_standard(self)
246 
247         # compute the result using the series generator
--> 248         self.apply_series_generator()
249 
250         # wrap results
~/anaconda3/lib/python3.6/site-packages/pandas/core/apply.py in apply_series_generator(self)
275             try:
276                 for i, v in enumerate(series_gen):
--> 277                     results[i] = self.f(v)
278                     keys.append(v.name)
279             except Exception as e:
<ipython-input-22-44ad2490d2ae> in <lambda>(x)
----> 1 patients[cols_to_norm] = patients[cols_to_norm].apply(lambda x: (x- x.min())/(x.max()-x.min()))
~/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py in wrapper(left, right)
1064             rvalues = rvalues.values
1065 
-> 1066         result = safe_na_op(lvalues, rvalues)
1067         return construct_result(left, result,
1068                                 index=left.index, name=res_name, dtype=None)
~/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py in safe_na_op(lvalues, rvalues)
1032             if is_object_dtype(lvalues):
1033                 return libalgos.arrmap_object(lvalues,
-> 1034                                               lambda x: op(x, rvalues))
1035             raise
1036 
pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.arrmap_object()
~/anaconda3/lib/python3.6/site-packages/pandas/core/ops.py in <lambda>(x)
1032             if is_object_dtype(lvalues):
1033                 return libalgos.arrmap_object(lvalues,
-> 1034                                               lambda x: op(x, rvalues))
1035             raise
1036 
TypeError: ("unsupported operand type(s) for -: 'str' and 'str'", 'occurred at index Group')

任何人有任何想法,请帮忙。

您不需要将pd.DataFrame.apply与自定义函数一起使用。相反,使用 Pandas 中可用的矢量化方法:

cols = cols_to_norm
df_sub = df.loc[:, cols]
df.loc[:, cols] = (df_sub - df_sub.min()) / (df_sub.max() - df_sub.min())

最新更新