I have a Pandas DataFrame df whose column df['auc_all'] contains tuples of two values, e.g. (0.54, 0.044).
When I run:
type(df['auc_all'][0])
>>> str
However, when I run:
def convert_str_into_tuple(self, string):
    splitted_tuple = string.split(',')
    value1 = float(splitted_tuple[0][1:])
    value2 = float(splitted_tuple[1][1:-1])
    return (value1, value2)
df['auc_all'] = df['auc_all'].apply(convert_str_into_tuple)
I get the following error:
df = full_df.create_full()
Traceback (most recent call last):
  File "<ipython-input-437-34fc05204bad>", line 18, in create_full
    df['auc_all'] = df['auc_all'].apply(self.convert_str_into_tuple)
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\apply.py", line 1043, in apply
    return self.apply_standard()
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\apply.py", line 1099, in apply_standard
    mapped = lib.map_infer(
  File "pandas\_libs\lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "<ipython-input-437-34fc05204bad>", line 63, in convert_str_into_tuple
    splitted_tuple = string.split(',')
AttributeError: 'tuple' object has no attribute 'split'
This seems to indicate that the cell contains a tuple.
However:
df['auc'][0][0]
>>> '('
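The type of the variable seems to change depending on where I use it. Is that really what is happening?
If the column contains tuples stored as strings, use pd.eval: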
df['auc_all'] = pd.eval(df['auc_all'])
Example:
# df = pd.DataFrame({'auc_all': ['(0.54, 0.044)']})
>>> df
         auc_all
0  (0.54, 0.044)
>>> type(df['auc_all'][0])
str
# df['auc_all'] = pd.eval(df['auc_all'])
>>> df
         auc_all
0  [0.54, 0.044]
>>> type(df['auc_all'][0])
list
The drawback is that your tuples are converted to lists, but you can use literal_eval from the ast module instead:
# import ast
# df['auc_all'] = df['auc_all'].apply(ast.literal_eval)
>>> df
         auc_all
0  (0.54, 0.044)
>>> type(df['auc_all'][0])
tuple
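
The AttributeError in the question suggests that at least some cells already hold real tuples when apply runs, for instance if the conversion had been executed once before. Below is a minimal sketch under that assumption: it first counts the Python types present in the column, then converts with a helper that parses strings and passes existing tuples through. The helper name to_tuple and the sample data are hypothetical and not taken from the question.

```python
import ast

import pandas as pd


def to_tuple(value):
    """Hypothetical helper: return a tuple whether the cell is a string or already a sequence."""
    if isinstance(value, str):
        # Safely parse a literal such as '(0.54, 0.044)'.
        return tuple(ast.literal_eval(value))
    # Already a tuple (or list): normalise to a tuple without parsing.
    return tuple(value)


# Sample column mixing a string-encoded tuple and a real tuple (assumed data).
df = pd.DataFrame({'auc_all': ['(0.54, 0.044)', (0.61, 0.02)]})

# Count which Python types actually occur in the column.
type_counts = df['auc_all'].map(lambda v: type(v).__name__).value_counts()

# Safe to run more than once: tuples are left unchanged.
df['auc_all'] = df['auc_all'].apply(to_tuple)
```

Because the helper leaves tuples untouched, re-running the conversion cell does not raise, and inspecting type_counts beforehand shows whether the column is uniform.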