"TypeError: object of type 'float' has no len()" when splitting JSON columns of a pandas DataFrame



I have data like this. Each column holds values/keys of varying lengths, and some rows are NaN.

like                                match
0   [{'timestamp', 'type'}]              [{'timestamp', 'type'}]
1   [{'timestamp', 'comment', 'type'}]   [{'timestamp', 'type'}]
2   NaN                                 NaN

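I want to split these lists into their own columns. I want to keep all of the data (setting a value to NaN where it is missing). I tried following this tutorial and did this: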

df1 = pd.DataFrame(df['like'].values.tolist())
df1.columns = 'like_'+ df1.columns
df2 = pd.DataFrame(df['match'].values.tolist())
df2.columns = 'match_'+ df2.columns
col = df.columns.difference(['like','match'])
df = pd.concat([df[col], df1, df2],axis=1)

I get this error:

Traceback (most recent call last):
File "link to my file", line 12, in <module>
df1 = pd.DataFrame(df['like'].values.tolist())
File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 509, in __init__
arrays, columns = to_arrays(data, columns, dtype=dtype)
File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 524, in to_arrays
return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 561, in _list_to_arrays
content = list(lib.to_object_array(data).T)
File "pandas/_libs/lib.pyx", line 2448, in pandas._libs.lib.to_object_array
TypeError: object of type 'float' has no len()

Can someone help me understand what I'm doing wrong?

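You can't build a DataFrame from values.tolist() when the column contains NaN: NaN is a float, and a float has no len(). If you drop the NaN rows, that error goes away. Your prefix lines will then fail, though, most likely because the new columns have integer labels that can't be concatenated with a string. For adding prefixes, see DataFrame.add_prefix: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add_prefix.html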
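
If you would rather keep the NaN rows than drop them, something like the following might work. This is only a sketch under assumptions: the sample values, the "id" column, and the helper name expand are made up for illustration, and it assumes each non-missing cell is a one-element list holding a dict. It swaps each NaN for an empty dict before building the frame, then uses add_prefix for the column names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's: each cell is assumed
# to be a one-element list holding a dict, or NaN when the row has no data.
df = pd.DataFrame({
    "id": [10, 11, 12],  # made-up extra column, to show it is preserved
    "like": [
        [{"timestamp": 1, "type": "a"}],
        [{"timestamp": 2, "comment": "hi", "type": "b"}],
        np.nan,
    ],
    "match": [
        [{"timestamp": 3, "type": "c"}],
        [{"timestamp": 4, "type": "d"}],
        np.nan,
    ],
})


def expand(series, prefix):
    """Expand a column of [dict] cells into prefixed columns, keeping NaN rows."""
    # NaN is a float and has no len(), so replace it (and any empty list)
    # with an empty dict; pandas fills the missing keys with NaN.
    records = [cell[0] if isinstance(cell, list) and cell else {} for cell in series]
    out = pd.DataFrame(records, index=series.index)
    # add_prefix works regardless of the label type, unlike 'like_' + columns.
    return out.add_prefix(prefix)


# Keep every column except the two being expanded.
col = df.columns.difference(["like", "match"])
result = pd.concat(
    [df[col], expand(df["like"], "like_"), expand(df["match"], "match_")],
    axis=1,
)
print(result)
```

The third row stays in the result with NaN in every expanded column, and keys that appear in only some rows (such as "comment") become NaN elsewhere.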
