import pandas as pd
inp= [{'c1null':10,'cols':{'c2':20,'c3time':null}, 'c4':'41'}, {'c1null':11,'cols':{'c2':null,'c3time':'2014-05-24 19:20'},'c4':'42'}, {'c1null':12,'cols':{'c2':20,'c3time':'2016-06-24 19:20'},'c4':'43'}]
df=pd.io.json.json_normalize(inp)
print(df)
inp contains the value null, so the script above fails and json_normalize cannot produce the expected result below:
   c1null  c4  cols.c2       cols.c3time
0      10  41       20               NaT
1      11  42      NaN  2014-05-24 19:20
2      12  43       20  2016-06-24 19:20
Currently I get the DataFrame with pd.read_sql, and I need to replace each null value with NaN, or with NaT when the key is named *time, so that pd.io.json.json_normalize can be used.
How can I replace the value null with NaN or NaT in a JSON string column of a DataFrame?
Try adding an import that binds the name null to NaN:
from numpy import nan as null
inp= [{'c1':10,'cols':{'c2':20,'c3time':null}, 'c4':'41'}, {'c1':11,'cols':{'c2':null,'c3time':'2014-05-24 19:20'},'c4':'42'}, {'c1':12,'cols':{'c2':20,'c3time':'2016-06-24 19:20'},'c4':'43'}]
df=pd.io.json.json_normalize(inp)
df
Out[494]:
   c1  c4  cols.c2       cols.c3time
0  10  41     20.0               NaN
1  11  42      NaN  2014-05-24 19:20
2  12  43     20.0  2016-06-24 19:20
df['cols.c3time']=pd.to_datetime(df['cols.c3time'])
df
Out[497]:
   c1  c4  cols.c2          cols.c3time
0  10  41     20.0                  NaT
1  11  42      NaN  2014-05-24 19:20:00
2  12  43     20.0  2016-06-24 19:20:00
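If the data really arrives as JSON text in a DataFrame column (as the question describes for pd.read_sql), another option is to parse each string with json.loads, which maps JSON null to Python None, before normalizing. The following is only a minimal sketch under assumptions: the column name payload and the sample frame raw are hypothetical stand-ins for the real query result, and the rule "column name ends with time" is one reading of the *time convention in the question.

```python
import json

import pandas as pd

# Hypothetical stand-in for the pd.read_sql result; the column name
# "payload" is an assumption, not taken from the original post.
raw = pd.DataFrame({
    "payload": [
        '{"c1": 10, "cols": {"c2": 20, "c3time": null}, "c4": "41"}',
        '{"c1": 11, "cols": {"c2": null, "c3time": "2014-05-24 19:20"}, "c4": "42"}',
        '{"c1": 12, "cols": {"c2": 20, "c3time": "2016-06-24 19:20"}, "c4": "43"}',
    ]
})

# json.loads turns JSON null into Python None, so no `null` name is needed.
records = [json.loads(text) for text in raw["payload"]]

# pd.json_normalize is the top-level name in pandas >= 1.0; older versions
# expose the same function as pd.io.json.json_normalize.
df = pd.json_normalize(records)

# Convert every flattened column whose name ends in "time" to datetime,
# which turns the missing entries into NaT.
for col in df.columns:
    if col.endswith("time"):
        df[col] = pd.to_datetime(df[col])

print(df)
```

This avoids rebinding null in the namespace, at the cost of one parsing pass over the column.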