I have this dataset:
{
"date": "2018-01-01",
"body": "some txt",
"id": 111,
"sentiment": null
},
{
"date": "2018-01-02",
"body": "some txt",
"id": 112,
"sentiment": {
"basic": "Bearish"
}
}
I want to read this with pandas and, for every row where the sentiment column is not null, replace it with the value of basic.
When I do this:
pd.read_json(path)
This is the result I get:
body ... sentiment
0 None
1 {u'basic': u'Bullish'}
I don't want {u'basic': u'Bullish'}, only the value of basic. So to find the right rows, I use
df.loc[self.df['sentiment'].isnull() != True, 'sentiment'] = (?)
The row selection works, but I don't know what to put in place of (?).
I tried this, but it doesn't work:
df.loc[self.df['sentiment'].isnull() != True, 'sentiment'] = df['sentiment']['basic']
Any ideas? Thanks
You can try:
mask = df['sentiment'].notnull()
df.loc[mask, 'sentiment'] = df.loc[mask, 'sentiment'].apply(lambda x: x['basic'])
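For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same mask-and-apply idea. The two-row DataFrame is made up to mirror the question's data, and it assumes every non-null entry is a dict that has a 'basic' key:

```python
import pandas as pd

# Hypothetical stand-in for the result of pd.read_json(path).
df = pd.DataFrame({
    "id": [111, 112],
    "sentiment": [None, {"basic": "Bearish"}],
})

# Select only the rows that actually hold a dict.
mask = df["sentiment"].notnull()

# Replace each dict with its 'basic' value; null rows are left untouched.
df.loc[mask, "sentiment"] = df.loc[mask, "sentiment"].apply(lambda d: d["basic"])

print(df)
```

If some dicts might lack the 'basic' key, using d.get('basic') inside the lambda would give None for those rows instead of raising a KeyError.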
You can do this:
df = pd.read_json(path) # creates the dataframe with dict objects in sentiment column
df = pd.concat([df.drop(['sentiment'], axis=1), df['sentiment'].apply(pd.Series)], axis=1) # create new columns for each sentiment type
For example, if your JSON is:
[{
"date": "2018-01-01",
"body": "some txt",
"id": 111,
"sentiment": null
},
{
"date": "2018-01-02",
"body": "some txt",
"id": 112,
"sentiment": {
"basic": "Bearish"
}
},
{
"date": "2018-01-03",
"body": "some other txt",
"id": 113,
"sentiment": {
"basic" : "Bullish",
"non_basic" : "Bearish"
}
}]
df after line 1:
body date id sentiment
0 some txt 2018-01-01 111 None
1 some txt 2018-01-02 112 {'basic': 'Bearish'}
2 some other txt 2018-01-03 113 {'basic': 'Bullish', 'non_basic': 'Bearish'}
df after line 2:
body date id basic non_basic
0 some txt 2018-01-01 111 NaN NaN
1 some txt 2018-01-02 112 Bearish NaN
2 some other txt 2018-01-03 113 Bullish Bearish
HTH.
fillna + pop + join
Here is a scalable solution that avoids a row-wise apply and converts an arbitrary number of keys into columns:
df = pd.DataFrame({'body': [0, 1],
'sentiment': [None, {u'basic': u'Bullish'}]})
df['sentiment'] = df['sentiment'].fillna(pd.Series([{}]*len(df.index), index=df.index))
df = df.join(pd.DataFrame(df.pop('sentiment').values.tolist()))
print(df)
body basic
0 0 NaN
1 1 Bullish
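One caveat worth a sketch: join aligns on the index, so the snippet above relies on df having the default 0..n-1 index. The variant below is only an illustration of how the same pop + join idea could be kept aligned for other indexes; the index values 10 and 20 and the names records and expanded are made up for the example:

```python
import pandas as pd

df = pd.DataFrame(
    {"body": [0, 1], "sentiment": [None, {"basic": "Bullish"}]},
    index=[10, 20],  # hypothetical non-default index
)

# Swap missing entries for empty dicts so every row expands cleanly.
records = [d if isinstance(d, dict) else {} for d in df.pop("sentiment")]

# Passing index=df.index keeps the expanded columns aligned with df's rows.
expanded = pd.DataFrame(records, index=df.index)

df = df.join(expanded)
print(df)
```

Without index=df.index, the expanded frame would get a fresh 0..n-1 index and the join would match nothing for rows labelled 10 and 20.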