Pandas: changing row values of a JSON dataset under a specific condition



I have this dataset.

{
"date": "2018-01-01", 
"body": "some txt", 
"id": 111, 
"sentiment": null
}, 
{
"date": "2018-01-02", 
"body": "some txt", 
"id": 112, 
"sentiment": {
"basic": "Bearish"
}
}

I want to read this with pandas and change the sentiment column for every row where it is not null.

When I do this:

pd.read_json(path)

This is what I get:

body           ...    sentiment
0                      None
1                      {u'basic': u'Bullish'}

I don't want {u'basic': u'Bullish'}, only the value of basic. So to find the right rows, I use

df.loc[self.df['sentiment'].isnull() != True, 'sentiment'] = (?)

The row selection works, but I don't know what to put in place of (?).

I tried this, but it doesn't work:

df.loc[self.df['sentiment'].isnull() != True, 'sentiment'] = df['sentiment']['basic']

Any ideas? Thanks.

You can try:

mask = df['sentiment'].notnull()
df.loc[mask, 'sentiment'] = df.loc[mask, 'sentiment'].apply(lambda x: x['basic'])
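
As a quick check, here is a minimal self-contained sketch of the mask approach. It assumes the two records from the question are built inline instead of being read from path, so the frame contents are illustrative only:

```python
import pandas as pd

# Sample frame mirroring the question's data (built inline, not read from a file).
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02"],
    "body": ["some txt", "some txt"],
    "id": [111, 112],
    "sentiment": [None, {"basic": "Bearish"}],
})

# Boolean mask selecting the rows whose sentiment is present.
mask = df["sentiment"].notnull()

# Replace each dict with the value stored under its "basic" key;
# rows outside the mask are left untouched.
df.loc[mask, "sentiment"] = df.loc[mask, "sentiment"].apply(lambda x: x["basic"])

print(df[["id", "sentiment"]])
```

The asker's attempt fails because df['sentiment']['basic'] looks up a row labelled 'basic' in the column rather than the key inside each dict; applying the lookup per element avoids that.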

You can do it like this:

df = pd.read_json(path)  # creates the dataframe with dict objects in sentiment column 
pd.concat([df.drop(['sentiment'], axis=1), df['sentiment'].apply(pd.Series)], axis=1)  # create new columns for each sentiment type

For example, if your JSON is:

[{
"date": "2018-01-01", 
"body": "some txt", 
"id": 111, 
"sentiment": null
}, 
{
"date": "2018-01-02", 
"body": "some txt", 
"id": 112, 
"sentiment": {
"basic": "Bearish"
}
},
{
"date": "2018-01-03", 
"body": "some other txt", 
"id": 113, 
"sentiment": {
"basic" : "Bullish",
"non_basic" : "Bearish"
}
}]

df after line 1:

body       date   id                                     sentiment
0        some txt 2018-01-01  111                                          None
1        some txt 2018-01-02  112                          {'basic': 'Bearish'}
2  some other txt 2018-01-03  113  {'basic': 'Bullish', 'non_basic': 'Bearish'}

Result of line 2:

body       date   id    basic non_basic
0        some txt 2018-01-01  111      NaN       NaN
1        some txt 2018-01-02  112  Bearish       NaN
2  some other txt 2018-01-03  113  Bullish   Bearish

HTH.

fillna+pop+join

Here is a scalable solution that avoids a row-wise apply and converts an arbitrary number of keys into columns:

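import pandas as pd

df = pd.DataFrame({'body': [0, 1],
                   'sentiment': [None, {u'basic': u'Bullish'}]})
# Replace missing values with empty dicts so every entry is a dict
df['sentiment'] = df['sentiment'].fillna(pd.Series([{}]*len(df.index), index=df.index))
# Remove the dict column, expand its keys into columns, and join them back
df = df.join(pd.DataFrame(df.pop('sentiment').values.tolist()))
print(df)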
body    basic
0     0      NaN
1     1  Bullish
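
The pop-and-join idea can also be wrapped in a small helper. The sketch below is only an illustration: the name expand_dict_column is hypothetical, the sample frame is made up to match the earlier three-record example, and it fills missing entries with a list comprehension rather than fillna, which is a substitution made here for simplicity and not the method shown above.

```python
import pandas as pd

def expand_dict_column(df, column):
    """Hypothetical helper: replace a column of dicts/None with one column per key."""
    out = df.copy()
    # Treat anything that is not a dict (None, NaN) as an empty dict.
    dicts = [v if isinstance(v, dict) else {} for v in out.pop(column)]
    # Reuse the original index so the join aligns even when it is not 0..n-1.
    expanded = pd.DataFrame(dicts, index=out.index)
    return out.join(expanded)

# Illustrative data with a non-default index.
df = pd.DataFrame(
    {
        "id": [111, 112, 113],
        "sentiment": [
            None,
            {"basic": "Bearish"},
            {"basic": "Bullish", "non_basic": "Bearish"},
        ],
    },
    index=[10, 20, 30],
)

result = expand_dict_column(df, "sentiment")
print(result)
```

Passing index=out.index when building the expanded frame matters because join aligns on index labels; without it, a frame whose index is not the default range would be joined against rows 0, 1, 2 and could end up with misaligned or missing values.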
