How to calculate the average of a field in a nested JSON document (MongoDB)



For each "show_id", I want to calculate the average of the "score2" field and insert this value into a new field called "avg_score".

Here is my JSON structure:

{'show_id': 's1026',
'type': 'TV Show',
'title': 'BoJack Horseman',
'country': 'United States',
'IMDB_Rating': 8.7,
'No_of_Votes': 113345,
'tweet': [{'_id': '60dae6af60d63d1fa250cb25',
'text': '@iitstrasha the end of the f***ing world, that 70s show, BoJack Horseman',
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 1, 'pos': 0, 'compound': 0}",
'score2': 2.0},
{'_id': '60dae6b060d63d1fa251d422',
'text': "@longbothom BoJack Horseman, peaky blinders, The Crown, Orange is the new black, sherlock, Finding 'Ohana",
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 1, 'pos': 0, 'compound': 0}",
'score2': 0.0},
{'_id': '60dae6b360d63d1fa258134a',
'text': 'merlin-is-dead: I’ve been rewatching BoJack Horseman for the first time since it’s finale. I felt like drawing a Di',
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 0.8150000000000001, 'pos': 0.185, 'compound': 0.3612}",
'score2': 0.185}]}

This is the JSON structure I want:

{'show_id': 's1026',
'type': 'TV Show',
'title': 'BoJack Horseman',
'country': 'United States',
'IMDB_Rating': 8.7,
'No_of_Votes': 113345,
'avg_score' : MEAN OF ALL SCORE2,
'tweet': [{'_id': '60dae6af60d63d1fa250cb25',
'text': '@iitstrasha the end of the f***ing world, that 70s show, BoJack Horseman',
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 1, 'pos': 0, 'compound': 0}",
'score2': 2.0},
{'_id': '60dae6b060d63d1fa251d422',
'text': "@longbothom BoJack Horseman, peaky blinders, The Crown, Orange is the new black, sherlock, Finding 'Ohana",
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 1, 'pos': 0, 'compound': 0}",
'score2': 0.0},
{'_id': '60dae6b360d63d1fa258134a',
'text': 'merlin-is-dead: I’ve been rewatching BoJack Horseman for the first time since it’s finale. I felt like drawing a Di',
'hashtags': '[]',
'score1': "{'neg': 0, 'neu': 0.8150000000000001, 'pos': 0.185, 'compound': 0.3612}",
'score2': 0.185}]}

For each series or movie title, there are as many entries as there are tweets. Thanks to everyone who replies, and have a nice evening.

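I'm not familiar with JSON, which is why I want to create a flat (non-nested) field named "avg_score". That way I can go back to a pandas DataFrame and work with it there.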
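To make the target concrete, here is a minimal plain-Python sketch (no MongoDB involved) of what "avg_score" should contain for one document shaped like the example above. The helper name add_avg_score is hypothetical, and it assumes every tweet carries a numeric "score2". A show with no tweets gets None, which loosely mirrors MongoDB's $avg returning null for an empty array.

```python
from statistics import mean
from typing import Any, Dict, Optional


def add_avg_score(show: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `show` with a flat `avg_score` field added.

    Hypothetical helper: averages the `score2` value of every entry
    in the nested `tweet` list (assumed numeric).
    """
    scores = [t["score2"] for t in show.get("tweet", [])]
    avg: Optional[float] = mean(scores) if scores else None
    return {**show, "avg_score": avg}


# Trimmed-down version of the example document from the question.
show = {
    "show_id": "s1026",
    "title": "BoJack Horseman",
    "tweet": [
        {"_id": "60dae6af60d63d1fa250cb25", "score2": 2.0},
        {"_id": "60dae6b060d63d1fa251d422", "score2": 0.0},
        {"_id": "60dae6b360d63d1fa258134a", "score2": 0.185},
    ],
}

result = add_avg_score(show)
print(result["avg_score"])  # roughly 0.7283
```

The original dict is left untouched; the new field sits at the top level, which is the flat shape that is easy to load into a DataFrame later.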

I tried this query, but it doesn't seem to work:

results = database.aggregate([
{ '$group': {'_id': '$title', 'vote': { '$sum': "$tweet.score2" }} }
])
[result for result in results]
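Why the attempt above fails: according to the MongoDB $sum documentation, inside a $group stage an operand that resolves to an array (such as "$tweet.score2") is treated as non-numeric and ignored, so each group ends up with 0 rather than a sum of scores. If you want to stay with $group, one option is to $unwind the "tweet" array first so each tweet becomes its own document, then group by "show_id" and use $avg. The sketch below only builds the pipeline as a list of dicts; the function name is hypothetical, and "collection" in the usage comment is assumed to be a pymongo Collection.

```python
from typing import Any, Dict, List


def avg_score_group_pipeline() -> List[Dict[str, Any]]:
    """Build a pipeline that averages tweet.score2 per show_id.

    $unwind emits one document per element of the `tweet` array, so
    `$tweet.score2` is a single number in the following $group stage.
    """
    return [
        {"$unwind": "$tweet"},
        {
            "$group": {
                "_id": "$show_id",
                "title": {"$first": "$title"},
                "avg_score": {"$avg": "$tweet.score2"},
            }
        },
    ]


pipeline = avg_score_group_pipeline()
# With pymongo, assuming `collection` is your Collection object:
# results = list(collection.aggregate(pipeline))
print(pipeline[0])
```

Two trade-offs: $unwind drops documents whose "tweet" array is empty or missing (unless preserveNullAndEmptyArrays is set), and the output contains only _id, title and avg_score instead of the full document, which is why $addFields is a better fit for the desired structure.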

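You need to use the $addFields pipeline stage to perform this calculation: $map turns each document's "tweet" array into an array of its "score2" values, and $avg averages that array.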

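# `database` is the collection object used in the question.
results = database.aggregate([
    {
        "$addFields": {
            # Build the array of tweet.score2 values with $map,
            # then average it with $avg.
            "avg_score": {
                "$avg": {
                    "$map": {
                        "input": "$tweet",
                        "in": "$$this.score2"
                    }
                }
            }
        }
    }
])
# aggregate() returns a cursor; materialize it to see the documents.
print(list(results))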
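Note that $addFields only changes the documents returned by aggregate(); it does not write "avg_score" back to the collection. If the field should actually be stored, one option is an update with an aggregation pipeline (supported since MongoDB 4.2; pymongo's update_many accepts a list of stages as the update). Here "$tweet.score2" resolves to the array of all score2 values, which $avg averages directly, equivalent to the $map form above. The function name store_avg_score and the database/collection names in the comment are hypothetical, and a mock stands in for a real collection so the sketch runs offline.

```python
from typing import Any
from unittest.mock import MagicMock


def store_avg_score(collection: Any) -> int:
    """Persist avg_score on every document of `collection`.

    Uses an update with an aggregation pipeline (MongoDB 4.2+), so the
    new field is computed from each document's own `tweet` array.
    Returns the number of modified documents.
    """
    result = collection.update_many(
        {},  # match every document
        [{"$set": {"avg_score": {"$avg": "$tweet.score2"}}}],
    )
    return result.modified_count


# Real usage (assumed names):
#   from pymongo import MongoClient
#   collection = MongoClient()["mydb"]["shows"]
#   store_avg_score(collection)

# Offline demo: a mock stands in for a pymongo Collection.
fake = MagicMock()
fake.update_many.return_value.modified_count = 1
print(store_avg_score(fake))  # 1
```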

Mongo Playground sample execution
