Interesting dictionary manipulation in Pandas



This is exactly what I need. No one seems to have answered it:

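So, I have an interesting problem. Some of my data contains an awkward nested dictionary that I need to process, and I'm running into trouble. I can do it in pure Python, but I'd like to do the whole solution in Pandas, so the code stays clean and I don't have to reopen the same file somewhere else.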

I have the following DataFrame:

Id             Timezone             Data
957643         Pacific             {"California":{"city":"San Francisco","pop":"874961"}, "Oregon":{"city":"Portland","pop":"645291"}}
973472         Eastern             {"New York":{"city":"New York","pop":"8419000"}, "Maine":{"city":"Portland","pop":"66595"}}
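If the frame is read from a CSV, the Data column holds JSON text rather than dicts, and nothing that iterates over it as a dict will work until it is parsed. A minimal sketch, assuming each Data cell holds valid JSON text; the two-row frame below is a hypothetical reconstruction of the sample, not the asker's actual file:

```python
import json

import pandas as pd

# Hypothetical reconstruction of the question's frame, with Data as JSON text
# (as it would arrive from a CSV).
raw = pd.DataFrame({
    "Id": [957643, 973472],
    "Timezone": ["Pacific", "Eastern"],
    "Data": [
        '{"California": {"city": "San Francisco", "pop": "874961"}, '
        '"Oregon": {"city": "Portland", "pop": "645291"}}',
        '{"New York": {"city": "New York", "pop": "8419000"}, '
        '"Maine": {"city": "Portland", "pop": "66595"}}',
    ],
})

# Parse every cell into a real Python dict.
the_states = raw.assign(Data=raw["Data"].map(json.loads))

print(type(the_states.loc[0, "Data"]))  # <class 'dict'>
print(list(the_states.loc[0, "Data"]))  # ['California', 'Oregon']
```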

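Desired output: a list of dictionaries, with the Id and Timezone added to each exploded dictionary and each dictionary wrapped in an outer State Data key, so that I can output it as JSON: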

[{"State Data":{"State":"California","City":"San Francisco","Population":"874961","Id":"957643","Timezone":"Pacific"}}, {"State Data":{"State":"New York","City":"New York","Population":"8419000","Id":"973472","Timezone":"Eastern"}},{"State Data":{"State":"Oregon","City":"Portland","Population":"645291","Id":"957643","Timezone":"Pacific"}}, {"State Data":{"State":"Maine","City":"Portland","Population":"66595","Id":"973472","Timezone":"Eastern"}}]

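The problem is that, to get the final data format elsewhere, I need to put every state into its own dictionary and update it so that each state carries a State key. I've tried the iterrows approach and apply with axis=1, but both end up putting all the Ids and Timezones into every dictionary and updating them accordingly.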
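One way to stay inside pandas without iterrows or apply(axis=1) is to turn each Data dict into a list of (state, details) pairs and let DataFrame.explode give each pair its own row, which repeats Id and Timezone automatically. A minimal sketch, assuming the Data column already holds Python dicts (not JSON strings); the sample frame is hypothetical and Id is cast to a string only to match the desired output:

```python
import json

import pandas as pd

# Hypothetical sample frame; Data is assumed to already hold dicts.
the_states = pd.DataFrame({
    "Id": [957643, 973472],
    "Timezone": ["Pacific", "Eastern"],
    "Data": [
        {"California": {"city": "San Francisco", "pop": "874961"},
         "Oregon": {"city": "Portland", "pop": "645291"}},
        {"New York": {"city": "New York", "pop": "8419000"},
         "Maine": {"city": "Portland", "pop": "66595"}},
    ],
})

# One list of (state, details) pairs per row, then one row per pair;
# explode repeats Id and Timezone for every new row.
long = the_states.assign(Data=the_states["Data"].map(lambda d: list(d.items())))
long = long.explode("Data", ignore_index=True)

# Pull each pair apart into the columns wanted in the output.
flat = pd.DataFrame({
    "State": long["Data"].map(lambda pair: pair[0]),
    "City": long["Data"].map(lambda pair: pair[1]["city"]),
    "Population": long["Data"].map(lambda pair: pair[1]["pop"]),
    "Id": long["Id"].astype(str),
    "Timezone": long["Timezone"],
})

# Wrap every row in the outer "State Data" key.
output = [{"State Data": rec} for rec in flat.to_dict(orient="records")]
print(json.dumps(output))
```

The list comes out row by row (California, Oregon, New York, Maine); the desired output above lists the states in a different order, so sort output afterwards if that order matters.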

The variation below works in pure Python when reading in the whole CSV, but not in Pandas (for reasons that are probably obvious to most people).

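output = []
# In pandas, iterrows() yields (index, row) pairs, so unpacking it into
# three names fails with "not enough values to unpack"
for id_, time, data in the_states.iterrows():
    for state, other in data.items():
        entry = {}  # a new dict per state; a single shared dict gets overwritten
        entry['Id'] = id_
        entry['City'] = other.get('city')
        entry['Timezone'] = time
        entry['Population'] = other.get('pop')
        entry['State'] = state
        output.append({'State Data': entry})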
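A minimal sketch of the same loop adapted to pandas, assuming the Data column already holds dicts: DataFrame.itertuples(index=False) yields one tuple per row in column order (Id, Timezone, Data), so it can be unpacked into three names the way iterrows cannot. The sample frame is hypothetical, and Id is converted to a string only to match the desired output:

```python
import pandas as pd

# Hypothetical sample frame; Data is assumed to already hold dicts.
the_states = pd.DataFrame({
    "Id": [957643, 973472],
    "Timezone": ["Pacific", "Eastern"],
    "Data": [
        {"California": {"city": "San Francisco", "pop": "874961"},
         "Oregon": {"city": "Portland", "pop": "645291"}},
        {"New York": {"city": "New York", "pop": "8419000"},
         "Maine": {"city": "Portland", "pop": "66595"}},
    ],
})

output = []
# itertuples(index=False) yields (Id, Timezone, Data) for each row.
for id_, tz, data in the_states.itertuples(index=False):
    for state, info in data.items():
        # Build a fresh dict per state so appended entries are independent.
        entry = {
            "State": state,
            "City": info.get("city"),
            "Population": info.get("pop"),
            "Id": str(id_),
            "Timezone": tz,
        }
        output.append({"State Data": entry})
```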

Any help would be greatly appreciated.

Have you tried DataFrame.to_dict()? It can present your data in several different shapes; orient='records' or orient='index' might help you. The documentation is here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
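To illustrate the suggestion: to_dict does not flatten the nested Data dicts by itself, so it helps only once each state has its own row. A minimal sketch on a hypothetical, already-flat frame using the column names from the desired output; orient='records' gives the list-of-dicts shape, which then only needs wrapping in the State Data key:

```python
import json

import pandas as pd

# Hypothetical already-flattened frame: one row per state.
flat = pd.DataFrame({
    "State": ["California", "Oregon"],
    "City": ["San Francisco", "Portland"],
    "Population": ["874961", "645291"],
    "Id": ["957643", "957643"],
    "Timezone": ["Pacific", "Pacific"],
})

# orient="records": a list with one dict per row.
records = flat.to_dict(orient="records")
# orient="index": a dict keyed by the row index, one inner dict per row.
by_index = flat.to_dict(orient="index")

wrapped = [{"State Data": rec} for rec in records]
print(json.dumps(wrapped))
```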
