I have a dataset like this:
name status number message
matt active 12345 [job: , money: none, wife: none]
james active 23456 [group: band, wife: yes, money: 10000]
adam inactive 34567 [job: none, money: none, wife: , kids: one, group: jail]
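How can I extract the key-value pairs and turn them into a dataframe expanded all the way out, with one column per key?
Expected output: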
name status number job money wife group kids
matt active 12345 none none none none none
james active 23456 none 10000 none band none
adam inactive 34567 none none none none one
The message contains several different key types.
Any help would be greatly appreciated.
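This is not easy. You need replace to rewrite each value as a dict literal (\s+ matches one or more whitespace characters), then ast.literal_eval to parse it. After that you can use the DataFrame constructor with concat, and pop to remove the column from df: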
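import ast
import pandas as pd

# Rewrite "[k: v, ...]" as a dict literal; an empty value becomes "none"
df.message = df.message.replace([r':\s+,', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)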
df.message = df.message.apply(ast.literal_eval)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
kids money group job money wife
0 NaN none NaN none NaN none
1 NaN NaN band NaN 10000 yes
2 one NaN jail none none none
df = pd.concat([df, df1], axis=1)
print (df)
name status number kids money group job money wife
0 matt active 12345 NaN none NaN none NaN none
1 james active 23456 NaN NaN band NaN 10000 yes
2 adam inactive 34567 one NaN jail none none none
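The two money columns in the output above appear to come from keys that keep a leading space after the replacements (' money' next to 'money'; the same applies to ' kids'). As a rough alternative, here is a minimal sketch of a plain-Python parser that strips whitespace from keys and fills empty values. The helper name parse_message and the fill default are hypothetical, and the sketch assumes that neither keys nor values contain commas or colons.

```python
def parse_message(text, fill="none"):
    """Turn '[job: , money: none]' into {'job': 'none', 'money': 'none'}."""
    pairs = {}
    inner = text.strip().strip("[]")  # drop the surrounding brackets
    for item in inner.split(","):
        if ":" not in item:
            continue  # skip fragments that are not key: value pairs
        key, _, value = item.partition(":")
        # Strip whitespace so ' money' and 'money' end up as the same key;
        # an empty value falls back to the fill string.
        pairs[key.strip()] = value.strip() or fill
    return pairs


# Sample strings copied from the question.
messages = [
    "[job: , money: none, wife: none]",
    "[group: band, wife: yes, money: 10000]",
    "[job: none, money: none, wife: , kids: one, group: jail]",
]
parsed = [parse_message(m) for m in messages]
print(parsed[0])
```

The resulting list of dicts could then be passed to pd.DataFrame(parsed, index=df.index) in place of the ast.literal_eval result, which should leave a single money column.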
EDIT: Another solution with yaml:
import yaml
# yaml.load needs an explicit Loader in PyYAML 6+, so safe_load is used here
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
group job kids money wife
0 NaN None NaN none none
1 band NaN NaN 10000 True
2 jail none one none None
df = pd.concat([df, df1], axis=1)
print (df)
name status number group job kids money wife
0 matt active 12345 NaN None NaN none none
1 james active 23456 band NaN NaN 10000 True
2 adam inactive 34567 jail none one none None
You have it tagged as a list but describe it as a dictionary, so if the column already holds dicts this should work:
pd.concat([data.drop(['message'], axis=1), data['message'].apply(pd.Series)], axis=1)
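For reference, a small self-contained sketch of that idea, assuming the message column really does hold Python dicts. The frame data and its values are made up for illustration. Building the expansion with the DataFrame constructor instead of apply(pd.Series) is a substitution on my part, and fillna('none') is an assumption about how missing keys should be shown.

```python
import pandas as pd

# Hypothetical sample: the message column already contains dicts.
data = pd.DataFrame({
    "name": ["matt", "james"],
    "status": ["active", "active"],
    "number": [12345, 23456],
    "message": [
        {"job": "none", "money": "none"},
        {"group": "band", "money": "10000"},
    ],
})

# Expand each dict into its own columns, keeping the original row index.
expanded = pd.DataFrame(data["message"].tolist(), index=data.index)

# Keys missing from a row come back as NaN; fill them with the string "none".
expanded = expanded.fillna("none")

# Put the expanded columns next to the remaining original columns.
result = pd.concat([data.drop(columns=["message"]), expanded], axis=1)
print(result)
```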