Pandas: list of dicts to separate columns



I have a dataset like the following:

name    status    number   message
matt    active    12345    [job:  , money: none, wife: none]
james   active    23456    [group: band, wife: yes, money: 10000]
adam    inactive  34567    [job: none, money: none, wife:  , kids: one, group: jail]

How can I extract the key-value pairs and expand them into separate DataFrame columns?

Expected output:

name    status   number    job    money    wife    group   kids 
matt    active   12345     none   none     none    none    none
james   active   23456     none   10000    none    band    none
adam    inactive 34567     none   none     none    none    one

The message column contains many different kinds of keys.

Any help would be greatly appreciated.

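It is not easy.

First rewrite each value into dict syntax with replace (\s+ matches one or more whitespace characters), then parse it with ast.literal_eval.

Then build a new DataFrame with the DataFrame constructor and join it back with concat; pop removes the message column from df: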

import ast
import pandas as pd

# Regex patterns need backslashes: \s+ is whitespace, \[ and \] are literal brackets.
df.message = df.message.replace([r':\s+,', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
df.message = df.message.apply(ast.literal_eval)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
   kids  money group   job  money  wife
0   NaN   none   NaN  none    NaN  none
1   NaN    NaN  band   NaN  10000   yes
2   one    NaN  jail  none   none  none
df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number  kids  money group   job  money  wife
0   matt    active   12345   NaN   none   NaN  none    NaN  none
1  james    active   23456   NaN    NaN  band   NaN  10000   yes
2   adam  inactive   34567   one    NaN  jail  none   none  none
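
The duplicated money column in the output above appears to come from the first replacement leaving a leading space in the following key, so ' money' and 'money' become different columns. As a minimal sketch of one way around that, assuming every message has the plain "[key: value, ...]" shape shown in the question, the strings can be split by hand and stripped instead of rewritten with regex. The helper name parse_message and the rebuilt sample frame are illustrative, not part of the original answer.

```python
import pandas as pd


def parse_message(text):
    """Turn '[job:  , money: none]' into {'job': 'none', 'money': 'none'}."""
    pairs = {}
    for item in text.strip().strip("[]").split(","):
        if not item.strip():
            continue  # skip empty fragments, e.g. from an empty "[]"
        key, _, value = item.partition(":")
        # Strip whitespace so ' money' and 'money' end up as the same key;
        # an empty value is stored as the string 'none'.
        pairs[key.strip()] = value.strip() or "none"
    return pairs


# Sample data rebuilt from the question.
df = pd.DataFrame({
    "name": ["matt", "james", "adam"],
    "status": ["active", "active", "inactive"],
    "number": [12345, 23456, 34567],
    "message": [
        "[job:  , money: none, wife: none]",
        "[group: band, wife: yes, money: 10000]",
        "[job: none, money: none, wife:  , kids: one, group: jail]",
    ],
})

# One dict per row, expanded into columns and aligned on the original index.
parsed = pd.DataFrame(df.pop("message").map(parse_message).tolist(), index=df.index)

# Keys missing from a row come out as NaN; fill them with 'none' as well.
result = pd.concat([df, parsed], axis=1).fillna("none")
print(result)
```

This keeps every value as a string. It would break on values that themselves contain commas or colons, which the sample data does not have.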

Edit:

Another solution with yaml:

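import yaml
# Brackets must be escaped in the regex; safe_load avoids the Loader argument
# that yaml.load requires in recent PyYAML versions.
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)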
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None
df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None

You tagged it as a list but describe it as a dict, so this should work:

pd.concat([data.drop(['message'], axis=1), data['message'].apply(pd.Series)], axis=1)
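
The one-liner above only works when each cell of message already holds a real dict rather than a string. A small sketch under that assumption, using made-up sample rows, shows what it produces; the DataFrame constructor variant at the end is an alternative that is often faster on large frames.

```python
import pandas as pd

# Hypothetical sample where 'message' already contains dict objects.
data = pd.DataFrame({
    "name": ["matt", "james"],
    "message": [
        {"job": "none", "money": "none"},
        {"group": "band", "money": "10000"},
    ],
})

# apply(pd.Series) turns each dict into a row; keys become columns.
expanded = pd.concat(
    [data.drop(["message"], axis=1), data["message"].apply(pd.Series)],
    axis=1,
)
print(expanded)

# Alternative: build the columns in one go from the list of dicts.
alt = pd.concat(
    [data.drop(["message"], axis=1),
     pd.DataFrame(data["message"].tolist(), index=data.index)],
    axis=1,
)
```

Keys that are missing from a row show up as NaN in both variants.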
