Pandas: split a single column's values into multiple column headers with parsed values



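I am trying to convert a single column, extra, into three new columns based on its string values, which have the format <column name>: <column value(s)>, ..., <column name>: <column value(s)>. Here column name is the name of the new column, and column value(s) can be any kind of value, such as a list, a float, or a string.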

I am working with the following DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
        "subject": [1, 1],
        # Single quotes outside so the inner double quotes are valid Python
        "extra": [
            'category: app, datasets: ["X", "Y"], acc: [0.8, 0.9]',
            'category: dev, datasets: ["Z", "Y"], acc: [0.7, 0.95]',
        ],
    }
)

Expected output:

   subject category datasets          acc
0        1      app   [X, Y]   [0.8, 0.9]
1        1      dev   [Z, Y]  [0.7, 0.95]

Exploding the datasets and acc columns would then give the final desired result:

   subject category datasets   acc
0        1      app        X   0.8
0        1      app        Y   0.9
1        1      dev        Z   0.7
1        1      dev        Y  0.95

You can use pyyaml:

import re
import yaml
# Put each "key: value" pair on its own line so the string parses as a YAML mapping
extracted_df = pd.json_normalize(df['extra'].apply(lambda x: yaml.load(re.sub(r',\s*(\w+:)', r'\n\1', x), Loader=yaml.SafeLoader)))
new_df = pd.concat([df.drop('extra', axis=1), extracted_df], axis=1)

Output:

>>> new_df
   subject category datasets          acc
0        1      app   [X, Y]   [0.8, 0.9]
1        1      dev   [Z, Y]  [0.7, 0.95]
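
To get from this intermediate frame to the fully expanded result the question asks for, the list columns still need to be exploded. The following is a minimal sketch of that last step. It rebuilds the frame by hand so it runs on its own, and it assumes pandas 1.3 or later, where DataFrame.explode accepts a list of columns. The name final_df is only illustrative.

```python
import pandas as pd

# Intermediate frame, rebuilt by hand to match the `new_df` output above
new_df = pd.DataFrame(
    {
        "subject": [1, 1],
        "category": ["app", "dev"],
        "datasets": [["X", "Y"], ["Z", "Y"]],
        "acc": [[0.8, 0.9], [0.7, 0.95]],
    }
)

# Explode both list columns in lockstep (multi-column explode needs pandas >= 1.3);
# the lists within each row must have the same length
final_df = new_df.explode(["datasets", "acc"])
print(final_df)
```

The original index is repeated for each exploded row, and the exploded columns stay as object dtype. If numeric values are needed, something like final_df.astype({"acc": float}) should convert them.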
