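I am trying to convert a single column, `extra`, into three new columns based on its string value. The string has the format `<column name>: <column value(s)>, ..., <column name>: <column value(s)>`, where `column name` becomes the new column and `column value(s)` can be any column value, such as a list, a float or a string.
I am working with the following DataFrame: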
import pandas as pd

df = pd.DataFrame(
    {
        "subject": [1, 1],
        # Single-quoted outer strings so the inner double quotes stay valid
        "extra": [
            'category: app, datasets: ["X", "Y"], acc: [0.8, 0.9]',
            'category: dev, datasets: ["Z", "Y"], acc: [0.7, 0.95]',
        ],
    }
)
Expected output:
   subject category datasets          acc
0        1      app   [X, Y]   [0.8, 0.9]
1        1      dev   [Z, Y]  [0.7, 0.95]
Exploding the list columns (presumably `datasets` and `acc` together) would then give the final desired result:
   subject category datasets   acc
0        1      app        X   0.8
0        1      app        Y   0.9
1        1      dev        Z   0.7
1        1      dev        Y  0.95
You can use pyyaml:
import re
import yaml

# Put each "key: value" pair on its own line so the string parses as a YAML mapping
extracted_df = pd.json_normalize(df['extra'].apply(lambda x: yaml.load(re.sub(r',\s*(\w+:)', r'\n\1', x), Loader=yaml.SafeLoader)))
new_df = pd.concat([df.drop('extra', axis=1), extracted_df], axis=1)
Output:
>>> new_df
   subject category datasets          acc
0        1      app   [X, Y]   [0.8, 0.9]
1        1      dev   [Z, Y]  [0.7, 0.95]
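
The answer stops at the list-valued frame. The remaining step, one row per list element, could be sketched as follows. This is a minimal sketch with two assumptions: pandas 1.3 or later (where `DataFrame.explode` accepts a list of columns), and `datasets` and `acc` having the same length within each row. `new_df` is rebuilt by hand here only so the snippet runs on its own, and `final_df` is an illustrative name.

```python
import pandas as pd

# Hand-built copy of new_df from the output above, so this runs standalone.
new_df = pd.DataFrame(
    {
        "subject": [1, 1],
        "category": ["app", "dev"],
        "datasets": [["X", "Y"], ["Z", "Y"]],
        "acc": [[0.8, 0.9], [0.7, 0.95]],
    }
)

# Explode both list columns together; the lists in a given row must
# have matching lengths, otherwise pandas raises a ValueError.
final_df = new_df.explode(["datasets", "acc"])
print(final_df)
```

The original index is repeated for each exploded element, which matches the desired result in the question. The exploded columns come back with object dtype, so something like `final_df.astype({"acc": float})` may be needed if `acc` should be numeric.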