How to split a set of related columns into separate columns in pandas



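I hope someone can help me. I have a dataframe with two columns (CONFIRM_STATUS and OUTCOME) whose combination determines how a third column (VALUE) should be interpreted.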

CONFIRM_STATUS has 4 unique values:

result1 = df['CONFIRM_STATUS'].unique()
result1
array(['CONFIRMED', 'PROBABLE', 'SUSPECTED', 'TOTAL'], dtype=object)

OUTCOME has 2 unique values:

result2 = df['OUTCOME'].unique()
result2
array(['CASE', 'DEATH'], dtype=object)

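So there are 8 unique combinations, and each one directly changes what the number in the VALUE column means. I need to turn these combinations into 8 columns, one column per combination. For example: deaths, recoveries, ...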
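The count of 8 is simply 4 statuses × 2 outcomes. A minimal sketch that enumerates the pairs, using the unique values printed above; the `<OUTCOME>_<STATUS>` column-naming scheme is a hypothetical choice, not something the data prescribes:

```python
from itertools import product

# Unique values as printed in the question
statuses = ['CONFIRMED', 'PROBABLE', 'SUSPECTED', 'TOTAL']
outcomes = ['CASE', 'DEATH']

# Every (CONFIRM_STATUS, OUTCOME) pair: 4 x 2 = 8 combinations
combos = list(product(statuses, outcomes))

# Hypothetical names for the 8 target columns, e.g. 'CASE_CONFIRMED'
new_columns = [f'{outcome}_{status}' for status, outcome in combos]
print(len(new_columns))
print(new_columns)
```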

How can I do this in pandas? I know the description isn't very detailed, so here is an excerpt of the relevant fields.

EVENT_NAME  SOURCE  DATE_LOW    DATE_HIGH   DATE_REPORT DATE_TYPE   SPATIAL_RESOLUTION  AL0_CODE    AL0_NAME    AL1_CODE    AL1_NAME    AL2_NAME    AL3_NAME    LOCALITY_NAME   LOCATION_TYPE   CONFIRM_STATUS  OUTCOME CUMULATIVE_FLAG VALUE
2752    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    CONFIRMED   CASE    False   0
2753    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    CONFIRMED   CASE    True    0
2754    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    CONFIRMED   DEATH   False   0
2755    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    CONFIRMED   DEATH   True    0
2756    nCoV_2019   WHO COVID-19 Overview   2020-01-03  2020-01-03  2020-01-03  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    PROBABLE    CASE    False   0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4494958 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    SUSPECTED   DEATH   False   0
4494959 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    TOTAL   CASE    False   24581
4494960 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    TOTAL   CASE    True    2089329
4494961 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    TOTAL   DEATH   False   401
4494962 nCoV_2019   WHO COVID-19 Overview   2020-11-22  2020-11-22  2020-11-22  Authority notification  AL0 RU  Russian Federation  NaN NaN NaN NaN NaN Clinical care sought    TOTAL   DEATH   True    36179

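I didn't rebuild your dataframe, but you should be able to create the 8 new columns as in this example (I only show two). You could get fancier by generating the combinations and building the columns programmatically, but with only eight it's simpler to just write them out.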
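The more generic route mentioned here, generating the combinations and building the columns in a loop, might look like the sketch below. It assumes the column names from the question; the small sample frame and the `<OUTCOME>_<STATUS>` naming scheme are hypothetical, and `np.where` leaves non-matching rows as NaN rather than an empty string so the new columns stay numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dataframe (only the relevant columns)
df = pd.DataFrame({
    'CONFIRM_STATUS': ['CONFIRMED', 'CONFIRMED', 'PROBABLE', 'TOTAL', 'TOTAL'],
    'OUTCOME':        ['CASE',      'DEATH',     'CASE',     'CASE',  'DEATH'],
    'VALUE':          [0,           0,           0,          24581,   401],
})

statuses = ['CONFIRMED', 'PROBABLE', 'SUSPECTED', 'TOTAL']
outcomes = ['CASE', 'DEATH']

for status in statuses:
    for outcome in outcomes:
        col = f'{outcome}_{status}'  # hypothetical name, e.g. 'CASE_CONFIRMED'
        mask = (df['CONFIRM_STATUS'] == status) & (df['OUTCOME'] == outcome)
        # Copy VALUE only on rows matching this combination; NaN elsewhere
        df[col] = np.where(mask, df['VALUE'], np.nan)

print(df[['CONFIRM_STATUS', 'OUTCOME', 'CASE_TOTAL', 'DEATH_TOTAL']])
```

Combinations that never occur in the data (here `SUSPECTED`) still get a column, filled entirely with NaN.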

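import numpy as np
# Start the new columns as NaN (not '') so they stay numeric once VALUE is copied in
df['CASE_CONFIRMED'] = np.nan
df['CASE_PROBABLE'] = np.nan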

Once the columns exist, filter on the two key columns and copy VALUE into the matching new column.

# Rows where the combination matches get VALUE; all other rows keep their previous value
df.loc[(df['CONFIRM_STATUS'] == 'CONFIRMED') & (df['OUTCOME'] == 'CASE'), 'CASE_CONFIRMED'] = df['VALUE']
df.loc[(df['CONFIRM_STATUS'] == 'PROBABLE') & (df['OUTCOME'] == 'CASE'), 'CASE_PROBABLE'] = df['VALUE']
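A different technique, not the one this answer proposes: `DataFrame.pivot_table` can reshape the long table so there is one row per report date and cumulative flag, with one column per (OUTCOME, CONFIRM_STATUS) pair. Grouping by `DATE_REPORT` and `CUMULATIVE_FLAG` is an assumption based on the sample rows, where each date appears with both a False and a True flag; the frame below is a hypothetical subset built from the four TOTAL rows for 2020-11-22 shown in the question.

```python
import pandas as pd

# Hypothetical subset of the question's data (TOTAL rows for 2020-11-22)
df = pd.DataFrame({
    'DATE_REPORT':     ['2020-11-22'] * 4,
    'CONFIRM_STATUS':  ['TOTAL', 'TOTAL', 'TOTAL', 'TOTAL'],
    'OUTCOME':         ['CASE', 'CASE', 'DEATH', 'DEATH'],
    'CUMULATIVE_FLAG': [False, True, False, True],
    'VALUE':           [24581, 2089329, 401, 36179],
})

# Long -> wide: one row per (date, cumulative flag), one column per
# (OUTCOME, CONFIRM_STATUS) pair that actually occurs in the data
wide = df.pivot_table(
    index=['DATE_REPORT', 'CUMULATIVE_FLAG'],
    columns=['OUTCOME', 'CONFIRM_STATUS'],
    values='VALUE',
    aggfunc='sum',
)

# Flatten the two-level column index into names like 'CASE_TOTAL'
wide.columns = [f'{outcome}_{status}' for outcome, status in wide.columns]
wide = wide.reset_index()
print(wide)
```

Unlike the column-by-column approach, this collapses the rows, so each date ends up on a single line per flag instead of keeping the original row count.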

If that doesn't work, paste part of your dataset using df.head(15).to_json().
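For reference, the pasted JSON can be turned back into a dataframe on the answerer's side with `pd.read_json`. A minimal round-trip sketch on a hypothetical two-row frame (wrapping the string in `StringIO` avoids the deprecation warning newer pandas versions emit for literal JSON strings):

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the asker's dataframe
df = pd.DataFrame({
    'CONFIRM_STATUS': ['CONFIRMED', 'TOTAL'],
    'OUTCOME': ['CASE', 'DEATH'],
    'VALUE': [0, 401],
})

# What the asker would paste: the first rows serialized as JSON
payload = df.head(15).to_json()
print(payload)

# What an answerer can run to rebuild the same rows locally
rebuilt = pd.read_json(StringIO(payload))
print(rebuilt)
```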
