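I'm hoping someone can help me. I have a DataFrame with two columns (CONFIRM_STATUS and OUTCOME) whose combination determines how the value in a third column (VALUE) should be interpreted.
CONFIRM_STATUS has 4 unique values: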
result1 = df['CONFIRM_STATUS'].unique()
result1
array(['CONFIRMED', 'PROBABLE', 'SUSPECTED', 'TOTAL'], dtype=object)
OUTCOME has 2 unique values:
result2 = df['OUTCOME'].unique()
result2
array(['CASE', 'DEATH'], dtype=object)
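So there are 8 unique combinations, and each one directly determines what the number in the VALUE column means. I need to turn these combinations into 8 columns, so that each column shows the values for one combination. Roughly speaking: deaths, recovered, and so on.
How can pandas do this? I know the description isn't very detailed, so here is an excerpt of the relevant fields.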
EVENT_NAME SOURCE DATE_LOW DATE_HIGH DATE_REPORT DATE_TYPE SPATIAL_RESOLUTION AL0_CODE AL0_NAME AL1_CODE AL1_NAME AL2_NAME AL3_NAME LOCALITY_NAME LOCATION_TYPE CONFIRM_STATUS OUTCOME CUMULATIVE_FLAG VALUE
2752 nCoV_2019 WHO COVID-19 Overview 2020-01-03 2020-01-03 2020-01-03 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought CONFIRMED CASE False 0
2753 nCoV_2019 WHO COVID-19 Overview 2020-01-03 2020-01-03 2020-01-03 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought CONFIRMED CASE True 0
2754 nCoV_2019 WHO COVID-19 Overview 2020-01-03 2020-01-03 2020-01-03 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought CONFIRMED DEATH False 0
2755 nCoV_2019 WHO COVID-19 Overview 2020-01-03 2020-01-03 2020-01-03 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought CONFIRMED DEATH True 0
2756 nCoV_2019 WHO COVID-19 Overview 2020-01-03 2020-01-03 2020-01-03 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought PROBABLE CASE False 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4494958 nCoV_2019 WHO COVID-19 Overview 2020-11-22 2020-11-22 2020-11-22 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought SUSPECTED DEATH False 0
4494959 nCoV_2019 WHO COVID-19 Overview 2020-11-22 2020-11-22 2020-11-22 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought TOTAL CASE False 24581
4494960 nCoV_2019 WHO COVID-19 Overview 2020-11-22 2020-11-22 2020-11-22 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought TOTAL CASE True 2089329
4494961 nCoV_2019 WHO COVID-19 Overview 2020-11-22 2020-11-22 2020-11-22 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought TOTAL DEATH False 401
4494962 nCoV_2019 WHO COVID-19 Overview 2020-11-22 2020-11-22 2020-11-22 Authority notification AL0 RU Russian Federation NaN NaN NaN NaN NaN Clinical care sought TOTAL DEATH True 36179
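I didn't rebuild your DataFrame, but you should be able to create the 8 new columns as in this example (I've only shown two). You could get fancier by generating the combinations and building the columns programmatically, but with only eight it is simple enough to just write the code out.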
df[['CASE_CONFIRMED', 'CASE_PROBABLE']] = ''
Once the columns exist, filter on the two source columns and set the matching new column to VALUE.
df.loc[(df['CONFIRM_STATUS'] == 'CONFIRMED') & (df['OUTCOME'] == 'CASE'), 'CASE_CONFIRMED'] = df['VALUE']
df.loc[(df['CONFIRM_STATUS'] == 'PROBABLE') & (df['OUTCOME'] == 'CASE'), 'CASE_PROBABLE'] = df['VALUE']
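If you would rather not write out all eight assignments by hand, the same idea can be put in a loop. This is only a sketch under some assumptions: the small `df` below is a made-up stand-in for your data (just the three relevant columns), the `OUTCOME_STATUS` column naming is my own choice, and I fill non-matching rows with NaN instead of an empty string so the new columns stay numeric.

```python
from itertools import product

import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame (illustration only).
df = pd.DataFrame({
    "CONFIRM_STATUS": ["CONFIRMED", "CONFIRMED", "PROBABLE", "TOTAL", "TOTAL"],
    "OUTCOME": ["CASE", "DEATH", "CASE", "CASE", "DEATH"],
    "VALUE": [0, 0, 0, 24581, 401],
})

# The unique values reported in the question.
statuses = ["CONFIRMED", "PROBABLE", "SUSPECTED", "TOTAL"]
outcomes = ["CASE", "DEATH"]

for outcome, status in product(outcomes, statuses):
    col = f"{outcome}_{status}"  # e.g. CASE_CONFIRMED (naming is an assumption)
    mask = (df["CONFIRM_STATUS"] == status) & (df["OUTCOME"] == outcome)
    df[col] = np.nan                           # default for non-matching rows
    df.loc[mask, col] = df.loc[mask, "VALUE"]  # copy VALUE where both match

print(df.filter(regex="^(CASE|DEATH)_"))
```

Each row ends up with a number in exactly one of the eight new columns and NaN in the other seven, which is the same result the hand-written lines produce.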
If that doesn't work, paste part of your dataset using df.head(15).to_json().
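For reference, this is roughly what that round trip looks like. The tiny `df` here is a made-up example, not your data; the point is only that the JSON string can be pasted into a post and turned back into a DataFrame by whoever is answering.

```python
import io

import pandas as pd

# Hypothetical sample frame standing in for the real dataset.
df = pd.DataFrame({
    "CONFIRM_STATUS": ["CONFIRMED", "PROBABLE", "TOTAL"],
    "OUTCOME": ["CASE", "CASE", "DEATH"],
    "VALUE": [0, 0, 401],
})

# Serialize the first rows to a JSON string that can be pasted as text.
snippet = df.head(15).to_json()
print(snippet)

# Anyone can rebuild the frame from the pasted string.
restored = pd.read_json(io.StringIO(snippet))
print(restored)
```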