How to take the result of a groupby and spread it out into columns



Here is a working code sample that does what I want, but it is far too long. The resulting DataFrame is in the format I want:

cc_idx_low = df_encounters.loc[((df_encounters['Specialty Type'] == 'Critical Care') & (df_encounters['Enloe Threshold'] == 'Low'))]
cc_idx_med = df_encounters.loc[((df_encounters['Specialty Type'] == 'Critical Care') & (df_encounters['Enloe Threshold'] == 'Medium'))]
cc_idx_high = df_encounters.loc[((df_encounters['Specialty Type'] == 'Critical Care') & (df_encounters['Enloe Threshold'] == 'High'))]
ms_idx_low = df_encounters.loc[((df_encounters['Specialty Type'] == 'Medical Surgical') & (df_encounters['Enloe Threshold'] == 'Low'))]
ms_idx_med = df_encounters.loc[((df_encounters['Specialty Type'] == 'Medical Surgical') & (df_encounters['Enloe Threshold'] == 'Medium'))]
ms_idx_high = df_encounters.loc[((df_encounters['Specialty Type'] == 'Medical Surgical') & (df_encounters['Enloe Threshold'] == 'High'))]
sc_idx_low = df_encounters.loc[((df_encounters['Specialty Type'] == 'Specialty Care') & (df_encounters['Enloe Threshold'] == 'Low'))]
sc_idx_med = df_encounters.loc[((df_encounters['Specialty Type'] == 'Specialty Care') & (df_encounters['Enloe Threshold'] == 'Medium'))]
sc_idx_high = df_encounters.loc[((df_encounters['Specialty Type'] == 'Specialty Care') & (df_encounters['Enloe Threshold'] == 'High'))]
ns_idx_low = df_encounters.loc[((df_encounters['Specialty Type'] == 'No Specialty') & (df_encounters['Enloe Threshold'] == 'Low'))]
ns_idx_med = df_encounters.loc[((df_encounters['Specialty Type'] == 'No Specialty') & (df_encounters['Enloe Threshold'] == 'Medium'))]
ns_idx_high = df_encounters.loc[((df_encounters['Specialty Type'] == 'No Specialty') & (df_encounters['Enloe Threshold'] == 'High'))]
cc_low_val = cc_idx_low['Encounter CSN'].nunique()
cc_med_val = cc_idx_med['Encounter CSN'].nunique()
cc_high_val = cc_idx_high['Encounter CSN'].nunique()
ms_low_val = ms_idx_low['Encounter CSN'].nunique()
ms_med_val = ms_idx_med['Encounter CSN'].nunique()
ms_high_val = ms_idx_high['Encounter CSN'].nunique()
sc_low_val = sc_idx_low['Encounter CSN'].nunique()
sc_med_val = sc_idx_med['Encounter CSN'].nunique()
sc_high_val = sc_idx_high['Encounter CSN'].nunique()
ns_low_val = ns_idx_low['Encounter CSN'].nunique()
ns_med_val = ns_idx_med['Encounter CSN'].nunique()
ns_high_val = ns_idx_high['Encounter CSN'].nunique()
# building the specialty range list using the values from above
df_encounter_counts=[['Critical Care',cc_low_val,cc_med_val,cc_high_val],
['Medical Surgical',ms_low_val,ms_med_val,ms_high_val],
['Specialty Care',sc_low_val,sc_med_val,sc_high_val],
['No Specialty',ns_low_val,ns_med_val,ns_high_val]]
# creating the ranges data frame for the list of values
df_encounter_counts = pd.DataFrame(df_encounter_counts,columns=['Specialty','Low','Medium','High'])
df_encounter_counts

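I know that using groupby() on 'Specialty Type' and 'Foundation Threshold', with .nunique() on 'Encounter CSN', gives the number of unique encounters for each specialty type and threshold: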

Sample data: three columns of values

df_encounters = df[['Encounter CSN','Specialty Type','Foundation Threshold']].copy()
df_enc_totals = df_encounters.groupby(['Specialty Type','Foundation Threshold'])['Encounter CSN'].nunique()
Specialty Type    Foundation Threshold
Critical Care     High                     271
                  Low                      165
                  Medium                   439
Medical Surgical  High                     245
                  Low                     2307
                  Medium                  2724
No Specialty      High                      10
                  Low                     1277
                  Medium                   472
Specialty Care    High                     214
                  Low                      906
                  Medium                  1205
Name: Encounter CSN, dtype: int64

I would like to put "Low", "Medium", and "High" in their own columns, with the unique encounter counts underneath, like this:

Specialty Type      Low      Medium    High
Critical Care       165      439       271
Medical Surgical    2307     2724      245  
Specialty Care      906      1205      214  
No Specialty        1277     472       10

Edit: I unstacked "Foundation Threshold", but I am getting errors when trying to format the result.

Final answer (a combination of the marked answer and my own):

df_encounters = df[['Encounter CSN','Specialty Type','Enloe Threshold']].copy()
df_enc_totals = df_encounters.groupby(['Specialty Type','Enloe Threshold'])['Encounter CSN'].nunique()
df_enc_count = df_enc_totals.unstack('Enloe Threshold')
df_enc_count['Specialty'] = df_enc_count.index
df_enc_count.columns.name = ''
df_enc_count.index = range(0, 4, 1)
df_enc_count[['Specialty','Low','Medium','High']]
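
The same steps can also be written as a single chain that does not hard-code the number of rows. This is only a sketch: the rows below are made up for illustration (the real df_encounters is not shown in the post), and reindex plus fill_value=0 are additions here to fix the column order and fill combinations that never occur.

```python
import pandas as pd

# Made-up rows for illustration only; not the real encounter data.
df_encounters = pd.DataFrame({
    'Encounter CSN': [1, 1, 2, 3, 4, 5, 6],
    'Specialty Type': ['Critical Care', 'Critical Care', 'Critical Care',
                       'Critical Care', 'No Specialty', 'No Specialty',
                       'No Specialty'],
    'Enloe Threshold': ['Low', 'Low', 'Low', 'High',
                        'Medium', 'Medium', 'High'],
})

df_enc_count = (
    df_encounters
    .groupby(['Specialty Type', 'Enloe Threshold'])['Encounter CSN']
    .nunique()                                  # unique encounters per pair
    .unstack('Enloe Threshold', fill_value=0)   # thresholds become columns
    .reindex(columns=['Low', 'Medium', 'High'], fill_value=0)  # fix order
    .rename_axis('Specialty')                   # name the row index
    .rename_axis(None, axis=1)                  # drop the columns' name
    .reset_index()                              # 'Specialty' becomes a column
)
print(df_enc_count)
```

reset_index() builds a fresh 0..n-1 index whatever the number of specialties is, so range(0, 4, 1) is no longer needed. The rows come out in alphabetical order of specialty; a specific row order would need an extra reindex on the rows before reset_index().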

Here is your DataFrame:

df_enc_totals = pd.DataFrame({'': {('Critical Care', 'High'): 271,
('Critical Care', 'Low'): 165,
('Critical Care', 'Medium'): 439,
('Medical Surgical', 'High'): 245,
('Medical Surgical', 'Low'): 2307,
('Medical Surgical', 'Medium'): 2724,
('No Specialty', 'High'): 10,
('No Specialty', 'Low'): 1277,
('No Specialty', 'Medium'): 472,
('Specialty Care', 'High'): 214,
('Specialty Care', 'Low'): 906,
('Specialty Care', 'Medium'): 1205}}
)
df_enc_totals.index.names = ['Specialty Type', 'Foundation Threshold']
df_enc_totals

Try this:

df_enc_totals = df_enc_totals.unstack('Foundation Threshold')
df_enc_totals.columns = df_enc_totals.columns.droplevel()
df_enc_totals = df_enc_totals[['Low', 'Medium','High']]
df_enc_totals.columns.name = ''
df_enc_totals.reset_index(inplace = True)
df_enc_totals
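
If you start from the raw rows rather than the already-grouped totals, pivot_table with aggfunc='nunique' may get you to the same shape in one call. This is a sketch under assumptions: the rows below are hypothetical, and the name wide is only used for this example.

```python
import pandas as pd

# Hypothetical raw rows, invented for this example.
df_encounters = pd.DataFrame({
    'Encounter CSN': [101, 101, 102, 103, 104, 105],
    'Specialty Type': ['Critical Care', 'Critical Care', 'Critical Care',
                       'Specialty Care', 'Specialty Care', 'Specialty Care'],
    'Foundation Threshold': ['Low', 'Low', 'Medium', 'High', 'High', 'Low'],
})

# Count unique encounters per specialty (rows) and threshold (columns).
wide = df_encounters.pivot_table(
    index='Specialty Type',
    columns='Foundation Threshold',
    values='Encounter CSN',
    aggfunc='nunique',
    fill_value=0,   # combinations that never occur become 0
)

# Put the threshold columns in the desired order and flatten the result.
wide = wide.reindex(columns=['Low', 'Medium', 'High'], fill_value=0)
wide.columns.name = None
wide = wide.reset_index()
print(wide)
```

The duplicated CSN 101 is counted once, which is the difference between 'nunique' and a plain 'count'.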
