Input:
group treatment
0 111 lot1
1 111 lot2
2 111 lot3
3 111 lot4
4 111 other
5 111 unknown
6 111 other
7 111 unknown
8 111 another
9 222 other
10 222 unknown
11 222 unknown
12 222 other
13 333 lot1
14 333 lot2
15 333 lot3
16 333 lot4
17 333 other
18 333 unknown
19 444 other
20 444 unknown
21 444 unknown
22 444 other
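This input contains two kinds of groups: the first kind (111 and 333) has lot1, lot2, lot3, lot4, "other" and "unknown" in the treatment column, while the second kind (222 and 444) has only "other" and "unknown". I want to split them into two DataFrames so that each DataFrame captures one kind of group:
Output:
df1=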
group treatment
111 lot1
111 lot2
111 lot3
111 lot4
111 other
111 unknown
111 other
111 unknown
111 another
333 lot1
333 lot2
333 lot3
333 lot4
333 other
333 unknown
df2=
group treatment
222 other
222 unknown
222 unknown
222 other
444 other
444 unknown
444 unknown
444 other
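Basically, I want to split the original data into two DataFrames: df1 will contain the groups that have lot1, lot2, lot3, lot4, other and unknown in the treatment column, while df2 will contain only the groups whose treatment column has nothing but other and unknown.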
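import pandas as pd

# Assumes df is the input DataFrame shown above, with 'group' and 'treatment' columns.
needed_values = {'other', 'unknown'}
df_1_data = []
df_2_data = []
for name, df_g in df.groupby('group'):
    records = df_g.to_dict(orient='records')
    unique_values = set(df_g['treatment'].unique())
    if unique_values == needed_values:
        # Group contains exactly 'other' and 'unknown': goes to df_2.
        df_2_data += records
    else:
        # Group contains any other treatment value: goes to df_1.
        df_1_data += records
df_1 = pd.DataFrame(df_1_data)
df_2 = pd.DataFrame(df_2_data)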
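The same split can probably be done without looping over groups. Below is a minimal sketch, not the approach from the code above: it rebuilds the sample input by hand and uses `isin` plus `groupby(...).transform("any")` to flag every group that has at least one treatment outside {other, unknown}. The names `allowed`, `outside` and `has_extra` are made up for illustration. Note one difference in behavior: a group containing only "other" (or only "unknown") would land in df2 here, whereas the exact set comparison above would send it to df_1.

```python
import pandas as pd

# Sample input rebuilt from the question (assumed to match the real data).
df = pd.DataFrame({
    "group": [111] * 9 + [222] * 4 + [333] * 6 + [444] * 4,
    "treatment": [
        "lot1", "lot2", "lot3", "lot4", "other", "unknown", "other", "unknown", "another",
        "other", "unknown", "unknown", "other",
        "lot1", "lot2", "lot3", "lot4", "other", "unknown",
        "other", "unknown", "unknown", "other",
    ],
})

allowed = {"other", "unknown"}

# True for rows whose treatment is NOT one of the allowed values.
outside = ~df["treatment"].isin(allowed)

# Broadcast to every row of a group: True if any row in that group is outside the set.
has_extra = outside.groupby(df["group"]).transform("any")

df1 = df[has_extra]    # groups with lot1..lot4 (or any other extra value)
df2 = df[~has_extra]   # groups with nothing but other / unknown
```

Because this filters with a boolean mask, both results keep the original row index and column dtypes; call `reset_index(drop=True)` on each if a fresh 0-based index is wanted.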