Creating two dataframes from an original dataframe with python/pandas, based on whether certain values are present in a row



Input

group   treatment
0   111 lot1
1   111 lot2
2   111 lot3
3   111 lot4
4   111 other
5   111 unknown
6   111 other 
7   111 unknown
8   111 another
9   222 other
10  222 unknown
11  222 unknown
12  222 other
13  333 lot1
14  333 lot2
15  333 lot3
16  333 lot4
17  333 other
18  333 unknown
19  444 other
20  444 unknown
21  444 unknown
22  444 other

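This input contains two kinds of groups: the first kind (111 and 333) has lot1, lot2, lot3, lot4, "other", and "unknown" in the treatment column, while the second kind (222 and 444) has only "other" and "unknown". I want to split them into two dataframes so that each dataframe captures one kind of group: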

Output df1=

group   treatment
111 lot1
111 lot2
111 lot3
111 lot4
111 other
111 unknown
111 other 
111 unknown
111 another
333 lot1
333 lot2
333 lot3
333 lot4
333 other
333 unknown

df2=

group   treatment
222 other
222 unknown
222 unknown
222 other
444 other
444 unknown
444 unknown
444 other

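Basically, I want to split the original data into two dataframes: df1 will contain the groups that have lot1, lot2, lot3, lot4, other, and unknown in the treatment column, while df2 will contain the groups that have only other and unknown in the treatment column.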

import pandas as pd

# A group whose treatments are exactly these values goes into df_2
needed_values = {'other', 'unknown'}
df_1_data = []
df_2_data = []
# df is the input dataframe shown above
groups = df.groupby('group')
for name, df_g in groups:
    # Rows of this group as a list of dicts
    t = df_g.to_dict(orient='records')
    unique_values = set(df_g['treatment'].unique())
    if unique_values == needed_values:
        df_2_data += t
    else:
        df_1_data += t
df_1 = pd.DataFrame(df_1_data)
df_2 = pd.DataFrame(df_2_data)
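
The loop above can also be written without iterating over groups. The following is only a sketch of one possible vectorized alternative, not necessarily what the original poster used: it builds a boolean mask with `isin` and `groupby(...).transform("all")`. The sample frame is reconstructed from the input table, and the names `allowed` and `only_allowed` are made up for illustration. Note one behavioral difference: this version sends a group to `df2` when its treatments are a subset of {other, unknown}, whereas the loop requires the set to match exactly.

```python
import pandas as pd

# Sample data mirroring the input table in the question (assumed values).
df = pd.DataFrame({
    "group": [111] * 9 + [222] * 4 + [333] * 6 + [444] * 4,
    "treatment": [
        "lot1", "lot2", "lot3", "lot4", "other", "unknown", "other", "unknown", "another",
        "other", "unknown", "unknown", "other",
        "lot1", "lot2", "lot3", "lot4", "other", "unknown",
        "other", "unknown", "unknown", "other",
    ],
})

# Hypothetical name: the values a "df2" group is allowed to contain.
allowed = {"other", "unknown"}

# For each row, True if every row in the same group has an allowed treatment.
only_allowed = df["treatment"].isin(allowed).groupby(df["group"]).transform("all")

df2 = df[only_allowed]   # groups with nothing but other/unknown
df1 = df[~only_allowed]  # groups containing at least one other treatment
```

Unlike the loop, this keeps the original row index in both results; call `reset_index(drop=True)` on each if a fresh index is wanted. If the real data has stray whitespace in the treatment column, applying `df["treatment"].str.strip()` before the `isin` check is probably worthwhile.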
