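I have the following DataFrame. It has a categorical column (Band) whose hierarchy runs from 00, G1, G2, G3 to G4, with G4 the highest.
I want to rewrite the following SQL logic with pandas: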
CASE
WHEN Band = '00' THEN access.Y
WHEN Band = 'G1' THEN access.X + 200
WHEN Band >= 'G2' THEN access.X + access.Y
END AS Result
FROM access
DataFrame:
Band X Y Result
00 200 2100 2100
G1 300 2300 500
G2 200 5000 5200
G3 150 6000 6150
00 250 2500 2500
G1 400 6000 600
G2 300 6000 6300
G4 500 4500 5000
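For anyone who wants to reproduce this, the frame above could be built roughly as follows. This is only a sketch: keeping Band as plain strings is an assumption about the dtype, and the Result column holds the expected values from the table.

```python
import pandas as pd

# Sample data copied from the table above; Result is the expected output.
df = pd.DataFrame({
    "Band": ["00", "G1", "G2", "G3", "00", "G1", "G2", "G4"],
    "X": [200, 300, 200, 150, 250, 400, 300, 500],
    "Y": [2100, 2300, 5000, 6000, 2500, 6000, 6000, 4500],
    "Result": [2100, 500, 5200, 6150, 2500, 600, 6300, 5000],
})
print(df)
```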
How can I do this in Python?
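You need pd.Categorical here (for the general case of handling categorical ordering and comparison) so that comparison operators such as >= can be used on the categorical Series, and then np.select to express the if/elif/else conditions.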
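import numpy as np
import pandas as pd
# An ordered categorical makes >= follow the band hierarchy
df['Band'] = pd.Categorical(df['Band'], categories=['00','G1','G2','G3','G4'], ordered=True)
# Conditions are checked in order, like CASE WHEN; the last one mirrors Band >= 'G2'
df['result'] = np.select([df['Band']=='00', df['Band']=='G1', df['Band']>='G2'],
                         [df['Y'], df['X']+200, df['X'].add(df['Y'])])
print(df)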
Band X Y Result result
0 00 200 2100 2100 2100
1 G1 300 2300 500 500
2 G2 200 5000 5200 5200
3 G3 150 6000 6150 6150
4 00 250 2500 2500 2500
5 G1 400 6000 600 600
6 G2 300 6000 6300 6300
7 G4 500 4500 5000 5000
Details: if we now print the Series, we can see the hierarchy:
print(df['Band'])
0 00
1 G1
2 G2
3 G3
4 00
5 G1
6 G2
7 G4
Name: Band, dtype: category
Categories (5, object): [00 < G1 < G2 < G3 < G4]
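To see what the ordering buys you, here is a small standalone sketch (the variable names band, mask and unordered are only illustrative). It shows that >= works on an ordered categorical and that, as far as I know, pandas raises a TypeError for the same comparison on an unordered one.

```python
import pandas as pd

band = pd.Series(pd.Categorical(
    ["00", "G1", "G2", "G3", "G4"],
    categories=["00", "G1", "G2", "G3", "G4"],
    ordered=True,
))

# The comparison follows the declared category order.
mask = band >= "G2"
print(mask.tolist())  # [False, False, True, True, True]

# Dropping the order makes inequality comparisons invalid.
unordered = band.cat.as_unordered()
try:
    unordered >= "G2"
except TypeError as exc:
    print("TypeError:", exc)
```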
Use numpy.select, passing the X + Y case as the default (the ELSE branch):
df['res'] = (np.select([df['Band']=='00', df['Band'] == 'G1'],
[df['Y'], df['X'] + 200],
df['X'] + df['Y']))
print (df)
Band X Y Result res
0 00 200 2100 2100 2100
1 G1 300 2300 500 500
2 G2 200 5000 5200 5200
3 G3 150 6000 6150 6150
4 00 250 2500 2500 2500
5 G1 400 6000 600 600
6 G2 300 6000 6300 6300
7 G4 500 4500 5000 5000
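In case the semantics of np.select are unclear: conditions are tested in order and the first match wins per element, with default filling whatever matches nothing. A toy sketch on a made-up array x (not the question's data):

```python
import numpy as np

x = np.array([1, 5, 10])

# x == 5 also satisfies x >= 5, but the earlier condition takes precedence.
# Elements matching no condition fall back to the default value.
out = np.select([x == 5, x >= 5], [-1, x * 2], default=0)
print(out.tolist())  # [0, -1, 20]
```

This is why listing the = 'G1' case before a broader >= condition still gives G1 its own branch.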
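If the bands are always G followed by a number, converting to categorical is not necessary, because the plain strings already compare in the right order: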
# Plain string comparison; the last condition mirrors Band >= 'G2' in the SQL
df['res'] = (np.select([df['Band']=='00', df['Band'] == 'G1', df['Band']>='G2'],
                       [df['Y'], df['X'] + 200, df['X'] + df['Y']]))
print(df)
Band X Y Result res
0 00 200 2100 2100 2100
1 G1 300 2300 500 500
2 G2 200 5000 5200 5200
3 G3 150 6000 6150 6150
4 00 250 2500 2500 2500
5 G1 400 6000 600 600
6 G2 300 6000 6300 6300
7 G4 500 4500 5000 5000
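One caveat with the string shortcut: strings compare character by character, so it is only safe while the numbers stay single-digit. A quick sketch with a hypothetical G10 band (not in the question's data) shows where it would break and how the ordered categorical avoids the problem:

```python
import pandas as pd

# "G10" is a hypothetical band, not present in the question's data.
s = pd.Series(["00", "G1", "G2", "G10"])

# Plain strings: "G10" sorts before "G2" because "1" < "2".
str_mask = (s >= "G2").tolist()
print(str_mask)  # [False, False, True, False]

# Ordered categorical: the declared order decides, so G10 ranks above G2.
cat = pd.Series(pd.Categorical(s, categories=["00", "G1", "G2", "G10"], ordered=True))
cat_mask = (cat >= "G2").tolist()
print(cat_mask)  # [False, False, True, True]
```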