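I have a data frame and need to add a new column that shows, for each sample, the difference CurrentDay - PreviousDay. For example, for R1 the new column would contain 15 - 13.2 = 1.8. See the sample data and the expected answer below.
Here is a sample data set: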
Date,color,Sample,Height
10/24/2021,red,R1,13.2
10/24/2021,red,R2,0
10/24/2021,red,R3,9
10/24/2021,red,R4,16
10/24/2021,red,R5,4
10/24/2021,red,R6,15
10/24/2021,red,R7,9
10/24/2021,red,R8,16.5
10/24/2021,orange,O1,12.5
10/24/2021,orange,O2,17.5
10/24/2021,orange,O3,16
10/24/2021,orange,O4,12.9
10/24/2021,orange,O5,.1
10/24/2021,orange,O6,3.5
10/24/2021,orange,O7,8.5
10/24/2021,orange,O8,0
10/24/2021,yellow,Y1,0
10/24/2021,yellow,Y2,8.5
10/24/2021,yellow,Y3,11
10/24/2021,yellow,Y4,16.5
10/24/2021,yellow,Y5,14.5
10/24/2021,yellow,Y6,15
10/24/2021,yellow,Y7,5.9
10/24/2021,yellow,Y8,13
10/25/2021,red,R1,15
10/25/2021,red,R2,0
10/25/2021,red,R3,15
10/25/2021,red,R4,17.5
10/25/2021,red,R5,4.5
10/25/2021,red,R6,18
10/25/2021,red,R7,9
10/25/2021,red,R8,18
10/25/2021,orange,O1,16
10/25/2021,orange,O2,19.9
10/25/2021,orange,O3,17.8
10/25/2021,orange,O4,16
10/25/2021,orange,O5,.1
10/25/2021,orange,O6,6.5
10/25/2021,orange,O7,13
10/25/2021,orange,O8,0
10/25/2021,yellow,Y1,0
10/25/2021,yellow,Y2,10.9
10/25/2021,yellow,Y3,12
10/25/2021,yellow,Y4,18
10/25/2021,yellow,Y5,16.5
10/25/2021,yellow,Y6,16
10/25/2021,yellow,Y7,8
10/25/2021,yellow,Y8,14.6
The values in the additional column should look like this:
R1 = 1.8
R2 = 0
R3 = 6
R4 = 1.5
R5 = .5
R6 = 3
R7 = 0
R8 = 1.5
O1 = 3.5
O2 = 2.4
O3 = 1.8
O4 = 3.1
O5 = 0
O6 = 3
O7 = 4.5
O8 = 0
Y1 = 0
Y2 = 2.4
Y3 = 1
Y4 = 1.5
Y5 = 2
Y6 = 1
Y7 = 2.1
Y8 = 1.6
You can use groupby and diff:
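import pandas as pd

df = pd.read_csv('filename.csv')
# Difference between each row and the previous row of the same sample
difference = df.groupby('Sample').Height.diff()
# The first day of each sample has no previous row, so its diff is NaN; skip those
mask = difference.notna()
print(pd.concat([df[mask].Sample, difference[mask]], axis=1))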
Sample Height
24 R1 1.8
25 R2 0.0
26 R3 6.0
27 R4 1.5
28 R5 0.5
29 R6 3.0
30 R7 0.0
31 R8 1.5
32 O1 3.5
33 O2 2.4
34 O3 1.8
35 O4 3.1
36 O5 0.0
37 O6 3.0
38 O7 4.5
39 O8 0.0
40 Y1 0.0
41 Y2 2.4
42 Y3 1.0
43 Y4 1.5
44 Y5 2.0
45 Y6 1.0
46 Y7 2.1
47 Y8 1.6
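Since the question asks for a new column rather than printed output, the same groupby/diff result can also be assigned straight back to the data frame. The following is a minimal sketch under a few assumptions: it inlines a six-row excerpt of the sample data as a string, the column name Difference is made up for illustration, and it sorts by Sample and Date first in case the file is not already in chronological order.

```python
import io
import pandas as pd

# Six-row excerpt of the question's sample data (three samples, two days).
csv_text = """Date,color,Sample,Height
10/24/2021,red,R1,13.2
10/24/2021,red,R3,9
10/24/2021,orange,O5,.1
10/25/2021,red,R1,15
10/25/2021,red,R3,15
10/25/2021,orange,O5,.1
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Put each sample's rows in chronological order before diffing.
df = df.sort_values(["Sample", "Date"])

# Current row minus previous row within each sample. The first day of
# every sample has no previous value, so it becomes NaN.
# round(2) hides floating-point noise such as 1.8000000000000007.
df["Difference"] = df.groupby("Sample")["Height"].diff().round(2)

# Restore the original row order.
df = df.sort_index()

print(df)
```

Rows for the first day keep NaN in the new column, which makes it explicit that no previous day exists for them; whether to keep or drop those rows depends on what you need downstream.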
The for loop at the end prints the output in the format you want:
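# Add the per-sample difference as a new column
df = df.assign(difference=df.groupby("Sample")["Height"].diff())
# Drop each sample's first day, which has no previous value
df = df[df["difference"].notna()]
for _, line in df.iterrows():
    # Round to two decimals to hide floating-point noise
    print("{:<4}= {:>3s}".format(line["Sample"], str(round(line["difference"] * 100)/100)))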
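The loop above prints whole numbers as 0.0 or 6.0, whereas the expected answer in the question shows 0 and 6. One possible way to get closer to that layout is Python's general number format g, which drops trailing zeros. This is only a sketch: it builds a small data frame by hand from a few of the question's rows instead of reading the CSV file, and the variable names result and lines are illustrative.

```python
import pandas as pd

# A few rows taken from the question's sample data, in file order.
df = pd.DataFrame({
    "Sample": ["R1", "R3", "O5", "R1", "R3", "O5"],
    "Height": [13.2, 9, 0.1, 15, 15, 0.1],
})

# Per-sample difference; rows without a previous day are dropped.
result = df.assign(difference=df.groupby("Sample")["Height"].diff())
result = result.dropna(subset=["difference"])

# The "g" format uses up to six significant digits and strips trailing
# zeros, so 6.0 is shown as 6 and 1.8000000000000007 as 1.8.
lines = [
    f"{sample} = {diff:g}"
    for sample, diff in zip(result["Sample"], result["difference"])
]
print("\n".join(lines))
```

Iterating over zip of the two columns also avoids iterrows, which tends to be slow on larger frames.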