Pandas DataFrame max and min values



I have a pandas DataFrame that looks like this:

+-----+---+---+
|     | A | B |
+-----+---+---+
| 288 | 1 | 4 |
+-----+---+---+
| 245 | 2 | 3 |
+-----+---+---+
| 543 | 3 | 6 |
+-----+---+---+
| 867 | 1 | 9 |
+-----+---+---+
| 345 | 2 | 7 |
+-----+---+---+
| 122 | 3 | 8 |
+-----+---+---+
| 233 | 1 | 1 |
+-----+---+---+
| 346 | 2 | 6 |
+-----+---+---+
| 765 | 3 | 3 |
+-----+---+---+

What I want to do is get the max and min values of column 'B' for each run of column 'A' going from 1 to 3.

For example:

loop on A in range 1 to 3:
       get max and min values from column 'B'
       max = 6
       min = 3
loop on the next range of A from 1 to 3:
       get max and min values from column 'B'
       max = 9
       min = 7           
loop on the next range of A from 1 to 3:
       get max and min values from column 'B'
       max = 6
       min = 1

and then add the min and max values as new columns:

+-----+---+---+-----+-----+
|     | A | B | min | max |
+-----+---+---+-----+-----+
| 288 | 1 | 4 | 3   | 6   |
+-----+---+---+-----+-----+
| 245 | 2 | 3 |     |     |
+-----+---+---+-----+-----+
| 543 | 3 | 6 |     |     |
+-----+---+---+-----+-----+
| 867 | 1 | 9 | 7   | 9   |
+-----+---+---+-----+-----+
| 345 | 2 | 7 |     |     |
+-----+---+---+-----+-----+
| 122 | 3 | 8 |     |     |
+-----+---+---+-----+-----+
| 233 | 1 | 1 | 1   | 6   |
+-----+---+---+-----+-----+
| 346 | 2 | 6 |     |     |
+-----+---+---+-----+-----+
| 765 | 3 | 3 |     |     |
+-----+---+---+-----+-----+

If you don't need the empty values:

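import numpy as np

# Label each consecutive block of 3 rows (0, 0, 0, 1, 1, 1, ...) and group by that label
g = df.groupby(np.arange(len(df.index)) // 3)
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
print(df)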
     A  B  min  max
288  1  4    3    6
245  2  3    3    6
543  3  6    3    6
867  1  9    7    9
345  2  7    7    9
122  3  8    7    9
233  1  1    1    6
346  2  6    1    6
765  3  3    1    6
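
The snippet above assumes `df` already exists. A self-contained sketch of the same idea, rebuilding the sample frame from the question, might look like this. It assumes every run has exactly three rows and that the rows are already in order; the name `block` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"A": [1, 2, 3, 1, 2, 3, 1, 2, 3],
     "B": [4, 3, 6, 9, 7, 8, 1, 6, 3]},
    index=[288, 245, 543, 867, 345, 122, 233, 346, 765],
)

# Positional block id per row: 0, 0, 0, 1, 1, 1, 2, 2, 2
block = np.arange(len(df)) // 3

# transform() broadcasts each block's result back onto its rows
g = df.groupby(block)["B"]
df["min"] = g.transform("min")
df["max"] = g.transform("max")
print(df)
```

Because the key is purely positional, it ignores the values in `A` entirely, so it only fits data where the run length is fixed and known.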

For empty values you can assign empty strings, but then the min and max columns are no longer numeric:

g = df.groupby(np.arange(len(df.index)) // 3)
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
# Blank out every row that is not the first of its run (A != 1)
df.loc[df.A != 1, ['min','max']] = ''
print(df)
     A  B min max
288  1  4   3   6
245  2  3        
543  3  6        
867  1  9   7   9
345  2  7        
122  3  8        
233  1  1   1   6
346  2  6        
765  3  3    
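
If the columns should stay numeric, one alternative (not part of the original answer) is to mask the unwanted rows with `NaN` instead of empty strings. This is a minimal sketch using `Series.where`, again assuming runs of exactly three rows; `first_of_run` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 1, 2, 3, 1, 2, 3],
     "B": [4, 3, 6, 9, 7, 8, 1, 6, 3]},
    index=[288, 245, 543, 867, 345, 122, 233, 346, 765],
)

g = df.groupby(np.arange(len(df)) // 3)["B"]

# Keep the value only on the first row of each run; other rows become NaN
first_of_run = df["A"] == 1
df["min"] = g.transform("min").where(first_of_run)
df["max"] = g.transform("max").where(first_of_run)
print(df)
```

The trade-off is that the columns become floats, since `NaN` cannot be stored in an integer column, but they remain usable in numeric operations.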

edit1:

# Build a readable label per block of 3 rows: range1, range2, ...
df['range'] = 'range' + pd.Series(np.arange(len(df.index)) // 3 + 1, index=df.index).astype(str)
g = df.groupby('range')
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
print(df)
     A  B   range  min  max
288  1  4  range1    3    6
245  2  3  range1    3    6
543  3  6  range1    3    6
867  1  9  range2    7    9
345  2  7  range2    7    9
122  3  8  range2    7    9
233  1  1  range3    1    6
346  2  6  range3    1    6
765  3  3  range3    1    6

Another solution, using cumsum on a boolean mask:

# Each row where A == 1 starts a new run; cumsum turns those starts into run numbers
df['range'] = 'range' + (df.A == 1).cumsum().astype(str)
g = df.groupby('range')
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
print(df)
     A  B   range  min  max
288  1  4  range1    3    6
245  2  3  range1    3    6
543  3  6  range1    3    6
867  1  9  range2    7    9
345  2  7  range2    7    9
122  3  8  range2    7    9
233  1  1  range3    1    6
346  2  6  range3    1    6
765  3  3  range3    1    6
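
The cumsum approach does not depend on the run length, which the sample data cannot show because all its runs have three rows. The sketch below uses made-up data with runs of length 3, 2 and 4 to illustrate this. It assumes every run starts with `A == 1`; the data and the name `run_id` are hypothetical.

```python
import pandas as pd

# Hypothetical data: runs of unequal length, each starting at A == 1
df = pd.DataFrame(
    {"A": [1, 2, 3, 1, 2, 1, 2, 3, 4],
     "B": [4, 3, 6, 9, 7, 1, 6, 3, 8]}
)

# True marks the start of a run; the running sum numbers the runs 1, 2, 3
run_id = (df["A"] == 1).cumsum()

g = df.groupby(run_id)["B"]
df["min"] = g.transform("min")
df["max"] = g.transform("max")
print(df)
```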

General solution:

# cumcount numbers the occurrences of each A value (0, 1, 2, ...); that number identifies the run
g = df.groupby(df.groupby('A').cumcount())
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
print(df)
     A  B  min  max
288  1  4    3    6
245  2  3    3    6
543  3  6    3    6
867  1  9    7    9
345  2  7    7    9
122  3  8    7    9
233  1  1    1    6
346  2  6    1    6
765  3  3    1    6
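
To see why this works, it may help to look at the intermediate grouping key. The sketch below rebuilds the sample frame and prints the result of `cumcount`; the name `key` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 1, 2, 3, 1, 2, 3],
     "B": [4, 3, 6, 9, 7, 8, 1, 6, 3]},
    index=[288, 245, 543, 867, 345, 122, 233, 346, 765],
)

# For each value of A, count how many times it has appeared so far.
# The n-th occurrence of every value belongs to the n-th run.
key = df.groupby("A").cumcount()
print(key.tolist())

g = df.groupby(key)["B"]
df["min"] = g.transform("min")
df["max"] = g.transform("max")
print(df)
```

Note that this key lines up with the runs only when every run contains each value of `A` exactly once. If a run is missing a value, later occurrences of that value are counted into the wrong run.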
