如何计算一个列的最小和最大为特定的行?



我有一个csv文件如下:

0  2  1  1    464  385  171    0:44:4   
1  1  2  26    254  444  525    0:56:2   
2  3  1  90    525  785  522    0:52:8   
3  8  2  3    525  233  555    0:52:8  
4  7  1  10    525  433  522    1:52:8   
5  9  2  55     525  555  522    1:52:8   
6  6  3  3   392  111  232    1:43:4   
7  1  4  23    322  191  112    1:43:4   
8  1  3  30    322  191  112    1:43:4   
9  1  5   2   322  191  112    1:43:4   
10  1  3  22   322  191  112    1:43:4   
11  1  4  44   322  191  112    1:43:4   
12  1  5  1   322  191  112    1:43:4   
12  1  4  3    322  191  112    1:43:4   
12  1  6  33    322  191  112    1:43:4   
12  1  6  1    322  191  112    1:43:4 
12  1  5  3    322  191  112    1:43:4   
12  1  6  33    322  191  112    1:43:4   

.
.

第三列的数字在1到6之间。我想读取列#4和#5的信息,为所有在第三列中有数字1到6的行,并找到最大和最小的金额为每一行有数字1到6分别。例如这样的输出:

Mix for row with 1: 1  
Max for row with 1: 90
Min for row with 2:  3
Max for row with 2:  55
and so on 

我可以使用下面的代码绘制图形。如何按组获取汇总统计信息?我要找的是同一组的多个统计值比如均值,最小值,最大值,每组在一次呼叫中的次数,这可行吗?

import matplotlib.pyplot as plt
import csv
x= []
y= []
with open('mydata.csv','r') as csvfile:
ap = csv.reader(csvfile, delimiter=',')
for row in ap:
x.append(int(row[2]))
y.append(int(row[7]))
plt.scatter(x, y, color = 'g',s = 4, marker='o')
plt.show()

一种简单的方法是对read_csv(),.groupby().agg()使用Pandas:

import pandas as pd
df = pd.read_csv("mydata.csv", header=None)
def min_max_avg(col):
return (col.min() + col.max()) / 2

result = df[[2, 3, 4]].groupby(2).agg(["min", "max", "mean", min_max_avg])

结果:

3                               4                             
min max       mean min_max_avg  min  max        mean min_max_avg
2                                                                 
1   1  90  33.666667        45.5  464  525  504.666667       494.5
2   3  55  28.000000        29.0  254  525  434.666667       389.5
3   3  30  18.333333        16.5  322  392  345.333333       357.0
4   3  44  23.333333        23.5  322  322  322.000000       322.0
5   1   3   2.000000         2.0  322  322  322.000000       322.0
6   1  33  22.333333        17.0  322  322  322.000000       322.0

如果你不喜欢,你可以用纯Python来做,只需要多做一点工作:

import csv
data = {}
with open("mydata.csv", "r") as file:
for row in csv.reader(file):
dct = data.setdefault(row[2], {})
for col in (3, 4):
dct.setdefault(col, []).append(row[col])
min_str = "Min for group {} - column {}: {}"
max_str = "Max for group {} - column {}: {}"
for row in data:
for col in (3, 4):
print(min_str.format(row, col, min(data[row][col])))
print(max_str.format(row, col, max(data[row][col])))

结果:

Min for group 1 - column 3: 1
Max for group 1 - column 3: 90
Min for group 1 - column 4: 464
Max for group 1 - column 4: 525
Min for group 2 - column 3: 26
Max for group 2 - column 3: 55
Min for group 2 - column 4: 254
Max for group 2 - column 4: 525
Min for group 3 - column 3: 22
Max for group 3 - column 3: 30
Min for group 3 - column 4: 322
Max for group 3 - column 4: 392
...

mydata.csv:

0,2,1,1,464,385,171,0:44:4
1,1,2,26,254,444,525,0:56:2
2,3,1,90,525,785,522,0:52:8
3,8,2,3,525,233,555,0:52:8
4,7,1,10,525,433,522,1:52:8
5,9,2,55,525,555,522,1:52:8
6,6,3,3,392,111,232,1:43:4
7,1,4,23,322,191,112,1:43:4
8,1,3,30,322,191,112,1:43:4
9,1,5,2,322,191,112,1:43:4
10,1,3,22,322,191,112,1:43:4
11,1,4,44,322,191,112,1:43:4
12,1,5,1,322,191,112,1:43:4
12,1,4,3,322,191,112,1:43:4
12,1,6,33,322,191,112,1:43:4
12,1,6,1,322,191,112,1:43:4
12,1,5,3,322,191,112,1:43:4
12,1,6,33,322,191,112,1:43:4

相关内容

  • 没有找到相关文章

最新更新