我有一个csv文件如下:
0 2 1 1 464 385 171 0:44:4
1 1 2 26 254 444 525 0:56:2
2 3 1 90 525 785 522 0:52:8
3 8 2 3 525 233 555 0:52:8
4 7 1 10 525 433 522 1:52:8
5 9 2 55 525 555 522 1:52:8
6 6 3 3 392 111 232 1:43:4
7 1 4 23 322 191 112 1:43:4
8 1 3 30 322 191 112 1:43:4
9 1 5 2 322 191 112 1:43:4
10 1 3 22 322 191 112 1:43:4
11 1 4 44 322 191 112 1:43:4
12 1 5 1 322 191 112 1:43:4
12 1 4 3 322 191 112 1:43:4
12 1 6 33 322 191 112 1:43:4
12 1 6 1 322 191 112 1:43:4
12 1 5 3 322 191 112 1:43:4
12 1 6 33 322 191 112 1:43:4
.
.
第三列的数字在1到6之间。我想读取列#4和#5的信息,为所有在第三列中有数字1到6的行,并找到最大和最小的金额为每一行有数字1到6分别。例如这样的输出:
Mix for row with 1: 1
Max for row with 1: 90
Min for row with 2: 3
Max for row with 2: 55
and so on
我可以使用下面的代码绘制图形。如何按组获取汇总统计信息?我要找的是同一组的多个统计值比如均值,最小值,最大值,每组在一次呼叫中的次数,这可行吗?
import matplotlib.pyplot as plt
import csv
x= []
y= []
with open('mydata.csv','r') as csvfile:
ap = csv.reader(csvfile, delimiter=',')
for row in ap:
x.append(int(row[2]))
y.append(int(row[7]))
plt.scatter(x, y, color = 'g',s = 4, marker='o')
plt.show()
一种简单的方法是对read_csv()
,.groupby()
和.agg()
使用Pandas:
import pandas as pd
df = pd.read_csv("mydata.csv", header=None)
def min_max_avg(col):
return (col.min() + col.max()) / 2
result = df[[2, 3, 4]].groupby(2).agg(["min", "max", "mean", min_max_avg])
结果:
3 4
min max mean min_max_avg min max mean min_max_avg
2
1 1 90 33.666667 45.5 464 525 504.666667 494.5
2 3 55 28.000000 29.0 254 525 434.666667 389.5
3 3 30 18.333333 16.5 322 392 345.333333 357.0
4 3 44 23.333333 23.5 322 322 322.000000 322.0
5 1 3 2.000000 2.0 322 322 322.000000 322.0
6 1 33 22.333333 17.0 322 322 322.000000 322.0
如果你不喜欢,你可以用纯Python来做,只需要多做一点工作:
import csv
data = {}
with open("mydata.csv", "r") as file:
for row in csv.reader(file):
dct = data.setdefault(row[2], {})
for col in (3, 4):
dct.setdefault(col, []).append(row[col])
min_str = "Min for group {} - column {}: {}"
max_str = "Max for group {} - column {}: {}"
for row in data:
for col in (3, 4):
print(min_str.format(row, col, min(data[row][col])))
print(max_str.format(row, col, max(data[row][col])))
结果:
Min for group 1 - column 3: 1
Max for group 1 - column 3: 90
Min for group 1 - column 4: 464
Max for group 1 - column 4: 525
Min for group 2 - column 3: 26
Max for group 2 - column 3: 55
Min for group 2 - column 4: 254
Max for group 2 - column 4: 525
Min for group 3 - column 3: 22
Max for group 3 - column 3: 30
Min for group 3 - column 4: 322
Max for group 3 - column 4: 392
...
mydata.csv
:
0,2,1,1,464,385,171,0:44:4
1,1,2,26,254,444,525,0:56:2
2,3,1,90,525,785,522,0:52:8
3,8,2,3,525,233,555,0:52:8
4,7,1,10,525,433,522,1:52:8
5,9,2,55,525,555,522,1:52:8
6,6,3,3,392,111,232,1:43:4
7,1,4,23,322,191,112,1:43:4
8,1,3,30,322,191,112,1:43:4
9,1,5,2,322,191,112,1:43:4
10,1,3,22,322,191,112,1:43:4
11,1,4,44,322,191,112,1:43:4
12,1,5,1,322,191,112,1:43:4
12,1,4,3,322,191,112,1:43:4
12,1,6,33,322,191,112,1:43:4
12,1,6,1,322,191,112,1:43:4
12,1,5,3,322,191,112,1:43:4
12,1,6,33,322,191,112,1:43:4