Given a CSV file:
A,0,0,1,0
B,0,0,1,0
C,0,0,1,0
D,0,0,1,0
E,0,0,1,0
F,0,0,0,1
I want to compute the total of each column. Is there a more Pythonic or more efficient way to do this than the following?
import csv
totals = [0]*4
for row in csv.reader(csvfile):
    counts = [int(x) for x in row[-4:]]
    totals = [sum(x) for x in zip(counts, totals)]
print(totals)
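Transpose the CSV rows first, skip what is now the label row (the first column of the file), and then just sum each remaining row: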
cr = zip(*csv.reader(csvfile))
next(cr)
result = [sum(map(int,x)) for x in cr]
print(result)
[0, 0, 5, 1]
Be careful when unpacking the arguments to zip, because doing so loads the entire file into memory.
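If memory is a concern, one option is to keep only a running total and read the file one row at a time. The following is a minimal sketch, not a definitive implementation: the helper name column_totals and its skip parameter are made up for illustration, and the sample data is fed through StringIO instead of a real file.

```python
import csv
from io import StringIO


def column_totals(lines, skip=1):
    """Sum the numeric columns row by row, holding only the running totals.

    `lines` is any iterable of CSV lines (for example an open file);
    `skip` is the number of leading label columns to ignore.
    """
    totals = None
    for row in csv.reader(lines):
        if not row:  # ignore blank lines
            continue
        values = [int(x) for x in row[skip:]]
        if totals is None:
            totals = values
        else:
            totals = [t + v for t, v in zip(totals, values)]
    return totals or []


sample = StringIO("A,0,0,1,0\nB,0,0,1,0\nC,0,0,1,0\nD,0,0,1,0\nE,0,0,1,0\nF,0,0,0,1\n")
print(column_totals(sample))  # [0, 0, 5, 1]
```

Only one row and the list of totals are held in memory at any time, so this should scale to files that do not fit in memory.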
Here is a comprehension-based way to do it without any external libraries:
matrix = [[int(i) for i in row[-4:]] for row in csv.reader(csvfile)]
totals = [sum(array[i] for array in matrix) for i in range(4)]
You can use numpy's genfromtxt to read the file, then slice off the index column and sum the array:
import numpy as np
my_data = np.genfromtxt(csvfile, delimiter=',')
print(my_data[:,1:].sum(axis=0))
which gives:
[0. 0. 5. 1.]
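Note that genfromtxt cannot convert the label column to a float, so it reads it as nan, and the totals come back as floats. If integer totals are preferred, a possible variant is to skip the label column while reading. This is a sketch that assumes the file always has exactly four numeric columns after the label, and it uses StringIO in place of a real file:

```python
from io import StringIO

import numpy as np

sample = StringIO("A,0,0,1,0\nB,0,0,1,0\nC,0,0,1,0\nD,0,0,1,0\nE,0,0,1,0\nF,0,0,0,1\n")

# usecols skips column 0 (the labels); dtype=int keeps the totals as integers.
data = np.loadtxt(sample, delimiter=",", usecols=(1, 2, 3, 4), dtype=int)
totals = data.sum(axis=0)
print(totals)  # [0 0 5 1]
```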
Using pandas:
import pandas as pd
df = pd.read_csv('path/to/file.csv', header=None, index_col=0)
df.sum()
Here is an example using StringIO:
from io import StringIO
import pandas as pd
s = """A,0,0,1,0
B,0,0,1,0
C,0,0,1,0
D,0,0,1,0
E,0,0,1,0
F,0,0,0,1"""
df = pd.read_csv(StringIO(s), header=None, index_col=0)
print(df.sum())
1    0
2    0
3    5
4    1
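For a file too large to load at once, pandas can also read in chunks and the per-chunk sums can be added up. This is a rough sketch rather than a tuned solution; the chunk size of 2 is only chosen so that the tiny sample is split into several chunks.

```python
from io import StringIO

import pandas as pd

s = """A,0,0,1,0
B,0,0,1,0
C,0,0,1,0
D,0,0,1,0
E,0,0,1,0
F,0,0,0,1"""

totals = None
# chunksize makes read_csv return an iterator of DataFrames instead of one frame.
for chunk in pd.read_csv(StringIO(s), header=None, index_col=0, chunksize=2):
    part = chunk.sum()
    totals = part if totals is None else totals + part

print(totals.tolist())  # [0, 0, 5, 1]
```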