The initial CSV file has these columns and rows:
enroll_code,student_id
10030,55000
10030,55804
10250,55804
10510,55000
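After reading the CSV file, the data becomes a list of sublists:
import csv

data = []
# The with block closes the file automatically, so no explicit close() is needed
with open('C:/Users/Taha/Downloads/Data.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)
    data = list(reader)
print(data)
The code above prints the data as: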
data=[['enroll_code', 'student_id'], ['10030', '55000'], ['10030', '55804'], ['10250', '55804'], ['10510', '55000']]
The result I need is:
10030:2
10250:1
10510:1
How can I convert this into a dictionary in which the key 10030 maps to the count of its two student_id values?
When I do this:
import csv

data = {}
with open('C:/Users/Taha/Downloads/Data.csv', 'r') as csvFile:
    reader = csv.DictReader(csvFile)
    data = dict(reader)
print(data)
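it only gives the output: {'enroll_code': 'student_id'}
What I need is a way for Python to count the number of occurrences of each enroll_code. Say the CSV file has a thousand entries, with enroll_code and student_id values repeated throughout the file: how do I get the desired result?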
Basically, I want to code the equivalent of this in base Python:
import pandas as pd
df = pd.read_csv('data.csv')
df.groupby('enroll_code').count()
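The most straightforward way that comes to mind is to go through your items and "count" them into a dictionary.
Assuming you have already done
data = list(reader)
you can do: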
result = {}
for item in data[1:]:
    if item[0] not in result:
        result[item[0]] = 1
    else:
        result[item[0]] += 1
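We go through your data item by item, skipping the header (that is why we have the data[1:] part), and check whether the item's enroll code is already in the dictionary. If it is not, we add it with a count of 1; otherwise we increment the current count.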
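The counting loop described above can be sketched as a small self-contained function. The name count_first_column and the inline sample rows (copied from the question's printed data) are illustrative assumptions, not part of the original code.

```python
def count_first_column(rows):
    """Count how often each first-column value appears, skipping the header row."""
    result = {}
    for item in rows[1:]:  # rows[0] is the header
        key = item[0]
        if key not in result:
            result[key] = 1   # first time this key is seen
        else:
            result[key] += 1  # key seen before: increment its count
    return result


# Sample rows copied from the question's printed data
data = [['enroll_code', 'student_id'], ['10030', '55000'], ['10030', '55804'],
        ['10250', '55804'], ['10510', '55000']]

counts = count_first_column(data)
for code, n in counts.items():
    print(f"{code}:{n}")
```

On Python 3.7 or later, where dictionaries keep insertion order, this prints the codes in the order they first appear in the file.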
You can use collections.defaultdict.
Ex:
import csv
from collections import defaultdict

result = defaultdict(int)  # missing keys start at 0
with open('C:/Users/Taha/Downloads/Data.csv') as csvFile:
    reader = csv.reader(csvFile)
    next(reader)  # Skip header.
    for row in reader:
        result[row[0]] += 1
print(result)
Output:
defaultdict(<type 'int'>, {
'10250': 1,
'10510': 1,
'10030': 2
})
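A closely related standard-library option, named here as an alternative rather than as part of the answer above, is collections.Counter, which does the same tallying in one call. This is a minimal sketch; the io.StringIO sample is an assumed in-memory stand-in for Data.csv so the snippet runs without the file.

```python
import csv
import io
from collections import Counter

# Assumed in-memory stand-in for Data.csv, using the rows from the question
sample = io.StringIO(
    "enroll_code,student_id\n"
    "10030,55000\n"
    "10030,55804\n"
    "10250,55804\n"
    "10510,55000\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row
counts = Counter(row[0] for row in reader)  # tally the first column

print(dict(counts))
```

Counter is a dict subclass, so counts['10030'] works as with the defaultdict version, and a key that was never seen yields 0 instead of raising KeyError.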
If you do not want to import anything extra, you can use .get:
data = [['enroll_code', 'student_id'], ['10030', '55000'], ['10030', '55804'], ['10250', '55804'], ['10510', '55000']]
dct = {}
for x in data[1:]:
    dct[x[0]] = dct.get(x[0], 0) + 1
print(dct)
Output:
{'10030': 2, '10250': 1, '10510': 1}
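.get returns the value stored under the key (x[0]) if the key is in the dictionary, and 0 otherwise. We then add 1 to that value (the value of x[0], or 0) and assign the new value to the same key.
See the documentation for .get and the other dictionary methods.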
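To make the behaviour of .get concrete, here is a short sketch that applies the same update twice to one key. The variable names first, second and missing are illustrative only.

```python
dct = {}

# Key is missing: .get falls back to the default 0, so the count becomes 1
dct['10030'] = dct.get('10030', 0) + 1
first = dct['10030']

# Key is present: .get returns the stored 1, so the count becomes 2
dct['10030'] = dct.get('10030', 0) + 1
second = dct['10030']

print(first, second)

# Without an explicit default, .get returns None for a missing key
# instead of raising KeyError
missing = dct.get('10250')
print(missing)
```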
This will work:
import csv

with open('C:/Users/Taha/Downloads/Data.csv') as f:
    enroll_count = {}
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        code = row[0]
        if code in enroll_count:
            enroll_count[code] += 1
        else:
            enroll_count[code] = 1
print(enroll_count)
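A variant of the same loop, sketched here as an assumption rather than as this answer's own code, uses csv.DictReader so that the column is addressed by name instead of by position. It iterates over the reader row by row rather than passing the reader to dict(). The io.StringIO sample is an assumed stand-in for the file.

```python
import csv
import io

# Assumed in-memory stand-in for Data.csv, using the rows from the question
sample = io.StringIO(
    "enroll_code,student_id\n"
    "10030,55000\n"
    "10030,55804\n"
    "10250,55804\n"
    "10510,55000\n"
)

enroll_count = {}
for row in csv.DictReader(sample):  # the header row is consumed automatically
    code = row['enroll_code']       # look the column up by its header name
    enroll_count[code] = enroll_count.get(code, 0) + 1

for code, n in enroll_count.items():
    print(f"{code}:{n}")
```

Addressing the column by name keeps the loop working if the column order in the file changes.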
Try the following:
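import pandas as pd

data = pd.DataFrame([['10030', '55000'], ['10030', '55804'], ['10250', '55804'], ['10510', '55000']], columns=['enroll_code', 'student_id'])
# Count student_id rows per enroll_code and convert the resulting Series to a plain dict
data.groupby('enroll_code')['student_id'].count().to_dict()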