Python Pandas: scoring multiple-choice items against a key



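I'm new to the Pandas library in Python, and I'm trying to score students' responses to a set of multiple-choice items read from a CSV file (a large number of items and students) against an answer key that I have. I wrote basic Python code that does this easily, and I've included it here to make clear what I'm trying to do: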

# Create example data: 5 items for 6 students
A, B, C, D = "A", "B", "C", "D"
df = [[A, B, C, D, C],
      [A, B, C, B, C],
      [A, D, D, B, C],
      [A, B, C, C, C],
      [A, B, B, D, C],
      [A, B, C, D, C]]
# Score the items and append the score to each row
key = ["A", "B", "C", "D", "C"]
for line in df:
    score = sum(1 for i, j in zip(line, key) if i == j)
    line.append(score)  # append() modifies the list in place and returns None
# Now print the example for clarity
print("I1 I2 I3 I4 I5 Score")
for i in df:
    for j in i:
        print(j, end="  ")
    print()

This prints the following, which is what I want to learn to produce in Pandas:

I1 I2 I3 I4 I5 Score
A  B  C  D  C  5  
A  B  C  B  C  4  
A  D  D  B  C  2  
A  B  C  C  C  4  
A  B  B  D  C  4  
A  B  C  D  C  5  

Here is my start, but clearly I still have a lot to learn:

import pandas as pd
d = {'I1': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A'},
'I2': {0: 'B', 1: 'B', 2: 'D', 3: 'B', 4: 'B', 5: 'B'},
'I3': {0: 'C', 1: 'C', 2: 'D', 3: 'C', 4: 'B', 5: 'C'},
'I4': {0: 'D', 1: 'B', 2: 'B', 3: 'C', 4: 'D', 5: 'D'},
'I5': {0: 'C', 1: 'C', 2: 'C', 3: 'C', 4: 'C', 5: 'C'}}
df = pd.DataFrame(d)
key = ["A","B","C","D","C"]
df['Score'] = sum ([1 for x,y in zip(df.iloc[:,0:5],key[:]) if x==y])
print (df)

But as you can see, it fails:

  I1 I2 I3 I4 I5  Score
0  A  B  C  D  C      0
1  A  B  C  B  C      0
2  A  D  D  B  C      0
3  A  B  C  C  C      0
4  A  B  B  D  C      0
5  A  B  C  D  C      0

Thanks for educating me…

What you're looking for is:

df['Score'] = df.eq(key).sum(axis=1)  # equivalent to (df == key).sum(axis=1)
print(df)
# Output
  I1 I2 I3 I4 I5  Score
0  A  B  C  D  C      5
1  A  B  C  B  C      4
2  A  D  D  B  C      2
3  A  B  C  C  C      4
4  A  B  B  D  C      4
5  A  B  C  D  C      5

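Your DataFrame has 5 columns and your key is a list (vector) of 5 elements, so Pandas can compare every row to the list element by element and return a DataFrame of booleans. Summing the True values across the columns of each row (axis=1) then gives that student's score.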
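To make the two steps visible, and to show why the attempt in the question scored 0 for everyone, here is a minimal sketch using the same data. The variable names `labels`, `attempt`, `correct`, and `scores` are made up for illustration. The key point is that iterating over a DataFrame (which is what `zip` does) yields the column labels rather than the rows, so the original loop compared 'I1', 'I2', … against the key and never found a match.

```python
import pandas as pd

# Same responses as in the question: 6 students, 5 items
df = pd.DataFrame({
    "I1": list("AAAAAA"),
    "I2": list("BBDBBB"),
    "I3": list("CCDCBC"),
    "I4": list("DBBCDD"),
    "I5": list("CCCCCC"),
})
key = ["A", "B", "C", "D", "C"]

# Iterating over a DataFrame yields its column labels, not its rows
labels = list(df)

# So the original attempt compares 'I1'..'I5' with the key: no matches,
# and the single scalar 0 is then assigned to every row
attempt = sum(1 for x, y in zip(df.iloc[:, 0:5], key) if x == y)

# Step 1: elementwise comparison of each row with the key -> boolean DataFrame
correct = df.eq(key)

# Step 2: count the True values in each row (True counts as 1)
scores = correct.sum(axis=1)

print(labels)
print(attempt)
print(correct)
print(scores)
```

Keeping `correct` around can also be useful on its own: for example, `correct.mean()` would give the proportion of students who answered each item correctly.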
