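I'm new to the Pandas library in Python. I'm trying to score students' responses to a set of multiple-choice items read from a csv file (with a large number of items and students), and I have an answer key. I wrote basic Python code that does this easily, and I've included it to make clear what I'm trying to do: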
# Create example data: 5 items for 6 students
A, B, C, D = "A", "B", "C", "D"
df = [[A, B, C, D, C],
      [A, B, C, B, C],
      [A, D, D, B, C],
      [A, B, C, C, C],
      [A, B, B, D, C],
      [A, B, C, D, C]]

# Score the items and append each score to its row
key = ["A", "B", "C", "D", "C"]
for line in df:
    score = sum(1 for i, j in zip(line, key) if i == j)
    line.append(score)  # list.append mutates in place and returns None

# Now print the example for clarity
print("I1 I2 I3 I4 I5 Score")
for i in df:
    for j in i:
        print(j, end=" ")
    print()
This prints the following, which is what I'd like to learn to do in Pandas:
I1 I2 I3 I4 I5 Score
A B C D C 5
A B C B C 4
A D D B C 2
A B C C C 4
A B B D C 4
A B C D C 5
Here is my start, but clearly I have a lot to learn:
import pandas as pd
d = {'I1': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'A'},
'I2': {0: 'B', 1: 'B', 2: 'D', 3: 'B', 4: 'B', 5: 'B'},
'I3': {0: 'C', 1: 'C', 2: 'D', 3: 'C', 4: 'B', 5: 'C'},
'I4': {0: 'D', 1: 'B', 2: 'B', 3: 'C', 4: 'D', 5: 'D'},
'I5': {0: 'C', 1: 'C', 2: 'C', 3: 'C', 4: 'C', 5: 'C'}}
df = pd.DataFrame(d)
key = ["A","B","C","D","C"]
df['Score'] = sum ([1 for x,y in zip(df.iloc[:,0:5],key[:]) if x==y])
print (df)
However, it clearly fails:
I1 I2 I3 I4 I5 Score
0 A B C D C 0
1 A B C B C 0
2 A D D B C 0
3 A B C C C 0
4 A B B D C 0
5 A B C D C 0
Thanks for educating me…
What you are looking for is:
df['Score'] = df.eq(key).sum(axis=1) # equivalent to (df == key).sum(axis=1)
print(df)
# Output
I1 I2 I3 I4 I5 Score
0 A B C D C 5
1 A B C B C 4
2 A D D B C 2
3 A B C C C 4
4 A B B D C 4
5 A B C D C 5
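Your DataFrame has 5 columns, and your key is a list (vector) of 5 elements. Pandas broadcasts the list across the rows, comparing each row with it element by element and returning a DataFrame of booleans. Summing the True values across the columns (axis=1) then gives each student's score.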
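To make the intermediate steps visible, here is a minimal sketch that rebuilds the example data and walks through the comparison. The names `responses`, `item_cols`, `correct` and `failed_total` are illustrative and not from the original post. It also shows why the attempt in the question produced 0: iterating over a DataFrame yields its column labels, so `zip` was pairing 'I1', 'I2', … with the key rather than pairing the answers. Selecting the item columns explicitly is a precaution I'm assuming is useful here, since `eq` with a 5-element list needs exactly 5 columns and would raise an error once a Score column has been added.

```python
import pandas as pd

# Illustrative data mirroring the question: 6 students, 5 items.
item_cols = ["I1", "I2", "I3", "I4", "I5"]
responses = pd.DataFrame(
    [list("ABCDC"), list("ABCBC"), list("ADDBC"),
     list("ABCCC"), list("ABBDC"), list("ABCDC")],
    columns=item_cols,
)
key = ["A", "B", "C", "D", "C"]

# Why the original attempt gave 0: iterating a DataFrame yields column
# labels, so this compares "I1" with "A", "I2" with "B", and so on.
failed_total = sum(1 for x, y in zip(responses, key) if x == y)

# Element-wise comparison: the key is matched against the columns,
# giving a boolean DataFrame with one True per correct answer.
correct = responses[item_cols].eq(key)

# Summing across the columns (axis=1) counts the True values per row.
scores = correct.sum(axis=1)
responses["Score"] = scores

print(correct)
print(responses)
```

Because `correct` keeps one boolean per student and item, the same frame can be reused for item-level statistics, for example `correct.mean()` for the proportion of students answering each item correctly.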