I have received participants' survey answers in a pandas DataFrame:
[['A', 'B', 'C', 'A' ...],
 ['D', 'B', 'B', 'A' ...],
 ...
 ['D', 'C', 'C', 'A' ...]]
I have an answer key vector for the survey:
['D', 'B', 'B', 'A' ...]
I need to get a DataFrame that shows the boolean result of each answer (1 if it matches the key, 0 if not), like:
[[0, 1, 0, 1 ...],
 [1, 1, 1, 1 ...],
 ...
 [1, 0, 0, 1 ...]]
I tried using pd.get_dummies(users_answ, keys), but this seems to be wrong.
You should be able to simply check equality between the DataFrame and the list. The list is aligned with the DataFrame's columns:
import pandas as pd

df = pd.DataFrame([[*'ABCA'],[*'DBBA'],[*'DCCA']])
keys = [*'DBBA']
print(df)
   0  1  2  3
0  A  B  C  A
1  D  B  B  A
2  D  C  C  A
print(keys)
['D', 'B', 'B', 'A']
print(df == keys)
       0      1      2     3
0  False   True  False  True
1   True   True   True  True
2   True  False  False  True
# If you want actual integers instead of booleans
print((df == keys).astype(int))
   0  1  2  3
0  0  1  0  1
1  1  1  1  1
2  1  0  0  1
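A minimal, self-contained sketch of the comparison above (re-creating the same three-row example): pandas treats the list like a single row and compares it with every row, position by position. That means the key list needs exactly one entry per column; a list of the wrong length is not silently misaligned but raises a ValueError.

```python
import pandas as pd

# Same data as above: one row per participant, one column per question.
df = pd.DataFrame([list("ABCA"), list("DBBA"), list("DCCA")])
keys = list("DBBA")

# The list is compared against every row, element by element,
# so it must have one entry per column.
result = (df == keys).astype(int)
print(result.values.tolist())  # [[0, 1, 0, 1], [1, 1, 1, 1], [1, 0, 0, 1]]

# A key list of the wrong length cannot be aligned with the 4 columns.
try:
    df == keys[:3]
except ValueError as exc:
    print("ValueError:", exc)
```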
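The simplest way seems to be the pandas eq function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eq.html#pandas.DataFrame.eq
The key vector has one entry per question, i.e. per column, so it has to be aligned along the columns (axis=1, which is also the default). The whole solution is then a single line (append .astype(int) if you want 0/1 instead of booleans):
users_answ.eq(keys, axis=1)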
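A small sketch of how the `axis` argument of `DataFrame.eq` matters here, using the same three-participant, four-question example as a stand-in for `users_answ`. By default (`axis="columns"`) a list is aligned with the columns, which gives the same result as `==`; with `axis=0` pandas tries to align the list with the row index instead, and since there are 3 rows but 4 keys it raises a ValueError.

```python
import pandas as pd

# Stand-in for users_answ: 3 participants (rows) x 4 questions (columns).
df = pd.DataFrame([list("ABCA"), list("DBBA"), list("DCCA")])
keys = list("DBBA")

# Default axis="columns": each row is compared with keys, like df == keys.
by_columns = df.eq(keys)
assert by_columns.equals(df == keys)

# axis=0 matches keys against the row index (3 rows vs 4 keys), which fails.
try:
    df.eq(keys, axis=0)
except ValueError as exc:
    print("axis=0 failed:", exc)

print(by_columns.astype(int))
```

If the number of participants happened to equal the number of questions, `axis=0` would not raise but would compare each column against the key list, which is not what the question asks for.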
Alternative solution:
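import pandas as pd

# new list that will hold one row of 0/1 results per participant
checked_answ = []
# take each row of the survey answers DataFrame
for r in range(users_answ.shape[0]):
    row = users_answ.iloc[r].tolist()
    # build the 0/1 results for this row
    p = []
    for i in range(len(keys)):
        if keys[i] == row[i]:
            p.append(1)
        else:
            p.append(0)
    checked_answ.append(p)
# turn the list of lists into a DataFrame with the original labels
checked_answ = pd.DataFrame(checked_answ, index=users_answ.index, columns=users_answ.columns)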
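For comparison, the nested loops can be replaced by one vectorized NumPy comparison. This is a sketch, not part of the original answer; it assumes `users_answ` has one column per question in the same order as `keys`, and the three-row DataFrame below is only hypothetical example data. NumPy broadcasting compares the 1-D key array with every row of the 2-D answer array at once.

```python
import numpy as np
import pandas as pd

# Hypothetical example data standing in for the real survey DataFrame.
users_answ = pd.DataFrame([list("ABCA"), list("DBBA"), list("DCCA")])
keys = ["D", "B", "B", "A"]

# Shape (3, 4) == shape (4,): the key array is broadcast across all rows.
matches = users_answ.to_numpy() == np.asarray(keys, dtype=object)

# Wrap the 0/1 result back into a DataFrame with the original labels.
checked_answ = pd.DataFrame(
    matches.astype(int), index=users_answ.index, columns=users_answ.columns
)
print(checked_answ)
```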