Given:
import pandas as pd
survey = [('How much do you like apples?', 4),
('How much do you like oranges?', 5),
('How much do like bananas?', 5),
('Why do you like fruits?', "They are the best")]
labels = ['Question', 'Answer']
before= pd.DataFrame.from_records(survey, columns=labels)
It should look like this:
survey = [('How much do you like apples?', 4, "NaN"),
('How much do you like oranges?', 5, "NaN"),
('How much do like bananas?', 5, "NaN"),
('Why do you like fruits?',"NaN", "They are the best")]
labels = ['Question', 'Answer', 'Comments']
after= pd.DataFrame.from_records(survey, columns=labels)
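I'm working with a large survey-response dataset. The problem is that in the Answer column, each response is either a rating from 1 to 5 or a free-text comment (a string). I'm trying to split this column into an Answer column that holds only the numeric ratings (1-5) and a separate column that holds only the comments (strings). Both columns need to live in the existing df. Does anyone know a function that would help me get started?
Thanks.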
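We can use pd.to_numeric with errors='coerce', which converts every value that can't be parsed as a number to NaN. The NaN positions then mark exactly the rows that contain comments: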
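s = pd.to_numeric(before.Answer, errors='coerce')       # numbers stay, strings become NaN
before['Comments'] = before.Answer.where(s.isnull())    # keep the original value only where it wasn't numeric
before['Answer'] = s                                    # overwrite Answer with the numeric version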
Output:
before
Out[199]:
Question Answer Comments
0 How much do you like apples? 4.0 NaN
1 How much do you like oranges? 5.0 NaN
2 How much do like bananas? 5.0 NaN
3 Why do you like fruits? NaN They are the best
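Note that Answer comes back as float (4.0, 5.0) because a float64 column is the only plain NumPy dtype that can hold NaN. If you want to keep integer ratings, and also treat any number outside the 1-5 scale as a comment, the same idea can be wrapped in a small helper. This is a minimal sketch, not part of the answer above: the name split_answers, its parameters, and the rule that non-integer or out-of-range numbers count as comments are all assumptions based on the 1-5 scale described in the question. It uses the nullable "Int64" dtype (pandas 1.0+) so missing ratings show as <NA> instead of forcing floats.

```python
import pandas as pd


def split_answers(df, col="Answer", comment_col="Comments", low=1, high=5):
    """Hypothetical helper: split a mixed column into ratings and comments.

    Whole numbers in [low, high] stay in `col`; every other value (text,
    fractional or out-of-range numbers) is moved to `comment_col`.
    Returns a new DataFrame and leaves `df` unchanged.
    """
    out = df.copy()
    nums = pd.to_numeric(out[col], errors="coerce")  # text -> NaN
    # NaN fails both checks, so text rows are never treated as ratings.
    is_rating = nums.between(low, high) & (nums % 1 == 0)
    # Build the comments first, while `col` still holds the original values.
    out[comment_col] = out[col].where(~is_rating)
    # Nullable integer dtype keeps 4 and 5 as integers next to missing values.
    out[col] = nums.where(is_rating).astype("Int64")
    return out


survey = [('How much do you like apples?', 4),
          ('How much do you like oranges?', 5),
          ('How much do like bananas?', 5),
          ('Why do you like fruits?', "They are the best")]
before = pd.DataFrame.from_records(survey, columns=['Question', 'Answer'])
after = split_answers(before)
print(after)
```

Returning a copy rather than assigning in place is a design choice; if the columns must be added to the existing df, as the question asks, reassign with before = split_answers(before).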