Creating a dummy variable from ordinal data in R

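I have an ordinal variable with the following categories:

very favorable (1), somewhat favorable (2), somewhat unfavorable (3), very unfavorable (4), don't know (8), refused (9)

I want to turn it into a binary indicator variable:

favorable (1), unfavorable (0)

I want to group "very favorable" and "somewhat favorable" into a new "favorable" outcome coded "1", and group "somewhat unfavorable" and "very unfavorable" into a new "unfavorable" outcome coded "0".

So basically I want to recode "1" = "1", "2" = "1", "3" = "0", and "4" = "0".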

Here is a dplyr solution using case_when(), which is very useful for creating dummy variables.

My starting data looks like this (first six rows):

# A tibble: 6 x 2
  participant category
        <int> <chr>
1           1 somewhat favorable (2)
2           2 very unfavorable (4)
3           3 very favorable (1)
4           4 don't know (8)
5           5 very favorable (1)
6           6 somewhat favorable (2)

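So, basically, when the code detects (1) or (2), it converts the row value to "favorable (1)", and when it detects (3) or (4), to "unfavorable (0)".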

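library(dplyr)
library(stringr)

data %>%
  mutate(category = case_when(
    # Parentheses are escaped so the pattern matches the literal text "(1)", "(2)", etc.
    str_detect(category, "\\([12]\\)") ~ "favorable (1)",
    str_detect(category, "\\([34]\\)") ~ "unfavorable (0)"))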

Because (8) and (9) are not covered by any condition, the code returns them as NA. The final dataset looks like this:

# A tibble: 10 x 2
   participant category
         <int> <chr>
 1           1 favorable (1)
 2           2 unfavorable (0)
 3           3 favorable (1)
 4           4 NA
 5           5 favorable (1)
 6           6 favorable (1)
 7           7 unfavorable (0)
 8           8 unfavorable (0)
 9           9 favorable (1)
10          10 unfavorable (0)
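If a numeric 0/1 column is more convenient than a relabelled string, the same case_when() idea can be sketched as follows. This is only an illustration: the toy data and the column name `favorable` are made up, and the parentheses in the patterns are escaped so that they match the literal "(1)", "(2)" and so on rather than acting as regex groups.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data in the same "label (code)" format as above
data <- tibble(
  participant = 1:6,
  category = c("somewhat favorable (2)", "very unfavorable (4)",
               "very favorable (1)", "don't know (8)",
               "somewhat unfavorable (3)", "refused (9)")
)

recoded <- data %>%
  mutate(favorable = case_when(
    str_detect(category, "\\([12]\\)") ~ 1L,   # literal "(1)" or "(2)"
    str_detect(category, "\\([34]\\)") ~ 0L,   # literal "(3)" or "(4)"
    TRUE ~ NA_integer_                         # (8), (9) and anything else
  ))

recoded$favorable
```

Keeping the original `category` column and writing the dummy to a new column makes it easy to check the recoding against the source labels.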

There are many ways to do this; the simplest one I can think of is to use %in%.

E.g.:

# Convert to character first; otherwise R may coerce existing factors to their integer level codes
data$column_to_recode = as.character(data$column_to_recode)
data$column_to_recode[which(data$column_to_recode %in% c(1, 2))] = 1
data$column_to_recode[which(data$column_to_recode %in% c(3, 4))] = 0
# Anything that is not 0 through 4 becomes NA (or whatever else you want to do with those values)
data$column_to_recode[which(!(data$column_to_recode %in% c(0:4)))] = NA
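A self-contained variant of the same %in% logic, written into a new vector so the original codes are left untouched. The vector `codes` and the name `favorable` are hypothetical and only there to make the sketch runnable.

```r
# Hypothetical survey codes, including "don't know" (8) and "refused" (9)
codes <- c(1, 2, 3, 4, 8, 9, 2, 4)

# Start with everything missing, then fill in the two groups
favorable <- rep(NA_integer_, length(codes))
favorable[codes %in% c(1, 2)] <- 1L   # very / somewhat favorable
favorable[codes %in% c(3, 4)] <- 0L   # somewhat / very unfavorable

# Cross-tabulate old against new codes to check the recoding
table(codes, favorable, useNA = "ifany")
```

Because the result starts out as NA, any code that is not explicitly listed (here 8 and 9) stays missing without a third assignment.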

Then, if you really want extra credit, you can coerce the result to a factor variable, but I find that is usually overkill.

data$column_to_recode = factor(data$column_to_recode,levels=c(0,1),ordered = TRUE)
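The as.character() step in the recoding code matters because calling as.numeric() directly on a factor returns the internal level indices, not the original codes. A small illustration with a made-up factor `f`:

```r
# Hypothetical factor holding survey codes
f <- factor(c(4, 1, 8, 2))

levels(f)                                # levels are stored as sorted strings
wrong <- as.numeric(f)                   # level indices, not the codes
right <- as.numeric(as.character(f))     # the original codes

wrong
right
```

Here `wrong` holds the positions of each value among the sorted levels, while `right` recovers 4, 1, 8, 2.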

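I can't tell from your original question whether numeric codes are fine or whether you want to use strings instead, but the same logic applies, e.g.: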

data$column_to_recode[which(data$column_to_recode %in% c("very favorable (1)", "somewhat favorable (2)"))] = "Favorable"

That should do what you need.
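When recoding strings, a named lookup vector is an alternative to repeated %in% assignments. This is a different technique from the one in the answer, sketched here on the assumption that the labels are spelled exactly as in the question; `lookup` and `responses` are hypothetical names.

```r
# Map each original label to its collapsed category
lookup <- c(
  "very favorable (1)"       = "Favorable",
  "somewhat favorable (2)"   = "Favorable",
  "somewhat unfavorable (3)" = "Unfavorable",
  "very unfavorable (4)"     = "Unfavorable"
)

# Hypothetical responses; "don't know (8)" has no entry in the lookup
responses <- c("somewhat favorable (2)", "very unfavorable (4)", "don't know (8)")

# Indexing by name returns NA for labels that are not in the lookup
recoded <- unname(lookup[responses])
recoded
```

Labels missing from the lookup come back as NA, which matches how the other approaches treat "don't know" and "refused".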
