R - Assign a score based on the presence of certain words in a column



I have a data frame in which one column reports the components of a meal, for example:

--------------------------------
| ID | Component               |
--------------------------------
| 1  | Vegetables              |
| 2  | Pasta                   |
| 3  | Pasta, Vegetables       |
| 4  | Pulses, Vegetables      |
| 5  | Meat, Pasta, Vegetables |
| 6  | Meat, Vegetables        |
| 7  | Pulses                  |
| 8  | Meat                    |
--------------------------------

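I would like to add an extra column that gives each participant a score. If their meal contains pasta, they should get 1; if not, they should get 0. So participants 2, 3 and 5 score 1, and everyone else scores 0.

Is there code that would let me apply this to the term "pasta"?

Any help would be greatly appreciated! Thanks.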

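We can use grepl to match the substring "Pasta". It returns a logical vector, which can be converted to binary (0/1) with as.integer or with the unary + operator: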

df1$meal_score <- +(grepl('Pasta', df1$Component))
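
As a minimal, self-contained sketch of the line above, the following rebuilds the example data by hand (assumed to match the question's table) and shows the intermediate logical vector along with both conversions. The names has_pasta, score_plus and meal_score_ci are illustrative only, and the ignore.case = TRUE variant is an optional addition for the lowercase "pasta" spelling mentioned in the question.

```r
# Rebuild the example data (assumed to match the question's table)
df1 <- data.frame(
  ID = 1:8,
  Component = c("Vegetables", "Pasta", "Pasta, Vegetables",
                "Pulses, Vegetables", "Meat, Pasta, Vegetables",
                "Meat, Vegetables", "Pulses", "Meat"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE value per row
has_pasta <- grepl("Pasta", df1$Component)

# Two equivalent ways to turn the logical vector into 0/1 integers
df1$meal_score <- as.integer(has_pasta)
score_plus <- +has_pasta

# Optional: match "pasta" regardless of capitalisation
df1$meal_score_ci <- as.integer(
  grepl("pasta", df1$Component, ignore.case = TRUE)
)

print(df1)
```

Whether case-insensitive matching is wanted depends on how consistently the Component column is capitalised in the real data.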

A simple solution:

library(tidyverse)
df1 %>% 
mutate(score = +str_detect(Component, "Pasta"))
#>   ID               Component score
#> 1  1              Vegetables     0
#> 2  2                   Pasta     1
#> 3  3       Pasta, Vegetables     1
#> 4  4      Pulses, Vegetables     0
#> 5  5 Meat, Pasta, Vegetables     1
#> 6  6        Meat, Vegetables     0
#> 7  7                  Pulses     0
#> 8  8                    Meat     0

Data:

txt <- "ID|Component
1|Vegetables
2|Pasta
3|Pasta, Vegetables
4|Pulses, Vegetables
5|Meat, Pasta, Vegetables
6|Meat, Vegetables
7|Pulses
8|Meat"
df1 <- read.table(text = txt, sep = "|", stringsAsFactors = FALSE, header = TRUE)

You can use grepl inside dplyr's mutate:

library(dplyr)
df |> mutate(score = as.numeric(grepl("Pasta", Component, fixed = TRUE)))

Output:
ID               Component score
1  1              Vegetables     0
2  2                   Pasta     1
3  3       Pasta, Vegetables     1
4  4      Pulses, Vegetables     0
5  5 Meat, Pasta, Vegetables     1
6  6        Meat, Vegetables     0
7  7                  Pulses     0
8  8                    Meat     0

Data:

df <- structure(list(ID = 1:8, Component = c("Vegetables", "Pasta", 
"Pasta, Vegetables", "Pulses, Vegetables", "Meat, Pasta, Vegetables", 
"Meat, Vegetables", "Pulses", "Meat")), class = "data.frame", row.names = c(NA, 
-8L))
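
The fixed argument in the grepl call above makes the pattern a literal string rather than a regular expression. For a plain word like "Pasta" both modes give the same result, but they can differ once the pattern contains regex metacharacters. A small sketch of the difference, using a made-up vector x and the pattern "Pasta." purely for illustration:

```r
# Hypothetical inputs chosen to show the difference
x <- c("Pasta, Vegetables", "Pasta.", "Pastas")

# Regex mode: "." matches any single character
regex_match <- grepl("Pasta.", x)

# Literal mode: "." must be an actual full stop
literal_match <- grepl("Pasta.", x, fixed = TRUE)

print(regex_match)    # TRUE TRUE TRUE
print(literal_match)  # FALSE TRUE FALSE
```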

You can also use the str_detect function together with case_when:

library(stringr)
library(dplyr)
df <- data.frame(
  ID = 1:8,
  Component = c("Vegetables",
                "Pasta",
                "Pasta, Vegetables",
                "Pulses, Vegetables",
                "Meat, Pasta, Vegetables",
                "Meat, Vegetables",
                "Pulses",
                "Meat")) %>%
  mutate(
    # 1 when the component list mentions "Pasta", 0 otherwise
    score = case_when(
      str_detect(Component, "Pasta") ~ 1,
      TRUE ~ 0
    )
  )
> df
ID               Component score
1  1              Vegetables      0
2  2                   Pasta      1
3  3       Pasta, Vegetables      1
4  4      Pulses, Vegetables      0
5  5 Meat, Pasta, Vegetables      1
6  6        Meat, Vegetables      0
7  7                  Pulses      0
8  8                    Meat      0
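
Since the condition has only two branches, the same scoring could arguably be written in base R with ifelse, without loading any packages. This is a sketch of an alternative under that assumption, not a replacement for the answer above; the data frame is rebuilt by hand to match the question's table.

```r
# Example data, assumed to match the question's table
df <- data.frame(
  ID = 1:8,
  Component = c("Vegetables", "Pasta", "Pasta, Vegetables",
                "Pulses, Vegetables", "Meat, Pasta, Vegetables",
                "Meat, Vegetables", "Pulses", "Meat"),
  stringsAsFactors = FALSE
)

# Two-branch condition: 1 when "Pasta" is found, 0 otherwise
df$score <- ifelse(grepl("Pasta", df$Component, fixed = TRUE), 1, 0)

print(df)
```

case_when becomes the more readable choice once there are three or more conditions to score.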
