Renaming multiple variables using characters in a single column in R

I have a df:

A       B                       C
NP     All M4                   6
NP     All M4                   8
NP     All FBS                  3
NI     C1_D2                    8
NI     C1D9: PT PI-4, A,B AM1   6
NI     C1D9: PT P3,4 B,E A6     9
NN     W1D5: PRE                2
NN     W1D5: PRE                6
NI     W1D5: PRE                5

A <- c("NP", "NP", "NP", "NI", "NI", "N1", "NN", "NN", "N1")
B <- c("All M4", "All M4", "All FBS", "C1_D2", "C1D9: PT PI-4, A,B AM1", "C1D9: PT P3,4 B,E A6 ", "W1D5: PRE", "W1D5: PRE", "W1D5: PRE")
C <- c("6","8","3","8","6","9","2","6","5")
df <- data.frame(A, B, C)
df

I want to recode the values in column B, then group by columns A and D and get the sum of column C. My current code so far is:

df2 <- df %>% 
mutate(D = case_when(
startsWith(B, "All") ~ "ALL",
startsWith(B, "C1_D") ~ "CASE 1 DEAL 2",
startsWith(B, "C1D9") ~ "CASE 1 DEAL 9",
startsWith(B, "W1D5") ~ "WELL 1 DEAL 5",
)) %>%
group_by(A, D) %>% summaries(C =n())

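I get the error: Problem with mutate() input Visit. x Case 3 (startWith(B,"All"~"All") must be a two-sided formula, not a character vector. Any other, more efficient way of writing this code would be appreciated, as I am not a fan of using base R.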

df2 should look like this:

A   D                 C
NP  ALL               17
NI  CASE 1 DEAL 2     8
NI  CASE 1 DEAL 9     15
NN  WELL 1 DEAL 5     8
NI  WELL 1 DEAL 5     5

Is this what you need?

library(dplyr)
df %>%
  mutate(D = case_when(grepl("^All", B) ~ "ALL",
                       grepl("^C1_D", B) ~ "CASE 1 DEAL 2",
                       grepl("^C1D9", B) ~ "CASE 1 DEAL 9",
                       grepl("^W1D5", B) ~ "WELL 1 DEAL 5")) %>%
  group_by(A, D) %>%
  summarise(C = sum(as.numeric(C)))
# A tibble: 6 x 3
# Groups:   A [4]
A     D                 C
<chr> <chr>         <dbl>
1 N1    CASE 1 DEAL 9     9
2 N1    WELL 1 DEAL 5     5
3 NI    CASE 1 DEAL 2     8
4 NI    CASE 1 DEAL 9     6
5 NN    WELL 1 DEAL 5     8
6 NP    ALL              17
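
For comparison, the startsWith() calls from the question can probably be kept as they are. The following is only a minimal sketch, assuming B is a character column (the data.frame() default since R 4.0) and that the goal is to sum C rather than count rows. It changes two things in the question's code: summaries() becomes summarise(), and n() becomes sum(as.numeric(C)). Note also that the sample code spells two values of A as "N1" where the printed table shows "NI", which is why the result has six rows rather than the five in the expected output; with those two values changed to "NI" the five expected rows should come out.

```r
library(dplyr)

# Sample data exactly as given in the question (including the "N1" values)
A <- c("NP", "NP", "NP", "NI", "NI", "N1", "NN", "NN", "N1")
B <- c("All M4", "All M4", "All FBS", "C1_D2", "C1D9: PT PI-4, A,B AM1",
       "C1D9: PT P3,4 B,E A6 ", "W1D5: PRE", "W1D5: PRE", "W1D5: PRE")
C <- c("6", "8", "3", "8", "6", "9", "2", "6", "5")
df <- data.frame(A, B, C, stringsAsFactors = FALSE)

df2 <- df %>%
  mutate(D = case_when(
    startsWith(B, "All")  ~ "ALL",
    startsWith(B, "C1_D") ~ "CASE 1 DEAL 2",
    startsWith(B, "C1D9") ~ "CASE 1 DEAL 9",
    startsWith(B, "W1D5") ~ "WELL 1 DEAL 5"
  )) %>%
  group_by(A, D) %>%
  # C is stored as character, so convert it before summing
  summarise(C = sum(as.numeric(C)), .groups = "drop")

df2
```
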
  1. Use str_detect from the stringr package to detect the strings
  2. Use group_by and summarise to get the sum of C

library(dplyr)
library(stringr)
df %>% 
  type.convert(as.is = TRUE) %>% 
  mutate(D = case_when(
    str_detect(B, "All") ~ "ALL",
    str_detect(B, "C1_D") ~ "CASE 1 DEAL 2",
    str_detect(B, "C1D9") ~ "CASE 1 DEAL 9",
    str_detect(B, "W1D5") ~ "WELL 1 DEAL 5",
    TRUE ~ NA_character_)) %>%
  group_by(D, A) %>% 
  summarise(C = sum(C)) %>% 
  select(A, D, C)
A     D                 C
<chr> <chr>         <int>
1 NP    ALL              17
2 NI    CASE 1 DEAL 2     8
3 N1    CASE 1 DEAL 9     9
4 NI    CASE 1 DEAL 9     6
5 N1    WELL 1 DEAL 5     5
6 NN    WELL 1 DEAL 5     8
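
One difference from the grepl("^...") version is worth keeping in mind: str_detect(B, "All") matches "All" anywhere in the string, not only at the start. For the data in the question both give the same result. The sketch below uses hypothetical values (not taken from the question) to show where they would diverge, and assumes a stringr version that provides str_starts() (1.4.0 or later).

```r
library(stringr)

# Hypothetical values, not from the question's data
x <- c("All M4", "Not All M4")

anywhere <- str_detect(x, "All")   # pattern may occur anywhere in the string
anchored <- str_detect(x, "^All")  # pattern must occur at the start
starts   <- str_starts(x, "All")   # same check without writing the anchor

data.frame(x, anywhere, anchored, starts)
```

If prefixes are what matter, the anchored form or str_starts() is the safer choice.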

We can create a key/value dataset and do a regex join with fuzzyjoin:

library(dplyr)
library(fuzzyjoin)
keydat <- tibble(B2 = c("All", "C1_D", "C1D9", "W1D5"),
                 D = c("ALL", "CASE 1 DEAL 2", "CASE 1 DEAL 9", "WELL 1 DEAL 5"))
regex_left_join(df, keydat, by = c("B" = "B2")) %>%
  select(-B2) %>%
  group_by(D, A) %>% 
  summarise(C = sum(as.numeric(C)), .groups = 'drop')
# A tibble: 6 x 3
D             A         C
<chr>         <chr> <dbl>
1 ALL           NP       17
2 CASE 1 DEAL 2 NI        8
3 CASE 1 DEAL 9 N1        9
4 CASE 1 DEAL 9 NI        6
5 WELL 1 DEAL 5 N1        5
6 WELL 1 DEAL 5 NN        8
