How to reshape part of a data set in R



I have a dataset that looks like this:

df <- tibble::tribble(
~subcateg, ~names,
"A00", "Kidney failure",
"A001", "Kidney failure reason1",
"A002", "Kidney failure reason2",
"A003", "Kidney failure reason3",
"B00", "Heart failure",
"B001", "Heart failure reason1",
"B002", "Heart failure reason2",
"B003", "Heart failure reason3",
"B00", "Lung failure",
"B001", "Lung failure reason1",
"B002", "Lung failure reason2",
"B003", "Lung failure reason3",
)

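It holds both categories (3 characters) and subcategories (4 characters) in the same variable, and I need a separate variable that contains the 3-character category. I would like it to look like this: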

df2 <- tibble::tribble(
~subcateg, ~names, ~categ, ~names2,
"A001", "Kidney failure reason1", "A00", "Kidney failure",
"A002", "Kidney failure reason2","A00", "Kidney failure",
"A003", "Kidney failure reason3","A00", "Kidney failure",
"B001", "Heart failure reason1",  "B00", "Heart failure",
"B002", "Heart failure reason2",  "B00", "Heart failure",
"B003", "Heart failure reason3",  "B00", "Heart failure",
"B001", "Lung failure reason1",  "B00", "Lung failure",
"B002", "Lung failure reason2",  "B00", "Lung failure",
"B003", "Lung failure reason3",  "B00", "Lung failure",
)

Any ideas? Thanks a lot!

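We create a grouping variable from the rows where 'subcateg' has exactly 3 characters (nchar), taking the cumulative sum so that each category row starts a new group. Within each group, 'categ' is set to the first element of 'subcateg'. We then drop the first row of each group (slice) and create 'names2' by removing the "reason" substring followed by digits from the 'names' column.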

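library(dplyr)
library(stringr)
df %>% 
  # a 3-character code marks the start of a new category group
  group_by(grp = cumsum(nchar(subcateg) == 3)) %>%  
  # the first row of each group holds the category code
  mutate(categ = first(subcateg)) %>% 
  # drop the category row, unless it is the only row in its group
  slice(if(n() == 1) 1 else -1)  %>% 
  ungroup() %>% 
  select(-grp) %>%
  # strip the trailing " reasonN" to recover the category name
  # (backslashes must be doubled inside an R string)
  mutate(names2 = str_remove(names, "\\s+reason\\d+"))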

Output:

# A tibble: 9 × 4
  subcateg names                  categ names2        
  <chr>    <chr>                  <chr> <chr>         
1 A001     Kidney failure reason1 A00   Kidney failure
2 A002     Kidney failure reason2 A00   Kidney failure
3 A003     Kidney failure reason3 A00   Kidney failure
4 B001     Heart failure reason1  B00   Heart failure 
5 B002     Heart failure reason2  B00   Heart failure 
6 B003     Heart failure reason3  B00   Heart failure 
7 B001     Lung failure reason1   B00   Lung failure  
8 B002     Lung failure reason2   B00   Lung failure  
9 B003     Lung failure reason3   B00   Lung failure  
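To see why the grouping step works, here is a minimal base R sketch of just the cumsum part, using the codes from the question. The names is_header, grp and categ are only illustrative and are not part of the answer above.

```r
# Codes from the question, in their original order
subcateg <- c("A00", "A001", "A002", "A003",
              "B00", "B001", "B002", "B003",
              "B00", "B001", "B002", "B003")

# TRUE on the 3-character category rows, FALSE on the subcategory rows
is_header <- nchar(subcateg) == 3

# The running count of category rows gives every row the id of the
# category block it belongs to
grp <- cumsum(is_header)
print(grp)
# 1 1 1 1 2 2 2 2 3 3 3 3

# Look up the category code of each block by its id
categ <- subcateg[is_header][grp]

# Keep only the subcategory rows, now paired with their category
result <- data.frame(subcateg = subcateg[!is_header],
                     categ    = categ[!is_header])
print(result)
```

Because the group id depends on row order rather than on the code itself, the two B00 blocks stay separate even though they share the same code.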

If the Lung failure category is supposed to start with C rather than B (is that a typo in the question?), another solution is as follows:

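library(tidyr)
library(dplyr)
df %>% 
  # split the code at the first non-zero digit and keep the part before it
  separate(subcateg, "categ", sep = "[1-9]", extra = "drop", remove = FALSE) %>% 
  # look up the category name by joining the data back onto itself
  inner_join(df, by = c("categ" = "subcateg"), suffix = c("", "2")) %>% 
  # keep only the subcategory rows
  filter(!stringr::str_ends(subcateg, "00")) %>% 
  relocate(categ, .after = names)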
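The join-based approach relies on every category code being unique. As a rough illustration of what happens otherwise, the base R sketch below uses a shortened version of the question's data in which B00 still appears twice. It uses merge and substr instead of the tidyr/dplyr calls above, and the names headers, subs and joined are only illustrative.

```r
# Shortened data from the question, with B00 used for two categories
df_small <- data.frame(
  subcateg = c("A00", "A001", "B00", "B001", "B00", "B001"),
  names    = c("Kidney failure", "Kidney failure reason1",
               "Heart failure",  "Heart failure reason1",
               "Lung failure",   "Lung failure reason1"),
  stringsAsFactors = FALSE
)

# Category rows, renamed so they can serve as a lookup table
headers <- df_small[nchar(df_small$subcateg) == 3, ]
names(headers) <- c("categ", "names2")

# Subcategory rows, with the category code taken from the first 3 characters
subs <- df_small[nchar(df_small$subcateg) == 4, ]
subs$categ <- substr(subs$subcateg, 1, 3)

# Join the subcategories to their category names by code
joined <- merge(subs, headers, by = "categ")

print(nrow(subs))    # 3 subcategory rows go in
print(nrow(joined))  # 5 rows come out: each B001 row matches both B00 headers
```

With unique codes (for example C00 for Lung failure) each subcategory row matches exactly one category row, and the row count is preserved.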
