I have a dataset like this (this question has been resolved; I had to remove the dataset because it is somewhat sensitive).
My current code is:
ea2 <- ea %>%
  select(
    "ea_no", "ea_actual",
    "incidence_cases2012", "incidence_cases2013", "incidence_cases2014", "incidence_cases2016",
    "cumulative_incidence_2014", "cumulative_incidence_2016"
  ) %>%
  # first pivot: stack the incidence_cases columns
  pivot_longer(
    cols = c("incidence_cases2012", "incidence_cases2013", "incidence_cases2014", "incidence_cases2016"),
    names_to = "year",
    values_to = "incidence_cases"
  ) %>%
  # characters 16-19 of "incidence_casesYYYY" hold the year
  mutate(year = str_sub(year, 16, 19)) %>%
  # second pivot: stack the cumulative_incidence columns
  pivot_longer(
    cols = c("cumulative_incidence_2014", "cumulative_incidence_2016"),
    names_to = "year2",
    values_to = "cumulative_incidence"
  ) %>%
  # characters 22-25 of "cumulative_incidence_YYYY" hold the year
  mutate(year2 = str_sub(year2, 22, 25))
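The problem is that I cannot create a single column "year" that holds the values from the two different variables for the same year within the same ea_no. The output now has two year columns (year and year2), because I simply run pivot_longer twice. The desired output looks like this: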
ea_no  year  cumulative_incidence  incidence
1      2012  xxx                   xxxx
2      2014  NA                    xxxx
3      2016  xxx                   xxxx
My current code produces output like this:
ea_no. year. cumulative_incidence. year2 incidence
1. 2012. xxx 2012 xxxx
1. na 2014. xxxx
2. 2016 xxx na na
2. 2012 xxxx
3. 2014. na 2012 xxxx
3. 2016 xxx 2014 xxxx
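Can anyone help me find a solution? Also, I would appreciate advice on how to extract the year from the variable names more neatly. Right now I use str_sub with fixed positions, but that does not work for variables whose names have different lengths. Thanks a lot!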
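I am not entirely sure this is what you need, because the desired output seems incomplete. The key point is to use the names_pattern argument of pivot_longer. There you define a regex pattern for the new columns with two capture groups:
(incidence_cases|cumulative_incidence), which matches the name prefix and, paired with ".value" in names_to, sends the values into one column named incidence_cases and another named cumulative_incidence, and
(\d+), which matches the digits and creates the new year column.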
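To see what the two capture groups pick up, here is a minimal sketch that applies the same pattern to a few of the column names from the question. It uses stringr::str_match only for inspection; it is an illustration of the regex, not necessarily the mechanism pivot_longer uses internally. The variable names cols and m are made up for this example.

```r
library(stringr)

# Column names copied from the question
cols <- c("incidence_cases2012", "incidence_cases2016",
          "cumulative_incidence_2014", "cumulative_incidence_2016")

# str_match() returns the full match in column 1,
# followed by one column per capture group
m <- str_match(cols, "(incidence_cases|cumulative_incidence)_?(\\d+)")

prefix <- m[, 2]  # group 1: the part that ".value" turns into output column names
year   <- m[, 3]  # group 2: the digits that end up in the year column

# For the side question: the digits can be extracted without fixed positions
year_only <- str_extract(cols, "\\d+")
```

Because the pattern matches on content rather than character positions, it should keep working when the prefixes have different lengths, which is where str_sub with fixed offsets breaks down.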
Solution:
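library(dplyr)
library(tidyr)

ea %>%
  select(matches("ea|incidence")) %>%
  pivot_longer(
    cols = matches("incidence"),
    # ".value" takes the output column name from the first capture group
    names_to = c(".value", "year"),
    # the backslash must be doubled inside an R string literal
    names_pattern = "(incidence_cases|cumulative_incidence)_?(\\d+)"
  )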
# A tibble: 408 × 5
ea_no ea_actual year incidence_cases cumulative_incidence
<dbl> <chr> <chr> <dbl> <dbl>
1 10499003 "" 2012 NA NA
2 10499003 "" 2013 NA NA
3 10499003 "" 2014 NA NA
4 10499003 "" 2016 NA NA
5 10499004 "" 2012 NA NA
6 10499004 "" 2013 NA NA
7 10499004 "" 2014 NA NA
8 10499004 "" 2016 NA NA
9 10499005 "RV01" 2012 0 NA
10 10499005 "RV01" 2013 0.00726 NA
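Since the original dataset was removed from the question, the following is a small reproducible sketch of the same approach on invented data. The tibble toy and all its values are hypothetical; only the column naming scheme comes from the question. The final as.integer conversion is an optional extra, not part of the solution above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: names follow the question, values are invented
toy <- tibble(
  ea_no = c(1, 2),
  incidence_cases2012 = c(0.1, 0.2),
  incidence_cases2014 = c(0.3, 0.4),
  incidence_cases2016 = c(0.5, 0.6),
  cumulative_incidence_2014 = c(1.1, 1.2),
  cumulative_incidence_2016 = c(1.3, 1.4)
)

toy_long <- toy %>%
  pivot_longer(
    cols = matches("incidence"),
    names_to = c(".value", "year"),
    names_pattern = "(incidence_cases|cumulative_incidence)_?(\\d+)"
  ) %>%
  # year is extracted as character; convert it if a numeric year is wanted
  mutate(year = as.integer(year))
```

Each ea_no gets one row per year, and years that exist for only one of the two variables (2012 here) are filled with NA in the other column, which matches the NA for cumulative_incidence in the output above.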