R - Add a column to a data frame showing the most recent description



I spent a few hours today looking for a solution. There are similar threads out there, but none is quite what I need.

Dataset:

Year          <- c(2019, 2020, 2021, 2019, 2020, 2020, 2021, 2021)
Term          <- c("2019_T1", "2020_T1", "2021_T1", "2019_T1", "2020_T1", "2020_T2", "2021_T1", "2021_T2")
Code          <- c(1,1,1,2,2,2,2,2)
Description   <- c("Desc1","Desc1","Desc1", "Desc2", "Desc2", "Desc2", "Desc2_NotRecent","Desc2_Recent")
df            <- data.frame(Year, Term, Code, Description)

This produces the following table:

  Year    Term Code     Description
1 2019 2019_T1    1           Desc1
2 2020 2020_T1    1           Desc1
3 2021 2021_T1    1           Desc1
4 2019 2019_T1    2           Desc2
5 2020 2020_T1    2           Desc2
6 2020 2020_T2    2           Desc2
7 2021 2021_T1    2 Desc2_NotRecent
8 2021 2021_T2    2    Desc2_Recent

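Question: how do I add a column that shows the most recent description for each Code?

I need to find the most recent one based on Term. Maybe this can be done with a simple sort; sorry, I have not been able to figure it out.

It is important that it is the most recent Term value. Here, the most recent Term is 2021_T2. If the first value were picked instead, it could be an old description and confuse stakeholders.

The result I need:

  Year    Term Code     Description  Most_Recent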
1 2019 2019_T1    1           Desc1        Desc1
2 2020 2020_T1    1           Desc1        Desc1
3 2021 2021_T1    1           Desc1        Desc1
4 2019 2019_T1    2           Desc2 Desc2_Recent
5 2020 2020_T1    2           Desc2 Desc2_Recent
6 2020 2020_T2    2           Desc2 Desc2_Recent
7 2021 2021_T1    2 Desc2_NotRecent Desc2_Recent
8 2021 2021_T2    2    Desc2_Recent Desc2_Recent

Many thanks for all the help. Edited to include Robin Gertenbach's simple solution.

df %>%
  group_by(Code) %>%
  dplyr::mutate(Most_Recent = dplyr::last(Description, order_by = Term))

A shorter solution, without joins or conditionals:

df %>%
  group_by(Code) %>%
  mutate(Most_Recent = last(Description, order_by = Term))
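
To see why this works: `dplyr::last(x, order_by)` returns the element of `x` at the position where `order_by` is largest, so the rows do not have to be sorted beforehand. A minimal self-contained sketch using the question's data; the shuffled copy (`shuffled`) is only there for illustration, and the sketch assumes Term strings of the form YYYY_Tn with a single-digit term number, so that they sort correctly as plain text:

```r
library(dplyr)

df <- data.frame(
  Year = c(2019, 2020, 2021, 2019, 2020, 2020, 2021, 2021),
  Term = c("2019_T1", "2020_T1", "2021_T1", "2019_T1",
           "2020_T1", "2020_T2", "2021_T1", "2021_T2"),
  Code = c(1, 1, 1, 2, 2, 2, 2, 2),
  Description = c("Desc1", "Desc1", "Desc1", "Desc2", "Desc2", "Desc2",
                  "Desc2_NotRecent", "Desc2_Recent")
)

# Shuffle the rows to show that the result does not depend on row order.
shuffled <- df[c(8, 3, 5, 1, 7, 2, 6, 4), ]

result <- shuffled %>%
  group_by(Code) %>%
  # Within each Code, take the Description that belongs to the largest Term.
  mutate(Most_Recent = last(Description, order_by = Term)) %>%
  ungroup()

print(result)
```

Every row with Code 2 should end up with Desc2_Recent, even though that row is no longer the last one in the shuffled data.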

library(tidyverse)

Year <- c(2019, 2020, 2021, 2019, 2020, 2020, 2021)
Term <- c("2019_T1", "2020_T1", "2021_T1", "2019_T1", "2020_T1", "2020_T2", "2021_T2")
Code <- c(1, 1, 1, 2, 2, 2, 2)
Description <- c("Desc1", "Desc1", "Desc1", "Desc2", "Desc2", "Desc2", "Desc2_Recent")

# cbind() coerces every column to character, so Year is converted back below
df <- data.frame(cbind(Year, Term, Code, Description))
df <- df %>%
  mutate(Year = Year %>% as.integer())

# One row per Code: the description from the latest Year
most_recent_descriptions <-
  df %>%
  group_by(Code) %>%
  arrange(-Year) %>%
  slice(1) %>%
  transmute(
    Code,
    Most_Recent = Description
  )

# Attach the latest description to every row with the same Code
df %>%
  left_join(most_recent_descriptions)
#> Joining, by = "Code"
#>   Year    Term Code  Description  Most_Recent
#> 1 2019 2019_T1    1        Desc1        Desc1
#> 2 2020 2020_T1    1        Desc1        Desc1
#> 3 2021 2021_T1    1        Desc1        Desc1
#> 4 2019 2019_T1    2        Desc2 Desc2_Recent
#> 5 2020 2020_T1    2        Desc2 Desc2_Recent
#> 6 2020 2020_T2    2        Desc2 Desc2_Recent
#> 7 2021 2021_T2    2 Desc2_Recent Desc2_Recent

Created on 2021-09-13 by the reprex package (v2.0.1)
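
The join above ranks rows by Year alone, which is enough for the seven-row data used in this answer. In the eight-row data from the question, Code 2 has two rows for 2021, so a tie on Year has to be broken by Term. One possible variant is sketched below; it is not the answer's own code, and it assumes Term sorts correctly as text (single-digit term numbers):

```r
library(dplyr)

df <- data.frame(
  Year = c(2019, 2020, 2021, 2019, 2020, 2020, 2021, 2021),
  Term = c("2019_T1", "2020_T1", "2021_T1", "2019_T1",
           "2020_T1", "2020_T2", "2021_T1", "2021_T2"),
  Code = c(1, 1, 1, 2, 2, 2, 2, 2),
  Description = c("Desc1", "Desc1", "Desc1", "Desc2", "Desc2", "Desc2",
                  "Desc2_NotRecent", "Desc2_Recent")
)

# Lookup table with one row per Code, holding the newest description.
most_recent_descriptions <- df %>%
  arrange(Code, Term) %>%   # oldest to newest within each Code
  group_by(Code) %>%
  summarise(Most_Recent = last(Description), .groups = "drop")

# Naming the key explicitly avoids the "Joining, by" message.
result <- df %>%
  left_join(most_recent_descriptions, by = "Code")

print(result)
```

Keeping the lookup table separate can be handy when the same latest descriptions have to be attached to several data frames.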

We can separate the Term values into 'Year' and 'Term', arrange the data by Year and Term, and take the last value for each Code.

library(dplyr)
library(tidyr)
df %>%
  separate(Term, c('Year', 'Term'), sep = '_', convert = TRUE) %>%
  arrange(Code, Year, order(gtools::mixedorder(Term))) %>%
  group_by(Code) %>%
  mutate(Most_Recent = last(Description)) %>%
  ungroup
#   Code  Year Term  Description     Most_Recent 
#  <dbl> <int> <chr> <chr>           <chr>       
#1     1  2019 T1    Desc1           Desc1       
#2     1  2020 T1    Desc1           Desc1       
#3     1  2021 T1    Desc1           Desc1       
#4     2  2019 T1    Desc2           Desc2_Recent
#5     2  2020 T1    Desc2           Desc2_Recent
#6     2  2020 T2    Desc2           Desc2_Recent
#7     2  2021 T1    Desc2_NotRecent Desc2_Recent
#8     2  2021 T2    Desc2_Recent    Desc2_Recent

If you want the Year and Term values back in a single column, you can append unite to the chain above.

...ungroup %>% unite(Term, Year, Term)
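
The `gtools::mixedorder` call starts to matter once term numbers reach two digits, because plain text sorting puts "T10" before "T2". A small base R sketch of the difference; the term labels are made up for illustration (they are not in the question's data), and the numeric part is extracted with `sub` rather than gtools:

```r
terms <- c("T2", "T10", "T1")   # hypothetical labels, for illustration only

# Plain character sort compares the text character by character.
# method = "radix" uses the C locale, so the result is the same on any machine.
lexical <- sort(terms, method = "radix")
print(lexical)   # "T1" "T10" "T2"

# Sorting on the numeric part gives the intended chronological order.
term_number <- as.integer(sub("^T", "", terms))
natural <- terms[order(term_number)]
print(natural)   # "T1" "T2" "T10"
```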

Update after the clarification (see comments):

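library(tidyverse)
df %>% 
  group_by(Code) %>% 
  separate(Term, c("Year", "T"), sep = "_", remove = FALSE) %>% 
  # Flag the row with the highest term number within the latest year;
  # taking max(T) over the whole group could miss it when an earlier
  # year has more terms than the latest one
  mutate(T = parse_number(T),
         Most_recent = ifelse(Year == max(Year) & T == max(T[Year == max(Year)]),
                              Description, NA)
  ) %>% 
  fill(Most_recent, .direction = "up")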

Output:

  Term    Code  Year      T Description     Most_recent 
  <chr>   <chr> <chr> <dbl> <chr>           <chr>       
1 2019_T1 1     2019      1 Desc1           Desc1       
2 2020_T1 1     2020      1 Desc1           Desc1       
3 2021_T1 1     2021      1 Desc1           Desc1       
4 2019_T1 2     2019      1 Desc2           Desc2_Recent
5 2020_T1 2     2020      1 Desc2           Desc2_Recent
6 2020_T2 2     2020      2 Desc2           Desc2_Recent
7 2021_T1 2     2021      1 Desc2_NotRecent Desc2_Recent
8 2021_T2 2     2021      2 Desc2_Recent    Desc2_Recent

First answer: here is a tidyverse approach that gets you your result:

library(dplyr)
library(tidyr)
df %>% 
  group_by(Code) %>% 
  mutate(Year = as.numeric(Year),
         Most_recent = ifelse(Year == max(Year), Description, NA)
  ) %>% 
  fill(Most_recent, .direction = "up")

Output:

   Year Term    Code  Description  Most_recent 
  <dbl> <chr>   <chr> <chr>        <chr>       
1  2019 2019_T1 1     Desc1        Desc1       
2  2020 2020_T1 1     Desc1        Desc1       
3  2021 2021_T1 1     Desc1        Desc1       
4  2019 2019_T1 2     Desc2        Desc2_Recent
5  2020 2020_T1 2     Desc2        Desc2_Recent
6  2020 2020_T2 2     Desc2        Desc2_Recent
7  2021 2021_T2 2     Desc2_Recent Desc2_Recent
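
Both versions of this answer rely on the same idea: put the description only on the latest row, leave the other rows as `NA`, then let `fill(.direction = "up")` copy that value into the rows above it. A toy sketch of that step; the data frames `sorted_rows` and `unsorted_rows` are made up for illustration:

```r
library(tidyr)

# Latest row is last: the upward fill reaches every row.
sorted_rows <- data.frame(id = 1:4, value = c(NA, NA, NA, "latest"))
filled_sorted <- fill(sorted_rows, value, .direction = "up")
print(filled_sorted)

# Latest row is in the middle: the row below it stays NA.
unsorted_rows <- data.frame(id = 1:3, value = c(NA, "latest", NA))
filled_unsorted <- fill(unsorted_rows, value, .direction = "up")
print(filled_unsorted)
```

The second case suggests sorting each group from oldest to newest first, or using `.direction = "updown"`, when the row order is not guaranteed.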
