Is there a way to add rows to an R data frame that inherit the previous row's values by default?



Thanks in advance for any answers. This is my first question, so I welcome tips on how to improve it!

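I am pulling data from our SQL Server into a data frame in an R Notebook. The data frame covers clients who undergo quarterly assessments, running from 2015 Q4 through 2018 Q2. The problem is that assessments are not always carried out, so the data has gaps. For example, I currently end up with a data frame like this: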

client name | assessment date | assessment value
client 1    | 2015 Q4         | Green
client 1    | 2018 Q1         | Green
client 2    | 2015 Q4         | Yellow
client 2    | 2016 Q2         | Green
client 2    | 2016 Q4         | Green
client 2    | 2017 Q1         | Yellow

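Basically, I need every client name to have a record for each quarter between 2015 Q4 and 2018 Q2. Where no assessment took place, I will assume the value from the most recent earlier assessment as the default. The data frame should end up looking like this: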

client name | assessment date | assessment value
client 1    | 2015 Q4         | Green
client 1    | 2016 Q1         | Green
client 1    | 2016 Q2         | Green
client 1    | 2016 Q3         | Green
client 1    | 2016 Q4         | Green
client 1    | 2017 Q1         | Green
client 1    | 2017 Q2         | Green
client 1    | 2017 Q3         | Green
client 1    | 2017 Q4         | Green
client 1    | 2018 Q1         | Green
client 1    | 2018 Q2         | Green
client 2    | 2015 Q4         | Yellow
client 2    | 2016 Q1         | Yellow
client 2    | 2016 Q2         | Green
client 2    | 2016 Q3         | Green
client 2    | 2016 Q4         | Green
client 2    | 2017 Q1         | Yellow
client 2    | 2017 Q2         | Yellow
client 2    | 2017 Q3         | Yellow
client 2    | 2017 Q4         | Yellow
client 2    | 2018 Q1         | Yellow
client 2    | 2018 Q2         | Yellow

Thanks!

Following @MrFlick's suggestion, I wanted to give this a try, since I hadn't used expand before:

library(tidyr)
library(dplyr)
library(zoo)

# Read the pipe-delimited sample data from the question
df <- data.table::fread("client name | assessment date | assessment value
client 1    | 2015 Q4         | Green
client 1    | 2018 Q1         | Green
client 2    | 2015 Q4         | Yellow
client 2    | 2016 Q2         | Green
client 2    | 2016 Q4         | Green
client 2    | 2017 Q1         | Yellow")

# Parse the quarter labels into zoo's yearqtr class
df <- df %>% 
  mutate(qtr = as.yearqtr(`assessment date`))

# Build every client/quarter combination between the first and last observed quarter
df2 <- expand(df, client = `client name`,
              qtr = seq(min(qtr), max(qtr), by = 0.25)) %>%
  arrange(client, qtr)

# Join the observed values onto the full grid, then carry the last value forward per client
df2 %>% 
  mutate(qtr = as.character(qtr)) %>%
  left_join(df %>% mutate(qtr = as.character(qtr)),
            by = c('client' = 'client name', 'qtr' = 'qtr')) %>%
  group_by(client) %>%
  fill(`assessment value`) %>%
  select(-`assessment date`)
# A tibble: 20 x 3
# Groups:   client [2]
   client   qtr     `assessment value`
   <chr>    <chr>   <chr>
 1 client 1 2015 Q4 Green
 2 client 1 2016 Q1 Green
 3 client 1 2016 Q2 Green
 4 client 1 2016 Q3 Green
 5 client 1 2016 Q4 Green
 6 client 1 2017 Q1 Green
 7 client 1 2017 Q2 Green
 8 client 1 2017 Q3 Green
 9 client 1 2017 Q4 Green
10 client 1 2018 Q1 Green
11 client 2 2015 Q4 Yellow
12 client 2 2016 Q1 Yellow
13 client 2 2016 Q2 Green
14 client 2 2016 Q3 Green
15 client 2 2016 Q4 Green
16 client 2 2017 Q1 Yellow
17 client 2 2017 Q2 Yellow
18 client 2 2017 Q3 Yellow
19 client 2 2017 Q4 Yellow
20 client 2 2018 Q1 Yellow
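
The fill() step in the pipeline above is what carries the last known value down into the empty quarters, and because the data is grouped by client it should not leak values from one client into the next. Here is a minimal sketch of that step in isolation; the toy data frame and its column names are made up for illustration and are not part of the question's data:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: NA marks a quarter with no assessment.
toy <- tibble::tibble(
  client = c("a", "a", "a", "b", "b", "b"),
  value  = c("Green", NA, NA, NA, "Yellow", NA)
)

# Carry the last non-missing value downward within each client.
carried <- toy %>%
  group_by(client) %>%
  fill(value, .direction = "down") %>%
  ungroup()

print(carried)
```

Client "b" starts with a missing value, and that row stays NA because there is no earlier assessment within the same group to carry forward.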

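I had to use the yearqtr type and then convert it to character so that no information was lost during the join. Note that the sequence only runs from the earliest to the latest quarter present in the data, so the output stops at 2018 Q1 rather than 2018 Q2. There may be a cleaner way to do this, but hopefully it points you in the right direction.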
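
As one possible variation (a sketch, not necessarily the cleanest approach), tidyr's complete() can build the full client/quarter grid and join it in one step, with the reporting window given explicitly so that it reaches 2018 Q2. The renamed columns (client, assessment_date, assessment_value) and the hard-coded window endpoints are my own assumptions for illustration:

```r
library(dplyr)
library(tidyr)
library(zoo)

# Sample data from the question, with syntactic column names (an assumption).
df <- tibble::tribble(
  ~client,    ~assessment_date, ~assessment_value,
  "client 1", "2015 Q4",        "Green",
  "client 1", "2018 Q1",        "Green",
  "client 2", "2015 Q4",        "Yellow",
  "client 2", "2016 Q2",        "Green",
  "client 2", "2016 Q4",        "Green",
  "client 2", "2017 Q1",        "Yellow"
)

# yearqtr stores a quarter as year + (quarter - 1) / 4,
# so stepping by 0.25 advances one quarter at a time.
start <- as.numeric(as.yearqtr("2015 Q4"))
end   <- as.numeric(as.yearqtr("2018 Q2"))
all_quarters <- format(as.yearqtr(seq(start, end, by = 0.25)))

filled <- df %>%
  # Add a row for every client/quarter pair that is missing.
  complete(client, assessment_date = all_quarters) %>%
  # "YYYY Qn" strings sort chronologically, so a plain sort is enough.
  arrange(client, assessment_date) %>%
  group_by(client) %>%
  # Carry the most recent assessment value forward within each client.
  fill(assessment_value, .direction = "down") %>%
  ungroup()

print(filled, n = Inf)
```

Keeping the quarter as a character column throughout avoids the type mismatch in the join, at the cost of relying on the "YYYY Qn" format sorting correctly as text.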
