R Tidyverse: carry forward values for months that do not exist in the data



I have a problem that I can't seem to solve efficiently.

Suppose the result of my loop looks like this (the data are made up).

library(tidyverse)
mytib <- tribble(
~year,  ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250,)

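That is, I only get results for January, May, August and November (because data exist only for those dates). However, I want to carry the values forward: February, March and April would get January's values, June and July would get May's values, and so on. I can't use `fill` because R doesn't "know" that February, March, etc. exist (and I can't `mutate` those months into `mytib`).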
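To illustrate why `fill` alone does not help here, the following is a minimal sketch on a made-up two-row tibble (the name `toy` and its values are illustrative only, not part of the real data). `tidyr::fill()` replaces explicit `NA` values in existing rows; it does not create rows for months that are missing altogether.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: shop A observed only in months 1 and 5
toy <- tribble(
  ~month, ~shop_id, ~inventory,
  1, "A", 200,
  5, "A", 250
)

# fill() only touches NA cells in rows that already exist.
# Months 2-4 are implicitly missing (no rows), so nothing changes.
filled <- fill(toy, inventory, .direction = "down")

nrow(filled)  # still 2 rows: no rows were added for months 2-4
```

The missing months therefore have to be made explicit (as rows with `NA` inventory) before any filling can happen.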

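I came up with a solution, but unfortunately it is inefficient and error-prone. In words, it works as follows: create a grid containing the months 1:12 and all unique shop_ids (see the code below), then create separate vectors holding the shops I observed in January, May, August and November.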

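Then group by month and year (splitting the grid into a list) and reduce the unique shop ids in each month to those I actually observed. Join all the pieces back together (using plyr::join_all), left join the original tibble to the reduced grid, and finally carry the observed values forward (.direction = "down"), so that I end up with the tibble "result", which is what I want.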

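Although I get what I want, I wonder whether there is a solution that is more efficient and less error-prone than my clumsy approach. Any help or hints would be greatly appreciated.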

PS: Please don't be too hard on me, as I'm still quite new to R.

Here is the full code:

mytib <- tribble(
~year,  ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250,)

grid <- expand.grid(year = 2019, 
month = 1:12, 
shop_id = unique(mytib$shop_id))

grid
jan2019 <- mytib %>% filter(year == 2019 & month ==01)
jan2019 <- jan2019$shop_id
may2019 <- mytib %>% filter(year == 2019 & month == 05)
may2019 <- may2019$shop_id
aug2019 <- mytib %>% filter(year == 2019 & month == 08)
aug2019  <-aug2019$shop_id
nov2019 <- mytib %>% filter(year == 2019 & month == 11)
nov2019 <- nov2019$shop_id

my_list <- grid %>% group_by(year, month) %>% group_split()
my_list

my_list[[1]] <- my_list[[1]][my_list[[1]]$shop_id %in% jan2019,] ; my_list[[1]]
my_list[[2]] <- my_list[[2]][my_list[[2]]$shop_id %in% jan2019,] ; my_list[[2]]
my_list[[3]] <- my_list[[3]][my_list[[3]]$shop_id %in% jan2019,] ; my_list[[3]]
my_list[[4]] <- my_list[[4]][my_list[[4]]$shop_id %in% jan2019,] ; my_list[[4]]
my_list[[5]] <- my_list[[5]][my_list[[5]]$shop_id %in% may2019,] ; my_list[[5]]
my_list[[6]] <- my_list[[6]][my_list[[6]]$shop_id %in% may2019,] ; my_list[[6]]
my_list[[7]] <- my_list[[7]][my_list[[7]]$shop_id %in% may2019,] ; my_list[[7]]
my_list[[8]] <- my_list[[8]][my_list[[8]]$shop_id %in% aug2019,] ; my_list[[8]]
my_list[[9]] <- my_list[[9]][my_list[[9]]$shop_id %in% aug2019,] ; my_list[[9]]
my_list[[10]]<- my_list[[10]][my_list[[10]]$shop_id %in% aug2019,];my_list[[10]]
my_list[[11]]<- my_list[[11]][my_list[[11]]$shop_id %in% nov2019,];my_list[[11]]
my_list[[12]]<- my_list[[12]][my_list[[12]]$shop_id %in% nov2019,];my_list[[12]]
result <- plyr::join_all(my_list, type="full")
result
result <- left_join(result, mytib, by=c("year", "month", "shop_id"))
result %>% group_by(shop_id) %>% fill(inventory,.direction =  "down") %>% print(n=35)

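Your code is fine, and now that you know what each function does it may be worth rewriting, since the basic functions I use already appear in your code. To add rows for all the month and shop combinations that are missing from the data, we can use the left_join or right_join functions; both match rows on the given key columns, and they keep all entries from the first or the second argument, respectively.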

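After that, we group by shop_id because we want to fill the values for each shop separately. We then fill the values with tidyr::fill(), specifying that they should be filled downwards (i.e. forward in time, since the months are in ascending order). Finally, we use filter to drop the rows that are still NA.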

library(tidyverse)

mytib <- tribble(
~year,  ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250,)

grid <- expand.grid(year = 2019, 
month = 1:12, 
shop_id = unique(mytib$shop_id))
left_join(grid, mytib, by = c("year" = "year", "month" = "month", "shop_id" = "shop_id")) %>%
group_by(shop_id) %>% 
fill(inventory, .direction = "down") %>% 
filter(!is.na(inventory))
#> # A tibble: 46 x 4
#> # Groups:   shop_id [5]
#>     year month shop_id inventory
#>    <dbl> <dbl> <chr>       <dbl>
#>  1  2019     1 A             200
#>  2  2019     2 A             200
#>  3  2019     3 A             200
#>  4  2019     4 A             200
#>  5  2019     5 A             250
#>  6  2019     6 A             250
#>  7  2019     7 A             250
#>  8  2019     8 A             300
#>  9  2019     9 A             300
#> 10  2019    10 A             300
#> # ... with 36 more rows

Created on 2021-04-07 by the reprex package (v2.0.0)

library(tidyverse)
mytib <- tribble(
~year,  ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250)
grid <- expand.grid(year = 2019, 
month = 1:12, 
shop_id = unique(mytib$shop_id))
grid %>% 
left_join(mytib) %>% # The key columns share the same names, so left_join matches on them automatically
group_by(shop_id) %>% # Apply the following step to each shop_id separately
fill(inventory, .direction = "down") %>% # Rows are in ascending month order within each shop, so "down" carries the last observed value forward
ungroup() %>% # Remove the grouping
filter(!is.na(inventory)) # A shop that first appears later has NA for the earlier months; drop those rows
# A tibble: 46 x 4
year month shop_id inventory
<dbl> <dbl> <chr>       <dbl>
1  2019     1 A             200
2  2019     2 A             200
3  2019     3 A             200
4  2019     4 A             200
5  2019     5 A             250
6  2019     6 A             250
7  2019     7 A             250
8  2019     8 A             300
9  2019     9 A             300
10  2019    10 A             300
# … with 36 more rows

Here is an approach using tidyr::complete:

library(tidyverse)
mytib %>%
complete(month = 1:12, year, shop_id) %>%
group_by(shop_id) %>%
fill(inventory, .direction = "down") %>%
mutate(inventory = replace_na(inventory, 0))
## A tibble: 60 x 4
## Groups:   shop_id [5]
#   month  year shop_id inventory
#   <dbl> <dbl> <chr>       <dbl>
# 1     1  2019 A             200
# 2     1  2019 B             300
# 3     1  2019 C             240
# 4     1  2019 D               0
# 5     1  2019 E               0
# 6     2  2019 A             200
# 7     2  2019 B             300
# 8     2  2019 C             240
# 9     2  2019 D               0
#10     2  2019 E               0
## … with 50 more rows
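
One caveat: the join-and-fill approaches above keep a shop in later months even when it was not observed in the most recent snapshot (for example, shop C is carried into May), whereas the code in the question keeps only the shops seen in the latest observed month. If that snapshot behaviour is what is wanted, one possible sketch is to map every month to the most recent observed month and join on that. This is an alternative technique, not one of the answers above; the names `obs_months`, `src_month` and `snapshot` are made up for illustration, and the sketch assumes a single year whose first observed month is month 1.

```r
library(dplyr)
library(tibble)

mytib <- tribble(
  ~year, ~month, ~shop_id, ~inventory,
  2019,  1, "A", 200,
  2019,  1, "B", 300,
  2019,  1, "C", 240,
  2019,  5, "A", 250,
  2019,  5, "B", 400,
  2019,  5, "D", 400,
  2019,  8, "A", 300,
  2019,  8, "B", 250,
  2019,  8, "C", 200,
  2019, 11, "A", 300,
  2019, 11, "E", 250
)

# Months that actually occur in the data: 1, 5, 8, 11
obs_months <- sort(unique(mytib$month))

snapshot <- tibble(year = 2019, month = 1:12) %>%
  # For each calendar month, find the latest observed month that is <= it.
  # Assumption: no calendar month precedes the first observed month.
  mutate(src_month = obs_months[findInterval(month, obs_months)]) %>%
  # Copy that month's whole snapshot of shops and inventories.
  # Recent dplyr versions may warn about a many-to-many relationship here,
  # which is intended in this sketch.
  left_join(mytib, by = c("year", "src_month" = "month")) %>%
  select(year, month, shop_id, inventory)

nrow(snapshot)  # 34
```

With this data, months 1 to 4 contain shops A, B, C; months 5 to 7 contain A, B, D; months 8 to 10 contain A, B, C; and months 11 and 12 contain A and E.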
