I have a problem that I can't seem to solve efficiently.
Suppose the result of my loop looks like this (the data is made up):
library(tidyverse)
mytib <- tribble(
~year, ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250,)
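That is, I only get results for January, May, August, and November (because data exists only for those dates). However, I would like to carry the values forward: February, March, and April would get January's values, June and July would get May's values, and so on. I can't use "fill" because R doesn't "know" that February, March, etc. exist (I can't "mutate" these months into "mytib").
I came up with a solution, but unfortunately it is inefficient and error-prone. In words, the solution is as follows: create a grid containing the months 1:12 and all unique shop_ids (see the code below), then create separate vectors with the shops I observed in January, May, August, and November.
Then group by month and year (splitting the grid into a list). Then reduce all unique shop IDs to those I actually observed. Join everything back together (using plyr::join_all), then left-join the original tibble onto the reduced grid, and finally carry the observed values forward (.direction = "down"), so that I get a tibble "result", which is what I want.
Although I get what I want, I wonder whether there is a solution that is more efficient and less error-prone than my clumsy approach. Any help or hints would be much appreciated.
P.S. Please don't be too harsh on me, as I'm still new to R.
Here is the full code: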
mytib <- tribble(
~year, ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250,)
grid <- expand.grid(year = 2019,
month = 1:12,
shop_id = unique(mytib$shop_id))
grid
jan2019 <- mytib %>% filter(year == 2019 & month ==01)
jan2019 <- jan2019$shop_id
may2019 <- mytib %>% filter(year == 2019 & month == 05)
may2019 <- may2019$shop_id
aug2019 <- mytib %>% filter(year == 2019 & month == 08)
aug2019 <-aug2019$shop_id
nov2019 <- mytib %>% filter(year == 2019 & month == 11)
nov2019 <- nov2019$shop_id
my_list <- grid %>% group_by(year, month) %>% group_split()
my_list
my_list[[1]] <- my_list[[1]][my_list[[1]]$shop_id %in% jan2019,] ; my_list[[1]]
my_list[[2]] <- my_list[[2]][my_list[[2]]$shop_id %in% jan2019,] ; my_list[[2]]
my_list[[3]] <- my_list[[3]][my_list[[3]]$shop_id %in% jan2019,] ; my_list[[3]]
my_list[[4]] <- my_list[[4]][my_list[[4]]$shop_id %in% jan2019,] ; my_list[[4]]
my_list[[5]] <- my_list[[5]][my_list[[5]]$shop_id %in% may2019,] ; my_list[[5]]
my_list[[6]] <- my_list[[6]][my_list[[6]]$shop_id %in% may2019,] ; my_list[[6]]
my_list[[7]] <- my_list[[7]][my_list[[7]]$shop_id %in% may2019,] ; my_list[[7]]
my_list[[8]] <- my_list[[8]][my_list[[8]]$shop_id %in% aug2019,] ; my_list[[8]]
my_list[[9]] <- my_list[[9]][my_list[[9]]$shop_id %in% aug2019,] ; my_list[[9]]
my_list[[10]]<- my_list[[10]][my_list[[10]]$shop_id %in% aug2019,];my_list[[10]]
my_list[[11]]<- my_list[[11]][my_list[[11]]$shop_id %in% nov2019,];my_list[[11]]
my_list[[12]]<- my_list[[12]][my_list[[12]]$shop_id %in% nov2019,];my_list[[12]]
result <- plyr::join_all(my_list, type="full")
result
result <- left_join(result, mytib, by=c("year", "month", "shop_id"))
result %>% group_by(shop_id) %>% fill(inventory,.direction = "down") %>% print(n=35)
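Your code is fine. Now that you know what each function does, it may be worth rewriting it, since the basic functions I use below already appear in your code. To bring the observed values into the full grid, with missing values where nothing was observed, we can use left_join or right_join: both match rows on the given columns and keep every entry of the first or the second argument, respectively.
After that, we group by shop_id because we want to fill in the values for each shop separately. We then use tidyr::fill() to fill the values, specifying the downward direction (i.e. toward later months, since the rows within each shop are in ascending month order). Finally, we use filter to drop the rows that are still NA.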
library(tidyverse)
mytib <- tribble(
~year, ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250,)
grid <- expand.grid(year = 2019,
month = 1:12,
shop_id = unique(mytib$shop_id))
left_join(grid, mytib, by = c("year" = "year", "month" = "month", "shop_id" = "shop_id")) %>%
  group_by(shop_id) %>%
  fill(inventory, .direction = "down") %>%
  filter(!is.na(inventory))
#> # A tibble: 46 x 4
#> # Groups:   shop_id [5]
#>     year month shop_id inventory
#>    <dbl> <dbl> <chr>       <dbl>
#>  1  2019     1 A             200
#>  2  2019     2 A             200
#>  3  2019     3 A             200
#>  4  2019     4 A             200
#>  5  2019     5 A             250
#>  6  2019     6 A             250
#>  7  2019     7 A             250
#>  8  2019     8 A             300
#>  9  2019     9 A             300
#> 10  2019    10 A             300
#> # ... with 36 more rows
Created on 2021-04-07 by the reprex package (v2.0.0)
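One detail worth knowing about the grid: base expand.grid() converts character vectors to factors by default, so the join above has to coerce shop_id back to character (some dplyr versions warn about this). The following is a minimal sketch, assuming tidyr 1.0.0 or later is installed, that builds the same grid with tidyr::expand_grid(), which leaves shop_id as character. The name result is only illustrative.

```r
library(tidyverse)

mytib <- tribble(
  ~year, ~month, ~shop_id, ~inventory,
  2019,  1, "A", 200,
  2019,  1, "B", 300,
  2019,  1, "C", 240,
  2019,  5, "A", 250,
  2019,  5, "B", 400,
  2019,  5, "D", 400,
  2019,  8, "A", 300,
  2019,  8, "B", 250,
  2019,  8, "C", 200,
  2019, 11, "A", 300,
  2019, 11, "E", 250
)

# expand_grid() keeps shop_id as a character vector (no factor conversion)
grid <- expand_grid(year = 2019,
                    month = 1:12,
                    shop_id = unique(mytib$shop_id))

result <- grid %>%
  left_join(mytib, by = c("year", "month", "shop_id")) %>%
  group_by(shop_id) %>%                      # fill each shop separately
  fill(inventory, .direction = "down") %>%   # carry the last observed value forward
  ungroup() %>%
  filter(!is.na(inventory))                  # drop months before a shop's first observation

result
```

The rows come out ordered by month rather than by shop, but within each shop the months are still ascending, so the fill behaves the same way.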
library(tidyverse)
mytib <- tribble(
~year, ~month, ~shop_id, ~inventory,
2019, 01, "A", 200,
2019, 01, "B", 300,
2019, 01, "C", 240,
2019, 05, "A", 250,
2019, 05, "B", 400,
2019, 05, "D", 400,
2019, 08, "A", 300,
2019, 08, "B", 250,
2019, 08, "C", 200,
2019, 11, "A", 300,
2019, 11, "E", 250)
grid <- expand.grid(year = 2019,
month = 1:12,
shop_id = unique(mytib$shop_id))
grid %>%
  left_join(mytib) %>%                       # with no `by`, left_join matches on all shared columns
  group_by(shop_id) %>%                      # apply the following step to each shop_id separately
  fill(inventory, .direction = "down") %>%   # rows are in ascending time order, so "down" carries the most recent past value forward
  ungroup() %>%                              # remove the grouping
  filter(!is.na(inventory))                  # months before a shop's first observation stay NA, so drop them
# A tibble: 46 x 4
    year month shop_id inventory
   <dbl> <dbl> <chr>       <dbl>
 1  2019     1 A             200
 2  2019     2 A             200
 3  2019     3 A             200
 4  2019     4 A             200
 5  2019     5 A             250
 6  2019     6 A             250
 7  2019     7 A             250
 8  2019     8 A             300
 9  2019     9 A             300
10  2019    10 A             300
# … with 36 more rows
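Note that the join-and-fill approach keeps carrying a shop's last value forward even in months where the code in the question drops the shop (for example, shop C in May to July, or shop B in November and December). That is why it returns 46 rows rather than the 34 rows the question's code should produce. If the question's exact result is needed, the following is a minimal sketch, not a definitive implementation: for each month it looks up the most recent observed month and copies that month's rows. The names obs_months and snapshot are made up for illustration, and the sketch assumes a single year (2019) whose first observation is in January.

```r
library(tidyverse)

mytib <- tribble(
  ~year, ~month, ~shop_id, ~inventory,
  2019,  1, "A", 200,
  2019,  1, "B", 300,
  2019,  1, "C", 240,
  2019,  5, "A", 250,
  2019,  5, "B", 400,
  2019,  5, "D", 400,
  2019,  8, "A", 300,
  2019,  8, "B", 250,
  2019,  8, "C", 200,
  2019, 11, "A", 300,
  2019, 11, "E", 250
)

# Months that actually have observations: 1, 5, 8, 11
obs_months <- sort(unique(mytib$month))

# For every calendar month, copy the rows of the most recent observed month.
# Assumption: every month m has at least one observed month <= m.
snapshot <- bind_rows(lapply(1:12, function(m) {
  src <- max(obs_months[obs_months <= m])   # latest observed month not after m
  mytib %>%
    filter(month == src) %>%
    mutate(month = m)
}))

snapshot
```

Unlike the grid-based approach, this never creates rows for shops that were absent from the most recent observation, so no filtering step is needed afterwards.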
Here is an approach using tidyr::complete:
library(tidyverse)
mytib %>%
  complete(month = 1:12, year, shop_id) %>%
  group_by(shop_id) %>%
  fill(inventory, .direction = "down") %>%
  mutate(inventory = replace_na(inventory, 0))
## A tibble: 60 x 4
## Groups:   shop_id [5]
#   month  year shop_id inventory
#   <dbl> <dbl> <chr>       <dbl>
# 1     1  2019 A             200
# 2     1  2019 B             300
# 3     1  2019 C             240
# 4     1  2019 D               0
# 5     1  2019 E               0
# 6     2  2019 A             200
# 7     2  2019 B             300
# 8     2  2019 C             240
# 9     2  2019 D               0
#10     2  2019 E               0
## … with 50 more rows
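The complete() call above yields all 60 month-by-shop combinations and sets the months before a shop's first observation to 0. If those months should be dropped instead, as in the left_join answers, a minimal sketch under that assumption swaps replace_na() for filter(). The names filled, with_zeros, and dropped are illustrative only.

```r
library(tidyverse)

mytib <- tribble(
  ~year, ~month, ~shop_id, ~inventory,
  2019,  1, "A", 200,
  2019,  1, "B", 300,
  2019,  1, "C", 240,
  2019,  5, "A", 250,
  2019,  5, "B", 400,
  2019,  5, "D", 400,
  2019,  8, "A", 300,
  2019,  8, "B", 250,
  2019,  8, "C", 200,
  2019, 11, "A", 300,
  2019, 11, "E", 250
)

# Add the missing month/shop combinations, then carry values forward per shop
filled <- mytib %>%
  complete(year, month = 1:12, shop_id) %>%
  group_by(shop_id) %>%
  fill(inventory, .direction = "down") %>%
  ungroup()

# Variant 1: keep every combination, treating "not yet observed" as 0
with_zeros <- filled %>% mutate(inventory = replace_na(inventory, 0))

# Variant 2: drop the months before a shop's first observation
dropped <- filled %>% filter(!is.na(inventory))

nrow(with_zeros)
nrow(dropped)
```

Which variant fits depends on whether an unobserved shop should count as having zero inventory or as not existing yet.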