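I need to loop over warehouse item data and repeat that data for each of a set of specific months. In my real application I am going through 500k rows, and my function takes 5 minutes to run, which is not practical.
I need a way to do the same thing with some kind of dplyr or apply-family function, ideally sapply or anything else that can output a data frame. Here is sample data to show the concept: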
library(lubridate)
# Item Data Frame
item.df <- data.frame(Item = c("A1","A2","A3","A4","A5"),
                      Gross_Profit = c(15,20,8,18,29),
                      Launch_Date = c("2001-04-01","2001-04-05","2003-11-03",
                                      "2015-02-11","2017-06-15"))
# Months Data Frame
five.months <- seq(ymd(paste(year(today()),month(today()),1))-months(5),
                   ymd(paste(year(today()),month(today()),1))-months(1),
                   by = "month")
five.months.df <- data.frame(Month_Floor = five.months)
# Function to copy Item Data for each Month
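repeat.item <- function(char.item, frame.months){
  df.item <- NULL
  for(i in 1:nrow(char.item)){
    # Repeat this item's name (column 1) and launch date (column 3) once per month
    Item <- rep(char.item[i,1], nrow(frame.months))
    Launch_Date <- rep(char.item[i,3], nrow(frame.months))
    # Attach the repeated item columns to a copy of the months frame
    df.col <- frame.months
    df.col <- cbind(df.col, Item, Launch_Date)
    # Append this item's block of rows to the accumulated result
    df.item <- rbind(df.item, df.col)
  }
  return(df.item)
}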
# Result
copied.df <- repeat.item(item.df,five.months.df)
Here are the resulting variables:
> item.df
Item Gross_Profit Launch_Date
1 A1 15 2001-04-01
2 A2 20 2001-04-05
3 A3 8 2003-11-03
4 A4 18 2015-02-11
5 A5 29 2017-06-15
> five.months.df
Month_Floor
1 2017-03-01
2 2017-04-01
3 2017-05-01
4 2017-06-01
5 2017-07-01
> copied.df
Month_Floor Item Launch_Date
1 2017-03-01 A1 2001-04-01
2 2017-04-01 A1 2001-04-01
3 2017-05-01 A1 2001-04-01
4 2017-06-01 A1 2001-04-01
5 2017-07-01 A1 2001-04-01
6 2017-03-01 A2 2001-04-05
7 2017-04-01 A2 2001-04-05
8 2017-05-01 A2 2001-04-05
9 2017-06-01 A2 2001-04-05
10 2017-07-01 A2 2001-04-05
11 2017-03-01 A3 2003-11-03
12 2017-04-01 A3 2003-11-03
13 2017-05-01 A3 2003-11-03
14 2017-06-01 A3 2003-11-03
15 2017-07-01 A3 2003-11-03
16 2017-03-01 A4 2015-02-11
17 2017-04-01 A4 2015-02-11
18 2017-05-01 A4 2015-02-11
19 2017-06-01 A4 2015-02-11
20 2017-07-01 A4 2015-02-11
21 2017-03-01 A5 2017-06-15
22 2017-04-01 A5 2017-06-15
23 2017-05-01 A5 2017-06-15
24 2017-06-01 A5 2017-06-15
25 2017-07-01 A5 2017-06-15
I think you can use the built-in merge function:
copied.df = merge(five.months.df, item.df, by=NULL);
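To see what by=NULL does in isolation, here is a minimal sketch on two tiny made-up data frames. The names months.demo, items.demo and crossed are hypothetical and not from the question; the point is only that every month row gets paired with every item row.

```r
# Hypothetical toy inputs (not the question's data)
months.demo <- data.frame(Month_Floor = as.Date(c("2017-03-01", "2017-04-01")))
items.demo  <- data.frame(Item = c("A1", "A2", "A3"))

# by = NULL means "no key columns", so merge returns the Cartesian product
crossed <- merge(months.demo, items.demo, by = NULL)

# One row per (month, item) pair: 2 months x 3 items
nrow(crossed) == nrow(months.demo) * nrow(items.demo)

# Rows of the first argument vary fastest, so all months for the
# first item come first, then all months for the second item, etc.
print(crossed)
```

Passing the months frame first is what gives the same item-by-item row order as the loop in the question.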
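With by=NULL, merge performs a cross join between the two data frames. If you don't need all the columns (as in your example), you can apply subset before the cross join, which should improve performance: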
copied.df = merge(five.months.df, subset(item.df, select=c("Item", "Launch_Date")), by=NULL);
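If you would rather not go through merge at all, another option is to build the row indices yourself with rep and index each data frame once. This is a different technique from the merge call above, offered only as a sketch: the sample data is a cut-down, fixed-date version of the question's, and item.idx and month.idx are names I made up. I have not timed it against merge on 500k rows, so please benchmark it on your own data.

```r
# Cut-down sample data with the same columns as in the question
item.df <- data.frame(Item = c("A1", "A2", "A3"),
                      Gross_Profit = c(15, 20, 8),
                      Launch_Date = c("2001-04-01", "2001-04-05", "2003-11-03"))
five.months.df <- data.frame(
  Month_Floor = seq(as.Date("2017-03-01"), by = "month", length.out = 5)
)

n.items  <- nrow(item.df)
n.months <- nrow(five.months.df)

# Each item index is repeated once per month: 1,1,1,1,1,2,2,...
item.idx  <- rep(seq_len(n.items), each = n.months)
# The month indices cycle once per item: 1,2,3,4,5,1,2,...
month.idx <- rep(seq_len(n.months), times = n.items)

# Build the result in one step instead of growing it inside a loop
copied.df <- data.frame(
  Month_Floor = five.months.df$Month_Floor[month.idx],
  Item        = item.df$Item[item.idx],
  Launch_Date = item.df$Launch_Date[item.idx]
)

print(head(copied.df))
```

The row order matches the loop version: all months for the first item, then all months for the second, and so on.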