Copy multiple data frame rows down with sapply or lapply instead of a for loop in R



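I need to go through warehouse item data and repeat each item's data for a specific set of months. In my actual application I am going through 500k rows of data, and my function takes 5 minutes to run, which is not practical.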

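I need a way to do the same thing with dplyr or an apply-style function, preferably sapply or anything else that can output a data frame. Here is sample data to show the concept: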

library(lubridate)
# Item data frame
item.df <- data.frame(Item = c("A1", "A2", "A3", "A4", "A5"),
                      Gross_Profit = c(15, 20, 8, 18, 29),
                      Launch_Date = c("2001-04-01", "2001-04-05", "2003-11-03",
                                      "2015-02-11", "2017-06-15"))
# Months data frame: the first day of each of the previous five months
this.month <- ymd(paste(year(today()), month(today()), 1))
five.months <- seq(this.month - months(5), this.month - months(1), by = "month")
five.months.df <- data.frame(Month_Floor = five.months)
# Function to copy the item data for each month
repeat.item <- function(char.item, frame.months) {
  df.item <- NULL
  for (i in seq_len(nrow(char.item))) {
    # Repeat this item's ID and launch date once per month
    Item <- rep(char.item[i, 1], nrow(frame.months))
    Launch_Date <- rep(char.item[i, 3], nrow(frame.months))
    df.col <- cbind(frame.months, Item, Launch_Date)
    # Append this item's block of rows to the result
    df.item <- rbind(df.item, df.col)
  }
  return(df.item)
}
# Result
copied.df <- repeat.item(item.df, five.months.df)

Here are the resulting variables:

> item.df
  Item Gross_Profit Launch_Date
1   A1           15  2001-04-01
2   A2           20  2001-04-05
3   A3            8  2003-11-03
4   A4           18  2015-02-11
5   A5           29  2017-06-15
> five.months.df
  Month_Floor
1  2017-03-01
2  2017-04-01
3  2017-05-01
4  2017-06-01
5  2017-07-01
> copied.df
   Month_Floor Item Launch_Date
1   2017-03-01   A1  2001-04-01
2   2017-04-01   A1  2001-04-01
3   2017-05-01   A1  2001-04-01
4   2017-06-01   A1  2001-04-01
5   2017-07-01   A1  2001-04-01
6   2017-03-01   A2  2001-04-05
7   2017-04-01   A2  2001-04-05
8   2017-05-01   A2  2001-04-05
9   2017-06-01   A2  2001-04-05
10  2017-07-01   A2  2001-04-05
11  2017-03-01   A3  2003-11-03
12  2017-04-01   A3  2003-11-03
13  2017-05-01   A3  2003-11-03
14  2017-06-01   A3  2003-11-03
15  2017-07-01   A3  2003-11-03
16  2017-03-01   A4  2015-02-11
17  2017-04-01   A4  2015-02-11
18  2017-05-01   A4  2015-02-11
19  2017-06-01   A4  2015-02-11
20  2017-07-01   A4  2015-02-11
21  2017-03-01   A5  2017-06-15
22  2017-04-01   A5  2017-06-15
23  2017-05-01   A5  2017-06-15
24  2017-06-01   A5  2017-06-15
25  2017-07-01   A5  2017-06-15

I think you can use the built-in merge function:

copied.df <- merge(five.months.df, item.df, by = NULL)

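With by = NULL, merge performs a cross join between the two data frames. If you don't need all the columns (as in the example), you can use subset before the cross join, which should improve performance: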

copied.df <- merge(five.months.df, subset(item.df, select = c("Item", "Launch_Date")), by = NULL)
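
To see what the cross join produces, here is a small base R sketch that is not part of the original answer. It uses a shortened, hypothetical copy of the question's data with fixed dates (months.df stands in for five.months.df, so the result does not depend on today's date), and it also builds the same rows by index replication with rep(), which is one loop-free alternative. The names crossed, indexed, n.items and n.months are made up for illustration.

```r
# Hypothetical sample data mirroring the question (fixed dates for reproducibility)
item.df <- data.frame(
  Item = c("A1", "A2", "A3"),
  Gross_Profit = c(15, 20, 8),
  Launch_Date = c("2001-04-01", "2001-04-05", "2003-11-03"),
  stringsAsFactors = FALSE
)
months.df <- data.frame(
  Month_Floor = seq(as.Date("2017-03-01"), as.Date("2017-07-01"), by = "month")
)

# Cross join: by = NULL pairs every month row with every item row
crossed <- merge(months.df, item.df[, c("Item", "Launch_Date")], by = NULL)

# The same rows built by index replication, without a loop:
# months cycle fastest, each item is repeated once per month
n.items <- nrow(item.df)
n.months <- nrow(months.df)
indexed <- data.frame(
  Month_Floor = months.df$Month_Floor[rep(seq_len(n.months), times = n.items)],
  Item = item.df$Item[rep(seq_len(n.items), each = n.months)],
  Launch_Date = item.df$Launch_Date[rep(seq_len(n.items), each = n.months)],
  stringsAsFactors = FALSE
)

# Both approaches give one row per (month, item) pair
print(nrow(crossed))
print(head(crossed))
```

Both versions build the whole result in one step instead of growing a data frame with rbind inside a loop. Whether merge or the rep-based indexing is faster on 500k rows is worth timing on the real data.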
