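I'm looking for a more efficient way to create subsets in R. With a dataset where rows = products and columns = time, I want to find the rows (products) where an item first started selling in week 1 and make those a subset. Then I want to do the same for week 2, and so on.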
set.seed(4)
d <- data.frame(
  product = 1:10,
  week1 = sample(0:1, 10, replace = TRUE),
  week2 = sample(0:3, 10, replace = TRUE),
  week3 = sample(0:5, 10, replace = TRUE),
  week4 = sample(0:5, 10, replace = TRUE),
  speed = sample(100:200, 10),
  quality = sample(20:50, 10)
)
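The full data frame is d, so I need to know two things to find each subset: 1) sales in all previous weeks == 0, and 2) sales in the current week are not zero.
No subsets should overlap, because they group products by when they first entered the market.
I found a poor man's way to do this, but I know there must be a better way! The inefficient way: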
subset3 <- d[d$week3 > 0 & d$week2 == 0 & d$week1 == 0, ]
subset4 <- d[d$week4 > 0 & d$week3 == 0 & d$week2 == 0 & d$week1 == 0, ]
Slightly more efficient, but still poor:
subset3 <- d[d$week3 > 0 & d$week2 + d$week1 == 0, ]
subset4 <- d[d$week4 > 0 & d$week3 + d$week2 + d$week1 == 0, ]
It feels like I should be able to do something like this, but it doesn't work:
subset4<-d[d$week4 >0 & sum(d$week1:d$week3) ==0, ]
I don't think ddply or apply will work here, but maybe I'm wrong? The result I need is subsets of d, with all columns, like this:
subset3 =
product week1 week2 week3 week4 speed quality
      2     0     0     5     1   124      42
      3     0     0     3     5   155      45
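For reference, the `sum(d$week1:d$week3)` attempt in the question cannot work as a row filter: `:` builds a sequence from the first element of each column only, and `sum()` collapses everything to a single number instead of one value per row. A minimal sketch of the row-wise version uses `rowSums()`. The `toy` data and the helper name `first_sale_in` are hypothetical, and the sketch assumes sales are never negative, so a row sum of zero means every earlier week was zero.

```r
# Toy data with hypothetical values (not the question's d).
toy <- data.frame(
  product = 1:4,
  week1 = c(0, 1, 0, 0),
  week2 = c(0, 2, 3, 0),
  week3 = c(5, 0, 1, 0),
  week4 = c(1, 4, 2, 2)
)

# Hypothetical helper: rows whose first nonzero week is week k.
# Assumes columns are named week1, week2, ... and sales are >= 0.
first_sale_in <- function(df, k) {
  current <- df[[paste0("week", k)]]
  if (k == 1) {
    return(df[current > 0, ])
  }
  earlier <- df[, paste0("week", seq_len(k - 1)), drop = FALSE]
  # rowSums() gives one total per row, unlike sum()
  df[current > 0 & rowSums(earlier) == 0, ]
}

subset3 <- first_sale_in(toy, 3)
subset4 <- first_sale_in(toy, 4)
```

Calling `lapply(1:4, first_sale_in, df = toy)` would build all four subsets in one go.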
You could write it like this:
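# look only at the week columns, so speed and quality are never mistaken for a sale
d$weekstart <- apply(d[, grep("^week[0-9]+$", names(d))], 1, function(x) which(x > 0)[1])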
This identifies the first week with nonzero sales for each product. You can then use this column to split the dataset, as follows:
result <- split(d,d$weekstart)
You can access each subset like this:
result[[1]]
Changing the 1 in the code above to the starting week you want to access is the equivalent of changing subset1 to subset2, and so on.
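Here is a small self-contained sketch of the same idea on made-up data, mainly to show two edge cases. The `toy` values are hypothetical. Restricting the search to the week columns matters, because columns such as speed are always positive and would otherwise be counted as a "week"; and a product that never sells gets `NA`, which `split()` silently drops.

```r
# Toy data with hypothetical values; product 5 never sells.
toy <- data.frame(
  product = 1:5,
  week1 = c(0, 1, 0, 0, 0),
  week2 = c(0, 2, 3, 0, 0),
  week3 = c(5, 0, 1, 0, 0),
  week4 = c(1, 4, 2, 2, 0),
  speed = c(124, 155, 101, 180, 133)
)

# Positions of the week columns only (excludes product and speed).
week_cols <- grep("^week[0-9]+$", names(toy))

# Index of the first week with sales > 0; NA when there is none.
toy$weekstart <- apply(toy[, week_cols], 1, function(x) which(x > 0)[1])

# One data frame per starting week; rows with NA weekstart are left out.
result <- split(toy, toy$weekstart)

# Look a subset up by the week's name rather than by position.
started_week3 <- result[["3"]]
```

Indexing by name, as in `result[["3"]]`, stays correct even when no product starts in some earlier week, whereas `result[[3]]` simply returns the third group that happens to exist.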
I hope I understood what you want to do. Here is an attempt using the rle function. I apply it to each row (each product).
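# d holds only the week columns here (see the data printed at the end)
ll <- apply(d, 1, function(x) {
  y <- rle(x)
  # runs of zeros; in the output below each run is labelled with the week that follows it
  zero_runs <- y$lengths[y$values == 0]
  if (length(zero_runs) == 0) {
    # no zero week at all: the product sells from week1
    data.frame(nbr = 0, goodweek = 'week1')
  } else {
    data.frame(nbr = zero_runs, goodweek = names(zero_runs))
  }
})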
do.call(rbind,ll)
       nbr goodweek
week3    2    week3  ## 2 bad weeks with 0, then week3 is good: 0 0 value>0
week31   2    week3
3        0    week1
week4    1    week4
week2    1    week2
6        0    week1  ## all weeks are good
week41   1    week4
8        1           ## the last week is bad! I don't know what to return here!
9        0    week1
week21   1    week2
Here I am using your d:
d
   week1 week2 week3 week4
1      0     0     5     2
2      0     0     1     3
3      1     2     3     2
4      1     1     0     1
5      0     3     1     4
6      1     1     2     4
7      1     2     0     4
8      1     3     2     0
9      1     1     5     4
10     0     3     2     2
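Note that the rle output above reports any run of zeros, not only leading ones: rows 4 and 7 come back as week4 even though those products already sold in week 1. If only the first selling week matters, one possible variation (a swapped-in technique, not the rle approach) counts leading zeros with `cumsum()`. The matrix `w` below is hypothetical data in the same shape as the week-only d.

```r
# Hypothetical week-only matrix, same layout as the d printed above.
w <- rbind(
  c(0, 0, 5, 2),  # two leading zeros, first sale in week3
  c(1, 1, 0, 1),  # zero in the middle, but sells from week1
  c(1, 3, 2, 0),  # zero at the end, sells from week1
  c(0, 0, 0, 0)   # never sells
)
colnames(w) <- paste0("week", 1:4)

# cumsum(x > 0) stays at 0 until the first sale, so counting the
# positions where it equals 0 gives the number of leading zero weeks.
leading_zeros <- apply(w, 1, function(x) sum(cumsum(x > 0) == 0))

# The first good week is the column right after the leading zeros;
# indexing past the last column yields NA for products that never sell.
goodweek <- colnames(w)[leading_zeros + 1]

res <- data.frame(nbr = leading_zeros, goodweek = goodweek)
```

With this variant a trailing zero, like the one in row 8 of the answer's output, no longer needs a special case, because only zeros before the first sale are counted.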