R - Grouping different periods with a new variable



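I have a dataset with a column that contains multiple observations within one year. I want to analyze short-term and long-term changes at a later stage. To do this, I need to define a new variable time that shows which date range each observation's date falls in. Reproducible data frame: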

Dataset <- data.frame(
  category = c("tools", "finance", "business", "education", "tools", "education"),
  date = c("2017-05-12", "2018-06-07", "2018-03-28", "2018-05-18", "2018-07-22", "2018-06-03"),
  number_trackers = c(10, 12, 1, 30, 7, 21),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"),
  time = NA  # the new variable, still to be filled in
)

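I have other columns as well, but only date and the new variable time matter for this question. I tried to do this with the following code, but I get an error.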

if (between(DatasetApp$analysis_date, "2018-06-28", "2018-10-28")) {
  DatasetApp$time = "short-term"
} else if (between(DatasetApp$analysis_date, "2018-06-28", "2018-12-28")) {
  DatasetApp$time = "long-term"
} else if (between(DatasetApp$analysis_date, "2017-05-28", "2018-04-28")) {
  DatasetApp$time = "before"
}

Take a look at the cut function. You can do something like this:

Dataset <- data.frame(
  category = c("tools", "finance", "business", "education", "tools", "education"),
  # store the dates as Date objects so they can be cut into intervals
  date = as.Date(c("2017-05-12", "2018-06-07", "2018-03-28", "2018-05-18", "2018-07-22", "2018-06-03")),
  number_trackers = c(10, 12, 1, 30, 7, 21),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"))

# four break points define three consecutive, non-overlapping intervals
Dataset$time <- cut(Dataset$date,
                    breaks = as.Date(c("2017-05-28", "2018-06-28", "2018-10-28", "2018-12-28")),
                    labels = c("before", "short-term", "long-term"))
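To see what the cut call produces, here is a minimal sketch that applies the same breaks to the sample dates on their own. The variable names dates, breaks, period and result are only illustrative. With cut on Date objects the intervals are closed on the left and open on the right by default, and a date that lies before the first break (here 2017-05-12) gets NA rather than a label.

```r
# Sample dates from the question, converted to Date
dates <- as.Date(c("2017-05-12", "2018-06-07", "2018-03-28",
                   "2018-05-18", "2018-07-22", "2018-06-03"))

# Same break points and labels as in the cut() call above
breaks <- as.Date(c("2017-05-28", "2018-06-28", "2018-10-28", "2018-12-28"))
period <- cut(dates, breaks = breaks,
              labels = c("before", "short-term", "long-term"))

# Pair each date with the period it was assigned to
result <- data.frame(date = dates, time = period)
print(result)

# Count rows per period, including dates that fall outside every interval (NA)
print(table(result$time, useNA = "ifany"))
```

If dates outside the breaks should get a label of their own, one option is to extend the breaks vector and add a matching label.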

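By the way, I am not sure whether this is intentional, but the date ranges in your example do not look consistent: the short-term and long-term ranges overlap.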

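If you do want the ranges to overlap, you have to decide on a priority order among them. For instance, in your example the date 2018-07-22 falls in both the short-term and the long-term range.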
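The overlap can be checked directly. This is a small sketch that assumes half-open ranges (start included, end excluded); the names d, in_short and in_long are illustrative.

```r
# The date mentioned above
d <- as.Date("2018-07-22")

# Membership tests for the two ranges from the question
# (assumption: start date included, end date excluded)
in_short <- d >= as.Date("2018-06-28") & d < as.Date("2018-10-28")
in_long  <- d >= as.Date("2018-06-28") & d < as.Date("2018-12-28")

# Both tests are TRUE, so the date belongs to both ranges
print(c(short_term = in_short, long_term = in_long))
```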

Assuming the priority order short-term > long-term > before, you can do the following:

Dataset$time <- NA
# highest priority: short-term
Dataset[Dataset$date >= "2018-06-28" &
        Dataset$date < "2018-10-28" &
        is.na(Dataset$time), "time"] <- "short-term"
# long-term only where no label has been assigned yet
Dataset[Dataset$date >= "2018-06-28" &
        Dataset$date < "2018-12-28" &
        is.na(Dataset$time), "time"] <- "long-term"
# lowest priority: before
Dataset[Dataset$date >= "2017-05-28" &
        Dataset$date < "2018-04-28" &
        is.na(Dataset$time), "time"] <- "before"

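The code above assigns long-term to a row if its date falls in the specified range and the row has not already been labeled short-term. Likewise, before is assigned only if the date falls in its range and the row has not already been labeled short-term or long-term.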
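The same priority rule can also be written as one vectorized expression with nested ifelse calls. This is only a sketch of an alternative, not part of the answer above. The helper in_range is hypothetical, and it assumes the same half-open ranges. Unlike if, which expects a single TRUE or FALSE and is likely the cause of the error in the original attempt, ifelse tests every element of the vector.

```r
# Sample dates from the question
dates <- as.Date(c("2017-05-12", "2018-06-07", "2018-03-28",
                   "2018-05-18", "2018-07-22", "2018-06-03"))

# Hypothetical helper: TRUE where from <= x < to, element by element
in_range <- function(x, from, to) {
  x >= as.Date(from) & x < as.Date(to)
}

# Earlier tests win, which gives short-term > long-term > before
time <- ifelse(in_range(dates, "2018-06-28", "2018-10-28"), "short-term",
        ifelse(in_range(dates, "2018-06-28", "2018-12-28"), "long-term",
        ifelse(in_range(dates, "2017-05-28", "2018-04-28"), "before",
               NA_character_)))

print(data.frame(date = dates, time = time))
```

Dates that match none of the ranges, such as 2018-05-18 here, stay NA, the same as with the step-by-step assignment.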
