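I have a dataset with a column that contains multiple observations over one year. Later I want to analyze short-term and long-term changes. For this, I need to define a new variable time that shows which of several date ranges each observation date falls in. Reproducible data frame: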
Dataset <- data.frame(
  category = c("tools", "finance", "business", "education", "tools", "education"),
  date = c("2017-05-12", "2018-06-07", "2018-03-28", "2018-05-18", "2018-07-22", "2018-06-03"),
  number_trackers = c(10, 12, 1, 30, 7, 21),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com")
)
# time: the new column that should be filled in from date
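I have other columns as well, but only the new variable time and the date column matter for this question. I tried to do it with the following code, but I get an error.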
if (between(DatasetApp$analysis_date, "2018-06-28", "2018-10-28")) {
  DatasetApp$time = "short-term"
} else if (between(DatasetApp$analysis_date, "2018-06-28", "2018-12-28")) {
  DatasetApp$time = "long-term"
} else if (between(DatasetApp$analysis_date, "2017-05-28", "2018-04-28")) {
  DatasetApp$time = "before"
}
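A likely cause of the error, assuming the between() used here is dplyr's or data.table's (both return one TRUE/FALSE per row): if() expects a single condition, so a logical vector with one value per row fails. Since R 4.2.0 this is an error; older versions only warned and used the first element. Even when it runs, DatasetApp$time = "short-term" would set every row to the same label. The sketch below uses a hypothetical two-row dates vector (not the question's full data) to show the length problem and a vectorised ifelse() alternative.

```r
# Hypothetical two-row example, not the question's full data
dates <- as.Date(c("2017-05-12", "2018-07-22"))

# One TRUE/FALSE per row, so the condition has length 2
cond <- dates >= as.Date("2018-06-28") & dates < as.Date("2018-10-28")
length(cond)  # 2

# if (cond) ... would fail here because if() needs a single TRUE/FALSE.
# ifelse() is vectorised and labels each row separately.
label <- ifelse(cond, "short-term", NA)
label  # NA "short-term"
```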
Have a look at the cut function. You can do something like this:
# Store date as class Date so it can be binned by date
Dataset <- data.frame(
  category = c("tools", "finance", "business", "education", "tools", "education"),
  date = as.Date(c("2017-05-12", "2018-06-07", "2018-03-28", "2018-05-18", "2018-07-22", "2018-06-03")),
  number_trackers = c(10, 12, 1, 30, 7, 21),
  c4 = c("url1.com", "ur2.com", "url3.com", "url4.com", "url5.com", "url6.com"))
# cut() places each date in the interval between consecutive breaks;
# 4 breaks give 3 intervals, hence 3 labels
Dataset$time <- cut(Dataset$date,
                    breaks = as.Date(c("2017-05-28", "2018-06-28", "2018-10-28", "2018-12-28")),
                    labels = c("before", "short-term", "long-term"))
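To see what cut() returns for the sample dates, here is a minimal sketch that reuses only the date values from the example. For Date input, cut() closes intervals on the left by default (right = FALSE), so a date that equals a break, such as 2018-06-28, goes into the interval starting there; dates before the first break or on/after the last break become NA. The names brks and boundary are just illustrative.

```r
dates <- as.Date(c("2017-05-12", "2018-06-07", "2018-03-28",
                   "2018-05-18", "2018-07-22", "2018-06-03"))
brks <- as.Date(c("2017-05-28", "2018-06-28", "2018-10-28", "2018-12-28"))
labs <- c("before", "short-term", "long-term")

time <- cut(dates, breaks = brks, labels = labs)
# 2017-05-12 precedes the first break, so it is NA
as.character(time)

# A date exactly on a break joins the interval that starts at that break
boundary <- as.character(cut(as.Date("2018-06-28"), breaks = brks, labels = labs))
boundary  # "short-term"
```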
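By the way, I'm not sure whether this is intentional, but the date ranges in your example don't seem reasonable: the short-term and long-term ranges overlap.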
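If you want the ranges to overlap, you have to decide on a priority among them. For example, in your example the date 2018-07-22 falls into both the short-term and the long-term range.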
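The overlap can be checked directly by testing that one date against both ranges from the question. This sketch assumes each range includes its start date and excludes its end date, matching the code below; the question's between() would include both ends, which does not change the result for this date.

```r
d <- as.Date("2018-07-22")

# Start-inclusive, end-exclusive checks for the two ranges in the question
in_short <- d >= as.Date("2018-06-28") & d < as.Date("2018-10-28")
in_long  <- d >= as.Date("2018-06-28") & d < as.Date("2018-12-28")

c(short_term = in_short, long_term = in_long)  # both TRUE
```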
Assuming the priority order short-term > long-term > before, you can do the following:
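# Start with no label, then fill the ranges in priority order;
# is.na(Dataset$time) stops a later range from overwriting an earlier label
Dataset$time <- NA
Dataset[Dataset$date >= as.Date("2018-06-28") &
        Dataset$date < as.Date("2018-10-28") &
        is.na(Dataset$time), "time"] <- "short-term"
Dataset[Dataset$date >= as.Date("2018-06-28") &
        Dataset$date < as.Date("2018-12-28") &
        is.na(Dataset$time), "time"] <- "long-term"
Dataset[Dataset$date >= as.Date("2017-05-28") &
        Dataset$date < as.Date("2018-04-28") &
        is.na(Dataset$time), "time"] <- "before"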
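The code above assigns long-term to a row if its date falls within the long-term range and it has not already been labelled short-term. Likewise, before is assigned if the date falls within the before range and the row has not already been labelled short-term or long-term.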
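The same first-match-wins idea extends to any number of ranges. Below is a minimal base-R sketch; assign_by_priority() and the rules list are hypothetical names, each range is assumed to include its start date and exclude its end date as in the code above, and entries earlier in rules take priority.

```r
# Hypothetical helper: give each date the label of the first rule it matches
assign_by_priority <- function(dates, rules) {
  out <- rep(NA_character_, length(dates))
  for (r in rules) {
    # Only rows that are still unlabelled can receive this rule's label
    hit <- is.na(out) & dates >= as.Date(r$from) & dates < as.Date(r$to)
    out[hit] <- r$label
  }
  out
}

# Priority order: short-term > long-term > before
rules <- list(
  list(label = "short-term", from = "2018-06-28", to = "2018-10-28"),
  list(label = "long-term",  from = "2018-06-28", to = "2018-12-28"),
  list(label = "before",     from = "2017-05-28", to = "2018-04-28")
)

dates <- as.Date(c("2017-05-12", "2018-06-07", "2018-03-28",
                   "2018-05-18", "2018-07-22", "2018-06-03"))
time <- assign_by_priority(dates, rules)
time
```

Keeping the ranges in a list makes the priority explicit in one place, so reordering or adding a range does not require touching the assignment logic.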