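I am currently working with a large dataset that records daily data for multiple locations. I would like to summarise the daily data into a single output that gives the maximum warning level for each day (the categories are Red/Yellow/None).
Consider the following setup: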
location = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D",4) , rep("E", 4))
date = rep(c("19991230", "19991231", "20000101", "20000102"), 5)
warning = c("Red", "None", "None", "None", "Yellow", "None", "Red", "None", "Yellow", "Yellow", "None", "Yellow", "None", "None", "None", "None", "Yellow", "None", "None", "None")
data = data.frame(location, date, warning)
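I am trying to create a new column that shows "None" if no warning occurred on a given day and "Yellow" if one or more Yellow warnings occurred, unless one or more "Red" warnings occurred on the same day, in which case "Red" takes priority.
I have considered aggregating by date, but I am not sure which function to apply. I have also tried looping over each date and counting the warnings that are not "None" (using !) to at least narrow things down, but without any luck. Perhaps I need ifelse and a for loop over the dates? My attempts so far, neither of which works: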
aggregate(data, by=date, FUN)
or
data <- data %>%
  group_by(date) %>%
  mutate(day_warning_type = case_when(
    warning != "None" ~ TRUE, TRUE ~ FALSE
  )) %>%
  ungroup()
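I hope someone can at least point me in the right direction. I have not made much progress so far because I am struggling to understand how to work with character variables.
You are right to use group_by. It is probably simpler to create a second dataset summarised by date and then merge it back into the main dataset. See below: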
library(dplyr)

# Summarize each date based on number of Yellow/Red/None warnings
data_sum <- data %>%
  group_by(date) %>%
  summarize(
    day_warning_none = length(which(warning == "None")),
    day_warning_yellow = length(which(warning == "Yellow")),
    day_warning_red = length(which(warning == "Red"))
  ) %>%
  ungroup() %>%
  # Create a summary measure: Red takes priority over Yellow, Yellow over None
  mutate(
    day_warning = case_when(
      day_warning_red > 0 ~ "Red",
      day_warning_yellow > 0 ~ "Yellow",
      TRUE ~ "None"
    )
  )
head(data_sum)
  date     day_warning_none day_warning_yellow day_warning_red day_warning
  <fct>               <int>              <int>           <int> <chr>
1 19991230                1                  3               1 Red
2 19991231                4                  1               0 Yellow
3 20000101                4                  0               1 Red
4 20000102                4                  1               0 Yellow
# Merge back in
data2 <- left_join(data, data_sum, by = "date") %>%
  arrange(date)
head(data2, 10)
   location     date warning day_warning_none day_warning_yellow day_warning_red day_warning
1         A 19991230     Red                1                  3               1         Red
2         B 19991230  Yellow                1                  3               1         Red
3         C 19991230  Yellow                1                  3               1         Red
4         D 19991230    None                1                  3               1         Red
5         E 19991230  Yellow                1                  3               1         Red
6         A 19991231    None                4                  1               0      Yellow
7         B 19991231    None                4                  1               0      Yellow
8         C 19991231  Yellow                4                  1               0      Yellow
9         D 19991231    None                4                  1               0      Yellow
10        E 19991231    None                4                  1               0      Yellow
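As a possible shortcut for the count-then-case_when step, the warning can be treated as an ordered factor so that max() picks the most severe level within each date directly. This is only a sketch: severity, warning_level and data3 are names introduced here for illustration, and the order None < Yellow < Red is an assumption based on the question's description.

```r
library(dplyr)

# Rebuild the toy data from the question (as plain character columns)
location <- rep(c("A", "B", "C", "D", "E"), each = 4)
date <- rep(c("19991230", "19991231", "20000101", "20000102"), 5)
warning <- c("Red", "None", "None", "None",
             "Yellow", "None", "Red", "None",
             "Yellow", "Yellow", "None", "Yellow",
             "None", "None", "None", "None",
             "Yellow", "None", "None", "None")
data <- data.frame(location, date, warning, stringsAsFactors = FALSE)

# Assumed severity order, lowest to highest
severity <- c("None", "Yellow", "Red")

data3 <- data %>%
  # Ordered factor so that max() respects None < Yellow < Red
  mutate(warning_level = factor(warning, levels = severity, ordered = TRUE)) %>%
  group_by(date) %>%
  # The highest level seen on that date is repeated for every row of the date
  mutate(day_warning = as.character(max(warning_level))) %>%
  ungroup() %>%
  select(-warning_level)
```

If further categories are added later, only the severity vector should need to change, provided every value of warning appears in it; values missing from severity would become NA.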
You can count the warnings within each date and create a flag based on those counts:
data %>%
  group_by(date) %>%
  mutate(day_warning_type = case_when(
    sum(warning == "Red") > 0 ~ "Red",
    sum(warning == "Red") == 0 & sum(warning == "Yellow") > 0 ~ "Yellow",
    TRUE ~ "None"
  )) %>%
  ungroup()
# A tibble: 20 x 4
   location date     warning day_warning_type
   <fct>    <fct>    <fct>   <chr>
 1 A        19991230 Red     Red
 2 A        19991231 None    Yellow
 3 A        20000101 None    Red
 4 A        20000102 None    Yellow
 5 B        19991230 Yellow  Red
 6 B        19991231 None    Yellow
 7 B        20000101 Red     Red
 8 B        20000102 None    Yellow
 9 C        19991230 Yellow  Red
10 C        19991231 Yellow  Yellow
11 C        20000101 None    Red
12 C        20000102 Yellow  Yellow
13 D        19991230 None    Red
14 D        19991231 None    Yellow
15 D        20000101 None    Red
16 D        20000102 None    Yellow
17 E        19991230 Yellow  Red
18 E        19991231 None    Yellow
19 E        20000101 None    Red
20 E        20000102 None    Yellow
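The question also asked which function to pass to aggregate. One possible base R sketch, with no dplyr, is to map each warning to a numeric level, take max per date with aggregate, and merge the result back. The names severity, level, day_max and data_base are introduced here for illustration, and the order None < Yellow < Red is assumed from the question.

```r
# Rebuild the toy data from the question (as plain character columns)
location <- rep(c("A", "B", "C", "D", "E"), each = 4)
date <- rep(c("19991230", "19991231", "20000101", "20000102"), 5)
warning <- c("Red", "None", "None", "None",
             "Yellow", "None", "Red", "None",
             "Yellow", "Yellow", "None", "Yellow",
             "None", "None", "None", "None",
             "Yellow", "None", "None", "None")
data <- data.frame(location, date, warning, stringsAsFactors = FALSE)

# Assumed severity order; match() turns each warning into 1, 2 or 3
severity <- c("None", "Yellow", "Red")
level <- match(data$warning, severity)

# One row per date holding the highest level seen that day
day_max <- aggregate(level, by = list(date = data$date), FUN = max)
day_max$day_warning <- severity[day_max$x]  # back to the text label
day_max$x <- NULL

# Attach the daily maximum to every original row
data_base <- merge(data, day_max, by = "date")

# Alternative without a merge: ave() returns one value per original row
data$day_warning <- severity[ave(level, data$date, FUN = max)]
```

On this data, day_max$day_warning comes out as Red, Yellow, Red, Yellow for the four dates in ascending order. Note that merge sorts the result by date, whereas the ave variant keeps the original row order.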