I have a dataset of temperature readings over time from two weather stations. It looks like this:
# A tibble: 6 × 7
Station Date Time Temperature Tmin Tmed Tmax
<chr> <date> <time> <dbl> <dbl> <dbl> <dbl>
1 F 2021-10-15 00:11:46 16.8 15.2 17.1 20.4
2 F 2021-10-15 00:41:46 16.5 15.2 17.1 20.4
3 F 2021-10-15 01:11:46 16.2 15.2 17.1 20.4
4 F 2021-10-15 01:41:46 15.6 15.2 17.1 20.4
5 F 2021-10-15 02:11:46 15.9 15.2 17.1 20.4
6 F 2021-10-15 02:41:46 16.1 15.2 17.1 20.4
Here is a reproducible example of the first two days, obtained with dput() (sorry, I know it's a mess):
structure(list(Station = c("F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F"), Date = structure(c(18915,
18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915,
18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915,
18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915,
18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915,
18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915, 18915,
18915, 18915, 18916, 18916, 18916, 18916, 18916, 18916, 18916,
18916, 18916, 18916, 18916, 18916, 18916, 18916, 18916, 18916,
18916, 18916, 18916, 18916, 18916, 18916, 18916, 18916, 18916,
18916, 18916, 18916, 18916, 18916, 18916, 18916, 18916, 18916,
18916, 18916, 18916, 18916, 18916, 18916, 18916, 18916, 18916,
18916, 18916, 18916, 18916, 18916), class = "Date"), Time = structure(c(706,
2506, 4306, 6106, 7906, 9706, 11506, 13306, 15106, 16906, 18706,
20506, 22306, 24106, 25906, 27706, 29506, 31306, 33106, 34906,
36706, 38506, 40306, 42106, 43906, 45706, 47506, 49306, 51106,
52906, 54706, 56506, 58306, 60106, 61906, 63706, 65506, 67306,
69106, 70906, 72706, 74506, 76306, 78106, 79906, 81706, 83506,
85306, 706, 2506, 4306, 6106, 7906, 9706, 11506, 13306, 15106,
16906, 18706, 20506, 22306, 24106, 25906, 27706, 29506, 31306,
33106, 34906, 36706, 38506, 40306, 42106, 43906, 45706, 47506,
49306, 51106, 52906, 54706, 56506, 58306, 60106, 61906, 63706,
65506, 67306, 69106, 70906, 72706, 74506, 76306, 78106, 79906,
81706, 83506, 85306), class = c("hms", "difftime"), units = "secs"),
Temperature = c(16.8, 16.5, 16.2, 15.6, 15.9, 16.1, 16.4,
16.2, 16, 16, 16.2, 16.2, 15.9, 16, 16, 16.4, 16.2, 16.5,
16.1, 16.4, 16.8, 16.6, 18.6, 16.9, 18.6, 19.5, 18.5, 18.5,
20.4, 19.1, 19.8, 19.7, 18.1, 17.4, 17.4, 16.9, 15.8, 16.8,
16.9, 16.8, 17, 15.2, 16.2, 17.4, 18.1, 18.3, 18, 17.9, 17.6,
17.9, 17.7, 17.7, 17.7, 17.8, 18.1, 18.3, 18.1, 16.2, 18,
18.8, 18.6, 19.1, 18.9, 17.9, 16.2, 17.3, 19.3, 20.2, 20.7,
20.9, 22.2, 22.3, 21.2, 21.1, 20.1, 23.3, 21.4, 20.2, 19.8,
18.9, 19.8, 20.1, 20.4, 19.5, 18.8, 18, 17.9, 17.9, 17.8,
18, 17.9, 16.5, 16.8, 16.5, 16.7, 16.7), Tmin = c(15.2, 15.2,
15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2,
15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2,
15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2,
15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2,
15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 16.2, 16.2, 16.2, 16.2,
16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2,
16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2,
16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2,
16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2, 16.2,
16.2, 16.2, 16.2, 16.2), Tmed = c(17.1, 17.1, 17.1, 17.1,
17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1,
17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1,
17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1,
17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1, 17.1,
17.1, 17.1, 17.1, 17.1, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333, 18.8083333333333, 18.8083333333333,
18.8083333333333, 18.8083333333333), Tmax = c(20.4, 20.4,
20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4,
20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4,
20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4,
20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 20.4,
20.4, 20.4, 20.4, 20.4, 20.4, 20.4, 23.3, 23.3, 23.3, 23.3,
23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3,
23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3,
23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3,
23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3, 23.3,
23.3, 23.3, 23.3, 23.3)), row.names = c(NA, -96L), class = c("tbl_df",
"tbl", "data.frame"))
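I would like to add a column that tells me whether the temperature at a given time is close to the daily minimum temperature (here, within 2 degrees above Tmin).
The best way to do this seemed to be the dplyr::between function, so I tried writing it like this: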
TimeTempReprod %>%
group_by(Date, Station) %>%
mutate(y = between(Temperature, Tmin, Tmin + 2))
When I run this code, this is what I get in the console:
Error in `mutate()`:
! Problem while computing `y = dplyr::between(Temperature, Tmin, Tmin + 2)`.
ℹ The error occurred in group 1: Date = 2021-10-15, Station = "F".
Caused by error in `dplyr::between()`:
! `left` must be length 1
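I tried looking for an answer to this problem, but I couldn't find anything elsewhere related to the between function.
I hope the question is understandable, and I apologize if anything is wrong with it. This is my first question on stackexchange after two years of learning, so I still need to learn how to use it properly. Thanks to anyone who takes the time to help!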
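As the error message says, between() expects `left` to be a single value (length 1), and the same applies to `right`. Inside mutate(), however, Tmin refers to the whole vector of values for each group. To fix this, use a function that extracts a single value from that vector. Because Tmin is constant within each Date/Station group, many functions will work, for example min or first: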
TimeTempReprod %>%
group_by(Date, Station) %>%
mutate(y = between(Temperature, min(Tmin), min(Tmin) + 2))
Output:
# A tibble: 96 × 8
# Groups: Date, Station [2]
Station Date Time Temperature Tmin Tmed Tmax y
<chr> <date> <time> <dbl> <dbl> <dbl> <dbl> <lgl>
1 F 2021-10-15 00:11:46 16.8 15.2 17.1 20.4 TRUE
2 F 2021-10-15 00:41:46 16.5 15.2 17.1 20.4 TRUE
3 F 2021-10-15 01:11:46 16.2 15.2 17.1 20.4 TRUE
4 F 2021-10-15 01:41:46 15.6 15.2 17.1 20.4 TRUE
5 F 2021-10-15 02:11:46 15.9 15.2 17.1 20.4 TRUE
6 F 2021-10-15 02:41:46 16.1 15.2 17.1 20.4 TRUE
7 F 2021-10-15 03:11:46 16.4 15.2 17.1 20.4 TRUE
8 F 2021-10-15 03:41:46 16.2 15.2 17.1 20.4 TRUE
9 F 2021-10-15 04:11:46 16 15.2 17.1 20.4 TRUE
10 F 2021-10-15 04:41:46 16 15.2 17.1 20.4 TRUE
# … with 86 more rows
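The first() variant mentioned above can be sketched in the same way. This is a minimal example on a small made-up tibble (the name `toy` and its four rows are assumptions for illustration, standing in for TimeTempReprod); it relies on Tmin being constant within each group, as it is in the question's data.

```r
library(dplyr)

# Hypothetical toy data standing in for TimeTempReprod (not the real readings)
toy <- tibble(
  Station     = "F",
  Date        = as.Date(c("2021-10-15", "2021-10-15", "2021-10-16", "2021-10-16")),
  Temperature = c(16.8, 20.4, 17.6, 23.3),
  Tmin        = c(15.2, 15.2, 16.2, 16.2)
)

res <- toy %>%
  group_by(Date, Station) %>%
  # first(Tmin) picks one value per group, so `left` and `right` have length 1
  mutate(y = between(Temperature, first(Tmin), first(Tmin) + 2)) %>%
  ungroup()

print(res)
```

If Tmin could vary within a group, min(Tmin) and first(Tmin) would no longer agree, so the choice of function would then matter.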
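I had a similar problem, but I didn't want to group my data. I wanted to check whether the value in column A lies between the values in columns B and C.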
df %>%
mutate(is_between = between(A, B, C))
However, this produced a similar error, which is what led me to this thread.
The solution is to perform the calculation row by row:
df %>%
rowwise() %>%
mutate(is_between = between(A, B, C)) %>%
ungroup() # Removes row-wise calculation mode
This produced the expected result.
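As an alternative to rowwise(), which can be slow on large data frames, the same inclusive check can be written with ordinary vectorised comparison operators. The sketch below uses a made-up three-row tibble (the values in `df` are assumptions for illustration); it is not between() itself, but `A >= B & A <= C` applies the same closed-interval test element by element.

```r
library(dplyr)

# Hypothetical example data; column names follow the answer above
df <- tibble(
  A = c(1, 5, 10),
  B = c(0, 6, 10),
  C = c(2, 8, 12)
)

res <- df %>%
  # Vectorised comparison: each row of A is checked against its own B and C
  mutate(is_between = A >= B & A <= C)

print(res)
```

Because the comparison operators work on whole columns, no grouping or ungroup() step is needed here.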