I am using dplyr to replace value with NA when a condition is met, but it puts NA in places where it should not.
dput:
df <- structure(list(id = c("USC00231275", "USC00231275", "USC00231275",
"USC00231275", "USC00231275", "USC00231275", "USC00231275", "USC00231275",
"USC00231275", "USC00231275"), element = c("TMAX", "TMIN", "TMAX",
"TMIN", "TMAX", "TMIN", "TMAX", "TMIN", "TMAX", "TMIN"), year = c(1937,
1937, 1937, 1937, 1937, 1937, 1937, 1937, 1937, 1937), month = c(5,
5, 5, 5, 5, 5, 5, 5, 5, 5), day = c(1, 1, 2, 2, 3, 3, 4, 4, 5,
5), date = structure(c(-11933, -11933, -11932, -11932, -11931,
-11931, -11930, -11930, -11929, -11929), class = "Date"), value = c(0,
53.96, 68, 44.96, 62.06, 53.96, 73.04, 53.96, 69.08, 50)), .Names = c("id",
"element", "year", "month", "day", "date", "value"), row.names = c(NA,
10L), class = "data.frame")
data.frame
(Note: the condition is only met in rows 1 and 2)
            id element year month day       date value
1  USC00231275    TMAX 1937     5   1 1937-05-01  0.00
2  USC00231275    TMIN 1937     5   1 1937-05-01 53.96
3  USC00231275    TMAX 1937     5   2 1937-05-02 68.00
4  USC00231275    TMIN 1937     5   2 1937-05-02 44.96
5  USC00231275    TMAX 1937     5   3 1937-05-03 62.06
6  USC00231275    TMIN 1937     5   3 1937-05-03 53.96
7  USC00231275    TMAX 1937     5   4 1937-05-04 73.04
8  USC00231275    TMIN 1937     5   4 1937-05-04 53.96
9  USC00231275    TMAX 1937     5   5 1937-05-05 69.08
10 USC00231275    TMIN 1937     5   5 1937-05-05 50.00
dplyr
df %>%
  group_by(date) %>%
  mutate(
    value = if(value[element == 'TMIN'] >= value[element == 'TMAX'])
      as.numeric(NA) else value
  )
            id element  year month   day       date value
         (chr)   (chr) (dbl) (dbl) (dbl)     (date) (dbl)
1  USC00231275    TMAX  1937     5     1 1937-05-01    NA
2  USC00231275    TMIN  1937     5     1 1937-05-01    NA
3  USC00231275    TMAX  1937     5     2 1937-05-02 68.00
4  USC00231275    TMIN  1937     5     2 1937-05-02 44.96
5  USC00231275    TMAX  1937     5     3 1937-05-03    NA
6  USC00231275    TMIN  1937     5     3 1937-05-03    NA
7  USC00231275    TMAX  1937     5     4 1937-05-04 73.04
8  USC00231275    TMIN  1937     5     4 1937-05-04 53.96
9  USC00231275    TMAX  1937     5     5 1937-05-05 69.08
10 USC00231275    TMIN  1937     5     5 1937-05-05 50.00
Note that only rows 1 and 2 should change, but dplyr also changed rows 5 and 6, even though they do not meet the condition.
The code below should do what you are trying to do:
df %>%
group_by(date) %>%
mutate(new_value = ifelse( ( (value[element == 'TMIN'] >= value[element == 'TMAX']) & element=='TMIN'), NA, value)) %>%
ungroup
As for whether this is a bug, I don't think it is. Looking at just one date where TMIN >= TMAX, you get the following:
df %>%
filter(date == '1937-05-01') %>%
mutate(res = (value[element == 'TMIN'] >= value[element == 'TMAX'])) %>%
mutate(new_value = ifelse( (res & element=='TMIN'), NA, value))
           id element year month day       date value  res new_value
1 USC00231275    TMAX 1937     5   1 1937-05-01  0.00 TRUE         0
2 USC00231275    TMIN 1937     5   1 1937-05-01 53.96 TRUE        NA
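The expression value[element == 'TMIN'] >= value[element == 'TMAX'] yields a single value for the whole group, so within this group it is TRUE for every row, as the res column shows. The code below breaks this down a little, which I hope makes it clearer.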
### Just looking at one date
> df2 <- df %>% filter(date == '1937-05-01')
> df2
           id element year month day       date value
1 USC00231275    TMAX 1937     5   1 1937-05-01  0.00
2 USC00231275    TMIN 1937     5   1 1937-05-01 53.96
### This comparison will be recycled for every element in the group,
### so it will always be TRUE or always FALSE.
> c(df2$value[df2$element == 'TMIN'], df2$value[df2$element == 'TMAX'])
[1] 53.96 0.00
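Because there is only one comparison for the whole group, every row in that group sees the same TRUE or FALSE.
The code that gives the correct result shows how to make the test specific to each row.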
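To make the scalar-versus-vector point concrete, here is a minimal base R sketch using the two values from the 1937-05-01 group. The names cond, scalar_result and vector_result are only illustrative, and the sketch assumes the group holds exactly one TMIN and one TMAX row.

```r
# The two readings for 1937-05-01, copied from the data above.
value   <- c(0, 53.96)
element <- c("TMAX", "TMIN")

# One comparison for the whole group: a logical vector of length 1.
cond <- value[element == "TMIN"] >= value[element == "TMAX"]
print(length(cond))

# `if` consumes that single logical and returns one branch as a whole,
# so the outcome is the same for every row of the group.
scalar_result <- if (cond) NA_real_ else value
print(scalar_result)

# `ifelse` is vectorised: combining the group-level flag with a
# row-level test (element == "TMIN") gives one decision per row.
vector_result <- ifelse(cond & element == "TMIN", NA, value)
print(vector_result)
```

The row-level term is what lets ifelse treat the TMIN and TMAX rows differently. Without it, the single flag is simply recycled to every row of the group.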
One possible final solution is:
df %>%
group_by(date) %>%
mutate(value = ifelse( ( (value[element == 'TMIN'] >= value[element == 'TMAX']) & element=='TMIN'), NA, value)) %>%
ungroup
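Note that this version sets only the TMIN row to NA, whereas the question says rows 1 and 2 should both change. A hedged sketch of both rules side by side follows. It uses a hypothetical two-date subset (small) built from rows 1 to 4 of the data, the column names bad_day, tmin_only and both_rows are made up for illustration, and it assumes every date has exactly one TMIN and one TMAX row.

```r
library(dplyr)

# Hypothetical two-date subset; values copied from rows 1-4 of the question.
small <- data.frame(
  element = c("TMAX", "TMIN", "TMAX", "TMIN"),
  date    = as.Date(c("1937-05-01", "1937-05-01", "1937-05-02", "1937-05-02")),
  value   = c(0, 53.96, 68, 44.96),
  stringsAsFactors = FALSE
)

flagged <- small %>%
  group_by(date) %>%
  # One logical per date, recycled to every row of that date.
  mutate(bad_day = value[element == "TMIN"] >= value[element == "TMAX"]) %>%
  ungroup() %>%
  mutate(
    # Rule used in the answer: blank only the TMIN reading.
    tmin_only = ifelse(bad_day & element == "TMIN", NA, value),
    # Rule described in the question: blank both readings for that date.
    both_rows = ifelse(bad_day, NA, value)
  )

print(flagged)
```

Storing the group-level flag in its own column first keeps the per-date comparison separate from the per-row replacement, which makes it easier to see which rule is being applied.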