How to count rows in R that contain a value earlier than another row

I imported data from a CSV into RStudio with read.table. The data's type is "list", and it looks like this:

Client	Goal1	Goal2	Time
123	0	1	9:00
123	1	0	9:15
234	1	0	9:12
234	0	1	9:30

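We can group the data by client, filter with the required logic, and then count the remaining clients with summarise and n_distinct(). It is important to convert the Time column to an hour:minute time format first; lubridate::hm() does that.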

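library(dplyr)
d %>%
  mutate(Time = lubridate::hm(Time)) %>% # parse "H:M" strings into time periods
  group_by(Client) %>%
  # keep clients with a Goal2 row that is later than their Goal1 row
  filter(any(Goal2 == 1 & Time > Time[Goal1 == 1])) %>%
  ungroup() %>%
  summarise(n = n_distinct(Client))
# A tibble: 1 × 1
      n
  <int>
1     1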
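The snippet above assumes `d` already exists and that each client has exactly one Goal1 row. As a rough, self-contained sketch of the same idea, the version below builds the sample data and compares each client's earliest Goal1 time with their latest Goal2 time. The column names `secs`, `first_goal1` and `last_goal2` are made up for illustration, and reading the requirement as "any Goal2 after any Goal1" is an assumption.

```r
library(dplyr)

# Sample data taken from the question
d <- read.table(header = TRUE, text = "Client Goal1 Goal2 Time
123 0 1 9:00
123 1 0 9:15
234 1 0 9:12
234 0 1 9:30")

result <- d %>%
  # hm() parses "H:M" strings; period_to_seconds() turns them into plain
  # numbers, which are safe to pass to min() and max()
  mutate(secs = lubridate::period_to_seconds(lubridate::hm(Time))) %>%
  group_by(Client) %>%
  summarise(
    # earliest time at which Goal1 was reached (NA if never reached)
    first_goal1 = if (any(Goal1 == 1)) min(secs[Goal1 == 1]) else NA_real_,
    # latest time at which Goal2 was reached (NA if never reached)
    last_goal2 = if (any(Goal2 == 1)) max(secs[Goal2 == 1]) else NA_real_
  ) %>%
  # filter() drops rows where the comparison is FALSE or NA
  filter(first_goal1 < last_goal2)

# Number of clients whose Goal2 came after their Goal1
n_in_order <- nrow(result)
```

On the sample data only client 234 reaches Goal1 (9:12) before Goal2 (9:30), so `n_in_order` agrees with the `n` shown above.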

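A few key points here:

Use pivot_longer to gather the different Goals into a single column
Convert Time to an actual time format so that you can work out which time is earlier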

library(tidyverse)
d <-
  read.table(header = TRUE,
             text = "Client Goal1   Goal2   Time
123   0   1   9:00
123   1   0   9:15
234   1   0   9:12
234   0   1   9:30")
d %>%
  pivot_longer(
    starts_with("Goal"),
    names_to = "Goal",
    values_to = "is_goal",
    names_prefix = "Goal"
  ) %>%
  mutate(n_clients = length(unique(Client))) %>% # keep for later as denominator of percentage
  mutate(Goal = as.integer(Goal)) %>% # turn to numeric so you can assess who got both
  filter(is_goal > 0) %>% # remove empty entries
  mutate(Time = lubridate::hm(Time)) %>% # convert to time to calculate what was first
  group_by(Client) %>% # operate per-client
  filter(sum(Goal) == 3) %>% # remove clients who didn't achieve both goals
  mutate(in_order = Time[Goal == 1] < Time[Goal == 2]) %>% # score whether goal 2 was after 1
  ungroup() %>%
  filter(in_order) %>% # remove clients who were not in order
  distinct(Client, n_clients) %>%
  summarise(percentage = 100 * n() / first(n_clients)) # one row: in-order clients as a percentage of all clients
#> # A tibble: 1 x 1
#>   percentage
#>        <dbl>
#> 1         50

Created on 2021-12-28 by the reprex package (v0.3.0)
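To make the pivot_longer step easier to follow, here is a small sketch that runs only that step on the sample data. The object name `long` is hypothetical, and the columns are selected by name rather than with starts_with() purely to keep the example minimal.

```r
library(tidyr)

# Sample data taken from the question
d <- read.table(header = TRUE, text = "Client Goal1 Goal2 Time
123 0 1 9:00
123 1 0 9:15
234 1 0 9:12
234 0 1 9:30")

# Stack the Goal1 and Goal2 columns: each input row becomes two rows,
# one per goal, with the goal number in `Goal` and the 0/1 flag in `is_goal`
long <- pivot_longer(
  d,
  cols = c("Goal1", "Goal2"),
  names_to = "Goal",
  values_to = "is_goal",
  names_prefix = "Goal" # strip the "Goal" prefix so only "1" or "2" remains
)

print(long)
```

The four input rows become eight rows, of which four have `is_goal` equal to 1. Those four are the rows the later `filter(is_goal > 0)` step keeps.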
