How to count rows in R that contain a value earlier than another row

I imported data from a CSV into RStudio with read.table. The data's type is "list", and it looks like this:

Client  Goal1  Goal2  Time
123     0      1      9:00
123     1      0      9:15
234     1      0      9:12
234     0      1      9:30

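We can group the data by client, filter with the required logic, and then count the remaining clients with summarise() and n_distinct(). It is important to convert the Time column to an hours:minutes time format first, which we can do with lubridate::hm().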
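To see why the conversion matters, here is a minimal sketch. The two clock times are made up for illustration and are not part of the question's data. Compared as plain text, "9:00" sorts after "10:00" because the comparison goes character by character, whereas the Period objects returned by lubridate::hm() compare in clock order.

```r
library(lubridate)

# Plain strings are compared character by character, so "9" beats "1"
as_text <- "9:00" < "10:00"

# Period objects from hm() are compared by their actual duration
as_period <- hm("9:00") < hm("10:00")

print(as_text)
print(as_period)
```

The first comparison should come out FALSE and the second TRUE, which is why the Time column is converted before any "earlier than" test.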

library(dplyr)
d %>%
  mutate(Time = lubridate::hm(Time)) %>%
  group_by(Client) %>%
  filter(any(Goal2 == 1 & Time > Time[Goal1 == 1])) %>%
  ungroup() %>%
  summarise(n = n_distinct(Client))
# A tibble: 1 × 1
      n
  <int>
1     1
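The filter above relies on every client having exactly one Goal1 row. If that might not hold in the real data, a more defensive variant could look like the sketch below. It is only one possible approach, not the answer's method: client 345 is a hypothetical extra row added to show the missing-goal case, and the column names first_goal1 and last_goal2 are made up here. Times are turned into seconds with lubridate::period_to_seconds() so that min() and max() work on plain numbers.

```r
library(dplyr)
library(lubridate)

# Sample data from the question, plus a hypothetical client 345
# who reached Goal2 but never Goal1
d <- read.table(header = TRUE, text = "Client Goal1 Goal2 Time
123 0 1 9:00
123 1 0 9:15
234 1 0 9:12
234 0 1 9:30
345 0 1 9:45")

result <- d %>%
  mutate(secs = period_to_seconds(hm(Time))) %>%
  group_by(Client) %>%
  summarise(
    # earliest Goal1 time, or NA when the client never reached Goal1
    first_goal1 = if (any(Goal1 == 1)) min(secs[Goal1 == 1]) else NA_real_,
    # latest Goal2 time, or NA when the client never reached Goal2
    last_goal2 = if (any(Goal2 == 1)) max(secs[Goal2 == 1]) else NA_real_
  ) %>%
  # keep clients with both goals where some Goal2 came after the first Goal1
  filter(!is.na(first_goal1), !is.na(last_goal2), last_goal2 > first_goal1)

print(result$Client)
```

With this data only client 234 should remain: 123 reached Goal2 before Goal1, and 345 never reached Goal1.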

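A few key points here:

  1. Use pivot_longer to gather the different Goals into a single column.
  2. Convert Time to an actual time format so you can work out which time is earlier.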
library(tidyverse)
library(lubridate) # provides hm(); not attached by tidyverse versions before 2.0.0
d <-
  read.table(header = TRUE,
             text = "Client Goal1   Goal2   Time
123   0   1   9:00
123   1   0   9:15
234   1   0   9:12
234   0   1   9:30")
d %>%
  pivot_longer(
    starts_with("Goal"),
    names_to = "Goal",
    values_to = "is_goal",
    names_prefix = "Goal"
  ) %>%
  mutate(n_clients = length(unique(Client))) %>% # to keep for later as denominator of percentage
  mutate(Goal = as.integer(Goal)) %>% # turn to numeric so you can assess who got both
  filter(is_goal > 0) %>% # remove empty entries
  mutate(Time = hm(Time)) %>% # convert to time to calculate what was first
  group_by(Client) %>% # operate per-client
  filter(sum(Goal) == 3) %>% # remove clients who didn't achieve both goals
  mutate(in_order = Time[Goal == 1] < Time[Goal == 2]) %>% # score whether goal 2 was after 1
  ungroup() %>%
  filter(in_order) %>% # remove clients who were not in order
  distinct(Client, n_clients) %>%
  summarise(percentage = 100 * nrow(.) / n_clients) # summarize as percentage
#> # A tibble: 1 x 1
#>   percentage
#>        <dbl>
#> 1         50

Created on 2021-12-28 by the reprex package (v0.3.0)
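To make the reshaping step easier to follow, the sketch below stops right after pivot_longer and looks at the intermediate table. The names long, achieved, and goal_sum are made up for this illustration. Each original row becomes two rows, one per goal, and after the empty entries are dropped, the goal numbers of a client who reached both goals add up to 1 + 2 = 3, which is the condition the sum(Goal) == 3 filter tests.

```r
library(tidyr)

d <- read.table(header = TRUE, text = "Client Goal1 Goal2 Time
123 0 1 9:00
123 1 0 9:15
234 1 0 9:12
234 0 1 9:30")

# One row per (original row, goal) pair; Goal holds "1" or "2" as text
long <- pivot_longer(
  d,
  starts_with("Goal"),
  names_to = "Goal",
  values_to = "is_goal",
  names_prefix = "Goal"
)
print(nrow(long)) # 8: four original rows times two goal columns

# Keep only the goals that were actually reached
achieved <- long[long$is_goal > 0, ]

# Per-client sum of goal numbers; 3 means both Goal 1 and Goal 2 were reached
goal_sum <- tapply(as.integer(achieved$Goal), achieved$Client, sum)
print(goal_sum)
```

Note that the sum-equals-3 check assumes each client reaches each goal at most once; repeated goals would need a different test, such as checking that both goal numbers are present.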
