How do I count matching and non-matching values by comparing two columns in R?



I want to count values by comparing two columns of a data frame in R.

For example:

col1  col2
A      A
A      A
A      B
G      G
G      H
Y      Y
Y      Y
J      P
J      P
J      J
K      L

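I would like an output that shows, for each value of col1, the number of matches (rows where both columns hold the same value) and the number of non-matches (rows where the columns differ), with the match and non-match percentages in the following columns: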

col1   count_match  count_notmatch   percent_match   percent_notmatch
A       2           1                66.66%          33.33%
G       1           1                50.00%          50.00%
Y       2           0                100.00%         0
J       1           2                33.33%          66.66%
K       0           1                0               100%

How can I do this? Thanks for your help.

You can group the data by col1 and then use summarise():

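library(dplyr)
df %>%
  group_by(col1) %>%
  summarise(
    count_match = sum(col1 == col2),        # rows where both columns agree
    count_nomatch = n() - count_match,      # remaining rows in the group
    # express each count as a percentage of the group size
    across(contains("match"), ~ .x / n() * 100,
           .names = "{sub('count', 'percent', .col)}")
  )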
# # A tibble: 5 × 5
#   col1  count_match count_nomatch percent_match percent_nomatch
#   <chr>       <int>         <int>         <dbl>           <dbl>
# 1 A               2             1          66.7            33.3
# 2 G               1             1          50              50
# 3 J               1             2          33.3            66.7
# 4 K               0             1           0             100  
# 5 Y               2             0         100               0
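
For readers who would rather avoid a dplyr dependency, the same per-group counts can be sketched in base R. This is only an illustration, not part of the answer above: it assumes the data frame is called df (the answer uses that name without defining it), and the two-decimal sprintf() formatting and the helper names is_match and n_rows are choices made here. Rounding means 2/3 is shown as 66.67% rather than the 66.66% written in the question.

```r
# Sample data from the question; `df` is the name the answer above assumes
df <- data.frame(
  col1 = c("A", "A", "A", "G", "G", "Y", "Y", "J", "J", "J", "K"),
  col2 = c("A", "A", "B", "G", "H", "Y", "Y", "P", "P", "J", "L")
)

# TRUE where both columns hold the same value
is_match <- df$col1 == df$col2

# Per-group number of matches and per-group row count
count_match <- as.vector(tapply(is_match, df$col1, sum))
n_rows      <- as.vector(tapply(is_match, df$col1, length))

result <- data.frame(
  col1            = sort(unique(df$col1)),  # same order tapply() uses
  count_match     = as.integer(count_match),
  count_nomatch   = as.integer(n_rows - count_match),
  # Format as percentage strings with two decimals (an illustrative choice)
  percent_match   = sprintf("%.2f%%", 100 * count_match / n_rows),
  percent_nomatch = sprintf("%.2f%%", 100 * (n_rows - count_match) / n_rows)
)
result
```

Groups come out in sorted order of col1 (A, G, J, K, Y), as in the dplyr output, rather than in order of first appearance as in the question.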

You can also build the summary table in a few steps:

library(tidyverse)
library(scales)
d <- structure(list(col1 = c("A", "A", "A", "G", "G", "Y", "Y", "J", "J", "J", "K"),
                    col2 = c("A", "A", "B", "G", "H", "Y", "Y", "P", "P", "J", "L")),
               class = "data.frame", row.names = c(NA, -11L))

d %>%
  mutate(match = col1 == col2,
         nomatch = !match) %>%
  group_by(col1) %>%
  summarise(count_match = sum(match),
            count_nomatch = sum(nomatch)) %>%
  rowwise() %>%
  mutate(percent_match = count_match / sum(count_match, count_nomatch),
         percent_nomatch = 1 - percent_match) %>%
  mutate(across(starts_with("percent"), ~ percent(.x))) %>%
  ungroup()
#> # A tibble: 5 × 5
#>   col1  count_match count_nomatch percent_match percent_nomatch
#>   <chr>       <int>         <int> <chr>         <chr>          
#> 1 A               2             1 67%           33%            
#> 2 G               1             1 50%           50%            
#> 3 J               1             2 33%           67%            
#> 4 K               0             1 0%            100%           
#> 5 Y               2             0 100%          0%

Created on 2022-07-18 by the reprex package (v2.0.1)
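
The question asks for percentages with two decimals, while percent() with its defaults rounds to whole percents here. A possible variant, offered as a sketch rather than as the answer's own code, passes accuracy = 0.01 to scales::percent(). It also drops rowwise(), since summarise() already returns one row per group and the arithmetic is vectorised. The name out is introduced here only for illustration.

```r
library(dplyr)
library(scales)

# Sample data from the question
d <- data.frame(
  col1 = c("A", "A", "A", "G", "G", "Y", "Y", "J", "J", "J", "K"),
  col2 = c("A", "A", "B", "G", "H", "Y", "Y", "P", "P", "J", "L")
)

out <- d %>%
  group_by(col1) %>%
  summarise(
    count_match   = sum(col1 == col2),   # rows where both columns agree
    count_nomatch = sum(col1 != col2)    # rows where they differ
  ) %>%
  mutate(
    # accuracy = 0.01 rounds to two decimal places of a percent
    percent_match   = percent(count_match / (count_match + count_nomatch),
                              accuracy = 0.01),
    percent_nomatch = percent(count_nomatch / (count_match + count_nomatch),
                              accuracy = 0.01)
  )
out
```

The percent columns are character strings after formatting, so keep the numeric proportions in separate columns if further calculations are needed.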
