I want to compute values by comparing two columns of a data frame in R.
For example:
col1 col2
A A
A A
A B
G G
G H
Y Y
Y Y
J P
J P
J J
K L
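I would like an output that shows, for each value of col1, the count of matches (both columns have the same value) and the count of non-matches (the columns have different values), with the match and non-match percentages in the following columns: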
col1 count_match count_notmatch percent_match percent_notmatch
A 2 1 66.66% 33.33%
G 1 1 50.00% 50.00%
Y 2 0 100.00% 0
J 1 2 33.33% 66.66%
K 0 1 0 100%
How can I do this? Thanks for your help.
You can group the data by col1 and then use summarise():
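library(dplyr)
# df is the data frame from the question, with columns col1 and col2
df %>%
  group_by(col1) %>%
  summarise(count_match = sum(col1 == col2),      # rows where both columns agree
            count_nomatch = n() - count_match,    # remaining rows in the group
            # derive a percent_* column from each count_* column
            across(contains("match"), ~ .x / n() * 100, .names = "{sub('count', 'percent', .col)}"))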
# # A tibble: 5 × 5
# col1 count_match count_nomatch percent_match percent_nomatch
# <chr> <int> <int> <dbl> <dbl>
# 1 A 2 1 66.7 33.3
# 2 G 1 1 50 50
# 3 J 1 2 33.3 66.7
# 4 K 0 1 0 100
# 5 Y 2 0 100 0
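The snippet above assumes a data frame called df already exists and returns the percentages as plain numbers. As a rough, self-contained sketch (the name df and the use of sprintf() for formatting are my assumptions, not part of the answer above), you could build the data from the question and format the percentages as text with two decimals. Note that sprintf() rounds, so 2/3 prints as 66.67% rather than the 66.66% shown in the question.

```r
library(dplyr)

# Example data copied from the question; the name `df` is an assumption.
df <- data.frame(
  col1 = c("A", "A", "A", "G", "G", "Y", "Y", "J", "J", "J", "K"),
  col2 = c("A", "A", "B", "G", "H", "Y", "Y", "P", "P", "J", "L"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(col1) %>%
  summarise(
    count_match   = sum(col1 == col2),    # rows where both columns agree
    count_nomatch = n() - count_match,    # remaining rows in the group
    # sprintf() turns each share into text with two decimals and a % sign
    percent_match   = sprintf("%.2f%%", 100 * count_match / n()),
    percent_nomatch = sprintf("%.2f%%", 100 * count_nomatch / n())
  )

res
```

The percent columns are character vectors here, so keep the numeric version if you still need to sort or compute on them.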
You can build the summary table in a few steps:
library(tidyverse)
library(scales)
# Example data from the question
d <- structure(list(col1 = c("A", "A", "A", "G", "G", "Y", "Y", "J", "J", "J", "K"),
                    col2 = c("A", "A", "B", "G", "H", "Y", "Y", "P", "P", "J", "L")), class = "data.frame", row.names = c(NA, -11L))
d %>%
  # flag each row as a match or a non-match
  mutate(match = col1 == col2,
         nomatch = !match) %>%
  group_by(col1) %>%
  # count both kinds of rows per col1 value
  summarise(count_match = sum(match),
            count_nomatch = sum(nomatch)) %>%
  # rowwise() makes sum() add the two counts within each row
  rowwise() %>%
  mutate(percent_match = count_match/sum(count_match, count_nomatch),
         percent_nomatch = 1 - percent_match) %>%
  # format the proportions as percentage strings
  mutate(across(starts_with("percent"), ~percent(.x))) %>%
  ungroup()
#> # A tibble: 5 × 5
#> col1 count_match count_nomatch percent_match percent_nomatch
#> <chr> <int> <int> <chr> <chr>
#> 1 A 2 1 67% 33%
#> 2 G 1 1 50% 50%
#> 3 J 1 2 33% 67%
#> 4 K 0 1 0% 100%
#> 5 Y 2 0 100% 0%
Created on 2022-07-18 by the reprex package (v2.0.1)
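If you would rather not load any packages, the same flag-then-count idea can be sketched in base R with tapply(). This is only an illustrative alternative under my own assumptions (the variable names is_match and out are made up here), not something either answer above proposed.

```r
# Example data copied from the question
d <- data.frame(
  col1 = c("A", "A", "A", "G", "G", "Y", "Y", "J", "J", "J", "K"),
  col2 = c("A", "A", "B", "G", "H", "Y", "Y", "P", "P", "J", "L"),
  stringsAsFactors = FALSE
)

# TRUE where the two columns agree on a row
is_match <- d$col1 == d$col2

# Per-group counts; tapply() orders the groups alphabetically by col1
count_match   <- tapply(is_match, d$col1, sum)
count_nomatch <- tapply(!is_match, d$col1, sum)
total         <- count_match + count_nomatch

out <- data.frame(
  col1            = names(count_match),
  count_match     = as.vector(count_match),
  count_nomatch   = as.vector(count_nomatch),
  percent_match   = as.vector(100 * count_match / total),
  percent_nomatch = as.vector(100 * count_nomatch / total),
  stringsAsFactors = FALSE
)

out
```

The percentages stay numeric in this version, so they can be rounded or formatted afterwards as needed.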