r-删除在减号前后不出现相同术语的行



在下面的data.frameout中,第一列在-符号前后有两种terms

(A(相同单词出现在-符号前后的术语(例如第1行和第2行的Baseline(

(B(-符号前后出现NO相同单词的术语(例如第47行和第50行(

有没有办法创建一个函数来删除out数据帧中(B(类型的行?

library(emmeans)
dd <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/1.csv")
res1 <- lm(gi ~ teaching_level*time, data = dd)
out <- na.omit(data.frame(emmeans(res1, pairwise ~ teaching_level*time)[[2]])) 
out
#                                              contrast    estimate        SE df     t.ratio     p.value
#1                 elementary Baseline - mixed Baseline  0.15185787 0.2895842 59  0.52439968 0.999994441
#2             elementary Baseline - secondary Baseline -0.10316420 0.2494777 59 -0.41352074 0.999999536
.
.
#47       (secondary Post-test 1) - (mixed Post-test 2) -1.03135871 0.5588269 59 -1.84557815 0.786224904
.
#50      (secondary Post-test 1) - (mixed Post-test 3) -0.78350792 0.5588269 59 -1.40205835 0.958572283
.
.

我们可以在-后面的空格中将"contrast"列一分为二,然后提取每个拆分列中的单词,检查filter这些行是否有intersecting单词

library(dplyr)
library(tidyr)
library(stringr)
library(tibble)
out %>% 
rownames_to_column('rn') %>%
as_tibble %>% 
separate(contrast, into = c('pre', 'post'), sep = "\s+-\s+", 
remove = FALSE) %>%
mutate(across(pre:post, ~ map(str_extract_all(., "[A-Za-z0-9-]+\s*\d*"), trimws))) %>%
filter(lengths(map2(pre, post, intersect)) > 0) %>% 
select(-pre, -post)

-输出

# A tibble: 17 × 7
rn    contrast                                           estimate    SE    df t.ratio p.value
<chr> <chr>                                                 <dbl> <dbl> <dbl>   <dbl>   <dbl>
1 1     elementary Baseline - mixed Baseline                 0.152  0.290    59  0.524  1.00   
2 2     elementary Baseline - secondary Baseline            -0.103  0.249    59 -0.414  1.00   
3 3     elementary Baseline - (elementary Post-test 1)      -0.869  0.205    59 -4.23   0.00433
4 12    mixed Baseline - secondary Baseline                 -0.255  0.306    59 -0.833  0.999  
5 14    mixed Baseline - (mixed Post-test 1)                -0.533  0.299    59 -1.78   0.822  
6 17    mixed Baseline - (mixed Post-test 2)                -1.61   0.588    59 -2.74   0.232  
7 20    mixed Baseline - (mixed Post-test 3)                -1.36   0.588    59 -2.32   0.475  
8 24    secondary Baseline - (secondary Post-test 1)        -0.326  0.245    59 -1.33   0.971  
9 27    secondary Baseline - (secondary Post-test 2)        -0.344  0.363    59 -0.945  0.998  
10 31    (elementary Post-test 1) - (mixed Post-test 1)       0.488  0.219    59  2.23   0.537  
11 32    (elementary Post-test 1) - (secondary Post-test 1)   0.440  0.200    59  2.20   0.557  
12 39    (mixed Post-test 1) - (secondary Post-test 1)       -0.0484 0.237    59 -0.204  1.00   
13 41    (mixed Post-test 1) - (mixed Post-test 2)           -1.08   0.566    59 -1.91   0.750  
14 44    (mixed Post-test 1) - (mixed Post-test 3)           -0.832  0.566    59 -1.47   0.943  
15 48    (secondary Post-test 1) - (secondary Post-test 2)   -0.0174 0.347    59 -0.0503 1      
16 57    (mixed Post-test 2) - (secondary Post-test 2)        1.01   0.620    59  1.64   0.889  
17 59    (mixed Post-test 2) - (mixed Post-test 3)            0.248  0.759    59  0.326  1.00   

注:使用OP后中的先前数据集


使用新数据

# A tibble: 3 × 7
rn    contrast                                       estimate    SE    df t.ratio p.value
<chr> <chr>                                             <dbl> <dbl> <dbl>   <dbl>   <dbl>
1 1     elementary Baseline - mixed Baseline              0.152 0.290    59   0.524 1.00   
2 2     elementary Baseline - secondary Baseline         -0.103 0.249    59  -0.414 1.00   
3 3     elementary Baseline - (elementary Post-test 1)   -0.869 0.205    59  -4.23  0.00433

它可以封装在一个函数中

f1 <- function(data, contrast_col) {
data %>%
as_tibble %>%
separate({{contrast_col}}, into = c('pre', 'post'), sep = "\s+-\s+", remove = FALSE) %>%
mutate(across(pre:post, ~ map(str_extract_all(., "[A-Za-z0-9-]+\s*\d*"), trimws))) %>% 
filter(lengths(map2(pre, post, intersect)) > 0) %>%
select(-pre, -post)

}

f1(out, contrast)
# A tibble: 3 × 6
contrast                                       estimate    SE    df t.ratio p.value
<chr>                                             <dbl> <dbl> <dbl>   <dbl>   <dbl>
1 elementary Baseline - mixed Baseline              0.152 0.290    59   0.524 1.00   
2 elementary Baseline - secondary Baseline         -0.103 0.249    59  -0.414 1.00   
3 elementary Baseline - (elementary Post-test 1)   -0.869 0.205    59  -4.23  0.00433

最新更新