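In the data.frame out below, the first column contains two types of terms around the - sign:
(A) terms in which the same word appears before and after the - sign (e.g. Baseline in rows 1 and 2)
(B) terms in which NO word appears on both sides of the - sign (e.g. rows 47 and 50)
Is there a way to write a function that removes the type (B) rows from the out data frame?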
library(emmeans)
dd <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/1.csv")
res1 <- lm(gi ~ teaching_level*time, data = dd)
out <- na.omit(data.frame(emmeans(res1, pairwise ~ teaching_level*time)[[2]]))
out
# contrast estimate SE df t.ratio p.value
#1 elementary Baseline - mixed Baseline 0.15185787 0.2895842 59 0.52439968 0.999994441
#2 elementary Baseline - secondary Baseline -0.10316420 0.2494777 59 -0.41352074 0.999999536
.
.
#47 (secondary Post-test 1) - (mixed Post-test 2) -1.03135871 0.5588269 59 -1.84557815 0.786224904
.
#50 (secondary Post-test 1) - (mixed Post-test 3) -0.78350792 0.5588269 59 -1.40205835 0.958572283
.
.
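We can split the 'contrast' column in two at the - that is surrounded by whitespace, extract the words from each of the split columns, and then filter for the rows whose two sides have intersecting words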
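library(dplyr)
library(tidyr)
library(stringr)
library(tibble)
library(purrr)  # provides map() and map2()
out %>%
  rownames_to_column('rn') %>%
  as_tibble %>%
  # split at " - "; the hyphen inside "Post-test" has no whitespace around it
  separate(contrast, into = c('pre', 'post'), sep = "\\s+-\\s+",
           remove = FALSE) %>%
  # extract the terms on each side, e.g. "mixed" and "Post-test 1"
  mutate(across(pre:post, ~ map(str_extract_all(., "[A-Za-z0-9-]+\\s*\\d*"), trimws))) %>%
  # keep only rows where the two sides share at least one term
  filter(lengths(map2(pre, post, intersect)) > 0) %>%
  select(-pre, -post)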
-output
# A tibble: 17 × 7
rn contrast estimate SE df t.ratio p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 elementary Baseline - mixed Baseline 0.152 0.290 59 0.524 1.00
2 2 elementary Baseline - secondary Baseline -0.103 0.249 59 -0.414 1.00
3 3 elementary Baseline - (elementary Post-test 1) -0.869 0.205 59 -4.23 0.00433
4 12 mixed Baseline - secondary Baseline -0.255 0.306 59 -0.833 0.999
5 14 mixed Baseline - (mixed Post-test 1) -0.533 0.299 59 -1.78 0.822
6 17 mixed Baseline - (mixed Post-test 2) -1.61 0.588 59 -2.74 0.232
7 20 mixed Baseline - (mixed Post-test 3) -1.36 0.588 59 -2.32 0.475
8 24 secondary Baseline - (secondary Post-test 1) -0.326 0.245 59 -1.33 0.971
9 27 secondary Baseline - (secondary Post-test 2) -0.344 0.363 59 -0.945 0.998
10 31 (elementary Post-test 1) - (mixed Post-test 1) 0.488 0.219 59 2.23 0.537
11 32 (elementary Post-test 1) - (secondary Post-test 1) 0.440 0.200 59 2.20 0.557
12 39 (mixed Post-test 1) - (secondary Post-test 1) -0.0484 0.237 59 -0.204 1.00
13 41 (mixed Post-test 1) - (mixed Post-test 2) -1.08 0.566 59 -1.91 0.750
14 44 (mixed Post-test 1) - (mixed Post-test 3) -0.832 0.566 59 -1.47 0.943
15 48 (secondary Post-test 1) - (secondary Post-test 2) -0.0174 0.347 59 -0.0503 1
16 57 (mixed Post-test 2) - (secondary Post-test 2) 1.01 0.620 59 1.64 0.889
17 59 (mixed Post-test 2) - (mixed Post-test 3) 0.248 0.759 59 0.326 1.00
NOTE: The output above uses the earlier dataset from the OP's post
With the new data
# A tibble: 3 × 7
rn contrast estimate SE df t.ratio p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 elementary Baseline - mixed Baseline 0.152 0.290 59 0.524 1.00
2 2 elementary Baseline - secondary Baseline -0.103 0.249 59 -0.414 1.00
3 3 elementary Baseline - (elementary Post-test 1) -0.869 0.205 59 -4.23 0.00433
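This can be wrapped in a function
# assumes dplyr, tidyr, stringr and purrr are loaded
f1 <- function(data, contrast_col) {
  data %>%
    as_tibble %>%
    # split at " - " and keep the original contrast column
    separate({{contrast_col}}, into = c('pre', 'post'), sep = "\\s+-\\s+", remove = FALSE) %>%
    # extract the terms on each side of the contrast
    mutate(across(pre:post, ~ map(str_extract_all(., "[A-Za-z0-9-]+\\s*\\d*"), trimws))) %>%
    # keep only rows where the two sides share at least one term
    filter(lengths(map2(pre, post, intersect)) > 0) %>%
    select(-pre, -post)
}
f1(out, contrast)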
# A tibble: 3 × 6
contrast estimate SE df t.ratio p.value
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 elementary Baseline - mixed Baseline 0.152 0.290 59 0.524 1.00
2 elementary Baseline - secondary Baseline -0.103 0.249 59 -0.414 1.00
3 elementary Baseline - (elementary Post-test 1) -0.869 0.205 59 -4.23 0.00433
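To check the matching logic without downloading the CSV or loading the tidyverse, here is a minimal base R sketch of the same idea. The contrasts vector is made-up toy data in the style of the emmeans labels, and shares_term is a hypothetical helper name, not part of any package.

```r
# Toy contrast labels (illustrative only, not taken from the real output)
contrasts <- c(
  "elementary Baseline - mixed Baseline",
  "mixed Baseline - (mixed Post-test 1)",
  "(secondary Post-test 1) - (mixed Post-test 2)"
)

# Hypothetical helper: TRUE when both sides of a contrast share a term
shares_term <- function(x) {
  # split each label at the hyphen that is surrounded by whitespace
  sides <- strsplit(x, "\\s+-\\s+", perl = TRUE)
  vapply(sides, function(s) {
    # extract the terms on each side, e.g. "mixed" and "Post-test 1"
    terms <- lapply(s, function(side) {
      m <- gregexpr("[A-Za-z0-9-]+\\s*\\d*", side, perl = TRUE)
      trimws(regmatches(side, m)[[1]])
    })
    length(intersect(terms[[1]], terms[[2]])) > 0
  }, logical(1))
}

shares_term(contrasts)
```

The result is a logical vector, so something like out[shares_term(as.character(out$contrast)), ] should keep only the type (A) rows, assuming every label contains exactly one whitespace-delimited - separator.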