I have a dataset with the following structure:
study | treatment | severity | people_with_pain | sample_size |
---|---|---|---|---|
0001 | paracetamol | | | |
0001 | aspirin | 7.0 | 10 | 20 |
0001 | massage | 10.2 | 20 | 21 |
0002 | paracetamol | | | |
0002 | aspirin | 6.0 | . | |
0003 | massage | 2.0 | 10 | 25 |
0003 | paracetamol | 3.5 | 10 | 25 |
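Here is my solution using the tidyverse. I am not entirely sure it is what you want, because there are some differences between my results and your expected results. To get all combinations of treatment within a study, I used a left_join of the data frame with itself on study, followed by filter(treatment.x < treatment.y). The filter keeps each pair of treatments once, in alphabetical order, so it removes both the mirrored duplicates and the pairs of a treatment with itself.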
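library(dplyr)
library(tidyr)

dat |>
  # Self-join on study: pairs every treatment with every treatment in the same study
  # (dplyr >= 1.1.0 warns about a many-to-many relationship here; that is expected)
  left_join(dat, by = "study") |>
  # Keep each unordered pair once; drops self-pairs and mirrored duplicates
  filter(treatment.x < treatment.y) |>
  # Combine the two treatment columns into one label, e.g. "aspirin-massage"
  unite("treatment", starts_with("treatment"), sep = "-") |>
  group_by(treatment) |>
  summarize({
    # Pool the values from both arms of every pair in the group
    severity <- c(severity.x, severity.y)
    people_with_pain <- c(people_with_pain.x, people_with_pain.y)
    sample_size <- c(sample_size.x, sample_size.y)
    data.frame(severity_mean = mean(severity),
               severity_sd = sd(severity),
               severity_median = median(severity),
               severity_IQR = IQR(severity),
               # Pooled proportion; NA if any count in the group is missing
               people_with_pain = sum(people_with_pain) / sum(sample_size),
               nstudies = length(unique(study)))
  })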
##>             treatment severity_mean severity_sd severity_median severity_IQR
##> 1     aspirin-massage         8.600   2.2627417            8.60        1.600
##> 2 aspirin-paracetamol         6.000   0.8164966            6.00        0.500
##> 3 massage-paracetamol         5.175   3.5668614            4.25        3.175
##>   people_with_pain nstudies
##> 1        0.7317073        1
##> 2               NA        2
##> 3        0.5473684        2
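
To see why the self-join plus filter(treatment.x < treatment.y) yields each treatment pair exactly once, here is a minimal sketch on a toy data frame. The name toy and its single hypothetical study are assumptions made for illustration only; it is not the dataset from the question, and it carries no outcome columns.

```r
library(dplyr)

# Hypothetical toy data: one study with three treatments, no outcome columns.
toy <- data.frame(
  study     = "0001",
  treatment = c("paracetamol", "aspirin", "massage"),
  stringsAsFactors = FALSE
)

# Self-join on study: every treatment is paired with every treatment in the
# same study, including itself, so 3 x 3 = 9 rows.
# suppressWarnings() hides the many-to-many warning of dplyr >= 1.1.0.
pairs_all <- suppressWarnings(left_join(toy, toy, by = "study"))

# Keeping rows where treatment.x sorts before treatment.y drops the 3
# self-pairs and one of each mirrored pair, leaving choose(3, 2) = 3 rows.
pairs_unique <- filter(pairs_all, treatment.x < treatment.y)

nrow(pairs_all)     # 9
nrow(pairs_unique)  # 3
pairs_unique
```

The comparison is a plain string comparison, so which treatment ends up on the left of each pair depends only on alphabetical order, not on the order of rows in the data.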