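I have a dataset with rows that are duplicated in the cell line, pathway, and drug columns but have different values in the activity column. For example, the first two rows of the data frame below are identical in cell, drug, and pathway and differ only in activity: the first row has RESISTANT and the second has SENSITIVE. I want to keep the second row, the one with SENSITIVE in the activity column.
Can you help me with how to do this? I want to apply this to every group of such rows in the data frame, keeping the second of the duplicated rows.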
**cell** **drug** **pathway** **activity**
AU656 5-FLORO OTHER RESISTANT
AU656 5-FLORO OTHER SENSITIVE
AU656 ALISERTIB MITOSIS INTERMEDIATE
AU656 ALISERTIB MITOSIS RESISTANT
AU656 AFITINIB EGFR SENSITIVE
AU656 AZD6482 PI3K INTERMEDIATE
AU656 DORAMAPIMOD JNK INTERMEDIATE
AU656 DORAMAPIMOD JNK SENSITIVE
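We can group by cell, drug, and pathway, then use slice to take the second row of each group where it exists. The row index is the minimum of 2 and the group size (n()), so a group of size 1 returns its first row. Note that a group with more than two rows would still return its second row, not its last.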
library(dplyr)

df1 %>%
  group_by(cell, drug, pathway) %>%   # one group per cell/drug/pathway combination
  slice(min(2, n())) %>%              # second row if present, otherwise the only row
  ungroup()
-output
# A tibble: 5 × 4
cell drug pathway activity
<chr> <chr> <chr> <chr>
1 AU656 5-FLORO OTHER SENSITIVE
2 AU656 AFITINIB EGFR SENSITIVE
3 AU656 ALISERTIB MITOSIS RESISTANT
4 AU656 AZD6482 PI3K INTERMEDIATE
5 AU656 DORAMAPIMOD JNK SENSITIVE
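For comparison, here is a minimal base R sketch that does not need dplyr. It is not part of the answer above and rests on one assumption: that "the second duplicated row" can be read as "the last row of each group", which is the same thing only when no group has more than two rows, as in the example data. The names keys and last_rows are made up for illustration.

```r
# Same example data as df1 in this post, rebuilt compactly.
df1 <- data.frame(
  cell = rep("AU656", 8),
  drug = c("5-FLORO", "5-FLORO", "ALISERTIB", "ALISERTIB",
           "AFITINIB", "AZD6482", "DORAMAPIMOD", "DORAMAPIMOD"),
  pathway = c("OTHER", "OTHER", "MITOSIS", "MITOSIS",
              "EGFR", "PI3K", "JNK", "JNK"),
  activity = c("RESISTANT", "SENSITIVE", "INTERMEDIATE", "RESISTANT",
               "SENSITIVE", "INTERMEDIATE", "INTERMEDIATE", "SENSITIVE")
)

# Columns that define a duplicate (illustrative name).
keys <- c("cell", "drug", "pathway")

# duplicated(..., fromLast = TRUE) flags every row that has a later row
# with the same key values, so negating it keeps the last row per group.
last_rows <- df1[!duplicated(df1[keys], fromLast = TRUE), ]
last_rows
```

Unlike the grouped slice, this keeps the surviving rows in their original order rather than sorting them by the grouping columns.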
data
df1 <- structure(list(cell = c("AU656", "AU656", "AU656", "AU656", "AU656",
"AU656", "AU656", "AU656"), drug = c("5-FLORO", "5-FLORO", "ALISERTIB",
"ALISERTIB", "AFITINIB", "AZD6482", "DORAMAPIMOD", "DORAMAPIMOD"
), pathway = c("OTHER", "OTHER", "MITOSIS", "MITOSIS", "EGFR",
"PI3K", "JNK", "JNK"), activity = c("RESISTANT", "SENSITIVE",
"INTERMEDIATE", "RESISTANT", "SENSITIVE", "INTERMEDIATE", "INTERMEDIATE",
"SENSITIVE")), class = "data.frame", row.names = c(NA, -8L))