R: Filter rows from a matrix if a certain value is present in a column further left than another value in a column further right


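I have a matrix and want to do the following:

  1. Delete all rows that contain "Z" more than once.
  2. Delete all rows that contain at least two occurrences of "S" in directly adjacent columns.
  3. Delete all rows in which "2D" is present but "1D" is not present, or not present exactly once, in a column further to the left (a lower column number).

Here is an MWE with explanations: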

x <- matrix(c(
# Point 1:
"Z", "1D", "Z", "S",  # Delete row because Z is present more than once.
# Point 2:
"S", "S", "Z", "1D", # Delete row because S is present at least twice and in columns following each other directly.
"S", "Z", "S", "1D", # Ok because "S" is present multiple times but there is at least one column between the occurrences.
# Point 3:
"1D", "Z", "2D", "1D", # 1D is followed by a later "2D" which is correct, but another "1D" follows after "2D", so delete this row.
"Z", "S", "2D", "S", # "2D" is present without a "1D" in a more left column, so delete this row.
"2D", "1D", "Z", "S", # "2D" is present without a "1D" in a more left column, so delete this row.
"1D", "Z", "S", "2D", # Valid row
"1D", "2D", "S", "Z"), # Valid row 
nrow = 8, byrow = TRUE)
# Possible solution for removing rows with multiple occurrences of "Z"
library(matrixStats)
x <- x[!(rowCounts(x, value = "Z") > 1), ]

How can points 2 and 3 be done?

You can try this custom function:

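apply_rules <- function(y) {
  # Returns TRUE if the row y should be deleted.
  rule1 <- sum(grepl('Z', y)) > 1                              # "Z" more than once
  rule2 <- any(with(rle(grepl('S', y)), values & lengths > 1)) # run of adjacent "S"
  d1 <- which(y == '1D')                                       # positions of "1D"
  d2 <- which(y == '2D')                                       # positions of "2D"
  rule3 <- length(d1) < 1 || any(d1 > d2)                      # no "1D", or a "1D" right of a "2D"
  rule1 || rule2 || rule3
}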
apply(x, 1, apply_rules)
#[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
x[!apply(x, 1, apply_rules), ]
#    [,1] [,2] [,3] [,4]
#[1,] "S"  "Z"  "S"  "1D"
#[2,] "1D" "Z"  "S"  "2D"
#[3,] "1D" "2D" "S"  "Z" 
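
To see which rule removes a given row, the same checks can return one flag per rule instead of a single combined value. The following is a minimal sketch, not part of the answer above: the helper name `rule_flags` and the flag names are hypothetical, and rule 3 is written here as "no 1D at all, or some 1D to the right of some 2D", which is one reading of the question.

```r
# Rebuild the example matrix from the question so the block is self-contained.
x <- matrix(c(
  "Z",  "1D", "Z",  "S",
  "S",  "S",  "Z",  "1D",
  "S",  "Z",  "S",  "1D",
  "1D", "Z",  "2D", "1D",
  "Z",  "S",  "2D", "S",
  "2D", "1D", "Z",  "S",
  "1D", "Z",  "S",  "2D",
  "1D", "2D", "S",  "Z"),
  nrow = 8, byrow = TRUE)

# Hypothetical helper: one logical per rule for a single row y.
rule_flags <- function(y) {
  s_runs <- rle(y == "S")   # runs of consecutive "S" / non-"S" cells
  d1 <- which(y == "1D")    # column positions of "1D"
  d2 <- which(y == "2D")    # column positions of "2D"
  c(
    multiple_Z = sum(y == "Z") > 1,
    adjacent_S = any(s_runs$values & s_runs$lengths > 1),
    # Assumed reading of rule 3: no "1D", or a "1D" right of a "2D".
    bad_1D_2D  = length(d1) < 1 || (length(d2) > 0 && max(d1) > min(d2))
  )
}

flags <- t(apply(x, 1, rule_flags))   # one row per matrix row, one column per rule
keep  <- rowSums(flags) == 0          # keep rows that trigger no rule
x[keep, , drop = FALSE]
```

Inspecting `flags` shows which rule is responsible for each deletion, and `drop = FALSE` keeps the result a matrix even if only one row survives.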

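Update: I had not noticed your comment that a row may contain a "1D" on its own, so I made some modifications, and the output is now exactly what you expected: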

library(dplyr)
x %>%
  as.data.frame() %>%   # columns of an unnamed matrix become V1..V4
  rowwise() %>%
  mutate(Sum_Z = sum(c_across(everything()) == "Z"),
         col = paste0(V1, V2, V3, V4),
         SS_exist = grepl("S{2,}", col),
         `1D after 2D` = grepl("2D.*1D", col),   # any "1D" to the right of a "2D"
         `1D` = grepl("1D", col)) %>%
  filter(Sum_Z <= 1, !SS_exist, `1D`, !`1D after 2D`) %>%
  select(V1:V4) %>%
  as.matrix()

V1   V2   V3  V4  
[1,] "S"  "Z"  "S" "1D"
[2,] "1D" "Z"  "S" "2D"
[3,] "1D" "2D" "S" "Z" 
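
The string-based idea can also be tried in base R without dplyr. This is only a sketch under the same reading of the rules (a row needs a "1D", and no "1D" may sit to the right of a "2D"); the variable names are made up for illustration, and the cells are joined with a comma so that neighbouring values cannot run together.

```r
# Rebuild the example matrix from the question so the block is self-contained.
x <- matrix(c(
  "Z",  "1D", "Z",  "S",
  "S",  "S",  "Z",  "1D",
  "S",  "Z",  "S",  "1D",
  "1D", "Z",  "2D", "1D",
  "Z",  "S",  "2D", "S",
  "2D", "1D", "Z",  "S",
  "1D", "Z",  "S",  "2D",
  "1D", "2D", "S",  "Z"),
  nrow = 8, byrow = TRUE)

# One comma-separated string per row, e.g. "Z,1D,Z,S".
row_str <- apply(x, 1, paste, collapse = ",")

# Count the "Z" matches per row; rows without a match yield a count of 0.
multiple_Z  <- lengths(regmatches(row_str, gregexpr("Z", row_str))) > 1
# Two "S" cells in directly adjacent columns.
adjacent_S  <- grepl("(^|,)S,S(,|$)", row_str)
# At least one "1D", and no "1D" anywhere to the right of a "2D".
has_1D      <- grepl("1D", row_str)
d1_after_2D <- grepl("2D.*1D", row_str)

keep <- !multiple_Z & !adjacent_S & has_1D & !d1_after_2D
x[keep, , drop = FALSE]
```

For the example matrix this keeps the same three rows as the two answers above.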
