How to extract and sum numbers from character elements



I have a list of strings (it is a column in a dataframe), for example:

list("4 pieces of tissue, the largest measuring 4 x 3 x 2 m", 
    NA_character_, NA_character_, "4 pieces of tissue, the largest measuring 4 x 2 x 2m", 
    "2 pieces of tissue, the larger measuring 4 x 2 x 2 m", c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m", 
    "4 pieces of tissue, the largest measuring 6 x 2 x 1 m", 
    "4 pieces of tissue, the largest measuring 4 x 3 x 1 m"), 
    NA_character_, c("4 pieces of tissue, the largest measuring 4 x 3 x 2 m", 
    "4 pieces of tissue, the largest measuring 5 x 2 x 2 m", 
    "4 pieces of tissue, the largest measuring 4 x 2 x 1 m"), 
    NA_character_, "4 pieces of tissue, the largest measuring 8 x 2 x 2m")

This list was generated by the following line, which is part of the function below:

x$NumbOfBx <- str_extract_all(x[,y], "([A-Za-z]*|[0-9]) (specimens|pieces).*?(([0-9]).*?x.*?([0-9]).*?x.*?([0-9])).*?([a-z])")

For each element of the list, I want to extract the total number of tissue pieces. I have been trying:

function(x,y) {
  x <- data.frame(x)
  x$NumbOfBx <- str_extract_all(x[,y], "([A-Za-z]*|[0-9]) (specimens|pieces).*?(([0-9]).*?x.*?([0-9]).*?x.*?([0-9])).*?([a-z])")
  x$NumbOfBx <- sapply(x$NumbOfBx, function(x) sum(as.numeric(unlist(str_extract_all(x$NumbOfBx, "^\\d+")))))
  x$NumbOfBxs <- unlist(x$NumbOfBx)
  x$NumbOfBx <- as.numeric(str_extract(x$NumbOfBx, "^.*?\\d"))
  return(x)
}

but I get the error:

Error in x$NumbOfBx : $ operator is invalid for atomic vectors

data

L <- list("4 pieces of tissue, the largest measuring 4 x 3 x 2 m", 
NA_character_, NA_character_, "4 pieces of tissue, the largest measuring 4 x 2 x 2m", 
"2 pieces of tissue, the larger measuring 4 x 2 x 2 m", c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m", 
"4 pieces of tissue, the largest measuring 6 x 2 x 1 m", 
"4 pieces of tissue, the largest measuring 4 x 3 x 1 m"), 
NA_character_, c("4 pieces of tissue, the largest measuring 4 x 3 x 2 m", 
"4 pieces of tissue, the largest measuring 5 x 2 x 2 m", 
"4 pieces of tissue, the largest measuring 4 x 2 x 1 m"), 
NA_character_, "4 pieces of tissue, the largest measuring 8 x 2 x 2m")

A one-liner base R solution:

sapply(L, function(x) sum(as.numeric(substr(x, regexpr("\\d+(?= pieces of tissue)", x, perl=TRUE, useBytes=TRUE),
                                               regexpr("\\d+(?= pieces of tissue)", x, perl=TRUE, useBytes=TRUE)))))

Output:

4 NA NA  4  2 12 NA 12 NA  4
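
The one-liner above passes the same regexpr() position as both start and stop to substr(), so it keeps only one character per string. That is enough for the single-digit counts in this data. A minimal sketch of a variant that also handles counts of two or more digits is below; the helper name sum_pieces and the small example list are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical example data, including a multi-digit count and an NA element
example <- list(
  "4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
  NA_character_,
  "12 pieces of tissue, the largest measuring 4 x 2 x 2m",
  c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m",
    "6 pieces of tissue, the largest measuring 6 x 2 x 1 m")
)

# Sum the counts that precede "pieces" or "specimens" in one list element
sum_pieces <- function(el) {
  m <- regexpr("[0-9]+(?= (pieces|specimens))", el, perl = TRUE)
  hit <- !is.na(m) & m > 0              # strings where a count was found
  if (!any(hit)) return(NA_real_)       # keep NA when nothing matched
  # Use match.length so that the whole run of digits is extracted
  stop_pos <- m[hit] + attr(m, "match.length")[hit] - 1
  sum(as.numeric(substr(el[hit], m[hit], stop_pos)))
}

counts <- vapply(example, sum_pieces, numeric(1))
```

vapply() is used instead of sapply() here so that the result is guaranteed to be a numeric vector with one value per list element.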

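Something like this? In short, assuming your data is a list, you can extract the number that precedes the unit word (such as pieces or specimens), convert it to numeric, and then sum the counts found in each vector of the list. This is the strategy you proposed, with a few modifications...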

# Assuming your list is defined as my.list
xtr.pieces <- function(ml) {
  my.sums <- lapply(ml, function(el) {
    sum(sapply(el, function(tmp) {
      if (is.na(tmp)) return(0)   # NAs are counted as 0
      # A 1-2 digit count followed, within 3 characters, by the unit word
      loc <- regexpr("[0-9]{1,2}.{0,3}(piece|specimen)", tmp)
      if (loc < 0) return(0)      # no count found in this string
      tmp <- substr(tmp, loc, loc + attr(loc, "match.length") - 1)
      # Drop everything except the digits and convert to numeric
      as.numeric(gsub("[^[:digit:]]", "", tmp))
    }))
  })
  return(my.sums)
}

NAs are counted as 0 here. You can run it and get:

unlist(xtr.pieces(my.list))
[1]  4  0  0  4  2 12  0 12  0  4
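
As for the error in the question: inside sapply(x$NumbOfBx, function(x) ...), the argument x of the anonymous function shadows the data frame, so x is a single character vector and x$NumbOfBx fails. The sketch below reproduces the message and shows one possible fix, which is to work on the argument directly. The names matches and lead_count are assumptions for illustration, and base R sub() stands in for the stringr call used in the question.

```r
# Hypothetical stand-in for the list column produced by str_extract_all()
matches <- list(
  "4 pieces of tissue",
  NA_character_,
  c("4 pieces of tissue", "5 pieces of tissue")
)

# Reproduce the error: here x is one character vector, not the data frame
msg <- tryCatch(
  sapply(matches, function(x) x$NumbOfBx),
  error = function(e) conditionMessage(e)
)

# Possible fix: use the argument itself and keep only the leading digits
lead_count <- function(x) {
  sum(as.numeric(sub("^([0-9]+).*", "\\1", x)))
}

counts <- sapply(matches, lead_count)
```

With this version, NA elements stay NA rather than being counted as 0, which matches the behaviour of the base R one-liner.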
