Count the number of matching strings in a data frame in R



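I have a data frame that looks like the one below. I want to compare book_id1 and book_id2 and count the strings (the values between double quotes, separated by commas) that the two columns have in common.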

id1 id2 book_id1                      numberofbook_id1 book_id2          numberofbook_id2
 1   2  ["19167120","237494310","195166798"]    3      ["19167120","237494310"]   2
 1   3  ["19167120","237494310","195166798"]    3      []                         0
 2   3  ["19167120","237494310"]               2       []                         0

The output I would like looks like this:

id1 id2 book_id1                     numberofbook_id1 book_id2          numberofbook_id2    count
 1   2  ["19167120","237494310","195166798"]    3      ["19167120","237494310"]   2            2
 1   3  ["19167120","237494310","195166798"]    3      []                         0            0
 2   3   ["19167120","237494310"]               2      []                         0            0

Thanks in advance.

If you want the number of matching strings:

 library(stringr)
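 # Extract the digit runs from each column, intersect them row by row,
 # and take the size of each intersection
 count <- sapply(Map(intersect, str_extract_all(df$book_id1, '\\d+'),
          str_extract_all(df$book_id2, '\\d+')), length)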
 count
 #[1] 2 0 0
 transform(df, count=count)
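
The same row-wise intersection can also be sketched in base R, without stringr. This is only an illustration: `extract_ids` is a hypothetical helper name, and the sample data is rebuilt inline (with single-quoted strings) so the block runs on its own.

```r
# Sample data; single-quoted strings avoid escaping the inner double quotes
df <- data.frame(
  id1 = c(1L, 1L, 2L),
  id2 = c(2L, 3L, 3L),
  book_id1 = c('["19167120","237494310","195166798"]',
               '["19167120","237494310","195166798"]',
               '["19167120","237494310"]'),
  book_id2 = c('["19167120","237494310"]', '[]', '[]'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pull every run of digits out of each string
extract_ids <- function(x) regmatches(x, gregexpr("[0-9]+", x))

ids1 <- extract_ids(df$book_id1)
ids2 <- extract_ids(df$book_id2)

# Intersect the two ID sets row by row, then count the shared IDs
df$count <- lengths(Map(intersect, ids1, ids2))
df$count
#[1] 2 0 0
```

Because `intersect` compares the extracted IDs themselves, an empty `[]` yields a count of 0 without any special casing.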

Or, if you only need the counts:

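# Number of items in book_id1 = number of commas + 1
nchar(gsub('[^,]+', '',df$book_id1))+1
#[1] 3 3 2
# Number of commas in book_id2
count <- nchar(gsub('[^,]+', '',df$book_id2))
# One comma means two items; note this shortcut only handles the one-comma case
transform(df, count= ifelse(count==1, count+1, 0))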
#    id1 id2                             book_id1 numberofbook_id1
#1   1   2 ["19167120","237494310","195166798"]                3
#2   1   3 ["19167120","237494310","195166798"]                3
#3   2   3             ["19167120","237494310"]                2
#                   book_id2 numberofbook_id2 count
#1 ["19167120","237494310"]                2     2
#2                       []                0     0
#3                       []                0     0
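
Counting commas cannot tell a one-item list from an empty one, since both contain zero commas. As a rough alternative, the quoted items themselves can be counted. This is a sketch only: `count_items` is a hypothetical helper, and it assumes every item is wrapped in double quotes, as in the sample data.

```r
# Hypothetical helper: count the double-quoted items in each string
count_items <- function(x) lengths(regmatches(x, gregexpr('"[^"]*"', x)))

# Example input, including a one-item list that comma counting would miss
book_id2 <- c('["19167120","237494310"]', '["19167120"]', '[]')
count_items(book_id2)
#[1] 2 1 0
```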
   

Data

df <- structure(list(id1 = c(1L, 1L, 2L), id2 = c(2L, 3L, 3L), book_id1 =
 c("[\"19167120\",\"237494310\",\"195166798\"]", 
"[\"19167120\",\"237494310\",\"195166798\"]", "[\"19167120\",\"237494310\"]"
), numberofbook_id1 = c(3L, 3L, 2L), book_id2 = c("[\"19167120\",\"237494310\"]", 
"[]", "[]"), numberofbook_id2 = c(2L, 0L, 0L)), .Names = c("id1", 
"id2", "book_id1", "numberofbook_id1", "book_id2", "numberofbook_id2"
 ), class = "data.frame", row.names = c(NA, -3L))
