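I have a data frame that looks like this. I want to compare book_id1 and book_id2 and count the strings that are enclosed in double quotes and separated by commas: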
id1 id2 book_id1 numberofbook_id1 book_id2 numberofbook_id2
1 2 ["19167120","237494310","195166798"] 3 ["19167120","237494310"] 2
1 3 ["19167120","237494310","195166798"] 3 [] 0
2 3 ["19167120","237494310"] 2 [] 0
This is the output I want:
id1 id2 book_id1 numberofbook_id1 book_id2 numberofbook_id2 count
1 2 ["19167120","237494310","195166798"] 3 ["19167120","237494310"] 2 2
1 3 ["19167120","237494310","195166798"] 3 [] 0 0
2 3 ["19167120","237494310"] 2 [] 0 0
Thanks in advance.
If you want the number of matching strings:
library(stringr)
# Extract the digit runs from each cell, then count the IDs shared by both columns
count <- sapply(Map(intersect, str_extract_all(df$book_id1, '\\d+'),
                str_extract_all(df$book_id2, '\\d+')), length)
count
#[1] 2 0 0
transform(df, count = count)
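If you would rather not depend on stringr, the same intersect-and-count idea can be written with base R's `regmatches()` and `gregexpr()`. This is a minimal sketch, assuming each cell is a character string shaped like `["19167120","237494310"]`. The small data frame below is a hypothetical cut-down copy of the question's data with only the two relevant columns.

```r
# Hypothetical cut-down copy of the question's data (only the two ID columns).
df <- data.frame(
  book_id1 = c('["19167120","237494310","195166798"]',
               '["19167120","237494310","195166798"]',
               '["19167120","237494310"]'),
  book_id2 = c('["19167120","237494310"]', '[]', '[]'),
  stringsAsFactors = FALSE
)

# Extract every run of digits from each cell; each result is a list of character vectors.
ids1 <- regmatches(df$book_id1, gregexpr("[0-9]+", df$book_id1))
ids2 <- regmatches(df$book_id2, gregexpr("[0-9]+", df$book_id2))

# Intersect the two ID sets row by row, then count the shared IDs in each row.
df$count <- lengths(Map(intersect, ids1, ids2))
df$count
```

An empty cell such as `[]` yields `character(0)`, so its intersection is empty and the count is 0 without any special-casing.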
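Alternatively, if you only need the number of comma-separated strings in each cell (without checking whether they match), count the commas and add 1:
nchar(gsub('[^,]+', '', df$book_id1)) + 1
#[1] 3 3 2
# An empty list "[]" has no commas but also no elements, so it must be handled separately
count <- ifelse(df$book_id2 == "[]", 0, nchar(gsub('[^,]+', '', df$book_id2)) + 1)
transform(df, count = count)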
# id1 id2 book_id1 numberofbook_id1
#1 1 2 ["19167120","237494310","195166798"] 3
#2 1 3 ["19167120","237494310","195166798"] 3
#3 2 3 ["19167120","237494310"] 2
# book_id2 numberofbook_id2 count
#1 ["19167120","237494310"] 2 2
#2 [] 0 0
#3 [] 0 0
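Another way to count the strings in a cell is to match the quoted tokens directly instead of counting commas, which avoids the special case for `[]`. A minimal sketch in base R, assuming the IDs never contain an embedded double quote; `book_id2` here is a hypothetical standalone copy of the column:

```r
# Hypothetical standalone copy of the book_id2 column.
book_id2 <- c('["19167120","237494310"]', '[]', '[]')

# Match each "..." token; matches do not overlap, so the commas between tokens are skipped.
n_quoted <- lengths(regmatches(book_id2, gregexpr('"[^"]*"', book_id2)))
n_quoted
```

Like the comma-counting approach, this counts elements rather than matches; use the intersect version when you need the number of shared IDs.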
Data
df <- structure(list(id1 = c(1L, 1L, 2L), id2 = c(2L, 3L, 3L), book_id1 =
c("[\"19167120\",\"237494310\",\"195166798\"]",
"[\"19167120\",\"237494310\",\"195166798\"]", "[\"19167120\",\"237494310\"]"
), numberofbook_id1 = c(3L, 3L, 2L), book_id2 = c("[\"19167120\",\"237494310\"]",
"[]", "[]"), numberofbook_id2 = c(2L, 0L, 0L)), .Names = c("id1",
"id2", "book_id1", "numberofbook_id1", "book_id2", "numberofbook_id2"
), class = "data.frame", row.names = c(NA, -3L))