id first middle last Age
1 Carol Jenny Smith 15
2 Sarah Carol Roberts 20
3 Josh David Richardson 22
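I am trying to find a specific name in any of the name columns (first, middle, last). For example, if a person has the name Carol (regardless of whether it is the first, middle, or last name), I want to mutate a column "Carol" and set it to 1. So what I want is the following: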
id first middle last Age Carol
1 Carol Jenny Smith 15 1
2 Sarah Carol Roberts 20 1
3 Josh David Richardson 22 0
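I have been trying ifelse(c(first, middle, last) == "Carol", 1, 0) or "Carol" %in% first ... etc., but for some reason I can only get it to work on one column rather than several. Can anyone help me? Thank you in advance!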
We can use rowSums:
df$Carol <- as.integer(rowSums(df[2:4] == "Carol") > 0)
df
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
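To see why this works, it may help to look at the intermediate steps. The following is a minimal sketch, assuming the question's table is stored in a data frame called df (it is rebuilt here so the snippet runs on its own; the names hits, n_hits, and flag are only illustrative):

```r
# Rebuild the example data (assumed to match the question's table)
df <- data.frame(
  id = 1:3,
  first = c("Carol", "Sarah", "Josh"),
  middle = c("Jenny", "Carol", "David"),
  last = c("Smith", "Roberts", "Richardson"),
  Age = c(15L, 20L, 22L),
  stringsAsFactors = FALSE
)

# Comparing a data frame with a single value gives a logical matrix,
# one cell per (row, column) pair
hits <- df[2:4] == "Carol"

# rowSums() counts the TRUE cells in each row
n_hits <- rowSums(hits)

# A row matches when at least one of its cells is TRUE
flag <- as.integer(n_hits > 0)
```

Because the comparison is done on all three columns at once, the result stays aligned with the rows of df, which is what a single ifelse() over concatenated columns does not give you.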
If we need it as a function:
fun <- function(df, value) {
as.integer(rowSums(df[2:4] == value) > 0)
}
fun(df, "Carol")
#[1] 1 1 0
fun(df, "Sarah")
#[1] 0 1 0
But this assumes that the columns you want to search are at positions 2:4.
To allow more flexibility in the column positions:
fun <- function(df, cols, value) {
as.integer(rowSums(df[cols] == value) > 0)
}
fun(df, c("first", "last","middle"), "Carol")
#[1] 1 1 0
fun(df, c("first", "last","middle"), "Sarah")
#[1] 0 1 0
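One caveat that may matter with real data: if a searched column contains NA, the comparison yields NA and rowSums() propagates it, so the flag for that row becomes NA. A possible variant, sketched here under the assumption that a missing name should count as "no match" (fun_na and df_na are hypothetical names, and the data is made up for illustration):

```r
# Hypothetical variant of fun() that treats missing names as "no match"
fun_na <- function(df, cols, value) {
  # na.rm = TRUE makes rowSums() skip NA cells instead of returning NA
  as.integer(rowSums(df[cols] == value, na.rm = TRUE) > 0)
}

# Small made-up example with a missing middle name in the first row
df_na <- data.frame(
  first = c("Carol", "Sarah"),
  middle = c(NA, "Carol"),
  last = c("Smith", "Roberts"),
  stringsAsFactors = FALSE
)

res <- fun_na(df_na, c("first", "middle", "last"), "Carol")
```

Whether NA should be treated as a non-match depends on the data, so this is a design choice rather than a required fix.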
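Here is a tidyverse option. We first reshape the data to long format, group by id, and find the id levels that have the desired name in at least one row. Then we reshape back to wide format.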
library(tidyverse)
df %>%
gather(key, value, first:last) %>%
group_by(id) %>%
mutate(Carol = as.numeric(any(value=="Carol"))) %>%
spread(key, value)
# id Age Carol first last middle
#1 1 15 1 Carol Smith Jenny
#2 2 20 1 Sarah Roberts Carol
#3 3 22 0 Josh Richardson David
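In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshape, flag, reshape-back idea could be sketched as follows, assuming tidyr 1.0.0 or later and a data frame df matching the question's table (rebuilt here so the snippet is self-contained; res is just an illustrative name):

```r
library(dplyr)
library(tidyr)

# Rebuild the example data (assumed to match the question's table)
df <- data.frame(
  id = 1:3,
  first = c("Carol", "Sarah", "Josh"),
  middle = c("Jenny", "Carol", "David"),
  last = c("Smith", "Roberts", "Richardson"),
  Age = c(15L, 20L, 22L),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Long format: one row per (id, name column) pair
  pivot_longer(c(first, middle, last), names_to = "key", values_to = "value") %>%
  group_by(id) %>%
  # Flag every row of an id if any of its names equals "Carol"
  mutate(Carol = as.integer(any(value == "Carol"))) %>%
  ungroup() %>%
  # Back to wide format: one row per id again
  pivot_wider(names_from = key, values_from = value)
```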
Or, as a function:
find.target = function(data, target) {
data %>%
gather(key, value, first:last) %>%
group_by(id) %>%
mutate(!!target := as.numeric(any(value==target))) %>%
spread(key, value) %>%
# Move new target column to end
select(-all_of(target), all_of(target))
}
find.target(df, "Carol")
find.target(df, "Sarah")
You can also search for several names at once. For example:
map(c("Sarah", "Carol", "David"), ~ find.target(df, .x)) %>%
reduce(left_join)
# id Age first last middle Sarah Carol David
#1 1 15 Carol Smith Jenny 0 1 0
#2 2 20 Sarah Roberts Carol 1 1 0
#3 3 22 Josh Richardson David 0 0 1
Using tidyverse:
library(tidyverse)
f1 <- function(data, wordToCompare, colsToCompare) {
wordToCompare <- enquo(wordToCompare)
data %>%
select(all_of(colsToCompare)) %>%
mutate(!! wordToCompare := map(., ~
.x == as_label(wordToCompare)) %>%
reduce(`|`) %>%
as.integer)
}
f1(df1, Carol, c("first", 'middle', 'last'))
# first middle last Carol
#1 Carol Jenny Smith 1
#2 Sarah Carol Roberts 1
#3 Josh David Richardson 0
f1(df1, Sarah, c("first", 'middle', 'last'))
# first middle last Sarah
#1 Carol Jenny Smith 0
#2 Sarah Carol Roberts 1
#3 Josh David Richardson 0
Or this can also be done with pmap:
df1 %>%
mutate(Carol = pmap_int(.[c('first', 'middle', 'last')],
~ +('Carol' %in% c(...))))
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
It can be wrapped into a function:
f2 <- function(data, wordToCompare, colsToCompare) {
wordToCompare <- enquo(wordToCompare)
data %>%
mutate(!! wordToCompare := pmap_int(.[colsToCompare],
~ +(as_label(wordToCompare) %in% c(...))))
}
f2(df1, Carol, c("first", 'middle', 'last'))
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
Note: neither of the tidyverse approaches above requires any reshaping.
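Another way to avoid reshaping, not part of the original answer, is dplyr's if_any(), which tests a condition across several columns row by row. A minimal sketch, assuming dplyr 1.0.4 or later and the df1 data used in this answer (rebuilt here so the snippet runs on its own; res is an illustrative name):

```r
library(dplyr)

# Rebuild the example data (assumed to match df1 from this answer)
df1 <- data.frame(
  id = 1:3,
  first = c("Carol", "Sarah", "Josh"),
  middle = c("Jenny", "Carol", "David"),
  last = c("Smith", "Roberts", "Richardson"),
  Age = c(15L, 20L, 22L),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # if_any() is TRUE for a row when the condition holds in at least
  # one of the selected columns
  mutate(Carol = as.integer(if_any(c(first, middle, last), ~ .x == "Carol")))
```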
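Using base R, we can loop over the "first", "middle", and "last" columns and compare each with == to get a list of logical vectors, which we Reduce to a single logical vector with |, and then coerce to binary with +: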
df1$Carol <- +(Reduce(`|`, lapply(df1[2:4], `==`, 'Carol')))
df1
# id first middle last Age Carol
#1 1 Carol Jenny Smith 15 1
#2 2 Sarah Carol Roberts 20 1
#3 3 Josh David Richardson 22 0
Note: this post has duplicates. For example here.
Data:
df1 <- structure(list(id = 1:3, first = c("Carol", "Sarah", "Josh"),
middle = c("Jenny", "Carol", "David"), last = c("Smith",
"Roberts", "Richardson"), Age = c(15L, 20L, 22L)),
class = "data.frame", row.names = c(NA,
-3L))
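A solution using the apply family:
# sapply() returns a logical vector (lapply() would store a list column)
df$Carol <- sapply(seq_len(nrow(df)), function(x) any(df[x, ] == "Carol"))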
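When working row by row, it may be safer to restrict the comparison to the name columns, so that id and Age are not compared against the target as well. A minimal sketch using apply() over rows, assuming the question's data is in df (rebuilt here; name_cols is a hypothetical helper variable):

```r
# Rebuild the example data (assumed to match the question's table)
df <- data.frame(
  id = 1:3,
  first = c("Carol", "Sarah", "Josh"),
  middle = c("Jenny", "Carol", "David"),
  last = c("Smith", "Roberts", "Richardson"),
  Age = c(15L, 20L, 22L),
  stringsAsFactors = FALSE
)

# Only these columns are searched
name_cols <- c("first", "middle", "last")

# MARGIN = 1 applies the function to each row; r is a character vector
# holding that row's names
df$Carol <- as.integer(apply(df[name_cols], 1, function(r) any(r == "Carol")))
```

The as.integer() call converts TRUE/FALSE to the 1/0 coding asked for in the question.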
Another option, using mutate and if_else() as you suggested:
library(tidyverse)
data = read_table(" id first middle last Age
1 Carol Jenny Smith 15
2 Sarah Carol Roberts 20
3 Josh David Richardson 22")
data %>%
mutate(carol = if_else(first == "Carol" | middle == "Carol" | last == "Carol",
"yes",
"no"))
Result:
# A tibble: 3 x 6
id first middle last Age carol
<dbl> <chr> <chr> <chr> <dbl> <chr>
1 1 Carol Jenny Smith 15 yes
2 2 Sarah Carol Roberts 20 yes
3 3 Josh David Richardson 22 no