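R: write a function that takes a vector as input, discards unwanted values, removes duplicates, and returns the corresponding indexes of the original vector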

I am trying to write a function that takes a vector and subsets it according to the following steps:

1. Discard any unwanted values
2. Remove duplicates
3. Return the indexes into the original vector, after accounting for steps (1) and (2)

For example, given the following input vector:

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

and

throw_away_val <- "cat"

I would like my function get_indexes(x = vec_animals, y = throw_away_val) to return:

# [1] 1 6   # `1` is the index of the 1st unique ("dog") in `vec_animals`, `6` is the index of the 2nd unique ("dolphin")

Another example

vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
throw_away_val <- 2003

# [1] 4 6 # `4` is the position of 1st unique (`2007`) after throwing away unwanted val; `6` is the position of 2nd unique (`2011`).

My initial attempt

The following function returns the indexes, but does not account for duplicates

get_index <- function(x, throw_away) {
  which(x != throw_away)
}

It then returns indexes into the original vector, for example:

get_index(vec_animals, "cat")
#> [1] 1 2 3 4 6 7

If we use this output to subset vec_animals, we get:

vec_animals[get_index(vec_animals, "cat")]
#> [1] "dog"     "dog"     "dog"     "dog"     "dolphin" "dolphin"

You might suggest post-processing that output, for example:

vec_animals[get_index(vec_animals, "cat")] |> unique()
#> [1] "dog"     "dolphin"

But that is not what I want: I need get_index() to return the correct indexes right away (1 and 6 in this case).

EDIT

A related procedure exists that gives the index of the first occurrence of each repeated value

library(bit64)
vec_num <- as.integer64(c(4, 2, 2, 3, 3, 3, 3, 100, 100))
unipos(vec_num)
#> [1] 1 2 4 8

Or, more generally

which(!duplicated(vec_num))
#> [1] 1 2 4 8

Such a solution would be fine if I did not also need to throw away the unwanted values.
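To make the shortcoming concrete, here is a minimal sketch that applies the same call to the animal vector from the question. The name first_pos is only illustrative. The first occurrence of the unwanted value is still reported:

```r
vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

# First occurrence of every distinct value, including the unwanted "cat"
first_pos <- which(!duplicated(vec_animals))
first_pos
#> [1] 1 5 6
```

Position 5 ("cat") is the one that still has to be filtered out.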

Try:

get_index <- function(x, throw_away) {
  which(!duplicated(x) & x != throw_away)
}
get_index(vec_animals, "cat")
#> [1] 1 6
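If more than one value has to be discarded, the same idea can probably be extended by swapping != for %in%. The following is a sketch under that assumption; get_index_multi is a hypothetical name, not part of the answer above:

```r
vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)

# Hypothetical variant: `throw_away` may contain several values
get_index_multi <- function(x, throw_away) {
  # Keep a position only if it is the first occurrence of its value
  # and that value is not among the ones to discard
  which(!duplicated(x) & !(x %in% throw_away))
}

get_index_multi(vec_animals, c("cat", "dog"))
#> [1] 6
get_index_multi(vec_years, 2003)
#> [1] 4 6
```

One difference worth noting: with %in%, an NA in x is treated as an ordinary value and its first position is returned, whereas the != version silently drops NA entries.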

Here is a simple hand-written function that returns the required information.

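vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
get_indexes <- function(x, throw_away) {
  # Unique values of x, minus the value to discard
  elements <- unique(x)
  elements <- elements[elements != throw_away]
  # For each remaining value, collect every position where it occurs in x
  index <- lapply(seq_along(elements), function(i) which(x %in% elements[i]))
  # Keep only the first (smallest) position of each value;
  # seq_along() avoids the 1:0 trap when nothing is left after filtering
  index2return <- c()
  for (j in seq_along(index)) {
    index2return <- c(index2return, min(index[[j]]))
  }
  return(index2return)
}
get_indexes(x = vec_animals, throw_away = "cat")
#> [1] 1 6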
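The same "first position of each remaining unique value" logic can arguably be written more compactly with base R's match(), which returns the position of the first match. This is a sketch of an alternative, not the answerer's code; get_indexes_match is a hypothetical name:

```r
vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)

# Hypothetical compact variant built on match()
get_indexes_match <- function(x, throw_away) {
  # Distinct values of x that are not discarded, in order of first appearance
  kept <- setdiff(unique(x), throw_away)
  # match() gives the first position of each kept value in x
  match(kept, x)
}

get_indexes_match(vec_animals, "cat")
#> [1] 1 6
get_indexes_match(vec_years, 2003)
#> [1] 4 6
```

This avoids the explicit loop and returns an empty integer vector when every value is discarded.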

My approach:

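vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
throw_away_val <- "cat"
my_function <- function(x, y) {
  # Pair each value with its position in the original vector
  my_df <- data.frame("Origin" = x,
                      "Position" = seq_along(x),
                      stringsAsFactors = FALSE)
  # Drop the rows holding unwanted values, if there are any
  my_var <- which(my_df$Origin %in% y)
  if (length(my_var)) {
    my_df <- my_df[-my_var, ]
  }
  # Keep the first row of each remaining value
  my_df <- my_df[!duplicated(my_df$Origin), ]
  return(my_df)
}
my_df <- my_function(vec_animals, throw_away_val)
# The Position column holds the indexes into the original vector
my_df$Position
#> [1] 1 6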
