r - Find the unique levels of each column across a list of datasets



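I have a list of 18 datasets, each with several columns. How can I write a loop that finds the intersection by column index and returns a list of the results, one per column index?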


df1 <- data.frame(id = c(1:5), loc = c("a","b","c","a","b"))
df2 <- data.frame(id = c(3:7), ta = c("c","b","d","a","b"))
df3 <- data.frame(id = c(1:5), az = c("d","a","e","d","b"))
df <- list(df1, df2, df3)
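# convert every column of every dataset to character
df <- lapply(df, function(i) lapply(i, function(j) as.character(j)))
# what I am trying to compute (note: base intersect() only accepts two arguments)
intersect(df[[1]][1], df[[2]][1], df[[3]][1])
intersect(df[[1]][2], df[[2]][2], df[[3]][2])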

With the tidyverse, we can use map/reduce:

library(purrr)
library(dplyr)
# here df is list(df1, df2, df3), i.e. before the as.character() conversion
map(df, pull, 1) %>% 
  reduce(intersect)
#[1] 3 4 5

Or as a function:

f1 <- function(lstA, ind) {
  map(lstA, pull, ind) %>%
    reduce(intersect)
}

f1(df, 1)
#[1] 3 4 5
f1(df, 2)
#[1] "a" "b"
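
To get every column at once with the same helper, one option is to map over the column positions. This is only a sketch: it assumes all datasets keep their columns in the same order and that the list still holds data frames (`pull` needs them). The names `lst` and `common` are made up for the example.

```r
library(purrr)
library(dplyr)

# Same example data as in the question, kept as data frames
lst <- list(
  data.frame(id = 1:5, loc = c("a", "b", "c", "a", "b"), stringsAsFactors = FALSE),
  data.frame(id = 3:7, ta  = c("c", "b", "d", "a", "b"), stringsAsFactors = FALSE),
  data.frame(id = 1:5, az  = c("d", "a", "e", "d", "b"), stringsAsFactors = FALSE)
)

# Helper from above: intersect column number `ind` across all datasets
f1 <- function(lstA, ind) {
  map(lstA, pull, ind) %>%
    reduce(intersect)
}

# One intersection per column position (taken from the first dataset)
common <- map(seq_len(ncol(lst[[1]])), function(i) f1(lst, i))
```

`common[[1]]` then holds the shared ids and `common[[2]]` the shared letters.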

You can use Reduce with the intersect function, and use `[` inside sapply to pick out the sub-list by number.

For a single column:

Reduce(intersect, sapply(df, `[`, 1))
# [1] "3" "4" "5"
Reduce(intersect, sapply(df, `[`, 2))
# [1] "a" "b"
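
In case it is not obvious what Reduce does here: it folds the two-argument intersect over the list from left to right. A small sketch with the three letter columns typed out by hand (the vector names are just for illustration):

```r
x1 <- c("a", "b", "c", "a", "b")
x2 <- c("c", "b", "d", "a", "b")
x3 <- c("d", "a", "e", "d", "b")

# Reduce applies intersect pairwise, carrying the running result along
folded <- Reduce(intersect, list(x1, x2, x3))

# ... which is the same as nesting the calls by hand
nested <- intersect(intersect(x1, x2), x3)

identical(folded, nested)
# [1] TRUE
```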

Or for all columns:

lapply(1:2, function(i) Reduce(intersect, sapply(df, `[`, i)))
# [[1]]
# [1] "3" "4" "5"
# 
# [[2]]
# [1] "a" "b"
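
With 18 datasets you may not want to hard-code `1:2`. A possible wrapper, as a sketch only: `intersect_by_position` is a hypothetical name, and it assumes the columns line up by position and that comparing everything as character is acceptable. It uses `[[` with lapply so the result does not depend on how sapply simplifies.

```r
# Hypothetical helper: for each column position shared by all datasets,
# return the values common to that column in every dataset.
intersect_by_position <- function(lst) {
  n_shared <- min(lengths(lst))  # columns present in every dataset
  lapply(seq_len(n_shared), function(i) {
    # i-th column of each dataset, as character vectors
    cols <- lapply(lst, function(d) as.character(d[[i]]))
    Reduce(intersect, cols)
  })
}

df1 <- data.frame(id = 1:5, loc = c("a", "b", "c", "a", "b"))
df2 <- data.frame(id = 3:7, ta  = c("c", "b", "d", "a", "b"))
df3 <- data.frame(id = 1:5, az  = c("d", "a", "e", "d", "b"))

res <- intersect_by_position(list(df1, df2, df3))
```

It works both on the list of data frames and on the list of character lists built in the question.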
