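I'm trying to figure out how to update various records in column A of df1 with new values from column A of df2. df1 has 2141 observations in column A, with unique IDs in the Index column. df2 has 268 updated column A values, with their associated unique IDs in the Index column. I have tried merge(), and even a simple for loop with an if statement, with no luck, for example: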
for (i in 1:nrow(df1)){
if (df1$Index[i] == df2$Index[i]){
df1$A[i] <- df2$A[1]
}
}
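The loop above compares row i of df1 with row i of df2, so it only finds a match when both frames happen to line up row by row. It also always copies df2$A[1], and once i exceeds nrow(df2), df2$Index[i] is NA and if() stops with an error. A minimal corrected loop, assuming each Index value appears at most once in df1, walks over the rows of df2 instead and looks up the matching df1 row by ID. The data reproduce the simplified example below; creating A as NA_integer_ is an assumption so that both A columns share a type.

```r
# Simplified example data: df1 holds every ID, df2 only the updated ones.
df1 <- data.frame(Index = 1:10, A = NA_integer_)
df2 <- data.frame(Index = c(2L, 3L, 6L, 7L, 10L),
                  A     = c(85L, 46L, 79L, 64L, 40L))

# Loop over the (fewer) rows of df2 and write each new value into the
# df1 row whose Index matches; IDs absent from df1 are simply skipped.
for (j in seq_len(nrow(df2))) {
  df1$A[df1$Index == df2$Index[j]] <- df2$A[j]
}

df1
```

Looping over df2 means only 268 iterations on the real data rather than 2141.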
Simplified examples of my two data frames, and the result I want:
df1:
Index A
1 1 NA
2 2 NA
3 3 NA
4 4 NA
5 5 NA
6 6 NA
7 7 NA
8 8 NA
9 9 NA
10 10 NA
df2:
Index A
1 2 85
2 3 46
3 6 79
4 7 64
5 10 40
Updated df1:
Index A
1 1 NA
2 2 85
3 3 46
4 4 NA
5 5 NA
6 6 79
7 7 64
8 8 NA
9 9 NA
10 10 40
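I have to believe this is simple, but I can't figure out how to update my main data frame. The ideas I find searching online keep coming back to the merge() function or similar join functions. Thanks for your help and guidance.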
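Since merge() keeps coming up, here is a minimal base-R sketch of how it can do this update: a left join that keeps every df1 row, followed by taking the new value wherever one exists. The data reproduce the simplified example above; creating A as NA_integer_ and the column suffix ".new" are illustrative choices, not part of the original question.

```r
df1 <- data.frame(Index = 1:10, A = NA_integer_)
df2 <- data.frame(Index = c(2L, 3L, 6L, 7L, 10L),
                  A     = c(85L, 46L, 79L, 64L, 40L))

# Left join: keep every row of df1; df2's A arrives as column "A.new".
m <- merge(df1, df2, by = "Index", all.x = TRUE, suffixes = c("", ".new"))

# Prefer the new value where one exists, otherwise keep the old one.
m$A <- ifelse(is.na(m$A.new), m$A, m$A.new)
m$A.new <- NULL

# merge() sorts by the key, so put the rows back in df1's original order.
updated <- m[match(df1$Index, m$Index), ]
rownames(updated) <- NULL
updated
```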
Try this:
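# Assumes Index runs 1..10, so that row i of df has Index i
df <- data.frame(Index = 1:10, A = NA)
for(i in 1:nrow(df)){
  x <- which(i == df2$Index)  # row of df2 with this Index, if any
  y <- which(i == df1$Index)  # row of df1 with this Index, if any
  if(length(x) > 0) df$A[i] <- df2$A[x]        # prefer the updated value
  else if(length(y) > 0) df$A[i] <- df1$A[y]   # otherwise keep the old one
  else df$A[i] <- NA
}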
Output:
Index A
1 1 NA
2 2 85
3 3 46
4 4 NA
5 5 NA
6 6 79
7 7 64
8 8 NA
9 9 NA
10 10 40
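The loop above relies on Index running 1..10, so that row i has Index i. A vectorized sketch with base R's match() avoids that assumption and updates df1 in place; as before, the data reproduce the simplified example, with A created as NA_integer_ as an assumption.

```r
df1 <- data.frame(Index = 1:10, A = NA_integer_)
df2 <- data.frame(Index = c(2L, 3L, 6L, 7L, 10L),
                  A     = c(85L, 46L, 79L, 64L, 40L))

# For each df2 row, the position of its Index in df1 (NA if absent).
pos <- match(df2$Index, df1$Index)

# Overwrite only the matched rows; df2 IDs missing from df1 are dropped.
keep <- !is.na(pos)
df1$A[pos[keep]] <- df2$A[keep]
df1
```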
In dplyr, you can do this:
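library(dplyr)

# Join on both common columns, so an updated value and the old NA
# for the same Index end up as separate rows
df2 %>%
  full_join(df1, by = c("Index", "A")) %>%
  group_by(Index) %>%
  # when an Index has more than one row, drop the row whose A is NA
  filter(n() == 1 | !is.na(A)) %>%
  ungroup() %>%
  arrange(Index)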
This assumes each Index value appears at most once in each data frame.
# A tibble: 10 × 2
Index A
<int> <int>
1 1 NA
2 2 85
3 3 46
4 4 NA
5 5 NA
6 6 79
7 7 64
8 8 NA
9 9 NA
10 10 40
These are the "simplest" approaches I can think of; hopefully one of them suits your data:
df1 <- read.table(text = " Index A
1 1 NA
2 2 NA
3 3 NA
4 4 NA
5 5 NA", header = TRUE)
df2 <- read.table(text = " Index A
1 2 85
2 3 46
3 6 79
4 7 64
5 10 40", header = TRUE)
library(dplyr)
full_join(df1, df2, by = "Index") %>%
mutate(A = coalesce(A.x, A.y)) %>%
select(Index, A)
#> Index A
#> 1 1 NA
#> 2 2 85
#> 3 3 46
#> 4 4 NA
#> 5 5 NA
#> 6 6 79
#> 7 7 64
#> 8 10 40
library(powerjoin)
power_full_join(df1, df2, by = "Index",
conflict = rw ~ ifelse(all(is.na(.x), is.na(.y)),
NA_integer_,
sum(.x, .y, na.rm = TRUE)))
#> Index A
#> 1 1 NA
#> 2 2 85
#> 3 3 46
#> 4 4 NA
#> 5 5 NA
#> 6 6 79
#> 7 7 64
#> 8 10 40
# plyr method: the order of the data frames matters here,
# so this may not work on your 'real' data
# (loading plyr after dplyr masks several dplyr functions, e.g. arrange())
library(plyr)
join(df2, df1, type = "full", by = "Index") %>%
arrange(Index)
#> Index A
#> 1 1 NA
#> 2 2 85
#> 3 3 46
#> 4 4 NA
#> 5 5 NA
#> 6 6 79
#> 7 7 64
#> 8 10 40
Created on 2022-06-30 by the reprex package (v2.0.1)