I have a reference table:
ref <- data.frame("Strong"=c("A","A","B","B","C","C","D"),
"Medium"=c("A","B","B","C","C","D","D"),
"Moderate"=c("B","C","C","C","D","D","D"),
"Weak"=c("C","C","D","D","D","D","D"))
rownames(ref) <- c("WS1","WS2","WS3","WS4","WS5","WS6","WS7")
and a large data frame (example below):
df <- data.frame("Rad"=c("Weak","Weak","Weak","Moderate","Moderate"), "Wind"=c("WS4","WS3","WS3","WS2","WS4"))
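For each row of df, I need to look up the value in ref at the row named by Wind and the column named by Rad. To do this, I use the following code to retrieve the row and column indices, and then use those indices to copy the values from ref: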
df$x <- apply(df,1,function(x){which(colnames(ref) == df[x,"Rad"])})
df$x <- apply(df,1,function(x){which(colnames(ref) == x$Rad)})
df$y <- apply(df,1,function(x){which(rownames(ref) == df[x,"Wind"])})
df$y <- apply(df,1,function(x){which(rownames(ref) == x$Wind)})
The expected output looks like this:
Rad Wind PG
1 Weak WS4 D
2 Weak WS3 D
3 Weak WS3 D
4 Moderate WS2 C
5 Moderate WS4 C
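The code above works, but it has problems:
- I shouldn't have to write each line "twice", but if I run only the second line of a pair, the code fails.
- The first line doesn't do what is expected (and it shouldn't, because the syntax is wrong), but, again, if I don't run the "failing" first line beforehand, the second line won't run.
- Finally, although this works, I'm fairly sure there is a simpler way to do what I'm doing. Any hints would be appreciated!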
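Another approach, using data.table. It should run fast, even on large data sets. It uses the same logic as the solution by @IceCreamToucan, but stays within data.table.
Explanation: melt the ref table to long format, then perform an update join on df.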
library( data.table )
setDT(df)[ melt( setDT( ref, keep.rownames = TRUE ), id.vars = "rn" ),
PG := i.value,
on = .( Wind == rn, Rad == variable )][]
# Rad Wind PG
# 1: Weak WS4 D
# 2: Weak WS3 D
# 3: Weak WS3 D
# 4: Moderate WS2 C
# 5: Moderate WS4 C
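To make the melt-then-join logic concrete, here is a minimal base R sketch of the same idea. It is only an illustration, not what data.table does internally; the names ref_long and key are made up for this example, and the columns rn, variable and value merely mimic the names that melt() produces.

```r
ref <- data.frame(Strong   = c("A","A","B","B","C","C","D"),
                  Medium   = c("A","B","B","C","C","D","D"),
                  Moderate = c("B","C","C","C","D","D","D"),
                  Weak     = c("C","C","D","D","D","D","D"),
                  stringsAsFactors = FALSE)
rownames(ref) <- paste0("WS", 1:7)
df <- data.frame(Rad  = c("Weak","Weak","Weak","Moderate","Moderate"),
                 Wind = c("WS4","WS3","WS3","WS2","WS4"),
                 stringsAsFactors = FALSE)

# "Melt" ref by hand: one row per (row name, column name) pair.
ref_long <- data.frame(
  rn       = rep(rownames(ref), times = ncol(ref)),
  variable = rep(colnames(ref), each  = nrow(ref)),
  value    = as.character(unlist(ref, use.names = FALSE)),
  stringsAsFactors = FALSE
)

# "Join" on both keys: build a combined key and match df against it.
key   <- paste(ref_long$rn, ref_long$variable)
df$PG <- ref_long$value[match(paste(df$Wind, df$Rad), key)]
df
```

Rows of df whose Wind/Rad pair does not occur in ref would get NA in PG here, which is also what an update join leaves behind for unmatched rows.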
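We can match the Wind and Rad columns of df against the rownames and colnames of ref, respectively, and use the resulting row/column index pairs to subset ref.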
df$PG <- ref[cbind(match(df$Wind, rownames(ref)), match(df$Rad, colnames(ref)))]
df
# Rad Wind PG
#1 Weak WS4 D
#2 Weak WS3 D
#3 Weak WS3 D
#4 Moderate WS2 C
#5 Moderate WS4 C
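A possible variant, sketched here under the assumption that every Wind and Rad value in df actually occurs in the dimnames of ref: a matrix can be indexed directly by a two-column character matrix of (row name, column name) pairs, so the match() calls can be dropped. The name ref_mat is made up for this example.

```r
ref <- data.frame(Strong   = c("A","A","B","B","C","C","D"),
                  Medium   = c("A","B","B","C","C","D","D"),
                  Moderate = c("B","C","C","C","D","D","D"),
                  Weak     = c("C","C","D","D","D","D","D"))
rownames(ref) <- paste0("WS", 1:7)
df <- data.frame(Rad  = c("Weak","Weak","Weak","Moderate","Moderate"),
                 Wind = c("WS4","WS3","WS3","WS2","WS4"))

# Convert once to a character matrix; row and column names are kept as dimnames.
ref_mat <- as.matrix(ref)

# Each row of the index matrix is one (row name, column name) lookup.
# as.character() guards against factor columns (the default before R 4.0).
df$PG <- ref_mat[cbind(as.character(df$Wind), as.character(df$Rad))]
df
```

The trade-off: this version stops with a "subscript out of bounds" error if a name is missing from ref, whereas the match() version returns NA for that row.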
library(tidyverse)
library(data.table) # for melt
ref_long <-
ref %>%
rownames_to_column('row') %>%
melt('row')
df %>%
left_join(ref_long, by = c('Rad' = 'variable', 'Wind' = 'row'))
# Rad Wind value
# 1 Weak WS4 D
# 2 Weak WS3 D
# 3 Weak WS3 D
# 4 Moderate WS2 C
# 5 Moderate WS4 C