I'm having a bit of a mental block trying to match values between two data sets. Here is an excerpt of my data:
town <- c("Acworth", "Albany", "Amherst", "Bedford")
weight_factor <- c(0.432, 0.89, 1.3, 0.6777)
df1 <- data.frame(town, weight_factor)
and this data frame:
name <- c("Peter", "Rob", "Gillian", "Matt", "Louise", "Eva", "Tom")
vote <- c("R", "D", "D", "I", "R", "D", "D")
home <- c("New York", "Florida", "Acworth", "London", "Toronto", "Porto", "Minsk")
weight_factor <- 1
df2 <- data.frame(name, vote, home, weight_factor)
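Imagine a similar data set, but with about 300 observations in df1 and about 10,000 observations in df2. What I want to do is grep through the df2$home variable to see whether it matches any value of df1$town, and if it does, replace the corresponding value of df2$weight_factor with the matching value of df1$weight_factor.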
So, if this worked correctly, the new values of df2$weight_factor should be:
1, 1, 0.432, 1, 1, 1, 1
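I tried to do this with an if statement inside a loop using grepl, but that doesn't seem to work, because it seems to need both an i and a j index. Any help would be greatly appreciated, thanks!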
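For comparison with the loop attempt described above: a single index is enough if the loop runs over df1 rather than over both data frames. The sketch below is illustrative only; it assumes that substring matching via grepl(fixed = TRUE) is acceptable and rebuilds the data with stringsAsFactors = FALSE so the text columns stay character.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps text as character
town <- c("Acworth", "Albany", "Amherst", "Bedford")
weight_factor <- c(0.432, 0.89, 1.3, 0.6777)
df1 <- data.frame(town, weight_factor, stringsAsFactors = FALSE)

name <- c("Peter", "Rob", "Gillian", "Matt", "Louise", "Eva", "Tom")
vote <- c("R", "D", "D", "I", "R", "D", "D")
home <- c("New York", "Florida", "Acworth", "London", "Toronto", "Porto", "Minsk")
df2 <- data.frame(name, vote, home, weight_factor = 1, stringsAsFactors = FALSE)

# For town j, grepl() flags every row of df2 whose home contains that
# town name (fixed = TRUE: literal text, not a regex); those rows get
# town j's weight. grepl() is already vectorised over df2$home, so no
# second index is needed.
for (j in seq_len(nrow(df1))) {
  hit <- grepl(df1$town[j], df2$home, fixed = TRUE)
  df2$weight_factor[hit] <- df1$weight_factor[j]
}
print(df2$weight_factor)
```

Because this is substring matching, a home such as "New Albany" would also pick up Albany's weight. If you need exact equality, use match() instead.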
First, build your data.frames properly! It should simply be:
df1 <- data.frame(town, weight_factor)
and
df2 <- data.frame(name, vote, home, weight_factor)
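The reason for insisting on data.frame() is that cbind() on a character vector and a numeric vector returns a character matrix, so the numbers become text. Here is a quick illustrative check using the question's df1 vectors:

```r
town <- c("Acworth", "Albany", "Amherst", "Bedford")
weight_factor <- c(0.432, 0.89, 1.3, 0.6777)

m <- cbind(town, weight_factor)          # both columns coerced to character
print(is.matrix(m))                      # TRUE
print(typeof(m))                         # "character"

bad <- as.data.frame(m)                  # weight_factor is now text (or factor), not numeric
good <- data.frame(town, weight_factor)  # each column keeps its own type
print(is.numeric(bad$weight_factor))     # FALSE
print(is.numeric(good$weight_factor))    # TRUE
```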
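There is no reason to use cbind (which coerces everything into a matrix and loses the class information of each vector) and then convert the result to a data.frame. With the two data.frames above, just try: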
ind <- match(df2$home, df1$town)  # row of df1 for each home, NA where there is no match
df2$weight_factor[!is.na(ind)] <- df1$weight_factor[ind[!is.na(ind)]]  # overwrite matched rows only
df2
# name vote home weight_factor
#1 Peter R New York 1.000
#2 Rob D Florida 1.000
#3 Gillian D Acworth 0.432
#4 Matt I London 1.000
#5 Louise R Toronto 1.000
#6 Eva D Porto 1.000
#7 Tom D Minsk 1.000
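An equivalent exact-match idiom, offered as an alternative rather than as part of the answer above, is a named lookup vector. The sketch assumes the text columns are character (hence stringsAsFactors = FALSE and as.character()). Indexing a vector by a factor would use the factor's integer codes instead of the labels. The name lookup is just an illustrative variable.

```r
town <- c("Acworth", "Albany", "Amherst", "Bedford")
df1 <- data.frame(town, weight_factor = c(0.432, 0.89, 1.3, 0.6777),
                  stringsAsFactors = FALSE)

name <- c("Peter", "Rob", "Gillian", "Matt", "Louise", "Eva", "Tom")
vote <- c("R", "D", "D", "I", "R", "D", "D")
home <- c("New York", "Florida", "Acworth", "London", "Toronto", "Porto", "Minsk")
df2 <- data.frame(name, vote, home, weight_factor = 1, stringsAsFactors = FALSE)

# Named vector: names are towns, values are their weights
lookup <- setNames(df1$weight_factor, df1$town)

# Indexing by name gives NA for homes that are not in df1$town
new_w <- unname(lookup[as.character(df2$home)])

# Keep the existing weight wherever there was no match
df2$weight_factor <- ifelse(is.na(new_w), df2$weight_factor, new_w)
print(df2$weight_factor)
```

Both versions are vectorised, so they should scale comfortably to the 300 × 10,000 case without an explicit loop.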