R - Conditionally matching variable values between data frames based on the occurrence of a regular expression



I've hit a bit of a mental block trying to match values between two datasets. Here is an excerpt of my data:

town <- c("Acworth", "Albany", "Amherst", "Bedford")
weight_factor <- c(0.432, 0.89, 1.3, 0.6777)
df1 <- data.frame(town, weight_factor)

And this data frame:

name <- c("Peter", "Rob", "Gillian", "Matt", "Louise", "Eva", "Tom")
vote <- c("R", "D", "D", "I", "R", "D", "D")
home <- c("New York", "Florida", "Acworth", "London", "Toronto", "Porto", "Minsk")
weight_factor <- 1
df2 <- data.frame(name, vote, home, weight_factor)

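Imagine a similar dataset, but with about 300 observations in df1 and about 10,000 in df2. What I want to do is grep through the df2$home variable to see whether it matches any value of df1$town and, if it does, replace the corresponding value of df2$weight_factor with the value of df1$weight_factor.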

So, if this code runs correctly, the new values of df2$weight_factor should be:

1, 1, 0.432, 1, 1, 1, 1

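I tried to do this with an if statement inside a loop using grepl, but that doesn't seem to work, because it seems to need both i and j. Any help would be much appreciated, thanks!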

First, build your data.frames correctly! It should simply be:

df1 <- data.frame(town, weight_factor)

df2 <- data.frame(name, vote, home, weight_factor)
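For comparison, here is a minimal sketch of what happens when the vectors are passed through cbind before data.frame. The name via_cbind is only illustrative; the point is that the numeric weights stop being numeric on the way through the matrix.

```r
town <- c("Acworth", "Albany", "Amherst", "Bedford")
weight_factor <- c(0.432, 0.89, 1.3, 0.6777)

# cbind() on a character and a numeric vector yields a character matrix,
# so the weights come back as text (or as a factor in R versions before 4.0).
via_cbind <- data.frame(cbind(town, weight_factor))

# Passing the vectors straight to data.frame() keeps each column's class.
direct <- data.frame(town, weight_factor)

class(via_cbind$weight_factor)  # not "numeric"
class(direct$weight_factor)     # "numeric"
```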

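There is no reason to use cbind (which coerces to a matrix and loses the class information of each vector) and then convert the result to a data.frame. With the two data.frames above, just try: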

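# For each df2$home, the row index of the matching df1$town (NA if there is none)
ind <- match(df2$home, df1$town)
# Overwrite only the matched rows with the corresponding weights from df1
df2$weight_factor[!is.na(ind)] <- df1$weight_factor[ind[!is.na(ind)]]
df2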
#     name vote     home weight_factor
#1   Peter    R New York         1.000
#2     Rob    D  Florida         1.000
#3 Gillian    D  Acworth         0.432
#4    Matt    I   London         1.000
#5  Louise    R  Toronto         1.000
#6     Eva    D    Porto         1.000
#7     Tom    D    Minsk         1.000
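Note that match compares whole strings exactly. If the home values only contain the town name as part of a longer string, something along the following lines might work instead. This is a rough sketch, not tested on the full data: the home values "Acworth, NH" and "North Bedford" are made up for illustration, and it assumes a plain substring test via grepl with fixed = TRUE is what is wanted.

```r
town <- c("Acworth", "Albany", "Amherst", "Bedford")
df1 <- data.frame(town, weight_factor = c(0.432, 0.89, 1.3, 0.6777))

# Hypothetical home values that contain a town name inside a longer string
df2 <- data.frame(
  home = c("New York", "Acworth, NH", "London", "North Bedford"),
  weight_factor = 1
)

# Loop over the towns (the short side) and let grepl() vectorise over the homes.
for (i in seq_len(nrow(df1))) {
  # fixed = TRUE treats the town as a literal string rather than a regex
  hit <- grepl(as.character(df1$town[i]), df2$home, fixed = TRUE)
  df2$weight_factor[hit] <- df1$weight_factor[i]
}

df2$weight_factor
# 1.0000 0.4320 1.0000 0.6777
```

Dropping fixed = TRUE makes grepl treat each town as a regular expression. If a home matches more than one town, the later town in df1 wins, so the order of df1 matters in that case.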
