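I want to take state names from one data frame and put them into another data frame, matching on the county FIPS code. Below are snippets of both:
> head(arr)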
  state_abb fips_state_county_code
1        CO                   8001
2        CO                   8001
3        CO                   8001
4        CO                   8001
5        CO                   8001
6        CO                   8001
> head(countyname)
    county       countyname
245   8001     Adams County
246   8003   Alamosa County
247   8005  Arapahoe County
248   8007 Archuleta County
249   8009      Baca County
250   8011      Bent County
When I use left_join, I get a large number of duplicates:
xcountyname <- left_join(countyname, arr, by = c("county" = "fips_state_county_code"))
> head(xcountyname)
  county   countyname state_abb
1   8001 Adams County        CO
2   8001 Adams County        CO
3   8001 Adams County        CO
4   8001 Adams County        CO
5   8001 Adams County        CO
6   8001 Adams County        CO
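It should only add the state column, but the data frame has grown from 535 rows to over 71k rows. Is there a better way to do this, so that the state is filled in only for the existing rows of the xcountyname data frame?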
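Edit: arr is a subset of another dataset, which is why it contains duplicates. I still only want the state names carried over to the rows whose FIPS codes already exist in xcountyname.
Update:
Single state abb: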
xcountyname <- left_join(arr, countyname, by = c("fips_state_county_code" = "county")) %>%
  distinct()
  state_abb fips_state_county_code   countyname
1        CO                   8001 Adams County
All state abbs:
xcountyname <- full_join(arr, countyname, by = c("fips_state_county_code" = "county")) %>%
  distinct()
  state_abb fips_state_county_code       countyname
1        CO                   8001     Adams County
2      <NA>                   8003   Alamosa County
3      <NA>                   8005  Arapahoe County
4      <NA>                   8007 Archuleta County
5      <NA>                   8009      Baca County
6      <NA>                   8011      Bent County
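To see where the extra rows come from, here is a minimal sketch on made-up data shaped like the snippets above (the three-row countyname and six-row arr are assumptions for illustration, not the real datasets). A join repeats each row once per matching key on the other side, so counting the keys in arr predicts the size of the result.

```r
library(dplyr)

# Hypothetical toy data mirroring the shapes shown in the question
arr <- data.frame(
  state_abb = rep("CO", 6),
  fips_state_county_code = rep(8001, 6)
)
countyname <- data.frame(
  county = c(8001, 8003, 8005),
  countyname = c("Adams County", "Alamosa County", "Arapahoe County")
)

# How often does each key occur in arr?
key_counts <- count(arr, fips_state_county_code)

# Each countyname row is repeated once per matching arr row
joined <- left_join(countyname, arr, by = c("county" = "fips_state_county_code"))

# 6 matches for 8001, plus one unmatched row each for 8003 and 8005
n_joined <- nrow(joined)
```

If key_counts shows any n greater than 1, the joined result will have more rows than countyname.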
First answer: do a full_join. If you want to remove the duplicates, just add distinct() as the last step:
library(dplyr)
xcountyname= full_join(arr, countyname, by =c("fips_state_county_code"="county"))
   state_abb fips_state_county_code       countyname
1         CO                   8001     Adams County
2         CO                   8001     Adams County
3         CO                   8001     Adams County
4         CO                   8001     Adams County
5         CO                   8001     Adams County
6         CO                   8001     Adams County
7       <NA>                   8003   Alamosa County
8       <NA>                   8005  Arapahoe County
9       <NA>                   8007 Archuleta County
10      <NA>                   8009      Baca County
11      <NA>                   8011      Bent County
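The output above still contains the repeated CO rows. As a sketch of the distinct() step, again on made-up toy data (both data frames here are assumptions for illustration), piping the full_join result into distinct() collapses identical rows:

```r
library(dplyr)

# Hypothetical toy data mirroring the shapes shown in the question
arr <- data.frame(
  state_abb = rep("CO", 6),
  fips_state_county_code = rep(8001, 6)
)
countyname <- data.frame(
  county = c(8001, 8003, 8005),
  countyname = c("Adams County", "Alamosa County", "Arapahoe County")
)

# full_join keeps counties missing from arr (state_abb becomes NA);
# distinct() then drops rows that are identical in every column
xcountyname <- full_join(
  arr, countyname,
  by = c("fips_state_county_code" = "county")
) %>%
  distinct()
```

Note that distinct() only removes rows that match in every column, so it helps here because the duplicated arr rows are exact copies.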
You only need one state abb per FIPS code, so use distinct():
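library(dplyr)
# Reduce arr to one row per state/county pair, renaming the key to match countyname
left_join(
  distinct(arr %>% select(state_abb, county = fips_state_county_code)),
  countyname,
  by = "county"  # explicit key; avoids the natural-join message
)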
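If the goal is to keep every existing row of countyname and only add the state column, one possible variant is to put countyname on the left instead. This is a sketch on made-up toy data, not the answer's exact code: state_lookup is a hypothetical helper name, and the join direction is swapped relative to the call above. The stopifnot() checks guard against the row count growing.

```r
library(dplyr)

# Hypothetical toy data mirroring the shapes shown in the question
arr <- data.frame(
  state_abb = rep("CO", 6),
  fips_state_county_code = rep(8001, 6)
)
countyname <- data.frame(
  county = c(8001, 8003, 8005),
  countyname = c("Adams County", "Alamosa County", "Arapahoe County")
)

# One row per state/county pair (state_lookup is an illustrative name)
state_lookup <- arr %>%
  select(state_abb, county = fips_state_county_code) %>%
  distinct()

# A county code mapped to more than one state would still multiply rows
stopifnot(!any(duplicated(state_lookup$county)))

# countyname on the left: every existing row is kept exactly once
xcountyname <- left_join(countyname, state_lookup, by = "county")
stopifnot(nrow(xcountyname) == nrow(countyname))
```

Counties with no match in arr keep their row and get NA in state_abb.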