我有两个数据集。一个比另一个长得多,但我想把较小数据集的纬度/经度放在较大的数据集上。
我的第一个数据集(长数据集(有一些错误的纬度和经度(它们实际上并没有说正确和错误,它们是坐标(:
country city latitude longitude
usa xyz wrong wrong
usa abc wrong wrong
usa iff correct correct
usa sfo correct correct
usa lax correct correct
第二个数据集较短,但包含正确的lat和lons,仅用于需要修复的坐标:
country city latitude longitude
usa xyz correct correct
usa abc correct correct
我基本上想得到以下:
country city latitude longitude
usa xyz correct correct
usa abc correct correct
usa iff correct correct
usa sfo correct correct
usa lax correct correct
需要明确的是,较小数据集中的所有纬度和经度都是正确的,因此应始终替换较大数据集中的值。
有什么建议吗?提前感谢您的帮助。
试试这些:
merged <- merge(original, fixes, by = c("country", "city"), all = TRUE)
merged
# country city latitude.x longitude.x latitude.y longitude.y
# 1 usa abc wrong wrong correct correct
# 2 usa iff correct correct <NA> <NA>
# 3 usa lax correct correct <NA> <NA>
# 4 usa sfo correct correct <NA> <NA>
# 5 usa xyz wrong wrong correct correct
merged <- transform(merged,
latitude = ifelse(is.na(latitude.y), latitude.x, latitude.y),
longitude = ifelse(is.na(longitude.y), longitude.x, longitude.y)
)[, c("country", "city", "latitude", "longitude")]
merged
# country city latitude longitude
# 1 usa abc correct correct
# 2 usa iff correct correct
# 3 usa lax correct correct
# 4 usa sfo correct correct
# 5 usa xyz correct correct
dplyr
library(dplyr)
original %>%
left_join(fixes, by = c("country", "city")) %>%
mutate(
latitude = if_else(is.na(latitude.y), latitude.x, latitude.y),
longitude = if_else(is.na(longitude.y), longitude.x, longitude.y)
) %>%
select(-contains("."))
# country city latitude longitude
# 1 usa xyz correct correct
# 2 usa abc correct correct
# 3 usa iff correct correct
# 4 usa sfo correct correct
# 5 usa lax correct correct
数据表
library(data.table)
originalDT <- as.data.table(original)
fixesDT <- as.data.table(fixes)
fixesDT[originalDT, on = c("country", "city")][
,c("latitude", "longitude") :=
.(fifelse(is.na(latitude), i.latitude, latitude),
fifelse(is.na(longitude), i.longitude, longitude)) ][
, .(country, city, latitude, longitude) ]
# country city latitude longitude
# 1: usa xyz correct correct
# 2: usa abc correct correct
# 3: usa iff correct correct
# 4: usa sfo correct correct
# 5: usa lax correct correct
或者,如果你喜欢data.table
的管道外观,那么
library(magrittr)
fixesDT[originalDT, on = c("country", "city")] %>%
.[, c("latitude", "longitude") :=
.(fifelse(is.na(latitude), i.latitude, latitude),
fifelse(is.na(longitude), i.longitude, longitude)) ] %>%
.[, .(country, city, latitude, longitude) ]