r - Change multiple strings in a column with one function, tidyverse style



For the following data frame:

df <- data.frame(Country = c("Republic of Ireland", "United Kingdom", "United States of America"))
# Country
# <chr>
# Republic of Ireland
# United Kingdom
# United States of America

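Is there a way to change the country names with a function (tidyverse style)? I would also like to be able to refer to a specific column of the data frame.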

This is what I have done so far:

# c("Old name", "new name")
name_change = list(c("Republic of Ireland", "Ireland"),
                   c("United Kingdom", "UK"),
                   c("Russia Moscow", "Russia"),
                   c("United States of America", "USA"))

name_change_func <- function(vec, data = c2, df_col = Country){
  # Expecting vec c("Old name", "new name")
  old_n <- vec[1]
  new_n <- vec[2]
  data %>%
    mutate(!!df_col = gsub(old_n, new_n, !!df_col ))
}
map_df(name_change, ~name_change_func(.x)) %>%
  group_by(Country) %>%
  filter(row_number(Country) == 1)

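This doesn't work, but if I change !!df_col directly to Country, it works (sort of: I get duplicate names that have to be filtered out, because rather than changing the names, the code effectively adds rows).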

Is there a way to fix this, i.e. to use a function argument as a column inside the function?
Bonus points if you know a better solution.
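For reference, a minimal sketch of how the function from the question could be made to work, assuming rlang 0.4.0 or later (for the {{ }} "embrace" operator) and purrr for reduce(). The original fails to parse because !!df_col = ... is not valid R syntax: a computed column name on the left-hand side needs := instead of =. The duplicate rows come from map_df(), which returns one modified copy of the data per replacement and row-binds them; reduce() instead passes a single data frame through all the replacements. Putting data first in the argument list and using fixed = TRUE in gsub() are choices made for this sketch, not part of the original code.

```r
library(dplyr)
library(purrr)

df <- data.frame(Country = c("Republic of Ireland", "United Kingdom", "United States of America"))

# Each element is c("Old name", "new name"), as in the question
name_change <- list(c("Republic of Ireland", "Ireland"),
                    c("United Kingdom", "UK"),
                    c("Russia Moscow", "Russia"),
                    c("United States of America", "USA"))

name_change_func <- function(data, vec, df_col) {
  old_n <- vec[1]
  new_n <- vec[2]
  # {{ df_col }} forwards the unquoted column name given by the caller;
  # := is required because the left-hand side is computed, not a literal name.
  # fixed = TRUE matches old_n literally instead of as a regular expression.
  data %>%
    mutate({{ df_col }} := gsub(old_n, new_n, {{ df_col }}, fixed = TRUE))
}

# reduce() feeds the result of each replacement into the next one,
# so there is one data frame at the end instead of one copy per replacement
result <- reduce(name_change,
                 function(d, v) name_change_func(d, v, Country),
                 .init = df)
result
```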

You can put the replacements in a named vector, which can be passed directly to str_replace_all.

library(dplyr)
library(stringr)
#c("Old name" = "new name")
name_change = c("Republic of Ireland" = "Ireland",
                "United Kingdom" = "UK",
                "Russia Moscow" = "Russia",
                "United States of America" = "USA")
df %>% mutate(new_country = str_replace_all(Country, name_change))
#                   Country new_country
#1      Republic of Ireland     Ireland
#2           United Kingdom          UK
#3 United States of America         USA
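Note that str_replace_all() treats each name in the vector as a regular expression and also replaces partial matches inside longer strings. If only whole values should be renamed, a plain lookup on the named vector is an alternative. The sketch below wraps it in a function that takes the column unquoted, which also covers the wish to pass the column as an argument. recode_exact() is a hypothetical helper name, and the sketch assumes the column is character (the data.frame() default since R 4.0.0), not factor.

```r
library(dplyr)

df <- data.frame(Country = c("Republic of Ireland", "United Kingdom", "United States of America"))

# c("Old name" = "new name"), same lookup as above
name_change <- c("Republic of Ireland" = "Ireland",
                 "United Kingdom" = "UK",
                 "Russia Moscow" = "Russia",
                 "United States of America" = "USA")

# Hypothetical helper: a value is replaced only if it equals one of
# names(lookup) exactly; every other value (including NA) is kept as it is.
# Assumes the column is character, not factor.
recode_exact <- function(data, col, lookup) {
  data %>%
    mutate({{ col }} := coalesce(unname(lookup[{{ col }}]), {{ col }}))
}

result <- recode_exact(df, Country, name_change)
result
```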

An alternative is case_when from the tidyverse (dplyr):

library(dplyr)
df <- data.frame(Country = c("Republic of Ireland", "United Kingdom", "United States of America"))
df <- df %>%
  dplyr::mutate(NewCountry =
    case_when(
      Country == "Republic of Ireland" ~ "Ireland",
      Country == "United States of America" ~ "US",
      Country == "United Kingdom" ~ "UK",
      Country == "Russia Moscow" ~ "Russia"
    )
  )
#                    Country NewCountry
# 1      Republic of Ireland    Ireland
# 2           United Kingdom         UK
# 3 United States of America         US
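One caveat with case_when(): rows that match none of the conditions become NA. A catch-all TRUE ~ Country as the last condition keeps those names unchanged (dplyr 1.1.0 and later also accept .default = Country for the same purpose). In this sketch, "France" is an extra, hypothetical row added only to show the fallback, and Country is assumed to be a character column.

```r
library(dplyr)

# "France" is an extra example row with no matching condition
df <- data.frame(Country = c("Republic of Ireland", "United Kingdom",
                             "United States of America", "France"))

df <- df %>%
  mutate(NewCountry = case_when(
    Country == "Republic of Ireland" ~ "Ireland",
    Country == "United States of America" ~ "US",
    Country == "United Kingdom" ~ "UK",
    TRUE ~ Country  # fallback: keep any name not listed above
  ))
df
```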
