Add a column whose value depends on whether the value in another column matches one of four vectors

I have data as follows:

library(stringi)
datfake <- as.data.frame(runif(100, 0, 3000))
names(datfake)[1] <- "Inc"
datfake$type <- sample(LETTERS, 100, replace = TRUE)
datfake$province <- stri_rand_strings(100, 1, "[A-P]")
region_south <- c("A", "B", "C", "D")
region_north <- c("E", "F", "G", "H", "I")
region_east <- c("J", "K", "L")
region_west <- c("M", "N", "O", "P")

Edit:

In my actual data, the regions are as follows:

region_north <- c("Drenthe", "Friesland", "Groningen")
region_east <- c("Flevoland", "Gelderland", "Overijssel")
region_west <- c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland")
region_south <- c("Limburg", "Noord-Brabant")

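I want to add a column that tells me which region each province belongs to. All the solutions I have come up with are somewhat clumsy (for example, turning the vector region_south into a two-column data frame whose second column says "south", and then merging). What is the simplest way to do this?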

Expected output:

        Inc type province region
1  297.7387    C        J   east
2 2429.0961    E        D  south

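One idea is to use mget to fetch the region vectors, unlist them, and take advantage of the resulting named vector: match the provinces against its values and return the corresponding names, i.e.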

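# Collect all region_* vectors into one named vector (names like "region_east1")
v1 <- unlist(mget(ls(.GlobalEnv, pattern = 'region_')))
# Look up each province and take the name of the matching element
res <- names(v1)[match(datfake$province, v1)]
# Strip the "region_" prefix and the trailing index that unlist() appended
gsub('region_(.+)[0-9]+', '\\1', res)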

[1] "north" "east"  "north" "north" "south" "south" "south" "west"  "west"  "east"  "south" "south" "west"  "north" "north" "south" "east"  "north" "south" "east"  "north" "west" 
[23] "south" "west"  "north" "west"  "east"  "north" "east"  "south" "south" "east"  "south" "west"  "north" "east"  "west"  "south" "south" "east"  "north" "west"  "west"  "south"
[45] "north" "east"  "south" "west"  "north" "south" "east"  "west"  "north" "north" "north" "south" "north" "south" "north" "north" "west"  "north" "north" "south" "west"  "north"
[67] "east"  "south" "north" "west"  "south" "west"  "north" "north" "north" "south" "north" "east"  "west"  "south" "west"  "north" "west"  "east"  "north" "west"  "south" "east" 
[89] "north" "west"  "north" "north" "west"  "south" "west"  "north" "west"  "west"  "south" "west"
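If relying on whatever region_* objects happen to exist in the global environment feels fragile, the same match-and-return-the-name idea can be sketched with an explicit named list. This is only an illustration under assumptions: the names `regions`, `lookup`, and `dat`, and the small sample data frame (including the made-up province "Atlantis"), are hypothetical and not from the original post.

```r
# Hypothetical named list: the list names double as the region labels.
regions <- list(
  north = c("Drenthe", "Friesland", "Groningen"),
  east  = c("Flevoland", "Gelderland", "Overijssel"),
  west  = c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland"),
  south = c("Limburg", "Noord-Brabant")
)

# stack() turns the named list into a two-column data frame:
# `values` holds the province, `ind` holds the region name (a factor).
lookup <- stack(regions)

# Small made-up data frame standing in for the real data.
dat <- data.frame(
  Inc = c(100, 200, 300, 400),
  province = c("Utrecht", "Limburg", "Drenthe", "Atlantis"),
  stringsAsFactors = FALSE
)

# match() returns the row of `lookup` for each province (NA if absent),
# and indexing `ind` with it gives the region label.
dat$region <- as.character(lookup$ind[match(dat$province, lookup$values)])
dat$region
# "west" "south" "north" NA
```

Because the labels come from the list names, no gsub cleanup is needed, and a province that appears in no region simply ends up as NA.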

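We can use case_when together with grepl here. Note that the character-class pattern below is built for the single-letter codes in the sample data: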

library(dplyr)
# Each pattern is a character class such as "^[EFGHI]$" built from a region vector
datfake$region <- case_when(
  grepl(paste0("^[", paste(region_north, collapse = ""), "]$"), datfake$province) ~ "north",
  grepl(paste0("^[", paste(region_south, collapse = ""), "]$"), datfake$province) ~ "south",
  grepl(paste0("^[", paste(region_east, collapse = ""), "]$"), datfake$province) ~ "east",
  grepl(paste0("^[", paste(region_west, collapse = ""), "]$"), datfake$province) ~ "west"
)
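For the full province names given in the edit, a regular-expression character class would no longer work, since it matches single characters only. A possible sketch, swapping grepl for `%in%` while keeping case_when, is shown here. The data frame `dat` and the province "Atlantis" are made up for illustration, and this is not the answerer's original code.

```r
library(dplyr)

region_north <- c("Drenthe", "Friesland", "Groningen")
region_east  <- c("Flevoland", "Gelderland", "Overijssel")
region_west  <- c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland")
region_south <- c("Limburg", "Noord-Brabant")

# Small made-up data frame standing in for the real data.
dat <- data.frame(
  province = c("Utrecht", "Limburg", "Drenthe", "Gelderland", "Atlantis"),
  stringsAsFactors = FALSE
)

# %in% tests exact membership, so multi-character names are handled correctly.
dat$region <- case_when(
  dat$province %in% region_north ~ "north",
  dat$province %in% region_south ~ "south",
  dat$province %in% region_east  ~ "east",
  dat$province %in% region_west  ~ "west",
  TRUE ~ NA_character_  # provinces that belong to none of the regions
)
dat$region
```

The explicit `TRUE ~ NA_character_` branch only makes the fallback visible; case_when returns NA for unmatched rows anyway.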
