Is there a way to make this foreach loop in R more efficient for text replacement?



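Happy to award the answer points to anyone who can help me vectorize this process. I want to check whether the city name is missing from each address and, if it is, append it.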

Suppose I have data like this:

df <- data.frame(X=c(1:5), Houston.Addresses=c("548 w 19th st", "6611 Portwest Dr. #190, houston, tx", "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466", "3321 Westpark Dr", "16221 north freeway"))

I want data like this:

df.desired <- data.frame(X=c(1:5), Houston.Addresses=c("548 w 19th st, houston, tx", "6611 Portwest Dr. #190, houston, tx", "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466", "3321 Westpark Dr, houston, tx", "16221 north freeway, houston, tx"))

My current approach is very inefficient on large datasets, and I'm sure there is a vectorized way to do it. Can someone help vectorize this loop?

library(foreach)
foreach(i=1:nrow(df))%do%{
  t <- tolower(df[i,"Houston.Addresses"])
  x <- grepl("houston", t)
  if(!isTRUE(x)){
    df[i, "Houston.Addresses" ] <- 
      paste0(df[i, "Houston.Addresses" ], ", houston, tx")
    }
}

Thanks in advance!

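Instead of running through each row, we use grepl (which is vectorized) to create a logical index 'i1' of the addresses that do not contain "houston". After converting 'Houston.Addresses' to character class, we assign to the elements selected by 'i1' the result of pasting the missing substring onto them.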

i1 <- !grepl("houston", tolower(df$Houston.Addresses))
df$Houston.Addresses <- as.character(df$Houston.Addresses)
df$Houston.Addresses[i1] <- paste0(df$Houston.Addresses[i1], ", houston, tx")
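The same idea can be wrapped in a small reusable function. This is only a sketch: the name append_city and its arguments are hypothetical (they do not come from the answer above), and skipping NA entries is an added assumption, since grepl returns FALSE for NA and the suffix would otherwise be pasted onto them.

```r
# Hypothetical helper: append `suffix` to every address that lacks `city`.
append_city <- function(addresses, city = "houston", suffix = ", houston, tx") {
  addresses <- as.character(addresses)  # guard against factor columns
  # TRUE where the city is absent; NA entries are left untouched (assumption)
  missing_city <- !grepl(city, addresses, ignore.case = TRUE) & !is.na(addresses)
  addresses[missing_city] <- paste0(addresses[missing_city], suffix)
  addresses
}

df <- data.frame(
  X = 1:5,
  Houston.Addresses = c("548 w 19th st",
                        "6611 Portwest Dr. #190, houston, tx",
                        "3555 Timmons Ln Ste 300, Houston, TX, 77027-6466",
                        "3321 Westpark Dr",
                        "16221 north freeway"),
  stringsAsFactors = FALSE
)

df$Houston.Addresses <- append_city(df$Houston.Addresses)
print(df$Houston.Addresses)
```

Using ignore.case = TRUE avoids the separate tolower call; the rest mirrors the three-line version above.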

If we want to make this more efficient, we can use data.table and assign by reference with (:=)

library(data.table)
setDT(df)[, Houston.Addresses := as.character(Houston.Addresses)
            ][!grepl("houston", tolower(Houston.Addresses)),
                 Houston.Addresses := paste0(Houston.Addresses, ", houston, tx")]

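Another suggestion is to use ifelse: keep the address as it is when it already contains "houston", and append the city otherwise.

df$Houston.Addresses <- ifelse(grepl("houston", df$Houston.Addresses, ignore.case=TRUE), 
    as.character(df$Houston.Addresses), 
    paste0(df$Houston.Addresses, ", houston, tx"))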
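To get a feel for the difference on a larger input, here is a rough timing sketch. The data is synthetic, the function names loop_version and vectorized_version are made up for illustration, and a plain for loop stands in for the foreach loop from the question so that no extra package is needed. Timings depend on the machine, so none are quoted here.

```r
set.seed(1)
n <- 5000  # arbitrary size, chosen only to keep the loop quick to run
big <- data.frame(
  X = seq_len(n),
  Houston.Addresses = sample(c("548 w 19th st",
                               "6611 Portwest Dr. #190, houston, tx"),
                             n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Row-by-row version, analogous to the loop in the question
loop_version <- function(d) {
  for (i in seq_len(nrow(d))) {
    if (!grepl("houston", tolower(d[i, "Houston.Addresses"]))) {
      d[i, "Houston.Addresses"] <- paste0(d[i, "Houston.Addresses"], ", houston, tx")
    }
  }
  d
}

# Vectorized version: one grepl call and one indexed assignment
vectorized_version <- function(d) {
  i1 <- !grepl("houston", tolower(d$Houston.Addresses))
  d$Houston.Addresses[i1] <- paste0(d$Houston.Addresses[i1], ", houston, tx")
  d
}

t_loop <- system.time(res_loop <- loop_version(big))
t_vec  <- system.time(res_vec  <- vectorized_version(big))

# Elapsed seconds for each approach
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))

# Both approaches should produce the same addresses
print(identical(res_loop$Houston.Addresses, res_vec$Houston.Addresses))
```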
