R - Update a set outside of the function's scope

I am trying to add genres to my genres set. However, my genres set ends up as NULL.

Function:

install.packages("sets"); library(sets)
genres = set()
find_all_genres = function(genres_string) {
  if (genres_string == "N/A") {
    return(NA)
  }
  genres_list = strsplit(genres_string, ",\\s+")[[1]]
  for (genre in genres_list) {
    genres = genres | set(genre)
  }
}
sapply(df2$Genre, FUN = find_all_genres)

Sample:

> head(df2$Genre)
[1] "Documentary, Biography, Romance" "Short, Thriller"                 "Documentary"                     "Drama, Romance"                  "War, Short"                     
[6] "Documentary, Biography"

The expected output would be something along the lines of:

genres = {"Action", "Drama", "Comedy"}

Of course, there are more genres.

Also, how can I speed up the function? I am new to R.

Use scan to read it in and unique to remove the duplicates. g is given in the Note at the end. No packages are used.

unique(scan(text = g, what = "", sep = ",", na.strings = "N/A", 
  strip.white = TRUE, quiet = TRUE))

giving:

[1] "Documentary" "Biography"   "Romance"     "Short"       "Thriller"   
[6] "Drama"       "War"

If you want the result sorted, use sort.

Function

If you want to add to some previous values, write the whole thing as a function:
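As a minimal sketch of the sorted variant, the snippet below rebuilds the sample input and appends one "N/A" entry. That entry is a hypothetical addition, not part of the sample, and the names g_na, genres_raw and genres_sorted are made up for illustration. It shows that scan reads "N/A" as NA and that sort drops NA values by default.

```r
# Sample input from the question, plus a hypothetical "N/A" entry (assumption).
g_na <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary",
  "Drama, Romance", "War, Short", "Documentary, Biography", "N/A")

# Split on commas, trim surrounding whitespace, and read "N/A" as NA.
genres_raw <- unique(scan(text = g_na, what = "", sep = ",", na.strings = "N/A",
  strip.white = TRUE, quiet = TRUE))

# sort() removes NA by default (na.last = NA), so only real genres remain.
genres_sorted <- sort(genres_raw)
print(genres_sorted)
```

If the NA should be kept, sort(genres_raw, na.last = TRUE) places it at the end instead.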

add <- function(...) {
    unique(scan(text = c(...), what = "", sep = ",", na.strings = "N/A", 
      strip.white = TRUE, quiet = TRUE))
}
# examples
g_split <- add(g)
G <- c("Drama", "Comedy")
G <- add(G, g)
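On the scoping part of the question: an assignment with = or <- inside a function creates a local variable, so the global genres is never changed, and a function whose last expression is a for loop returns NULL. The sketch below illustrates this in base R, using a plain character vector in place of the sets package (an assumption made to keep it package-free). The function names add_genres_local and add_genres_global are hypothetical. It also shows a vectorized alternative that avoids the loop entirely.

```r
g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary",
  "Drama, Romance", "War, Short", "Documentary, Biography")

genres <- character(0)

# Local assignment: only a copy inside the function is modified.
add_genres_local <- function(genres_string) {
  genres <- union(genres, strsplit(genres_string, ",\\s*")[[1]])
  invisible(NULL)
}
for (s in g) add_genres_local(s)
length_after_local <- length(genres)   # global genres is still empty

# Superassignment: <<- updates genres in the enclosing environment.
add_genres_global <- function(genres_string) {
  genres <<- union(genres, strsplit(genres_string, ",\\s*")[[1]])
  invisible(NULL)
}
for (s in g) add_genres_global(s)

# Vectorized alternative: no loop and no side effects.
all_genres <- unique(unlist(strsplit(g[g != "N/A"], ",\\s*")))
print(all_genres)
```

Returning a value, as in the last line or in add above, is usually easier to reason about than modifying a global with <<-, and it is likely faster because strsplit handles the whole vector in one call.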

Note

The input in reproducible form is:

g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary", 
  "Drama, Romance", "War, Short", "Documentary, Biography")
