R - Updating a set outside of function scope



I am trying to add genres to my genres set. However, my genres set ends up as NULL.

Function:

install.packages("sets"); library(sets)
genres = set()
find_all_genres = function(genres_string) {
  if (genres_string == "N/A") {
    return(NA)
  }
  genres_list = strsplit(genres_string, ",\\s+")[[1]]
  for (genre in genres_list) {
    genres = genres | set(genre)
  }
}
sapply(df2$Genre, FUN = find_all_genres)

Sample:

> head(df2$Genre)
[1] "Documentary, Biography, Romance" "Short, Thriller"                 "Documentary"                     "Drama, Romance"                  "War, Short"                     
[6] "Documentary, Biography"  

The expected output would be something along the lines of:
genres = {"Action", "Drama", "Comedy"}

Of course, with many more genres.

Also, how can I speed up the function? I am new to R.

Read it in using scan and then remove duplicates using unique, where g is given in the Note at the end. No packages are used.

unique(scan(text = g, what = "", sep = ",", na.strings = "N/A", 
  strip.white = TRUE, quiet = TRUE))

giving:

[1] "Documentary" "Biography"   "Romance"     "Short"       "Thriller"   
[6] "Drama"       "War" 
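
For comparison, here is a minimal sketch of the same idea built on strsplit, which is closer to the code in the question. It assumes the input looks like g from the Note at the end and that "N/A" marks a missing entry. The names parts and genres are only illustrative.

```r
# Input in the same form as the Note at the end (repeated so this block runs on its own).
g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary",
  "Drama, Romance", "War, Short", "Documentary, Biography")

# Drop "N/A" entries, split on a comma plus optional whitespace, then flatten.
parts <- unlist(strsplit(g[g != "N/A"], ",\\s*"))

# Trim any stray whitespace and keep each genre once, in order of first appearance.
genres <- unique(trimws(parts))
genres
```

Because strsplit is vectorised over its input, this needs neither a loop nor sapply, which should also help with the speed concern raised in the question.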

If you want the result sorted, use sort.
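
As a small sketch of the sorted variant, the call above can be wrapped in sort. The input g is repeated from the Note so the block runs alone, and the name sorted_genres is only illustrative.

```r
g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary",
  "Drama, Romance", "War, Short", "Documentary, Biography")

# Same scan/unique call as above, wrapped in sort for alphabetical order.
# sort() drops NA values by default, so any "N/A" entries would disappear here.
sorted_genres <- sort(unique(scan(text = g, what = "", sep = ",",
  na.strings = "N/A", strip.white = TRUE, quiet = TRUE)))
sorted_genres
```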

Function

If you want to add to some previous values, write the whole thing as a function:

add <- function(...) {
    unique(scan(text = c(...), what = "", sep = ",", na.strings = "N/A", 
      strip.white = TRUE, quiet = TRUE))
}
# examples
g_split <- add(g)
G <- c("Drama", "Comedy")
G <- add(G, g)
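
The function form also sidesteps the scoping problem in the question: an assignment made with = or <- inside a function creates a local variable, so a variable of the same name outside the function is left unchanged. The sketch below illustrates this with plain character vectors and union. The names add_local and add_genre are hypothetical helpers, not part of the original code.

```r
genres <- character(0)

# Assigning inside the function only creates a local copy named genres.
add_local <- function(genre) {
  genres <- union(genres, genre)
  invisible(NULL)
}
add_local("Drama")
n_after_local <- length(genres)  # the outer genres is still empty

# Returning the updated value and assigning it at the call site does work.
add_genre <- function(current, genre) union(current, genre)
genres <- add_genre(genres, "Drama")
genres <- add_genre(genres, c("Comedy", "Drama"))
genres
```

The <<- operator can also modify a variable outside the function, but returning the new value, as add does, is usually easier to follow.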

Note

The input in reproducible form is:

g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary", 
  "Drama, Romance", "War, Short", "Documentary, Biography")
