I am trying to add genres to my genres set. However, the set ends up as NULL.
Function:
install.packages("sets"); library(sets)
genres = set()
find_all_genres = function(genres_string) {
  if (genres_string == "N/A") {
    return(NA)
  }
  genres_list = strsplit(genres_string, ",\\s+")[[1]]
  for (genre in genres_list) {
    genres = genres | set(genre)
  }
}
sapply(df2$Genre, FUN = find_all_genres)
Sample:
> head(df2$Genre)
[1] "Documentary, Biography, Romance" "Short, Thriller" "Documentary" "Drama, Romance" "War, Short"
[6] "Documentary, Biography"
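The expected output would be something along the lines of:
genres = {"Action", "Drama", "Comedy"}
with more genres, of course.
Also, how can I speed up the function? I am new to R.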
Read the strings in with scan and remove the duplicates with unique. The input g is given in the Note at the end. No packages are used.
unique(scan(text = g, what = "", sep = ",", na.strings = "N/A",
            strip.white = TRUE, quiet = TRUE))
giving:
[1] "Documentary" "Biography" "Romance" "Short" "Thriller"
[6] "Drama" "War"
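For comparison, here is a minimal sketch that stays closer to the original attempt by using strsplit instead of scan. In the question's function, the assignment to genres inside the function only changes a local copy, and a for loop evaluates to NULL, which would explain the NULL results. The helper name all_genres is hypothetical and not part of the original answer; g is the sample input from the Note.

```r
# Sample input (same as in the Note at the end)
g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary",
       "Drama, Romance", "War, Short", "Documentary, Biography")

# all_genres is a hypothetical helper name used only for illustration
all_genres <- function(x) {
  x <- x[!is.na(x) & x != "N/A"]         # drop missing entries
  parts <- unlist(strsplit(x, ",\\s*"))  # split every string at the commas
  unique(trimws(parts))                  # keep each genre once
}

genres <- all_genres(g)
print(genres)
```

Because strsplit is vectorized over its input, no explicit loop or sapply call is needed, which is usually also faster than growing a set one element at a time.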
Use sort if you want the result sorted.
Function
If you want to add to some previous values, write the whole thing as a function:
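As a small sketch of the sorted variant, assuming the sample input g from the Note (the variable names genres and sorted_genres are only illustrative):

```r
# Sample input (same as in the Note at the end)
g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary",
       "Drama, Romance", "War, Short", "Documentary, Biography")

# Split on commas, strip surrounding white space, and drop duplicates
genres <- unique(scan(text = g, what = "", sep = ",", na.strings = "N/A",
                      strip.white = TRUE, quiet = TRUE))

# Sort the unique genres alphabetically
sorted_genres <- sort(genres)
print(sorted_genres)
```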
add <- function(...) {
  unique(scan(text = c(...), what = "", sep = ",", na.strings = "N/A",
              strip.white = TRUE, quiet = TRUE))
}
# examples
g_split <- add(g)
G <- c("Drama", "Comedy")
G <- add(G, g)
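To make the effect of the examples concrete, the following self-contained sketch repeats them with the sample input from the Note and prints the combined result. The values already in G come first, followed by any genres from g that were not yet present.

```r
# Sample input (same as in the Note at the end)
g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary",
       "Drama, Romance", "War, Short", "Documentary, Biography")

# Same function as above: combine all arguments, split on commas, deduplicate
add <- function(...) {
  unique(scan(text = c(...), what = "", sep = ",", na.strings = "N/A",
              strip.white = TRUE, quiet = TRUE))
}

G <- c("Drama", "Comedy")  # previously collected genres
G <- add(G, g)             # merge in the genres found in g
print(G)
```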
Note
The input in reproducible form is:
g <- c("Documentary, Biography, Romance", "Short, Thriller", "Documentary",
"Drama, Romance", "War, Short", "Documentary, Biography")