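Basically, I am trying to create a vector of movie genres. Each element of the genre vector contains one or more words. My question is: can we separate the words so that each index contains a single word?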
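[17] "Action, Drama, Mystery "      "Action, Crime, Thriller "
[19] "Action, Sci-Fi, Thriller "    "Biography, Crime, Drama "
[21] "Action, Adventure, Drama "    "Action, Adventure, Fantasy "
[23] "Action, Drama, Sci-Fi "       "Crime, Drama "
[25] "Action, Sci-Fi "              "Adventure, Drama, Sci-Fi "
[27] "Crime, Drama, Mystery "       "Action, Crime, Drama "
[29] "Drama, Horror, Sci-Fi "       "Action, Crime, Drama "
[31] "Comedy, Music "               "Comedy, Drama, Thriller "
[33] "Comedy, Drama "               "Crime, Drama "
[35] "Drama, Western "              "Crime, Drama "
[37] "Action, Adventure, Drama "    "Action, Adventure, Thriller "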
This is the output of a single vector. If I want to create a vector in which each index contains a single word, how can I do that?
If I understand you correctly:
x <- c("Action, Drama, Mystery ", "Action, Sci-Fi, Thriller ", "Action, Adventure, Drama ",
"Action, Drama, Sci-Fi ", "Action, Sci-Fi ", "Crime, Drama, Mystery ",
"Drama, Horror, Sci-Fi ", "Comedy, Music ", "Comedy, Drama ",
"Drama, Western ", "Action, Adventure, Drama ", "Action, Crime, Thriller ",
"Biography, Crime, Drama ", "Action, Adventure, Fantasy ", "Crime, Drama ",
"Adventure, Drama, Sci-Fi ", "Action, Crime, Drama ", "Action, Crime, Drama ",
"Comedy, Drama, Thriller ", "Crime, Drama ", "Crime, Drama ",
"Action, Adventure, Thriller ")
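# Split each string on ", ", trim stray spaces, then drop duplicates.
# trimws() has to run before unique(); otherwise "Drama" and "Drama " both survive.
unique(trimws(unlist(sapply(x, strsplit, split = ", "))))
#>  [1] "Action"    "Drama"     "Mystery"   "Sci-Fi"    "Thriller"  "Adventure"
#>  [7] "Crime"     "Horror"    "Comedy"    "Music"     "Western"   "Biography"
#> [13] "Fantasy"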
Created on 2018-10-14 by the reprex package (v2.0.1)
Since this is just a vector and not a data frame, there is no need for sapply
or anything similar. This is enough:
unlist(strsplit(trimws(x), ", "))
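To make the result of that one-liner concrete, here is a minimal sketch on a small sample. The vector `genres` and the names `words` and `unique_words` are made up for illustration and are not the asker's data. The split pattern `",\\s*"` (a comma followed by any amount of whitespace) is a swapped-in assumption that tolerates inconsistent spacing after the commas; the plain `", "` split above works equally well on the data shown.

```r
# Hypothetical three-element sample, not the asker's full vector
genres <- c("Action, Drama, Mystery ", "Comedy, Music ", "Crime, Drama ")

# Trim outer whitespace, then split on a comma followed by optional spaces.
# strsplit() returns a list with one character vector per input string;
# unlist() flattens it so that each index holds a single word.
words <- unlist(strsplit(trimws(genres), ",\\s*"))
words

# Optional: keep each genre only once, in order of first appearance
unique_words <- unique(words)
unique_words
```

Here `words` keeps repeated genres ("Drama" occurs twice), which is what you want for counting, for example with `table(words)`. Use the `unique()` step only if you need the list of distinct genres.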