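I have a column like Column A below, where each observation (song) has a different number of elements (genres). Can I split the column in R without specifying the target columns?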
Column A |
---|
['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul'] |
['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop'] |
In base R, we can use read.csv after removing the [, ] and the quotes (', "):
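# Strip [, ], ' and " so that each row becomes a plain comma-separated line.
# col.names fixes the width at 7, the largest number of genres in any row here.
df2 <- read.csv(text = gsub("\\[|\\]|'|\"", "", df1$ColumnA),
    header = FALSE, na.strings = "", col.names = paste0("genre", 1:7))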
Output:
df2
     genre1     genre2  genre3        genre4           genre5 genre6    genre7
1   hip hop        pop pop rap           r&b southern hip hop   trap trap soul
2 dance pop girl group     pop post-teen pop      talent show uk pop      <NA>
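The read.csv call hard-codes seven columns. As a rough sketch, assuming the same df1 as in this answer, the number of columns can instead be derived from the data. The names cleaned, parts, n_max and df2_auto are illustrative only, and splitting on a comma plus optional whitespace is an assumption about how the strings are formatted.

```r
# Same example data as in this answer, one string per song.
df1 <- data.frame(ColumnA = c(
  "['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul']",
  "['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop']"
))

# Strip brackets and quotes, then split on commas (dropping the space after each).
cleaned <- gsub("\\[|\\]|'|\"", "", df1$ColumnA)
parts <- strsplit(cleaned, ",\\s*")

# The longest row decides how many genre columns are needed.
n_max <- max(lengths(parts))

# Pad shorter rows with NA so every row has n_max entries.
padded <- lapply(parts, function(x) {
  length(x) <- n_max
  x
})

# Bind the rows into a data frame and name the columns genre1..genreN.
df2_auto <- as.data.frame(do.call(rbind, padded))
names(df2_auto) <- paste0("genre", seq_len(n_max))
```

Because n_max is computed from the data, the same code should work unchanged if a song with more genres is added.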
The second dataset can then be created by applying mtabulate to the output above:
library(qdapTools)
mtabulate(as.data.frame(t(df2)))
Output:
   girl group pop pop rap post-teen pop r&b southern hip hop talent show trap trap soul uk pop dance pop hip hop
V1          0   1       1             0   1                1           0    1         1      0         0       1
V2          1   1       0             1   0                0           1    0         0      1         1       0
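If installing qdapTools is not an option, a similar indicator table can be sketched in base R with table(). This is an alternative to mtabulate, not what the answer uses; the genres list below is typed in by hand from the example data, and the names ids, tab and tab_df are illustrative.

```r
# One character vector of genres per song (copied from the example data).
genres <- list(
  c("hip hop", "pop", "pop rap", "r&b", "southern hip hop", "trap", "trap soul"),
  c("dance pop", "girl group", "pop", "post-teen pop", "talent show", "uk pop")
)

# Repeat each song index once per genre so it lines up with unlist(genres).
ids <- rep(seq_along(genres), lengths(genres))

# Cross-tabulate song against genre: 1 if the song has the genre, 0 otherwise.
tab <- table(song = ids, genre = unlist(genres))

# Convert to a regular data frame with one column per genre.
tab_df <- as.data.frame.matrix(tab)
```

The columns come out in alphabetical order of the genre names, so the ordering may differ from the mtabulate output above.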
Data
df1 <- structure(list(ColumnA = c(
"['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul']",
"['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop']"
)), class = "data.frame", row.names = c(NA, -2L))
Not sure what output you want, but here is one idea (with the column named col_a in a data frame df):
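library(dplyr)
library(stringr)
library(tidyr)

df %>%
  mutate(col_a = col_a %>% str_remove_all("\\[") %>%  # drop opening brackets
           str_remove_all("\\]") %>%                   # drop closing brackets
           str_split(pattern = ", ")) %>%              # one list element per genre
  unnest(col_a) %>%
  count(col_a) %>%
  pivot_wider(names_from = col_a, values_from = n)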
# A tibble: 1 × 12
  `'dance pop'` 'girl group…¹ 'hip …² `'pop'` 'pop …³ 'post…⁴ `'r&b'` 'sout…⁵ 'tale…⁶ 'trap…⁷ 'trap…⁸ 'uk p…⁹
          <int>         <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
1             1             1       1       2       1       1       1       1       1       1       1       1
# … with abbreviated variable names ¹`'girl group'`, ²`'hip hop'`, ³`'pop rap'`, ⁴`'post-teen pop'`,
# ⁵`'southern hip hop'`, ⁶`'talent show'`, ⁷`'trap'`, ⁸`'trap soul'`, ⁹`'uk pop'`
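The result above counts each genre across all songs and keeps the single quotes in the column names. If one row per song is wanted instead, a possible variant is sketched below. It assumes the same col_a strings; the song identifier column and the name wide are introduced here for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Example data under the column name used in this answer.
df <- tibble(col_a = c(
  "['hip hop', 'pop', 'pop rap', 'r&b', 'southern hip hop', 'trap', 'trap soul']",
  "['dance pop', 'girl group', 'pop', 'post-teen pop', 'talent show', 'uk pop']"
))

wide <- df %>%
  # Tag each row with a song id and strip brackets and single quotes.
  mutate(song = row_number(),
         col_a = str_remove_all(col_a, "[\\[\\]']")) %>%
  # One row per (song, genre) pair.
  separate_rows(col_a, sep = ",\\s*") %>%
  # Count per song, then spread genres into columns, filling gaps with 0.
  count(song, col_a) %>%
  pivot_wider(names_from = col_a, values_from = n, values_fill = 0)
```

Counting by song as well as genre keeps the songs apart, so wide has one row per song and one indicator column per genre.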