Split a column by variable and create new columns in R



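I am trying to split the column below using the answer to an earlier question. Right now I am using the letter to create a new column in df. I want to use the letter that precedes each name as the new column name; in the example below these are G, D, W, C and UTIL. Since only a space separates the category (G) from the name (First Person), and so on, I am scratching my head over how to separate the category from the first and last names and put them into the appropriate columns.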

library(stringr)
test <- data.frame(Lineup = c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test", "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))
# 1 G First Person D Another Last W Fake Name C Test Another UTIL Another Test
# 2 G Fake Name W Another Fake D Third person UTIL Another Name C Name Another
test$G <- str_split_fixed(test$Lineup, " ", 2)

Result:

G
G
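
For context, here is a minimal sketch of why only the letter comes back. With `n = 2`, `str_split_fixed()` splits on the first space only and returns a two-column matrix: the text before that space and everything after it. The variable name `parts` is just illustrative.

```r
library(stringr)

test <- data.frame(
  Lineup = c(
    "G First Person D Another Last W Fake  Name C Test Another UTIL Another Test",
    "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "
  ),
  stringsAsFactors = FALSE
)

# n = 2 means "at most two pieces", so only the first space is used as a split point
parts <- str_split_fixed(test$Lineup, " ", 2)

dim(parts)    # 2 rows (one per lineup), 2 columns (before / after the first space)
parts[, 1]    # the text before the first space in each row, i.e. "G" "G"
```

So the call never sees the other category letters; the rest of each lineup stays in the second column as one string.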

Desired result:

G             D            W              C             UTIL    
First Person  Another Last  Fake Name      Test Another  Another Test
Fake Name     Third Person  Another Fake   Name Another  Another Name

Here is one approach using the tidyverse:

# example data
test <- data.frame(Lineup = c("G First Person D Another Last W Fake  Name C Test Another UTIL Another Test", 
                              "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "))

library(tidyverse)

# create a dataset of words and info about
# their initial row id
# whether they should be a column in our new dataset
# group to join on
dt_words = test %>%
  mutate(id = row_number()) %>%
  separate_rows(Lineup) %>%
  mutate(is_col = Lineup %in% c(LETTERS, "UTIL"),
         group = cumsum(is_col))

# get the corresponding values of your new dataset
dt_values = dt_words %>%
  filter(is_col == FALSE) %>%
  group_by(group, id) %>%
  summarise(values = paste0(Lineup, collapse = " "))

# get the columns of your new dataset
# join corresponding values
# reshape data
dt_words %>%
  filter(is_col == TRUE) %>%
  select(-is_col) %>%
  inner_join(dt_values, by=c("group","id")) %>%
  select(-group) %>%
  spread(Lineup, values) %>%
  select(-id)

#    C            D            G            UTIL            W
# 1  Test Another Another Last First Person Another Test    Fake Name
# 2 Name Another  Third person    Fake Name Another Name Another Fake

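Note that the assumption here is that there is always a single capital letter (or UTIL) to split the values on, and that these markers become the columns in the new dataset. A name token that happens to be a single capital letter would therefore be treated as a column as well.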
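
If the set of categories is known in advance, the same idea can also be sketched with a regular expression instead of reshaping. This is an alternative, not the method of the answer above. It assumes the categories are exactly G, D, W, C and UTIL, and the names `positions`, `pattern`, `matches` and `result` are hypothetical, chosen for illustration only.

```r
library(stringr)

test <- data.frame(
  Lineup = c(
    "G First Person D Another Last W Fake  Name C Test Another UTIL Another Test",
    "G Fake Name W Another Fake D Third person UTIL Another Name C Name Another "
  ),
  stringsAsFactors = FALSE
)

# Assumed, fixed set of category markers (taken from the example data)
positions <- c("G", "D", "W", "C", "UTIL")

# A marker, then the shortest run of text that is followed by
# either the next marker or the end of the string
pattern <- "\\b(UTIL|[GDWC])\\s+(.+?)(?=\\s+(?:UTIL|[GDWC])\\b|\\s*$)"

# One matrix per row: column 2 holds the marker, column 3 holds the name
matches <- str_match_all(test$Lineup, pattern)

# Turn each matrix into a named vector (marker -> name),
# collapsing repeated spaces inside names with str_squish()
rows <- lapply(matches, function(m) setNames(str_squish(m[, 3]), m[, 2]))

# Put the markers in a fixed order and bind the rows into a data frame
result <- as.data.frame(
  do.call(rbind, lapply(rows, function(r) r[positions])),
  stringsAsFactors = FALSE
)

result
```

The same caveat applies as before: a name token that is itself one of the markers (for example a lone initial "C") would be read as a new category.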
