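As the title probably makes clear, my own logic has run out, so I will do my best to state my goal clearly.
I have 10 columns and 2 rows: one row holds the column names and the other holds the topic names.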
1 2 3 4 5 6 7 8 9 10 #(Column Count)
Name1 --- --- Name2 --- --- Name3 --- --- Name4 #(Column Names)[Row1]
Topic1 Topic2 Topic3 Topic4 Topic5 Topic6 Topic7 Topic8 Topic9 Topic10 #(Topic Names)[Row2]
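Basically, I want to drop every column named "---", but move the values under those columns into the nearest non-dropped column to the left. The desired result should look like this: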
1 2 3 4
Name1 Name2 Name3 Name4
Topic1 Topic4 Topic7 Topic10
Topic2 Topic5 Topic8
Topic3 Topic6 Topic9
We can use `stack` together with `na.locf0` from zoo:
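library(zoo)
# Stack to long form, blank out the '---' labels, and carry the last real name forward
df2 <- transform(stack(df1),
    ind = na.locf0(replace(ind, grepl('---', ind), NA)))
# Collect the values that now share a column name
lst1 <- split(df2$values, as.character(df2$ind))
# Pad the shorter groups with NA so they bind into one matrix
out <- do.call(cbind, lapply(lst1, `length<-`, max(lengths(lst1))))
out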
# Name1 Name2 Name3 Name4
#[1,] "Topic1" "Topic4" "Topic7" "Topic10"
#[2,] "Topic2" "Topic5" "Topic8" NA
#[3,] "Topic3" "Topic6" "Topic9" NA
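The same carry-forward idea can also be sketched in base R alone, without zoo. This is only an illustration, not part of the original answer: it assumes the first column always has a real name (otherwise the `cumsum` index starts at 0 and those values are silently dropped), and the helper names `nms`, `is_real`, `grp`, `lst` and `out_base` are made up for the example.

```r
# Sample data from the post
df1 <- structure(list(Name1 = "Topic1", `---` = "Topic2", `---` = "Topic3",
    Name2 = "Topic4", `---` = "Topic5", `---` = "Topic6", Name3 = "Topic7",
    `---` = "Topic8", `---` = "Topic9", Name4 = "Topic10"), row.names = c(NA,
    -1L), class = "data.frame")

nms <- names(df1)
is_real <- !grepl("---", nms, fixed = TRUE)  # TRUE for columns to keep

# cumsum() counts how many real names have been seen so far, so indexing
# the real names with it maps each column to its nearest kept column on the left
grp <- nms[is_real][cumsum(is_real)]

# Flatten the values column by column and group them by the mapped name
# (rep() keeps labels and values aligned if df1 has more than one row)
lst <- split(unlist(df1, use.names = FALSE),
             factor(rep(grp, each = nrow(df1)), levels = nms[is_real]))

# Pad shorter groups with NA and bind them into a character matrix
out_base <- sapply(lst, `length<-`, max(lengths(lst)))
out_base
```

Using a factor with explicit levels keeps the columns in their original left-to-right order rather than alphabetical order, which matters when the real names do not happen to sort the same way.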
Another option is to reshape to "long" format and then convert back to "wide" format:
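library(dplyr)
library(tidyr)
library(data.table)  # for rowid()
df1 %>%
    pivot_longer(everything()) %>%         # one row per original column
    mutate(name = na_if(name, "---")) %>%  # mark placeholder names as NA
    fill(name) %>%                         # carry the last real name down
    mutate(rn = rowid(name)) %>%           # position within each name
    select(name, value, rn) %>%
    pivot_wider(names_from = name, values_from = value) %>%
    select(-rn)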
# A tibble: 3 x 4
# Name1 Name2 Name3 Name4
# <chr> <chr> <chr> <chr>
#1 Topic1 Topic4 Topic7 Topic10
#2 Topic2 Topic5 Topic8 <NA>
#3 Topic3 Topic6 Topic9 <NA>
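In the pipeline above, `rowid(name)` numbers the rows within each name so that `pivot_wider` knows which output row each value belongs to. If loading data.table only for that one function is undesirable, a base R stand-in might look like the sketch below. It is an assumption-laden illustration, not part of the original answer: the vector `name` is typed out by hand to mimic the column after `fill()`.

```r
# The name column as it would look after fill(): each '---' replaced
# by the nearest real name above it (written out by hand here)
name <- c("Name1", "Name1", "Name1", "Name2", "Name2", "Name2",
          "Name3", "Name3", "Name3", "Name4")

# ave() applies seq_along() within each group of identical names,
# giving a running 1, 2, 3, ... counter per name
rn <- ave(seq_along(name), name, FUN = seq_along)
rn
```

Inside the pipeline, the corresponding step would be `mutate(rn = ave(seq_along(name), name, FUN = seq_along))`.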
Data
df1 <- structure(list(Name1 = "Topic1", `---` = "Topic2", `---` = "Topic3",
Name2 = "Topic4", `---` = "Topic5", `---` = "Topic6", Name3 = "Topic7",
`---` = "Topic8", `---` = "Topic9", Name4 = "Topic10"), row.names = c(NA,
-1L), class = "data.frame")