I am trying to reshape a dataset by moving some cell information around. Here is what my sample dataset looks like.
data <- data.frame(var1 = c("Text","A","B","C","D"),
var2 = c("Text",NA, 1,0,1),
var3 = c("112-1",NA,NA,"text",NA),
var4 = c("Text",1,0,NA, NA),
var5 = c("113-1",NA,"text",NA,NA))
> data
var1 var2 var3 var4 var5
1 Text Text 112-1 Text 113-1
2 A <NA> <NA> 1 <NA>
3 B 1 <NA> 0 text
4 C 0 text <NA> <NA>
5 D 1 <NA> <NA> <NA>
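It needs some cleaning first. var1 holds the item information. var2 and var4 both hold score information. var3 and var5 hold the id information in their first row. I need to reshape this dataset as shown below.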
> data.1
id A B C D
1 112 NA 1 0 1
2 113 1 0 NA NA
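Given that the real data file has many more columns following the same pattern (e.g., var6, var7, var8, var9, etc.), how can I reshape it into the desired dataset?
This is not much different from my answer yesterday, but it gives you the result you want. Shift the first row over by one column so that each id sits in the same column as its values, drop the unneeded columns, then make the first row the column names. Add some pivoting and it should be roughly what you need: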
data <- data.frame(var1 = c("Text","A","B","C","D"), var2 = c("Text",NA, 1,0,1), var3 = c("112",NA,NA,NA,NA), var4 = c("Text",1,0,NA, NA), var5 = c(113,NA,NA,NA,NA))
library(dplyr)
library(tidyr)
data2 <- data %>%
  mutate_all(as.character) # Convert to character to avoid factor issues
data2[1, 2:(ncol(data2) - 1)] <- data2[1, 3:ncol(data2)] # Shift the first row over one column
data3 <- data2 %>%
  select(-var3, -var5) # Remove the unneeded columns
colnames(data3) <- data3[1, ] # Use the first row as the column names
data3 <- data3[-1, ] # Remove row 1, since it is now the column names
data3%>%
tidyr::pivot_longer(-Text, names_to = "id", values_to = "time")%>% #Making the data into longer format
tidyr::pivot_wider(names_from = Text, values_from = time) #Then back into wide
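The `select(-var3, -var5)` step above names the id columns explicitly, which gets tedious once there are more pairs (var6, var7, ...). Below is a minimal base R sketch that computes the column positions instead. It assumes the layout is always one item column followed by alternating score and id columns, and that the `-1` suffix on the ids should be dropped, as in the desired output from the question. The names `score_cols`, `id_cols`, `ids`, `wide`, and `result` are illustrative only.

```r
# Example data from the question
data <- data.frame(var1 = c("Text", "A", "B", "C", "D"),
                   var2 = c("Text", NA, 1, 0, 1),
                   var3 = c("112-1", NA, NA, "text", NA),
                   var4 = c("Text", 1, 0, NA, NA),
                   var5 = c("113-1", NA, "text", NA, NA),
                   stringsAsFactors = FALSE)

# Assumption: column 1 = items, columns 2, 4, ... = scores, columns 3, 5, ... = ids
score_cols <- seq(2, ncol(data), by = 2)
id_cols    <- score_cols + 1

# The ids sit in the first row of each id column; strip everything from the "-" on
ids <- sub("-.*$", "", unlist(data[1, id_cols], use.names = FALSE))

# Scores are everything below the header row, one column per id
scores <- data[-1, score_cols, drop = FALSE]

# Transpose so that each id becomes a row and each item a column
wide <- as.data.frame(t(scores), stringsAsFactors = FALSE)
names(wide) <- data[[1]][-1]
rownames(wide) <- NULL

# Attach the ids and convert the character columns to their natural types
result <- data.frame(id = ids, wide, check.names = FALSE, stringsAsFactors = FALSE)
result <- type.convert(result, as.is = TRUE)
result
```

Because the positions are derived from `ncol(data)`, the same code should carry over to wider files as long as the score/id alternation holds.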
You can shift the first row, drop every other column using `%% 2`, and transpose.
data[1, ] <- data[1, -1]
data <- data[c(TRUE, seq_len(ncol(data))[-1] %% 2 == 0)]
setNames(as.data.frame(t(data[, -1]), row.names=FALSE), c('id', data[[1]][-1])) |>
type.convert(as.is=TRUE)
# id A B C D
# 1 112-1 NA 1 0 1
# 2 113-1 1 0 NA NA
By the way, how do you obtain this data? You may have an XY problem.
library(dplyr)
library(tidyr)
library(stringr)
# First, rename the columns to something more descriptive
n = 2 #Number of pairs of columns you have (here 2)
nam <- do.call(paste0, (expand.grid(c("n_", "id_"), seq(n))))
colnames(data) <- c("col", nam)
#Then, the data manipulation
data %>%
mutate(across(starts_with("id"), ~ first(str_remove(.x, "-")))) %>%
fill(starts_with("id")) %>%
slice(-1) %>%
pivot_longer(-col, names_to = c(".value", "rn"), names_sep = "_") %>%
pivot_wider(names_from = "col", values_from = 'n') %>%
select(-rn)
id A B C D
1 1121 NA 1 0 1
2 1131 1 0 NA NA
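Note that `str_remove(.x, "-")` deletes only the hyphen, so the ids come out as 1121 and 1131 rather than the 112 and 113 shown in the question. The following is a hedged variant of the same pipeline, not the answerer's code: it assumes everything from the hyphen onward should be dropped (pattern `"-.*"`), and it derives `n` from the number of columns instead of setting it by hand. The score columns stay character here; add a conversion step if numbers are needed.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Example data from the question
data <- data.frame(var1 = c("Text", "A", "B", "C", "D"),
                   var2 = c("Text", NA, 1, 0, 1),
                   var3 = c("112-1", NA, NA, "text", NA),
                   var4 = c("Text", 1, 0, NA, NA),
                   var5 = c("113-1", NA, "text", NA, NA),
                   stringsAsFactors = FALSE)

# Assumption: one item column followed by (score, id) column pairs
n <- (ncol(data) - 1) / 2
colnames(data) <- c("col", paste0(c("n_", "id_"), rep(seq_len(n), each = 2)))

result <- data %>%
  # Take the id from the first row and drop the hyphen and everything after it
  mutate(across(starts_with("id"), ~ first(str_remove(.x, "-.*")))) %>%
  slice(-1) %>%
  pivot_longer(-col, names_to = c(".value", "rn"), names_sep = "_") %>%
  pivot_wider(names_from = "col", values_from = "n") %>%
  select(-rn)
result
```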