Reshaping an unstructured dataset in R


I am trying to reshape a dataset by moving some of its cell information around. Here is what my sample dataset looks like.

data <- data.frame(var1 = c("Text","A","B","C","D"),
var2 = c("Text",NA, 1,0,1),
var3 = c("112-1",NA,NA,"text",NA),
var4 = c("Text",1,0,NA, NA),
var5 = c("113-1",NA,"text",NA,NA))
> data
var1 var2  var3 var4  var5
1 Text Text 112-1 Text 113-1
2    A <NA>  <NA>    1  <NA>
3    B    1  <NA>    0  text
4    C    0  text <NA>  <NA>
5    D    1  <NA> <NA>  <NA>

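It needs some cleaning first. var1 holds the item information. var2 and var4 both hold score information. var3 and var5 hold the id information in their first row. I need to reshape this dataset as shown below.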

> data.1
id  A B  C  D
1 112 NA 1  0  1
2 113  1 0 NA NA

Given that the real data file repeats the same pattern across many more columns (e.g. var6, var7, var8, var9, etc.), how can I reshape it into this desired dataset?

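This is not much different from my answer yesterday, but it should give you the result you want. Shift the first row over by one column so that each id sits in the same column as its values, drop the columns that are no longer needed, then turn the first row into the column names. Add some pivoting and it should be roughly what you need: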

data <- data.frame(var1 = c("Text","A","B","C","D"), var2 = c("Text",NA, 1,0,1), var3 = c("112",NA,NA,NA,NA), var4 = c("Text",1,0,NA, NA), var5 = c(113,NA,NA,NA,NA))
library(dplyr)
library(tidyr)
data2 <- data %>%
  mutate(across(everything(), as.character)) # Convert every column to character to avoid factor issues
data2[1, 2:(ncol(data2) - 1)] <- data2[1, 3:ncol(data2)] # Shift the first row one column to the left
data3 <- data2 %>%
  select(-var3, -var5) # Remove the unneeded columns
colnames(data3) <- as.character(unlist(data3[1, ])) # Use the first row as the column names
data3 <- data3[-1, ] # Drop row 1 now that it supplies the column names

data3%>%
tidyr::pivot_longer(-Text, names_to = "id", values_to = "time")%>% #Making the data into longer format
tidyr::pivot_wider(names_from = Text, values_from = time) #Then back into wide
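
The select(-var3, -var5) step above names the id columns by hand, which does not scale to var6, var7, and so on. A minimal base R sketch that picks the columns by position instead is given below. It assumes that column 1 holds the item names and that every remaining column belongs to a (score, id) pair, with the id in row 1 of the id column. The helper name reshape_pairs is hypothetical, and keeping only the part of the id before the dash is an assumption based on the desired output in the question.

```r
# Example data from the question (ids such as "112-1" sit in row 1 of var3 and var5).
data <- data.frame(
  var1 = c("Text", "A", "B", "C", "D"),
  var2 = c("Text", NA, 1, 0, 1),
  var3 = c("112-1", NA, NA, "text", NA),
  var4 = c("Text", 1, 0, NA, NA),
  var5 = c("113-1", NA, "text", NA, NA)
)

# Hypothetical helper: assumes column 1 holds item names and the remaining
# columns come in (score, id) pairs, with each id in row 1 of its id column.
reshape_pairs <- function(df) {
  score_cols <- seq(2, ncol(df), by = 2)    # var2, var4, ...
  id_cols    <- score_cols + 1              # var3, var5, ...
  items      <- as.character(df[[1]][-1])   # item names, header row dropped

  # Assumption: keep the part of each id before the dash, e.g. "112-1" -> "112".
  ids <- sub("-.*$", "", as.character(unlist(df[1, id_cols])))

  # Transpose the score columns: one row per score column, one column per item.
  scores <- t(as.matrix(df[-1, score_cols, drop = FALSE]))
  out <- data.frame(id = ids, unname(scores), stringsAsFactors = FALSE)
  names(out) <- c("id", items)

  # Convert the character columns back to numbers where possible.
  type.convert(out, as.is = TRUE)
}

data.1 <- reshape_pairs(data)
data.1
```

Because the columns are selected by position, the same call should work unchanged when more pairs are appended, as long as the pairing assumption holds.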

You could shift the first row, drop columns using %% 2, and transpose.

data[1, ] <- data[1, -1]
data <- data[c(TRUE, seq_len(ncol(data))[-1] %% 2 == 0)]
setNames(as.data.frame(t(data[, -1]), row.names=FALSE), c('id', data[[1]][-1])) |>
type.convert(as.is=TRUE)
#      id  A B  C  D
# 1 112-1 NA 1  0  1
# 2 113-1  1 0 NA NA

By the way, how did you obtain this data? Maybe you have an XY problem.

library(dplyr)
library(tidyr)
library(stringr)
# First, give the columns more descriptive names
n = 2 #Number of pairs of columns you have (here 2)
nam <- do.call(paste0, (expand.grid(c("n_", "id_"), seq(n))))
colnames(data) <- c("col", nam)
#Then, the data manipulation
data %>% 
mutate(across(starts_with("id"), ~ first(str_remove(.x, "-")))) %>% 
fill(starts_with("id")) %>% 
slice(-1) %>% 
pivot_longer(-col, names_to = c(".value", "rn"), names_sep = "_") %>% 
pivot_wider(names_from = "col", values_from = 'n') %>% 
select(-rn)
id    A     B     C     D    
1 1121  NA    1     0     1    
2 1131  1     0     NA    NA   
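
In the code above, n is set by hand. As a rough sketch, the number of pairs and the new column names could also be derived from the column count, assuming one item column followed only by (score, id) pairs. The variable n_cols is introduced here for illustration and stands in for ncol(data).

```r
# Assumption: 1 item column plus n (score, id) pairs, so ncol = 1 + 2 * n.
n_cols <- 5                       # stands in for ncol(data) in the example
n <- (n_cols - 1) %/% 2           # number of (score, id) pairs

# Same names as the expand.grid() call produces: n_1, id_1, n_2, id_2, ...
nam <- paste0(c("n_", "id_"), rep(seq_len(n), each = 2))
nam
```

The resulting vector can then be used in colnames(data) <- c("col", nam) exactly as before.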
