Clarification on reshaping a data frame from LONG to WIDE in R with gather/spread



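Sorry to come back to a topic that already has several threads on Stack Overflow, but I am trying to reshape a dataset from LONG to WIDE with the tidyverse, using both the gather/spread functions and pivot_wider, and I am lost. Here is a sample of the subset I am testing with: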

test.long2 <- structure(list(
  pid = structure(c(1L, 1L, 1L, 1L, 1L, 2L),
                  .Label = as.character(1:216), class = "factor"),
  timewave = structure(c(1L, 2L, 3L, 4L, 5L, 1L),
                       .Label = as.character(1:8), class = "factor"),
  dev_icd = structure(c(1L, 1L, 1L, 1L, 1L, 2L),
                      .Label = c("No", "Yes"), class = "factor"),
  lab_bnp = c(388, 199, 387.5, 318, 154, 949.4)),
  row.names = c(NA, 6L), class = "data.frame")

Here are the two commands I came up with:

test.wide2 <- test.long2 %>%
  pivot_wider(id_cols = pid,
              names_from = timewave,
              values_from = c(dev_icd, lab_bnp),
              names_sep = "")
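For reference, here is a minimal sketch of the first command run on the six-row sample. It assumes dplyr and tidyr (1.1 or later) are installed, and it rebuilds the sample by hand with only the factor levels that actually occur, so the level sets are an assumption rather than the original dput(). On this sample the command already gives one row per pid; the NAs appear because pid 2 has only a wave-1 row.

```r
library(dplyr)
library(tidyr)

# Six-row sample from the question, rebuilt by hand
# (factor levels trimmed to the values that occur)
test.long2 <- data.frame(
  pid      = factor(c(1, 1, 1, 1, 1, 2)),
  timewave = factor(c(1, 2, 3, 4, 5, 1)),
  dev_icd  = factor(c("No", "No", "No", "No", "No", "Yes"), levels = c("No", "Yes")),
  lab_bnp  = c(388, 199, 387.5, 318, 154, 949.4)
)

# id_cols:     what identifies one output row (one row per pid)
# names_from:  which column supplies the new column names (the wave)
# values_from: which columns supply the cell values
test.wide2 <- test.long2 %>%
  pivot_wider(id_cols = pid,
              names_from = timewave,
              values_from = c(dev_icd, lab_bnp),
              names_sep = "")

names(test.wide2)
# pid 2 has no rows for waves 2-5, so those cells can only be NA
test.wide2$lab_bnp2
```

Each variable keeps its own type (dev_icd1 stays a factor, lab_bnp1 stays numeric), because pivot_wider fills one output column per variable and wave.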

or

test.wide <- test.long2 %>%
  group_by(pid) %>%
  gather("dev_icd", "lab_bnp",
         key = variable, value = number) %>%
  unite(combi, variable, timewave) %>%
  spread(combi, number)
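The gather/unite/spread chain follows a "long, then longer, then wide" logic: stack the measurements into one key/value pair of columns, build the new column names, then spread them out. Below is a sketch of the same three steps with the newer verbs, assuming dplyr 1.0 or later (for across()) and tidyr 1.1 or later; the six-row sample is rebuilt by hand with trimmed factor levels. Because dev_icd is a factor and lab_bnp is numeric, they cannot share one value column as they are, so both are converted to character first. The group_by(pid) step is not needed for this.

```r
library(dplyr)
library(tidyr)

# Six-row sample from the question, rebuilt by hand (factor levels trimmed)
test.long2 <- data.frame(
  pid      = factor(c(1, 1, 1, 1, 1, 2)),
  timewave = factor(c(1, 2, 3, 4, 5, 1)),
  dev_icd  = factor(c("No", "No", "No", "No", "No", "Yes"), levels = c("No", "Yes")),
  lab_bnp  = c(388, 199, 387.5, 318, 154, 949.4)
)

# Step 1 (gather): one row per pid / timewave / variable.
# A factor and a double cannot share a value column, so convert both to character.
longer <- test.long2 %>%
  mutate(across(c(dev_icd, lab_bnp), as.character)) %>%
  pivot_longer(c(dev_icd, lab_bnp), names_to = "variable", values_to = "number")

# Step 2 (unite): build the future column names, e.g. "lab_bnp_3"
# Step 3 (spread): the only remaining column, pid, identifies the output rows
wide <- longer %>%
  unite(combi, variable, timewave) %>%
  pivot_wider(names_from = combi, values_from = number)

wide
```

Every value in wide is now character ("388", "949.4"), which is one reason to prefer a single pivot_wider(values_from = c(dev_icd, lab_bnp)) call: it skips the stacking step and keeps each variable's own type.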

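Neither works as I expected: I get lots of NA or NULL values, and I don't understand what I am doing wrong or what the correct procedure is. I would be very grateful for any help, not only in solving the problem but also in understanding the logic/philosophy of reshaping.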

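You need to be more explicit about what you want; we can only guess. Also, you cannot expect the wide format to contain values that do not exist in the long data: every missing pid/timewave combination becomes NA. My guess is that you want something like this: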

test.long2 %>%
  pivot_wider(id_cols = timewave,  # pid is used by names_from, so it is not an id column
              names_from = pid,
              values_from = c(dev_icd, lab_bnp),
              names_sep = "_pid")
# A tibble: 5 x 5
timewave dev_icd_pid1 dev_icd_pid2 lab_bnp_pid1 lab_bnp_pid2
<fct>    <fct>        <fct>               <dbl>        <dbl>
1 1        No           Yes                  388          949.
2 2        No           NA                   199           NA 
3 3        No           NA                   388.          NA 
4 4        No           NA                   318           NA 
5 5        No           NA                   154           NA 
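The NA cells in that output are the pid/timewave combinations that have no row in the long data; reshaping cannot invent them. pivot_wider() does have a values_fill argument, but it only swaps the NA for a placeholder of your choosing. A minimal sketch on hypothetical toy data (the toy data frame and its values are made up for illustration; assumes tidyr 1.1 or later, where values_fill accepts a single value):

```r
library(tidyr)

# Hypothetical toy data: pid 2 was only measured at wave 1
toy <- data.frame(
  pid      = c(1, 1, 2),
  timewave = c(1, 2, 1),
  lab_bnp  = c(388, 199, 949.4)
)

# The missing combination (pid 2, wave 2) becomes NA
wide_na <- pivot_wider(toy, id_cols = pid, names_from = timewave,
                       values_from = lab_bnp, names_prefix = "lab_bnp_t")

# values_fill only replaces that NA with a chosen placeholder
wide_filled <- pivot_wider(toy, id_cols = pid, names_from = timewave,
                           values_from = lab_bnp, names_prefix = "lab_bnp_t",
                           values_fill = 0)

wide_na
wide_filled
```

For a lab value such as lab_bnp, a filled-in 0 would look like a real measurement, so leaving the NA is usually the safer choice.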

Thanks to Merjin van Tiborg's help I finally solved the problem. The correct command to get one row per PID, with dev_icd and lab_bnp spread into columns by time wave, is:

test.wide <- hf.longsmall %>%
  pivot_wider(id_cols = pid,  # timewave is used by names_from, so it is not an id column
              names_from = timewave,
              values_from = c(dev_icd, lab_bnp),
              names_sep = "_t")

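When every pid/timewave pair is unique, this gives the same result as the following, which also numbers repeated pid/timewave pairs so that duplicates end up on separate rows instead of in list-columns: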

test.wide1 <- hf.longsmall %>%
  group_by(pid, timewave) %>%
  mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = timewave,
                     values_from = c(dev_icd, lab_bnp),
                     names_sep = "_t") %>%
  select(-row)
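What the row counter changes only shows up when a pid/timewave pair occurs more than once. Here is a sketch on a hypothetical toy data frame, dup (made up, with pid 1 entered twice at wave 2; assumes dplyr and tidyr are installed). Without the counter, the duplicated cell forces a list-column, and cells with no value are then typically stored as NULL rather than NA. With the counter, the duplicate moves to a second row for the same pid.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: pid 1 was entered twice at wave 2
dup <- data.frame(
  pid      = c(1, 1, 1, 2),
  timewave = c(1, 2, 2, 1),
  lab_bnp  = c(388, 199, 210, 949.4)
)

# No counter: (pid 1, wave 2) holds two values, so tidyr warns and
# returns list-columns
with_lists <- pivot_wider(dup, id_cols = pid, names_from = timewave,
                          values_from = lab_bnp, names_prefix = "lab_bnp_t")

# Counter within each pid/timewave pair: the duplicate gets row = 2,
# so (pid, row) identifies the output rows and no cell holds two values
with_rows <- dup %>%
  group_by(pid, timewave) %>%
  mutate(row = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = timewave, values_from = lab_bnp,
              names_prefix = "lab_bnp_t") %>%
  select(-row)

with_rows
```

The counter does not remove the duplicate; it only makes it visible as an extra row, which is why it helps in spotting data-entry errors.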

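With the first command I get the warning "Values in values_from are not uniquely identified; output will contain list-cols". It is caused by real (and dangerous, i.e. erroneous) duplicate PIDs introduced during data entry. In any case, I was only able to understand the problem using the group_by version shown above.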
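The duplicated pid/timewave pairs behind that warning can also be listed directly before reshaping. A sketch using dplyr's count() on hypothetical toy data (dup is made up for illustration; the real hf.longsmall is not shown in the question):

```r
library(dplyr)

# Hypothetical toy data: pid 1 was entered twice at wave 2
dup <- data.frame(
  pid      = c(1, 1, 1, 2),
  timewave = c(1, 2, 2, 1),
  lab_bnp  = c(388, 199, 210, 949.4)
)

# Pairs that occur more than once are exactly the cells that
# pivot_wider() cannot fill with a single value
dupes <- dup %>%
  count(pid, timewave) %>%
  filter(n > 1)

dupes
```

Running the same count() on the full long data shows which records need correcting; fixing them at the source is safer than carrying them along as list-columns or extra rows.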

Thanks to everyone for your patience.
