My data looks like this:
patientid <- c(100,101,101,101,102,102)
weight <- c(1,1,2,3,1,2)
height <- c(0,6,0,0,0,1)
bmi <- c(0,5,0,0,0,1)
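I want to group by patient ID so that the data frame has only one row per patient.
The remaining rows should then become additional columns, named by appending a number to the end. The data frame would therefore have the columns patientid, weight1, height1, bmi1, weight2, height2, bmi2, and so on. The number of column sets corresponds to the largest number of times a patient ID is repeated.
I think group_by and spread are the key functions, but I can't figure out how to use them. In this example, the row for patient ID 100 would only have values in the weight1, height1 and bmi1 columns; patient 101 would have values for weight1, height1, bmi1, weight2, height2, bmi2, weight3, height3 and bmi3; and patient 102 would have values for weight1, height1, bmi1, weight2, height2 and bmi2.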
A base R option using ave + reshape:
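# df is the data frame built from the vectors in the question,
# e.g. df <- data.frame(patientid, weight, height, bmi)
reshape(
  transform(
    df,
    # q numbers the rows within each patient: 1, 2, 3, ...
    q = ave(patientid, patientid, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "patientid",
  timevar = "q"
)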
gives
  patientid weight.1 height.1 bmi.1 weight.2 height.2 bmi.2 weight.3 height.3 bmi.3
1       100        1        0     0       NA       NA    NA       NA       NA    NA
2       101        1        6     5        2        0     0        3        0     0
5       102        1        0     0        2        1     1       NA       NA    NA
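To see what the helper column in this approach contains, here is a minimal sketch that runs only the ave() step on the question's patientid vector. The name q is just the one used in the answer above; nothing else is assumed.

# Number the rows within each patient ID.
patientid <- c(100, 101, 101, 101, 102, 102)

# ave() splits patientid by itself and applies seq_along to each group,
# so every patient gets a running counter starting at 1.
q <- ave(patientid, patientid, FUN = seq_along)
print(q)
# [1] 1 1 2 3 1 2

reshape() then uses this counter as timevar, which is where the .1, .2, .3 suffixes in the column names come from.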
Perhaps we can use pivot_wider after creating a sequence column by 'patientid' (rowid comes from data.table):
library(tidyr)
library(data.table)
library(dplyr)
df1 %>%
  mutate(rn = rowid(patientid)) %>%
  pivot_wider(names_from = rn, values_from = c(weight, height, bmi),
              names_sep = "")
Output:
# A tibble: 3 x 10
patientid weight1 weight2 weight3 height1 height2 height3 bmi1 bmi2 bmi3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 100 1 NA NA 0 NA NA 0 NA NA
2 101 1 2 3 6 0 0 5 0 0
3 102 1 2 NA 0 1 NA 0 1 NA
Data:
df1 <- data.frame(patientid, weight, height, bmi)
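If you would rather not load data.table just for rowid, the same idea can probably be written with dplyr alone. This is only a sketch: it assumes tidyr 1.2.0 or later for the names_vary argument, which I use here to get the weight1, height1, bmi1, weight2, ... column order asked for in the question. The object name wide is made up for the example.

library(dplyr)
library(tidyr)

df1 <- data.frame(
  patientid = c(100, 101, 101, 101, 102, 102),
  weight    = c(1, 1, 2, 3, 1, 2),
  height    = c(0, 6, 0, 0, 0, 1),
  bmi       = c(0, 5, 0, 0, 0, 1)
)

wide <- df1 %>%
  group_by(patientid) %>%
  mutate(rn = row_number()) %>%   # running counter within each patient
  ungroup() %>%
  pivot_wider(
    names_from  = rn,
    values_from = c(weight, height, bmi),
    names_sep   = "",
    names_vary  = "slowest"       # assumption: needs tidyr >= 1.2.0
  )

wide

Without names_vary the result should match the output shown above, with all weight columns first, then height, then bmi.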
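group_by and spread are part of the tidyverse, I believe.
I reshaped your data with base reshape, using weight as the measurement ID. Note that this only works here because weight happens to count 1, 2, 3 within each patient in the sample data.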
patientid <- c(100,101,101,101,102,102)
weight <- c(1,1,2,3,1,2)
height <- c(0,6,0,0,0,1)
bmi <- c(0,5,0,0,0,1)
cat("data\n")
# n is a copy of weight, used below as the measurement index (timevar)
df <- data.frame(patientid = patientid,
                 n = weight,
                 weight = weight,
                 height = height,
                 bmi = bmi)
df
cat("reshaped to wide format\n")
reshape(data = df,
        idvar = "patientid",
        timevar = "n",
        # v.names = c("weight", "height", "bmi"),
        direction = "wide")
# ?reshape
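Before relying on weight as the timevar, it may be worth checking that it really identifies each row within a patient. A rough base R sketch of such a check follows; the variable names weight_is_unique_per_patient, visit and weight_matches_visit are made up for illustration.

patientid <- c(100, 101, 101, 101, 102, 102)
weight    <- c(1, 1, 2, 3, 1, 2)

# TRUE when no (patientid, weight) pair occurs more than once,
# i.e. weight can distinguish the rows of each patient.
weight_is_unique_per_patient <- anyDuplicated(data.frame(patientid, weight)) == 0

# Compare weight with an explicit within-patient counter.
visit <- ave(patientid, patientid, FUN = seq_along)
weight_matches_visit <- all(weight == visit)

weight_is_unique_per_patient
weight_matches_visit

Both are TRUE for the sample data. If either were FALSE on the real data, an explicit counter like visit would be the safer timevar.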