Here is a subset of my data:
structure(list(First.Name = c(5006L, 5006L, 5007L, 5007L, 5008L,
5009L), Session = c("Post", "Pre", "Post", "Pre", NA, "Post"),
RHR = c(65.2352941176471, 60, 62.5882352941176, 63, 63.4,
48.6060606060606), HRV = c(79.1470588235294, 73.5, 91.4117647058823,
80.5555555555556, 102.4, 146.606060606061), Hours.in.Bed = c(6.76441176470588,
6.325, 5.98058823529412, 4.86, 6.503, 5.43787878787879),
Hours.of.Sleep = c(5.88058823529412, 5.59833333333333, 4.89117647058824,
3.93666666666667, 5.933, 5.10484848484848), Sleep.Disturbances = c(6.85294117647059,
6.66666666666667, 4.52941176470588, 3.55555555555556, 5.2,
2.93939393939394), Latency.min = c(6.96558823529412, 3.31333333333333,
3.77411764705882, 2.81333333333333, 2.88, 2.90424242424242
), Cycles = c(5.73529411764706, 5.83333333333333, 3.23529411764706,
2.22222222222222, 5, 3.33333333333333), REM.Sleep.hours = c(1.42970588235294,
1.55, 0.466470588235294, 0.413333333333333, 1.42, 0.698181818181818
), Deep.Sleep.hours = c(0.612058823529412, 0.55, 1.17352941176471,
0.972222222222222, 0.68, 1.73909090909091), Light.Sleep.hours = c(3.83647058823529,
3.49666666666667, 3.25058823529412, 2.55111111111111, 3.835,
2.66636363636364), Awake.hours = c(0.881764705882353, 0.723333333333333,
1.08764705882353, 0.92, 0.568, 0.333030303030303), Missing.Data.hours = c(0,
0, 0, 0, 0, 0), Respiratory.Rate = c(NaN, NaN, NaN, NaN,
NaN, NaN), Year_Day = c(147.852941176471, 127.5, 145.117647058824,
129.888888888889, 130.5, 146), Week_Year = c(21.5588235294118,
18.6666666666667, 21.1764705882353, 19, 19.1, 21.2727272727273
)), row.names = c(NA, -6L), groups = structure(list(First.Name = 5006:5009,
.rows = structure(list(1:2, 3:4, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
It looks like this:
First.Name Session RHR HRV Hours.in.Bed Hours.of.Sleep Sleep.Disturbances Latency.min Cycles REM.Sleep.hours Deep.Sleep.hours Light.Sleep.hou~
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5006 Post 65.2 79.1 6.76 5.88 6.85 6.97 5.74 1.43 0.612 3.84
2 5006 Pre 60 73.5 6.32 5.60 6.67 3.31 5.83 1.55 0.55 3.50
3 5007 Post 62.6 91.4 5.98 4.89 4.53 3.77 3.24 0.466 1.17 3.25
4 5007 Pre 63 80.6 4.86 3.94 3.56 2.81 2.22 0.413 0.972 2.55
5 5008 NA 63.4 102. 6.50 5.93 5.2 2.88 5 1.42 0.68 3.84
6 5009 Post 48.6 147. 5.44 5.10 2.94 2.90 3.33 0.698 1.74 2.67
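I am trying to subtract specific rows from one another across all columns, based on the Session column. Specifically, I want Post - Pre for each First.Name ID, across every column. However, some IDs are missing the Pre value, the Post value, or both.
For example: for subject 5006 the RHR column would be Post - Pre, or 65.2 - 60, and so on.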
I have tried various things along the lines of
DF %>%
group_by(First.Name) %>%
summarise(RHR[Session == "Post"] - RHR[Session == "Pre"])
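But I am sure there is a way to do this with summarise or an apply function, without having to mutate a new column for each difference. Any help is appreciated.
Perhaps you can use summarise with across to handle all the columns at once. Note that x[1] - x[2] relies on the Post row coming before the Pre row within each ID, as it does in the sample data.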
summarise(group_by(DF, First.Name), across(RHR:Week_Year, function(x) x[1] - x[2]))
First.Name RHR HRV Hours.in.Bed Hours.of.Sleep Sleep.Disturbances Latency.min Cycles
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5006 5.24 5.65 0.439 0.282 0.186 3.65 -0.0980
2 5007 -0.412 10.9 1.12 0.955 0.974 0.961 1.01
3 5008 NA NA NA NA NA NA NA
4 5009 NA NA NA NA NA NA NA
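Because the positional version depends on row order, it may be safer to pick the rows by their Session label. The following is only a sketch under a few assumptions: the data frame is a small rounded stand-in for the question's data with just two measure columns, post_minus_pre is a hypothetical helper name (not part of dplyr), and each ID is assumed to have at most one Pre row and one Post row.

```r
library(dplyr)

# Rounded stand-in for the question's data (assumption: two columns are enough to illustrate)
DF <- tibble::tibble(
  First.Name = c(5006L, 5006L, 5007L, 5007L, 5008L, 5009L),
  Session    = c("Post", "Pre", "Post", "Pre", NA, "Post"),
  RHR        = c(65.2, 60, 62.6, 63, 63.4, 48.6),
  HRV        = c(79.1, 73.5, 91.4, 80.6, 102.4, 146.6)
)

# Hypothetical helper: Post minus Pre, or NA when either session is absent.
# which() drops NA comparisons, so rows with a missing Session are ignored.
post_minus_pre <- function(x, session) {
  post <- x[which(session == "Post")]
  pre  <- x[which(session == "Pre")]
  if (length(post) == 1 && length(pre) == 1) post - pre else NA_real_
}

res <- DF %>%
  group_by(First.Name) %>%
  summarise(across(c(RHR, HRV), ~ post_minus_pre(.x, Session)))

print(res)
```

With the full data, c(RHR, HRV) could be swapped for RHR:Week_Year as in the answer above. IDs that lack a Pre or a Post row come out as NA instead of silently using whatever rows happen to be first and second.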
We could also use diff
library(dplyr)
DF %>%
group_by(First.Name) %>%
summarise(across(RHR:Week_Year, ~ -diff(.)[1]))
which gives the output
# A tibble: 4 x 16
# First.Name RHR HRV Hours.in.Bed Hours.of.Sleep Sleep.Disturban… Latency.min Cycles REM.Sleep.hours Deep.Sleep.hours Light.Sleep.hou…
#* <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 5006 5.24 5.65 0.439 0.282 0.186 3.65 -0.0980 -0.120 0.0621 0.340
#2 5007 -0.412 10.9 1.12 0.955 0.974 0.961 1.01 0.0531 0.201 0.699
#3 5008 NA NA NA NA NA NA NA NA NA NA
#4 5009 NA NA NA NA NA NA NA NA NA NA
# … with 5 more variables: Awake.hours <dbl>, Missing.Data.hours <dbl>, Respiratory.Rate <dbl>, Year_Day <dbl>, Week_Year <dbl>
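Since -diff(.)[1] is just the first value minus the second, it also depends on Post preceding Pre within each ID. One possible way to guarantee that, sketched here under stated assumptions, is to sort by Session first: "Post" sorts before "Pre" and NA goes last. The data below is a reduced, rounded stand-in with the rows for 5006 deliberately swapped, and the sketch assumes no ID mixes a labelled row with an unlabelled (NA) one.

```r
library(dplyr)

# Reduced stand-in data; the 5006 rows are intentionally in Pre, Post order
DF <- tibble::tibble(
  First.Name = c(5006L, 5006L, 5007L, 5007L, 5008L, 5009L),
  Session    = c("Pre", "Post", "Post", "Pre", NA, "Post"),
  RHR        = c(60, 65.2, 62.6, 63, 63.4, 48.6)
)

out <- DF %>%
  arrange(First.Name, Session) %>%          # puts Post ahead of Pre within each ID
  group_by(First.Name) %>%
  summarise(across(RHR, ~ -diff(.x)[1]))    # first minus second, i.e. Post - Pre

print(out)
```

For IDs with a single row, diff() returns a zero-length vector, so indexing with [1] yields NA, which is how the NA rows in the output above arise.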