R根据特定的标准对数据进行分组并减去所有列中的行

这是我的数据的一个子集:

structure(list(First.Name = c(5006L, 5006L, 5007L, 5007L, 5008L, 
5009L), Session = c("Post", "Pre", "Post", "Pre", NA, "Post"), 
RHR = c(65.2352941176471, 60, 62.5882352941176, 63, 63.4, 
48.6060606060606), HRV = c(79.1470588235294, 73.5, 91.4117647058823, 
80.5555555555556, 102.4, 146.606060606061), Hours.in.Bed = c(6.76441176470588, 
6.325, 5.98058823529412, 4.86, 6.503, 5.43787878787879), 
Hours.of.Sleep = c(5.88058823529412, 5.59833333333333, 4.89117647058824, 
3.93666666666667, 5.933, 5.10484848484848), Sleep.Disturbances = c(6.85294117647059, 
6.66666666666667, 4.52941176470588, 3.55555555555556, 5.2, 
2.93939393939394), Latency.min = c(6.96558823529412, 3.31333333333333, 
3.77411764705882, 2.81333333333333, 2.88, 2.90424242424242
), Cycles = c(5.73529411764706, 5.83333333333333, 3.23529411764706, 
2.22222222222222, 5, 3.33333333333333), REM.Sleep.hours = c(1.42970588235294, 
1.55, 0.466470588235294, 0.413333333333333, 1.42, 0.698181818181818
), Deep.Sleep.hours = c(0.612058823529412, 0.55, 1.17352941176471, 
0.972222222222222, 0.68, 1.73909090909091), Light.Sleep.hours = c(3.83647058823529, 
3.49666666666667, 3.25058823529412, 2.55111111111111, 3.835, 
2.66636363636364), Awake.hours = c(0.881764705882353, 0.723333333333333, 
1.08764705882353, 0.92, 0.568, 0.333030303030303), Missing.Data.hours = c(0, 
0, 0, 0, 0, 0), Respiratory.Rate = c(NaN, NaN, NaN, NaN, 
NaN, NaN), Year_Day = c(147.852941176471, 127.5, 145.117647058824, 
129.888888888889, 130.5, 146), Week_Year = c(21.5588235294118, 
18.6666666666667, 21.1764705882353, 19, 19.1, 21.2727272727273
)), row.names = c(NA, -6L), groups = structure(list(First.Name = 5006:5009, 
.rows = structure(list(1:2, 3:4, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, 4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

看起来像:

First.Name Session   RHR   HRV Hours.in.Bed Hours.of.Sleep Sleep.Disturbances Latency.min Cycles REM.Sleep.hours Deep.Sleep.hours Light.Sleep.hou~
<int> <chr>   <dbl> <dbl>        <dbl>          <dbl>              <dbl>       <dbl>  <dbl>           <dbl>            <dbl>            <dbl>
1       5006 Post     65.2  79.1         6.76           5.88               6.85        6.97   5.74           1.43             0.612             3.84
2       5006 Pre      60    73.5         6.32           5.60               6.67        3.31   5.83           1.55             0.55              3.50
3       5007 Post     62.6  91.4         5.98           4.89               4.53        3.77   3.24           0.466            1.17              3.25
4       5007 Pre      63    80.6         4.86           3.94               3.56        2.81   2.22           0.413            0.972             2.55
5       5008 NA       63.4 102.          6.50           5.93               5.2         2.88   5              1.42             0.68              3.84
6       5009 Post     48.6 147.          5.44           5.10               2.94        2.90   3.33           0.698            1.74              2.67

我正试图根据Session列减去所有列中的特定行。具体来说，每个First.NameID的Post-Pre跨列。但是，有些id缺少Pre或Post值，或者两者都缺少。

例如:主题5006的RHR列将是Post - Pre或65.2-60，依此类推。

我试过各种

DF %>% 
group_by(First.Name) %>% 
summarise(RHR[Session == "Post"] - RHR[Session == "Pre"])

但是我确信有一种方法可以使用总结或应用函数，而不需要为值的差异改变新列。帮助感激。

也许可以使用summary across来一次获取所有行。

summarize(df%>%group_by(First.Name), across(RHR:Week_Year, function(x) {return(x[1]-x[2])}))
First.Name    RHR   HRV Hours.in.Bed Hours.of.Sleep Sleep.Disturbances Latency.min  Cycles
<int>  <dbl> <dbl>        <dbl>          <dbl>              <dbl>       <dbl>   <dbl>
1       5006  5.24   5.65        0.439          0.282              0.186       3.65  -0.0980
2       5007 -0.412 10.9         1.12           0.955              0.974       0.961  1.01  
3       5008 NA     NA          NA             NA                 NA          NA     NA     
4       5009 NA     NA          NA             NA                 NA          NA     NA

我们也可以使用diff

library(dplyr)
DF %>% 
group_by(First.Name) %>%
summarise(across(RHR:Week_Year,  ~ -diff(.)[1]))

与产出

# A tibble: 4 x 16
#  First.Name    RHR   HRV Hours.in.Bed Hours.of.Sleep Sleep.Disturban… Latency.min  Cycles REM.Sleep.hours Deep.Sleep.hours Light.Sleep.hou…
#*      <int>  <dbl> <dbl>        <dbl>          <dbl>            <dbl>       <dbl>   <dbl>           <dbl>            <dbl>            <dbl>
#1       5006  5.24   5.65        0.439          0.282            0.186       3.65  -0.0980         -0.120            0.0621            0.340
#2       5007 -0.412 10.9         1.12           0.955            0.974       0.961  1.01            0.0531           0.201             0.699
#3       5008 NA     NA          NA             NA               NA          NA     NA              NA               NA                NA    
#4       5009 NA     NA          NA             NA               NA          NA     NA              NA               NA                NA    
# … with 5 more variables: Awake.hours <dbl>, Missing.Data.hours <dbl>, Respiratory.Rate <dbl>, Year_Day <dbl>, Week_Year <dbl>

相关内容

最新更新

热门标签：