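I would like to know whether there is a way to subset the data object below so that it keeps a single row per unique sch.id (for example, the first row of each sch.id).
Since there are 160 unique sch.id values, I expect 160 rows in the final output.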
library(tidyverse)
hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
data <- hsb %>% group_by(sch.id) %>% mutate(math_ave = mean(math))
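If we need all of the variables, one option is to call distinct after the mutate. With .keep_all = TRUE it keeps the first row for each sch.id along with every other column.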
library(dplyr)
hsb %>%
  group_by(sch.id) %>%
  mutate(math_ave = mean(math)) %>%
  ungroup() %>%
  distinct(sch.id, .keep_all = TRUE)
# A tibble: 160 x 9
# sch.id math size sector meanses minority female ses math_ave
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1224 5.88 842 0 -0.428 0 1 -1.53 9.72
# 2 1288 7.86 1855 0 0.128 0 1 -0.788 13.5
# 3 1296 12.7 1719 0 -0.42 1 1 -0.148 7.64
# 4 1308 13.2 716 1 0.534 0 0 0.422 16.3
# 5 1317 12.9 455 1 0.351 0 1 0.882 13.2
# 6 1358 -1.35 1430 0 -0.014 1 0 0.032 11.2
# 7 1374 16.7 2400 0 -0.007 0 0 0.322 9.73
# 8 1433 12.9 899 1 0.718 0 0 0.812 19.7
# 9 1436 24.1 185 1 0.569 0 0 0.222 18.1
#10 1461 13.0 1672 0 0.683 0 1 0.042 16.8
# … with 150 more rows
Or another option, without ungrouping, is to slice the first row of each group. Note that the result is still grouped by sch.id.
hsb %>%
  group_by(sch.id) %>%
  mutate(math_ave = mean(math)) %>%
  slice(1)
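To make the difference between the two dplyr options easier to see, here is a minimal sketch on a made-up data frame. The toy object and its values are assumptions for illustration only and are not taken from hsb.csv.

```r
library(dplyr)

# Hypothetical toy data: three schools with 2, 3 and 1 students
toy <- data.frame(
  sch.id = c(1, 1, 2, 2, 2, 3),
  math   = c(10, 20, 5, 7, 9, 12)
)

# Add the per-school mean to every row
with_mean <- toy %>%
  group_by(sch.id) %>%
  mutate(math_ave = mean(math))

# Option 1: drop the grouping, then keep the first row per sch.id
by_distinct <- with_mean %>%
  ungroup() %>%
  distinct(sch.id, .keep_all = TRUE)

# Option 2: take the first row of each group (ungrouped here only for comparison)
by_slice <- with_mean %>%
  slice(1) %>%
  ungroup()

print(by_distinct)
print(by_slice)
```

Both objects should hold one row per sch.id, so on real data either route should work; the choice mostly depends on whether the result should stay grouped.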
Or use base R, with ave to compute the group mean and duplicated to keep the first row per sch.id
transform(hsb, math_ave = ave(math, sch.id))[!duplicated(hsb$sch.id),]
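The base R one-liner can be unpacked into two steps. The sketch below uses a small made-up data frame (the toy object and its values are assumptions, not part of hsb.csv) and needs no packages.

```r
# Hypothetical toy data: three schools with 2, 3 and 1 students
toy <- data.frame(
  sch.id = c(1, 1, 2, 2, 2, 3),
  math   = c(10, 20, 5, 7, 9, 12)
)

# Step 1: ave() returns the group mean repeated for every row of the group
# (its default FUN is mean)
toy$math_ave <- ave(toy$math, toy$sch.id)

# Step 2: duplicated() flags every repeat of an id, so negating it
# keeps only the first row seen for each sch.id
first_rows <- toy[!duplicated(toy$sch.id), ]

print(first_rows)
```

Because duplicated keeps the first occurrence in the current row order, sorting the data beforehand changes which row is kept for each school.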
A data.table approach could be:
library(data.table)
setDT(data)[, .SD[1], sch.id]
# sch.id math size sector meanses minority female ses math_ave
# 1: 1224 5.876 842 0 -0.428 0 1 -1.528 9.715447
# 2: 1288 7.857 1855 0 0.128 0 1 -0.788 13.510800
# 3: 1296 12.668 1719 0 -0.420 1 1 -0.148 7.635958
# 4: 1308 13.233 716 1 0.534 0 0 0.422 16.255500
# 5: 1317 12.862 455 1 0.351 0 1 0.882 13.177687
# ---
#156: 9359 19.797 1184 1 0.360 0 0 0.612 15.270623
#157: 9397 5.873 1314 0 0.140 0 0 0.502 10.355468
#158: 9508 13.932 1119 1 -0.132 0 0 0.242 13.574657
#159: 9550 1.766 1532 0 0.059 1 0 -0.228 11.089138
#160: 9586 14.076 262 1 0.627 0 1 0.852 14.863695