How to calculate a probability in R



Hi, I'm taking a statistics class. We were given the dataset "NHANES" and filtered it down to adults with a recorded smoking status -> "NHANES_adult".

library(NHANES)
library(dplyr)  # provides %>%, distinct() and filter()
# create a NHANES dataset without duplicated IDs
NHANES <-
  NHANES %>%
  distinct(ID, .keep_all = TRUE)
NHANES_adult <- NHANES %>%
  filter(Age >= 18) %>%      # only include individuals 18 or older
  filter(!is.na(SmokeNow))   # drop any observations with NA for SmokeNow

My professor asked the following question:

1b. Now, let's draw a single sample of 100 individuals from the NHANES_adult data frame, compute the proportion of smokers, and save it to a variable named p_smokers.

set.seed(12345)  # PROVIDED CODE - this will cause it to create the same
# random sample each time
sample_size = 100 # size of each sample
p_smokers <- NHANES_adult %>%
sample(sample_size) %>%  # take a sample from the data frame [I think this is okay]
____(____ = ____(____)) %>% # compute the probability of smoking [This is where I'm stuck: I can't work out which one-line function fits these blanks.]
____()  # extract the variable from the data frame [I believe this is the mutate() function?]
p_smokers

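This may be what you're looking for. You should use sample_n() rather than sample(): sample_n() draws rows from a data frame, whereas sample() applied to a data frame draws columns. To get the proportion in one line, take mean() of the logical test SmokeNow == "Yes", since TRUE counts as 1 and FALSE as 0. The last blank is pull() rather than mutate(), because pull() extracts the column as a plain value.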

sample_size <- 100
NHANES_adult %>%
  sample_n(sample_size) %>%                        # draw 100 rows at random
  summarize(p_smok = mean(SmokeNow == "Yes")) %>%  # proportion answering "Yes"
  pull(p_smok)                                     # extract the value from the data frame
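To see why this works, here is a minimal sketch on a tiny made-up data frame. The name toy and its values are hypothetical stand-ins for NHANES_adult, not real NHANES data. It shows that sample() picks columns while sample_n() picks rows, and that mean() of a logical comparison is a proportion.

```r
library(dplyr)

# Hypothetical toy data standing in for NHANES_adult (not real NHANES values)
toy <- tibble(
  ID = 1:6,
  SmokeNow = c("Yes", "No", "No", "Yes", "No", "No")
)

# sample() treats a data frame as a list of columns, so this returns
# one randomly chosen column (all 6 rows), not one randomly chosen row
one_column <- sample(toy, 1)

# sample_n() draws rows instead; here 4 of the 6 rows, all columns kept
set.seed(1)
four_rows <- sample_n(toy, 4)

# SmokeNow == "Yes" gives TRUE/FALSE; mean() counts TRUE as 1 and FALSE as 0,
# so the mean is the proportion of rows where SmokeNow is "Yes"
p_all <- toy %>%
  summarize(p_smok = mean(SmokeNow == "Yes")) %>%
  pull(p_smok)
```

In recent dplyr versions sample_n() is marked as superseded by slice_sample(n = ...), so slice_sample(n = sample_size) should work as a drop-in replacement in the pipeline above.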
