R - How to change values in multiple subsets



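I have a data frame with 10017 observations, grouped into 159 financial institutions. How can I improve the normality of each institution's distribution without having to use Excel and manually change the values below the 1st and above the 99th percentile that lie more than ±3 SD from the mean?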

I am new to data analysis, so I hope this is clear.

I called tapply(df$x, df$id, quantile, c(0.01, 0.99)) and then changed the outliers in Excel.

The following example may help:

library(dplyr)

mtcars %>%
  # Select just two variables to keep the example small
  select(vs, drat) %>%
  # Group by the vs variable
  group_by(vs) %>%
  mutate(
    # Compute the 1% and 99% quantiles of drat within each vs group
    q_01 = quantile(drat, 0.01),
    q_99 = quantile(drat, 0.99),
    # Set values to NA when they are more extreme than the group quantiles
    drat = if_else(drat < q_01 | drat > q_99, NA_real_, drat)
  )
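
For a data frame shaped like the one in the question, the same per-group idea could probably be sketched in base R with ave(), which pairs naturally with the tapply() call already tried. The column names x and id are taken from that call; the data below are simulated stand-ins, not the real observations, and x_trimmed is a hypothetical column name.

```r
# Simulated stand-in for the asker's data: 3 institutions, 100 values each
set.seed(1)
df <- data.frame(id = rep(c("A", "B", "C"), each = 100), x = rnorm(300))

# ave() applies FUN within each id and returns a vector aligned with the rows of df
df$x_trimmed <- ave(df$x, df$id, FUN = function(v) {
  q <- quantile(v, probs = c(0.01, 0.99), na.rm = TRUE)
  # Values outside the group's own 1%-99% range become NA
  ifelse(v < q[1] | v > q[2], NA_real_, v)
})

# Count how many values were set to NA in each institution
tapply(is.na(df$x_trimmed), df$id, sum)
```

Writing the result to a new column keeps the original values available for comparison.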

# Create a function that sets values to NA when they are more extreme than their quantiles
remove_quantile <- function(x) {
  if_else(x < quantile(x, 0.01) | x > quantile(x, 0.99), NA_real_, x)
}

mtcars %>%
  group_by(vs) %>%
  # Apply the function to every numeric variable in the data set, within each group
  mutate(across(.cols = where(is.numeric), .fns = remove_quantile))
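
If the goal is to change the extreme values rather than drop them, as the question describes doing by hand in Excel, one possible variant is to clamp them to the group quantiles (often called winsorizing). This is only a sketch under that assumption: winsorize_quantile is a hypothetical helper, not a dplyr function, and the 1% and 99% cut-offs are simply carried over from the question.

```r
library(dplyr)

# Hypothetical helper: clamp values to the lower and upper quantiles of x
winsorize_quantile <- function(x, lower = 0.01, upper = 0.99) {
  limits <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  # pmax() raises values below the lower limit, pmin() lowers values above the upper limit
  pmin(pmax(x, limits[1]), limits[2])
}

winsorized <- mtcars %>%
  group_by(vs) %>%
  # Clamp every numeric column within each vs group
  mutate(across(.cols = where(is.numeric), .fns = winsorize_quantile)) %>%
  ungroup()
```

Unlike the NA replacement, this keeps the number of non-missing observations per group unchanged, which may matter if later steps need complete cases.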
