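I have a data frame with 10,017 observations spread across 159 financial institutions. How can I improve the normality of each institution's distribution without having to use Excel and manually change the values in the 1% and 99% tails of the distribution that lie more than ±3 SD from the mean?
I am new to data analysis, so I hope this is clear.
I ran tapply(df$x, df$id, quantile, c(0.01, 0.99))
and then changed the outliers in Excel.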
The following example may help:
library(dplyr)

mtcars %>%
  # Select just two variables to keep the example small
  select(vs, drat) %>%
  # Group by the vs variable
  group_by(vs) %>%
  mutate(
    # Compute the 1% and 99% quantiles of drat within each vs group
    q_01 = quantile(drat, 0.01),
    q_99 = quantile(drat, 0.99),
    # Set values to NA when they are more extreme than the quantiles
    drat = if_else(drat < q_01 | drat > q_99, NA_real_, drat)
  )

# Function that sets values to NA when they are more extreme than their own 1% / 99% quantiles
remove_quantile <- function(x) {
  if_else(x < quantile(x, 0.01) | x > quantile(x, 0.99), NA_real_, x)
}

mtcars %>%
  group_by(vs) %>%
  # Apply the function to every numeric variable in the data set, within each group
  mutate(across(.cols = where(is.numeric), .fns = remove_quantile))
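
The examples above replace extreme values with NA. If the goal is instead to change the extreme values, as the question describes doing by hand in Excel, one option is to cap them at the group's 1% and 99% quantiles (often called winsorizing). The following is only a minimal sketch under that assumption: winsorize_quantile is a hypothetical helper name, not something from dplyr, and mtcars again stands in for the real data.

```r
library(dplyr)

# Hypothetical helper: clamp values below the lower quantile up to it,
# and values above the upper quantile down to it (NA values stay NA).
winsorize_quantile <- function(x, lower = 0.01, upper = 0.99) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

result <- mtcars %>%
  group_by(vs) %>%
  # Cap every numeric variable within each group; the grouping column is left alone
  mutate(across(.cols = where(is.numeric), .fns = winsorize_quantile)) %>%
  ungroup()
```

For the data in the question, the same pattern would presumably use group_by(id) on df. Capping keeps every observation in the data set, whereas the NA approach drops the extreme ones from later calculations, so which is appropriate depends on the analysis.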