

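  1. I want to run a principal component analysis on a dataset that contains many 0 values. These zeros are not missing data (they are actual temperature readings). The goal is to cluster the data by how temperature varies across locations and seasons. Since, as I understand it, the prcomp() function treats 0 values as missing values in R, I am wondering what would stop me from adding a constant (for example 1) to the whole dataset. That way the 0 values would become 1, and the same constant would be added to every numeric variable in the dataset. By doing this I assume I can keep the original variation in the data without technically preventing R from running the PCA I want. However, I am not very confident in this approach, so I would like to ask whether there is any reason not to do it.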
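# Load the packages used below
library(dplyr)    # %>%, mutate(), case_when(), group_by()
library(stringr)  # str_detect()
library(mice)     # mice(), complete()
# Create a reproducible dataset
set.seed(123)
# Location has 3000 elements; data.frame() recycles the shorter vectors to 3000 rows
my_df <- data.frame(
Location = rep(LETTERS[1:6], 1000/2), 
Zone = sample(c("Europe", " America", "Africa", "Antartic"), replace = TRUE),
Temperatures = round(rnorm(1000), digits = 2)*10,
RISK_MM = round(rnorm(1000), digits = 2)*100,
Pressure = round(rnorm(1000), digits = 2)*1000,
Sunshine = round(rnorm(1000), digits = 2))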
# Set some temperatures to 0 and others to NA, depending on specific categorical variables
my_df <- my_df %>% 
mutate(Temperatures = case_when(
str_detect(Zone,"Antartic") & str_detect(Location,"A") ~ 0,
str_detect(Zone,"Antartic") & str_detect(Location,"B") ~ 0,
str_detect(Zone,"Europe") & str_detect(Location,"A") ~ 0,
str_detect(Zone,"Europe") & str_detect(Location,"B") ~ 0,
str_detect(Zone,"America") & str_detect(Location,"C") ~ 0,
str_detect(Zone,"Antartic") & str_detect(Location,"C") ~ NA_real_,
str_detect(Zone,"Africa") & str_detect(Location,"D") ~ NA_real_,
str_detect(Zone,"Africa") & str_detect(Location,"F") ~ NA_real_,
TRUE ~ as.numeric(as.character(Temperatures))))
# Convert characters into factors
my_df <- mutate_if(my_df, is.character, as.factor)
# Print the results
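  2. Some columns also contain missing values, which I planned to impute with the mice() function, using the categorical independent variables I am interested in, before running the PCA. For example: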
# Run the multiple (m = 5) imputation
imp <- my_df %>%
group_by(Location, Zone) %>% 
mice(m = 5, maxit = 50, method = "cart", seed = 123)
# Create a dataset after imputation
completeImputedData <- complete(imp, 1)
# Convert the imputed dataset to a numeric-only dataset
completeImputedData_num <- completeImputedData %>%
ungroup() %>%
select(where(is.numeric))  # assumed step: keep the numeric columns only
# Add a constant 
completeImputedData_num_cs <- completeImputedData_num + 1
# Dimension reduction using PCA on the centered and scaled data
my_pca <- prcomp(completeImputedData_num_cs, scale. = TRUE, center = TRUE)
# Keep going...
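One way to check whether adding a constant changes anything is to run the PCA with and without the shift on a small toy matrix. The sketch below uses only base R; the matrix `x` and the column names are made up for illustration and are not the data from the question. It relies on two things: zeros are ordinary numeric values to `prcomp()` (only `NA` counts as missing), and centering subtracts each column mean, so a constant added to every value is removed again before the decomposition.

```r
# Toy data (hypothetical, not the dataset above): 50 rows, 4 numeric columns
set.seed(42)
x <- matrix(rnorm(200), ncol = 4,
            dimnames = list(NULL, c("V1", "V2", "V3", "V4")))
x[sample(length(x), 10)] <- 0  # genuine zeros, not missing values

# PCA on the raw data and on the data shifted by a constant
pca_raw     <- prcomp(x,     center = TRUE, scale. = TRUE)
pca_shifted <- prcomp(x + 1, center = TRUE, scale. = TRUE)

# Standard deviations of the components agree up to floating-point error
same_sdev <- isTRUE(all.equal(pca_raw$sdev, pca_shifted$sdev))

# Loadings agree as well (compared in absolute value, since signs are arbitrary)
same_loadings <- isTRUE(all.equal(abs(unclass(pca_raw$rotation)),
                                  abs(unclass(pca_shifted$rotation))))

# Only the stored column means differ, by exactly the added constant
center_shift <- unname(pca_shifted$center - pca_raw$center)

print(c(same_sdev = same_sdev, same_loadings = same_loadings))
print(center_shift)
```

If both checks come back `TRUE`, the shift has no effect on the components, which suggests the `+ 1` step is harmless but also unnecessary when `center = TRUE`.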






#       variable q_zeros p_zeros q_na p_na q_inf p_inf    type unique
# 1 Temperatures      12     0.4    0    0     0     0 numeric    383
# 2      RISK_MM      21     0.7    0    0     0     0 numeric    378
# 3     Pressure       0     0.0    0    0     0     0 numeric    376
# 4     Sunshine       9     0.3    0    0     0     0 numeric    364 
#           variable q_zeros p_zeros q_na p_na q_inf p_inf    type unique
# 1 var.Temperatures       0       0    0    0     0     0 numeric    383
# 2      var.RISK_MM       0       0    0    0     0     0 numeric    378
# 3     var.Pressure       0       0    0    0     0     0 numeric    376
# 4     var.Sunshine       0       0    0    0     0     0 numeric    364 
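
Counts like the `q_zeros` and `q_na` columns above can be reproduced with base R to confirm that zeros and `NA`s are tracked separately. This is a minimal sketch on a made-up data frame (`toy_df` and its values are hypothetical), not the function that produced the output above.

```r
# Hypothetical example data with both true zeros and missing values
toy_df <- data.frame(
  Temperatures = c(0, 12.5, NA, -3.1, 0),
  Sunshine     = c(0.4, NA, 1.2, 0, 0.7)
)

# Zeros per column: NA comparisons are dropped so they are not counted as zeros
q_zeros <- colSums(toy_df == 0, na.rm = TRUE)

# Missing values per column
q_na <- colSums(is.na(toy_df))

# Rows that are fully observed and could be passed to prcomp() as they are
n_complete <- sum(complete.cases(toy_df))

print(data.frame(variable = names(toy_df), q_zeros = q_zeros, q_na = q_na,
                 row.names = NULL))
print(n_complete)
```

Here a zero temperature is counted under `q_zeros` and leaves `q_na` untouched, so only the `NA` entries need imputation or removal before the PCA.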
