I have a dataset with 188 columns and 100 rows (plus a header row). I am trying to apply the kmodes clustering method from the klaR package in R to this matrix.
The data contains two types of values: strings and binary. Both contain missing values (NA).
For example:
Q27_history Q28
1 <NA>
<NA> yes, sometimes
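With 188 columns, a row counts as complete only if every single column is non-missing, so it can help to check how widespread the NAs are before deciding how to handle them. A minimal base-R sketch; whois_sample is a hypothetical stand-in built from the two example columns, and the extra values ("0", "no") are made up for illustration:

```r
# Hypothetical stand-in for whois_data (the values beyond the example rows are invented)
whois_sample <- data.frame(
  Q27_history = c(1, NA, 0, 1),
  Q28 = c(NA, "yes, sometimes", "no", "yes, sometimes"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(whois_sample))

# Number of rows that have no NA in any column
rows_complete <- sum(complete.cases(whois_sample))

print(na_per_column)
print(rows_complete)
```

On the real data, a small rows_complete would mean that dropping incomplete rows leaves little to cluster.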
Function to compute the total within-cluster sum of squares:
library(klaR)   # provides kmodes()
library(purrr)  # provides map_dbl()
set.seed(96743)
# function to compute total within-cluster sum of squares
wss <- function(k) {
  sum(kmodes(whois_data, k)$withindiff)
}
# Compute wss for k = 2 to k = 15
k.values <- 2:15
# extract wss for 2-15 clusters
wss_values <- map_dbl(k.values, wss)
print(wss_values)
Error message:
Error in x[[jj]][iseq] <- vjj : replacement has length zero
followed by:
Error in print(wss_values) : object 'wss_values' not found
I tried adding kmodes(na.fill(data, fill=""), k) to the function:
wss <- function(k) {
sum((kmodes( whois_data, k)$withindiff))
kmodes(na.fill(data, fill=""), k)
}
but after that, library(purrr) stopped working and map_dbl could not be found.
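In the modified function above, R returns only the value of the last expression, so wss() would return the whole kmodes object (a list) instead of a single number, which map_dbl() cannot use. Note also that na.fill() comes from the zoo package, and that unless a variable named data was defined, `data` refers to the utils function data(), not to whois_data. A minimal base-R sketch of the "fill the missing values" idea, assuming it is acceptable to treat a missing answer as its own category; fill_missing() is a hypothetical helper, not part of klaR, purrr or zoo:

```r
# Hypothetical helper: replace NA with an explicit category label in every column.
# Assumes "missing" is a meaningful value for clustering purposes.
fill_missing <- function(df, label = "missing") {
  df[] <- lapply(df, function(col) {
    col <- as.character(col)  # numeric, logical, factor and character columns all become character
    col[is.na(col)] <- label
    col
  })
  df
}

# Toy data built from the example rows in the question
toy <- data.frame(
  Q27_history = c(1, NA),
  Q28 = c(NA, "yes, sometimes"),
  stringsAsFactors = FALSE
)
filled <- fill_missing(toy)
print(filled)
```

With such a helper, the original function could become wss <- function(k) sum(kmodes(fill_missing(whois_data), k)$withindiff), so that it still returns one number per k; map_dbl() additionally requires library(purrr) to have been run in the same session.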
How should I handle rows that contain missing values?
I don't think kmodes can handle NA values; it should throw an error:
library(klaR)
set.seed(111)
whois_data = data.frame(Q1 = rbinom(100,1,0.5),
Q2 = sample(c("Y","N"),100,replace=TRUE),
Q3 = sample(c(NA,1:3),100,replace=TRUE))
kmodes(whois_data,3)
Error in old.cluster != cluster :
comparison of these types is not implemented
It makes more sense to run kmodes without the NAs:
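# total within-cluster difference for k clusters on data frame df
wss <- function(k,df) {
sum((kmodes(df, k)$withindiff))
}
library(purrr)
# keep only the rows without any NA before clustering
map_dbl(2:5, wss,df = whois_data[complete.cases(whois_data),])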
[1] 91 58 70 42
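The question's code comment also mentions plotting the values. A minimal base-graphics sketch of an elbow plot, using the k range and the four values printed above; the axis labels are my own wording. Because k-modes starts from randomly chosen modes, a single run per k can give a non-monotone curve like this one, so it may be worth repeating each k with several seeds and keeping the best run before reading off an elbow.

```r
# k values and total within-cluster differences copied from the output above
k.values <- 2:5
wss_values <- c(91, 58, 70, 42)

# Elbow plot: total within-cluster difference against the number of clusters
plot(k.values, wss_values, type = "b", pch = 19,
     xlab = "Number of clusters k",
     ylab = "Total within-cluster difference")
```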