Error in x[[jj]][iseq] <- vjj : replacement has length zero in R (klaR package)

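I have a dataset with 188 columns and 100 rows (plus a header row). I am trying to apply the kmodes clustering method (from the klaR package) in R to this matrix.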

The data frame holds two types of data: strings and binary values. Both contain missing values.

For example:

Q27_history  Q28
1            <NA>
<NA>         yes, sometimes

The function that computes the total within-cluster sum of squares:

set.seed (96743)
# function to compute total within-cluster sum of square 
wss <- function(k) {
sum((kmodes( whois_data, k)$withindiff))
}
# Compute wss for k = 2 to k = 15
k.values <- 2:15
# extract wss for 2-15 clusters
wss_values <- map_dbl(k.values, wss)
print(wss_values)

Error text:

Error in x[[jj]][iseq] <- vjj : replacement has length zero

And then:

Error in print(wss_values) : object 'wss_values' not found

I tried adding kmodes(na.fill(data, fill=""), k) to the function:

wss <- function(k) {
sum((kmodes( whois_data, k)$withindiff))
kmodes(na.fill(data, fill=""), k)
}

But after that, library(purrr) stopped working and map_dbl could not be found.

How should I handle rows with empty data?

I don't think you can have NAs when using kmodes; it throws an error:

set.seed(111)
whois_data = data.frame(Q1 = rbinom(100,1,0.5),
Q2 = sample(c("Y","N"),100,replace=TRUE),
Q3 = sample(c(NA,1:3),100,replace=TRUE))
kmodes(whois_data,3)
Error in old.cluster != cluster : 
comparison of these types is not implemented
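Before clustering, it can help to see how much is actually missing. The following is a minimal sketch using base R only, on a small made-up data frame; the `toy` object and its values are illustrative assumptions, not the asker's data.

```r
# Toy data with missing values (illustrative only, not the real survey)
toy <- data.frame(
  Q27_history = c(1, NA, 0, 1),
  Q28 = c(NA, "yes, sometimes", "no", "no"),
  stringsAsFactors = FALSE
)

# Number of NAs in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Number of rows that contain no NA at all
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

If `n_complete` is small compared with `nrow()` of the data, dropping incomplete rows will discard most of the dataset.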

It makes more sense to run kmodes on data without NAs:

wss <- function(k,df) {
sum((kmodes(df, k)$withindiff))
}
library(purrr)
map_dbl(2:5, wss,df = whois_data[complete.cases(whois_data),])
[1] 91 58 70 42
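With 188 columns, keeping only complete cases may leave very few rows. One possible alternative, offered as a sketch rather than a recommendation, is to recode NA as its own category so that no row is dropped. The helper `na_to_level` and the label `"missing"` are hypothetical names introduced here, not part of klaR, and whether two missing answers should count as a match is a modelling decision you would need to make.

```r
# Hypothetical helper (not part of klaR): recode NA as an explicit category
na_to_level <- function(df, label = "missing") {
  df[] <- lapply(df, function(col) {
    col <- as.character(col)   # treat every column as categorical text
    col[is.na(col)] <- label   # replace NA with the placeholder label
    col
  })
  df
}

set.seed(111)
whois_data <- data.frame(
  Q1 = rbinom(100, 1, 0.5),
  Q2 = sample(c("Y", "N"), 100, replace = TRUE),
  Q3 = sample(c(NA, 1:3), 100, replace = TRUE)
)

filled <- na_to_level(whois_data)

# Cluster only if klaR is installed; the result depends on the random start
if (requireNamespace("klaR", quietly = TRUE)) {
  fit <- klaR::kmodes(filled, 3)
  print(sum(fit$withindiff))
}
```

The same `wss` function from above can then be called with `df = filled` instead of the complete-cases subset.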
