r语言 - 使用雏菊进行聚类分析



我正在尝试使用 RStudio 执行分层聚类分析,方法是使用包daisy .这是我的数据集:

data.frame':341 obs. of  28 variables:
$ Impo_Env : Ord.factor w/ 3 levels "Low"<"Med"<"High": 3 2 3 2 3 2 3 3 2 3 ...
$ ComparativePriority_IAS: Ord.factor w/ 3 levels "Low"<"Med"<"High": 3 1 3 2 3 2 3 2 3 2 ...
$ Strategy_Eradication: Ord.factor w/ 3 levels "No intervention"<..: 3 2 3 2 3 2 3 2 2 3 ...
$ Knowl_BiodivLoss: Factor w/ 2 levels "0","1": 2 1 2 2 2 1 2 2 2 2 ...
$ Control_Trade: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Engagement_Retail: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Knowl_PastProj: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 2 1 ...
$ Priority_IAS: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Knowl_Eradic: Factor w/ 2 levels "0","1": 2 1 2 1 2 2 1 2 2 1 ...
$ Alert_CFS: Factor w/ 2 levels "0","1": 1 2 1 2 1 2 2 1 2 1 ...
$ Alert_Municipality: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Alert_Park: Factor w/ 2 levels "0","1": 2 1 2 1 2 1 1 2 1 1 ...
$ Alert_Police: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Alert_Firemen: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
$ Supp_AuthorityIAS: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Knowl_Env: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
$ Info_Tv: Factor w/ 2 levels "0","1": 2 2 1 2 2 2 2 1 2 1 ...
$ Info_Web: Factor w/ 2 levels "0","1": 2 1 2 2 2 1 2 1 2 2 ...
$ Info_Radio: Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 1 1 ...
$ Info_Magazines: Factor w/ 2 levels "0","1": 1 1 2 1 2 1 1 2 1 1 ...
$ Info_School: Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 2 ...
$ Blacklist: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Workshop: Factor w/ 2 levels "0","1": 1 1 2 1 2 1 2 2 1 1 ...
$ SuppFin_FutProj: Factor w/ 2 levels "0","1": 2 1 2 1 2 2 2 2 2 2 ...
$ Tourist_dummy: Factor w/ 2 levels "0","1": 1 1 1 2 2 1 1 1 2 1 ...
$ Gender: Factor w/ 2 levels "Female","Male": 1 2 1 2 1 1 2 2 2 1 ...
$ logIASknown: num  2.89 2.94 2.89 2.56 3.14 ...
$ Age: int  20 41 14 10 26 33 19 59 23 16 ...

我想将欧几里得距离与daisy一起使用,但是当我跑步时

daisy(fuu, metric = c("euclidean"), type=list(ordratio = c(1,2,3), asymm=c(4:24), symm=c(25,26)))

输出不正常。使用高尔距离代替欧几里得距离:

警告消息:在 daisy(fuu, metric = c("euclidean"(, type = list(ordratio = c(1,:对于混合变量,自动使用度量 "gower">

我该如何解决它?

如集群包中包含的菊花函数文档中的"详细信息"部分所述:

标称、有序和 (a( 对称二进制数据的处理是 通过使用高尔的一般相异系数实现 (1971(. 如果 x 包含这些数据类型的任何列,则两个参数 度量和立场将被忽略,高尔系数将被使用 作为指标

换句话说,对于要计算的欧几里得度量(距离作为差值平方根和(,输入列必须是数字(众数(变量(即当x是矩阵时的所有列(,因此被识别为区间缩放变量,而不是名义(类因子列(变量或有序(类有序列(变量。在类型参数中指定变量类型不会更改这一事实。

在这些前提下,假设它对所有 28 个变量都有意义,尽管有些是定性的二进制,您可以尝试使用 as.numeric 转换它们并继续,原因是:混合变量自动使用度量"gower"覆盖

最新更新