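I am starting to use the open source Java WEKA library to cluster my data. It runs correctly when the dataset is in .arff format. The file I want to cluster is named "u.user"; a description of it can be found here: http://files.grouplens.org/datasets/movielens/ml-100k-readme.txt
Here is my code: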
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import java.io.IOException;
public class Clustering {
public static void main(String args[]) throws Exception{
//load dataset
String dataset = "C:/Users/DELL/Desktop/work/u.user";
DataSource source = new DataSource(dataset);
//get instances object
Instances data = source.getDataSet();
// new instance of clusterer
SimpleKMeans model = new SimpleKMeans(); // k-means clusterer
//number of clusters
model.setNumClusters(4);
//set distance function
//model.setDistanceFunction(new weka.core.ManhattanDistance());
// build the clusterer
model.buildClusterer(data);
System.out.println(model);
}
}
Running it shows this error:
Exception in thread "main" java.io.IOException: File not found : C:\Users\DELL\Desktop\work\u.names
at weka.core.converters.C45Loader.setSource(C45Loader.java:190)
at weka.core.converters.AbstractFileLoader.setFile(AbstractFileLoader.java:90)
at weka.core.converters.ConverterUtils$DataSource.reset(ConverterUtils.java:306)
at weka.core.converters.ConverterUtils$DataSource.<init>(ConverterUtils.java:141)
at Clustering.main(Clustering.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Process finished with exit code 1
I am sure it is because of the file extension, because it works when I use other files. Can you help me cluster this data?
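You also need to pay attention to the file format, not only the extension. Convert the dataset so that it matches the WEKA ARFF format. For the u.user data, you need to change the extension to *.arff (for example user.arff) and format the content as: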
@RELATION user
@ATTRIBUTE id INTEGER % this is actually useless
@ATTRIBUTE age INTEGER
@ATTRIBUTE gender {M,F}
@ATTRIBUTE occupation {administrator,artist,doctor,educator,engineer,entertainment,executive,healthcare,homemaker,lawyer,librarian,marketing,none,other,programmer,retired,salesman,scientist,student,technician,writer} % from u.occupation
@ATTRIBUTE zipcode STRING
@DATA
1,24,M,technician,85711
2,53,F,other,94043
3,23,M,writer,32067
4,24,M,technician,43537
5,33,F,other,15213
6,42,M,executive,98101
7,57,M,administrator,91344
8,36,M,administrator,05201
...
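The conversion described above can be sketched in plain Java without WEKA. This is a minimal sketch, assuming the raw u.user records are pipe-delimited (for example `1|24|M|technician|85711`) and that no field contains commas, spaces, or quotes. The class `UserToArff` and its methods are hypothetical names made up for illustration, and the occupation list in `main` is shortened for the demo.

```java
import java.util.Arrays;
import java.util.List;

class UserToArff {

    /** Turns one pipe-delimited record into one comma-separated ARFF data row. */
    static String toArffRow(String line) {
        String[] fields = line.trim().split("\\|", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException(
                "Expected 5 fields but got " + fields.length + ": " + line);
        }
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return String.join(",", fields);
    }

    /** Builds the ARFF text: header lines, then @DATA, then one row per non-blank record. */
    static String toArff(List<String> headerLines, List<String> records) {
        StringBuilder out = new StringBuilder();
        for (String header : headerLines) {
            out.append(header).append('\n');
        }
        out.append("@DATA\n");
        for (String record : records) {
            if (record.trim().isEmpty()) {
                continue; // skip blank lines such as a trailing newline
            }
            out.append(toArffRow(record)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
            "@RELATION user",
            "@ATTRIBUTE id INTEGER",
            "@ATTRIBUTE age INTEGER",
            "@ATTRIBUTE gender {M,F}",
            "@ATTRIBUTE occupation {technician,other}", // shortened for the demo
            "@ATTRIBUTE zipcode STRING");
        List<String> records = Arrays.asList(
            "1|24|M|technician|85711",
            "2|53|F|other|94043");
        System.out.print(toArff(header, records));
    }
}
```

In practice you would read the records from u.user (for instance with `java.nio.file.Files.readAllLines`), pass the full header shown above, and write the result to user.arff.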
You should then be able to parse the dataset into weka.core.Instances. Unfortunately, SimpleKMeans will then fail with:
weka.core.UnsupportedAttributeTypeException: weka.clusterers.SimpleKMeans: Cannot handle string attributes!
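So you have at least 3 options:
- Vectorize the data, or convert its features to numeric values (and also remove useless data such as id)
- Use another clustering algorithm that can handle string attributes, such as weka.clusterers.HierarchicalClusterer
- Combine both solutions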
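The first option can be illustrated in plain Java, independent of WEKA. This is only a sketch under assumptions: the class `UserVectorizer`, the method `encode`, and the encoding itself (age as a number, gender as 0/1, occupation one-hot, id and zipcode dropped) are hypothetical choices made for illustration, not something WEKA prescribes.

```java
import java.util.Arrays;
import java.util.List;

class UserVectorizer {

    /**
     * Encodes one comma-separated row (id,age,gender,occupation,zipcode)
     * as a numeric vector: [age, isMale, one-hot occupation...].
     * The id and zipcode columns are dropped.
     */
    static double[] encode(String row, List<String> occupations) {
        String[] f = row.split(",", -1);
        if (f.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields: " + row);
        }
        double[] v = new double[2 + occupations.size()];
        v[0] = Double.parseDouble(f[1].trim());      // age
        v[1] = f[2].trim().equals("M") ? 1.0 : 0.0;  // gender as 0/1
        int idx = occupations.indexOf(f[3].trim());
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown occupation: " + f[3]);
        }
        v[2 + idx] = 1.0;                            // one-hot occupation
        return v;
    }

    public static void main(String[] args) {
        // Shortened occupation list for the demo; use the full list from u.occupation.
        List<String> occupations = Arrays.asList("other", "technician", "writer");
        double[] v = encode("1,24,M,technician,85711", occupations);
        System.out.println(Arrays.toString(v)); // prints [24.0, 1.0, 0.0, 1.0, 0.0]
    }
}
```

Since the error above only complains about string attributes, simply leaving id and zipcode out of the ARFF file may already be enough for SimpleKMeans; the one-hot step matters more if you pass the vectors to a clusterer that accepts only numeric input.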
Good luck!