I have this data:
String[] a = {"a", "b", "c", "d"};
String[] b = {"c", "d"};
String[] c = {"b", "c"};
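Now I need a graphical representation of every intersection of these lists, essentially a Venn diagram like this one: http://manuals.bioinformatics.ucr.edu/_/rsrc/1353282523430/home/R_BioCondManual/r-bioc-temp/venn1.png?height=291&width=400
In my real implementation the lists mostly contain more than 1000 entries each, and I will have more than 10 lists, so a good representation would build sets of strings and intersect them. In my very simple case, this would result in: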
set_a = {"c"}; // in all three lists
set_b = {"b", "d"}; // in two of three lists
set_c = {"a"}; // in one of three lists
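Another requirement is that the size of each intersection should be proportional to the number of lists its elements occur in. In the example above, the region for set_a (elements in all three lists) should therefore be drawn three times as large as the region for set_c (elements in only one list).
Is there a library that meets this requirement?
I think this program performs the transformation you want: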
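import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupByOccurrence {
    public static void main(String[] args) {
        // The input
        String[][] a = {
            {"a", "b", "c", "d"},
            {"c", "d"},
            {"b", "c"}
        };
        System.out.println("Input: " + Arrays.deepToString(a));

        // Convert each input array to a Set so that duplicates within one list count once.
        // A List of Sets is used (not a Set of Sets) so that two identical lists are both counted.
        List<Set<String>> input = new ArrayList<Set<String>>();
        for (String[] s : a) {
            input.add(new HashSet<String>(Arrays.asList(s)));
        }

        // The map counts how many lists each element appears in
        Map<String, Integer> count = new HashMap<String, Integer>();
        for (Set<String> s : input) {
            for (String i : s) {
                if (!count.containsKey(i)) {
                    count.put(i, 1);
                } else {
                    count.put(i, count.get(i) + 1);
                }
            }
        }

        // Create the output structure: output[i] holds the elements that appear in exactly i lists
        // (index 0 is unused)
        @SuppressWarnings("unchecked")
        Set<String>[] output = new HashSet[a.length + 1];
        for (int i = 1; i < output.length; i++) {
            output[i] = new HashSet<String>();
        }

        // Fill the output structure according to the map
        for (Map.Entry<String, Integer> entry : count.entrySet()) {
            output[entry.getValue()].add(entry.getKey());
        }

        // Print the output, starting with the elements shared by the most lists
        for (int i = output.length - 1; i > 0; i--) {
            System.out.println("Set_" + i + " = " + Arrays.toString(output[i].toArray()));
        }
    }
}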
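Output (the order of elements within each set may differ, because HashSet does not guarantee an iteration order):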
Input: [[a, b, c, d], [c, d], [b, c]]
Set_3 = [c]
Set_2 = [d, b]
Set_1 = [a]
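The question also asks for sizes proportional to the number of occurrences, which the program above does not cover. The following is only a sketch of one way to derive such sizes, not a drawing library: it repeats the same grouping using sorted collections (so the printed order is stable) and then computes a radius per group. The class name `OccurrenceGroups`, the method names `groupByOccurrence` and `radiusFor`, and the base radius of 10.0 are all hypothetical. It also assumes that each group is drawn as a circle whose area, rather than its radius, should scale with the occurrence count, so the radius grows with the square root of the count.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class OccurrenceGroups {

    // Groups every distinct element by the number of lists it appears in.
    // Keys are sorted in descending order, so the most widely shared group comes first.
    public static SortedMap<Integer, SortedSet<String>> groupByOccurrence(List<List<String>> lists) {
        Map<String, Integer> count = new HashMap<>();
        for (List<String> list : lists) {
            // De-duplicate within one list so an element is counted once per list.
            for (String s : new HashSet<>(list)) {
                count.merge(s, 1, Integer::sum);
            }
        }
        SortedMap<Integer, SortedSet<String>> groups = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : count.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return groups;
    }

    // Assumption: circle AREA is proportional to the occurrence count,
    // so the radius is baseRadius * sqrt(occurrences).
    public static double radiusFor(int occurrences, double baseRadius) {
        return baseRadius * Math.sqrt(occurrences);
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
            Arrays.asList("a", "b", "c", "d"),
            Arrays.asList("c", "d"),
            Arrays.asList("b", "c"));

        for (Map.Entry<Integer, SortedSet<String>> e : groupByOccurrence(lists).entrySet()) {
            System.out.println(String.format(Locale.ROOT, "Set_%d = %s, radius = %.2f",
                e.getKey(), e.getValue(), radiusFor(e.getKey(), 10.0)));
        }
    }
}
```

If the radius itself should be proportional to the count instead, `radiusFor` can simply return `baseRadius * occurrences`. The resulting numbers would still have to be handed to whatever charting or drawing code produces the actual diagram.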