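I am using a Scanner to read 2 text files (which may contain duplicates) and writing their lines into ArrayLists. I then compare the two ArrayLists to find the differences. When I print the result I can see what differs, but I cannot tell which record came from which file (the text file name).
Contents of text1.txt: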
TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,UMRI,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,MLI,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,WOLI,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,ZER,20190703113154,20190601000000,20190701000000,
Contents of text2.txt:
TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,UMRC,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,MLW,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,WOLF,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,ZER,20190703113154,20190601000000,20190701000000,
Code:
Scanner prodScanner = new Scanner(prodFile);
while (prodScanner.hasNextLine()) {
    String currentRecord = prodScanner.nextLine().trim();
    if (currentRecord.length() > 0) {
        prodRecordsFromStatement.add(currentRecord);
    }
}
Scanner nonProdScanner = new Scanner(nonProdFile);
while (nonProdScanner.hasNextLine()) {
    String currentRecord = nonProdScanner.nextLine().trim();
    if (currentRecord.length() > 0) {
        nonProdRecordsFromStatement.add(currentRecord);
    }
}
Collection<String> result = new ArrayList<>(CollectionUtils.disjunction(prodRecordsFromStatement, nonProdRecordsFromStatement));
List<String> resultList = new ArrayList<>(result);
Collections.sort(resultList);
Actual result:
TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,MLI,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,MLW,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,UMRC,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,UMRI,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,WOLF,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,WOLI,20190703113221,20190601000000,20190701000000,
Expected result: I would like the name of the file/list shown with each record so the output is easier to understand:
text2.txt,TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,MLI,20190703113211,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,MLW,20190703113211,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,UMRC,20190703113154,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,UMRI,20190703113154,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,WOLF,20190703113221,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,WOLI,20190703113221,20190601000000,20190701000000,
Iterate over resultList. For each item, check whether it is also in prodRecordsFromStatement. If it is, it came from file 1; otherwise it came from file 2.
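The check described above can be sketched roughly as follows, using only the JDK. The class name DiffLabeler, the method label, and the HashSet used for faster lookups are illustrative assumptions, not part of the original code. One caveat: if a record occurs in both files but a different number of times, it still appears in the disjunction, and this simple membership test attributes it to file 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the name is an assumption made for illustration.
class DiffLabeler {

    // Prefixes every differing record with the name of the file it is found in.
    // A record that is present in prodRecords is attributed to prodName,
    // anything else to nonProdName.
    static List<String> label(List<String> resultList, List<String> prodRecords,
                              String prodName, String nonProdName) {
        // Copy into a HashSet so each membership test is O(1) on average.
        Set<String> prodLookup = new HashSet<>(prodRecords);
        List<String> labeled = new ArrayList<>();
        for (String record : resultList) {
            String source = prodLookup.contains(record) ? prodName : nonProdName;
            labeled.add(source + "," + record);
        }
        return labeled;
    }

    public static void main(String[] args) {
        List<String> prod = Arrays.asList(
                "TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,");
        List<String> diff = Arrays.asList(
                "TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,",
                "TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,");
        for (String line : label(diff, prod, "text1.txt", "text2.txt")) {
            System.out.println(line);
        }
    }
}
```

Because resultList is already sorted before labeling, the labeled output keeps the record order shown in the expected result.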
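How performance-critical does your solution need to be? If performance is not critical and your lists are not long, you could switch to using subtract instead of disjunction.
For example: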
Collection<String> resultProdRecords = new ArrayList<>(CollectionUtils.subtract(prodRecordsFromStatement, nonProdRecordsFromStatement));
Collection<String> resultNonProdRecords = new ArrayList<>(CollectionUtils.subtract(nonProdRecordsFromStatement, prodRecordsFromStatement));
resultProdRecords will contain all the lines in prodRecordsFromStatement that are not in nonProdRecordsFromStatement.
resultNonProdRecords will contain all the lines in nonProdRecordsFromStatement that are not in prodRecordsFromStatement.
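To get from two one-directional differences to the labeled, sorted output the question asks for, something along these lines might work. It is a sketch under assumptions: the class LabeledDiff and its methods are hypothetical, and the local subtract is a plain-JDK stand-in written here so the example runs without Apache Commons Collections. It removes one occurrence per match, so duplicates are counted, but it is quadratic in the list sizes, which fits the "lists are not long" condition above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class; names are assumptions made for illustration.
class LabeledDiff {

    // Plain-JDK stand-in for a one-directional difference: for every element
    // of b, removes at most one matching occurrence from a copy of a.
    static List<String> subtract(List<String> a, List<String> b) {
        List<String> remaining = new ArrayList<>(a);
        for (String item : b) {
            remaining.remove(item); // List.remove(Object) drops the first match, if any
        }
        return remaining;
    }

    // Builds "fileName,record" lines for records unique to either list,
    // ordered by the record text rather than by the file name.
    static List<String> labeledDiff(List<String> prod, List<String> nonProd,
                                    String prodName, String nonProdName) {
        List<String[]> pairs = new ArrayList<>(); // each pair is {fileName, record}
        for (String record : subtract(prod, nonProd)) {
            pairs.add(new String[] {prodName, record});
        }
        for (String record : subtract(nonProd, prod)) {
            pairs.add(new String[] {nonProdName, record});
        }
        pairs.sort(Comparator.comparing((String[] pair) -> pair[1]));

        List<String> lines = new ArrayList<>();
        for (String[] pair : pairs) {
            lines.add(pair[0] + "," + pair[1]);
        }
        return lines;
    }
}
```

Calling LabeledDiff.labeledDiff(prodRecordsFromStatement, nonProdRecordsFromStatement, "text1.txt", "text2.txt") would then produce lines in the shape of the expected result. Sorting on the record before adding the prefix matters, because sorting the prefixed strings would group everything by file name instead.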