简介
我正在构建一个过程,以合并一些大型分类的CSV文件。我目前正在考虑使用单元进行此操作。我设置合并的方式是使用实现可比接口的bean。
给定
简化的文件看起来像这样:
id,data
1,aa
2,bb
3,cc
bean看起来像这样(getters and setters commited):
public class Address implements Comparable<Address> {
@Parsed
private int id;
@Parsed
private String data;
@Override
public int compareTo(Address o) {
return Integer.compare(this.getId(), o.getId());
}
}
比较器看起来像这样:
public class AddressComparator implements Comparator<Address>{
@Override
public int compare(Address a, Address b) {
if (a == null)
throw new IllegalArgumentException("argument object a cannot be null");
if (b == null)
throw new IllegalArgumentException("argument object b cannot be null");
return Integer.compare(a.getId(), b.getId());
}
}
由于我不想读取内存中的所有数据,因此我想读取每个文件的顶部记录并执行一些比较逻辑。这是我简化的示例:
public class App {
private static final String INPUT_1 = "src/test/input/address1.csv";
private static final String INPUT_2 = "src/test/input/address2.csv";
private static final String INPUT_3 = "src/test/input/address3.csv";
public static void main(String[] args) throws FileNotFoundException {
BeanListProcessor<Address> rowProcessor = new BeanListProcessor<Address>(Address.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
List<FileReader> readers = new ArrayList<>();
readers.add(new FileReader(new File(INPUT_1)));
readers.add(new FileReader(new File(INPUT_2)));
readers.add(new FileReader(new File(INPUT_3)));
// This parses all rows, but I am only interested in getting 1 row as a bean.
for (FileReader fileReader : readers) {
parser.parse(fileReader);
List<Address> beans = rowProcessor.getBeans();
for (Address address : beans) {
System.out.println(address.toString());
}
}
// want to have a map with the reader and the first bean object
// Map<FileReader, Address> topRecordofReader = new HashMap<>();
Map<FileReader, String[]> topRecordofReader = new HashMap<>();
for (FileReader reader : readers) {
parser.beginParsing(reader);
String[] row;
while ((row = parser.parseNext()) != null) {
System.out.println(row[0]);
System.out.println(row[1]);
topRecordofReader.put(reader, row);
// all done, only want to get first row
break;
}
}
}
}
问题
给出了上面的示例,我如何以每行迭代并返回每行返回bean而不是解析整个文件的方式?
我正在寻找这样的东西(此不工作代码只是指我要寻找的解决方案):
for (FileReader fileReader : readers) {
parser.beginParsing(fileReader);
Address bean = null;
while (bean = parser.parseNextRecord() != null) {
topRecordofReader.put(fileReader, bean);
}
}
有两种方法可以读取迭代,而不是将所有内容加载到内存中,第一个是使用BeanProcessor
而不是BeanListProcessor
:
settings.setRowProcessor(new BeanProcessor<Address>(Address.class) {
@Override
public void beanProcessed(Address address, ParsingContext context) {
// your code to process the each parsed object here!
}
要在没有回调的情况下读取bean迭代(并执行其他一些常见的过程),我们创建了一个CSVROUTINES类(该类别从AbsTractRoutines延伸 - 此处更多示例):
File input = new File("/path/to/your.csv")
CsvParserSettings parserSettings = new CsvParserSettings();
//...configure the parser
// You can also use TSV and Fixed-width routines
CsvRoutines routines = new CsvRoutines(parserSettings);
for (Address address : routines.iterate(Address.class, input, "UTF-8")) {
//process your bean
}
希望这会有所帮助!