How to use OpenCSV to parse multi-line records



I am trying to use OpenCSV to parse a file like this:

CUST,Warren,Q,Darrow,8272 4th Street,New York,IL,76091
TRANS,1165965,2011-01-22 00:13:29,51.43
CUST,Erica,I,Jobs,8875 Farnam Street,Aurora,IL,36314
TRANS,8116369,2011-01-21 20:40:52,-14.83
TRANS,8116369,2011-01-21 15:50:17,-45.45
TRANS,8116369,2011-01-21 16:52:46,-74.6
TRANS,8116369,2011-01-22 13:51:05,48.55
TRANS,8116369,2011-01-21 16:51:59,98.53

I would read the records that start with "CUST" into a Customer object. The Customer object would contain a list of transactions.

public class Customer {
    private String firstName;
    private String middleInitial;
    private String lastName;
    private String address;
    private String city;
    private String state;
    private String zipCode;
    List<Transaction> transactions;
    ...
}

I would read the records that start with "TRANS" into a Transaction object.

public class Transaction {
    private String accountNumber;
    private Date transactionDate;
    private Double amount;
    ...
}

A customer can have one or more Transactions. I was able to implement this with CSVReader. Can I achieve the same thing with annotations?

"CSV files are lists, right? Well, some people like lists within lists."

From the documentation.

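It seems that OpenCSV can only handle single "physical" CSV records, and nothing in it appears to cover your case. However, if you can parse the input CSV document record by record, you can organize the parsing into groups, so that once a group is complete you can deserialize it yourself.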

For example:

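import com.opencsv.CSVReader;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Assumed to be the JSR-305 annotations; the original listing omits its imports.
import javax.annotation.Nullable;
import javax.annotation.WillClose;

// The class name Csv is taken from the Csv.readGroups(...) call in the test.
final class Csv {

    public static Stream<List<String[]>> readGroups(@WillClose final CSVReader csvReader, final Predicate<? super String[]> isGroupStart,
            final Predicate<? super String[]> isGroupSpan) {
        final Spliterator<List<String[]>> spliterator = new Spliterators.AbstractSpliterator<List<String[]>>(Long.MAX_VALUE, Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED) {
            @Override
            public boolean tryAdvance(final Consumer<? super List<String[]>> action) {
                try {
                    final String[] head = csvReader.readNextSilently();
                    if ( head == null ) {
                        return false; // empty or exhausted input: no group to emit
                    }
                    if ( !isGroupStart.test(head) ) {
                        throw new IOException("First record must delimit a group start");
                    }
                    final List<String[]> buffer = new ArrayList<>();
                    buffer.add(head);
                    @Nullable
                    String[] peeked;
                    // Collect rows until the next group start or the end of input.
                    while ( (peeked = csvReader.peek()) != null && !isGroupStart.test(peeked) ) {
                        if ( !isGroupSpan.test(peeked) ) {
                            throw new IOException("Not a group span");
                        }
                        csvReader.readNextSilently(); // discard the "peeked" state
                        buffer.add(peeked);
                    }
                    action.accept(buffer);
                    return peeked != null;
                } catch ( final IOException ex ) {
                    throw new UncheckedIOException(ex);
                }
            }
        };
        return StreamSupport.stream(spliterator, false)
                .onClose(() -> {
                    try {
                        csvReader.close();
                    } catch ( final IOException ex ) {
                        throw new UncheckedIOException(ex);
                    }
                });
    }
}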

The readGroups method above produces two lists of string arrays from this CSV:

CUST,Warren,Q,Darrow,8272 4th Street,New York,IL,76091
TRANS,1165965,2011-01-22 00:13:29,51.43
CUST,Erica,I,Jobs,8875 Farnam Street,Aurora,IL,36314
TRANS,8116369,2011-01-21 20:40:52,-14.83
TRANS,8116369,2011-01-21 15:50:17,-45.45
TRANS,8116369,2011-01-21 16:52:46,-74.6
TRANS,8116369,2011-01-22 13:51:05,48.55
TRANS,8116369,2011-01-21 16:51:59,98.53
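To see the grouping rule in isolation, here is a minimal sketch in plain Java that applies it to rows already split into fields. `RecordGroups` and its `group` method are hypothetical names used for illustration only, not part of OpenCSV, and the sketch collects everything into memory instead of streaming the way `readGroups` does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of OpenCSV: groups already-parsed rows so that
// each group starts with a "head" row followed by its detail rows.
final class RecordGroups {

    private RecordGroups() {
    }

    static List<List<String[]>> group(final Iterable<String[]> rows, final Predicate<? super String[]> isGroupStart) {
        final List<List<String[]>> groups = new ArrayList<>();
        List<String[]> current = null;
        for (final String[] row : rows) {
            if (isGroupStart.test(row)) {
                // A head row opens a new group.
                current = new ArrayList<>();
                groups.add(current);
            } else if (current == null) {
                // A detail row before any head row has no group to belong to.
                throw new IllegalArgumentException("First record must delimit a group start");
            }
            current.add(row);
        }
        return groups;
    }
}
```

For the sample above, calling `RecordGroups.group(rows, row -> row.length > 0 && row[0].equals("CUST"))` should return two groups, one with 2 rows and one with 6 rows.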

Given just these two groups, each group can be deserialized into an instance of Customer:

// Lombok generates the all-args constructor, equals/hashCode and toString.
@AllArgsConstructor
@EqualsAndHashCode
@ToString
final class Customer {
    final String firstName;
    final String middleInitial;
    final String lastName;
    final String address;
    final String city;
    final String state;
    final String zipCode;
    final List<Transaction> transactions;
}

// Three fields only, matching the three-argument constructor call in parseCustomer.
@AllArgsConstructor
@EqualsAndHashCode
@ToString
final class Transaction {
    final String accountNumber;
    final LocalDateTime transactionDate;
    final BigDecimal amount;
}
public final class CsvTest {

    private static final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    @Test
    public void testRead() {
        try ( final Stream<List<String[]>> rawStream = Csv.readGroups(new CSVReader(new InputStreamReader(CsvTest.class.getResourceAsStream("customers.csv"))), CsvTest::isGroupStart, CsvTest::isGroupSpan) ) {
            rawStream
                    .map(CsvTest::parseCustomer)
                    .forEachOrdered(System.out::println);
        }
    }

    private static boolean isGroupStart(final String[] row) {
        return row.length > 0 && row[0].equals("CUST");
    }

    private static boolean isGroupSpan(final String[] row) {
        return row.length > 0 && row[0].equals("TRANS");
    }

    private static Customer parseCustomer(final List<String[]> group) {
        // Every row after the first one in a group is a transaction.
        final List<Transaction> transactions = group.subList(1, group.size())
                .stream()
                .map(rawTransaction -> {
                    final String accountNumber = rawTransaction[1];
                    final LocalDateTime transactionDate = LocalDateTime.parse(rawTransaction[2], dateTimeFormatter);
                    final BigDecimal amount = new BigDecimal(rawTransaction[3]);
                    return new Transaction(accountNumber, transactionDate, amount);
                })
                .collect(Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
        // The first row of a group is the customer itself.
        final String[] rawCustomer = group.get(0);
        final String firstName = rawCustomer[1];
        final String middleInitial = rawCustomer[2];
        final String lastName = rawCustomer[3];
        final String address = rawCustomer[4];
        final String city = rawCustomer[5];
        final String state = rawCustomer[6];
        final String zipCode = rawCustomer[7];
        return new Customer(firstName, middleInitial, lastName, address, city, state, zipCode, transactions);
    }
}

It prints the following output to the terminal:

Customer(firstName=Warren, middleInitial=Q, lastName=Darrow, address=8272 4th Street, city=New York, state=IL, zipCode=76091, transactions=[Transaction(accountNumber=1165965, transactionDate=2011-01-22T00:13:29, amount=51.43)])
Customer(firstName=Erica, middleInitial=I, lastName=Jobs, address=8875 Farnam Street, city=Aurora, state=IL, zipCode=36314, transactions=[Transaction(accountNumber=8116369, transactionDate=2011-01-21T20:40:52, amount=-14.83), Transaction(accountNumber=8116369, transactionDate=2011-01-21T15:50:17, amount=-45.45), Transaction(accountNumber=8116369, transactionDate=2011-01-21T16:52:46, amount=-74.6), Transaction(accountNumber=8116369, transactionDate=2011-01-22T13:51:05, amount=48.55), Transaction(accountNumber=8116369, transactionDate=2011-01-21T16:51:59, amount=98.53)])
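The dates in this output use a `T` separator although the file uses a space, because `LocalDateTime.toString()` prints the ISO-8601 form. The two field conversions can also be tried on their own; the sketch below assumes the same pattern string as in `CsvTest`, and `FieldParsing` is a hypothetical name introduced only for illustration.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that isolates the two conversions used for TRANS rows.
final class FieldParsing {

    // Same pattern as in CsvTest: date and time separated by a single space.
    private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private FieldParsing() {
    }

    static LocalDateTime parseDate(final String raw) {
        return LocalDateTime.parse(raw, FORMATTER);
    }

    static BigDecimal parseAmount(final String raw) {
        // BigDecimal keeps the digits exactly as written, so "-74.6" stays -74.6.
        return new BigDecimal(raw);
    }
}
```

Using `BigDecimal` rather than `Double` for the amount avoids binary floating-point rounding, which is probably why the answer's `Transaction` differs from the one in the question.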

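I guess this should run a bit faster than the deserialization built into OpenCSV (and it is more flexible, if tedious to write). I am not yet sure, though, how to improve the code above so that it supports CSV headers instead of hard-coded column positions.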
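One possible direction, offered only as a sketch: keep a small name-to-position lookup per record type and resolve fields by name. The sample file has no header row, so the column names used here are assumptions, and `HeaderIndex` is a hypothetical helper, not an OpenCSV class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: resolves a field by column name instead of by position.
final class HeaderIndex {

    private final Map<String, Integer> positions = new HashMap<>();

    HeaderIndex(final String... header) {
        for (int i = 0; i < header.length; i++) {
            // Map.put returns the previous value, so non-null means a duplicate name.
            if (positions.put(header[i], i) != null) {
                throw new IllegalArgumentException("Duplicate column: " + header[i]);
            }
        }
    }

    String get(final String[] row, final String column) {
        final Integer position = positions.get(column);
        if (position == null) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        if (position >= row.length) {
            throw new IllegalArgumentException("Row has no value for column: " + column);
        }
        return row[position];
    }
}
```

With an assumed layout such as `new HeaderIndex("type", "accountNumber", "transactionDate", "amount")`, a lookup like `transHeader.get(rawTransaction, "amount")` could stand in for `rawTransaction[3]`. The layouts could equally be read from header rows, if the file format were extended to carry one per record type.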
