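Pretty much what the title says. I'm writing code that needs to work with both BOM and non-BOM files. I need to support several parsing options, and right now I'm implementing support for parsing CSV files.
The code below is a rough idea of what I'm working on. I can provide a minimal working example if needed.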
class LocalFileAccess {
// ...
// Opens an input stream to the file based on the path passed in constructor.
// Part of a larger interface, can't change the signature.
@Override
public InputStream getInputStream() throws FileNotFoundException {
File file = new File(this.path);
if (!file.isAbsolute()) {
file = getFile(this.base, this.path);
}
return new FileInputStream(file);
}
public void foo() {
try (BOMInputStream inputStream = new BOMInputStream(this.getInputStream())) {
Iterator<String[]> iterator = new CSVReaderBuilder(new InputStreamReader(inputStream, StandardCharsets.UTF_8)).build().iterator();
String[] header = iterator.next(); // <- first value is prefixed with the BOM
} catch (...) { ... }
}
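Later in the codebase, when the values obtained from the Iterator are parsed, the first value of the header row is prefixed with the BOM, which makes tests fail. The simplest fix would be to check for it manually, but I'd rather keep my code clean.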
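For reference, the manual check mentioned above could look like the sketch below. It assumes the BOM survives decoding as the character U+FEFF at the start of the first header value; the class name HeaderBomCleanup and the method stripLeadingBom are hypothetical and not part of opencsv or Commons IO.

```java
import java.util.Arrays;

public class HeaderBomCleanup {
    // U+FEFF is what a UTF-8 BOM (EF BB BF) decodes to when it is not stripped.
    private static final char BOM = '\uFEFF';

    // Returns the value without a single leading BOM character, if present.
    static String stripLeadingBom(String value) {
        if (value != null && !value.isEmpty() && value.charAt(0) == BOM) {
            return value.substring(1);
        }
        return value;
    }

    public static void main(String[] args) {
        // Hypothetical header row as it might come back from the CSV iterator.
        String[] header = {"\uFEFFpoint", "name"};
        header[0] = stripLeadingBom(header[0]);
        System.out.println(Arrays.toString(header)); // prints [point, name]
    }
}
```

This works, but it patches the symptom after decoding, which is exactly the kind of special case the question is trying to avoid.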
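Also wrapping the return value of getInputStream() in new BOMInputStream() fixes it. However, if I then replace new BOMInputStream(this.getInputStream()) in the try-with-resources with plain this.getInputStream(), it breaks again: the BOM gets through.
I've tried different variations, such as wrapping only the return value of getInputStream in a BOMInputStream, or wrapping only the InputStream in the try-with-resources in a BOMInputStream, but to no avail. The only thing that seems to work is wrapping the return value of getInputStream in a BOMInputStream and then wrapping it again in a BOMInputStream in the try-with-resources, and I don't understand why.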
Why do I need to wrap the input stream in a BOMInputStream twice?
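One thing worth ruling out is a file that starts with more than one BOM. As far as I know, BOMInputStream detects and skips only a single BOM at the start of the stream, so a second, nested wrapper would strip a second BOM; treat that as an assumption to verify, not a confirmed cause. The JDK-only sketch below counts consecutive leading UTF-8 BOMs in a byte array; BomCounter and countLeadingUtf8Boms are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class BomCounter {
    // Counts how many consecutive UTF-8 BOMs (EF BB BF) the data starts with.
    static int countLeadingUtf8Boms(byte[] data) {
        int count = 0;
        while (data.length >= (count + 1) * 3
                && (data[count * 3] & 0xFF) == 0xEF
                && (data[count * 3 + 1] & 0xFF) == 0xBB
                && (data[count * 3 + 2] & 0xFF) == 0xBF) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] body = "point,name\n".getBytes(StandardCharsets.UTF_8);
        // Hypothetical file content: two BOMs followed by the header row.
        byte[] doubleBom = new byte[2 * bom.length + body.length];
        System.arraycopy(bom, 0, doubleBom, 0, bom.length);
        System.arraycopy(bom, 0, doubleBom, bom.length, bom.length);
        System.arraycopy(body, 0, doubleBom, 2 * bom.length, body.length);
        System.out.println(countLeadingUtf8Boms(doubleBom)); // prints 2
    }
}
```

To check a real file, pass it Files.readAllBytes(path) instead of the in-memory example (or run xxd on it). A result of 2 or more would explain why a single wrapper is not enough.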
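Edit: To clarify, I'm using the Apache Commons IO BOMInputStream.
Not wanting my last comment to imply that there is anything wrong with the Commons BOMInputStream (I don't believe it fails to read streams correctly when there is no BOM), I decided to test it. As I expected, it reads files with or without a BOM perfectly well: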
Source:
package com.technojeeves.opencsvbeans;
import com.opencsv.bean.CsvToBeanBuilder;
import org.apache.commons.io.input.BOMInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.io.IOException;
import java.io.Reader;
import java.io.InputStreamReader;
public class App {
public static void main(String[] args) {
try {
System.out.println(new App().read(Path.of(args[0])));
} catch (Throwable t) {
t.printStackTrace();
}
}
public List<Pojo> read(Path path) throws IOException {
try (Reader reader = new InputStreamReader(new BOMInputStream(Files.newInputStream(path)),
StandardCharsets.UTF_8)) {
return new CsvToBeanBuilder<Pojo>(reader).withType(Pojo.class).build().parse();
}
}
}
Data file contents:
goose@t410:/tmp/opencsvbeans$ xxd pojo.csv
00000000: 706f 696e 742c 6e61 6d65 0a31 2c67 6f6f point,name.1,goo
00000010: 7365 0a32 2c64 7563 6b0a se.2,duck.
goose@t410:/tmp/opencsvbeans$ xxd pojo-bom.csv
00000000: efbb bf70 6f69 6e74 2c6e 616d 650a 312c ...point,name.1,
00000010: 676f 6f73 650a 322c 6475 636b 0a goose.2,duck.
Running it, with output:
goose@t410:/tmp/opencsvbeans$ mvnt exec:java -Dexec.args=pojo-bom.csv
[INFO] Scanning for projects...
[INFO]
[INFO] -------------< com.technojeeves.opencsvbeans:opencsvbeans >-------------
[INFO] Building opencsvbeans 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ opencsvbeans ---
[name=goose,point=1, name=duck,point=2]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.189 s
[INFO] Finished at: 2022-12-12T11:31:02Z
[INFO] ------------------------------------------------------------------------
goose@t410:/tmp/opencsvbeans$ mvnt exec:java -Dexec.args=pojo.csv
[INFO] Scanning for projects...
[INFO]
[INFO] -------------< com.technojeeves.opencsvbeans:opencsvbeans >-------------
[INFO] Building opencsvbeans 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ opencsvbeans ---
[name=goose,point=1, name=duck,point=2]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.245 s
[INFO] Finished at: 2022-12-12T11:31:11Z
[INFO] ------------------------------------------------------------------------
goose@t410:/tmp/opencsvbeans$
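The two dumps differ only in the leading bytes EF BB BF. To illustrate what stripping a UTF-8 BOM amounts to at the byte level, here is a JDK-only sketch using PushbackInputStream: it reads up to three bytes, drops them if they are the UTF-8 BOM, and pushes them back otherwise. It is a simplified stand-in (hypothetical class SkipUtf8Bom), not Commons IO's implementation, and it handles only the UTF-8 BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SkipUtf8Bom {
    // Wraps the stream and skips one leading UTF-8 BOM (EF BB BF) if present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pb.readNBytes(head, 0, 3);
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pb;
    }

    // Decodes in-memory bytes as UTF-8 after skipping a leading BOM.
    static String readAll(byte[] bytes) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(bytes))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory bytes
        }
    }

    public static void main(String[] args) {
        // Same bytes as pojo.csv in the dump above.
        byte[] plain = "point,name\n1,goose\n2,duck\n".getBytes(StandardCharsets.UTF_8);
        // Same bytes as pojo-bom.csv: EF BB BF followed by the plain content.
        byte[] withBom = new byte[plain.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(plain, 0, withBom, 3, plain.length);
        // Both print the same text: the BOM is dropped and the non-BOM data is untouched.
        System.out.print(readAll(plain));
        System.out.print(readAll(withBom));
    }
}
```

Pushing the bytes back keeps the non-BOM case lossless, which matches the behaviour shown in the run above: the same output for both files.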