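I am using OpenCSV. I have a CSVReader that is trying to parse a CSV file. The file uses " as the quote character, , as the separator, and " as the escape character as well.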
Note that the CSV contains cells like these:
"ballet 24"" classes"
""
They actually represent these values:
ballet 24" classes
(an empty string)
Example:
"9/6/2014","3170168","123652278","Computer","2329043290","Bing and Yahoo! search","22951990789","voice lesson","Broad","0.00","0","1","3.00","0.00","0.00","0.00","7","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990795","ballet class","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet 24"" classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Computer","2329043291","Bing and Yahoo! search","22951990817","","Broad","0.00","0","1","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","1","7.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","4","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990874","zumba lessons","Broad","0.00","0","1","2.00","0.00","0.00","0.00","0","0","",""
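My problem is that I cannot pass " as the escape character to the CSVReader constructor (that is, make it the same as the quote character). If I do, CSVReader goes crazy and reads the entire CSV line as a single cell.
Has anyone run into this bug, and how did you solve it?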
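If you use the default settings of CSVReader, it will work.
See this open bug of theirs: sourceforge.net/p/opencsv/bugs/83:
Actually, it works fine, just not the way you think. Its defaults are comma for the separator, quote for the quote character, and backslash for the escape character. However, it understands two consecutive quotes as an escaped quote character. So if you just go with the defaults, it will work fine.
With the defaults it can escape a double quote with a double quote, but your "real" escape character still has to be some other character.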
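To make the doubled-quote rule from that bug comment concrete, here is a minimal plain-Java sketch of a line splitter that treats "" inside a quoted field as one literal quote. It is only an illustration under simplifying assumptions, not OpenCSV's implementation: the class name DoubledQuoteSketch and the helper splitLine are hypothetical, and line breaks inside quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class DoubledQuoteSketch {

    // Hypothetical helper (not part of OpenCSV): splits one CSV line on commas.
    // Inside a quoted field, two consecutive quotes stand for one literal quote.
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote becomes a literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // a lone quote closes the field
                    }
                } else {
                    current.append(c);       // commas are plain text while quoted
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // CSV text: "22951990797","ballet 24"" classes","Broad",""
        String line = "\"22951990797\",\"ballet 24\"\" classes\",\"Broad\",\"\"";
        for (String field : splitLine(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Running main on that excerpt of the sample row prints four bracketed fields; the second is ballet 24" classes and the last is empty. No separate escape character is involved, which is why the quote character does not need to be passed as the escape character.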
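So the following works:
CSVReader reader = new CSVReader(new FileReader(App.class.getClassLoader().getResource("csv.csv").getFile()), ',', '"', '-');
- comma as the separator
- double quote as the quote character
- dash (or any other character) as the escape character
But then you need to modify any field that contains the escape character (the "\" in the default configuration) so that the escape character is itself escaped.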
CSVReader is not fully RFC 4180 compliant. Use their newer CSV parser (RFC4180Parser):
RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(
        new FileReader("input.csv"));
CSVReader reader = csvReaderBuilder
        .withCSVParser(rfc4180Parser)
        .build();
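To parse a single line of CSV-formatted text held in a string:
// The CSV text "ballet 24"" classes", with each quote escaped for the Java literal
String test = "\"ballet 24\"\" classes\"";
String[] columns = new RFC4180Parser().parseLine(test);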
To consume the reader (an alternative is reader.readNext()):
for (String[] line : reader.readAll()) {
    for (String s : line) {
        System.out.println(s);
    }
}
For more details, see http://opencsv.sourceforge.net/#rfc4180parser.
Code taken from GeekPrompt.
This cannot be done with CSVReader.
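from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# escape is set to the quote character, so "" inside a quoted field is read as one literal "
# note: spark.read.csv returns a DataFrame, not an RDD
df = spark.read.csv("csv.csv", multiLine=True, header=False, encoding="utf-8", escape='"')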