CSVReader: error when using " as the escape character



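I'm using OpenCSV.

I have a CSVReader that tries to parse a CSV file.
The file uses " as the quote character, , as the separator character, and " as the escape character as well.

Note that the CSV contains cells like these: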

"ballet 24"" classes"
""  

They actually represent these values (the second one is an empty string):

ballet 24" classes
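
As a rough illustration of that mapping, the doubling rule can be written in a few lines of plain Java. This is only a sketch and not OpenCSV code; the class QuoteDoublingDemo and the methods quoteField and unquoteField are made-up names for this example.

```java
final class QuoteDoublingDemo {

    // Wrap a raw value in quotes, doubling every embedded quote.
    static String quoteField(String raw) {
        return "\"" + raw.replace("\"", "\"\"") + "\"";
    }

    // Reverse of quoteField: drop the outer quotes, then collapse "" into ".
    static String unquoteField(String cell) {
        boolean quoted = cell.length() >= 2
                && cell.charAt(0) == '"'
                && cell.charAt(cell.length() - 1) == '"';
        if (!quoted) {
            return cell; // not a quoted cell, return it unchanged
        }
        String inner = cell.substring(1, cell.length() - 1);
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        // Raw cell text as it appears in the file: "ballet 24"" classes"
        System.out.println(unquoteField("\"ballet 24\"\" classes\""));
        // Going the other way, from the value back to the cell text
        System.out.println(quoteField("ballet 24\" classes"));
    }
}
```

The empty cell "" goes through the same path: the outer quotes are removed and nothing is left, so the value is the empty string.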

Example:

"9/6/2014","3170168","123652278","Computer","2329043290","Bing and Yahoo! search","22951990789","voice lesson","Broad","0.00","0","1","3.00","0.00","0.00","0.00","7","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990795","ballet class","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet 24"" classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043291","Bing and Yahoo! search","22951990797","ballet classes","Broad","0.00","0","1","1.00","0.00","0.00","0.00","0","0","",""
"9/6/2014","3170168","123652278","Computer","2329043291","Bing and Yahoo! search","22951990817","","Broad","0.00","0","1","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","1","7.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Smartphone","2329043293","Bing and Yahoo! search","22951990850","zumba classes","Broad","0.00","0","4","1.00","0.00","0.00","0.00","5","0","",""
"9/6/2014","3170168","123652278","Computer","2329043293","Bing and Yahoo! search","22951990874","zumba lessons","Broad","0.00","0","1","2.00","0.00","0.00","0.00","0","0","",""

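My problem is that I can't pass " to the CSVReader constructor
as the escape character (that is, the same character as the quote character).
If I do, CSVReader goes crazy and reads the entire CSV line as a single cell.

Has anyone run into this bug, and how can it be solved?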

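If you use the default settings of CSVReader, it will work.

See this open bug in their tracker, sourceforge.net/p/opencsv/bugs/83:

Actually, it's working properly, just not the way you think. Its default is comma for the separator, quote for the quote character, and backslash for the escape character. However, it understands two consecutive quotes as an escaped quote character. So, if you just go with the defaults, it will work fine.

By default it can escape a double quote with a double quote, but your "real" escape character must still be some other character.

So the following works: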

CSVReader reader = new CSVReader(new FileReader(App.class.getClassLoader().getResource("csv.csv").getFile()), ',','"','-');
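  • comma as the separator
  • double quote as the quote character
  • dash (or any other character) as the escape character

At first I used "\" as the escape character, but then you would need to modify any "\" in your fields to escape the escape character.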
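
To make the doubled-quote rule concrete, here is a minimal sketch of a line splitter that treats "" inside a quoted cell as one literal quote, so no separate escape character is involved. It is not OpenCSV's implementation; DoubledQuoteLineSplitter and splitLine are hypothetical names, and the sketch assumes one record per line (no line breaks inside cells).

```java
import java.util.ArrayList;
import java.util.List;

final class DoubledQuoteLineSplitter {

    // Split one CSV line on commas, honoring quoted cells and "" pairs.
    static List<String> splitLine(String line) {
        List<String> cells = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // a lone quote closes the cell
                    }
                } else {
                    current.append(c);       // commas inside quotes are plain text
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a cell
            } else if (c == ',') {
                cells.add(current.toString());
                current.setLength(0);        // start the next cell
            } else {
                current.append(c);
            }
        }
        cells.add(current.toString());       // last cell has no trailing comma
        return cells;
    }

    public static void main(String[] args) {
        String line = "\"22951990797\",\"ballet 24\"\" classes\",\"Broad\",\"\"";
        for (String cell : splitLine(line)) {
            System.out.println("[" + cell + "]");
        }
    }
}
```

The only lookahead needed is one character: a quote followed by another quote stays inside the cell, and a quote followed by anything else ends it.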

CSVReader is not fully RFC4180 compliant. Use their newer RFC 4180 parser (RFC4180Parser):

RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
CSVReaderBuilder csvReaderBuilder = new CSVReaderBuilder(
    new FileReader("input.csv"));
CSVReader reader = csvReaderBuilder
    .withCSVParser(rfc4180Parser)
    .build();

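To read a single line of a string formatted as CSV:

// The quotes that belong to the CSV text must be escaped in the Java literal.
String test = "\"ballet 24\"\" classes\"";
String[] columns = new RFC4180Parser().parseLine(test); // parseLine can throw IOException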

To use the reader (an alternative is reader.readNext()):

for (String[] line : reader.readAll()) {
  for (String s : line) {
    System.out.println(s);
  }
}

For more details, see http://opencsv.sourceforge.net/#rfc4180parser.

Code taken from GeekPrompt

It cannot be done with CSVReader.

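from pyspark.sql.session import SparkSession
spark = SparkSession(sc)  # assumes an existing SparkContext named sc
# escape is set to the quote character so "" inside a quoted cell reads as a literal "
rdd = spark.read.csv("csv.csv", multiLine=True, header=False, encoding='utf-8', escape='"')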
