How to handle double quotes inside a CSV that needs to be parsed in Java



Here is what I am trying to do.

This is my spend.csv file:

"Date","Description","Detail","Amount"
"5/03/21","Cinema","Batman","7.90"
"15/02/20","Groceries","Potatoes","23.00"
"9/12/21","DIY","Wood Plates","33.99"
"9/07/22","Fuel","Shell","$56.00"
"23/08/19","Lamborghini","Aventador","800,000.00"

As a table:

[Image: table view of the CSV]

Below is the output file I want, named spend.xml:

<?xml version="1.0" encoding="UTF-8"?>
<SPEND>
<RECORD DATE="5/03/21">
<DESC>Cinema</DESC>
<DETAIL>Batman</DETAIL>
<AMOUNT>7.90</AMOUNT>
</RECORD>
<RECORD DATE="15/02/20">
<DESC>Groceries</DESC>
<DETAIL>Potatoes</DETAIL>
<AMOUNT>23.00</AMOUNT>
</RECORD>
<RECORD DATE="9/12/21">
<DESC>DIY</DESC>
<DETAIL>Wood Plates</DETAIL>
<AMOUNT>33.99</AMOUNT>
</RECORD>
<RECORD DATE="9/07/22">
<DESC>Fuel</DESC>
<DETAIL>Shell</DETAIL>
<AMOUNT>$56.00</AMOUNT>
</RECORD>
<RECORD DATE="23/08/19">
<DESC>Lamborghini</DESC>
<DETAIL>Aventador</DETAIL>
<AMOUNT>800,000.00</AMOUNT>
</RECORD>
</SPEND>

To do this, I pieced together code from various places and ended up with this:

public class Main {

    public static void main(String[] args) throws FileNotFoundException {

        List<String> headers = new ArrayList<String>(5);

        File file = new File("spend.csv");
        BufferedReader reader = null;

        try {

            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder domBuilder = domFactory.newDocumentBuilder();

            Document newDoc = domBuilder.newDocument();
            // Root element
            Element rootElement = newDoc.createElement("XMLCreators");
            newDoc.appendChild(rootElement);

            reader = new BufferedReader(new FileReader(file));
            int line = 0;

            String text = null;
            while ((text = reader.readLine()) != null) {

                StringTokenizer st = new StringTokenizer(text, "", false);

                int index = 0;

                String[] rowValues = text.split(",");

                if (line == 0) { // Header row
                    for (String col : rowValues) {
                        headers.add(col);
                    }
                } else { // Data row
                    Element rowElement = newDoc.createElement("RECORDS");
                    rootElement.appendChild(rowElement);
                    for (int col = 0; col < headers.size(); col++) {
                        String header = headers.get(col);
                        String value = null;

                        if (col < rowValues.length) {
                            value = rowValues[col];
                        } else {
                            value = "";
                        }

                        Element curElement = newDoc.createElement(header);
                        curElement.appendChild(newDoc.createTextNode(value));
                        rowElement.appendChild(curElement);
                    }
                }
                line++;
            }

            ByteArrayOutputStream baos = null;
            OutputStreamWriter osw = null;

            try {
                baos = new ByteArrayOutputStream();
                osw = new OutputStreamWriter(baos);

                TransformerFactory tranFactory = TransformerFactory.newInstance();
                Transformer aTransformer = tranFactory.newTransformer();
                aTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
                aTransformer.setOutputProperty(OutputKeys.METHOD, "xml");
                aTransformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

                Source src = new DOMSource(newDoc);
                Result result = new StreamResult(osw);
                aTransformer.transform(src, result);

                osw.flush();
                System.out.println(new String(baos.toByteArray()));
            } catch (Exception exp) {
                exp.printStackTrace();
            } finally {
                try {
                    osw.close();
                } catch (Exception e) {
                }
                try {
                    baos.close();
                } catch (Exception e) {
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }

    }
}

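At this point the program should print the XML to the terminal.

Unfortunately, because every value in my CSV file is wrapped in double quotes, I run into this problem:

org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.

I think I am missing something here: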


StringTokenizer st = new StringTokenizer(text, "", false);
int index = 0;
String[] rowValues = text.split(",");

I would like to keep the double quotes in my CSV. If anyone has an idea, please let me know!
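For context, the exception does not come from the data values, since DOM text nodes accept double quotes. It comes from the header row: each header still carries its quotes (for example "Date" with the quote characters included) and is then passed to createElement as an element name. Below is a minimal sketch that reproduces this in isolation with the JDK's built-in DOM implementation. The class and method names are hypothetical and exist only for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class QuotedNameDemo {

    /** Creates an empty DOM document, wrapping the checked exception. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    /** Returns true if the DOM accepts the given string as an element name. */
    static boolean isValidElementName(Document doc, String name) {
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            // Raised with INVALID_CHARACTER_ERR for names such as "\"Date\""
            return false;
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        System.out.println(isValidElementName(doc, "\"Date\"")); // false
        System.out.println(isValidElementName(doc, "Date"));     // true
    }
}
```

So the quotes can stay in the CSV file, but they have to be removed from the header strings before those are used as element names.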

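Before running the conversion, replace the double quotes with a placeholder:

text = text.replaceAll("\"", "####");

Then run the conversion, and once it has finished, reverse the substitution by replacing every "####" in the resulting string with a double quote.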
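Another option is to parse the quotes instead of masking them. The sketch below is a small hand-written line parser, not a full CSV implementation: it assumes one record per line (no line breaks inside quoted fields) and treats a doubled quote inside a quoted field as a literal quote. The class name CsvLineParser and the method parseLine are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineParser {

    /**
     * Splits one CSV line into fields. Commas inside double quotes are kept,
     * the surrounding quotes are dropped, and a doubled quote ("") inside a
     * quoted field becomes a single literal quote.
     */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // includes commas inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        String row = "\"23/08/19\",\"Lamborghini\",\"Aventador\",\"800,000.00\"";
        System.out.println(parseLine(row));
    }
}
```

With this, the Lamborghini row yields four fields, the last being 800,000.00, whereas text.split(",") cuts that amount in two at the thousands separator and returns five pieces. In the question's loop, parseLine(text) could stand in for text.split(",").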

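Another possible approach, using OpenCSV and Jackson:

import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import com.fasterxml.jackson.dataformat.xml.ser.ToXmlGenerator;
import com.opencsv.bean.CsvToBeanBuilder;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;

public class FileProcessor {
    public static void main(String[] args) throws IOException {
        // Read the CSV into beans; OpenCSV strips the surrounding quotes
        List<DataStructure> importList = new CsvToBeanBuilder<DataStructure>(
                new FileReader("pathIn"))
                .withIgnoreEmptyLine(true)
                .withType(DataStructure.class)
                .build()
                .parse();
        ListLoader exportList = new ListLoader(importList);
        // Write the beans out as indented XML with an XML declaration
        XmlMapper xmlMapper = new XmlMapper();
        xmlMapper.configure(ToXmlGenerator.Feature.WRITE_XML_DECLARATION, true)
                .enable(SerializationFeature.INDENT_OUTPUT)
                .writeValue(new File("pathOut"), exportList);
    }
}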

A class to serialize each record:

import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;
import com.opencsv.bean.CsvBindByName;
import lombok.Data;

@Data
public class DataStructure {
    @CsvBindByName
    @JacksonXmlProperty(isAttribute = true, localName = "DATE")
    private String date;
    @CsvBindByName
    @JacksonXmlProperty(localName = "DESC")
    private String description;
    @CsvBindByName
    @JacksonXmlProperty(localName = "DETAIL")
    private String detail;
    @CsvBindByName
    @JacksonXmlProperty(localName = "AMOUNT")
    private String amount;
}

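A class to serialize the complete list:

import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlElementWrapper;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;
import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlRootElement;
import java.util.List;

@JacksonXmlRootElement(localName = "SPEND")
public class ListLoader {
    // useWrapping = false emits each RECORD directly under SPEND
    @JacksonXmlElementWrapper(useWrapping = false)
    @JacksonXmlProperty(localName = "RECORD")
    private List<DataStructure> list;

    public ListLoader(List<DataStructure> list) {
        this.list = list;
    }
}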
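If adding OpenCSV and Jackson is not an option, roughly the same structure can be produced with the JDK's own DOM and Transformer classes. The following is only a sketch under a few assumptions: the rows have already been split and unquoted, column 0 is the date, and the element names are hardcoded to match the desired spend.xml rather than taken from the CSV header. SpendXmlWriter and toXml are hypothetical names.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SpendXmlWriter {

    // Element names for columns 1..3; column 0 (the date) becomes an attribute.
    private static final String[] CHILD_NAMES = {"DESC", "DETAIL", "AMOUNT"};

    /** Builds the SPEND document from rows of already-unquoted values. */
    static String toXml(List<String[]> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("SPEND");
            doc.appendChild(root);
            for (String[] row : rows) {
                Element record = doc.createElement("RECORD");
                record.setAttribute("DATE", row[0]);
                for (int i = 0; i < CHILD_NAMES.length; i++) {
                    Element child = doc.createElement(CHILD_NAMES[i]);
                    // Missing trailing columns become empty elements
                    child.setTextContent(i + 1 < row.length ? row[i + 1] : "");
                    record.appendChild(child);
                }
                root.appendChild(record);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"5/03/21", "Cinema", "Batman", "7.90"},
                new String[] {"23/08/19", "Lamborghini", "Aventador", "800,000.00"});
        System.out.println(toXml(rows));
    }
}
```

Because the element names are fixed strings here, the quoted header row never reaches createElement, which sidesteps the INVALID_CHARACTER_ERR from the question.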
