How to read and write non-English characters (special characters such as Marathi, Tamil, Hindi, etc.) in Java?



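I read non-English characters, say Marathi, from an Excel file and then write that text to an XML file. When I read the Marathi text from Excel and inspect it in my Java code, it shows the Marathi correctly, but when I write it to XML from Java I get some symbols in place of the Marathi text. Please suggest how to handle this. The code is attached below.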

public void excelToXML(String path) {
    FileWriter fostream;
    PrintWriter out = null;
    String strOutputPath = "C:\\Temp\\";
    try {
        File file = new File(path);
        InputStream inputStream = new FileInputStream(file);
        Workbook wb = WorkbookFactory.create(inputStream);
        List<String> sheetNames = new ArrayList<String>();
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            sheetNames.add(wb.getSheetName(i));
        }
        fostream = new FileWriter(strOutputPath + "\\" + "iTicker" + ".xml");
        out = new PrintWriter(new BufferedWriter(fostream));
        // out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
        out.println("<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">");
        for (String sheetName : sheetNames) {
            if (sheetName.equals("Sheet3")) {
                System.out.println(sheetName);
                break;
            }

            Sheet sheet = wb.getSheet(sheetName);
            boolean firstRow = true;
            ArrayList<String> myStringArray = new ArrayList<String>();
            Iterator<Cell> cells = sheet.getRow(0).cellIterator();
            while (cells.hasNext()) {
                myStringArray.add(cells.next().toString());
            }
            for (Row row : sheet) {
                if (firstRow == true) {
                    firstRow = false;
                    continue;
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t<element>");
                }
                for (int i = 0; i < myStringArray.size(); i++) {
                    if (row.getCell(i) != null && !(row.getCell(i)).toString().isEmpty()
                            && row.getCell(i).toString().length() > 0) {
                        if (!(myStringArray.get(i) != null && myStringArray.get(i).toString().equals("Start_Epoch_Time") || myStringArray.get(i).toString().equals("End_Epoch_Time"))) {
                            out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
                        } else {
                            long ePochValue = EpochConverter.getepochValue(row.getCell(i).toString());
                            out.println(formatElement("\t\t", myStringArray.get(i), String.valueOf(ePochValue)));
                        }
                    } else {
                        blankValues.add(sheetName + ":" + "column header" + ":" + myStringArray.get(i) + ":" + "row no:" + row.getRowNum() + " " + "is blank.");
                    }
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t</element>");
                }
            }
        }
        out.write("</root>");
        out.flush();
        out.close();
        if (blankValues != null && blankValues.size() > 0) {
            FileUploadController.writeErrorLog(blankValues + "Please fill all the mandatory values.");
        }
    } catch (Exception e) {
        new DTHException(e.getMessage());
        e.printStackTrace();
    }
}
private static String formatCell(Cell cell) {
    if (cell == null) {
        return "";
    }
    switch (cell.getCellType()) {
        case Cell.CELL_TYPE_BLANK:
            return "";
        case Cell.CELL_TYPE_BOOLEAN:
            return Boolean.toString(cell.getBooleanCellValue());
        case Cell.CELL_TYPE_ERROR:
            return "*error*";
        case Cell.CELL_TYPE_NUMERIC:
            return df.format(cell.getNumericCellValue());
        case Cell.CELL_TYPE_STRING:
            return cell.getStringCellValue();
        default:
            return "<unknown value>";
    }
}

private static String formatElement(String prefix, String tag, String value) {
    StringBuilder sb = new StringBuilder(prefix);
    sb.append("<");
    sb.append(tag);
    if (value != null && value.length() > 0) {
        sb.append(">");
        sb.append(value);
        sb.append("</");
        sb.append(tag);
        sb.append(">");
    } else {
        sb.append("/>");
    }
    return sb.toString();
}

On the line below, when I inspect the row.getCell(i) value I get the exact Marathi text, but after writing it out I get different output:

out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
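Checking the value on the console can be misleading, because the console has its own encoding and font. One way to confirm that the string is still intact at this point is to print its Unicode code points instead. A small sketch; the class and method names are hypothetical, and the literal is written with \u escapes so the example does not depend on the encoding javac uses to read the source file:

```java
public class CodePointDump {
    // Formats each code point of s as U+XXXX so the value can be checked
    // without relying on the console's encoding or font.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by formatCell(row.getCell(i)): "मराठी"
        String value = "\u092E\u0930\u093E\u0920\u0940";
        System.out.println(codePoints(value));
    }
}
```

For मराठी this prints U+092E U+0930 U+093E U+0920 U+0940. If those code points are present before writing but the file still looks wrong, the damage happens in the writing step.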

There are two big problems with your code.

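1) You are obviously working on Windows (path C:\Temp), but, as Axel Richter already pointed out in the comments, you are using the default encoding for the output file. Creating a FileWriter directly from a file name gives you the platform's default encoding, which on Windows is the Windows ANSI code page (for example windows-1252) on Java versions before 18; JEP 400 made UTF-8 the default starting with Java 18. That is not what you want, because you later write an XML header declaration that names UTF-8 as the encoding.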
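The effect of a single-byte default encoding can be reproduced without Excel or files. This sketch (the class name is hypothetical) encodes a Marathi string with ISO-8859-1, used here as a stand-in for a Western Windows ANSI code page such as windows-1252, which cannot represent Devanagari either, so every character is replaced by '?'. With UTF-8 the text survives the round trip. The default charset printed first depends on the platform and JDK version.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Encodes text with the given charset and decodes it again, which is what
    // happens when a file is written with one encoding and read back with it.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars become '?'
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String marathi = "\u092E\u0930\u093E\u0920\u0940"; // "मराठी"
        // This is what FileWriter(String) uses when no charset is given.
        System.out.println("Default charset: " + Charset.defaultCharset());
        // ISO-8859-1 stands in for a single-byte Windows ANSI code page.
        System.out.println(roundTrip(marathi, StandardCharsets.ISO_8859_1)); // ?????
        System.out.println(roundTrip(marathi, StandardCharsets.UTF_8).equals(marathi)); // true
    }
}
```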

You should never rely on the platform's default encoding. Always create the PrintWriter with an explicit encoding, through an OutputStreamWriter wrapped around a FileOutputStream, like this:

PrintWriter writer = new PrintWriter(new BufferedWriter(
        new OutputStreamWriter(
                new FileOutputStream("iTicker.xml"), StandardCharsets.UTF_8)));
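The same idea can be written with java.nio.file.Files.newBufferedWriter, which also takes the charset explicitly; on Java 11 and later, new FileWriter(file, StandardCharsets.UTF_8) is another option. A minimal sketch, with hypothetical class and method names and a temporary file standing in for C:\Temp\iTicker.xml; the content is read back as UTF-8 to confirm the Devanagari text survives:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitUtf8Writer {
    // Writes the lines as UTF-8 and returns the file content decoded as UTF-8.
    static String writeAndReadBack(Path file, String... lines) throws IOException {
        // The charset is passed explicitly, so the platform default is never used.
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.println(line);
            }
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("iTicker", ".xml");
        String content = writeAndReadBack(file,
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
                "<element>\u092E\u0930\u093E\u0920\u0940</element>"); // मराठी
        System.out.print(content);
        Files.delete(file);
    }
}
```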

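2) Writing XML by hand, as you do, is bad practice. If you do it anyway, you have to take care of special characters such as "<", ">" and "&". It is always advisable to use a library for this, because it does the escaping automatically. For example, the Java standard library includes an implementation of the XMLStreamWriter interface.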
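If you do keep writing the XML by hand, the text of every cell needs escaping before it goes between the tags. A minimal sketch of such a helper; escapeXml is a hypothetical name, not a library method, and it does not deal with control characters that are illegal in XML 1.0, which is one more reason to prefer a library:

```java
public class XmlEscape {
    // Hypothetical helper: replaces the characters that are special in XML
    // text and attribute values. Non-ASCII text such as Devanagari passes
    // through unchanged; the output file must still be written as UTF-8.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("a < b & \"c\"")); // a &lt; b &amp; &quot;c&quot;
    }
}
```

In the question's code it would be applied inside formatElement, as sb.append(escapeXml(value)). Escaping alone does not fix the encoding problem; the writer still has to be UTF-8.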

Here is an easy-to-use example:

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class WriteXml {
    public static void main(String[] args) {
        try {
            File outFile = new File("iTicker.xml");
            // Output stream for the XML document. The encoding passed to
            // createXMLStreamWriter is the one actually used for the bytes;
            // writeStartDocument only writes the declaration, so both must match.
            OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile));
            XMLStreamWriter xmlWriter =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            xmlWriter.writeStartDocument("UTF-8", "1.0");
            xmlWriter.writeCharacters("\n");
            xmlWriter.writeStartElement("root");
            xmlWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
            xmlWriter.writeCharacters("\n  ");
            xmlWriter.writeStartElement("element");
            // Some special characters and (I hope) some Marathi letters
            xmlWriter.writeCharacters("<>&\": मराठी वर्णमाला");
            xmlWriter.writeEndElement(); // element
            xmlWriter.writeCharacters("\n");
            xmlWriter.writeEndElement(); // root
            xmlWriter.writeEndDocument();
            xmlWriter.close(); // should be better in a finally block
            out.close(); // should be better handled automatically by try-with-resources
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This creates the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <element>&lt;&gt;&amp;": मराठी वर्णमाला</element>
</root>
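To connect this back to the loop in the question, the sketch below writes one element per spreadsheet row with XMLStreamWriter and then parses the bytes with XMLStreamReader to check that the Marathi text survives. It assumes the Excel rows have already been copied into plain String[] arrays, so Apache POI is not needed to run it; the class name, method names and the column names Name and Language are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class RowsToXml {
    // Writes one <element> per row and one child per header, encoded as UTF-8.
    // Empty cells become empty elements, like "<tag/>" in formatElement.
    static byte[] toXml(List<String> headers, List<String[]> rows) throws XMLStreamException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(bytes, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("root");
        for (String[] row : rows) {
            w.writeStartElement("element");
            for (int i = 0; i < headers.size(); i++) {
                String value = i < row.length ? row[i] : null;
                if (value == null || value.isEmpty()) {
                    w.writeEmptyElement(headers.get(i));
                } else {
                    w.writeStartElement(headers.get(i));
                    w.writeCharacters(value); // the writer escapes <, > and &
                    w.writeEndElement();
                }
            }
            w.writeEndElement(); // element
        }
        w.writeEndElement(); // root
        w.writeEndDocument();
        w.flush();
        w.close(); // does not close the underlying ByteArrayOutputStream
        return bytes.toByteArray();
    }

    // Parses the XML and collects the text content of every element named tag.
    static List<String> textsOf(byte[] xml, String tag) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xml));
        List<String> texts = new ArrayList<>();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals(tag)) {
                texts.add(r.getElementText()); // "" for an empty element
            }
        }
        r.close();
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        String marathi = "\u092E\u0930\u093E\u0920\u0940"; // "मराठी"
        List<String> headers = Arrays.asList("Name", "Language");
        List<String[]> rows = Arrays.asList(
                new String[] {"a < b", marathi},
                new String[] {"x", ""});
        byte[] xml = toXml(headers, rows);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(textsOf(xml, "Language").get(0).equals(marathi)); // true
    }
}
```

Writing to a byte array keeps the example self-contained; for the real file, the ByteArrayOutputStream would be replaced by the BufferedOutputStream from the example above.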
