我有一组有重音ascii的数据。我想把重音转换成普通的英语字母。我用下面的代码实现:
import java.text.Normalizer;
import java.util.regex.Pattern;
public String deAccent(String str) {
String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
Pattern pattern = Pattern.compile("\p{InCombiningDiacriticalMarks}+");
return pattern.matcher(nfdNormalizedString).replaceAll("");
}
但是这个代码缺少的是排除字符,我不知道如何从转换中排除某些字符,例如,我想从单词d
不要使用规范化来删除重音!
例如,以下字母没有使用您的方法进行ascii化:
ł
đ
ħ
您可能还希望将连接符(如œ
)拆分为单独的字母(即oe
)。
试试这个:
private static final String TAB_00C0 = "" +
"AAAAAAACEEEEIIII" +
"DNOOOOO×OUUUÜYTs" + // <-- note an accented letter you wanted
// and preserved multiplication sign
"aaaaaaaceeeeiiii" +
"dnooooo÷ouuuüyty" + // <-- note an accented letter and preserved division sign
"AaAaAaCcCcCcCcDd" +
"DdEeEeEeEeEeGgGg" +
"GgGgHhHhIiIiIiIi" +
"IiJjJjKkkLlLlLlL" +
"lLlNnNnNnnNnOoOo" +
"OoOoRrRrRrSsSsSs" +
"SsTtTtTtUuUuUuUu" +
"UuUuWwYyYZzZzZzs";
public static String toPlain(String source) {
StringBuilder sb = new StringBuilder(source.length());
for (int i = 0; i < source.length(); i++) {
char c = source.charAt(i);
switch (c) {
case 'ß':
sb.append("ss");
break;
case 'Œ':
sb.append("OE");
break;
case 'œ':
sb.append("oe");
break;
// insert more ligatures you want to support
// or other letters you want to convert in a non-standard way here
// I recommend to take a look at: æ þ ð fl fi
default:
if (c >= 0xc0 && c <= 0x17f) {
c = TAB_00C0.charAt(c - 0xc0);
}
sb.append(c);
}
}
return sb.toString();
}