Java Unicode regex does not match German characters

This question follows on from an earlier question.

I am using \P{M}\p{M}* to match all letters (including German and French ones).

I chose this regex to avoid having to spell out every Unicode range by hand, as in: ^[a-zA-Z[\u00c0-\u01ff]]+[\']?(([-]?[a-zA-Z[\u00c0-\u01ff]]*[\s]?)|([\s]?[a-zA-Z[\u00c0-\u01ff]]*[-]?)){1,2}[a-zA-Z[\u00c0-\u01ff]]+$

However, even though I use the Unicode syntax suggested in the previous question, the regex does not match characters such as ß or è.

I am using JDK 6.

What am I missing? Thanks!

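Use the Unicode character class \p{L} ("any letter"). Note that inside a Java string literal the backslash has to be doubled: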

System.out.println("abcßè".matches("\\p{L}+")); // true
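
One case where \p{L}+ alone can fall short is text in decomposed form, where è is stored as "e" followed by the combining mark U+0300. The following is a minimal sketch of that difference, not part of the original answer; the class and method names are made up for illustration, and the letters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

class LetterClassDemo {
    // True if the whole input consists only of Unicode letters (general category L).
    public static boolean isAllLetters(String s) {
        return s.matches("\\p{L}+");
    }

    // Also accepts combining marks directly after a letter, e.g. "e" + U+0300.
    public static boolean isAllLettersWithMarks(String s) {
        return s.matches("(?:\\p{L}\\p{M}*)+");
    }

    public static void main(String[] args) {
        // "abc" followed by sharp s (U+00DF) and e with grave (U+00E8), precomposed.
        String composed = "abc\u00DF\u00E8";
        // NFD splits the precomposed e-grave into "e" + combining grave accent.
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(isAllLetters(composed));            // true
        System.out.println(isAllLetters(decomposed));          // false: U+0300 is a mark, not a letter
        System.out.println(isAllLettersWithMarks(decomposed)); // true
    }
}
```

If the input may arrive in either form, normalizing it to NFC first with java.text.Normalizer (available since Java 6) is another option.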

With Java 6, this code

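 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 public class RegexDemo {
     public static void main(String[] args) {
         String str = "hello ß you";
         // One non-mark code point followed by any combining marks, repeated
         Pattern p = Pattern.compile("(?:\\P{M}\\p{M}*)+");
         Matcher matcher = p.matcher(str);
         System.out.println("replaced: '" + matcher.replaceAll("") + "'");
     }
 }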

prints: replaced: ''

so "ß" is matched.

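Note that the whole string is consumed in that output, spaces included: \P{M} means "any code point that is not a combining mark", so it also matches spaces, digits and punctuation. If the goal is to match letters only, a variant built on \p{L} is closer to that. A small sketch comparing the two follows; the class name, method names and sample string are hypothetical.

```java
class GraphemeVsLetterDemo {
    // Removes every run of "non-mark followed by marks", i.e. practically everything.
    public static String stripGraphemes(String s) {
        return s.replaceAll("(?:\\P{M}\\p{M}*)+", "");
    }

    // Removes only letters (plus any combining marks attached to them).
    public static String stripLetters(String s) {
        return s.replaceAll("(?:\\p{L}\\p{M}*)+", "");
    }

    public static void main(String[] args) {
        // U+00DF is the German sharp s, written as an escape to avoid encoding issues.
        String str = "hello \u00DF you 42";
        System.out.println("'" + stripGraphemes(str) + "'"); // ''
        System.out.println("'" + stripLetters(str) + "'");   // '   42'
    }
}
```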