Regex for consolidating multiple rules



I'm looking at optimising my string manipulation code and, if possible, consolidating all of my replaceAll calls into a single pattern.

Rules -

  • strip all special characters except -
  • replace spaces with -
  • condense consecutive -'s into a single -
  • remove leading and trailing -'s

My code -

public static String slugifyTitle(String value) {
    String slugifiedVal = null;
    if (StringUtils.isNotEmpty(value))
        slugifiedVal = value
                .replaceAll("[ ](?=[ ])|[^-A-Za-z0-9 ]+", "") // strips all special chars except -
                .replaceAll("\\s+", "-") // converts spaces to -
                .replaceAll("--+", "-"); // replaces consecutive -'s with just one -
    slugifiedVal = StringUtils.stripStart(slugifiedVal, "-"); // strips leading -
    slugifiedVal = StringUtils.stripEnd(slugifiedVal, "-"); // strips trailing -
    return slugifiedVal;
}

It does the job, but it obviously looks shoddy.

My test assertions -

Heading with symbols *~!@#$%^&()_+-=[]{};',.<>?/ ==> heading-with-symbols

Heading with an asterisk* ==> heading-with-an-asterisk

Custom-id-&-stuff ==> custom-id-stuff

--Custom-id-&-stuff-- ==> custom-id-stuff

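Disclaimer: I don't think a regex approach to this problem is wrong, nor that what follows is objectively better. I am merely presenting an alternative approach as food for thought.

I tend to lean against regex solutions to problems where you have to ask how to solve them with regex, because that implies you will struggle to maintain the solution in the future. Regexes have a certain opacity: "just do this" is obvious, but only once you already know to just do this.

Some problems that are typically solved with regex, like this one, can be solved with imperative code instead. It tends to be more verbose, but it uses simple, obvious code constructs; it is easier to debug; and it can be faster, because it doesn't involve the full regex engine machinery.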


// Requires: import java.util.Locale;
static String slugifyTitle(String value) {
    boolean appendHyphen = false;
    StringBuilder sb = new StringBuilder(value.length());
    // Go through value one character at a time...
    for (int i = 0; i < value.length(); i++) {
        char c = value.charAt(i);
        if (isAppendable(c)) {
            // We have found a character we want to include in the string.
            if (appendHyphen) {
                // We previously found character(s) that we want to append a single
                // hyphen for.
                sb.append('-');
                appendHyphen = false;
            }
            sb.append(c);
        } else if (requiresHyphen(c)) {
            // We want to replace hyphens or spaces with a single hyphen.
            // Only append a hyphen if it's not going to be the first thing in the output.
            // Doesn't matter if this is set for trailing hyphen/whitespace,
            // since we then never hit the "isAppendable" condition.
            appendHyphen = sb.length() > 0;
        } else {
            // Other characters are simply ignored.
        }
    }
    // You can lowercase when appending the character, but `Character.toLowerCase()`
    // recommends using `String.toLowerCase` instead.
    return sb.toString().toLowerCase(Locale.ROOT);
}

// Some predicate on characters you want to include in the output.
static boolean isAppendable(char c) {
    return (c >= 'A' && c <= 'Z')
        || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9');
}

// Some predicate on characters you want to replace with a single '-'.
static boolean requiresHyphen(char c) {
    return c == '-' || Character.isWhitespace(c);
}

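(This code is heavily over-commented for the purpose of explaining it in this answer. Strip out the comments and the unnecessary parts, like the empty else, and it's actually not super complicated.)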
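For comparison, here is a rough sketch of what the stripped-down version might look like. The class name `CompactSlug` is made up for illustration, the `requiresHyphen` predicate is inlined, and the input is assumed to be non-null (as in the commented version above).

```java
import java.util.Locale;

// Hypothetical wrapper class, only here to make the sketch self-contained.
class CompactSlug {
    static String slugifyTitle(String value) {
        boolean appendHyphen = false;
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (isAppendable(c)) {
                // Emit at most one pending hyphen before the next kept character.
                if (appendHyphen) {
                    sb.append('-');
                    appendHyphen = false;
                }
                sb.append(c);
            } else if (c == '-' || Character.isWhitespace(c)) {
                // Never start the output with a hyphen.
                appendHyphen = sb.length() > 0;
            }
        }
        return sb.toString().toLowerCase(Locale.ROOT);
    }

    static boolean isAppendable(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        System.out.println(slugifyTitle("--Custom-id-&-stuff--")); // custom-id-stuff
    }
}
```

Trailing hyphens and whitespace only ever set the pending flag, so nothing is appended for them and no separate trimming step is needed.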

Consider the following regex parts:

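  • Any special characters other than -: [\p{S}\p{P}&&[^-]]+ (character class subtraction)
  • Any one or more whitespace characters or hyphens: [\s-]+ (this will be replaced with a single -)
  • You will still need to remove leading and trailing hyphens, which will be a separate post-processing step. If you wish, you can use the ^-+|-+$ regex for that.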
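To see the first part in isolation, the following small sketch applies only the character class subtraction. The class and method names (`ClassSubtractionDemo`, `stripSpecials`) are hypothetical; the pattern is the one listed above, written as a Java string literal.

```java
// Hypothetical demo class illustrating [\p{S}\p{P}&&[^-]]+ on its own.
class ClassSubtractionDemo {
    // Removes symbols (\p{S}) and punctuation (\p{P}), but keeps the hyphen.
    static String stripSpecials(String s) {
        return s.replaceAll("[\\p{S}\\p{P}&&[^-]]+", "");
    }

    public static void main(String[] args) {
        // '_', '&' and '!' are punctuation and get removed; '-' is excluded by &&[^-].
        System.out.println(stripSpecials("a_b-c&d!")); // ab-cd
    }
}
```

Whitespace is neither a symbol nor punctuation, so it is left for the second pattern to handle.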

So you can only reduce this to three .replaceAll calls while keeping the code precise and readable:

public static String slugifyTitle(String value) {
    String slugifiedVal = null;
    if (value != null && !value.trim().isEmpty())
        slugifiedVal = value.toLowerCase()
                .replaceAll("[\\p{S}\\p{P}&&[^-]]+", "") // strips all special chars except -
                .replaceAll("[\\s-]+", "-") // converts spaces/hyphens to -
                .replaceAll("^-+|-+$", ""); // remove trailing/leading hyphens
    return slugifiedVal;
}

See the Java demo:

List<String> strs = Arrays.asList("Heading with symbols *~!@#$%^&()_+-=[]{};',.<>?/",
        "Heading with an asterisk*",
        "Custom-id-&-stuff",
        "--Custom-id-&-stuff--");
for (String str : strs) {
    System.out.println("\"" + str + "\" => " + slugifyTitle(str));
}

Output:

"Heading with symbols *~!@#$%^&()_+-=[]{};',.<>?/" => heading-with-symbols
"Heading with an asterisk*" => heading-with-an-asterisk
"Custom-id-&-stuff" => custom-id-stuff
"--Custom-id-&-stuff--" => custom-id-stuff

Note: if your strings can contain any Unicode whitespace, replace "[\\s-]+" with "(?U)[\\s-]+".
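As a small illustration of that note, the sketch below compares the two patterns on a non-breaking space (U+00A0) and an em space (U+2003). The class and method names are invented for the example; `(?U)` is the embedded form of `Pattern.UNICODE_CHARACTER_CLASS`.

```java
// Hypothetical demo class comparing \s with and without the (?U) flag.
class UnicodeWhitespaceDemo {
    // Without (?U), \s only matches space, tab, newline, vertical tab, form feed and carriage return.
    static String hyphenateAscii(String s) {
        return s.replaceAll("[\\s-]+", "-");
    }

    // With (?U), \s matches any character with the Unicode White_Space property.
    static String hyphenateUnicode(String s) {
        return s.replaceAll("(?U)[\\s-]+", "-");
    }

    public static void main(String[] args) {
        String input = "a\u00A0b\u2003c";
        System.out.println(hyphenateAscii(input).equals(input)); // true (nothing replaced)
        System.out.println(hyphenateUnicode(input));             // a-b-c
    }
}
```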
