A regex to build acronyms on word boundaries while dropping characters that precede a word

Go Version

go version go1.16.7 linux/amd64

I am working on an exercise about creating acronyms, and I chose to solve it with a regular expression.

Some of the test cases I was given are:

input:    "Ruby on Rails",
expected: "ROR"

input:    "GNU Image Manipulation Program",
expected: "GIMP"
input:    "Complementary metal-oxide semiconductor",
expected: "CMOS"
input:    "Something - I made up from thin air",
expected: "SIMUFTA"
input:    "Halley's Comet",
expected: "HC"
input:    "The Road _Not_ Taken",
expected: "TRNT"

The code below passes many of the simple tests, where each word starts with a capital letter that it extracts and joins into the acronym:

Portable Network Graphics -> PNG

Code

// Package acronym creates an acronym based on Capitalized Letters
package acronym

import (
	"regexp"
	"strings"
)

// Abbreviate creates an acronym for a full form string
func Abbreviate(s string) string {
	// \b is a word boundary, so this matches the first letter of every word.
	re := regexp.MustCompile(`\b[A-Za-z]`)
	abbreviation := strings.Join(re.FindAllString(s, -1), "")
	return strings.ToUpper(abbreviation)
}

The only tests that fail are:

=== RUN   TestAcronym
acronym_test.go:11: Acronym test [Halley's Comet], expected [HC], actual [HSC]
acronym_test.go:11: Acronym test [The Road _Not_ Taken], expected [TRNT], actual [TRT]
--- FAIL: TestAcronym (0.00s)

Regex101 playground

Link to the playground on Regex101

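I can't figure out how to get only `HC` for the `Halley's Comet` test case, and how to pick up the `N` in the `The Road _Not_ Taken` test case.
One reason I have to keep the lowercase characters `[a-z]` is the `Complementary metal-oxide semiconductor` case; other test cases also contain words that start with a lowercase letter.

I could strip characters such as `-` or `_` before running the regexp, but I don't think that would make my function more general (as opposed to merely passing the tests).
How can I skip the characters `'` and `_` so that the acronym function is more robust?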

You can use

// Abbreviate creates an acronym for a full form string
func Abbreviate(s string) string {
	var abbreviation = ""
	re := regexp.MustCompile(`\w'\w|(?:_|\b)([A-Za-z])`)
	for _, match := range re.FindAllStringSubmatch(s, -1) {
		// match[1] is empty when the \w'\w alternative matched.
		abbreviation = abbreviation + match[1]
	}
	return strings.ToUpper(abbreviation)
}

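See the Go demo. Details:

`\w'\w` - a word character, `'`, and a word character. This alternative consumes an apostrophe between word characters without capturing anything, so the letter after the apostrophe is not picked up. If it interferes with subsequent matches, replace it with `\b'\w`.
`|` - or
`(?:_|\b)` - `_` or a word boundary
`([A-Za-z])` - Group 1: an ASCII letter (use `\p{L}` to match any Unicode letter).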
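To make the difference concrete, here is a minimal sketch that runs the question's pattern and the pattern from this answer side by side on the inputs quoted in the question. The names `naive`, `skipping` and `AbbreviateNaive` are made up for this illustration, and the program is laid out as a standalone `package main` rather than the `acronym` package from the exercise.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// naive is the question's pattern: any ASCII letter right after a word boundary.
// (Hypothetical name, used only in this sketch.)
var naive = regexp.MustCompile(`\b[A-Za-z]`)

// skipping is the pattern from the answer. The first alternative consumes
// letter-apostrophe-letter without capturing; the second captures a letter
// that follows "_" or a word boundary.
var skipping = regexp.MustCompile(`\w'\w|(?:_|\b)([A-Za-z])`)

// AbbreviateNaive joins every whole match of the question's pattern.
func AbbreviateNaive(s string) string {
	return strings.ToUpper(strings.Join(naive.FindAllString(s, -1), ""))
}

// Abbreviate joins capture group 1 of every match. Group 1 is the empty
// string whenever the apostrophe alternative matched, so it adds nothing.
func Abbreviate(s string) string {
	var b strings.Builder
	for _, m := range skipping.FindAllStringSubmatch(s, -1) {
		b.WriteString(m[1])
	}
	return strings.ToUpper(b.String())
}

func main() {
	inputs := []string{
		"Ruby on Rails",
		"GNU Image Manipulation Program",
		"Complementary metal-oxide semiconductor",
		"Something - I made up from thin air",
		"Halley's Comet",
		"The Road _Not_ Taken",
	}
	// Print both results per input so the two patterns can be compared.
	for _, in := range inputs {
		fmt.Printf("%q: naive=%s skipping=%s\n", in, AbbreviateNaive(in), Abbreviate(in))
	}
}
```

The two failures in the question come from how `\b` treats `'` and `_`: `'` is not a word character, so there is a boundary right before the `s` in `Halley's`, while `_` is a word character, so there is no boundary between `_` and `N` in `_Not_`. The sketch compiles both patterns once at package level instead of on every call, which is a design choice of this example and not something the answer requires.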
