Splitting an accented string with an unaccented query



I want to split a string that contains accents using a query that has none.

This is my current code:

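const sanitizer = (text: string): string => {
  // Decompose accented characters (NFD), drop the diacritic marks, then lowercase.
  return text
    .normalize("NFD")
    .replace(/\p{Diacritic}/gu, "")
    .toLowerCase();
};

const splitter = (text: string, query: string): string[] => {
  // Match either the raw query or its sanitized form; the capture groups keep the matches in the result.
  const regexWithQuery = new RegExp(`(${query})|(${sanitizer(query)})`, "gi");
  return text.split(regexWithQuery).filter((value) => value);
};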

Here is the test file:

import { splitter } from "@/utils/arrayHelpers";

describe("arrayHelpers", () => {
  describe("splitter", () => {
    const cases = [
      {
        text: "pepe dominguez",
        query: "pepe",
        expectedArray: ["pepe", " dominguez"],
      },
      {
        text: "pépé dominguez",
        query: "pepe",
        expectedArray: ["pépé", " dominguez"],
      },
      {
        text: "pepe dominguez",
        query: "pépé",
        expectedArray: ["pepe", " dominguez"],
      },
      {
        text: "pepe dominguez",
        query: "pe",
        expectedArray: ["pe", " pe", " dominguez"],
      },
      {
        text: "pepe DOMINGUEZ",
        query: "DOMINGUEZ",
        expectedArray: ["pepe ", "DOMINGUEZ"],
      },
    ];
    it.each(cases)(
      "should return an array of strings with 2 elements [pepe, dominguez]",
      ({ text, query, expectedArray }) => {
        // When I call the splitter function
        const textSplitted = splitter(text, query);
        // Then I must have an array of two elements
        expect(textSplitted).toStrictEqual(expectedArray);
      }
    );
  });
});

The problem is the second case:

{
  text: "pépé dominguez",
  query: "pepe",
  expectedArray: ["pépé", " dominguez"],
}

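It fails because the sanitized form of the query pepe is still pepe, which does not occur in pépé dominguez. I don't know how to make the splitter function return ['pépé', ' dominguez'] in this case.

I am looking for a result taken from the original text, not from the sanitized text.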
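The mismatch described above can be reproduced in isolation. The following is a minimal sketch that reuses the sanitizer from the question; the names originalText, plainQuery, foundInOriginal and foundInSanitized are made up for illustration. It suggests that the sanitized query only occurs in the sanitized text, which would explain why a regex built from it finds nothing to split on in the original.

```typescript
// Same sanitizer as in the question: decompose, strip diacritics, lowercase.
const sanitizer = (text: string): string =>
  text.normalize("NFD").replace(/\p{Diacritic}/gu, "").toLowerCase();

// Illustrative inputs (hypothetical names); \u00e9 is the precomposed "é".
const originalText = "p\u00e9p\u00e9 dominguez";
const plainQuery = "pepe";

// Sanitizing an unaccented query leaves it unchanged.
const sanitizedQuery = sanitizer(plainQuery);

// The unaccented query does not occur in the accented original text...
const foundInOriginal = originalText.includes(sanitizedQuery);

// ...but it does occur once the text itself has been sanitized.
const foundInSanitized = sanitizer(originalText).includes(sanitizedQuery);

console.log({ sanitizedQuery, foundInOriginal, foundInSanitized });
```

Matching against the sanitized text would find the query, but the pieces returned would then come from the sanitized text rather than the original, which is what the question wants to avoid.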

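The only option I can think of is to keep a map of the possible accented variants for each letter, and then build the query pattern dynamically: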

// Build a pattern in which each letter may also match one of its accented options
const sanitizeQuery = (query) => {
  const sanitizerMap = {
    'e': ['é']
  };
  return query
    .split('')
    .map(l =>
      sanitizerMap[l] !== undefined
        ? `(?:${l}|${sanitizerMap[l].join('|')})`
        : l
    )
    .join('');
};

// Split text by a sanitized query
const splitter = (text, query) => {
  const regexWithQuery = new RegExp(`(${sanitizeQuery(query)})`, "gi");
  return text.split(regexWithQuery).filter((value) => value);
};

// Test
const query = 'pepe';
console.log('Query Regex:', sanitizeQuery(query));
console.log('Output:', splitter('pépé dominguez', query));

You can optimize this by storing each letter's options in a string instead of an array.
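One way to read that suggestion is to turn each letter into a regex character class. The sketch below is only an illustration under assumptions: accentVariants, escapeRegExp, buildAccentPattern and splitByQuery are hypothetical names, and the map entries are a small sample rather than a complete list of accented letters.

```typescript
// Hypothetical map: each base letter lists its accented variants as one string.
// The entries are illustrative only and far from exhaustive.
const accentVariants: Record<string, string> = {
  a: "àáâä",
  e: "èéêë",
  i: "ìíîï",
  o: "òóôö",
  u: "ùúûü",
};

// Escape regex metacharacters so other query characters are matched literally.
const escapeRegExp = (s: string): string =>
  s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Turn each known letter into a character class such as [eèéêë].
const buildAccentPattern = (query: string): string =>
  query
    .split("")
    .map((letter) => {
      const variants = accentVariants[letter.toLowerCase()];
      return variants !== undefined
        ? `[${letter}${variants}]`
        : escapeRegExp(letter);
    })
    .join("");

// Split the original text, keeping the matched parts via the capture group.
const splitByQuery = (text: string, query: string): string[] => {
  const regex = new RegExp(`(${buildAccentPattern(query)})`, "gi");
  return text.split(regex).filter((piece) => piece !== "");
};

console.log(splitByQuery("pépé dominguez", "pepe")); // ["pépé", " dominguez"]
```

A character class needs no inner group at all, so nothing extra can be captured. This sketch only widens unaccented query letters; an accented query such as pépé against unaccented text is not covered by it.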

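Tip: ?: in a regex means the group's match is not captured. Without it, every matched letter would also appear in the output array. Read more here: What is a non-capturing group in regular expressions?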
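The effect of ?: on split can be seen by running the same pattern with and without it. This is a small sketch; the names sample, withNonCapturing and withCapturing are chosen here for illustration and the patterns are written out by hand rather than generated.

```typescript
const sample = "pépé dominguez";

// Inner groups are non-capturing: only the outer group's match is kept.
const withNonCapturing = sample
  .split(/(p(?:e|é)p(?:e|é))/gi)
  .filter((piece) => piece !== "");

// Inner groups are capturing: split() also emits what each inner group matched.
const withCapturing = sample
  .split(/(p(e|é)p(e|é))/gi)
  .filter((piece) => piece !== "");

console.log(withNonCapturing); // ["pépé", " dominguez"]
console.log(withCapturing); // ["pépé", "é", "é", " dominguez"]
```

String.prototype.split inserts the text of every capture group into the result, so each additional capturing group adds an extra element per match.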
