How do I specify multi code point Unicode characters / "emoji" in PHP PCRE syntax?

Code:

var_dump(preg_replace('#\x{1F634}#u', '', 'This is the sleeping emoji: 😴'));
var_dump(preg_replace('#\x{1F1FB 1F1F3}#u', '', 'This is the Vietnam flag: 🇻🇳'));

Expected output:

string(28) "This is the sleeping emoji: "
string(26) "This is the Vietnam flag: "

Actual output:

string(28) "This is the sleeping emoji: "
string(34) "This is the Vietnam flag: 🇻🇳"

Analysis:

The single code point emoji is removed successfully, but the multi code point emoji is not matched.

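Research done:

I read the following at https://www.php.net/manual/en/regexp.reference.escape.php:

After "\x", up to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, "\x{...}" is allowed, where the contents of the braces is a string of hexadecimal digits. It is interpreted as a UTF-8 character whose code number is the given hexadecimal number. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8 character if the value is greater than 127.

Sadly, it says nothing about multi code point Unicode characters.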

Question:

How do I specify a multi code point emoji / Unicode character in PHP PCRE syntax?

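Helpful notes:

It is not a range! I am able to detect and remove ranges. This is a single emoji / Unicode character made up of multiple "code points". Quite a few of them are specified here: https://www.unicode.org/Public/emoji/13.1/emoji-sequences.txt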
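To make the "multiple code points" observation concrete, here is a minimal sketch that lists the code points of each emoji. The helper name codePoints is hypothetical, and the sketch assumes the mbstring extension is loaded and PHP 7.4 or later (for mb_str_split and arrow functions).

```php
<?php
// Hypothetical helper: list the code points of a UTF-8 string as U+XXXX labels.
// Assumes the mbstring extension and PHP 7.4+.
function codePoints(string $text): array
{
    return array_map(
        static fn(string $char): string => sprintf('U+%04X', mb_ord($char, 'UTF-8')),
        mb_str_split($text, 1, 'UTF-8')
    );
}

// "\u{...}" escapes in double-quoted strings need PHP 7.0+.
var_dump(codePoints("\u{1F634}"));          // sleeping face: one code point
var_dump(codePoints("\u{1F1FB}\u{1F1F3}")); // Vietnam flag: two code points
```

The flag is a pair of regional indicator symbols, so a pattern has to account for both code points.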

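You quoted a passage that says something like \x{...} "is interpreted as a UTF-8 character". The wording is a bit odd, because it is a Unicode code point in UTF-8 rather than a character. Since you need two code points, you also need two such sequences:

var_dump(preg_replace('#\x{1F1FB}\x{1F1F3}#u', '', 'This is the Vietnam flag: 🇻🇳'));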
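If there are many sequences to handle, the same idea can be wrapped in a small helper. This is only a sketch: sequencePattern is a hypothetical name, not part of PHP, and it simply emits one \x{...} escape per code point (PHP 7.4+ assumed for the arrow function).

```php
<?php
// Hypothetical helper: build a PCRE pattern that matches one exact
// code point sequence by emitting one \x{...} escape per code point.
function sequencePattern(array $codePoints): string
{
    $escapes = array_map(
        static fn(int $cp): string => sprintf('\x{%X}', $cp),
        $codePoints
    );

    // The "u" modifier is required for \x{...} values above 0xFF.
    return '#' . implode('', $escapes) . '#u';
}

$text = "This is the Vietnam flag: \u{1F1FB}\u{1F1F3}";

// Exact sequence: U+1F1FB followed by U+1F1F3.
var_dump(preg_replace(sequencePattern([0x1F1FB, 0x1F1F3]), '', $text));

// Alternative sketch: any pair of regional indicator symbols
// (U+1F1E6..U+1F1FF), which covers flags in general.
var_dump(preg_replace('#[\x{1F1E6}-\x{1F1FF}]{2}#u', '', $text));
```

Both calls should leave only the 26-byte prefix "This is the Vietnam flag: ". The second pattern is looser: it removes any two consecutive regional indicators, whether or not they form a valid flag.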
