Matching a repeatable string as a "whole word" in Lua 5.1

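My environment:

Lua 5.1
I absolutely cannot use libraries with a native component (such as a C .so/.dll).
I can run arbitrary pure Lua 5.1 code, but I have no access to os and several other packages that would expose the native file system, shell commands or anything similar, so all functionality must be implemented in Lua itself (and only Lua).
I have already managed to pull in LuLpeg. I can probably pull in other pure-Lua libraries as well.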

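I need to write a function that returns true if the input string contains an arbitrary sequence of letters and digits as a whole word that is repeated one or more times, possibly with punctuation at the beginning or end of the entire matching substring. I use "whole word" in the same sense as the PCRE word boundary \b.

To demonstrate the idea, here is an incorrect attempt using the re module of LuLpeg; it seems to work with negative lookaheads but not with negative lookbehinds: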

function containsRepeatingWholeWord(input, word)
    return re.match(input:gsub('[%a%p]+', ' %1 '), '%s*[^%s]^0{"' .. word .. '"}+[^%s]^0%s*') ~= nil
end

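Here are example strings and the expected return values (the quotes are syntactic, as if typed into the Lua interpreter, not a literal part of the string; this is done to make leading and trailing spaces visible):

Input: " one !tvtvtv! two", word: tv, return value: true
Input: "I'd", word: d, return value: false
Input: "tv", word: tv, return value: true
Input: " tvtv! ", word: tv, return value: true
Input: " epon ", word: nope, return value: false
Input: " eponnope ", word: nope, return value: false
Input: "atv", word: tv, return value: false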

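If I had a full PCRE regex library I could do this quickly, but I don't, because I can't link to C and I haven't found any pure-Lua implementation of PCRE or anything similar.

I'm not sure whether LPEG is flexible enough (either using LPEG directly or through its re module) to do what I want, but I'm fairly sure the built-in Lua functions can't, because they cannot handle a repeated sequence of characters: (tv)+ does not work with Lua's built-in string:match function and similar functions.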
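The limitation mentioned here can be checked directly. A minimal sketch, assuming nothing beyond the standard string library: in a Lua pattern a quantifier applies only to a single character class, so the + that follows a capture group is treated as a literal plus sign rather than as "one or more of the group". The variable names are illustrative only.

```lua
-- In a Lua pattern, "(tv)+" means: capture "tv", then match a literal "+".
local group_plus = "(tv)+"

-- The subject repeats "tv" but contains no "+" character, so nothing matches.
local on_repeated = ("tvtv"):match(group_plus)

-- The subject literally contains "tv+", so the capture "tv" is returned.
local on_literal = ("tv+"):match(group_plus)

print(on_repeated)  -- nil
print(on_literal)   -- tv
```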

Resources I have been scouring to try to figure out how to do this, to no avail:

http://www.inf.puc-rio.br/~roberto/lpeg/re.html
http://www.lua.org/manual/5.2/manual.html#6.4.1
http://lua-users.org/wiki/FrontierPattern (unfortunately not supported by my interpreter)
http://lua-users.org/wiki/PatternsTutorial
http://www.gammon.com.au/lpeg
http://lua-users.org/wiki/StringRecipes
How to check if a word appears as a whole word in a string in Lua

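I think the pattern doesn't work reliably because the %s*[^%s]^0 part matches an optional run of whitespace characters followed by non-whitespace characters, and then it tries to match the reduplicated word and fails. After that, it does not move backwards or forwards in the string to try matching the reduplicated word at another position. The semantics of LPeg and re are very different from those of most regular expression engines, even for things that look similar.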
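The "does not try another position" behaviour can be imitated in plain Lua. This is only an analogy, not LPeg code: an LPeg pattern is attempted at one starting position and does not scan forward by itself, so any searching has to be written out explicitly. The helper names match_here and search are hypothetical.

```lua
-- Try the literal text at exactly one position, without scanning forward.
local function match_here(subject, text, init)
  return subject:sub(init, init + #text - 1) == text
end

-- Searching has to be spelled out: retry the anchored attempt at each position.
local function search(subject, text)
  for init = 1, #subject do
    if match_here(subject, text, init) then
      return init
    end
  end
  return nil
end

print(match_here("a tv", "tv", 1))  -- false
print(search("a tv", "tv"))         -- 3
```

In LPeg the retry is usually folded into the pattern itself, by adding an alternative that consumes one character and loops.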

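Here is a re-based version. The pattern has a single capture (the reduplicated word), so if the reduplicated word is found, matching returns a string rather than a number.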

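-- Assumes the re module has already been loaded into `re`.
function f(str, word)
    -- %word refers to the `word` entry of the definitions table passed below.
    local patt = re.compile([[
        match_global <- repeated / ( [%s%p] repeated / . )+
        repeated <- { %word+ } (&[%s%p] / !.) ]],
        { word = word })
    -- A hit yields the capture (a string); otherwise a position or nil.
    return type(patt:match(str)) == 'string'
end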

This is a bit convoluted because vanilla re has no way to generate an lpeg.B pattern.

Here is an lpeg version that uses lpeg.B. It also works with LuLPeg.

local lpeg = require 'lpeg'
lpeg.locale(lpeg)  -- adds lpeg.space, lpeg.punct, etc.

-- Match-time check: succeeds only at the very start of the subject.
local function is_at_beginning(_, pos)
    return pos == 1
end

function find_reduplicated_word(str, word)
    local B, C, Cmt, P, V = lpeg.B, lpeg.C, lpeg.Cmt, lpeg.P, lpeg.V
    local non_word = lpeg.space + lpeg.punct
    local patt = P {
        -- Try `repeated` at each position, otherwise skip one character.
        (V 'repeated' + 1)^1,
        repeated = (B(non_word) + Cmt(true, is_at_beginning))
            * C(P(word)^1)
            * #(non_word + P(-1))
    }
    return type(patt:match(str)) == 'string'
end

-- table.unpack does not exist in Lua 5.1; fall back to the global unpack.
local unpack = table.unpack or unpack

for _, test in ipairs {
    { 'tvtv', true },
    { ' tvtv', true },
    { ' !tv', true },
    { 'atv', false },
    { 'tva', false },
    { 'gun tv', true },
    { '!tv', true },
} do
    local str, expected = unpack(test)
    local result = find_reduplicated_word(str, 'tv')
    if result ~= expected then
        print(result)
        print(('"%s" should%s match but did%s')
            :format(str, expected and "" or "n't", expected and "n't" or ""))
    end
end

Lua patterns are powerful enough.
LPEG is not needed here.

This is your function:

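function f(input, word)
    -- Replace each occurrence of the word with a zero byte, pad with spaces,
    -- then look for: space, optional punctuation, zero bytes, optional punctuation, space.
    return (" "..input:gsub(word:gsub("%%", "%%%%"), "\0").." "):find"%s%p*%z+%p*%s" ~= nil
end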

Here is a test of the function:

for _, t in ipairs{
    {input = " one !tvtvtv! two", word = "tv", return_value = true},
    {input = "I'd", word = "d", return_value = false},
    {input = "tv", word = "tv", return_value = true},
    {input = "   tvtv!  ", word = "tv", return_value = true},
    {input = " epon ", word = "nope", return_value = false},
    {input = " eponnope ", word = "nope", return_value = false},
    {input = "atv", word = "tv", return_value = false},
} do
    assert(f(t.input, t.word) == t.return_value)
end
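
To see why a single find call is enough, it can help to look at the intermediate string. The following is a sketch of the same idea split into steps, assuming Lua 5.1 and a word made only of letters and digits, with each occurrence of the word replaced by a zero byte so that the %z class can stand in for "the word". The function name explain and its second return value are hypothetical additions for illustration.

```lua
local function explain(input, word)
  -- Step 1: escape "%" so the word is taken literally by gsub.
  local literal = word:gsub("%%", "%%%%")
  -- Step 2: replace every occurrence of the word with a zero byte.
  local marked = input:gsub(literal, "\0")
  -- Step 3: pad with spaces so the string start and end count as whitespace.
  local padded = " " .. marked .. " "
  -- Step 4: whitespace, optional punctuation, one or more zero bytes,
  -- optional punctuation, whitespace.
  local found = padded:find("%s%p*%z+%p*%s") ~= nil
  return found, marked
end

local ok, marked = explain(" one !tvtvtv! two", "tv")
print(ok)                        -- true
print((marked:gsub("%z", "#")))  -- " one !###! two", with # shown in place of each zero byte
```

Because the whole run of zero bytes must sit between whitespace (allowing only punctuation in between), an input such as "atv" fails: after substitution the zero byte is preceded by the letter a, not by whitespace or punctuation.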
