Text manipulator: string position movement



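The task is to build a text manipulator: a program that simulates a set of text-manipulation commands. Given an input text and a string of commands, output the mutated text and the cursor position.

Starting simple:

Commands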

h: move cursor one character to the left
l: move cursor one character to the right
r<c>: replace character under cursor with <c>

Repeating commands

# All commands can be repeated N times by prefixing them with a number.
# 
# [N]h: move cursor N characters to the left
# [N]l: move cursor N characters to the right
# [N]r<c>: replace N characters, starting from the cursor, with <c> and move the cursor

Examples

# We'll use Hello World as our input text for all cases:
# 
#  Input: hhlhllhlhhll
# Output: Hello World
#           _
#           2
# 
#  Input: rhllllllrw
# Output: hello world
#               _
#               6
# 
#  Input: rh6l9l4hrw
# Output: hello world
#               _
#               6
# 
#  Input: 9lrL7h2rL
# Output: HeLLo WorLd
#            _
#            3
# 
#  Input: 999999999999999999999999999lr0
# Output: Hello Worl0
#                   _
#                  10
# 
#  Input: 999rsom
# Output: sssssssssss
#                   _
#                  10

I wrote the following code, but it produces errors:

class Editor():
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def f(self, step):
        self.pos += int(step)

    def b(self, step):
        self.pos -= int(step)

    def r(self, char):
        s = list(self.text)
        s[self.pos] = char
        self.text = ''.join(s)

    def run(self, command):
        command = list(command)
        # if command not in ('F','B', 'R'):
        #
        while command:
            operation = command.pop(0).lower()
            if operation not in ('f','b','r'):
                raise ValueError('command not recognized.')
            method = getattr(self, operation)
            arg = command.pop(0)
            method(arg)

    def __str__(self):
        return self.text

# Normal run
text = 'abcdefghijklmn'
command = 'F2B1F5Rw'
ed = Editor(text)
ed.run(command)
print(ed)

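I used "F" and "B" in my code instead of "h" and "l", but the real problem is that I am missing the part that lets me handle the optional "N". My code only works when the number comes after the operation. How can I fix the code above so that it meets all the requirements?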

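The key to this problem is figuring out how to parse the command string. In your description, a command consists of an optional number followed by one of these three possibilities:

  • h
  • l
  • r, followed by one character

A regular expression that parses this would be: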

(\d*)(h|l|r.)
Explanation:
(\d*)       Capture zero or more digits
(h|l|r.)    Capture either an h, or an l, or an r followed by any character

Using re.findall() with this regex, you get a list of matches, where each match is a tuple containing the capture groups. For example, "rh6l9l4hrw" gives the result

[('', 'rh'), ('6', 'l'), ('9', 'l'), ('4', 'h'), ('', 'rw')]

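So the first element of each tuple is a string representing N (or an empty string if it is absent), and the second element is the command. If the command is r, it also contains the replacement character after it. Now all we need to do is iterate over this list and apply the right command.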
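
As a quick check of the tuple layout described above, the following minimal sketch runs the pattern over two of the sample inputs and converts the count to an integer. The names COMMAND_RE and parse are illustrative and not part of the original code.

```python
import re

# Optional run of digits, then h, l, or r plus the replacement character.
COMMAND_RE = re.compile(r"(\d*)(h|l|r.)")

def parse(command):
    """Return (count, op) pairs; a missing count defaults to 1."""
    return [(int(n or "1"), op) for n, op in COMMAND_RE.findall(command)]

print(COMMAND_RE.findall("rh6l9l4hrw"))
# [('', 'rh'), ('6', 'l'), ('9', 'l'), ('4', 'h'), ('', 'rw')]
print(parse("9lrL7h2rL"))
# [(9, 'l'), (1, 'rL'), (7, 'h'), (2, 'rL')]
```

Note that findall() silently skips characters that do not fit the pattern, which is why the trailing "om" in "999rsom" is ignored by this approach.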

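I made a few changes:

  1. self.pos is accessed through a property whose setter handles the bounds checking.
  2. The input text is split into a list when the object is created, because a string cannot be modified in place the way a list can. __str__() joins the list back into a string.
  3. self.text is accessed through a read-only property that joins the self.__text list into a string.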

import re

class Editor():
    def __init__(self, text):
        self.__text = [char for char in text]
        self.__pos = 0

    @property
    def text(self):
        return "".join(self.__text)

    @property
    def pos(self):
        return self.__pos

    @pos.setter
    def pos(self, value):
        # Clamp the cursor to the valid index range of the text
        self.__pos = max(0, min(len(self.text)-1, value))

    def l(self, step):
        self.pos = self.pos + step

    def h(self, step):
        self.pos = self.pos - step

    def r(self, char, count=1):
        # If count causes the cursor to overshoot the text,
        # modify count
        count = min(count, len(self.__text) - self.pos)
        self.__text[self.pos:self.pos+count] = char * count
        self.pos = self.pos + count - 1 # Set position to last replaced character

    def run(self, command):
        commands = re.findall(r"(\d*)(h|l|r.)", command)

        for cmd in commands:
            self.validate(cmd)
            count = int(cmd[0] or "1") # If cmd[0] is blank, use count = 1
            if cmd[1] == "h":
                self.h(count)
            elif cmd[1] == "l":
                self.l(count)
            elif cmd[1][0] == "r":
                self.r(cmd[1][1], count)

    def validate(self, cmd):
        cmd_s = ''.join(cmd)
        if cmd[0] and not cmd[0].isnumeric():
            raise ValueError(f"Invalid numeric input {cmd[0]} for command {cmd_s}")
        elif cmd[1][0] not in "hlr":
            raise ValueError(f"Invalid command {cmd_s}: Must be either h or l or r")
        elif cmd[1] == 'r':
            # cmd is always a 2-tuple, so check the command string itself
            raise ValueError(f"Invalid command {cmd_s}: r command needs an argument")

    def __str__(self):
        return self.text

Running this with the given inputs:

commands = ["hhlhllhlhhll", "rhllllllrw", "rh6l9l4hrw", "9lrL7h2rL", "999999999999999999999999999lr0", "999rsom"]
for cmd in commands:
    e = Editor("Hello World")
    e.run(cmd)
    uline = "        " + " " * e.pos + "^"
    cline = "Cursor: " + " " * e.pos + str(e.pos)
    print(f"Input: {cmd}\nOutput: {str(e)}\n{uline}\n{cline}\n")

Input: hhlhllhlhhll
Output: Hello World
          ^
Cursor:   2

Input: rhllllllrw
Output: hello world
              ^
Cursor:       6

Input: rh6l9l4hrw
Output: hello world
              ^
Cursor:       6

Input: 9lrL7h2rL
Output: HeLLo WorLd
           ^
Cursor:    3

Input: 999999999999999999999999999lr0
Output: Hello Worl0
                  ^
Cursor:           10

Input: 999rsom
Output: sssssssssss
                  ^
Cursor:           10

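Now, if you want to do the same thing without regular expressions, you only need a way to parse the input command string into that same kind of list of tuples. You can then use the same logic as before to do the actual replacements.

Here I do that by writing a function that takes a string and returns an iterator over all the commands in it. Each element it yields is a tuple that looks like one element of the list returned by re.findall(). This lets us simply swap the call to re.findall() for our custom parser: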

def iter_command(self, command: str):
    cmd = [[], []]
    # The command is made of two segments:
    # 1. The number part
    # 2. The letters "h|l|r." part of the regex
    seg = 0 # Start with the first segment
    for cpos, char in enumerate(command):
        if seg == 0:
            if "0" <= char <= "9":
                # If the character is a number, append it to the first segment
                cmd[seg].append(char)
            elif char in "hlr":
                # Else, if the character is h or l or r, move on to the next segment
                seg = 1

        if seg == 1:
            if not cmd[seg] and char in "hlr":
                # If this segment is empty and the character is h|l|r
                cmd[seg] = [char]
                if char != "r":
                    # Convert our list of lists of characters to a tuple of strings and yield it
                    yield tuple(''.join(l) for l in cmd)
                    # Then reset cmd and seg to process the next command
                    cmd = [[], []]
                    seg = 0
                else: # char == r
                    pass # So do one more iteration
            elif cmd[seg] and cmd[seg][-1] == "r": # Command is r, so listening for any character
                cmd[seg].append(char)
                # Same yield tasks as before
                yield tuple(''.join(l) for l in cmd)
                cmd = [[], []]
                seg = 0
            else: # This is a character we don't care about
                # So do nothing with it
                if any(cmd):
                    yield tuple(''.join(l) for l in cmd)
                    cmd = [[], []]
                    seg = 0

Now let's test it against the regex from before:

commands = ["hhlhllhlhhll", "rhllllllrw", "rh6l9l4hrw", "9lrL7h2rL", "999999999999999999999999999lr0", "999rsom"]
for cmd in commands:
    e = Editor("Hello World")
    commands_custom = list(e.iter_command(cmd))
    commands_regex = re.findall(r"(\d*)(h|l|r.)", cmd)

    print(commands_custom)
    print(commands_regex)
    print(cmd)
    print(all(a == b for a, b in zip(commands_custom, commands_regex)))
    print("")

[('', 'h'), ('', 'h'), ('', 'l'), ('', 'h'), ('', 'l'), ('', 'l'), ('', 'h'), ('', 'l'), ('', 'h'), ('', 'h'), ('', 'l'), ('', 'l')]
[('', 'h'), ('', 'h'), ('', 'l'), ('', 'h'), ('', 'l'), ('', 'l'), ('', 'h'), ('', 'l'), ('', 'h'), ('', 'h'), ('', 'l'), ('', 'l')]
hhlhllhlhhll
True

[('', 'rh'), ('', 'l'), ('', 'l'), ('', 'l'), ('', 'l'), ('', 'l'), ('', 'l'), ('', 'rw')]
[('', 'rh'), ('', 'l'), ('', 'l'), ('', 'l'), ('', 'l'), ('', 'l'), ('', 'l'), ('', 'rw')]
rhllllllrw
True

[('', 'rh'), ('6', 'l'), ('9', 'l'), ('4', 'h'), ('', 'rw')]
[('', 'rh'), ('6', 'l'), ('9', 'l'), ('4', 'h'), ('', 'rw')]
rh6l9l4hrw
True

[('9', 'l'), ('', 'rL'), ('7', 'h'), ('2', 'rL')]
[('9', 'l'), ('', 'rL'), ('7', 'h'), ('2', 'rL')]
9lrL7h2rL
True

[('999999999999999999999999999', 'l'), ('', 'r0')]
[('999999999999999999999999999', 'l'), ('', 'r0')]
999999999999999999999999999lr0
True

[('999', 'rs')]
[('999', 'rs')]
999rsom
True

And since they give the same results, we only need to replace the call to re.findall():

     def run(self, command):
-        commands = re.findall(r"(\d*)(h|l|r.)", command)
+        commands = self.iter_command(command)
         for cmd in commands:
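
One caveat about the equality check used in the test above: zip() stops at the shorter input, so all(a == b for a, b in zip(...)) would still print True if one parser returned extra or missing trailing commands. A stricter check compares the two lists directly. The sketch below uses made-up sample lists, not real parser output, to show the difference.

```python
# Hypothetical parser outputs: the second list is missing its last command.
full = [("", "rh"), ("6", "l"), ("", "rw")]
truncated = [("", "rh"), ("6", "l")]

# zip() stops at the shorter list, so the length mismatch goes unnoticed.
print(all(a == b for a, b in zip(full, truncated)))  # True

# Comparing the lists directly also checks their lengths.
print(full == truncated)  # False
```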

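@paddy gave you a good suggestion, but looking at the string you need to parse, it seems to me that a regex does the job easily. For the part after parsing, the Command pattern is a good fit. After all, you have a list of operations (commands) that must be executed on the initial string.

In your case, I think using this pattern brings three main advantages:

  • Each Command represents one operation applied to the initial string. This also means that if, for example, you want to add a shortcut for a sequence of operations, the number of resulting Commands stays the same and you only have to adjust the parsing step. Another benefit is that you can keep a history of commands, and in general the design is more flexible.

  • All Commands share a common interface: a method execute() and, if necessary, a method unexecute() that undoes the changes applied by execute().

  • Commands decouple the execution of operations from the parsing problem.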
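
To illustrate the history and unexecute() points, here is a minimal sketch. It is separate from the solution in this answer: Cursor, MoveRight, and the history list are hypothetical names, and the command stores the previous position so that it can undo itself.

```python
class Cursor:
    """Hypothetical receiver: a cursor clamped to [0, length - 1]."""
    def __init__(self, length):
        self.length = length
        self.pos = 0

class MoveRight:
    """Hypothetical command with both execute() and unexecute()."""
    def __init__(self, target, count):
        self._target = target
        self._count = count
        self._previous = None

    def execute(self):
        # Remember the old position: a clamped move cannot be inverted
        # by simply subtracting the count again.
        self._previous = self._target.pos
        self._target.pos = min(self._target.length - 1,
                               self._target.pos + self._count)

    def unexecute(self):
        self._target.pos = self._previous

cursor = Cursor(length=11)
history = []
for count in (3, 20):
    cmd = MoveRight(cursor, count)
    cmd.execute()
    history.append(cmd)
print(cursor.pos)  # 10

# Undo the most recent command.
history.pop().unexecute()
print(cursor.pos)  # 3
```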


As for the implementation, first define the Commands, which contain no business logic apart from the call to the receiver's method.

from __future__ import annotations
import functools
import re
import abc
from typing import Iterable


class ICommand(abc.ABC):
    @abc.abstractmethod
    def __init__(self, target: TextManipulator):
        self._target = target

    @abc.abstractmethod
    def execute(self):
        pass


class MoveCursorLeftCommand(ICommand):
    def __init__(self, target: TextManipulator, counter):
        super().__init__(target)
        self._counter = counter

    def execute(self):
        self._target.move_cursor_left(self._counter)


class MoveCursorRightCommand(ICommand):
    def __init__(self, target: TextManipulator, counter):
        super().__init__(target)
        self._counter = counter

    def execute(self):
        self._target.move_cursor_right(self._counter)


class ReplaceCommand(ICommand):
    def __init__(self, target: TextManipulator, counter, replacement):
        super().__init__(target)
        self._replacement = replacement
        self._counter = counter

    def execute(self):
        self._target.replace_char(self._counter, self._replacement)

Then you have the receiver of the commands, TextManipulator, which contains the methods that change the text and the cursor position.

class TextManipulator:
    """
    >>> def apply_commands(s, commands_str):
    ...     return TextManipulator(s).run_commands(CommandParser.parse(commands_str))
    >>> apply_commands('Hello World', 'hhlhllhlhhll')
    ('Hello World', 2)
    >>> apply_commands('Hello World', 'rhllllllrw')
    ('hello world', 6)
    >>> apply_commands('Hello World', 'rh6l9l4hrw')
    ('hello world', 6)
    >>> apply_commands('Hello World', '9lrL7h2rL')
    ('HeLLo WorLd', 3)
    >>> apply_commands('Hello World', '999999999999999999999999999lr0')
    ('Hello Worl0', 10)
    >>> apply_commands('Hello World', '999rsom')
    Traceback (most recent call last):
    ValueError: command 'o' not recognized.
    >>> apply_commands('Hello World', '7l5r1')
    ('Hello W1111', 10)
    >>> apply_commands('Hello World', '7l4r1')
    ('Hello W1111', 10)
    >>> apply_commands('Hello World', '7l3r1')
    ('Hello W111d', 9)
    """
    def __init__(self, text):
        self._text = text
        self._cursor_pos = 0

    def replace_char(self, counter, replacement):
        assert len(replacement) == 1
        assert counter >= 0
        # Parentheses let the expression span several lines
        self._text = (self._text[0:self._cursor_pos] +
            replacement * min(counter, len(self._text) - self._cursor_pos) +
            self._text[self._cursor_pos + counter:])
        self.move_cursor_right(counter - 1)

    def move_cursor_left(self, counter):
        assert counter >= 0
        self._cursor_pos = max(0, self._cursor_pos - counter)

    def move_cursor_right(self, counter):
        assert counter >= 0
        self._cursor_pos = min(len(self._text) - 1, self._cursor_pos + counter)

    def run_commands(self, commands: Iterable[ICommand]):
        # Each item is a partial command; bind this object as its target
        for cmd in map(lambda partial_cmd: partial_cmd(target=self), commands):
            cmd.execute()
        return (self._text, self._cursor_pos)

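There is nothing hard to explain here except the run_commands method, which accepts an iterable of partial commands. These partial commands are commands that have been created without the receiver object, which should be of type TextManipulator. Why do it this way? It is one possible way to decouple parsing from command execution. I chose to do it with functools.partial, but there are other valid options.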
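
As a minimal illustration of that deferred binding, functools.partial can fix some constructor arguments at parse time and leave target to be supplied later. The Greet class below is a made-up stand-in, not one of the commands above.

```python
import functools

class Greet:
    """Made-up stand-in for a command class."""
    def __init__(self, target, counter):
        self.target = target
        self.counter = counter

    def execute(self):
        return f"{self.target} x{self.counter}"

# Parse time: the counter is known, the receiver is not.
partial_cmd = functools.partial(Greet, counter=3)

# Execution time: the receiver supplies itself as the target.
cmd = partial_cmd(target="editor")
print(cmd.execute())  # editor x3
```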


Finally, the parsing part:

class CommandParser:
    @staticmethod
    def parse(commands_str: str):
        def invalid_command(match: re.Match):
            raise ValueError(f"command '{match.group(2)}' not recognized.")

        get_counter_from_match = lambda m: int(m.group(1) or 1)

        commands_map = {
            'h': lambda match: functools.partial(MoveCursorLeftCommand,
                counter=get_counter_from_match(match)),
            'l': lambda match: functools.partial(MoveCursorRightCommand,
                counter=get_counter_from_match(match)),
            'r': lambda match: functools.partial(ReplaceCommand,
                counter=get_counter_from_match(match), replacement=match.group(3))
        }
        parsed_commands_iter = re.finditer(r'(\d*)(h|l|r(\w)|.)', commands_str)
        commands = map(lambda match:
            commands_map.get(match.group(2)[0], invalid_command)(match), parsed_commands_iter)

        return commands


if __name__ == '__main__':
    import doctest
    doctest.testmod()

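As I said at the beginning, in your case the parsing can be done with a regex, and command creation is based on the first letter of each match's second capture group. The reason is that, for character replacement, the second capture group also includes the replacement character. commands_map is accessed with match.group(2)[0] as the key and returns a partial Command. If the operation is not found in the map, a ValueError exception is raised. Each Command's arguments are derived from the re.Match object.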
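
To make the group layout concrete, this small sketch lists the three capture groups for a sample input, using the same pattern as CommandParser.parse. The names PATTERN and rows and the input string "7l3r1x" are illustrative only.

```python
import re

PATTERN = re.compile(r"(\d*)(h|l|r(\w)|.)")

rows = [(m.group(1), m.group(2), m.group(3)) for m in PATTERN.finditer("7l3r1x")]
for count, op, replacement in rows:
    # op[0] is the dispatch key; 'x' has no entry in commands_map.
    print(repr(count), repr(op), repr(replacement), "key:", op[0])
# '7' 'l' None key: l
# '3' 'r1' '1' key: r
# '' 'x' None key: x
```

The third group is None for every command except a replacement, which is why only the 'r' entry of commands_map reads match.group(3).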


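Just put all of these snippets together and you have a working solution (plus some tests, provided by the docstring and run by doctest).

In some cases this may be an over-engineered design, so I am not saying it is the right approach (for example, it probably is not if you are writing a simple tool). You can skip the Command part and use only the parsing solution, but I found this an interesting (alternative) application of the pattern.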
