How do I include special characters (tab, newline) in a Python doctest result string?

Given the following Python script:

# dedupe.py
import re

def dedupe_whitespace(s, spacechars='\t '):
    """Merge repeated whitespace characters.
    Example:
    >>> dedupe_whitespace(r"Green\t\tGround")  # doctest: +REPORT_NDIFF
    'Green\tGround'
    """
    for w in spacechars:
        s = re.sub(r"("+w+"+)", w, s)
    return s

The function works as expected in the Python interpreter:

$ python
>>> import dedupe
>>> dedupe.dedupe_whitespace('Purple\t\tHaze')
'Purple\tHaze'
>>> print(dedupe.dedupe_whitespace('Blue\t\tSky'))
Blue    Sky

However, the doctest example fails, because tab characters are converted to spaces before the comparison with the result string:

>>> import doctest, dedupe
>>> doctest.testmod(dedupe)

which gives

Failed example:
    dedupe_whitespace(r"Green           Ground")  #doctest: +REPORT_NDIFF
Differences (ndiff with -expected +actual):
    - 'Green  Ground'
    ?       -
    + 'Green Ground'
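
To see where the spaces come from, it can help to look at what the doctest parser does with a hard tab before anything is executed. The following is a minimal sketch using `doctest.DocTestParser` directly; the `text` variable and its sample content are made up for illustration and are not part of the script above.

```python
import doctest

# Docstring-like text: the call contains the two characters "\" and "t",
# while the expected-output line contains a real (hard) tab.
text = ">>> print('a\\tb')\na\tb\n"

parser = doctest.DocTestParser()
example = parser.get_examples(text)[0]

# The source line keeps its backslash escape untouched.
print(repr(example.source))
# The expected output no longer contains a tab: it was expanded to spaces.
print(repr(example.want))
```

The expected output is rewritten at parse time, while the output produced by the tested code is left alone, which is why the two no longer match.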

How can I encode tab characters in the doctest heredoc string so that the test result comparison is performed correctly?

I got this working by using raw string literal notation for the docstring:

def join_with_tab(iterable):
    r"""
    >>> join_with_tab(['1', '2'])
    '1\t2'
    """
    return '\t'.join(iterable)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

It's the raw heredoc string notation (r""") that did the trick:

# filename: dedupe.py
import re, doctest

def dedupe_whitespace(s, spacechars='\t '):
    r"""Merge repeated whitespace characters.
    Example:
    >>> dedupe_whitespace('Black\t\tGround')  #doctest: +REPORT_NDIFF
    'Black\tGround'
    """
    for w in spacechars:
        s = re.sub(r"("+w+"+)", w, s)
    return s

if __name__ == "__main__":
    doctest.testmod()

TL;DR: Escape the backslash, i.e. use \\n or \\t instead of \n or \t in your otherwise unmodified strings.

You probably don't want to make your docstrings raw, because then you can't use any Python string escapes, including the ones you might actually want.

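For an approach that still supports normal escapes, just escape the backslash in the backslash-character escape. After Python interprets the docstring, that leaves a literal backslash followed by the character, which doctest can then parse.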
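
As a rough illustration of the double-escaping approach, here is a sketch that reuses the `dedupe_whitespace` function from the question with a normal (non-raw) docstring. Running the docstring through `DocTestParser.get_doctest` and `DocTestRunner` is just one convenient way to check it in isolation; `doctest.testmod()` would work as well.

```python
import doctest
import re


def dedupe_whitespace(s, spacechars='\t '):
    """Merge repeated whitespace characters.

    The backslashes below are escaped, so doctest sees the text \\t
    (backslash, then t) rather than a hard tab.

    >>> dedupe_whitespace('Black\\t\\tGround')
    'Black\\tGround'
    """
    for w in spacechars:
        s = re.sub("(" + w + "+)", w, s)
    return s


# Build a DocTest from the docstring and run it without printing a report.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    dedupe_whitespace.__doc__,
    {"dedupe_whitespace": dedupe_whitespace},
    "dedupe_whitespace",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

Because the docstring is not raw, other escapes remain available in it when needed.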

This is basically YatharhROCK's answer, but a bit more explicit. You can use raw strings or double escaping. But why?

You need the string literal to contain valid Python code that, once interpreted, is the code you want to run/test. Both of these work:

#!/usr/bin/env python

def split_raw(val, sep='\n'):
  r"""Split a string on newlines (by default).
  >>> split_raw('alpha\nbeta\ngamma')
  ['alpha', 'beta', 'gamma']
  """
  return val.split(sep)

def split_esc(val, sep='\n'):
  """Split a string on newlines (by default).
  >>> split_esc('alpha\\nbeta\\ngamma')
  ['alpha', 'beta', 'gamma']
  """
  return val.split(sep)

import doctest
doctest.testmod()

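Using a raw string and using double escaping (escaping the backslash) both leave two characters in the docstring: a backslash and an n. That text is what gets passed to the Python interpreter, which reads "backslash then n" inside a string literal as a newline character.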
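
The equivalence can be checked directly. This is a small sketch; the variable names are made up for illustration, and `ast.literal_eval` stands in for the interpreter evaluating the example's source text.

```python
import ast

# Two spellings of the same docstring text: raw, and with the backslash escaped.
raw = r">>> split_raw('alpha\nbeta')"
escaped = ">>> split_raw('alpha\\nbeta')"
print(raw == escaped)

# Either way the text holds a backslash followed by "n": two characters.
print(len(r"\n"))

# Only when that text is evaluated as Python source does it become a newline.
value = ast.literal_eval(r"'alpha\nbeta'")
print(repr(value))
```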

Use whichever you prefer.

You must set NORMALIZE_WHITESPACE. ~~Or, alternatively, capture the output and compare it to an expected value:~~

def dedupe_whitespace(s, spacechars='\t '):
    """Merge repeated whitespace characters.
    Example:
    >>> output = dedupe_whitespace(r"Black\t\tGround")  #doctest: +REPORT_NDIFF
    >>> output == 'Black\tGround'
    True
    """

From the doctest documentation, section "How are Docstring Examples Recognized?":

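All hard tab characters are expanded to spaces, using 8-column tab stops. Tabs in output generated by the tested code are not modified. Because any hard tabs in the sample output are expanded, this means that if the code output includes hard tabs, the only way the doctest can pass is if the NORMALIZE_WHITESPACE option or directive is in effect. Alternatively, the test can be rewritten to capture the output and compare it to an expected value as part of the test. This handling of tabs in the source was arrived at through trial and error, and has proven to be the least error prone way of handling them. It is possible to use a different algorithm for handling tabs by writing a custom DocTestParser class.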
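
A small sketch of what the quoted passage describes: the same example is run once with exact matching and once with NORMALIZE_WHITESPACE. The `show_columns` function and the `run` helper are hypothetical names made up for this illustration; the flag is passed through `DocTestRunner(optionflags=...)` here, but a `# doctest: +NORMALIZE_WHITESPACE` directive on the example should have the same effect.

```python
import doctest


def show_columns():
    """Print two tab-separated columns.

    The docstring is not raw, so the expected line holds a hard tab,
    which doctest expands to spaces when parsing.

    >>> show_columns()
    name\tvalue
    """
    print("name\tvalue")


def run(optionflags):
    """Run the docstring examples of show_columns with the given flags."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        show_columns.__doc__, {"show_columns": show_columns},
        "show_columns", None, 0,
    )
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None)


strict = run(0)
relaxed = run(doctest.NORMALIZE_WHITESPACE)
print(strict.failed, relaxed.failed)
```

Note that NORMALIZE_WHITESPACE treats any run of whitespace as equal to any other, so it cannot tell one tab from two; for a function like dedupe_whitespace that limits what the test actually verifies.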

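Edit: My mistake, I understood the documentation the other way around. Tabs are expanded to 8 spaces both in the string argument passed to dedupe_whitespace and in the string literal being compared on the next line, so output contains: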

"Black Ground"

and is compared against

"Black        Ground"

I can't find a way around this limitation short of writing your own DocTestParser or testing for deduplicated spaces instead of tabs.

I got it to work by escaping the tab characters in the expected string:

>>> function_that_returns_tabbed_text()
'\\t\\t\\tsometext\\t\\t'

instead of

>>> function_that_returns_tabbed_text()
\t\t\tsometext\t\t
