Replacing escaped Unicode with Elisp



By calling the Google Dictionary API from Emacs (http://www.google.com/dictionary/json?callback=cb&q=word&sl=en&tl=en&restrict=pr%%2Cde&client=te), I get a response like the following:

"entries": [{
    "type": "example",
    "terms": [{
        "type": "text",
        "text": "his grandfather\x27s \x3cem\x3ewords\x3c/em\x3e had been meant kindly",
        "language": "en"
    }]
}]

As you can see, the "text" field contains escaped Unicode. I would like to convert those escapes with a function like the one below.

(defun unescape-string (string)
    "Return STRING with its escaped Unicode sequences unescaped."
    ...
)
(unescape-string "his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e")
=> "his grandfather's <em>words</em>"
(insert #x27)'
(insert #x3c)<
(insert #x3e)>

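Here is what I have tried:

  • replace-regexp-in-string
  • a custom replacement, as in http://www.emacswiki.org/emacs/ElispCookbook#toc33

However, I don't know how to replace something like "\x123" with the corresponding Unicode character in a buffer or string.

Thanks in advance.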

This seems to be the simplest way:

(read (princ "\"his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e had been meant kindly\""))
;; "his grandfather's ώm>words</em> had been meant kindly"

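It is also interesting that Emacs parses \x3ce rather than \x3c. I'm not sure whether this is a bug or intended behavior: I always assumed that no more than two characters would be read after \x, but the reader evidently keeps consuming hex digits.

If you still want to use the read + princ combination, you need to end the hex escape explicitly so that Emacs does not parse more characters. A backslash followed by a space does this without adding a character to the string, like this: \x3c\ em. Or here is something I could quickly come up with: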

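(defun replace-c-escape-codes (input)
  "Replace each \\xHH escape in INPUT with the character it encodes."
  (replace-regexp-in-string
   ;; A literal backslash, then x, then exactly two hex digits.
   "\\\\x[[:xdigit:]][[:xdigit:]]"
   (lambda (match)
     ;; Skip the leading \x and decode the two hex digits.
     (make-string 1 (string-to-number (substring match 2) 16)))
   ;; FIXEDCASE and LITERAL are t so the result is inserted verbatim.
   input t t))
;; Backslashes are doubled so the string holds literal \x escapes.
(replace-c-escape-codes "his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e")
;; "his grandfather's <em>words</em>"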
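
The read-based approach can also be made to cope with hex digits that follow an escape. As far as I know, the Elisp reader treats a backslash followed by a space inside a string literal as a terminator that adds no character, so one option is to append it to every escape before reading. A minimal sketch, assuming every escape has exactly two hex digits and the input contains no double quotes or other backslashes; the name unescape-via-read is made up for illustration:

```elisp
(defun unescape-via-read (input)
  "Decode \\xHH escapes in INPUT by handing it to the Lisp reader.
Hypothetical helper: assumes two-digit escapes and no double quotes
or stray backslashes in INPUT."
  (let ((terminated
         (replace-regexp-in-string
          "\\\\x[[:xdigit:]][[:xdigit:]]"
          ;; \& is the matched escape; \\ adds a literal backslash,
          ;; and the trailing space completes the "\ " terminator.
          "\\&\\\\ "
          input t)))
    ;; Wrap the text in double quotes and read it as a string literal.
    (read (concat "\"" terminated "\""))))

(unescape-via-read "his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e")
;; "his grandfather's <em>words</em>"
```

Compared with decoding the digits by hand, this leans on the reader for the conversion, so it is only safe on input you trust not to contain quotes or other escape sequences.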
