"Illegal byte sequence" in bash: how do I find it?

When I try to sort a text file on OS X (bash), I get the following error:

sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were '\363\272\331DR\371' and '201310'.

The web is full of advice to set LC_ALL as the error message suggests. However, I want to find where this illegal byte sequence actually is.

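I'm parsing data from many third parties and normalizing it in various ways before writing it to a single file, which is eventually uploaded to a database. There shouldn't be any fancy characters in this data, so this error tells me something is being corrupted somewhere in the process. But I can't find it!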

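I've tried to "split" the file into smaller and smaller pieces so that I could spot the character visually, but I couldn't. I can't grep for it, and I can't find it in vim or Sublime Text either.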

Any ideas on how I can locate this corruption?

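Does this help? Search for the exact bytes from the error message. The octal escapes need their backslashes, printf interprets \nnn escapes in its format string (bash's echo -e would need \0nnn), and LC_ALL=C makes grep match raw bytes instead of decoding them in the current locale:

LC_ALL=C grep -n "$(printf '\363\272\331DR\371')" filename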

To automate this, you could consider modifying the collate_error function in the coreutils source:

coreutils-8.23/lib/xmemcoll.c

static void
collate_error (int collation_errno,
               char const *s1, size_t s1len,
               char const *s2, size_t s2len)
{
  error (0, collation_errno, _("string comparison failed"));
  error (0, 0, _("Set LC_ALL='C' to work around the problem."));
  error (exit_failure, 0,
         _("The strings compared were %s and %s."),
         quotearg_n_style_mem (0, locale_quoting_style, s1, s1len),
         quotearg_n_style_mem (1, locale_quoting_style, s2, s2len));
}

That way, at least, you can easily have it write all the offending lines to a dump file for inspection.
