Identify and replace selected spaces in a text file

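I'm new to sed and its features. I need to selectively replace spaces with "," in a file whose contents are shown below. I don't want to replace the spaces inside "" (double quotes), but every other space needs to be replaced.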

File contents

my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"

Pattern used to replace spaces with "," using sed: sed 's/ /,/g'

Actual output

my,data,"this,is,my,very,first,encounter,with,sed",,"valuable",-,-,"c,l,e,a,r"

Expected output

my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"

Below is a commented sed script, fed the sample line from bash through a here-string:

<<<'my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"' sed -E '
# Split the input so that each character is on its own line
s/./&\n/g;
# Add a newline on the end to separate output from input
s/$/\n/;
# Each line has one character
# Add a leading character that stores "state"
# There are two states: inside quotes or outside quotes
# The state character is a space when we are not in quotes
# The state character is a double quote when we are in quotes
s/^/ /;
# For each character in input
:again; {
# Replace a space that is not in quotes with a comma
s/^  / ,/
# When a quote is encountered and we are not in quotes
/^ "/{
# Change state to quotes
s//""/
b removed_quotes
} ; {
# When a quote is encountered and we are in quotes
# then we are no longer in quotes
s/^""/ "/
} ; : removed_quotes
# Preserve state as the first character
# Add the parsed character to the output on the end
# Preserve the rest
s/^(.)(.)\n(.*)/\1\3\2/;
# If end of input was not reached, then parse another character.
/^.\n/!b again;
};
# Remove the leading state character with the newline
s///;
'

Output:

my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"

And a one-liner, because who reads comments anyway:

sed -E 's/./&\n/g;s/$/\n/;s/^/ /;:a;s/^  / ,/;/^ "/{s//""/;bq;};s/^""/ "/;:q;s/^(.)(.)\n(.*)/\1\3\2/;/^.\n/!ba;s///'

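I believe that \n as a newline in the replacement part of the s command is an extension not required by POSIX. Any other character that is guaranteed not to appear in the input could be used instead of the newline to separate the input while parsing. In any case, I tested it with GNU sed.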
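If portability matters, the same per-character quote-state idea can be sketched in POSIX awk instead of sed. Because awk can index characters directly, no separator character is needed at all. This is a swapped-in tool, not the sed script above; the function name commas_outside_quotes is hypothetical, and like the sed version it assumes quotes are balanced on each line.

```shell
# Hypothetical helper: walk each line one character at a time, keeping a
# flag that says whether we are currently inside double quotes.
commas_outside_quotes() {
    awk '{
        out = ""
        inq = 0                           # 0 = outside quotes, 1 = inside
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"")
                inq = !inq                # every quote toggles the state
            else if (c == " " && !inq)
                c = ","                   # unquoted space becomes a comma
            out = out c
        }
        print out
    }' "$@"                               # reads files, or stdin if none
}
```

Usage: commas_outside_quotes file. The state resets at the start of every line, just as sed starts each line with a fresh pattern space.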

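As mentioned in the comments, this is a job better suited to a real CSV parser than to cobbling something together with regular expressions, especially sed's fairly basic ones.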

A one-liner in perl using the handy Text::AutoCSV module (install it through your OS package manager or your favorite CPAN client):

$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(sep_char=>" ", out_sep_char=>",")->write' < input.txt
my,data,"this is my very first encounter with sed",,valuable,-,-,"c l e a r"

Using GNU awk for FPAT:

$ awk -v FPAT='[^ ]*|"[^"]+"' -v OFS=',' '{$1=$1} 1' file
my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"

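Your input is CSV whose field separator is a blank character instead of the traditional comma, and you are just trying to convert it to comma-separated CSV. See "What's the most robust way to efficiently parse CSV using awk?" for more information on the above and on parsing CSV with awk in general.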
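FPAT is a GNU awk extension. With other awks, the field splitting for this particular input can be roughly emulated with POSIX match() and index(). This sketch assumes fields are separated by single blanks, as in the sample: a field that starts with a double quote runs to the next double quote, otherwise it is the run of non-blank characters, possibly empty, which is what turns two adjacent blanks into an empty field. The function name fpat_to_csv is hypothetical, and this is not a full reimplementation of FPAT.

```shell
# Hypothetical helper: rough POSIX-awk emulation of gawk's
# FPAT='[^ ]*|"[^"]+"' for blank-separated input, joining fields with commas.
fpat_to_csv() {
    awk '{
        rest = $0
        out = ""
        sep = ""
        while (1) {
            if (match(rest, /^"[^"]+"/)) {
                field = substr(rest, 1, RLENGTH)   # whole quoted string
            } else {
                i = index(rest, " ")
                field = i ? substr(rest, 1, i - 1) : rest   # non-blank run, maybe empty
            }
            out = out sep field
            sep = ","
            rest = substr(rest, length(field) + 1)
            if (substr(rest, 1, 1) != " ")
                break                   # no blank follows: that was the last field
            rest = substr(rest, 2)      # drop the single separating blank
        }
        print out
    }' "$@"
}
```

Usage: fpat_to_csv file. Quoted fields keep their quotes, matching the gawk output above.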

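awk 'BEGIN {RS=ORS="\""} NR%2 {gsub(" ",",")} {printf "%s%s", (NR>1 ? ORS : ""), $0}' file

In the BEGIN block, set the double quote as both the record separator and the output record separator.
For odd-numbered records, i.e. the text outside the quotes, globally replace every space with a comma.
Print each record, writing the double quote between records rather than after each one, so no stray quote is added after the final newline.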
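To see why record parity tracks the quoting, this sketch prints every record awk sees when RS is a double quote, numbered and labeled by parity. The function name show_quote_records is hypothetical.

```shell
# Hypothetical helper: split input on double quotes and show each record.
# Odd-numbered records lie outside quotes, even-numbered ones inside.
show_quote_records() {
    awk 'BEGIN { RS = "\"" }
         { printf "%d %s [%s]\n", NR, (NR % 2 ? "outside" : "inside"), $0 }' "$@"
}
```

On the sample line, records 1, 3 and 5 are "my data ", "  " and " - - ", while records 2, 4 and 6 are the quoted texts. A seventh record holds only the line's trailing newline, because with a single-character RS the newline is ordinary data.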

This might work for you (GNU sed):

sed -E ':a;s/^((("[^"]*")*[^" ]*)*) /\1,/;ta' file

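Replace a group (zero or more double-quoted strings followed by zero or more characters that are neither spaces nor double quotes, the whole thing repeated zero or more times) followed by a space with that group followed by a comma, and repeat until the substitution fails.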
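Not every sed accepts a label terminated by ";" (BSD sed, for example, reads everything up to the newline as the label name), and older seds lack -E. A sketch of the same loop for such seds splits the script into separate -e arguments, so each label and branch ends a script line, and spells the regex in POSIX basic regular expression syntax. The function name commas_outside_quotes_sed is hypothetical.

```shell
# Hypothetical helper: the substitution loop above in POSIX BRE.
# Each pass turns the first space outside quotes into a comma; "ta"
# jumps back to ":a" while a substitution succeeded.
commas_outside_quotes_sed() {
    sed -e ':a' \
        -e 's/^\(\(\("[^"]*"\)*[^" ]*\)*\) /\1,/' \
        -e 'ta' "$@"
}
```

Because [^" ] cannot match a quote, every quote in the prefix must pair up as a "..." string from the left, so the match always ends at the first space outside quotes.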
