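I am new to sed and its functionality. I need to selectively replace spaces with "," in a file whose content is shown below. I do not want to replace the spaces inside the " " pairs, but all other spaces need to be replaced.
File content
my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"
Pattern used to replace spaces with "," using sed: sed 's/ /,/g'
Actual output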
my,data,"this,is,my,very,first,encounter,with,sed",,"valuable",-,-,"c,l,e,a,r"
Expected output
my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"
The following commented sed script, fed by a bash here-string:
<<<'my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"' sed -E '
    # Split input with each character on its own line
    s/./&\n/g;
    # Add a newline on the end to separate output from input
    s/$/\n/;
    # Each line has one character
    # Add a leading character that stores "state"
    # There are two states available - in quoting or not in quoting
    # The state character is space when we are not in quotes
    # The state character is double quote when we are in quotes
    s/^/ /;
    # For each character in input
    :again; {
        # Substitute a space that is not in quotes for a comma
        # (pattern: state space followed by an input space)
        s/^  / ,/
        # When a quote is encountered and we are not in quotes
        /^ "/{
            # Change state to quotes
            s//""/
            b removed_quotes
        } ; {
            # When a quote is encountered and we are in quotes
            # then we are no longer in quotes
            s/^""/ "/
        } ; : removed_quotes
        # Preserve state as the first character
        # Add the parsed character to the output on the end
        # Preserve the rest
        s/^(.)(.)\n(.*)/\1\3\2/;
        # If end of input was not reached, then parse another character.
        /^.\n/!b again;
    };
    # Remove the leading state character with the newline
    # (the empty regex reuses the last one matched, /^.\n/)
    s///;
'
Output:
my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"
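And a one-liner, because who reads these comments anyway:
sed -E 's/./&\n/g;s/$/\n/;s/^/ /;:a;s/^  / ,/;/^ "/{s//""/;bq;};s/^""/ "/;:q;s/^(.)(.)\n(.*)/\1\3\2/;/^.\n/!ba;s///'
I think that a newline written as \n in the replacement string of the s command is an extension not required by POSIX. Another unique character could be used instead of a newline to separate the input while parsing. Anyway, I tested it with GNU sed.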
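The two-state idea used by the sed script above (inside quotes or outside quotes) can also be sketched as a plain character loop in awk. This is only an illustration, not the script from the answer: the function name spaces_to_commas is made up, and it assumes the quotes are balanced and never escaped.

```shell
# Hypothetical helper: reads lines on stdin and replaces every space
# outside double quotes with a comma, one character at a time.
spaces_to_commas() {
  awk '{
    out = ""
    in_quotes = 0                      # state: 0 = outside quotes, 1 = inside
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") {
        in_quotes = !in_quotes         # a quote toggles the state
      } else if (c == " " && !in_quotes) {
        c = ","                        # unquoted space becomes a comma
      }
      out = out c
    }
    print out
  }'
}

printf '%s\n' 'a b "c d"  e' | spaces_to_commas
# prints: a,b,"c d",,e
```

The state lives in an ordinary variable here instead of in the first character of the pattern space, which is the main thing the sed version has to work around.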
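As mentioned in the comments, this is a better fit for an actual CSV parser than for trying to kludge something together with regular expressions, especially sed's rather basic regular expressions.
A one-liner in perl using the useful Text::AutoCSV module (install it through your OS package manager or favorite CPAN client):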
$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(sep_char=>" ", out_sep_char=>",")->write' < input.txt
my,data,"this is my very first encounter with sed",,valuable,-,-,"c l e a r"
Using GNU awk for FPAT:
$ awk -v FPAT='[^ ]*|"[^"]+"' -v OFS=',' '{$1=$1} 1' file
my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"
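Your input is a CSV in which the separating "character" is a blank instead of the traditional "comma", and you are just trying to convert it to a comma-separated CSV. See "What's the most robust way to efficiently parse CSV using awk?" for more information on the above and on parsing CSV with awk in general.
awk 'BEGIN {RS=ORS="\""} NR%2 {gsub(" ",",")} {print}' file
- In the BEGIN block, set the double quote as both the input and the output record separator
- For odd-numbered records, i.e. the ones outside quotes, globally replace every space with a comma
- Print every record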
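To see why the odd-numbered records are the ones outside quotes, it can help to print the records awk produces when the double quote is the record separator. A small sketch, assuming balanced quotes; the function name show_records is made up for the illustration:

```shell
# Hypothetical helper: print each record with its number when the
# input is split on double quotes.
show_records() {
  awk 'BEGIN { RS = "\"" } { printf "%d [%s]\n", NR, $0 }'
}

printf '%s' 'a b "c d" e' | show_records
# prints:
# 1 [a b ]
# 2 [c d]
# 3 [ e]
```

Records 1 and 3 are the unquoted stretches, so only those get their spaces replaced; record 2 is the quoted text and is printed untouched.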
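This might work for you (GNU sed):
sed -E ':a;s/^((("[^"]*")*[^" ]*)*) /\1,/;ta' file
Match, from the start of the line, a group made of zero or more double-quoted strings followed by zero or more characters that are neither quotes nor spaces, the whole repeated zero or more times, followed by a space. Replace it with the group followed by a comma, and repeat until the substitution fails.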
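The loop can be followed step by step by applying the same substitution once per pass from the shell instead of looping inside sed. This is just a tracing sketch; the function name trace_steps is made up, and it assumes a sed that supports -E.

```shell
# Hypothetical helper: apply the substitution one pass at a time and
# print the line after each pass, stopping when nothing changes.
trace_steps() {
  line=$1
  while :; do
    next=$(printf '%s\n' "$line" | sed -E 's/^((("[^"]*")*[^" ]*)*) /\1,/')
    [ "$next" = "$line" ] && break
    line=$next
    printf '%s\n' "$line"
  done
}

trace_steps 'a b "c d" e'
# prints:
# a,b "c d" e
# a,b,"c d" e
# a,b,"c d",e
```

Each pass replaces the first remaining unquoted space: the group can only get past a space by consuming a complete quoted string, so spaces inside quotes are never the one that matches.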