How to apply sed to grep results in a bash script



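I want to modify a CSV file with a few sed commands, but only on the lines that match a certain regular expression.

I have a grep command that works fine in a script: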

#!/usr/bin/bash   
   
egrep  '^[A-Z][a-z]*,2018' happiness.csv

and the sed commands I need, which also work correctly:

#!/usr/bin/bash   
       
sed  -re '
 s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
 s/[a-z]/\U&/g
 s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
' happiness.csv

When I combine them in one script, the grep command is effectively ignored and the script only applies the sed commands:

#!/usr/bin/bash   
   
egrep  '^[A-Z][a-z]*,2018' happiness.csv
sed  -re '
 s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
 s/[a-z]/\U&/g
 s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
' happiness.csv

Sample data:

Country name,Year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect,Confidence in national government,Democratic Quality,Delivery Quality,Standard deviation of ladder by country-year,Standard deviation/Mean of ladder by country-year,GINI index (World Bank estimate),"GINI index (World Bank estimate), average 2000-16","gini of household income reported in Gallup, by wp5-year","Most people can be trusted, Gallup","Most people can be trusted, WVS round 1981-1984","Most people can be trusted, WVS round 1989-1993","Most people can be trusted, WVS round 1994-1998","Most people can be trusted, WVS round 1999-2004","Most people can be trusted, WVS round 2005-2009","Most people can be trusted, WVS round 2010-2014"
Afghanistan,2008,3.723589897,7.168690205,0.450662315,50.79999924,0.718114316,0.177888572,0.88168633,0.517637193,0.25819549,0.61207211,-1.929689646,-1.655084372,1.774661899,0.476599723,,,,,,,,,,
Afghanistan,2009,4.401778221,7.333789825,0.55230844,51.20000076,0.678896368,0.200178429,0.850035429,0.583925605,0.23709242,0.611545205,-2.044092655,-1.635024786,1.722687602,0.391361743,,,0.441905767,0.286315262,,,,,,
Afghanistan,2018,4.75838089,7.386628628,0.539075196,51.59999847,0.60012722,0.13435255,0.706766069,0.61826545,0.275323808,0.299357414,-1.991810083,-1.617176056,1.878621817,0.394802749,,,0.327318162,0.275832713,,,,,,
Afghanistan,2011,3.83171916,7.415018559,0.521103561,51.91999817,0.495901406,0.172136664,0.731108546,0.611387312,0.267174691,0.307385713,-1.919018269,-1.616221189,1.78535974,0.465942234,,,0.336764246,,,,,,,
Afghanistan,2012,3.782937527,7.517126083,0.520636737,52.24000168,0.530935049,0.244272724,0.775619805,0.710384727,0.267919123,0.435440153,-1.842995763,-1.40407753,1.798283219,0.47536689,,,0.344539613,,,,,,,
Afghanistan,2013,3.572100401,7.522237778,0.48355186,52.56000137,0.577955365,0.070402659,0.8232041,0.620584846,0.273328096,0.482847273,-1.879708767,-1.403035522,1.223689914,0.342568725,,,0.304368466,,,,,,,
Afghanistan,2014,3.130895615,7.516955376,0.525568426,52.88000107,0.508514047,0.113184482,0.871241987,0.531691492,0.374860734,0.409047514,-1.773256779,-1.312502503,1.395396113,0.445685923,,,0.413973927,,,,,,,
Afghanistan,2015,3.982854605,7.500538826,0.528597236,53.20000076,0.388927579,0.089090675,0.880638301,0.553553164,0.339276046,0.260557145,-1.84436357,-1.29159379,2.16061759,0.542479634,,,0.59691757,,,,,,,
Albania,2018,4.220168591,7.497038364,0.559071779,53,0.522566199,0.051364917,0.793245554,0.564952672,0.348332286,0.324989557,-1.855426311,-1.392712831,1.796219468,0.42562741,,,0.418629497,,,,,,,

Expected output:

COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-2014" 
AFGHANISTAN,2018,2.69,7.49,-0.50,52.59,0.37,-0.08,0.92,0.42,0.40,0.36,NULL,NULL,1.40,0.52,NULL,NULL,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
ALBANIA,2018,4.63,9.07,-0.82,65.80,0.52,-0.01,0.87,0.55,0.24,0.30,-0.04,-0.42,1.76,0.38,NULL,0.30,NULL,NULL,NULL,NULL,0.24,0.23,NULL,
ARGENTINA,2018,5.48,9.16,-0.83,66.19,0.52,-0.16,0.86,0.64,0.27,NULL,0.04,-0.26,1.91,0.34,NULL,0.30,0.61,0.11,NULL,NULL,0.24,0.23,NULL,

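You can search with the same regex you used in egrep, this time as a sed address, and make sure all the substitution commands are grouped in braces so they only run on matching lines: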

sed -nE '1p; /^[A-Z][a-z]*,2018/ {
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]+/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/gp
}' happiness.csv
AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL
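
As a small self-contained illustration of the address-plus-braces idea, here is a sketch that runs on a made-up four-line CSV instead of the real happiness.csv (the file contents and the Note column are invented for the demo). It also upper-cases the header before printing it, which the expected output in the question seems to want. It assumes GNU sed, because \U is a GNU extension.

```shell
#!/usr/bin/env bash
# Hypothetical miniature of happiness.csv, created only for this demo.
demo=$(mktemp)
cat > "$demo" <<'EOF'
Country name,Year,Life Ladder,Note
Afghanistan,2009,4.401778221,
Afghanistan,2018,4.75838089,
Albania,2018,4.220168591,x
EOF

# -n turns off automatic printing, so only the explicit p commands emit lines.
# Line 1 (header): upper-case it, then print it.
# Lines matching the address: run the grouped substitutions, then print.
result=$(sed -nE '
1 { s/[a-z]+/\U&/g; p; }
/^[A-Z][a-z]*,2018/ {
  s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
  s/[a-z]+/\U&/g
  s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
  p
}' "$demo")
rm -f "$demo"
printf '%s\n' "$result"
```

The 2009 row is never printed because it matches neither address, and the header is left alone by the second block because it does not contain ,2018 right after the first field.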

I'm not a bash expert, but this should work:

#!/usr/bin/bash   
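# keep only the rows the question's regex selects
grep_res=$(egrep  '^[A-Z][a-z]*,2018' happiness.csv)
# no file argument: sed reads the selected rows from the pipe
echo "$grep_res" | sed  -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
'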

What it does is store grep's output in the grep_res variable and then feed it to the sed command on standard input.
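
The intermediate variable is not strictly needed. As a rough sketch, grep can be piped straight into sed; the demo file below is made up (it is not the real happiness.csv), and grep -E is used as the non-deprecated spelling of egrep. Note that the header line is dropped here, because it does not match the pattern. GNU sed is assumed for \U.

```shell
#!/usr/bin/env bash
# Hypothetical miniature of happiness.csv, created only for this demo.
demo=$(mktemp)
cat > "$demo" <<'EOF'
Country name,Year,Life Ladder,Note
Afghanistan,2009,4.401778221,
Afghanistan,2018,4.75838089,
Albania,2018,4.220168591,x
EOF

# grep selects the rows; its standard output becomes sed's standard input.
# sed gets no file name, so it edits only the lines grep passed along.
piped=$(grep -E '^[A-Z][a-z]*,2018' "$demo" | sed -E '
  s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
  s/[a-z]/\U&/g
  s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
')
rm -f "$demo"
printf '%s\n' "$piped"
```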

Here is the same solution as a script for the standard Linux awk (gawk).

It also handles the first (header) line.

script.awk

{ $0 = toupper($0) }   # upper-case each incoming line
/^[A-Z]*,2018/ || NR == 1 {    # the header line, or a row matching /^[A-Z]*,2018/
  # gensub() is gawk-specific; back-references are written "\\1" inside a string
  $0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace ,, with ,NULL,
  $0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace remaining ,, with ,NULL,
  # \. matches a literal dot, so year ranges such as 1981-1984 are left intact
  $0 = gensub(/([0-9]+\.[0-9])([0-9])([0-9])*/, "\\1\\2", "g", $0); # truncate decimals to two places
  print $0; # print output line
}

Running

 awk -f script.awk happiness.csv

Output

$ awk -f script.awk happiness.csv
COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-2014"
AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL