Linux, bash: determine the line number of a cell with specific content in a specific column



Determine the line number of the cell in a specific column that has specific content.

Notes:

  • The column header counts as a line
  • An empty field in the column counts as a line
  • The fields of the csv are separated by commas

Given:

The following csv file is given: file.csv

col_o2g,col_dgjdhu,col_of_interest,,
1234567890,tg75fjksfh,$kj56hahb,,
dsewsf,1234567890,,,
khhhdg,5gfj578fj,1234567890,,
,57ijf6ehg,46h%sgf,,
ubthfgfv,zts576fufj,256hf%(",,
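To make the numbering convention from the notes concrete, the sketch below recreates the sample file.csv in a temporary directory (the mktemp location is only for the demo) and prints awk's record number NR next to the third field. The header comes out as line 1 and the row with an empty third field still gets its own number, so the cell holding 1234567890 sits on line 4.

```shell
#!/usr/bin/env bash
# Recreate the sample file.csv in a throwaway directory (demo only).
tmp=$(mktemp -d)
cat > "$tmp/file.csv" <<'EOF'
col_o2g,col_dgjdhu,col_of_interest,,
1234567890,tg75fjksfh,$kj56hahb,,
dsewsf,1234567890,,,
khhhdg,5gfj578fj,1234567890,,
,57ijf6ehg,46h%sgf,,
ubthfgfv,zts576fufj,256hf%(",,
EOF

# NR counts every physical line: the header is line 1, and a line whose
# third field is empty still gets a number.
awk -F',' '{ printf "%d: [%s]\n", NR, $3 }' "$tmp/file.csv"
```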

Given variables:

# col variable
col=col_of_interest
# variable with the value of the field of interest
value_of_interest=1234567890
# output variable
# that's the part I am looking for
wanted_line_number= 

What I have:

LINE_CNT=$(awk '-F[\t ]*,[\t ]*' -v col="$col" '
FNR==1 {
for(i=1; i<=NF; ++i) {
if($i == col) {
col = i;
break;
}
}
if(i>NF) {
exit 1;
}
}
FNR>1 {
if($col) maxc=FNR;
}
END{
print maxc;
}' file.csv)
echo line count of lines from column $col  
echo "$LINE_CNT"
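As written, this awk program stores FNR in maxc whenever the chosen column is non-empty (and not numerically zero), so it prints the last such line, 6 for file.csv, rather than the line that holds the value. A minimal sketch of one way to adapt it, assuming the value is passed in as an extra awk variable named val (a name introduced here) and using c for the column index instead of overwriting col, compares the field with that value and prints every matching line number:

```shell
#!/usr/bin/env bash
col=col_of_interest
value_of_interest=1234567890

# Sample data from the question, written to a throwaway file (demo only).
tmp=$(mktemp -d)
printf '%s\n' \
  'col_o2g,col_dgjdhu,col_of_interest,,' \
  '1234567890,tg75fjksfh,$kj56hahb,,' \
  'dsewsf,1234567890,,,' \
  'khhhdg,5gfj578fj,1234567890,,' \
  ',57ijf6ehg,46h%sgf,,' \
  'ubthfgfv,zts576fufj,256hf%(",,' > "$tmp/file.csv"

wanted_line_number=$(awk -F'[\t ]*,[\t ]*' -v col="$col" -v val="$value_of_interest" '
FNR == 1 {
  # Find the index of the header field whose name equals col.
  for (i = 1; i <= NF; ++i) if ($i == col) { c = i; break }
  if (!c) exit 1          # column name not found
  next                    # do not test the header itself
}
$c == val { print FNR }   # FNR counts the header as line 1
' "$tmp/file.csv")

echo "The wanted line number are:"
echo "$wanted_line_number"
```

If the value occurs on several lines, each line number is printed on its own line.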

Desired output:

echo "The wanted line number are:"
echo $wanted_line_number

Output: 4

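I've been trying to work through your question, so let me know whether I got it right. I assume that in your case you don't know how many columns the csv file has, nor whether the first line is a header.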

For the second point I have no automatic solution, so you need to state through an input parameter whether line 1 is a header.

Let me show you a test case:

$ more test.csv
col_1,col_2,col_3,col_4
1234567890,tg75fjksfh,kj56hahb,dkdkdkd
dsewsf,1234567890,,dkdkdk
khhhdg,5gfj578fj,1234567890,akdkdkd
ubthfgfv,zts576fufj,256hf,,

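Next, you want to know the position of the column of interest in the csv and the line(s) on which the value of interest appears. Here is my sample script (it can be improved). Keep in mind that I have hard-coded the test.csv file name into the script.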

$ cat check_csv.sh
#!/bin/bash
column_of_interest=$1
value_of_interest=$2
with_header=$3
# check which column is the one
if [[ $with_header = "Y" ]];
then
    num_cols=$(head -n 1 test.csv | awk -F',' '{ print NF }')
    echo "csv contains $num_cols columns"
    iteration=0
    # one header field per line (tr needs '\n', not 'n')
    for i in $(head -n 1 test.csv | tr ',' '\n')
    do
        iteration=$((iteration + 1))
        counter=$(echo "$i" | grep -ciE "$column_of_interest")
        if [ "$counter" -eq 1 ]
        then
            echo "Column of interest $i is located on number $iteration"
            my_col_is=$iteration
        fi
    done
    # find the lines that contain the value of interest (line 1 is the header)
    iteration=0
    while IFS= read -r line
    do
        iteration=$((iteration + 1))
        if [[ $iteration -gt 1 ]];
        then
            is_there=$(echo "$line" | awk -v temp="$my_col_is" -F ',' '{print $temp}' | grep -ciE "$value_of_interest")
            if [ "$is_there" -gt 0 ];
            then
                echo "Value of interest $value_of_interest is present on line $iteration"
            fi
        fi
    done < test.csv
fi
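The script matches with grep -iE, which treats its argument as a case-insensitive regular expression that may match anywhere in the field: asking for col_1 would also select a column named col_10, and 1234567890 would also match 91234567890. A sketch of an exact (still case-insensitive) header lookup, using a hypothetical helper find_col and a throwaway demo.csv:

```shell
#!/usr/bin/env bash
# find_col FILE NAME (hypothetical helper): prints the 1-based position of
# the header field equal to NAME, ignoring case but requiring a whole,
# literal match instead of a regex/substring match.
find_col() {
  local file=$1 name=$2
  # One header field per line; grep -n prefixes "N:", -F literal string,
  # -x whole-line match, -i ignore case; cut keeps just N.
  head -n 1 "$file" | tr ',' '\n' | grep -nFxi -- "$name" | cut -d: -f1
}

tmp=$(mktemp -d)
printf '%s\n' 'col_1,col_2,col_3,col_10' '1,2,3,4' > "$tmp/demo.csv"

find_col "$tmp/demo.csv" col_1   # prints 1 (an unanchored match would also hit col_10)
```

The same -Fx flags (with or without -i) can replace the -iE match in the value check when only whole-field matches are wanted.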

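When running the example, I want to know which column is col_2 (its position) and the lines on which the value 1234567890 appears in that column. I use an option to indicate whether the file has a header: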

$ more test.csv
col_1,col_2,col_3,col_4
1234567890,tg75fjksfh,kj56hahb,dkdkdkd
dsewsf,1234567890,,dkdkdk
khhhdg,5gfj578fj,1234567890,akdkdkd
ubthfgfv,zts576fufj,256hf,,
$ ./check_csv.sh col_2 1234567890 Y
csv contains 4 columns
Column of interest col_2 is located on number 2
Value of interest 1234567890 is present on line 3

With duplicated lines:

$ more test.csv
col_1,col_2,col_3,col_4
1234567890,tg75fjksfh,kj56hahb,dkdkdkd
dsewsf,1234567890,,dkdkdk
khhhdg,5gfj578fj,1234567890,akdkdkd
ubthfgfv,zts576fufj,256hf,,
dsewsf,1234567890,,dkdkdk
dsewsf,1234567890,,dkdkdk
$ ./check_csv.sh col_2 1234567890 Y
csv contains 4 columns
Column of interest col_2 is located on number 2
Value of interest 1234567890 is present on line 3
Value of interest 1234567890 is present on line 6
Value of interest 1234567890 is present on line 7
$

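If you want to handle files without a header, you just need to copy the code into a branch that processes the file without the head -1 step; in that case, however, you cannot get the column names, so you won't know where to find the column of interest.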
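For a file without a header there is no name to look up, so one option is to pass the column's position instead. A sketch with a hypothetical function lines_with_value, run on the test.csv rows minus the header; the field is compared as a string (the trailing "" concatenations force that), so a value such as 01234567890 would not count as equal to 1234567890:

```shell
#!/usr/bin/env bash
# lines_with_value FILE COL_NUM VALUE (hypothetical helper): prints the
# line numbers whose field COL_NUM (1-based) equals VALUE as a string.
lines_with_value() {
  local file=$1 col_num=$2 value=$3
  awk -F',' -v c="$col_num" -v v="$value" '$c "" == v "" { print NR }' "$file"
}

tmp=$(mktemp -d)
printf '%s\n' \
  '1234567890,tg75fjksfh,kj56hahb,dkdkdkd' \
  'dsewsf,1234567890,,dkdkdk' \
  'khhhdg,5gfj578fj,1234567890,akdkdkd' \
  'ubthfgfv,zts576fufj,256hf,,' > "$tmp/noheader.csv"

lines_with_value "$tmp/noheader.csv" 2 1234567890   # prints 2
```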

col="col_of_interest"
value_of_interest="1234567890"
awk -v FS="," -v coi="$col" -v voi="$value_of_interest" \
'NR==1{
for(i=1; i<=NF; i++){
if(coi==$i){
y=i
}
}
next
}
{if($y==voi){print NR}}' file

Output:

4
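One caveat with this awk program: if no header field equals coi, y is never set, $y then evaluates to $0 (the whole line), and the program silently compares entire lines with voi. A sketch of the same logic with a guard, wrapped in a hypothetical shell function find_lines (arguments: file, column name, value) that exits with status 2 when the column is missing. Unlike the original loop, which keeps the last matching header field, it stops at the first one:

```shell
#!/usr/bin/env bash
# find_lines FILE COLUMN_NAME VALUE (hypothetical helper): prints the line
# numbers (header = line 1) whose field in COLUMN_NAME equals VALUE;
# exits with status 2 if COLUMN_NAME is not in the header.
find_lines() {
  awk -F',' -v coi="$2" -v voi="$3" '
    NR == 1 {
      # first header field equal to coi
      for (i = 1; i <= NF; i++) if ($i == coi) { y = i; break }
      if (!y) exit 2    # without this guard, $y would be $0
      next
    }
    $y == voi { print NR }
  ' "$1"
}

tmp=$(mktemp -d)
printf '%s\n' \
  'col_o2g,col_dgjdhu,col_of_interest,,' \
  '1234567890,tg75fjksfh,$kj56hahb,,' \
  'dsewsf,1234567890,,,' \
  'khhhdg,5gfj578fj,1234567890,,' > "$tmp/file.csv"

find_lines "$tmp/file.csv" col_of_interest 1234567890   # prints 4
find_lines "$tmp/file.csv" no_such_column 1234567890 || echo "exit status: $?"
```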

See also: GNU awk: String-Manipulation Functions (split), Arrays in awk, 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR, and man awk

file=./input.csv
d=,
# get column number for col_of_interest
c=$(head -n1 "$file" | grep -oE "[^$d]+" | grep -niw "$col" | cut -d: -f1)
# print column with cut and get line numbers for 1234567890
[ "$c" -gt 0 ] && wanted_line_number=$(cut -d$d -f$c "$file" | grep -niw "$value_of_interest" | cut -d: -f1)
printf "The wanted line number are: %b\n" $wanted_line_number
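Here the header is split with grep -oE "[^$d]+", which skips empty fields, so a header such as id,,col_of_interest would report column 2 instead of 3. A sketch of the same cut/grep pipeline that splits the header with tr instead, so empty fields keep their position, and uses grep -Fx so the column name and the value must match a whole field literally. The sample file, its path and the variable values are made up for the demo:

```shell
#!/usr/bin/env bash
# Demo input (made up): the header has an empty second field.
tmp=$(mktemp -d)
file=$tmp/input.csv
printf '%s\n' 'id,,col_of_interest' 'a,x,1234567890' 'b,y,11234567890' > "$file"
col=col_of_interest
value_of_interest=1234567890
d=,

# tr keeps empty header fields as empty lines, so grep -n numbers match
# the real column positions.
c=$(head -n 1 "$file" | tr "$d" '\n' | grep -nFxi -- "$col" | cut -d: -f1)
if [ -n "$c" ]; then
  # cut extracts the column (header included, so numbering starts at 1);
  # grep -nFx keeps only fields equal to the value.
  wanted_line_number=$(cut -d"$d" -f"$c" "$file" | grep -nFx -- "$value_of_interest" | cut -d: -f1)
  printf 'The wanted line number is: %s\n' $wanted_line_number
fi
```

The unquoted $wanted_line_number in the final printf is deliberate: each line number becomes a separate argument, and printf reuses the format for each one.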
