I am trying to read values from a CSV file dynamically, based on the header. Here is what my input files can look like.
File 1:
name,city,age
john,New York,20
jane,London,30
or
File 2:
name,age,city,country
john,20,New York,USA
jane,30,London,England
I may not be following the best approach to achieve this, but I tried the following code.
#!/bin/bash
{
read -r line
line=`tr ',' ' ' <<< $line`
while IFS=, read -r `$line`
do
echo $name
echo $city
echo $age
done
} < file.txt
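I expected the above code to read the header values as variable names. I know the order of the columns can differ between input files, but I expect every input file to have the name, city and age columns. Is this the right approach? If so, what is the fix when the above code fails with the error "line 7: name: command not found"?
The problem is caused by the backticks. Bash evaluates their content as a command and replaces the backticks with that command's output. Here the content is the header text, so Bash tries to run a command called name.
You can get what you want by simply using the variable in the read command: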
#!/bin/bash
{
read -r line
line=`tr ',' ' ' <<< $line`
echo "$line"
while IFS=, read -r $line ; do
echo "person: $name -- $city -- $age"
done
} < file.txt
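To make the difference concrete, here is a minimal sketch that contrasts the two forms. The header variable and the sample record are assumed values for illustration, not taken from file.txt.

```shell
#!/bin/bash
# Illustrative only: "header" holds an assumed, space-separated header line.
header="name city age"

# Backticks run the header text as a command ("name" with the arguments
# "city age"). Such a command normally does not exist, so this fails.
backtick_status=0
unused=`$header 2>/dev/null` || backtick_status=$?
echo "backtick status: $backtick_status"

# Plain expansion passes the three words to read as variable names.
# The unquoted $header is split with the shell's default IFS; the IFS=,
# prefix only affects how read splits the input line.
IFS=, read -r $header <<< "john,New York,20"
echo "$name -- $city -- $age"
```

The second echo should print john -- New York -- 20, while the backtick status is non-zero on a system that has no command called name.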
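Some notes on your code:
The backtick syntax is legacy syntax; $(...) is now preferred for evaluating commands. The new syntax is more flexible, for example it nests without escaping.
You can use set -euo pipefail to make the script fail automatically, so that it stops when a command fails.
Your code is currently very sensitive to invalid header data. With a file like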
n ame,age,city,country
john,20,New York,USA
jane,30,London,England
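your script (or rather the version at the beginning of my answer) runs without errors but produces invalid output.
It is also good practice to quote variables to prevent unwanted word splitting.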
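As a small sketch of the first and last notes, the following shows $(...) in place of backticks and the effect of quoting. The variables line and city and the helper count_args are made up for this illustration.

```shell
#!/bin/bash
# Illustrative values only; neither variable comes from the original script.
line="name,city,age"
city="New   York"

# $(...) replaces the backticks; quoting "$line" keeps the text intact.
header=$(tr ',' ' ' <<< "$line")
echo "$header"

# $(...) also nests without escaping, unlike backticks.
upper=$(tr '[:lower:]' '[:upper:]' <<< "$(tr ',' ' ' <<< "$line")")
echo "$upper"

# Hypothetical helper: prints how many words an expansion produced.
count_args() { echo "$#"; }
count_args $city      # unquoted: split into two words
count_args "$city"    # quoted: stays one word
```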
To make it more robust, you can change it as follows:
#!/bin/bash
set -euo pipefail
# -e and -o pipefail will make the script exit
# in case of command failure (or piped command failure)
# -u will exit in case a variable is undefined
# (in your case, if the header is invalid)
{
read -r line
readarray -d, -t header < <(printf "%s" "$line")
# using an array makes it possible to detect whether one of the header
# entries contains an invalid character
# the printf is needed because bash would append a newline to the
# command input if a here-string (<<<) were used
while IFS=, read -r "${header[@]}" ; do
echo "$name"
echo "$city"
echo "$age"
done
} < file.txt
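If you would rather reject a bad header explicitly before the loop starts, one option is to check every entry against the pattern for a valid variable name. This is only a sketch: valid_header is a hypothetical helper that is not part of the script above, and it assumes a simple CSV header without quoted fields.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of the answer's script):
# succeeds only if every comma-separated header entry is a valid Bash
# variable name.
valid_header() {
  local entry
  local -a entries
  local re='^[A-Za-z_][A-Za-z0-9_]*$'
  IFS=, read -r -a entries <<< "$1"
  # An empty header has no usable column names.
  [ "${#entries[@]}" -gt 0 ] || return 1
  for entry in "${entries[@]}"; do
    [[ $entry =~ $re ]] || return 1
  done
}

valid_header "name,city,age" && echo "ok: name,city,age"
valid_header "n ame,age,city,country" || echo "rejected: n ame,age,city,country"
```

Note that this only checks the syntax of the names. A header entry such as PATH would still pass and overwrite the shell variable of that name, so an allow-list of expected columns may be the safer choice for untrusted files.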
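A slightly different approach is to let awk handle the field separation and the ordering of the desired output, given either of the input files. The awk script below stores the desired output order in the f[] (field) array, which is set in the BEGIN rule. Then, on the first line of each file (FNR==1), the array a[] is deleted and filled with the headings of the current file. At that point you only need to loop over the field names in the f[] array in order and output the corresponding field from the current line, e.g.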
awk -F, '
BEGIN { f[1]="name"; f[2]="city"; f[3]="age" } # desired order
FNR==1 { # on first line read header
delete a # clear a array
for (i=1; i<=NF; i++) # loop over headings
a[$i] = i # index by heading, val is field no.
next # skip to next record
}
{
print "" # optional newline between outputs
for (i=1; i<=3; i++) # loop over desired field order
if (f[i] in a) # validate field in a array
print $a[f[i]] # output fields value
}
' file1 file2
Example Use/Output
With the content you show in file1 and file2, you would get:
$ awk -F, '
> BEGIN { f[1]="name"; f[2]="city"; f[3]="age" } # desired order
> FNR==1 { # on first line read header
> delete a # clear a array
> for (i=1; i<=NF; i++) # loop over headings
> a[$i] = i # index by heading, val is field no.
> next # skip to next record
> }
> {
> print "" # optional newline between outputs
> for (i=1; i<=3; i++) # loop over desired field order
> if (f[i] in a) # validate field in a array
> print $a[f[i]] # output fields value
> }
> ' file1 file2
john
New York
20
jane
London
30
john
New York
20
jane
London
30
Both files are read and processed the same way even though their field order differs. Let me know if you have further questions.
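If the desired columns vary, the same idea can be wrapped in a shell function that takes the column list as a parameter. This is a sketch under assumptions: select_columns is a hypothetical name, the output is one comma-separated line per record rather than one field per line, and the input is simple CSV without quoted commas.

```shell
#!/bin/bash
# Hypothetical wrapper (an assumption, not from the answer above).
# Usage: select_columns COLS [FILE...]
# COLS is a comma-separated list of column names in the desired order.
# Reads standard input when no FILE is given.
select_columns() {
  local cols=$1
  shift
  awk -F, -v cols="$cols" '
    BEGIN { n = split(cols, f, ",") }       # desired order
    FNR == 1 {                              # header line of each file
      delete a
      for (i = 1; i <= NF; i++) a[$i] = i   # heading -> field number
      next
    }
    {
      out = ""
      for (i = 1; i <= n; i++)              # missing columns become empty
        out = out (i > 1 ? "," : "") ((f[i] in a) ? $a[f[i]] : "")
      print out
    }
  ' "$@"
}

printf 'name,age,city,country\njohn,20,New York,USA\n' | select_columns name,city,age
```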
If you are using Bash version ≥ 4.2, you can use an associative array to capture an arbitrary number of fields, with their names as keys:
#!/usr/bin/env bash
# Associative array to store column names as keys and field values as values
declare -A fields
# Array to store column names by index
declare -a column_name
# Array to store a row's values
declare -a line
# Commands block consuming CSV input
{
# Read first line to capture column names
IFS=, read -r -a column_name
# Process records
while IFS=, read -r -a line; do
# Store column values to corresponding field name
for ((i=0; i<${#column_name[@]}; i++)); do
# Fills fields' associative array
fields["${column_name[i]}"]="${line[i]}"
done
# Dump fields for debug|demo purpose
# Processing of each captured value could go there instead
declare -p fields
done
} < file.txt
Sample output with File 2:
declare -A fields=([country]="USA" [city]="New York" [age]="20" [name]="john" )
declare -A fields=([country]="England" [city]="London" [age]="30" [name]="jane" )
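In place of the debug dump, the per-record processing could look up the columns by name, which gives the same result whatever the column order. The following is a minimal sketch: print_people is a hypothetical function name, and it assumes simple CSV without quoted commas and that missing columns should print as empty strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the answer above): prints
# name, city and age for each CSV record on standard input.
print_people() {
  local -A fields
  local -a column_name line
  local i
  # First line: column names
  IFS=, read -r -a column_name
  while IFS=, read -r -a line; do
    for ((i = 0; i < ${#column_name[@]}; i++)); do
      fields["${column_name[i]}"]="${line[i]}"
    done
    # Look the values up by name, independent of column position
    printf '%s -- %s -- %s\n' "${fields[name]}" "${fields[city]}" "${fields[age]}"
  done
}

print_people <<'EOF'
name,age,city,country
john,20,New York,USA
jane,30,London,England
EOF
```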
For older Bash versions without associative arrays, use indexed column names:
#!/usr/bin/env bash
# Array to store columns names with index
declare -a column_name
# Array to store values for a line
declare -a value
# Commands block consuming CSV input
{
# Read first line to capture column names
IFS=, read -r -a column_name
# Process records
while IFS=, read -r -a value; do
# Print record separator
printf -- '--------------------------------------------------\n'
# Print captured field name and value
for ((i=0; i<${#column_name[@]}; i++)); do
printf '%-18s: %s\n' "${column_name[i]}" "${value[i]}"
done
done
} < file.txt
Output:
--------------------------------------------------
name              : john
age               : 20
city              : New York
country           : USA
--------------------------------------------------
name              : jane
age               : 30
city              : London
country           : England