Read a CSV file and progressively add up the values of a specific column in PowerShell



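I want to use a PowerShell script to read all the rows of a specific column in a CSV file, then progressively add up the values down to the current row and write the running sum to a new column.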

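For example, in the sample below I want to read all the values in COLUMNB, add them up progressively, and write the running total to PROG_COLB. I want to do the same for COLUMNC.

Sample CSV file: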

"RANK","COLUMNB","COLUMNC"
"5","510","10"
"4","500","60"
"3","120","100"
"0","600","200"
"-1","1000","40"

Expected output:

"RANK","COLUMNB","COLUMNC","PROG_COLB","PROG_COLC"
"5","510","10","510","10"
"4","500","60","1010","70"
"3","120","100","1130","170"
"0","600","200","1730","370"
"-1","1000","40","2730","410"

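I tried a few things but could not get the expected output. I was able to read a specific column and managed to put it into a hash table, but then I got stuck. Any suggestions on how to solve this puzzle would be a great help.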

$csv = Import-Csv .\stats.csv
$hash_COLB = @{}
$hash_COLC = @{}
foreach ($item in $csv)
{
    [int]$prog_COLB = [int]$prog_COLB + [int]$item.COLUMNB
    [int]$prog_COLC = [int]$prog_COLC + [int]$item.COLUMNC
    $hash_COLB.Add($item.RANK, $prog_COLB)
    $hash_COLC.Add($item.RANK, $prog_COLC)
}
$hash_COLB
$hash_COLC

You can create two variables to hold the running totals of the values in COLUMNB and COLUMNC.

$runningCount_B = 0
$runningCount_C = 0
$result = Import-Csv -Path 'D:\Test\sample.csv' | ForEach-Object {
    $runningCount_B += [int]$_.COLUMNB
    $runningCount_C += [int]$_.COLUMNC
    # output a new object with all columns from the original, and added running counts
    $_ | Select-Object *, @{Name = 'PROG_COLB'; Expression = {$runningCount_B}},
                          @{Name = 'PROG_COLC'; Expression = {$runningCount_C}}
}
# output on console screen
$result | Format-Table -AutoSize
# output to new CSV
$result | Export-Csv -Path 'D:\Test\new_sample.csv' -NoTypeInformation

Output as CSV:

"RANK","COLUMNB","COLUMNC","PROG_COLB","PROG_COLC"
"5","510","10","510","10"
"4","500","60","1010","70"
"3","120","100","1130","170"
"0","600","200","1730","370"
"-1","1000","40","2730","410"

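The most straightforward (IMO) and easiest-to-read solution uses ForEach-Object to iterate over the rows and output a PSCustomObject that contains the original columns plus the calculated running sums: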

$sumColB = 0
$sumColC = 0
$newCsv = $csv | ForEach-Object {
    $sumColB += $_.COLUMNB
    $sumColC += $_.COLUMNC
    # This is the output of this loop iteration. PowerShell automatically adds it to $newCsv.
    [PSCustomObject]@{
        RANK      = $_.RANK
        COLUMNB   = $_.COLUMNB
        COLUMNC   = $_.COLUMNC
        PROG_COLB = $sumColB
        PROG_COLC = $sumColC
    }
}

Another approach uses Select-Object with calculated properties. It is less readable, but it has the advantage of being reusable.

Function Get-ColumnSum( [ref] $var, $columnName ) {
    $var.Value += $_.$columnName   # Add the value of the column named by $columnName
    $var.Value                     # output
}
$sumB = 0
$sumC = 0
$newCsv = $csv | Select-Object *,
    @{ name = 'PROG_COLB'; expression = { Get-ColumnSum ([ref] $sumB) COLUMNB } },
    @{ name = 'PROG_COLC'; expression = { Get-ColumnSum ([ref] $sumC) COLUMNC } }

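First, we define a function Get-ColumnSum that accumulates the values of the given column in a variable passed by reference and outputs the current sum.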

Then, in the Select-Object call, we invoke that function from the script blocks of the calculated properties.

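Passing the variable by reference is necessary because of PowerShell's scoping rules. If you assign to a variable from a nested scope, such as a function body, PowerShell creates a new local copy in that scope and the original variable stays unchanged. By passing a reference, we can modify the original variable directly.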
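To make the scoping point concrete, here is a minimal sketch that compares a plain assignment inside a function with an update through [ref]. The function names Add-InChildScope and Add-ViaReference and the variable names are hypothetical and chosen only for this illustration; they are not part of the answer above.

```powershell
# Hypothetical helpers, for illustration only.
function Add-InChildScope($amount) {
    # Reads the caller's $sum, but the assignment creates a new local $sum
    # that disappears when the function returns.
    $sum += $amount
}

function Add-ViaReference([ref] $target, $amount) {
    # .Value writes through to the variable the reference points at.
    $target.Value += $amount
}

$sum = 0

Add-InChildScope 510
$afterChildScope = $sum      # still 0: only the local copy was changed

Add-ViaReference ([ref] $sum) 510
$afterReference = $sum       # 510: the original variable was changed
```

The parentheses around [ref] $sum are required so that PowerShell parses the cast as a single argument, which is also why the Select-Object example writes ([ref] $sumB).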
