Filter multiple CSV files and create new files



I have about 2,500 CSV files, each roughly 20 MB in size. I am trying to extract certain rows from each file and save them to a new file.

So, if I have:

File 1 :
Row1
Row2
Row3
File 2 : 
Row2
Row3 
and so on...

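If I filter all the files with "Row2" as the filter text, the new folder should contain all the files, each containing only the rows that match the filter text.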

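Browsing some forums, I came up with the following, which might help me filter the rows, but I'm not sure how to do it recursively, and I also don't know whether it is a fast enough approach. Any help is appreciated.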

Get-Content "C:\Path to file" | Where{$_ -match "Rowfiltertext*"} | Out-File "Path to Out file"
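A minimal sketch of the same Get-Content | Where idea using -ReadCount, which sends lines down the pipeline in batches instead of one object per line and is usually noticeably faster on large files. The sample file, output file and the 'Row2' pattern are hypothetical placeholders:

```powershell
# Hypothetical sample input and output files in the temp folder
$file    = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount_demo.csv'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount_demo_out.csv'
Set-Content -Path $file -Value 'Row1,a', 'Row2,b', 'Row3,c', 'Row2,d'

# Hypothetical filter text, escaped so regex special characters are taken literally
$pattern = [regex]::Escape('Row2')

# With -ReadCount 1000, each pipeline object is an array of up to 1000 lines.
# @() guarantees an array, and -match applied to an array returns the matching elements.
Get-Content -Path $file -ReadCount 1000 | ForEach-Object {
    @($_) -match $pattern
} | Set-Content -Path $outFile
```

To cover every file, the same ForEach-Object body can be run once per file returned by Get-ChildItem -Filter '*.csv' -File, adding -Recurse for subfolders.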

I'm on Windows, so I guess a PowerShell-style solution would be best here.

The text to filter on will always be in the first column.

Thanks, Siddhant

Here are two fast ways to search for a string in (text) files:

1) Using switch

$searchPattern = [regex]::Escape('Rowfiltertext')  # for safety escape regex special characters
$sourcePath    = 'X:\Path\To\The\CsvFiles'
$outputPath    = 'X:\FilteredCsv.txt'
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
Get-ChildItem -Path $sourcePath -Filter '*.csv' -File | ForEach-Object {
    # iterate through the lines in the file and output the ones that match the search pattern
    switch -Regex -File $_.FullName {
        $searchPattern { $_ }
    }
} | Set-Content -Path $outputPath  # add -PassThru to also show on screen

2) Using Select-String

$searchPattern = [regex]::Escape('Rowfiltertext')  # for safety escape regex special characters
$sourcePath    = 'X:\Path\To\The\CsvFiles'
$outputPath    = 'X:\FilteredCsv.txt'
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
Get-ChildItem -Path $sourcePath -Filter '*.csv' -File | ForEach-Object {
    ($_ | Select-String -Pattern $searchPattern).Line
} | Set-Content -Path $outputPath  # add -PassThru to also show on screen
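Since the question says the filter text is always in the first column, the unanchored $searchPattern used so far can also match the text in other columns, or as the prefix of a longer value (Row2 inside Row20). A minimal sketch of an anchored pattern, assuming comma-separated fields where the first field may optionally be wrapped in double quotes; the sample lines are hypothetical:

```powershell
# Hypothetical filter value; replace with your own text
$filterText = 'Row2'

# '^'     anchors the match at the start of the line (the first column)
# '"?'    allows the first field to be wrapped in double quotes
# '(,|$)' requires the field to end right after the filter text
$searchPattern = '^"?' + [regex]::Escape($filterText) + '"?(,|$)'

# Hypothetical sample lines: only the 2nd and 3rd should match
$sample  = @('Row1,a,b', 'Row2,c,d', '"Row2",e,f', 'Row20,g,h', 'x,Row2,i')
$matched = $sample | Where-Object { $_ -match $searchPattern }
$matched
```

The same $searchPattern can be used unchanged in any of the switch or Select-String variants; both switch -Regex and Select-String are case-insensitive by default.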

If you want to output a new CSV file for each original file, use:

3) Using switch

$searchPattern = [regex]::Escape('Rowfiltertext')  # for safety escape regex special characters
$sourcePath    = 'X:\Path\To\The\CsvFiles'
$outputPath    = 'X:\FilteredCsv'
if (!(Test-Path -Path $outputPath -PathType Container)) {
    $null = New-Item -Path $outputPath -ItemType Directory
}
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
# the parentheses make Get-ChildItem collect all files before the loop starts
(Get-ChildItem -Path $sourcePath -Filter '*.csv' -File) | ForEach-Object {
    # create a full target filename for the filtered output csv
    $outFile = Join-Path -Path $outputPath -ChildPath ('New_{0}' -f $_.Name)
    # iterate through the lines in the file and output the ones that match the search pattern
    $result = switch -Regex -File $_.FullName {
        $searchPattern { $_ }
    }
    $result | Set-Content -Path $outFile  # add -PassThru to also show on screen
}

4) Using Select-String

$searchPattern = [regex]::Escape('Rowfiltertext')  # for safety escape regex special characters
$sourcePath    = 'X:\Path\To\The\CsvFiles'
$outputPath    = 'X:\FilteredCsv'
# make sure the output folder exists, otherwise Set-Content fails
if (!(Test-Path -Path $outputPath -PathType Container)) {
    $null = New-Item -Path $outputPath -ItemType Directory
}
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
# the parentheses make Get-ChildItem collect all files before the loop starts
(Get-ChildItem -Path $sourcePath -Filter '*.csv' -File) | ForEach-Object {
    # create a full target filename for the filtered output csv
    $outFile = Join-Path -Path $outputPath -ChildPath ('New_{0}' -f $_.Name)
    ($_ | Select-String -Pattern $searchPattern).Line | Set-Content -Path $outFile  # add -PassThru to also show on screen
}
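When -Recurse is added to methods 3 and 4, files with the same name in different subfolders are all written to the same New_<name> file in the single output folder, each overwriting the previous one. A minimal sketch that mirrors the source folder structure under the output folder instead; the sample folder tree, file names and filter text are hypothetical:

```powershell
# Hypothetical sample tree: subfolders A and B each contain a file named data.csv
$root       = Join-Path ([System.IO.Path]::GetTempPath()) 'csvdemo_tree'
$sourcePath = Join-Path $root 'src'
$outputPath = Join-Path $root 'out'
$null = New-Item -Path (Join-Path $sourcePath 'A'), (Join-Path $sourcePath 'B') -ItemType Directory -Force
Set-Content -Path (Join-Path (Join-Path $sourcePath 'A') 'data.csv') -Value 'Row1,a', 'Row2,b'
Set-Content -Path (Join-Path (Join-Path $sourcePath 'B') 'data.csv') -Value 'Row2,c', 'Row3,d'

$searchPattern = [regex]::Escape('Row2')  # hypothetical filter text

# full path of the source root without a trailing separator
$sourceFull = (Get-Item -LiteralPath $sourcePath).FullName.TrimEnd([System.IO.Path]::DirectorySeparatorChar)

(Get-ChildItem -Path $sourcePath -Filter '*.csv' -File -Recurse) | ForEach-Object {
    # path relative to the source root, e.g. A\data.csv
    $relative = $_.FullName.Substring($sourceFull.Length + 1)
    $outFile  = Join-Path -Path $outputPath -ChildPath $relative
    # create the mirrored subfolder if it does not exist yet
    $null = New-Item -Path (Split-Path -Path $outFile -Parent) -ItemType Directory -Force
    $result = switch -Regex -File $_.FullName { $searchPattern { $_ } }
    $result | Set-Content -Path $outFile
}
```

Keeping the relative path avoids name collisions without renaming the files; the New_ prefix from method 3 could still be applied to the leaf name if preferred.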

Hope that helps

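Re. "fast enough method":
Get-Content is slow on large files, because it emits every line as a separate pipeline object and attaches extra properties (such as PSPath) to each one. You can use System.IO.StreamReader instead: read the complete file content into a single string, then split that string into lines, and so on. For example: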

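# $Csv is a FileInfo object for one CSV file, e.g. from Get-ChildItem -Path $sourcePath -Filter '*.csv' -File
[System.IO.FileStream]$objFileStream = New-Object System.IO.FileStream($Csv.FullName, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
[System.IO.StreamReader]$objStreamReader = New-Object System.IO.StreamReader($objFileStream, [System.Text.Encoding]::UTF8)
try {
    # read the complete file content into a single string
    $strFileContent = $objStreamReader.ReadToEnd()
}
finally {
    # Dispose() also closes the reader and the stream, even if reading fails
    $objStreamReader.Dispose()
    $objFileStream.Dispose()
}
# split into lines; '\r?\n' handles both CRLF (Windows) and LF (Unix) line endings
[string[]]$arrFileContent = $strFileContent -split '\r?\n'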
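Reading each whole file into one string, as above, holds the complete 20 MB in memory before splitting it. A different technique, swapped in here rather than taken from the answer, is to stream the lines with [System.IO.File]::ReadLines and write the matches with a StreamWriter. A minimal sketch with hypothetical sample files in the temp folder and a pattern anchored to the first column:

```powershell
# Hypothetical source and output folders with two small sample files
$sourcePath = Join-Path ([System.IO.Path]::GetTempPath()) 'csvdemo_src'
$outputPath = Join-Path ([System.IO.Path]::GetTempPath()) 'csvdemo_out'
$null = New-Item -Path $sourcePath, $outputPath -ItemType Directory -Force
Set-Content -Path (Join-Path $sourcePath 'File1.csv') -Value 'Row1,a', 'Row2,b', 'Row3,c'
Set-Content -Path (Join-Path $sourcePath 'File2.csv') -Value 'Row2,d', 'Row3,e'

# compile the pattern once: first field must be exactly the (hypothetical) filter text
$regex = [regex]('^' + [regex]::Escape('Row2') + '(,|$)')

Get-ChildItem -Path $sourcePath -Filter '*.csv' -File | ForEach-Object {
    $outFile = Join-Path -Path $outputPath -ChildPath ('New_{0}' -f $_.Name)
    # .NET needs a full path here; Join-Path on the temp folder gives one
    $writer = New-Object System.IO.StreamWriter($outFile, $false, [System.Text.Encoding]::UTF8)
    try {
        # ReadLines yields one line at a time instead of loading the whole file
        foreach ($line in [System.IO.File]::ReadLines($_.FullName)) {
            if ($regex.IsMatch($line)) { $writer.WriteLine($line) }
        }
    }
    finally {
        $writer.Dispose()
    }
}
```

Actual speed differences depend on the disk and the data, so timing both approaches with Measure-Command on a few real files is a reasonable check before processing all 2,500.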
