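I am trying to store a string taken from a text file, with a defined start and end that make it a substring of the original file. I am new to PowerShell, so my approach is simple and crude. Basically, my approach is:
- Get roughly what I want, starting from the beginning of the string
- Worry about trimming off what I don't want later
My minimal reproducible example is below: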
# selectStringTest.ps
$inputFile = Get-Content -Path "C:\test\test3\Copy of 31832_226140__0001-00006.txt"
# selected text string needs to span from $refName up to $boundaryName
[string]$refName = "001 BARTLETT"
[string]$boundaryName = "001 BEECH"
# a rough estimate of the text file lines required
[int]$lines = 200
if (Select-String -InputObject $inputFile -pattern $refName) {
    Write-Host "Selected shortened string found!"
    # this selects the start of required string but with extra text
    [string]$newFileStart = $inputFile | Select-String $refName -CaseSensitive -SimpleMatch -Context 0, $lines
}
else {
    Write-Host "Selected string NOT FOUND."
}
# tidy up the start of the string by removing rubbish
$newFileStart = $newFileStart.TrimStart('> ')
# this is the kind of thing I want but it doesn't work
$newFileStart = $newFileStart - $newFileStart.StartsWith($boundaryName)
$newFileStart | Out-File tempOutputFile
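In practice, the output starts correctly, but I have not been able to remove the text from $boundaryName onward (including $boundaryName itself).
The original text file was generated by OCR (optical character recognition), so the formatting is uneven. There are line breaks in odd places, so my options for delimiting are limited.
I am not sure whether my if (Select-String -InputObject $inputFile -pattern $refName) is valid, although it seems to work. The overall design seems crude, in that I am guessing how many lines I need. Finally, I have tried various ways to trim the string at $boundaryName, but none of them succeeded. For this:
- string.split() is not practical
- Replacing the spaces with newlines to get an array and looping up to the $boundaryName element is possible, but I do not know how to terminate the array before turning it back into a string
Any suggestions would be appreciated.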
The abbreviated content of the file Copy of 31832_226140__0001-00006.txt is:
Start of text file
________________
BARTLETT-BEDGGOOD
PENCARROW COMPOSITE ROLL
PAGE 6
PAGE 7
PENCARROW COMPOSITE ROLL
BEECH-BEST
www.
.......................
001 BARTLETT. Lois Elizabeth
Middle of text file
............. 15 St Ronans Av. Lower Hutt Marned 200 BEDGGOOD. Percy Lloyd
............15 St Ronans Av, Lower Mutt. Coachbuild
001 BEECH, Margaret ..........
End of text file
..............312 Munita Rood Eastbourne, Civil Eng 200 BEST, Dons Amy .........
..........50 Man Street, Wamuomata, Marned
SO NON
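To use a regex across line breaks, the file needs to be read as a single string. Get-Content -Raw will do that. This assumes that you do not want the lines containing refName and boundaryName to be included in the output.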
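$c = Get-Content -Path '.\beech.txt' -Raw
$refName = "001 BARTLETT"
$boundaryName = "001 BEECH"
# (?s) lets . match line breaks; `r`n assumes Windows (CRLF) line endings
if ($c -match "(?smi).*$refName.*?`r`n(.*)$boundaryName.*?`r`n.*") {
    # group 1 is the text after the refName line, up to boundaryName
    $result = $Matches[1]
}
$result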
More information at https://stackoverflow.com/a/12573413/447901
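A variation on the regex above, offered only as a sketch. It passes both names through [regex]::Escape so that any regex metacharacters in them are matched literally, and it uses \r?\n so that LF-only files also match. Get-SpanBetween is a hypothetical helper name, and the sample string is built from the excerpt in the question rather than from the real file. Unlike the pattern above, this one is case-sensitive (in line with the -CaseSensitive switch in the question) and stops at the first occurrence of the boundary name.

```powershell
# Hypothetical helper (not from the original answer): returns the text after
# the line containing $RefName, up to the first occurrence of $BoundaryName.
function Get-SpanBetween {
    param (
        [Parameter(Mandatory = $true)][string]$Text,
        [Parameter(Mandatory = $true)][string]$RefName,
        [Parameter(Mandatory = $true)][string]$BoundaryName
    )
    # Escape the names so characters such as '.' or '(' are matched literally.
    $ref = [regex]::Escape($RefName)
    $boundary = [regex]::Escape($BoundaryName)
    # (?s) lets '.' match line breaks; \r?\n accepts CRLF and LF endings.
    $pattern = "(?s)$ref.*?\r?\n(.*?)$boundary"
    $m = [regex]::Match($Text, $pattern)
    if ($m.Success) { $m.Groups[1].Value } else { $null }
}

# Sample text assembled from the excerpt in the question (an assumption, not the real file).
$sample = "www.`r`n001 BARTLETT. Lois Elizabeth`r`n............15 St Ronans Av, Lower Mutt. Coachbuild`r`n001 BEECH, Margaret ..........`r`n"
Get-SpanBetween -Text $sample -RefName '001 BARTLETT' -BoundaryName '001 BEECH'
```

With the real file, $sample would be replaced by the result of Get-Content -Raw, as in the answer above.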
How close is this to what you want?
function Process-File {
    param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$HeadText,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$TailText,
        [Parameter(ValueFromPipeline)]
        $File
    )
    Process {
        $Inside = $false;
        switch -Regex -File $File.FullName {
            #'^\s*$' { continue }
            "(?i)^\s*$TailText(?<Tail>.*)`$" { $Matches.Tail; $Inside = $false }
            '^(?<Line>.+)$' { if($Inside) { $Matches.Line } }
            "(?i)^\s*$HeadText(?<Head>.*)`$" { $Matches.Head; $Inside = $true }
            default { continue }
        }
    }
}
$File = 'Copy of 31832_226140__0001-00006.txt'
#$Path = $PSScriptRoot
$Path = 'C:\test\test3'
$Result = Get-ChildItem -Path "$Path\$File" | Process-File '001 BARTLETT' '001 BEECH'
$Result | Out-File -FilePath "$Path\SpanText.txt"
This is the output:
. Lois Elizabeth
............. 15 St Ronans Av. Lower Hutt Marned 200 BEDGGOOD. Percy Lloyd
............15 St Ronans Av, Lower Mutt. Coachbuild
, Margaret ..........
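If no regex is wanted, the "trim everything from $boundaryName onward" step from the question could also be done with plain string methods. The following is a minimal sketch under a few assumptions: the file has been read as one string (for example with Get-Content -Raw), the match is case-sensitive, and the result should start at the reference name and stop just before the boundary name. Get-TextSpan is a hypothetical helper name, and the sample string is made up from the excerpt in the question.

```powershell
# Hypothetical helper: returns the text from $RefName up to, but not
# including, the first $BoundaryName that follows it; $null if either is missing.
function Get-TextSpan {
    param (
        [Parameter(Mandatory = $true)][string]$Text,
        [Parameter(Mandatory = $true)][string]$RefName,
        [Parameter(Mandatory = $true)][string]$BoundaryName
    )
    # Ordinal comparison: exact, case-sensitive, culture-independent.
    $start = $Text.IndexOf($RefName, [System.StringComparison]::Ordinal)
    if ($start -lt 0) { return $null }
    # Search for the boundary only after the end of the reference name.
    $end = $Text.IndexOf($BoundaryName, $start + $RefName.Length, [System.StringComparison]::Ordinal)
    if ($end -lt 0) { return $null }
    $Text.Substring($start, $end - $start)
}

# Sample text assembled from the excerpt in the question (an assumption, not the real file).
$text = "www.`n001 BARTLETT. Lois Elizabeth`n............15 St Ronans Av`n001 BEECH, Margaret"
$span = Get-TextSpan -Text $text -RefName '001 BARTLETT' -BoundaryName '001 BEECH'
$span
```

Because the search works on character positions rather than lines, the odd line breaks produced by OCR do not matter, and there is no need to guess how many lines to keep.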