PowerShell: match on a property, then selectively combine objects to create a third object



I have a solution, but I don't think it's the best approach because it takes a long time, so I'm looking for a faster/better/smarter way.

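I have several PSCustomObject collections pulled from .csv files. Each one has at least one property in common with the others. One is relatively small (about 200-300 items/rows), but another is quite large (about 60,000-100,000 items). The contents of one may or may not match the contents of the other.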

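I need to find where the two collections match on a specific property, and then combine the properties from each into a single object that has all (or most) of the properties.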

Here is an example snippet of the code (not exact, but it should work; see the image of the sample data: Data Table):

Write-Verbose "Pulling basic Fruit data together"
$Purchase = Import-Csv "C:\Purchase.csv"
$Selling = Import-Csv "C:\Selling.csv"
Write-Verbose "Combining Fruit names and removing duplicates"
# @() makes sure both sides are arrays, so + concatenates even if a file has only one row
$Fruits = @($Purchase.Fruit) + @($Selling.Fruit) | Sort-Object -Unique
$CompareData = @()
foreach ($Fruit in $Fruits) {
    # Every .Where() call below scans its whole collection again (six scans per fruit)
    $IndResults = [pscustomobject]@{
        # Adding Purchase and Selling data
        Fruit  = $Fruit
        Farmer = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Farmer
        Region = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Region
        Water  = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Water
        Market = $Selling.Where({$PSItem.Fruit -eq $Fruit}).Market
        Cost   = $Selling.Where({$PSItem.Fruit -eq $Fruit}).Cost
        Tax    = $Selling.Where({$PSItem.Fruit -eq $Fruit}).Tax
    }
    Write-Verbose "Loading individual results into response"
    $CompareData += $IndResults
}
Write-Output $CompareData

I think the problem is this:

Farmer = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Farmer

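If I understand this correctly, it looks through the entire $Purchase collection every time it reaches this line. I'm looking for a way to speed up the whole process, rather than having it scan the entire collection on every match attempt.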
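One way to avoid the repeated scans, sketched here under the assumption that Fruit is unique within each file: index each collection once in a hashtable keyed on Fruit, then look rows up by key. The names `$purchaseByFruit` and `$sellingByFruit` are illustrative, and the sample rows are a trimmed copy of the Purchase.csv/Selling.csv data posted further down, inlined with ConvertFrom-Csv so the block runs on its own.

```powershell
# Sample rows copied from the Purchase.csv / Selling.csv data in this thread,
# trimmed so that not every fruit appears in both sets.
$Purchase = @'
Fruit,Farmer,Region,Water
Apple,Adam,Alabama,1
Cherry,Charlie,Cincinnati,2
Damson,Daniel,Derby,3
'@ | ConvertFrom-Csv

$Selling = @'
Fruit,Market,Cost,Tax
Apple,MarketA,10,0.1
Cherry,MarketC,20,0.2
Fig,MarketF,50,0.5
'@ | ConvertFrom-Csv

# Build each index once: a single pass over each collection.
# Assumes Fruit is unique within a file; a repeated key keeps only the last row.
$purchaseByFruit = @{}
foreach ($row in $Purchase) { $purchaseByFruit[$row.Fruit] = $row }

$sellingByFruit = @{}
foreach ($row in $Selling) { $sellingByFruit[$row.Fruit] = $row }

# Same union of fruit names as the original code, taken from the index keys.
$Fruits = @($purchaseByFruit.Keys) + @($sellingByFruit.Keys) | Sort-Object -Unique

# Collecting the loop output directly avoids re-copying the array on every +=.
# Assumes Set-StrictMode is off, so a property of $null simply yields $null.
$CompareData = foreach ($Fruit in $Fruits) {
    $p = $purchaseByFruit[$Fruit]   # $null if the fruit was never purchased
    $s = $sellingByFruit[$Fruit]    # $null if the fruit was never sold
    [pscustomobject]@{
        Fruit  = $Fruit
        Farmer = $p.Farmer
        Region = $p.Region
        Water  = $p.Water
        Market = $s.Market
        Cost   = $s.Cost
        Tax    = $s.Tax
    }
}

$CompareData | Format-Table
```

Building each index is one pass over its file and each lookup is a single hash probe, so the cost grows roughly with the total number of rows rather than with (number of fruits) × (rows scanned by each .Where call).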

Using this Join-Object:

$Purchase | Join $Selling -On Fruit | Format-Table

Result (using Simon Catlin's data):

Fruit      Farmer  Region     Water Market  Cost Tax
-----      ------  ------     ----- ------  ---- ---
Apple      Adam    Alabama    1     MarketA 10   0.1
Cherry     Charlie Cincinnati 2     MarketC 20   0.2
Damson     Daniel  Derby      3     MarketD 30   0.3
Elderberry Emma    Eastbourne 4     MarketE 40   0.4
Fig        Freda   Florida    5     MarketF 50   0.5

Using Join-Object:

http://ramblingcookiemonster.github.io/Join-Object/

Join-Object -Left $purchase -Right $selling -LeftJoinProperty fruit -RightJoinProperty fruit -Type OnlyIfInBoth | ft

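I ran into this problem when trying to consolidate employee data from an HR system with employee data in an AD forest. With thousands of rows, the process took a very long time.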

I ended up ditching custom objects in favor of good old hash tables.

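Each hash table entry then holds a sub-hashtable containing the data. In your case, the outer hash would be keyed on $fruit, and the sub-hash would contain the various properties, e.g. Farmer, Region, etc.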

By comparison, hash tables are lightning fast. It's a shame PowerShell is so slow in this respect.
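A minimal sketch of that nested layout, assuming the files are read with Import-Csv (ConvertFrom-Csv is used here with a few rows from the sample data so the block is self-contained); `$fruitHash` is an illustrative name. Unlike the regex parsing in the sample code below, Import-Csv also copes with quoted fields that contain commas.

```powershell
# A few rows from the Purchase.csv / Selling.csv sample data in this thread.
$Purchase = @'
Fruit,Farmer,Region,Water
Apple,Adam,Alabama,1
Cherry,Charlie,Cincinnati,2
'@ | ConvertFrom-Csv

$Selling = @'
Fruit,Market,Cost,Tax
Apple,MarketA,10,0.1
Cherry,MarketC,20,0.2
Fig,MarketF,50,0.5
'@ | ConvertFrom-Csv

# Outer hashtable: key = fruit name, value = inner hashtable of properties.
$fruitHash = @{}

foreach ($row in $Purchase) {
    $fruitHash[$row.Fruit] = @{
        Farmer = $row.Farmer
        Region = $row.Region
        Water  = $row.Water
    }
}

foreach ($row in $Selling) {
    # Create an empty inner hashtable if the fruit was not in the purchase data.
    if (-not $fruitHash.ContainsKey($row.Fruit)) { $fruitHash[$row.Fruit] = @{} }
    $inner = $fruitHash[$row.Fruit]
    $inner['Market'] = $row.Market
    $inner['Cost']   = [decimal]$row.Cost
    $inner['Tax']    = [decimal]$row.Tax
}

# Each lookup is a single hash probe, not a scan of the whole collection.
$fruitHash['Cherry']['Market']   # MarketC
```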

If you need more info, shout.

26/01: sample code, assuming I've understood the requirement correctly:


Purchase.csv:

Fruit,Farmer,Region,Water
Apple,Adam,Alabama,1
Cherry,Charlie,Cincinnati,2
Damson,Daniel,Derby,3
Elderberry,Emma,Eastbourne,4 
Fig,Freda,Florida,5

Selling.csv:

Fruit,Market,Cost,Tax
Apple,MarketA,10,0.1
Cherry,MarketC,20,0.2
Damson,MarketD,30,0.3
Elderberry,MarketE,40,0.4
Fig,MarketF,50,0.5

Code:

[String]       $Local:strPurchaseFile    = 'c:\temp\purchase.csv';
[String]       $Local:strSellingFile     = 'c:\temp\selling.csv';
[HashTable]    $Local:objFruitHash       = @{};
[System.Array] $Local:objSelectStringHit = $null;
[String]       $Local:strFruit           = '';
if ( (Test-Path -LiteralPath $strPurchaseFile -PathType Leaf) -and (Test-Path -LiteralPath $strSellingFile -PathType Leaf) ) {
    #
    # Populate data from purchase file.
    #
    foreach ( $objSelectStringHit in (Select-String -LiteralPath $strPurchaseFile -Pattern '^([^,]+),([^,]+),([^,]+),([^,]+)$' | Select-Object -Skip 1) ) {
        $objFruitHash[ $objSelectStringHit.Matches[0].Groups[1].Value ] = @{ 'Farmer' = $objSelectStringHit.Matches[0].Groups[2].Value;
                                                                             'Region' = $objSelectStringHit.Matches[0].Groups[3].Value;
                                                                             'Water'  = $objSelectStringHit.Matches[0].Groups[4].Value;
                                                                           };
    } #foreach-purchase-row
    #
    # Populate data from selling file.
    #
    foreach ( $objSelectStringHit in (Select-String -LiteralPath $strSellingFile -Pattern '^([^,]+),([^,]+),([^,]+),([^,]+)$' | Select-Object -Skip 1) ) {
        $objFruitHash[ $objSelectStringHit.Matches[0].Groups[1].Value ] += @{ 'Market' = $objSelectStringHit.Matches[0].Groups[2].Value;
                                                                              'Cost'   = [Convert]::ToDecimal( $objSelectStringHit.Matches[0].Groups[3].Value );
                                                                              'Tax'    = [Convert]::ToDecimal( $objSelectStringHit.Matches[0].Groups[4].Value );
                                                                            };
    } #foreach-selling-row
    #
    # Output data.  At this point, you could now build a PSCustomObject.
    #
    foreach ( $strFruit in ($objFruitHash.Keys | Sort-Object) ) {
        Write-Host -Object ( '{0,-15}{1,-15}{2,-15}{3,-10}{4,-10}{5,10:C}{6,10:P}' -f
                             $strFruit,
                             $objFruitHash[$strFruit]['Farmer'],
                             $objFruitHash[$strFruit]['Region'],
                             $objFruitHash[$strFruit]['Water'],
                             $objFruitHash[$strFruit]['Market'],
                             $objFruitHash[$strFruit]['Cost'],
                             $objFruitHash[$strFruit]['Tax']
                           );
    } #foreach
} else {
    Write-Error -Message 'File error.';
} #else-if
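The comment in the code above notes that a PSCustomObject could be built at that point. A sketch of that step, using a small hypothetical `$objFruitHash` literal with the same outer/inner shape the code fills in: an [ordered] dictionary fixes the column order, and New-Object PSObject -Property turns each entry into an object that Format-Table and Export-Csv can work with.

```powershell
# Hypothetical nested hashtable shaped like $objFruitHash above
# (outer key = fruit, inner hashtable = properties), two rows from the sample data.
$objFruitHash = @{
    'Apple'  = @{ Farmer = 'Adam';    Region = 'Alabama';    Water = '1'; Market = 'MarketA'; Cost = [decimal]10; Tax = [decimal]0.1 }
    'Cherry' = @{ Farmer = 'Charlie'; Region = 'Cincinnati'; Water = '2'; Market = 'MarketC'; Cost = [decimal]20; Tax = [decimal]0.2 }
}

$CompareData = foreach ($fruit in ($objFruitHash.Keys | Sort-Object)) {
    $inner = $objFruitHash[$fruit]
    # [ordered] keeps the properties in this order on the resulting object.
    $props = [ordered]@{ Fruit = $fruit }
    foreach ($name in 'Farmer', 'Region', 'Water', 'Market', 'Cost', 'Tax') {
        $props[$name] = $inner[$name]   # missing keys simply come out as $null
    }
    New-Object -TypeName PSObject -Property $props
}

$CompareData | Format-Table -AutoSize
```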

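I needed to do something similar myself. I wanted to take two System.Array objects and compare them to produce the matches, without having to manipulate the input data each time. This is the method I used; although I realize it's inefficient, it was instantaneous for the roughly 200 records I had to process.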

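I've tried to translate what I was doing (users and their old and new home directories) into farmers, fruit, markets, etc., so I hope this makes sense!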

$Purchase = Import-Csv "C:\Purchase.csv"
$Selling = Import-Csv "C:\Selling.csv"
$compareData = @()
$Total = 0
foreach ($iPurch in $Purchase) {
    foreach ($iSell in $Selling) {
        # -eq compares whole values; -match would treat the name as a regex,
        # so e.g. 'Pineapple' would also "match" 'Apple'
        if ($iPurch.Fruit -eq $iSell.Fruit) {
            Write-Host "Match found between $($iPurch.Fruit) and $($iSell.Fruit)"
            # [ordered] keeps the properties in this order on the new object
            $hash = [ordered]@{
                Fruit           =   $iPurch.Fruit
                Farmer          =   $iPurch.Farmer
                Region          =   $iPurch.Region
                Water           =   $iPurch.Water
                Market          =   $iSell.Market
                Cost            =   $iSell.Cost
                Tax             =   $iSell.Tax
            }
            $Build = New-Object PSObject -Property $hash
            $Total = $Total + 1
            $compareData += $Build
        }
    }
}
Write-Host "Processed $Total records"
