Matching instance records across datasets by specific values



I have a solution that works, but it performs poorly and takes a while to run. Let's start with what the two initial queries (both double joins) return:

The first set of data looks like this; let's call it line_items. As you'll see, line_items has no dh_first_name key/value.


[
[
{
pb_id: "133599.0",
pbbname: "CUSTOMER",
opl_amount: "101.0",
ops_type: "P",
ops_stop_id: 269802,
ops_order_id: 133599,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133599.0",
pbbname: "CUSTOMER",
opl_amount: "11.62",
ops_type: "P",
ops_stop_id: 269802,
ops_order_id: 133599,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133590.0",
pbbname: "CUSTOMER",
opl_amount: "79.0",
ops_type: "P",
ops_stop_id: 269780,
ops_order_id: 133590,
ops_driver1: 104,
ops_delivered_time: null
},
{
pb_id: "133220.0",
pbbname: "CUSTOMER",
opl_amount: "625.0",
ops_type: "D",
ops_stop_id: 269011,
ops_order_id: 133220,
ops_driver1: 62,
ops_delivered_time: "2021-04-01T12:35:00.000-05:00"
},
{
pb_id: "133357.0",
pbbname: "CUSTOMER",
opl_amount: "550.0",
ops_type: "D",
ops_stop_id: 269290,
ops_order_id: 133357,
ops_driver1: 92,
ops_delivered_time: "2021-04-01T09:38:00.000-05:00"
},
{
pb_id: "133219.0",
pbbname: "CUSTOMER",
opl_amount: "1267.06",
ops_type: "P",
ops_stop_id: 269008,
ops_order_id: 133219,
ops_driver1: 43,
ops_delivered_time: null
},
{
pb_id: "133577.0",
pbbname: "CUSTOMER",
opl_amount: "150.0",
ops_type: "P",
ops_stop_id: 269754,
ops_order_id: 133577,
ops_driver1: 94,
ops_delivered_time: null
},
{
pb_id: "133503.0",
pbbname: "CUSTOMER",
opl_amount: "79.0",
ops_type: "P",
ops_stop_id: 269592,
ops_order_id: 133503,
ops_driver1: 104,
ops_delivered_time: null
},
{
pb_id: "133643.0",
pbbname: "HALLMARK CARDS BERMAN BLAKE",
opl_amount: "79.0",
ops_type: "P",
ops_stop_id: 269895,
ops_order_id: 133643,
ops_driver1: 104,
ops_delivered_time: null
}
]
]

Now let's look at the data from the second double join, which we'll call line_stops. It looks like this:


[
{
pb_id: "133633.0",
pbbname: "CUSTOMER",
pb_net_rev: "250.0",
ops_driver1: 59,
ops_stop_id: 269869,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: "2021-04-02T13:07:00.000-05:00"
},
{
pb_id: "133127.0",
pbbname: "CUSTOMER",
pb_net_rev: "1147.0",
ops_driver1: 102,
ops_stop_id: 268801,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: null
},
{
pb_id: "133144.0",
pbbname: "CUSTOMER",
pb_net_rev: "650.0",
ops_driver1: 71,
ops_stop_id: 268836,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: "2021-04-01T14:38:00.000-05:00"
},
{
pb_id: "133144.0",
pbbname: "CUSTOMER",
pb_net_rev: "650.0",
ops_driver1: 71,
ops_stop_id: 268837,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: null
},
{
pb_id: "133188.0",
pbbname: "CUSTOMER",
pb_net_rev: "700.0",
ops_driver1: 71,
ops_stop_id: 268924,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: "2021-04-01T08:04:00.000-05:00"
},
]

What I'm currently doing is looping through both sets and matching records on these values:

ops_stop_id, ops_driver1, pb_id

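If all three match, I need to group the records under the specific driver's name, which can only come from the instances that have dh_first_name. When finished, the data structure looks like this: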

{
FIRST LAST: [
{
pb_id: "133599.0",
pbbname: "CUSTOMER",
opl_amount: "101.0",
ops_type: "P",
ops_stop_id: 269802,
ops_order_id: 133599,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133599.0",
pbbname: "CUSTOMER",
opl_amount: "11.62",
ops_type: "P",
ops_stop_id: 269802,
ops_order_id: 133599,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133536.0",
pbbname: "CUSTOMER",
opl_amount: "45.0",
ops_type: "P",
ops_stop_id: 269665,
ops_order_id: 133536,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133536.0",
pbbname: "CUSTOMER",
opl_amount: "5.18",
ops_type: "P",
ops_stop_id: 269665,
ops_order_id: 133536,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133522.0",
pbbname: "CUSTOMER",
opl_amount: "150.0",
ops_type: "P",
ops_stop_id: 269637,
ops_order_id: 133522,
ops_driver1: 11,
ops_delivered_time: null
},
{
pb_id: "133619.0",
pbbname: "CUSTOMER",
pb_net_rev: "550.0",
ops_driver1: 11,
ops_stop_id: 269841,
dh_first_name: "FIRST",
dh_last_name: "LAST",
ops_delivered_time: "2021-04-02T11:41:00.000-05:00"
}
],

You'll see a mix of both kinds of records, correctly grouped by the matching parameters.
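A minimal sketch of the matching rule described above, assuming each record responds to its field names as methods. The `LineItem`/`LineStop` Structs, the helper names `same_stop?` and `driver_name`, and the sample stop are hypothetical stand-ins for the real query results:

```ruby
# Hypothetical record types standing in for the two query results.
LineItem = Struct.new(:pb_id, :ops_stop_id, :ops_driver1, :opl_amount, keyword_init: true)
LineStop = Struct.new(:pb_id, :ops_stop_id, :ops_driver1, :dh_first_name, :dh_last_name,
                      keyword_init: true)

# A line item belongs to a stop only when all three fields agree.
def same_stop?(line, stop)
  line.ops_stop_id == stop.ops_stop_id &&
    line.ops_driver1 == stop.ops_driver1 &&
    line.pb_id == stop.pb_id
end

# The group name can only come from a stop, since line items lack dh_first_name.
def driver_name(stop)
  "#{stop.dh_first_name} #{stop.dh_last_name}"
end

line = LineItem.new(pb_id: "133599.0", ops_stop_id: 269802, ops_driver1: 11, opl_amount: "101.0")
# Hypothetical stop that matches the line item above.
stop = LineStop.new(pb_id: "133599.0", ops_stop_id: 269802, ops_driver1: 11,
                    dh_first_name: "FIRST", dh_last_name: "LAST")

puts same_stop?(line, stop) # prints true
puts driver_name(stop)      # prints FIRST LAST
```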

This is how I'm currently solving it:


stops_arr = [] # stops already matched or added (was never initialized)
merger = {}
line_items.each do |lines|
  line_stops.each do |stops|
    # A line item belongs to a stop when all three fields agree
    next unless lines.ops_stop_id == stops.ops_stop_id &&
                lines.ops_driver1 == stops.ops_driver1 &&
                lines.pb_id == stops.pb_id

    stops_arr.push(stops)
    name = stops.dh_first_name + ' ' + stops.dh_last_name
    (merger[name] ||= []) << lines
  end
end
# Add every stop not seen above, skipping duplicates
line_stops.each do |stops|
  next if stops_arr.include?(stops)

  stops_arr.push(stops)
  name = stops.dh_first_name + ' ' + stops.dh_last_name
  (merger[name] ||= []) << stops
end

This is really slow, and I think this line is the culprit:

(merger[stops.dh_first_name + ' ' + stops.dh_last_name] ||= []) << stops

The time complexity of your code is roughly O(lines.size * stops.size), because of the nested loops.
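The line singled out in the question does a constant-time hash lookup, so on its own it is unlikely to be the bottleneck. The cost comes from the nested loops, and the question's second loop adds a linear `stops_arr.include?` scan for every stop. A small sketch contrasting `Array#include?` with `Set#add?` for that duplicate check; the `StopRow` Struct and its three rows are hypothetical (values borrowed from the sample, with one duplicate added):

```ruby
require 'set'

# Hypothetical stand-in for line_stops rows; Struct equality compares member values.
StopRow = Struct.new(:ops_stop_id, :ops_driver1, :pb_id)

stops = [
  StopRow.new(269869, 59, "133633.0"),
  StopRow.new(268801, 102, "133127.0"),
  StopRow.new(269869, 59, "133633.0") # duplicate of the first row
]

# Array-based bookkeeping, as in the question: include? scans every element, O(n) per call.
seen_arr = []
unique_arr = []
stops.each do |s|
  next if seen_arr.include?(s)
  seen_arr << s
  unique_arr << s
end

# Set-based bookkeeping: add? is a hash lookup and returns nil when s is already present.
seen_set = Set.new
unique_set = stops.select { |s| seen_set.add?(s) }

p unique_arr.size # prints 2
p unique_set.size # prints 2
```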

Here is my suggestion, which runs in O(lines.size + stops.size) time:

require 'set'

def merge_key(stops)
  "#{stops.dh_first_name} #{stops.dh_last_name}"
end

# Composite match key. An Array key avoids the collisions a space-joined
# string could produce if a field ever contained a space.
def hash_key(record)
  [record.ops_stop_id, record.ops_driver1, record.pb_id]
end

merger = Hash.new { |hash, key| hash[key] = [] }
stops_hash = {}
seen = Set.new
# O(line_stops.size)
line_stops.each do |stops|
  next unless seen.add?(stops) # skip duplicate stops, as your code does
  name = merge_key(stops)
  merger[name] << stops
  stops_hash[hash_key(stops)] = name
end
# O(line_items.size)
line_items.each do |lines|
  name = stops_hash[hash_key(lines)]
  merger[name].unshift(lines) if name # your code puts line items before stops
end
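A self-contained, runnable variant of the suggestion above (a hash join: index the stops once, then probe the index for each line item), wrapped in a method so it can be checked on a few rows. The method name `group_by_driver`, the Structs, and the sample records (including the second driver) are hypothetical; the real rows come from the two joined queries:

```ruby
require 'set'

# Hypothetical record types; the real rows come from the two joined queries.
Line = Struct.new(:pb_id, :ops_stop_id, :ops_driver1, :opl_amount, keyword_init: true)
Stop = Struct.new(:pb_id, :ops_stop_id, :ops_driver1, :dh_first_name, :dh_last_name,
                  keyword_init: true)

# Hash join: index the stops once, then probe the index for each line item.
def group_by_driver(line_items, line_stops)
  merger = Hash.new { |hash, key| hash[key] = [] }
  index = {} # [ops_stop_id, ops_driver1, pb_id] => driver name
  seen = Set.new

  line_stops.each do |stop|
    next unless seen.add?(stop) # skip duplicate stops
    name = "#{stop.dh_first_name} #{stop.dh_last_name}"
    merger[name] << stop
    index[[stop.ops_stop_id, stop.ops_driver1, stop.pb_id]] = name
  end

  line_items.each do |line|
    name = index[[line.ops_stop_id, line.ops_driver1, line.pb_id]]
    merger[name].unshift(line) if name # unmatched line items are dropped
  end

  merger
end

stops = [
  Stop.new(pb_id: "133599.0", ops_stop_id: 269802, ops_driver1: 11,
           dh_first_name: "FIRST", dh_last_name: "LAST"),
  Stop.new(pb_id: "133590.0", ops_stop_id: 269780, ops_driver1: 104,
           dh_first_name: "OTHER", dh_last_name: "DRIVER")
]
lines = [
  Line.new(pb_id: "133599.0", ops_stop_id: 269802, ops_driver1: 11, opl_amount: "101.0"),
  Line.new(pb_id: "133599.0", ops_stop_id: 269802, ops_driver1: 11, opl_amount: "11.62"),
  Line.new(pb_id: "999999.0", ops_stop_id: 1, ops_driver1: 1, opl_amount: "5.0") # no matching stop
]

result = group_by_driver(lines, stops)
result.each { |name, records| puts "#{name}: #{records.size}" }
# prints:
# FIRST LAST: 3
# OTHER DRIVER: 1
```

Because `unshift` prepends, matched line items end up ahead of the stop in reverse of their original order (here the `"11.62"` line comes first). If source order matters, collecting line items per driver in a separate hash and concatenating them with the stops at the end would preserve it.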
