Ruby: searching an array for keywords



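I'm parsing a large CSV file in a Ruby script and need to find the closest title match for a set of search keys. There may be one or more search keys, and they may not match exactly (they only need to be close), for example: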

search_keys = ["big", "bear"]

I have a large array containing the data to search, and I only want to search on the title column:

array = [
          ["id", "title",            "code", "description"],
          ["1",  "once upon a time", "3241", "a classic story"],
          ["2",  "a big bad wolf",   "4235", "a little scary"],
          ["3",  "three big bears",  "2626", "a heart warmer"]
        ]

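In this case I want it to return the row ["3", "three big bears", "2626", "a heart warmer"], because that is the closest match to my search keys.

I want it to return the closest match for the given search keys.

Are there any helpers/libraries/gems I can use? Has anyone done this before?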

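I suspect this task should really be handled at the database level, or handed to some kind of search engine; fetching the data into the application and searching across columns and rows there makes little sense and is likely to be expensive. But for now, here is a simple approach :)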

array = [
          ["id", "title",            "code", "description"],
          ["1",  "once upon a time", "3241", "a classic story"],
          ["2",  "a big bad wolf",   "4235", "a little scary"],
          ["3",  "three big bears",  "2626", "a heart warmer"]
        ]

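h = Hash.new(0) # row index => number of search keys found in the title
search_keys = ["big", "bear"]
array.each_with_index do |rec, idx|
  next if idx.zero? # skip the header row
  search_keys.each do |key|
    h[idx] += 1 if rec[1].include?(key)
  end
end
# Pick the row index with the highest count (the first one wins on ties).
closest = h.keys.first
h.each do |idx, count|
  closest = idx if h[closest] < count
end
# Index by row position, not by the "id" column, so non-sequential ids still work.
array[closest] unless closest.nil? # => ["3", "three big bears", "2626", "a heart warmer"]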

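I think you can do this yourself without any gem! This may be close to what you need: search each row for the keys and assign a rank to every row that matches.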

result = []
array.each do |ar|
    rank = 0
    search_keys.each do |key|
        if ar[1].include?(key)
            rank += 1
        end
    end
    if rank > 0
        result << [rank, ar]
    end 
end

This code could be written more compactly, but I wanted to show you the details.
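
For comparison, here is one possible compact form of the same ranking idea. It is only a sketch: it assumes the title is always column 1 and that the first row is a header, and the names `ranked` and `best` are made up for illustration.

```ruby
array = [
  ["id", "title",            "code", "description"],
  ["1",  "once upon a time", "3241", "a classic story"],
  ["2",  "a big bad wolf",   "4235", "a little scary"],
  ["3",  "three big bears",  "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

# Pair each data row (header skipped) with the number of keys found in its title.
ranked = array.drop(1).map do |row|
  [search_keys.count { |key| row[1].include?(key) }, row]
end

# Keep only the rows that matched at least one key.
result = ranked.select { |rank, _row| rank > 0 }

# The row with the highest rank, or nil when nothing matched.
best = result.max_by { |rank, _row| rank }&.last
```

With the sample data, `result` holds the rows with ids 2 and 3 together with their ranks, and `best` is the "three big bears" row.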

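This works. It finds the matching* rows and returns them as an array in result.

*matching row = a row whose id, title, code, or description matches any of the supplied search_keys, including partial matches such as "bear" in "bears"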

result = []
array.each do |a|
    a.each do |i|
        search_keys.each do |k|
            result << a if i.include?(k)
        end
    end
end
result.uniq!

You can probably write it more concisely...
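
As a rough idea of what a more concise version might look like, the nested loops can be collapsed with `select` and `any?`. This sketch skips the header row with `drop(1)`, which the loop above does not do, so treat that as an assumption about what is wanted.

```ruby
array = [
  ["id", "title",            "code", "description"],
  ["1",  "once upon a time", "3241", "a classic story"],
  ["2",  "a big bad wolf",   "4235", "a little scary"],
  ["3",  "three big bears",  "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

# Keep every data row in which at least one field contains at least one key.
# select visits each row once, so no uniq! pass is needed afterwards.
result = array.drop(1).select do |row|
  row.any? { |field| search_keys.any? { |key| field.include?(key) } }
end
```

With the sample data, `result` contains the rows with ids 2 and 3.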

array = [
          ["id", "title",            "code", "description"],
          ["1",  "once upon a time", "3241", "a classic story"],
          ["2",  "a big bad wolf",   "4235", "a little scary"],
          ["3",  "three big bears",  "2626", "a heart warmer"]
        ]
search_keys = ["big", "bear"]

def sift(records, target_field, search_keys)
    # find target_field index
    target_field_index = nil
    records.first.each_with_index do |e, i|
        if e == target_field
            target_field_index = i
            break
        end
    end
    if target_field_index.nil?
        raise "Target field was not found"
    end
    # sums up which records have a match and how many keys they match
    # key => val = record => number of keys matched
    counter = Hash.new(0) # each new hash key is init'd with value of 0
    records.each do |record| # look at all our given records
        search_keys.each do |key| # check each search key on the field
            if record[target_field_index].include?(key)
                counter[record] += 1 # found a key, init to 0 if required and increment count
            end
        end
    end
    # find the result with the most search key matches
    top_result = counter.to_a.reduce do |top, record|
        if record[1] > top[1] # [0] = record, [1] = key hit count
            top = record # set to new top
        end
        top # continue with reduce
    end.first # only care about the record (not the key hit count)
end

puts "Top result: #{sift array, 'title', search_keys}"
# => Top result: ["3", "three big bears", "2626", "a heart warmer"]

Here's my one-liner attempt:

p array.find_all {|a|a.join.scan(/#{search_keys.join("|")}/).length==search_keys.length}
# => [["3", "three big bears", "2626", "a heart warmer"]]
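
One caveat with the one-liner: it counts occurrences rather than distinct keys, so a row containing "big" twice and no "bear" would also pass, and the keys are interpolated into the regex unescaped. A possible variation, sketched below, uses `Regexp.union` (which escapes special characters in string arguments) and `uniq`. Joining the fields with a space, and the names `pattern` and `all_keys_rows`, are my own choices here. It also assumes no key is a prefix of another key.

```ruby
array = [
  ["id", "title",            "code", "description"],
  ["1",  "once upon a time", "3241", "a classic story"],
  ["2",  "a big bad wolf",   "4235", "a little scary"],
  ["3",  "three big bears",  "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

# Build /big|bear/ with any regex metacharacters in the keys escaped.
pattern = Regexp.union(search_keys)

# A row qualifies only if every distinct key shows up somewhere in it.
# Fields are joined with a space so a match cannot span two columns.
all_keys_rows = array.select do |row|
  row.join(" ").scan(pattern).uniq.length == search_keys.uniq.length
end
```

With the sample data this again returns only the "three big bears" row.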

To get all rows ordered by number of matches:

p array.drop(1).sort_by {|a|a.join.scan(/#{search_keys.join("|")}/).length}.reverse

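Does anyone know how to combine this with the last solution so that rows containing none of the keys are dropped, while keeping it concise?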
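
One way this might be done, offered as a sketch rather than a definitive answer: compute the match count in a small lambda, filter out rows with a count of zero, then sort by the negated count. The names `hits` and `ranked` are hypothetical, and the fields are joined with a space, which differs slightly from the plain `join` used above.

```ruby
array = [
  ["id", "title",            "code", "description"],
  ["1",  "once upon a time", "3241", "a classic story"],
  ["2",  "a big bad wolf",   "4235", "a little scary"],
  ["3",  "three big bears",  "2626", "a heart warmer"]
]
search_keys = ["big", "bear"]

pattern = Regexp.union(search_keys)

# Number of key occurrences in a row (all columns considered).
hits = ->(row) { row.join(" ").scan(pattern).length }

# Drop the header and rows without any match, then order best match first.
ranked = array.drop(1)
              .select  { |row| hits.call(row) > 0 }
              .sort_by { |row| -hits.call(row) }
```

With the sample data, `ranked` is the "three big bears" row followed by the "a big bad wolf" row; the "once upon a time" row is dropped.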
