Can you use MongoDB map/reduce to migrate data?



I have a large collection, and I want to migrate it by populating a field on every document.

A simple example would be caching the comment count on each post:

class Post
  include Mongoid::Document
  field :comment_count, type: Integer
  has_many :comments
end

class Comment
  include Mongoid::Document
  belongs_to :post
end

I can run it serially, like this:

Post.all.each do |p|
  p.update_attribute :comment_count, p.comments.count
end

But it takes around 24 hours to run (it is a large collection). I'm wondering whether Mongo's map/reduce could be used for this, but I haven't seen a good example yet.

I imagine you would map over the comments collection and then store the reduced result in the posts collection. Am I on the right track?

You can use MongoDB map/reduce to "help" migrate data; unfortunately, you can't use it for a fully server-side migration. You are on the right track, and the basic idea is:

  1. Map each comment to emit(post_id, {comment_count: 1}) ---> {_id: post_id, value: {comment_count: 1}}
  2. Reduce to the value {comment_count: N}, where N is the sum of the counts ---> {_id: post_id, value: {comment_count: N}}
  3. Specify the output option {reduce: 'posts'} to reduce the results of the comment_count map/reduce back into the posts collection (see the sketch after this list)
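For illustration, those three steps could be coded roughly like this, using the same Ruby driver call as the working example further down. This is only a sketch of the idea; the out: {reduce: 'posts'} option in step 3 is exactly what runs into the shape problem described next, so it does not give you a usable migration on its own.

# step 1: emit one {comment_count: 1} per comment, keyed by post_id
map = "function() { emit( this.post_id, {comment_count: 1} ); }"

# step 2: sum the per-comment counts for each post_id
reduce = <<-EOF
  function(key, values) {
    var result = {comment_count: 0};
    values.forEach(function(value) { result.comment_count += value.comment_count; });
    return result;
  }
EOF

# step 3: fold the results back into the posts collection; the merged
# documents keep the {_id, value} shape, which is the limitation below
Comment.collection.map_reduce(map, reduce, out: {reduce: 'posts'})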

After some extensive investigation, I found that you can get close, but there is a problem that prevents a fully server-side migration. The result of a reduce has the shape {_id: KEY, value: MAP_REDUCE_VALUE}, and you are stuck with that shape; there appears to be no way around it. So you can neither feed the complete original document into the reduce (in fact, you lose any data outside this shape), nor have the reduce update fields of the target document outside this shape. As a result, the "final" update of the posts collection has to be done programmatically on the client side. Fixing this looks like it would make a good enhancement request.
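In outline, the client-side finish looks like this. It is only a sketch, reusing the map and reduce functions from the step sketch above; the full, runnable version is the test below.

# map/reduce into a scratch collection instead of directly into 'posts'...
results = Comment.collection.map_reduce(map, reduce, out: 'map_reduce_results')

# ...then copy each count back into its post programmatically, since the
# reduce cannot update fields outside the {_id, value} shape
results.find.each do |doc|
  Post.find(doc['_id']).update_attribute :comment_count, doc['value']['comment_count'].to_i
end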

Below find a working example that demonstrates using MongoDB map/reduce in Ruby to calculate all of the comment_counts. I then use the map_reduce_results collection programmatically to update comment_count in the posts collection. The reduce function is carried over from my attempt to use out: {reduce: 'posts'}.

You can verify my answer with some experimentation, or, if you like, I can post the non-working fully server-side attempt (complete with fixed-up models) on request. Hope this helps with understanding MongoDB map/reduce in Ruby.

test/unit/comment_test.rb

require 'test_helper'
class CommentTest < ActiveSupport::TestCase
  def setup
    @map_reduce_results_name = 'map_reduce_results'
    delete_all
  end
  def delete_all
    Post.delete_all
    Comment.delete_all
    Mongoid.database.drop_collection(@map_reduce_results_name)
  end
  def dump(title = nil)
    yield
    puts title
    Post.all.to_a.each do |post|
      puts "#{post.to_json} #{post.comments.collect(&:text).to_json}"
    end
  end
  def generate
    (2+rand(2)).times do |p|
      post = Post.create(text: 'post_' + p.to_s)
      comments = (2+rand(3)).times.collect do |c|
        Comment.create(text: "post_#{p} comment_#{c}")
      end
      post.comments = comments
    end
  end
  def generate_and_migrate(title = nil)
    dump(title + ' generate:') { generate }
    dump(title + ' migrate:') { yield }
  end
  test "map reduce migration" do
    generate_and_migrate('programmatic') do
      Post.all.each do |p|
        p.update_attribute :comment_count, p.comments.count
      end
    end
    delete_all
    generate_and_migrate('map/reduce') do
      map = "function() { emit( this.post_id, {comment_count: 1} ); }"
      reduce = <<-EOF
        function(key, values) {
          var result = {comment_count: 0};
          values.forEach(function(value) { result.comment_count += value.comment_count; });
          return result;
        }
      EOF
      out = @map_reduce_results_name # not {reduce: 'posts'}: that would clobber the posts' other fields
      result_coll = Comment.collection.map_reduce(map, reduce, out: out)
      puts "#{@map_reduce_results_name}:"
      result_coll.find.each do |doc|
        p doc
        Post.find(doc['_id']).update_attribute :comment_count, doc['value']['comment_count'].to_i
      end
    end
  end
end

Test output (apologies for the mix of JSON and Ruby inspect):

Run options: --name=test_map_reduce_migration
# Running tests:
programmatic generate:
{"_id":"4fcae3bde4d30b21e2000001","comment_count":null,"text":"post_0"} ["post_0 comment_0","post_0 comment_1","post_0 comment_2"]
{"_id":"4fcae3bde4d30b21e2000005","comment_count":null,"text":"post_1"} ["post_1 comment_1","post_1 comment_0","post_1 comment_2","post_1 comment_3"]
{"_id":"4fcae3bde4d30b21e200000a","comment_count":null,"text":"post_2"} ["post_2 comment_1","post_2 comment_3","post_2 comment_0","post_2 comment_2"]
programmatic migrate:
{"_id":"4fcae3bde4d30b21e2000001","comment_count":3,"text":"post_0"} ["post_0 comment_0","post_0 comment_1","post_0 comment_2"]
{"_id":"4fcae3bde4d30b21e2000005","comment_count":4,"text":"post_1"} ["post_1 comment_1","post_1 comment_0","post_1 comment_2","post_1 comment_3"]
{"_id":"4fcae3bde4d30b21e200000a","comment_count":4,"text":"post_2"} ["post_2 comment_1","post_2 comment_3","post_2 comment_0","post_2 comment_2"]
map/reduce generate:
{"_id":"4fcae3bee4d30b21e200000f","comment_count":null,"text":"post_0"} ["post_0 comment_0","post_0 comment_1"]
{"_id":"4fcae3bee4d30b21e2000012","comment_count":null,"text":"post_1"} ["post_1 comment_2","post_1 comment_0","post_1 comment_1"]
{"_id":"4fcae3bee4d30b21e2000016","comment_count":null,"text":"post_2"} ["post_2 comment_0","post_2 comment_1","post_2 comment_2","post_2 comment_3"]
map_reduce_results:
{"_id"=>BSON::ObjectId('4fcae3bee4d30b21e200000f'), "value"=>{"comment_count"=>2.0}}
{"_id"=>BSON::ObjectId('4fcae3bee4d30b21e2000012'), "value"=>{"comment_count"=>3.0}}
{"_id"=>BSON::ObjectId('4fcae3bee4d30b21e2000016'), "value"=>{"comment_count"=>4.0}}
map/reduce migrate:
{"_id":"4fcae3bee4d30b21e200000f","comment_count":2,"text":"post_0"} ["post_0 comment_0","post_0 comment_1"]
{"_id":"4fcae3bee4d30b21e2000012","comment_count":3,"text":"post_1"} ["post_1 comment_2","post_1 comment_0","post_1 comment_1"]
{"_id":"4fcae3bee4d30b21e2000016","comment_count":4,"text":"post_2"} ["post_2 comment_0","post_2 comment_1","post_2 comment_2","post_2 comment_3"]
.
Finished tests in 0.072870s, 13.7231 tests/s, 0.0000 assertions/s.
1 tests, 0 assertions, 0 failures, 0 errors, 0 skips
