rails - Exporting a huge CSV file consumes all RAM in production

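So my app exports an 11.5 MB CSV file and uses basically all of the RAM, which never gets freed.

The data for the CSV is taken from the DB, and in the case mentioned above the whole thing is exported.

I am using the Ruby 2.4.1 standard CSV library in the following fashion:

export_helper.rb: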

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  data = Model.scope1(param).scope2(param).includes(:model1, :model2)
  data.each do |item|
    file << [
      item.method1,
      item.method2,
      item.method3
    ]
  end
  # repeat for other models - approx. 5 other similar loops
end

And then in the controller:

generator = ExportHelper::ReportGenerator.new
generator.full_report
respond_to do |format|
  format.csv do
    send_file(
      "#{Rails.root}/full_report.csv",
      filename: 'full_report.csv',
      type: :csv,
      disposition: :attachment
    )
  end
end

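After a single request, the Puma processes take up 55% of the whole server RAM and stay like that until the server eventually runs out of memory completely.

For instance, in this article, generating a million-line 75 MB CSV file required only 1 MB of RAM. But there is no DB querying involved.

The server has 1015 MB RAM + 400 MB of swap memory.

So my questions are:

What exactly consumes so much memory? Is it the CSV generation or the communication with the DB?
Am I doing something wrong and missing a memory leak? Or is it just how the library works?
Is there a way to free up the memory without restarting the Puma workers?

Thanks in advance!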

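Instead of each you should be using find_each, which is meant specifically for cases like this: it instantiates the models in batches (1,000 records per batch by default) and releases them afterwards, whereas each instantiates all of them at once.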

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  Model.scope1(param).find_each do |item|
    file << [
      item.method1
    ]
  end
end
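To make the batching idea concrete, here is a minimal plain-Ruby sketch of what find_each does conceptually: fetch a limited batch ordered by id, yield its rows, then continue after the last id seen. FAKE_TABLE, fetch_batch and each_in_batches are hypothetical stand-ins invented for this illustration, not ActiveRecord APIs, and the real implementation differs in detail.

```ruby
require 'csv'
require 'stringio'

# Hypothetical stand-in for a database table: rows sorted by id.
FAKE_TABLE = (1..25).map { |i| { id: i, name: "item#{i}" } }.freeze

# Stand-in for "SELECT ... WHERE id > ? ORDER BY id LIMIT ?".
def fetch_batch(after_id, limit)
  FAKE_TABLE.select { |row| row[:id] > after_id }.first(limit)
end

# Yields every row while holding at most batch_size rows at a time.
def each_in_batches(batch_size: 10)
  return enum_for(:each_in_batches, batch_size: batch_size) unless block_given?

  last_id = 0
  loop do
    batch = fetch_batch(last_id, batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    last_id = batch.last[:id] # the next query starts after this id
  end
end

# Write the rows to a CSV one batch at a time.
io = StringIO.new
csv = CSV.new(io)
each_in_batches(batch_size: 10) { |row| csv << [row[:id], row[:name]] }
```

Only one batch is referenced at any moment, so earlier batches become eligible for garbage collection while the loop is still running.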

Furthermore, you should stream the CSV to the browser instead of first writing it to memory or disk:

format.csv do
  headers["Content-Type"] = "text/csv"
  headers["Content-disposition"] = 'attachment; filename="full_report.csv"'
  # streaming_headers
  # nginx doc: Setting this to "no" will allow unbuffered responses suitable for Comet and HTTP streaming applications
  headers['X-Accel-Buffering'] = 'no'
  headers["Cache-Control"] ||= "no-cache"
  # Rack::ETag 2.2.x no longer respects 'Cache-Control'
  # https://github.com/rack/rack/commit/0371c69a0850e1b21448df96698e2926359f17fe#diff-1bc61e69628f29acd74010b83f44d041
  headers["Last-Modified"] = Time.current.httpdate
  headers.delete("Content-Length")
  response.status = 200
  header = ['Method 1', 'Method 2']
  csv_options = { col_sep: ";" }
  csv_enumerator = Enumerator.new do |y|
    # ** passes the options as keywords, which also works on newer csv versions
    y << CSV::Row.new(header, header).to_s(**csv_options)
    Model.scope1(param).find_each do |item|
      y << CSV::Row.new(header, [item.method1, item.method2]).to_s(**csv_options)
    end
  end
  # setting the body to an enumerator, rails will iterate this enumerator
  self.response_body = csv_enumerator
end
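The reason the enumerator helps is that each CSV line is only built when the consumer asks for it. A minimal sketch outside Rails, assuming a hypothetical each_record source in place of Model.scope1(param).find_each; the source here is deliberately endless to show that nothing is generated up front:

```ruby
require 'csv'

# Hypothetical data source standing in for Model.scope1(param).find_each.
# It never ends, so it only works if rows are produced lazily.
def each_record
  return enum_for(:each_record) unless block_given?

  (1..Float::INFINITY).each { |i| yield [i, "name#{i}"] }
end

# Returns an Enumerator that builds one CSV line per requested element.
def csv_lines(header, col_sep: ';')
  Enumerator.new do |y|
    y << CSV.generate_line(header, col_sep: col_sep)
    each_record { |row| y << CSV.generate_line(row, col_sep: col_sep) }
  end
end

lines = csv_lines(['Id', 'Name'])
first_three = lines.first(3) # only the header and two rows are ever built
```

Whatever iterates the body (here first, in the answer the web server) pulls lines one by one, so memory use stays roughly constant as long as nothing in between buffers the whole response.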

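In addition to using find_each, you should try running the ReportGenerator code in a background job with ActiveJob. Because background jobs run in separate processes, their memory is released back to the OS when those processes are killed.

So you could try something like this:

A user requests some report (CSV, PDF, Excel)
Some controller enqueues a ReportGeneratorJob and displays a confirmation to the user
The job is performed and an email is sent with the download link/file.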
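The underlying principle (memory allocated by a separate process goes back to the OS when that process exits) can be sketched without ActiveJob by using a plain fork. This is not how ActiveJob or any particular queue backend works internally; write_report and export_in_child are hypothetical names for this illustration, and fork is only available on Unix-like systems.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical report writer; stands in for ReportGenerator#full_report.
def write_report(path, row_count)
  CSV.open(path, 'wb') do |csv|
    csv << %w[id name]
    1.upto(row_count) { |i| csv << [i, "item#{i}"] }
  end
end

# Runs the export in a child process and returns true if it succeeded.
# Everything the child allocated is reclaimed by the OS once it exits.
def export_in_child(path, row_count)
  pid = Process.fork do
    write_report(path, row_count)
    exit!(0) # leave the child without running the parent's at_exit hooks
  end
  Process.wait(pid)
  $?.success?
end

report_path = File.join(Dir.mktmpdir, 'full_report.csv')
ok = export_in_child(report_path, 1_000)
```

This sketch waits for the child for simplicity; a real job would run asynchronously and notify the user when the file is ready, as in the steps above.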

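Note that you can easily improve the ActiveRecord side, but when the response is sent through Rails it will still all end up in a memory buffer in the Response object: https://github.com/rails/rails/blob/master/actionpack/lib/action_dispatch/http/response.rb#L110

You also need to make use of the live streaming feature to pass the data to the client directly without buffering: https://guides.rubyonrails.org/action_controller_overview.html#live-streaming-of-arbitrary-data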
