How to avoid garbage collection problems when working with large in-memory lists in C#

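We ran into SQL performance problems with our article-tag relationships, so we decided to keep the articles/tags in memory. That gave us a big boost, but now we get garbage collection headaches whenever the entire list is discarded and replaced with a new one (3 million+ records).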

Here is the code:

```csharp
using System.Collections.Generic;
using System.Data.Entity.Infrastructure; // IObjectContextAdapter (Entity Framework 6)
using System.Linq;
using System.Threading.Tasks;
using BitmapIndex;                       // assumed namespace of the BitmapIndex library (BitmapIndex, BIKey)

public class TagEngineService
{
    private readonly IContextCreator _contextCreator;
    private static volatile List<TagEngineCacheResponse> _cachedList = new List<TagEngineCacheResponse>();
    private readonly int KEYWORD_GROUP_NAME = 1;
    private static BitmapIndex.BitmapIndex _bitMapIndex = new BitmapIndex.BitmapIndex();

    public TagEngineService(IContextCreator contextCreator)
    {
        _contextCreator = contextCreator;
    }

    public async Task RepopulateEntireCacheAsync()
    {
        using (var ctx = _contextCreator.PortalContext())
        {
            // BASE_SQL_QUERY is defined elsewhere in the class
            using (var cmd = ctx.Database.Connection.CreateCommand())
            {
                cmd.CommandText = BASE_SQL_QUERY;
                await ctx.Database.Connection.OpenAsync();
                using (var reader = await cmd.ExecuteReaderAsync())
                {
                    // Materialize all 3M+ rows into a brand-new List
                    var articles = ((IObjectContextAdapter)ctx)
                        .ObjectContext
                        .Translate<TagEngineCacheResponse>(reader).ToList();

                    // Recreate the bitmap index: keyword id -> row positions in the list
                    var tempBitmapIndex = new BitmapIndex.BitmapIndex();
                    int recordRow = 0;
                    foreach (var record in articles)
                    {
                        tempBitmapIndex.Set(new BIKey(KEYWORD_GROUP_NAME, record.KeywordId), recordRow);
                        recordRow++;
                    }

                    // Swap in the new list and index; the old ones become garbage
                    _cachedList = articles;
                    _bitMapIndex = tempBitmapIndex;
                }
            }
        }
    }
}
```

Class definition:

```csharp
public class TagEngineCacheResponse
{
    public int ArticleId { get; set; }
    public int KeywordId { get; set; }
    public DateTime PublishDate { get; set; }
    public int ViewCountSum { get; set; }
}
```

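As you can see, when the cache is rebuilt, _cachedList is replaced with a new list and the old list becomes eligible for garbage collection. At that point the CPU time spent in GC jumps to 60-90% for 2-3 seconds.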

Any ideas on how to improve this code to avoid the GC problem?

My guess is that each object in the list takes roughly 44 bytes, so about 130 MB for 3M objects. That is fairly large, but not outrageous.
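Rather than guessing the per-object cost, it can be measured. The sketch below fills a presized List of the question's class and compares GC.GetTotalMemory (with a forced full collection) before and after. The helper name SizeEstimate is hypothetical, and the figure it reports depends on the runtime and on 32-bit vs 64-bit, so treat it as a rough estimate rather than an exact size.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the question's class.
public class TagEngineCacheResponse
{
    public int ArticleId { get; set; }
    public int KeywordId { get; set; }
    public DateTime PublishDate { get; set; }
    public int ViewCountSum { get; set; }
}

public static class SizeEstimate
{
    // Approximate bytes per list entry: the object itself plus its
    // reference slot in the list's backing array.
    public static double BytesPerEntry(int count)
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        // Presized so the measurement includes no List<T> growth slack.
        var list = new List<TagEngineCacheResponse>(count);
        for (int i = 0; i < count; i++)
        {
            list.Add(new TagEngineCacheResponse { ArticleId = i, KeywordId = i % 1000 });
        }

        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(list); // keep the list reachable until after the second measurement
        return (after - before) / (double)count;
    }
}

public static class Program
{
    public static void Main()
    {
        double perEntry = SizeEstimate.BytesPerEntry(1_000_000);
        Console.WriteLine($"~{perEntry:F1} bytes per entry, ~{perEntry * 3_000_000 / 1e6:F0} MB for 3M entries");
    }
}
```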

A few suggestions:

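The list's backing array is far above the 85,000-byte threshold of the small object heap (SOH), so it is allocated on the large object heap (LOH). The LOH is only collected during generation 2 collections, and gen 2 collections can be expensive. To avoid this, the usual advice is to avoid re-allocating gen 2 objects wherever possible: allocate them once, then reuse them as much as you can.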
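The LOH threshold can be observed directly: GC.GetGeneration reports objects on the LOH as generation 2 from the moment they are allocated, while a fresh small object starts in generation 0. The sketch below assumes a 64-bit process for the ~24 MB figure in the comment; the array sizes are illustrative.

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        // Read each generation right after allocating: a later allocation could
        // trigger a GC and promote an earlier small object.
        var small = new byte[80_000];    // below 85,000 bytes: small object heap
        int smallGen = GC.GetGeneration(small);

        var large = new byte[100_000];   // 85,000 bytes or more: large object heap
        int largeGen = GC.GetGeneration(large);

        // For a reference-type T, List<T> stores its items in a T[] of references.
        // 3 million references is ~24 MB in a 64-bit process, far over the threshold.
        var backing = new object[3_000_000];
        int backingGen = GC.GetGeneration(backing);

        Console.WriteLine($"80,000-byte array:  gen {smallGen}");
        Console.WriteLine($"100,000-byte array: gen {largeGen}");   // LOH objects report gen 2
        Console.WriteLine($"3M-reference array: gen {backingGen}");
    }
}
```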

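You could fetch the data from the database in smaller chunks and update the existing list in place, making sure each chunk stays below the SOH size limit. Either lock the list so it is not accessed while it is being updated, or keep two alternating lists: update the inactive one, then switch which list is "active".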
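The "two alternating lists" option can be sketched as a small double-buffered cache. This is a minimal sketch under stated assumptions: DoubleBufferedCache and the fetchChunk(offset, size) delegate are hypothetical names, fetchChunk stands in for a paged database query, and only one thread calls Refresh at a time. Both lists are allocated once with enough capacity; List<T>.Clear resets Count but keeps the backing array, so a refresh reuses the existing LOH array instead of allocating a new one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical double-buffered cache: two lists allocated once and reused.
public sealed class DoubleBufferedCache<T>
{
    private readonly List<T> _a;
    private readonly List<T> _b;
    private List<T> _active;

    public DoubleBufferedCache(int capacity)
    {
        // Size both buffers for the largest expected data set; if a refresh
        // exceeds this, List<T> grows and allocates a new backing array.
        _a = new List<T>(capacity);
        _b = new List<T>(capacity);
        _active = _a;
    }

    // Readers should read this once per query and not hold on to it: a buffer
    // is overwritten again two refreshes after it stopped being active.
    public IReadOnlyList<T> Active => Volatile.Read(ref _active);

    // Refills the inactive buffer chunk by chunk, then makes it the active one.
    // Not safe to call concurrently with itself. fetchChunk(offset, size) is a
    // hypothetical paged query returning fewer than 'size' rows at the end.
    public void Refresh(Func<int, int, IReadOnlyList<T>> fetchChunk, int chunkSize)
    {
        List<T> inactive = ReferenceEquals(_active, _a) ? _b : _a;
        inactive.Clear(); // Count = 0, but Capacity (the backing array) is kept

        int offset = 0;
        while (true)
        {
            // Each chunk container is small (SOH) and becomes garbage right away.
            IReadOnlyList<T> chunk = fetchChunk(offset, chunkSize);
            for (int i = 0; i < chunk.Count; i++)
            {
                inactive.Add(chunk[i]);
            }
            offset += chunk.Count;
            if (chunk.Count < chunkSize) break;
        }

        Volatile.Write(ref _active, inactive); // switch the "active" list
    }
}

public static class Program
{
    public static void Main()
    {
        var source = new int[10_000];
        for (int i = 0; i < source.Length; i++) source[i] = i;

        // Simulated paging over an in-memory array instead of a database.
        Func<int, int, IReadOnlyList<int>> fetch = (offset, size) =>
            new ArraySegment<int>(source, offset, Math.Min(size, source.Length - offset));

        var cache = new DoubleBufferedCache<int>(capacity: 16_384);
        cache.Refresh(fetch, chunkSize: 1_000);
        Console.WriteLine(cache.Active.Count);
    }
}
```

The lock-based alternative from the same paragraph avoids the second buffer but blocks readers for the whole refresh; the double buffer trades memory for readers never waiting.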

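You are using a class for TagEngineCacheResponse, which means allocating a very large number of objects. Each one is small enough for the SOH, but if you are unlucky they may survive long enough to be promoted to gen 2. Unreferenced objects have little effect on GC time, but using a value type (struct) may still be better and would avoid the problem. Profile to make sure it actually helps.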
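As a starting point for that profiling, the sketch below compares the bytes allocated when building the rows as a struct array versus as class instances. It assumes .NET Core 3.0 or later for GC.GetAllocatedBytesForCurrentThread; the names TagEngineCacheResponseClass and AllocationComparison are hypothetical, and how the struct rows would be materialized from the data reader is outside this sketch. With the struct, the whole data set is one array (one object for the GC to track) instead of one object per row plus an array of references. The large array still lives on the LOH, so this combines naturally with reusing it as described above.

```csharp
using System;

// Value-type version of the question's row type: rows are stored inline.
public struct TagEngineCacheResponse
{
    public int ArticleId { get; set; }
    public int KeywordId { get; set; }
    public DateTime PublishDate { get; set; }
    public int ViewCountSum { get; set; }
}

// The original reference-type version, kept only for comparison.
public class TagEngineCacheResponseClass
{
    public int ArticleId { get; set; }
    public int KeywordId { get; set; }
    public DateTime PublishDate { get; set; }
    public int ViewCountSum { get; set; }
}

public static class AllocationComparison
{
    // Bytes allocated on this thread while building 'count' struct rows.
    public static long StructBytes(int count)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var rows = new TagEngineCacheResponse[count]; // a single allocation
        for (int i = 0; i < count; i++)
        {
            // Array elements are variables, so this mutates the stored row in place.
            rows[i].ArticleId = i;
            rows[i].KeywordId = i % 1000;
        }
        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        GC.KeepAlive(rows);
        return allocated;
    }

    // Same measurement for the class version: count + 1 allocations.
    public static long ClassBytes(int count)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        var rows = new TagEngineCacheResponseClass[count];
        for (int i = 0; i < count; i++)
        {
            rows[i] = new TagEngineCacheResponseClass { ArticleId = i, KeywordId = i % 1000 };
        }
        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        GC.KeepAlive(rows);
        return allocated;
    }
}

public static class Program
{
    public static void Main()
    {
        const int n = 3_000_000;
        Console.WriteLine($"struct rows: {AllocationComparison.StructBytes(n):N0} bytes");
        Console.WriteLine($"class rows:  {AllocationComparison.ClassBytes(n):N0} bytes");
    }
}
```

One design note: with a List<TagEngineCacheResponse> of structs, list[i].ArticleId = x does not compile (error CS1612) because the indexer returns a copy; an array, as above, or CollectionsMarshal.AsSpan on .NET 5+ allows in-place updates.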
