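I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. AFAIK, mem_load_uops_retired.l3_miss counts the number of DRAM demand (i.e., non-prefetch) data read accesses. offcore_response.demand_data_rd.l3_miss.local_dram, as its name suggests, counts the number of demand data reads targeted to DRAM. Therefore, these two events seem to be equivalent (or at least almost the same). But based on the following benchmarks, the former event is much less frequent than the latter: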
1) Initializing a 1000-element global array in a loop in C:
Performance counter stats for '/home/ahmad/Simple Progs/loop':
1,363 mem_load_uops_retired.l3_miss
1,543 offcore_response.demand_data_rd.l3_miss.local_dram
0.000749574 seconds time elapsed
0.000778000 seconds user
0.000000000 seconds sys
2) Opening a PDF document in Evince:
Performance counter stats for '/opt/evince-3.28.4/bin/evince':
936,152 mem_load_uops_retired.l3_miss
1,853,998 offcore_response.demand_data_rd.l3_miss.local_dram
4.346408203 seconds time elapsed
1.644826000 seconds user
0.103411000 seconds sys
3) Running Wireshark for 5 seconds:
Performance counter stats for 'wireshark':
5,161,671 mem_load_uops_retired.l3_miss
8,126,526 offcore_response.demand_data_rd.l3_miss.local_dram
15.713828395 seconds time elapsed
0.904280000 seconds user
0.693906000 seconds sys
4) Running a blur filter on an image in Inkscape:
Performance counter stats for 'inkscape':
13,852,121 mem_load_uops_retired.l3_miss
23,475,970 offcore_response.demand_data_rd.l3_miss.local_dram
25.355643897 seconds time elapsed
7.244404000 seconds user
1.019895000 seconds sys
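In all four benchmarks, offcore_response.demand_data_rd.l3_miss.local_dram is more frequent than mem_load_uops_retired.l3_miss, and in the Evince run it is almost twice as frequent. Is this reasonable, and why? Please let me know if the benchmarks are too complex or coarse-grained.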
As far as I know, the following table shows the difference between the two events on Haswell: