How to measure dTLB hits and dTLB misses with perf_event_open()



I want to measure the cache miss rate and the dTLB miss rate. I have already done the first part.

But I can't find how to set the config to get dTLB misses and dTLB hits. For cache misses I did this:

pe.type = PERF_TYPE_HARDWARE;
pe.size = sizeof(struct perf_event_attr);
pe.config = PERF_COUNT_HW_CACHE_MISSES;
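glibc provides no wrapper for perf_event_open(), so it is called through syscall(2), as the program further down also does. A minimal sketch of a helper that opens one counting event for the calling thread, assuming the same attribute choices used later in this answer (start disabled, user space only); the name `open_counter` is hypothetical:

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Hypothetical helper: opens a counting (not sampling) event for the
   calling thread on any CPU. group_fd = -1 creates a standalone event;
   pass a leader's fd to join its group. Returns the fd, or -1 with errno set. */
static int open_counter(uint32_t type, uint64_t config, int group_fd)
{
    struct perf_event_attr pe;

    memset(&pe, 0, sizeof(pe));
    pe.type = type;
    pe.size = sizeof(pe);
    pe.config = config;
    pe.disabled = 1;        /* start stopped; enable with PERF_EVENT_IOC_ENABLE */
    pe.exclude_kernel = 1;  /* count user-space activity only */
    pe.exclude_hv = 1;      /* exclude the hypervisor */

    /* pid = 0 (this thread), cpu = -1 (any CPU), flags = 0 */
    return (int) syscall(SYS_perf_event_open, &pe, 0, -1, group_fd, 0);
}
```

For the cache-miss counter in the question this would be `open_counter(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES, -1)`. Because `read_format` is left at 0, a read() on the returned fd yields a single `uint64_t` count.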

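perf has no "direct" PMU event that lets you measure dTLB hits. There are separate dTLB miss events for memory loads and stores, which you can see by running the following command: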

sudo perf list | grep 'Hardware cache'
dTLB-load-misses                                   [Hardware cache event]
dTLB-loads                                         [Hardware cache event]
dTLB-store-misses                                  [Hardware cache event]
dTLB-stores                                        [Hardware cache event]

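The meaning of each of these events has already been explained here. Their exact definitions depend on the microarchitecture you are using, and this matters when computing dTLB hits: since there is no hit event, hits have to be derived from these counts, roughly as dTLB-loads minus dTLB-load-misses.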
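A minimal sketch of that derivation, assuming both counts were taken over the same run (ideally in the same event group). The struct and function names are hypothetical. The subtraction is clamped at zero because the two generic events map to different hardware events, and whether "misses" is strictly a subset of "loads" depends on the CPU:

```c
#include <stdint.h>

/* Hypothetical summary derived from two raw counts. */
struct dtlb_stats {
    uint64_t hits;      /* accesses - misses, clamped at 0 */
    double miss_rate;   /* misses / accesses, 0.0 when there were no accesses */
};

/* accesses: dTLB-loads (or dTLB-stores)
   misses:   dTLB-load-misses (or dTLB-store-misses) */
static struct dtlb_stats dtlb_derive(uint64_t accesses, uint64_t misses)
{
    struct dtlb_stats s;

    s.hits = (misses <= accesses) ? accesses - misses : 0;
    s.miss_rate = (accesses != 0) ? (double) misses / (double) accesses : 0.0;
    return s;
}
```

For example, 400 dTLB-loads with 100 dTLB-load-misses gives 300 approximate hits and a miss rate of 0.25.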

For example, to count occurrences of the event dTLB-load-misses:

pe.type = PERF_TYPE_HW_CACHE;
pe.size = sizeof(struct perf_event_attr);
pe.config = PERF_COUNT_HW_CACHE_DTLB <<  0 | PERF_COUNT_HW_CACHE_OP_READ <<  8 | PERF_COUNT_HW_CACHE_RESULT_MISS << 16;

To count occurrences of the event dTLB-loads:

pe.type = PERF_TYPE_HW_CACHE;
pe.size = sizeof(struct perf_event_attr);
pe.config = PERF_COUNT_HW_CACHE_DTLB <<  0 | PERF_COUNT_HW_CACHE_OP_READ <<  8 | PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16;

To measure dTLB-store-misses and dTLB-stores, replace PERF_COUNT_HW_CACHE_OP_READ with PERF_COUNT_HW_CACHE_OP_WRITE in the configurations above.
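Putting the read and write variants side by side, here is a sketch of all four dTLB events as name/config pairs for `perf_event_attr.config` (with `type = PERF_TYPE_HW_CACHE`). The struct and array names are hypothetical; only the operation (read vs. write) and the result (access vs. miss) change:

```c
#include <stdint.h>
#include <linux/perf_event.h>

/* Hypothetical table entry: a perf event name and its HW_CACHE config. */
struct hw_cache_event {
    const char *name;
    uint64_t config;
};

/* The four dTLB events listed by `perf list`. */
static const struct hw_cache_event dtlb_events[] = {
    { "dTLB-loads",
      PERF_COUNT_HW_CACHE_DTLB << 0 | PERF_COUNT_HW_CACHE_OP_READ << 8 |
      PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16 },
    { "dTLB-load-misses",
      PERF_COUNT_HW_CACHE_DTLB << 0 | PERF_COUNT_HW_CACHE_OP_READ << 8 |
      PERF_COUNT_HW_CACHE_RESULT_MISS << 16 },
    { "dTLB-stores",
      PERF_COUNT_HW_CACHE_DTLB << 0 | PERF_COUNT_HW_CACHE_OP_WRITE << 8 |
      PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16 },
    { "dTLB-store-misses",
      PERF_COUNT_HW_CACHE_DTLB << 0 | PERF_COUNT_HW_CACHE_OP_WRITE << 8 |
      PERF_COUNT_HW_CACHE_RESULT_MISS << 16 },
};
```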

When measuring any hardware cache event, the config should always take the form:

pe.config = (perf_hw_cache_id << 0) | (perf_hw_cache_op_id << 8) | (perf_hw_cache_op_result_id << 16) 
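This general form can be wrapped in two small helpers: one builds a config from its three fields, and the other splits a raw config back into them, which helps when checking what a numeric config actually requests. A sketch with hypothetical function names, assuming the 8-bit field layout implied by the shifts above (bits 0-7, 8-15 and 16-23):

```c
#include <stdint.h>
#include <linux/perf_event.h>

/* Hypothetical helper: builds a PERF_TYPE_HW_CACHE config value. */
static uint64_t hw_cache_config(uint64_t cache_id, uint64_t op_id, uint64_t result_id)
{
    return (cache_id << 0) | (op_id << 8) | (result_id << 16);
}

/* Hypothetical helper: splits a config value back into its three 8-bit fields. */
static void hw_cache_decode(uint64_t config, uint64_t *cache_id,
                            uint64_t *op_id, uint64_t *result_id)
{
    *cache_id  = (config >> 0)  & 0xff;
    *op_id     = (config >> 8)  & 0xff;
    *result_id = (config >> 16) & 0xff;
}
```

For example, `hw_cache_config(PERF_COUNT_HW_CACHE_DTLB, PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_RESULT_MISS)` produces the dTLB-load-misses config used above.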

The meanings and the possible enum values of perf_hw_cache_id, perf_hw_cache_op_id and perf_hw_cache_op_result_id are described here.

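Ideally, given your requirements, you would measure all four of the above events together for a single workload. The following example shows how to measure dTLB-load-misses and dTLB-loads together: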

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
#include <asm/unistd.h>
#include <errno.h>
#include <stdint.h>
#include <inttypes.h>

/* Layout of the data returned by read() on a group leader opened with
   read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID */
struct read_format {
    uint64_t nr;            /* number of events in the group */
    struct {
        uint64_t value;     /* counter value */
        uint64_t id;        /* id reported by PERF_EVENT_IOC_ID */
    } values[];
};

int main(void) {
    struct perf_event_attr pea;
    int fd1, fd2;
    uint64_t id1, id2;
    uint64_t val1 = 0, val2 = 0;
    uint64_t buf[512];      /* 4096 bytes, suitably aligned for struct read_format */
    struct read_format* rf = (struct read_format*) buf;
    uint64_t i;

    /* Group leader: dTLB-loads */
    memset(&pea, 0, sizeof(struct perf_event_attr));
    pea.type = PERF_TYPE_HW_CACHE;
    pea.size = sizeof(struct perf_event_attr);
    pea.config = PERF_COUNT_HW_CACHE_DTLB <<  0 | PERF_COUNT_HW_CACHE_OP_READ <<  8 | PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16;
    pea.disabled = 1;         /* start stopped; enabled below for the whole group */
    pea.exclude_kernel = 1;   /* count user space only */
    pea.exclude_hv = 1;
    pea.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
    fd1 = syscall(__NR_perf_event_open, &pea, 0, -1, -1, 0);
    if (fd1 == -1) {
        perror("perf_event_open (dTLB-loads)");
        return EXIT_FAILURE;
    }
    ioctl(fd1, PERF_EVENT_IOC_ID, &id1);

    /* Group member: dTLB-load-misses */
    memset(&pea, 0, sizeof(struct perf_event_attr));
    pea.type = PERF_TYPE_HW_CACHE;
    pea.size = sizeof(struct perf_event_attr);
    pea.config = PERF_COUNT_HW_CACHE_DTLB <<  0 | PERF_COUNT_HW_CACHE_OP_READ <<  8 | PERF_COUNT_HW_CACHE_RESULT_MISS << 16;
    pea.disabled = 1;
    pea.exclude_kernel = 1;
    pea.exclude_hv = 1;
    pea.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
    fd2 = syscall(__NR_perf_event_open, &pea, 0, -1, fd1 /* group_fd: join fd1's group */, 0);
    if (fd2 == -1) {
        perror("perf_event_open (dTLB-load-misses)");
        close(fd1);
        return EXIT_FAILURE;
    }
    ioctl(fd2, PERF_EVENT_IOC_ID, &id2);

    /* Reset, start and stop both counters together through the leader */
    ioctl(fd1, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(fd1, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    sleep(10);   /* placeholder: put the workload to be measured here */
    ioctl(fd1, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    /* A single read() on the leader returns the values of all group members */
    if (read(fd1, buf, sizeof(buf)) == -1) {
        perror("read");
        return EXIT_FAILURE;
    }
    for (i = 0; i < rf->nr; i++) {
        if (rf->values[i].id == id1) {
            val1 = rf->values[i].value;
        } else if (rf->values[i].id == id2) {
            val2 = rf->values[i].value;
        }
    }
    printf("dTLB-loads: %"PRIu64"\n", val1);
    printf("dTLB-load-misses: %"PRIu64"\n", val2);

    close(fd2);
    close(fd1);
    return 0;
}

Some of the ideas involved in monitoring multiple events with perf_event_open are discussed here; the program above was copied from there.
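With PERF_FORMAT_GROUP | PERF_FORMAT_ID (and no time fields), one read() on the group leader returns `nr` followed by `nr` (value, id) pairs. The lookup loop in the program could be factored into a small function, which also makes it easy to extend the group to all four dTLB events. A sketch with the hypothetical name `group_value_by_id`, which additionally checks that each entry lies within the bytes actually read:

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of read() on a group leader opened with
   read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID (no time fields). */
struct read_format {
    uint64_t nr;
    struct {
        uint64_t value;
        uint64_t id;
    } values[];
};

/* Hypothetical helper: looks up the counter with the given id in a buffer
   of len bytes filled by read(). Returns 0 and stores the value in *out,
   or -1 if the id is absent or the buffer is truncated. */
static int group_value_by_id(const void *buf, size_t len, uint64_t id, uint64_t *out)
{
    const struct read_format *rf = buf;
    uint64_t i;

    if (len < sizeof(rf->nr))
        return -1;
    for (i = 0; i < rf->nr; i++) {
        /* stop if entry i would extend past the bytes actually read */
        if (sizeof(rf->nr) + (i + 1) * sizeof(rf->values[0]) > len)
            return -1;
        if (rf->values[i].id == id) {
            *out = rf->values[i].value;
            return 0;
        }
    }
    return -1;
}
```

In the program above, this would be called as `group_value_by_id(buf, n, id1, &val1)`, where `n` is the byte count returned by read().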
