C语言 AArch64汇编中的原子链表LIFO，在ldxr / stxr之间使用加载或存储

我使用 ARMv8 64 位程序集为共享内存上下文实现了 LIFO。

LIFO 在开头插入一个节点，每个节点结构的第一个属性必须是下一个指针。

这是用于实现 LIFO 的原子插入和删除的正确程序集吗？

它使用 LL/SC 在 LDXR 和 STXR 之间具有额外的负载或存储来读取 head-next> 或将指针存储到新节点中。

typedef union {
void * head[1];
}lifo;
int atomic_lifo_init(lifo * h) {
if (h) {
h->head[0]=NULL;
}
}
inline void *
atomic_lifo_delete (lifo *h)
{
void    *ret = NULL;
/*sa_ignore UNUSED_VAR*/
void *  tmp = NULL;

asm volatile ("n"
"2:  ldxr      %0,[%2] n" //load the head from h, which points the 1st node of list to local ret pointer.
"    cbz     %0, 3f n" //check if the lifo is empty.
"    ldr      %1,[%0] n" //store in tmp the 2nd node by derefencing the ret (h->head points 1st node. value of each node is next node as 1st attribute of node structure is pointing next.)
"    stxr     %w1, %1,[%2] n" //update h->head with tmp.
"    cbnz     %w1, 2b n" //continue if failed
"3:              n"
: "=&r" (ret), "=&r" (tmp)
: "r" (h)
: "memory"
);
return(ret);
}

/*
* atomic_lifo_insert()
*      Put an element on a list, protected against SMP
*/
void
atomic_lifo_insert (lifo *h, void *__new)
{
/*sa_ignore UNUSED_VAR*/
void * next = NULL;
void * flag = NULL;
asm volatile (" n"
"1: ldxr      %1,[%2] n" //load head[0] from h,which points 1st node to local next pointer
"   str      %1,[%3] n" //store the local next pointer to value of __new, as 1st attribute of the any node is next (convention used here). so __new's next is pointing current 1st node.
"   stxr    %w0, %3,[%2] n" //update the h->head with 
__next.
"   cbnz    %w0, 1b n" //if stxr is failure try again.
: "=&r" (flag), "=&r" (next)
: "r" (h), "r" (__new)
: "memory"
);
}

我真的是ARM组装的新手，所以任何帮助都非常感谢。

您的内联 asm 约束看起来是正确的，这应该按照您的预期方式编译。您可能可以使用"+m" (*h)让编译器选择寻址模式，而不是使用"r"(h)和[%2]对其进行硬编码。

就排序而言，ldr在 ldxr 之后进行依赖排序(如 C11memory_order_consume(，因此可以工作。

但是，insert中的 LL/SC 之间的str可能直到stxr将地址发布到其他线程之后才可见。所以我认为你需要stlxr(发布商店(在insert.

在LDXR/STXR之间进行额外的加载或存储是不安全的。维基百科的 LL/SC 文章提到，如果您在 LL 和 SC 之间进行任何加载或存储，某些实现可能会虚假失败。 Wiki表示PowerPC特别允许这样做。但 AArch64 通常明确没有：根据 ARM 参考手册(参见 @James Greenhalgh 的评论(：

LoadExcl/StoreExcl循环只有在[...]在独占负载和存储独占之间，没有显式内存访问。

可能有一些 AArch64 CPU 会在其中创建无限循环stxr故障，但可能还有其他 IT 可以工作。如果它在实践中适用于您测试的 CPU，那么看看是否有任何文档可以支持它可能是个好主意。

如果new_节点恰好与头节点位于同一缓存行(或 LL/SC 独占块(中，则这很可能是一个问题。确保你测试了这种情况，或者以某种方式让它变得不可能，如果这在你关心的微架构上完全有效的话。

除此之外，我认为您的整体算法看起来是正确的，所以如果您已经测试过它并发现它可以工作，那么这可能很好。

但是，我还没有真正仔细考虑过您的算法;我也没有任何设计或使用原始LL/SC的经验。我原则上知道它们是如何工作的，但我不准备说这绝对是正确的。从我所做的一点点思考中，我没有看到任何问题，但这并不意味着没有任何问题。

相关内容

最新更新

热门标签：