程序集:创建链接节点的阵列

首先，如果这个问题不合适，因为我没有提供任何代码或没有自己思考，我很抱歉，我会删除这个问题。

对于赋值，我们需要创建一个节点数组来模拟链表。每个节点都有一个整数值和一个指向列表中下一个节点的指针。这是我的.DATA部分

.DATA
linked_list DWORD 5 DUP (?) ;We are allowed to assume the linked list will have 5 items
linked_node STRUCT
    value BYTE  ?
    next BYTE ?
linked_node ENDS

我不确定我是否正确定义了我的STRUCT，因为我不确定next的类型应该是什么。此外，我对如何处理这个问题感到困惑。要将节点插入linked_list，我应该能够写入mov [esi+TYPE linked_list*ecx]，对吗？当然，我每次都需要inc ecx。我困惑的是如何执行mov linked_node.next, "pointer to next node"是否有某种运算符允许我将指向数组中下一个索引的指针设置为等于linked_node.next？还是我想得不对？任何帮助都将不胜感激！

用您熟悉的语言来思考您的设计。最好是C，因为C中的指针和值是直接映射到asm的概念。

假设您希望通过存储指向head元素的指针来跟踪您的链表。

#include <stdint.h>  // for int8_t
struct node {
    int8_t next;  // array index.  More commonly, you'd use  struct node *next;
                  // negative values for .next are a sentinel, like a NULL pointer, marking the end of the list
    int8_t val;
};
struct node storage[5];  // .next field indexes into this array
uint8_t free_position = 0;  // when you need a new node, take index = free_position++;
int8_t head = -1;  // start with an empty list

有一些技巧可以减少角点情况，比如让列表头是一个完整的节点，而不仅仅是一个引用（指针或索引）。您可以将其视为第一个元素，而不必到处检查空列表大小写。

无论如何，给定一个节点引用int8_t p（其中p是链表代码中指向列表节点的指针的标准变量名），下一个节点是storage[p.next]。下一个节点的val是storage[p.next].val。

让我们看看这在asm中是什么样子。NASM手册介绍了它的宏系统如何帮助您使使用全局结构的代码更具可读性，但我还没有为此做任何宏方面的工作。您可以为NEXT和VAL或其他什么定义宏，使用0和1，所以您可以说[storage + rdx*2 + NEXT]。甚至是一个带参数的宏，所以你可以说[NEXT(rdx*2)]。如果你不小心，你可能会发现的代码读起来更加混乱。

section .bss
storage: resw 5   ;; reserve 5 words of zero-initialized space
free_position: db 0   ;; uint8_t free_position = 0;

section .data
head: db -1       ;; int8_t head = -1;
section .text

; p is stored in rdx.  It's an integer index into storage
;  We'll access  storage  directly, without loading it into a register.
;  (normally you'd have it in a reg, since it would be space you got from malloc/realloc)
     ; lea rsi, [rel storage]   ;; If you want RIP-relative addressing. 
     ;; There is no [RIP+offset + scale*index] addressing mode, because global arrays are for tiny / toy programs.
    test    edx, edx
    js  .err_empty_list       ;; check for p=empty list (sign-bit means negative)
    movsx   eax, byte [storage + 2*rdx]   ;; load p.next into eax, with sign-extension
    test    eax, eax
    js  .err_empty_list       ;; check that there is a next element
    movsx   eax, byte [storage + 2*rax + 1]  ;;  load storage[p.next].val, sign extended into eax
        ;; The final +1 in the effective address is because the val byte is 2nd.
        ;; you could have used a 3rd register if you wanted to keep p.next around for future use
    ret  ;; or not, if this is just the middle of some larger function

.err_empty_list:   ; .symbol is a local symbol, doesn't have to be unique for the whole file
    ud2   ; TODO: report an error instead of running an invalid insns

请注意，我们可以通过符号扩展到32位reg而不是完整的64位rax来进行较短的指令编码。如果值为负数，我们就不会使用rax作为地址的一部分。我们只是使用movsx来将寄存器的其余部分清零，因为mov al, [storage + 2*rdx]会将rax的高56位保留为旧内容。

另一种方法是使用movzx eax, byte [...] / test al, al，因为8位test的编码和执行速度与32位test指令一样快。此外，在AMD推土机系列CPU上，作为负载的movzx的延迟比movsx低一个周期（尽管它们仍然采用整数执行单元，而不像Intel完全由负载端口处理movsx/zx）。

无论哪种方式，movsx或movzx都是加载8位数据的好方法，因为您可以避免在写入部分reg后读取完整reg的问题，和/或错误依赖（根据reg高位的先前内容，即使您知道您已经将其归零，CPU硬件仍必须跟踪它）。除非你知道你没有针对英特尔哈斯韦尔之前的版本进行优化，否则你不必担心部分寄存器写入。Haswell做双重记账或其他事情来避免额外的uop，以便在阅读时将部分值与旧的全值合并。AMD CPU、P4和Silvermont不会将部分regs与完整regs分开跟踪，所以你只需要担心错误的依赖性。

还要注意，您可以加载打包在一起的next和val，如

.search_loop:
    movzx    eax,  word [storage + rdx*2]   ; next in al,  val in ah
    test     ah, ah
    jz   .found_a_zero_val
    movzx    edx, al        ; use .next for the next iteration
    test     al, al
    jns   .search_loop
    ;; if we get here, we didn't find a zero val
    ret
.found_a_zero_val:
    ;; do something with the element referred to by `rdx`

注意我们无论如何都必须使用movzx，因为一个有效地址中的所有寄存器都必须具有相同的大小。（所以word [storage + al*2]不起作用。）

相反，这可能更有用，在将next放入al，将val放入ah之后，用单个存储来存储节点的两个字段，如mov [storage + rdx*2], ax或其他什么，可能来自不同的源。（在这种情况下，如果您在另一个寄存器中还没有movzx，您可能希望使用常规字节加载，而不是movzx）。这没什么大不了的：不要为了避免进行双字节存储而使代码变得难以阅读或更复杂。至少，直到您发现存储端口uop是某个循环中的瓶颈。

在数组中使用索引而不是指针可以节省大量空间，尤其是在指针占用8字节的64位系统中。如果您不需要释放单个节点（即，数据结构只会增长，或者在删除时一次删除所有节点），那么新节点的分配器是微不足道的：只需将它们固定在数组的末尾和realloc(3)。或者使用c++std::vector。

有了这些构建块，您应该能够实现常见的链表算法。只需使用mov [storage + rdx*2], al或其他什么存储字节。

如果你需要关于如何用干净的算法实现链表的想法，这些算法可以用尽可能少的分支处理所有特殊情况，请看看这个Codereview问题。这是针对Java的，但我的回答非常C风格。其他答案也有一些不错的技巧，其中一些是我在回答时借鉴的。（例如，使用虚拟节点可以避免分支来处理插入作为新头的特殊情况）。

相关内容

最新更新

热门标签：