Zein


Assembly/Machine Instructions

Assembly language is the mnemonic form of an instruction set. Its concrete syntax is determined by the combination of assembler, syntax style, and instruction set architecture (ISA); there is no unified standard.
Assembler:

  1. MASM: Windows only; the only assembler that perfectly supports on-demand compilation; cannot output the bin format.
  2. NASM: Cross-platform; supports multiple output formats (bin/coff/omf/elf/…)

Style:

  1. Intel style
  2. AT&T style

Mainstream instruction set architectures:

  1. x86-64/x64/amd64/Intel64
  2. ARM64/AArch64
  3. RISC-V
  4. MIPS

x86 Assembly#

The examples below use an Intel-style assembler.

Memory and Addressing Modes#

.DATA Declaration of Static Data Area#

Data definition directives:
1) DB: byte, 1 byte
2) DW: word, 2 bytes
3) DD: double word, 4 bytes

Assembly has only one-dimensional arrays; there are no two-dimensional or multi-dimensional arrays. A one-dimensional array is simply a contiguous region of memory. Besides listing the values one by one, DUP and string constants are two other ways to declare arrays.

.DATA
var     DB 64    ; Declare a one-byte variable var and initialize it to 64
var2    DB ?     ; Declare an uninitialized one-byte variable var2, its initial value is undefined until explicitly assigned in the program
        DB 10    ; Declare an unlabeled byte with the value 10. It is located at var2 + 1. Since there is no label, it cannot be accessed by name
X       DW ?     ; Declare an uninitialized variable X of one word size (16 bits), initial value is undefined until explicitly assigned in the program. 
Y       DD 30000 ; Declare a 4-byte value, referred to as location Y, initialized to 30000.

Z       DD 1, 2, 3      ; Declare array Z, each element is 4 bytes in size, initialized to 1, 2, 3. The label Z marks the start of the data; the element address is Z + index * 4
bytes   DB 10 DUP(?)    ; Declare array bytes, each element is 1 byte in size; 10 DUP(?) indicates declaring 10 uninitialized bytes
arr     DD 100 DUP(0)   ; Declare array arr, each element is 4 bytes in size; 100 DUP(0) indicates declaring 100 elements initialized to 0
str     DB 'hello',0    ; Declare 6 bytes starting at the address str, initialized to hello and the null (0) byte.
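The layout rules above (contiguous elements, element address = label + index * element size, string plus terminating zero) can be sketched in C. This is only an illustration: the names `element_offset`, `read_Z`, and `str_size` are hypothetical helpers, not part of any assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors "Z DD 1, 2, 3": three contiguous 4-byte elements. */
static const uint32_t Z[3] = {1, 2, 3};

/* Mirrors "str DB 'hello',0": five characters plus a zero byte. */
static const uint8_t str_bytes[] = {'h', 'e', 'l', 'l', 'o', 0};

/* Byte offset of element `index` from the label, for elements of
   `elem_size` bytes (hypothetical helper). */
size_t element_offset(size_t index, size_t elem_size) {
    return index * elem_size;
}

/* Read element `index` of Z by computing its address by hand:
   Z + index * 4, the same arithmetic the assembly programmer does. */
uint32_t read_Z(size_t index) {
    assert(index < 3);
    const uint8_t *base = (const uint8_t *)Z;
    uint32_t value;
    memcpy(&value, base + element_offset(index, sizeof(uint32_t)), sizeof value);
    return value;
}

/* Number of bytes reserved by the string declaration. */
size_t str_size(void) {
    return sizeof str_bytes;
}
```

With these definitions, `read_Z(1)` reads the dword at Z + 4, and `str_size()` reports the 6 bytes that the string declaration occupies.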

Addressing Memory#

MOV copies data between registers and memory (32 bits at a time in the examples here). It takes two operands: the first is the destination, the second is the source. A memory operand is written in brackets: [register] uses the value in the register as a memory address, and [var] refers to the memory at the address denoted by the symbol var. MOV cannot copy directly from memory to memory.

; Examples of valid addressing: brackets mean the enclosed value is an address, and the operand is the memory content at that address
mov eax, [ebx]        ; Move the 4 bytes in memory at the address contained in EBX into EAX
mov [var], ebx        ; Move the contents of EBX into the 4 bytes at memory address var. (Note: the address var is a 32-bit constant.)
mov eax, [esi-4]      ; Move 4 bytes at memory address ESI + (-4) into EAX
mov [esi+eax], cl     ; Move the contents of CL into the byte at address ESI+EAX
mov edx, [esi+4*ebx]  ; Move the 4 bytes of data at address ESI+4*EBX into EDX
; Examples of invalid addressing:
mov eax, [ebx-ecx]      ; Registers can only be added in an address, never subtracted
mov [eax+esi+edi], ebx  ; At most 2 registers can take part in the address calculation
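The address arithmetic behind these operands can be modelled in a few lines of C. This is a minimal sketch, assuming the usual 32-bit x86 form base + index*scale + displacement, where the scale must be 1, 2, 4, or 8 (a constraint of the architecture that the examples above do not spell out). The function name `effective_address` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes base + index*scale + disp with
   32-bit wraparound, the way a 32-bit address calculation behaves. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp) {
    /* x86 only encodes these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Converting a negative disp to uint32_t is well-defined
       modular arithmetic, so [esi-4] is simply esi + (-4). */
    return base + index * scale + (uint32_t)disp;
}
```

For instance, `[esi-4]` with ESI = 0x1000 corresponds to `effective_address(0x1000, 0, 1, -4)`, and `[esi+4*ebx]` with EBX = 3 corresponds to `effective_address(0x1000, 3, 4, 0)`.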

Size Directives#

Size directives for memory operands: they state how many bytes the operand occupies when the size cannot be inferred from a register
1) BYTE PTR: 1 byte
2) WORD PTR: 2 bytes
3) DWORD PTR: 4 bytes

mov BYTE PTR [ebx], 2   ; Move 2 into the single byte at the address stored in EBX.
mov WORD PTR [ebx], 2   ; Move the 16-bit integer representation of 2 into the 2 bytes starting at the address in EBX.
mov DWORD PTR [ebx], 2  ; Move the 32-bit integer representation of 2 into the 4 bytes starting at the address in EBX.
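To see why the size directive matters, the following C sketch models how many bytes each of the three stores overwrites. It assumes x86's little-endian byte order (low byte at the lowest address); `store_sized` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "mov <SIZE> PTR [mem], value":
   writes the low `size` bytes of `value` in little-endian order
   and leaves every other byte of memory untouched. */
void store_sized(uint8_t *mem, uint32_t value, size_t size) {
    assert(size == 1 || size == 2 || size == 4); /* BYTE, WORD, DWORD */
    for (size_t i = 0; i < size; i++) {
        mem[i] = (uint8_t)(value >> (8 * i)); /* low byte first */
    }
}
```

Starting from four bytes of 0xFF, storing 2 with size 1 changes only the first byte, size 2 yields 02 00 FF FF, and size 4 yields 02 00 00 00.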

Instructions#

| Category | Instruction | Description | Example |
| --- | --- | --- | --- |
| Data Movement | mov | Copies the value of the source operand to the destination operand | mov ax, bx ; Copies the value of bx into ax |
| | push | Pushes data onto the stack | push ax ; Pushes the value of ax onto the stack |
| | pop | Pops the top of the stack into the destination operand | pop bx ; Pops the top value and stores it in bx |
| | lea | Loads an effective address: stores the address itself, not the memory content, in the destination register | lea ax, [bx + 4] ; Stores the address bx + 4 in ax |
| Arithmetic/Logical Operations | add | Performs addition | add ax, bx ; Adds bx to ax |
| | sub | Performs subtraction | sub ax, bx ; Subtracts bx from ax |
| | inc | Increments the operand by 1 | inc ax ; Increases ax by 1 |
| | dec | Decrements the operand by 1 | dec bx ; Decreases bx by 1 |
| | imul | Signed multiplication | imul ax, bx ; ax = ax * bx, signed |
| | idiv | Signed division | idiv bx ; Divides the 32-bit value DX:AX by bx; the quotient goes to ax, the remainder to dx |
| | and | Bitwise AND | and ax, bx ; Stores ax AND bx in ax |
| | or | Bitwise OR | or ax, bx ; Stores ax OR bx in ax |
| | xor | Bitwise XOR | xor ax, bx ; Stores ax XOR bx in ax |
| | not | Bitwise NOT | not ax ; Inverts all bits of ax |
| | neg | Two's-complement negation | neg ax ; Replaces ax with -ax |
| | shl | Left shift | shl ax, 1 ; Shifts ax left by 1 bit; the shifted-out bit goes to the carry flag and the vacated bit becomes 0 |
| | shr | Logical right shift | shr bx, 1 ; Shifts bx right by 1 bit; the shifted-out bit goes to the carry flag and the vacated bit becomes 0 |
| Control Flow | jmp | Unconditional jump | jmp label ; Jumps to label |
| | je / jz | Jump if equal (je) / jump if zero (jz) | je label ; Jumps to label if the zero flag is set (indicating equality) |
| | jne | Jump if not equal | jne label ; Jumps to label if the zero flag is not set (indicating inequality) |
| | jg | Jump if greater (signed) | jg label ; Jumps to label if the preceding cmp found the first operand greater |
| | jl | Jump if less (signed) | jl label ; Jumps to label if the preceding cmp found the first operand less |
| | cmp | Compares two operands by subtracting the second from the first and setting the flags; the result is discarded | cmp ax, bx ; Compares ax and bx (sets the flag register) |
| | call | Calls a procedure: pushes the return address onto the stack and jumps to the subroutine | call subroutine ; Calls the subroutine |
| | ret | Returns from a procedure: pops the return address and jumps back to the call site | ret ; Returns from the current procedure to the call site |
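Of the instructions listed, idiv has the least obvious operands: in its 16-bit form it divides the 32-bit value held in DX:AX and leaves the quotient in AX and the remainder in DX. The C model below is a sketch under that assumption; `IdivResult` and `idiv16` are hypothetical names, and the assertions stand in for the divide-error exception that real hardware raises.

```c
#include <assert.h>
#include <stdint.h>

/* Register state after the division (hypothetical type). */
typedef struct {
    int16_t ax; /* quotient */
    int16_t dx; /* remainder */
} IdivResult;

/* Model of 16-bit "idiv bx": signed-divide DX:AX by bx.
   dx_hi is the signed high word, ax_lo the unsigned low word. */
IdivResult idiv16(int16_t dx_hi, uint16_t ax_lo, int16_t bx) {
    assert(bx != 0); /* hardware raises a divide error here */
    int64_t dividend = (int64_t)dx_hi * 65536 + ax_lo;
    /* C division truncates toward zero, as idiv does. */
    int64_t quotient = dividend / bx;
    int64_t remainder = dividend % bx;
    /* A quotient that does not fit in AX is also a divide error. */
    assert(quotient >= INT16_MIN && quotient <= INT16_MAX);
    IdivResult r = {(int16_t)quotient, (int16_t)remainder};
    return r;
}
```

For example, DX:AX = FFFF:FFF9 represents -7; dividing by 2 gives a quotient of -3 and a remainder of -1, because the remainder takes the sign of the dividend.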

Calling Convention#

Subroutine (function) calls must adhere to a common protocol that specifies how to call a procedure and how to return from it. Given a set of calling convention rules, a programmer can determine how to pass parameters to a subroutine without looking at its definition. Likewise, as long as both sides follow the same rules, code generated by a high-level language compiler and hand-written assembly routines can call each other.

C Language Calling Convention#

There are various calling conventions. The C language calling convention (cdecl on 32-bit x86) is the most widely used. Following it allows assembly code to be called safely from C/C++ and also allows assembly code to call C library functions.

  1. Relies heavily on the hardware-supported stack
  2. Based on the push, pop, call, and ret instructions
  3. Subroutine parameters are passed on the stack; saved registers and the subroutine's local variables are also placed on the stack.

Most high-level procedural languages implemented on most processors use calling conventions similar to this one. The convention has two parts: rules for the caller and rules for the callee. Violating these rules corrupts the stack and the program fails quickly, so take extra care when implementing the calling convention in your own subroutines.


Caller Rules#

The caller must save the context before calling the subroutine:

  1. Save the caller-saved registers: EAX, ECX, EDX. The callee may modify these registers, so push them onto the stack before the call if their values are needed afterwards, and restore them after the call ends.

  2. Push the parameters to be passed to the subroutine onto the stack: Parameters are pushed onto the stack in reverse order (the last parameter is pushed first). Since the stack grows downwards, the first parameter will be stored at the lowest address (this feature allows for variable-length parameter lists).

  3. Use the call instruction to call the subroutine (function): call automatically pushes the return address onto the stack and then starts executing the subroutine code. By the time the subroutine returns, the callee has placed the return value in the EAX register, where the caller can read it. To restore the machine state, the caller needs to:
    a. Remove the passed parameters from the stack
    b. Restore the caller-saved registers (EAX, ECX, EDX) by popping them from the stack, after reading the return value out of EAX; the caller can assume that the other registers have not been modified.

; Save context
push eax
push ecx
push edx

push [var] ; Push last parameter first
push 216   ; Push the second parameter
push eax   ; Push first parameter last

call _myFunc ; Call the function (assume C naming); call will automatically push the return address onto the stack

; Clean up and restore context.
add esp, 12       ; Remove the 3 passed parameters (3 * 4 bytes); the stack pointer moves 12 bytes towards higher addresses
mov result, eax   ; Read the return value into the result variable
pop edx           ; Restore the caller-saved registers in reverse order of the pushes
pop ecx
pop eax
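The stack effects of the caller's parameter pushes can be checked with a toy model in C. This is a sketch only: `ToyStack`, `push32`, `peek32`, and `push_args_cdecl` are hypothetical names, and the "stack" is just an array indexed by a simulated ESP. It shows that pushing parameters in reverse order leaves the first parameter at the lowest address, and that three 4-byte parameters occupy 12 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy 32-bit stack; esp is a byte offset into mem. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp; /* top of stack; starts at the high end */
} ToyStack;

void stack_init(ToyStack *s) {
    s->esp = STACK_WORDS * 4;
}

/* push: decrement ESP by 4, then store the value at [ESP]. */
void push32(ToyStack *s, uint32_t value) {
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Read the dword at [ESP + offset]. */
uint32_t peek32(const ToyStack *s, uint32_t offset) {
    assert(offset % 4 == 0 && s->esp + offset < STACK_WORDS * 4);
    return s->mem[(s->esp + offset) / 4];
}

/* Push three parameters right to left, as the caller rules describe.
   Returns the number of bytes the caller must remove after the call. */
uint32_t push_args_cdecl(ToyStack *s, uint32_t a1, uint32_t a2, uint32_t a3) {
    push32(s, a3); /* last parameter first */
    push32(s, a2);
    push32(s, a1); /* first parameter last, so it ends up at [ESP] */
    return 3 * 4;  /* corresponds to "add esp, 12" */
}
```

After `push_args_cdecl(&s, 10, 20, 30)`, the values 10, 20, 30 sit at [ESP], [ESP+4], [ESP+8], and adding the returned byte count back to `s.esp` restores the stack pointer to its value before the pushes.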

Callee Rules#

1) Push the caller's frame base register EBP onto the stack, then copy ESP into EBP; EBP becomes the base address of the subroutine's stack frame.
2) Allocate space for local variables on the stack. The stack grows downwards, so the stack pointer decreases as variables are allocated.
3) Save the callee-saved registers that the subroutine will modify by pushing them onto the stack. These are EBX, EDI, and ESI; saving and restoring them is the callee's responsibility.
4) Execute the subroutine code and store the return value in EAX. When the subroutine returns:
a. Restore the callee-saved registers that were pushed (EDI and ESI in the example below) by popping them from the stack in reverse order.
b. Release the local variables.
c. Restore the caller's base pointer EBP by popping it from the stack.
d. Finally, execute ret to return to the caller.

.486           ; Tell the assembler to accept the 80486 instruction set
.MODEL FLAT   ; Flat memory model

.CODE         ; Marks the beginning of the code segment in the assembly file

PUBLIC _myFunc ; Externally visible/linkable/non-private function
_myFunc PROC
  ;1)
  push ebp
  mov  ebp, esp
  ;2)
  sub esp, 4      ; Allocate space for a local variable (4 bytes)
  ;3)
  push edi        ; Save register value, EDI will be modified
  push esi        ; Save register value, ESI will be modified
  ;4) Subroutine body
  mov eax, [ebp+8]   ; Move the value of the first parameter into EAX
  mov esi, [ebp+12]  ; Move the value of the second parameter into ESI  
  mov edi, [ebp+16]  ; Move the value of the third parameter into EDI
  
  mov [ebp-4], edi   ; Store EDI into the local variable
  add [ebp-4], esi   ; Add ESI to the local variable
  add eax, [ebp-4]   ; Add the contents of the local variable to EAX as the final result
  ;4a)
  pop esi          ; Restore the value of register ESI
  pop edi          ; Restore the value of register EDI
  ;4b)
  mov esp, ebp     ; Release local variables
  ;4c)
  pop ebp           ; Restore the caller's base pointer value
  ;4d)
  ret               ; Pop the return address from the stack, jump to the return address to continue executing the caller's code
_myFunc ENDP

END             ; Marks the end of the program
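For comparison, the body of _myFunc corresponds roughly to the following C function. This is a hand-written equivalent for illustration, not compiler input that is guaranteed to produce the listing above; the C name `myFunc` assumes the 32-bit C naming scheme in which the assembler-level symbol carries a leading underscore.

```c
/* C-level view of _myFunc: returns a + (c + b),
   computed through one 4-byte local variable. */
int myFunc(int a, int b, int c) {
    int local = c;    /* mov [ebp-4], edi */
    local += b;       /* add [ebp-4], esi */
    return a + local; /* add eax, [ebp-4]; the result is left in EAX */
}
```

A C caller would declare `int myFunc(int a, int b, int c);` and call it like any other function; the push, call, and stack cleanup steps from the caller rules are then generated by the compiler.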


RISC-V Assembly#

ARM Assembly#

Differences Between Intel and AT&T Syntax Styles#

| AT&T syntax | Intel syntax |
| --- | --- |
| insn source, destination | insn destination, source |

Memory Operations#

AT&T memory addressing uses ()
Intel memory addressing uses []

| AT&T syntax | Intel syntax |
| --- | --- |
| movl -12(%rbp), %eax | mov eax, DWORD PTR -12[rbp] |

Addressing#

AT&T syntax: disp(base, index, scale)
Intel syntax: [base + index*scale + disp]

The final address is base + disp + index * scale

| AT&T syntax | Intel syntax |
| --- | --- |
| movl -12(%rbp), %eax | mov eax, DWORD PTR -12[rbp] |
| leaq 0(,%rax,4), %rdx | lea rdx, 0[0+rax*4] |

Output in Specific Assembly Format#

objdump disassembly: on Linux, objdump disassembles x86 code in AT&T format by default; add -M to select another output format.

gcc -c test.c               # First compile with gcc to create an object file (test.o)
objdump -d test.o           # Outputs AT&T format by default
objdump -M intel -d test.o  # Outputs Intel format

You can also have gcc emit assembly source in a specific syntax style and inspect the generated assembly file.

gcc -S test.c              # Outputs AT&T format by default
gcc -S -masm=intel test.c  # Outputs Intel syntax
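The original does not show the contents of test.c. As a starting point, a small file such as the following could be used with the commands above; it is a hypothetical example, and the exact instructions gcc emits for it depend on the compiler version, target, and optimization flags.

```c
/* test.c (hypothetical contents, chosen for this illustration) */
#include <assert.h>

/* Indexing an int array gives the compiler a reason to use
   base + index*scale addressing with a scale of 4. */
int element_at(const int *arr, long index) {
    assert(index >= 0); /* simple sanity check on the index */
    return arr[index];
}
```

Because the file has no main function, it can be compiled with `gcc -c` or `gcc -S` but not linked into a program on its own, which is enough for inspecting the generated assembly.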
