Here is the traditional Hello World program that uses Linux system calls, for a 64-bit installation:
# ----------------------------------------------------------------------------------------
# Writes "Hello, world" to the console using only system calls. Runs on 64-bit Linux only.
# To assemble and run:
#
#     gcc -c hello.s && ld hello.o && ./a.out
#
# or
#
#     gcc -nostdlib hello.s && ./a.out
# ----------------------------------------------------------------------------------------

        .global _start

        .text
_start:
        # write(1, message, 13)
        mov     $1, %rax                # system call 1 is write
        mov     $1, %rdi                # file handle 1 is stdout
        mov     $message, %rsi          # address of string to output
        mov     $13, %rdx               # number of bytes
        syscall                         # invoke operating system to do the write

        # exit(0)
        mov     $60, %rax               # system call 60 is exit
        xor     %rdi, %rdi              # we want return code 0
        syscall                         # invoke operating system to exit
message:
        .ascii  "Hello, world\n"

$ gcc -c hello.s && ld hello.o && ./a.out
Hello, world
If you are using a different operating system, such as macOS or Windows, the system call numbers and the registers used will likely be different.
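For comparison, here is a minimal C sketch of the same write call, assuming a Linux system whose C library provides the `syscall()` wrapper and the `SYS_write` constant. The helper name `write_hello` is made up for this illustration. It shows that the assembly version is doing nothing more than loading a call number and three arguments.

```c
#define _GNU_SOURCE            /* makes unistd.h declare syscall() */
#include <assert.h>
#include <sys/syscall.h>       /* SYS_write */
#include <unistd.h>            /* syscall() */

/* Hypothetical helper: issues a raw write system call on stdout and
   returns the number of bytes the kernel reports as written. */
long write_hello(void)
{
    static const char message[] = "Hello, world\n";

    /* sizeof counts the trailing NUL, so subtract 1. The assembly version
       hard-codes this length as 13. */
    assert(sizeof message - 1 == 13);
    return syscall(SYS_write, 1, message, sizeof message - 1);
}
```

On x86-64 Linux, `SYS_write` is the same number 1 that the assembly loads into `%rax`; on other platforms the constant has whatever value that platform uses.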
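Generally you will want to use a C library. If your gcc builds position-independent executables by default, as many recent distributions do, pass -no-pie when building the examples that follow, because an absolute address such as the one in mov $message, %rdi cannot be relocated in a position-independent executable. Here's Hello World again: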
# ----------------------------------------------------------------------------------------
# Writes "Hola, mundo" to the console using a C library. Runs on Linux or any other system
# that does not use underscores for symbols in its C library. To assemble and run:
#
#     gcc hola.s && ./a.out
# ----------------------------------------------------------------------------------------

        .global main

        .text
main:                                   # This is called by C library's startup code
        mov     $message, %rdi          # First integer (or pointer) parameter in %rdi
        call    puts                    # puts(message)
        ret                             # Return to C library code
message:
        .asciz  "Hola, mundo"           # asciz puts a 0 byte at the end

$ gcc hola.s && ./a.out
Hola, mundo
The 64-bit calling conventions are a bit more detailed, and they are explained fully in the AMD64 ABI Reference. You can also get info on them at Wikipedia. The most important points are (again, for 64-bit Linux, not Windows):
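From left to right, integer and pointer parameters go in rdi, rsi, rdx, rcx, r8, and r9, and floating-point parameters go in xmm0 through xmm7; parameters that do not fit in registers are passed on the stack. When the called function gets control, the return address is at (%rsp), the first memory parameter is at 8(%rsp), etc. The stack pointer must be aligned to a 16-byte boundary at the point of each call. The called function must preserve rbx, rbp, and r12 through r15; any other register may be overwritten, so the caller has to save the ones it still needs. Integer results are returned in rax and floating-point results in xmm0.

This program prints the first few Fibonacci numbers, illustrating how registers have to be saved and restored: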
# -----------------------------------------------------------------------------
# A 64-bit Linux application that writes the first 90 Fibonacci numbers. It
# needs to be linked with a C library.
#
# Assemble and Link:
#     gcc fib.s
# -----------------------------------------------------------------------------

        .global main

        .text
main:
        push    %rbx                    # we have to save this since we use it

        mov     $90, %ecx               # ecx will countdown to 0
        xor     %rax, %rax              # rax will hold the current number
        xor     %rbx, %rbx              # rbx will hold the next number
        inc     %rbx                    # rbx is originally 1
print:
        # We need to call printf, but we are using eax, ebx, and ecx. printf
        # may destroy eax and ecx so we will save these before the call and
        # restore them afterwards.

        push    %rax                    # caller-save register
        push    %rcx                    # caller-save register

        mov     $format, %rdi           # set 1st parameter (format)
        mov     %rax, %rsi              # set 2nd parameter (current_number)
        xor     %rax, %rax              # because printf is varargs

        # Stack is already aligned because we pushed three 8 byte registers
        call    printf                  # printf(format, current_number)

        pop     %rcx                    # restore caller-save register
        pop     %rax                    # restore caller-save register

        mov     %rax, %rdx              # save the current number
        mov     %rbx, %rax              # next number is now current
        add     %rdx, %rbx              # get the new next number
        dec     %ecx                    # count down
        jnz     print                   # if not done counting, do some more

        pop     %rbx                    # restore rbx before returning
        ret
format:
        .asciz  "%20ld\n"

$ gcc fib.s && ./a.out
                   0
                   1
                   1
                   2
                   3
...
  420196140727489673
  679891637638612258
 1100087778366101931
 1779979416004714189
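To make the register shuffling in the loop easier to follow, here is a rough C sketch of the same update, with `current` standing in for rax and `next` for rbx. The function name `fib_sequence` and its buffer-filling interface are assumptions made for this illustration; the assembly program prints each value instead of storing it.

```c
#include <assert.h>
#include <stdint.h>

/* Fills out[0..n-1] with the first n Fibonacci numbers, mirroring the
   register updates of the assembly loop (current ~ rax, next ~ rbx). */
void fib_sequence(uint64_t *out, int n)
{
    /* F(92) is the largest Fibonacci number that fits in a signed
       64-bit integer, which is what the %ld format expects. */
    assert(n >= 0 && n <= 93);

    uint64_t current = 0;
    uint64_t next = 1;
    for (int i = 0; i < n; i++) {
        out[i] = current;
        uint64_t saved = current;   /* mov %rax, %rdx */
        current = next;             /* mov %rbx, %rax */
        next += saved;              /* add %rdx, %rbx */
    }
}
```

Calling `fib_sequence(buffer, 90)` should leave the same 90 values in `buffer` that the assembly program prints, ending with 1779979416004714189.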
This 64-bit program is a very simple function that takes in three 64-bit integer parameters and returns the maximum value. It shows how to extract integer parameters: they are passed in registers rather than pushed on the stack, so on entry to the function they will be in rdi, rsi, and rdx, respectively. The return value is an integer so it gets returned in rax.
# -----------------------------------------------------------------------------
# A 64-bit function that returns the maximum value of its three 64-bit integer
# arguments. The function has signature:
#
#     int64_t maxofthree(int64_t x, int64_t y, int64_t z)
#
# Note that the parameters have already been passed in rdi, rsi, and rdx. We
# just have to return the value in rax.
# -----------------------------------------------------------------------------

        .globl  maxofthree

        .text
maxofthree:
        mov     %rdi, %rax              # result (rax) initially holds x
        cmp     %rsi, %rax              # is x less than y?
        cmovl   %rsi, %rax              # if so, set result to y
        cmp     %rdx, %rax              # is max(x,y) less than z?
        cmovl   %rdx, %rax              # if so, set result to z
        ret                             # the max will be in rax
Here is a C program that calls the assembly language function.
/*
 * callmaxofthree.c
 *
 * A small program that illustrates how to call the maxofthree function we wrote in
 * assembly language.
 */

#include <stdio.h>
#include <inttypes.h>

int64_t maxofthree(int64_t, int64_t, int64_t);

int main() {
    printf("%ld\n", maxofthree(1, -4, -7));
    printf("%ld\n", maxofthree(2, -6, 1));
    printf("%ld\n", maxofthree(2, 3, 1));
    printf("%ld\n", maxofthree(-2, 4, 3));
    printf("%ld\n", maxofthree(2, -6, 5));
    printf("%ld\n", maxofthree(2, 4, 6));
    return 0;
}
To assemble, link and run this two-part program:
$ gcc -std=c99 callmaxofthree.c maxofthree.s && ./a.out
1
2
3
4
5
6
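When testing a hand-written routine like this, it can help to compare it against a plain C version. The following is a small sketch of that idea; the names `maxofthree_ref`, `max3_fn`, and `check_against_reference` are invented here and are not part of the original program. Each conditional assignment in the reference corresponds to one cmp/cmovl pair in the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Reference implementation (hypothetical name). */
int64_t maxofthree_ref(int64_t x, int64_t y, int64_t z)
{
    int64_t result = x;             /* mov   %rdi, %rax */
    if (result < y) result = y;     /* cmp   %rsi, %rax ; cmovl %rsi, %rax */
    if (result < z) result = z;     /* cmp   %rdx, %rax ; cmovl %rdx, %rax */
    return result;
}

typedef int64_t (*max3_fn)(int64_t, int64_t, int64_t);

/* Compares a candidate against the reference on every triple drawn from
   a small set of values, including negatives and the 64-bit extremes. */
void check_against_reference(max3_fn candidate)
{
    static const int64_t values[] = { INT64_MIN, -7, -1, 0, 1, 6, INT64_MAX };
    const int count = (int)(sizeof values / sizeof values[0]);

    for (int i = 0; i < count; i++)
        for (int j = 0; j < count; j++)
            for (int k = 0; k < count; k++)
                assert(candidate(values[i], values[j], values[k]) ==
                       maxofthree_ref(values[i], values[j], values[k]));
}
```

Linked together with maxofthree.s, a call such as `check_against_reference(maxofthree)` would exercise the assembly routine on all 343 combinations.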
You know that in C, main is just a plain old function, and it has a couple of parameters of its own:

    int main(int argc, char** argv)
Here is a program that uses this fact to simply echo the command line arguments to a program, one per line:
# -----------------------------------------------------------------------------
# A 64-bit program that displays its commandline arguments, one per line.
#
# On entry, %rdi will contain argc and %rsi will contain argv.
# -----------------------------------------------------------------------------

        .global main

        .text
main:
        push    %rdi                    # save registers that puts uses
        push    %rsi
        sub     $8, %rsp                # must align stack before call

        mov     (%rsi), %rdi            # the argument string to display
        call    puts                    # print it

        add     $8, %rsp                # restore %rsp to pre-aligned value
        pop     %rsi                    # restore registers puts used
        pop     %rdi

        add     $8, %rsi                # point to next argument
        dec     %rdi                    # count down
        jnz     main                    # if not done counting keep going

        ret
format:
        .asciz  "%s\n"

$ gcc echo.s && ./a.out 25782 dog huh $$
./a.out
25782
dog
huh
9971
$ gcc echo.s && ./a.out 25782 dog huh '$$'
./a.out
25782
dog
huh
$$
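The loop structure may be easier to see in C. This is only a sketch of what the assembly does, with an invented function name `echo_args`: it walks the argv pointer forward one slot at a time and counts argc down to zero, exactly as the assembly does with %rsi and %rdi.

```c
#include <assert.h>
#include <stdio.h>

/* Prints argv[0] .. argv[argc-1], one per line, and returns how many
   strings were printed. Hypothetical helper for illustration. */
int echo_args(int argc, char **argv)
{
    assert(argc >= 1);           /* like the assembly, assumes at least argv[0] */

    int printed = 0;
    do {
        puts(*argv);             /* mov (%rsi), %rdi ; call puts */
        argv++;                  /* add $8, %rsi (one 8-byte pointer) */
        printed++;
    } while (--argc != 0);       /* dec %rdi ; jnz main */
    return printed;
}
```

Note that `argv++` advances by the size of one pointer, which is why the assembly adds 8 to %rsi.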
Note that as far as the C library is concerned, command line arguments are always strings. If you want to treat them as integers, call atoi. Here's a little program to compute x^y. Another feature of this example is that it shows how to restrict values to 32-bit ones.
# -----------------------------------------------------------------------------
# A 64-bit command line application to compute x^y.
#
# Syntax: power x y
# x and y are integers
# -----------------------------------------------------------------------------

        .global main

        .text
main:
        push    %r12                    # save callee-save registers
        push    %r13
        push    %r14
        # By pushing 3 registers our stack is already aligned for calls

        cmp     $3, %edi                # must have exactly two arguments (argc is an int)
        jne     error1

        mov     %rsi, %r12              # argv

        # We will use r13d to count down from the exponent to zero, r14d to hold
        # the value of the base, and eax to hold the running product.

        mov     16(%r12), %rdi          # argv[2]
        call    atoi                    # y in eax
        cmp     $0, %eax                # disallow negative exponents
        jl      error2
        mov     %eax, %r13d             # y in r13d

        mov     8(%r12), %rdi           # argv[1]
        call    atoi                    # x in eax
        mov     %eax, %r14d             # x in r14d

        mov     $1, %eax                # start with answer = 1
check:
        test    %r13d, %r13d            # we're counting y downto 0
        jz      gotit                   # done
        imul    %r14d, %eax             # multiply in another x
        dec     %r13d
        jmp     check
gotit:                                  # print report on success
        mov     $answer, %rdi
        movslq  %eax, %rsi
        xor     %rax, %rax
        call    printf
        jmp     done
error1:                                 # print error message
        mov     $badArgumentCount, %edi
        call    puts
        jmp     done
error2:                                 # print error message
        mov     $negativeExponent, %edi
        call    puts
done:                                   # restore saved registers
        pop     %r14
        pop     %r13
        pop     %r12
        ret

answer:
        .asciz  "%d\n"
badArgumentCount:
        .asciz  "Requires exactly two arguments\n"
negativeExponent:
        .asciz  "The exponent may not be negative\n"

$ ./power 2 19
524288
$ ./power 3 -8
The exponent may not be negative
$ ./power 1 500
1
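Because the multiplication is done with 32-bit registers, large results wrap around instead of growing. The C sketch below models that behaviour under the assumption that the product is kept modulo 2^32, which is what a 32-bit imul leaves in eax. The name `power32` is made up here, and unsigned arithmetic is used because signed overflow is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the multiply loop: computes x^y with the running
   product truncated to 32 bits on every step. */
int32_t power32(int32_t x, int32_t y)
{
    assert(y >= 0);                 /* the program rejects negative exponents */

    uint32_t product = 1;           /* mov $1, %eax */
    while (y != 0) {                /* test %r13d, %r13d ; jz gotit */
        product *= (uint32_t)x;     /* imul %r14d, %eax (wraps modulo 2^32) */
        y--;                        /* dec %r13d */
    }
    /* Converting back to a signed type is implementation-defined for large
       values; gcc and clang reinterpret the bits, like the hardware does. */
    return (int32_t)product;
}
```

With this model, `power32(2, 19)` gives 524288 as in the session above, while `power32(2, 32)` gives 0 because the true result no longer fits in 32 bits.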
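Note that atoi gives no way to detect input that is not a number; for more robust argument handling, change atoi to strtol.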
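As a rough illustration of what switching to strtol buys you, here is a C sketch of a checked conversion. The helper name `parse_int` and its return convention (1 for success, 0 for failure) are assumptions made for this example; the assembly program would call strtol directly with the same three arguments in rdi, rsi, and rdx.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to an int. Returns 1 and stores the
   result in *value on success, or returns 0 if text is not a complete,
   in-range decimal integer. */
int parse_int(const char *text, int *value)
{
    assert(text != NULL && value != NULL);

    char *end;
    errno = 0;
    long parsed = strtol(text, &end, 10);

    if (end == text || *end != '\0')
        return 0;                   /* no digits at all, or trailing junk */
    if (errno == ERANGE || parsed < INT_MIN || parsed > INT_MAX)
        return 0;                   /* does not fit in an int */

    *value = (int)parsed;
    return 1;
}
```

Unlike atoi, this distinguishes an argument such as `dog` from a genuine `0`, so the program could report bad input instead of silently computing with zero.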
Floating-point arguments go in the xmm registers. Here is a simple function for summing the values in a double array:
# -----------------------------------------------------------------------------
# A 64-bit function that returns the sum of the elements in a floating-point
# array. The function has prototype:
#
#     double sum(double array[], unsigned length)
# -----------------------------------------------------------------------------

        .global sum

        .text
sum:
        xorpd   %xmm0, %xmm0            # initialize the sum to 0
        cmp     $0, %esi                # special case for length = 0
        je      done
next:
        addsd   (%rdi), %xmm0           # add in the current array element
        add     $8, %rdi                # move to next array element
        dec     %esi                    # count down (length is 32 bits, so use esi)
        jnz     next                    # if not done counting, continue
done:
        ret                             # return value already in xmm0
A C program that calls it:
/*
 * callsum.c
 *
 * Illustrates how to call the sum function we wrote in assembly language.
 */

#include <stdio.h>

double sum(double[], unsigned);

int main() {
    double test[] = {
        40.5, 26.7, 21.9, 1.5, -40.5, -23.4
    };
    printf("%20.7f\n", sum(test, 6));
    printf("%20.7f\n", sum(test, 2));
    printf("%20.7f\n", sum(test, 0));
    printf("%20.7f\n", sum(test, 3));
    return 0;
}

$ gcc callsum.c sum.s && ./a.out
          26.7000000
          67.2000000
           0.0000000
          89.1000000
The text section is read-only on most operating systems, so you might find the need for a data section. On most operating systems, the data section is only for initialized data, and you have a special .bss section for uninitialized data. Here is a program that averages the command line arguments, expected to be integers, and displays the result as a floating point number.
# -----------------------------------------------------------------------------
# 64-bit program that treats all its command line arguments as integers and
# displays their average as a floating point number. This program uses a data
# section to store intermediate results, not that it has to, but only to
# illustrate how data sections are used.
# -----------------------------------------------------------------------------

        .globl  main

        .text
main:
        dec     %rdi                    # argc-1, since we don't count program name
        jz      nothingToAverage
        mov     %rdi, count             # save number of real arguments
accumulate:
        push    %rdi                    # save register across call to atoi
        push    %rsi
        sub     $8, %rsp                # align stack pointer for the call
        mov     (%rsi,%rdi,8), %rdi     # argv[rdi]
        call    atoi                    # now eax has the int value of arg
        cltq                            # sign-extend eax into rax
        add     $8, %rsp                # undo the alignment adjustment
        pop     %rsi                    # restore registers after atoi call
        pop     %rdi
        add     %rax, sum               # accumulate sum as we go
        dec     %rdi                    # count down
        jnz     accumulate              # more arguments?
average:
        cvtsi2sdq sum, %xmm0            # q suffix: convert a 64-bit integer
        cvtsi2sdq count, %xmm1
        divsd   %xmm1, %xmm0            # xmm0 is sum/count
        mov     $format, %rdi           # 1st arg to printf
        mov     $1, %rax                # printf is varargs, there is 1 non-int argument

        sub     $8, %rsp                # align stack pointer
        call    printf                  # printf(format, sum/count)
        add     $8, %rsp                # restore stack pointer

        ret

nothingToAverage:
        mov     $error, %rdi
        xor     %rax, %rax
        sub     $8, %rsp                # align stack pointer
        call    printf
        add     $8, %rsp                # restore stack pointer
        ret

        .data
count:  .quad   0
sum:    .quad   0
format: .asciz  "%g\n"
error:  .asciz  "There are no command line arguments to average\n"
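In C terms, the `count` and `sum` labels play the role of file-scope variables. The sketch below shows roughly the same computation with two static variables; the function name `average_args` is invented for this example, and it asserts where the assembly prints an error message. Where a compiler places such variables is its own business, though zero-initialised statics typically end up in .bss rather than .data.

```c
#include <assert.h>
#include <stdlib.h>

/* File-scope storage, playing the role of the count and sum labels. */
static long count;
static long sum;

/* Hypothetical helper: averages argv[1] .. argv[argc-1] as integers. */
double average_args(int argc, char **argv)
{
    assert(argc > 1);                       /* need at least one real argument */

    count = argc - 1;                       /* mov %rdi, count */
    sum = 0;
    for (int i = argc - 1; i >= 1; i--)     /* counts down, like the assembly */
        sum += atoi(argv[i]);               /* call atoi ; add %rax, sum */

    return (double)sum / (double)count;     /* cvtsi2sd twice, then divsd */
}
```

The two casts to double correspond to the two integer-to-double conversions, and the division to divsd.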
Perhaps surprisingly, there's nothing out of the ordinary required to implement recursive functions. You just have to be careful to save registers, as usual. Here's an example. In C:
uint64_t factorial(unsigned n) {
    return (n <= 1) ? 1 : n * factorial(n-1);
}

# ----------------------------------------------------------------------------
# A 64-bit recursive implementation of the function
#
#     uint64_t factorial(unsigned n)
# ----------------------------------------------------------------------------

        .globl  factorial

        .text
factorial:
        cmp     $1, %rdi                # n <= 1?
        jnbe    L1                      # if not, go do a recursive call
        mov     $1, %rax                # otherwise return 1
        ret
L1:
        push    %rdi                    # save n on stack (also aligns %rsp!)
        dec     %rdi                    # n-1
        call    factorial               # factorial(n-1), result goes in %rax
        pop     %rdi                    # restore n
        imul    %rdi, %rax              # n * factorial(n-1), stored in %rax
        ret
An example caller:
/*
 * An application that illustrates calling the factorial function defined elsewhere.
 */

#include <stdio.h>
#include <inttypes.h>

uint64_t factorial(unsigned n);

int main() {
    for (unsigned i = 0; i < 20; i++) {
        printf("factorial(%2u) = %lu\n", i, factorial(i));
    }
}
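The imul in the factorial function does not report overflow, so it is worth knowing how far the 64-bit result can be trusted. The following C sketch, with the invented name `largest_exact_factorial`, finds the largest n whose factorial still fits in a uint64_t by checking before each multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the largest n for which n! fits in a uint64_t. */
unsigned largest_exact_factorial(void)
{
    uint64_t product = 1;           /* holds n! */
    unsigned n = 1;

    /* Multiply only while product * (n + 1) cannot exceed UINT64_MAX. */
    while (product <= UINT64_MAX / (n + 1)) {
        n++;
        product *= n;
    }

    assert(product == 2432902008176640000ULL);   /* this is 20! */
    return n;
}
```

The answer is 20, so the caller's loop, which stops at factorial(19), stays within the range where the assembly result is exact.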
The XMM registers can do arithmetic on floating point values one operation at a time or multiple operations at a time. The operations have the form:
operation xmmregister_or_memorylocation, xmmregister
For floating point addition, the instructions are:
addpd - do 2 double-precision additions
addps - do 4 single-precision additions
addsd - do just one double-precision addition, using the low 64-bits of the register
addss - do just one single-precision addition, using the low 32-bits of the register
TODO - show a function that processes an array of floats, 4 at a time.
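Until an assembly version is written, here is one possible sketch in C of processing an array of floats four at a time. It assumes a compiler that defines `__SSE__` and ships `<xmmintrin.h>` (gcc and clang do on x86-64), in which case `_mm_add_ps` typically compiles to a single addps; on other targets it falls back to a plain loop over four lanes. The function name `sum_floats` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>              /* _mm_add_ps and friends */
#endif

/* Sums n floats, taking them four at a time where possible. */
float sum_floats(const float *a, size_t n)
{
    assert(a != NULL || n == 0);

    float lanes[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    size_t i = 0;

#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();                    /* four running sums */
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));   /* 4 additions at once */
    _mm_storeu_ps(lanes, acc);
#else
    for (; i + 4 <= n; i += 4)                        /* portable fallback */
        for (int k = 0; k < 4; k++)
            lanes[k] += a[i + k];
#endif

    /* Combine the four partial sums, then add any leftover elements
       one at a time (the scalar case, as with addss). */
    float total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

Because the elements are added in a different order than a simple left-to-right loop would use, the result can differ in the last bits for values that are not exactly representable.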
The XMM registers can also do arithmetic on integers. The instructions have the form:
operation xmmregister_or_memorylocation, xmmregister
For integer addition, the instructions are:
paddb - do 16 byte additions
paddw - do 8 word additions
paddd - do 4 dword additions
paddq - do 2 qword additions
paddsb - do 16 byte additions with signed saturation (80..7F)
paddsw - do 8 word additions with signed saturation (8000..7FFF)
paddusb - do 16 byte additions with unsigned saturation (00..FF)
paddusw - do 8 word additions with unsigned saturation (00..FFFF)
TODO - SHOW AN EXAMPLE
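As a stand-in until an assembly example is added, the following C sketch models what a single byte lane does under the wrapping, unsigned-saturating, and signed-saturating forms, and then applies the unsigned-saturating rule across 16 lanes. All function names here are invented for illustration; this is plain C describing the arithmetic, not the instructions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of paddb: the result wraps around modulo 256. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* One lane of paddusb: results above 0xFF are clamped to 0xFF. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255u ? 255u : (uint8_t)s;
}

/* One lane of paddsb: results are clamped to the range -128..127. */
int8_t add_sat_s8(int8_t a, int8_t b)
{
    int s = a + b;
    if (s > 127)  s = 127;
    if (s < -128) s = -128;
    return (int8_t)s;
}

/* Model of a whole paddusb: dst[i] = saturate(dst[i] + src[i]) for
   each of the 16 byte lanes of an XMM register. */
void paddusb_model(uint8_t dst[16], const uint8_t src[16])
{
    assert(dst != 0 && src != 0);
    for (int i = 0; i < 16; i++)
        dst[i] = add_sat_u8(dst[i], src[i]);
}
```

For example, adding 200 and 100 in one byte gives 44 with wrapping but 255 with unsigned saturation, which is why the saturating forms are popular for pixel arithmetic.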
First, please read Eli Bendersky's article. That overview is more complete than my brief notes.
When a function is called, the caller will first put the parameters in the correct registers and then issue the call instruction. Additional parameters beyond those covered by the registers will be pushed on the stack prior to the call. The call instruction puts the return address on the top of the stack. So if you have the function
long example(long x, long y) {
    long a, b, c;
    b = 7;
    return x * b + y;
}
Then on entry to the function, x will be in %rdi, y will be in %rsi, and the return address will be on the top of the stack. Where can we put the local variables? An easy choice is on the stack itself, though if you have enough registers, use those.
If you are running on a machine that respects the standard ABI, you can leave %rsp where it is and access the "extra parameters" and the local variables relative to %rsp, for example:
                +----------+
         rsp-24 |    a     |
                +----------+
         rsp-16 |    b     |
                +----------+
         rsp-8  |    c     |
                +----------+
         rsp    | retaddr  |
                +----------+
         rsp+8  | caller's |
                | stack    |
                | frame    |
                | ...      |
                +----------+
So our function looks like this:
        .text
        .globl  example
example:
        movq    $7, -16(%rsp)           # b = 7 (b is a 64-bit long at rsp-16)
        mov     %rdi, %rax              # rax = x
        imul    -16(%rsp), %rax         # rax = x * b
        add     %rsi, %rax              # rax = x * b + y
        ret
If our function were to make another call, you would have to adjust %rsp to get out of the way at that time.
On Windows you can't use this scheme because if an interrupt were to occur, everything below the stack pointer (that is, at lower addresses) may get overwritten. This doesn't happen on systems that follow the System V ABI, such as Linux, because there is a "red zone" of 128 bytes below the stack pointer which is safe from these things. Where there is no red zone, you can make room on the stack immediately:

example:
        sub     $24, %rsp
so our stack looks like this:
                +----------+
         rsp    |    a     |
                +----------+
         rsp+8  |    b     |
                +----------+
         rsp+16 |    c     |
                +----------+
         rsp+24 | retaddr  |
                +----------+
         rsp+32 | caller's |
                | stack    |
                | frame    |
                | ...      |
                +----------+
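Here's the function now. Note that we have to remember to restore the stack pointer before returning!

        .text
        .globl  example
example:
        sub     $24, %rsp               # make room for a, b, and c
        movq    $7, 8(%rsp)             # b = 7 (b is a 64-bit long at rsp+8)
        mov     %rdi, %rax              # rax = x
        imul    8(%rsp), %rax           # rax = x * b
        add     %rsi, %rax              # rax = x * b + y
        add     $24, %rsp               # restore the stack pointer
        ret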