GNU Assembler Examples

GAS, the GNU Assembler, is the default assembler for the GNU Operating System. It works on many different architectures and supports several assembly language syntaxes. These examples are only for operating systems using the Linux kernel and an x86-64 processor, however.

Getting Started

Here is the traditional Hello World program that uses Linux System calls, for a 64-bit installation:

hello.s

# ----------------------------------------------------------------------------------------
# Writes "Hello, World" to the console using only system calls. Runs on 64-bit Linux only.
# To assemble and run:
#
#     gcc -c hello.s && ld hello.o && ./a.out
#
# or
#
#     gcc -nostdlib hello.s && ./a.out
# ----------------------------------------------------------------------------------------

        .global _start

        .text
_start:
        # write(1, message, 13)
        mov     $1, %rax                # system call 1 is write
        mov     $1, %rdi                # file handle 1 is stdout
        mov     $message, %rsi          # address of string to output
        mov     $13, %rdx               # number of bytes
        syscall                         # invoke operating system to do the write

        # exit(0)
        mov     $60, %rax               # system call 60 is exit
        xor     %rdi, %rdi              # we want return code 0
        syscall                         # invoke operating system to exit
message:
        .ascii  "Hello, world\n"

$ gcc -c hello.s && ld hello.o && ./a.out
Hello, World

If you are using a different operating system, such as macOS or Windows, the system call numbers and the registers used will likely be different.

Working with the C Library

Generally you will want to use a C library. Here's Hello World again:

hola.s

# ----------------------------------------------------------------------------------------
# Writes "Hola, mundo" to the console using a C library. Runs on Linux or any other system
# that does not use underscores for symbols in its C library. To assemble and run:
#
#     gcc hola.s && ./a.out
# ----------------------------------------------------------------------------------------

        .global main

        .text
main:                                   # This is called by C library's startup code
        mov     $message, %rdi          # First integer (or pointer) parameter in %rdi
        call    puts                    # puts(message)
        ret                             # Return to C library code
message:
        .asciz "Hola, mundo"            # asciz puts a 0 byte at the end

$ gcc hola.s && ./a.out
Hola, mundo

Calling Conventions for 64-bit C Code

The 64-bit calling conventions are a bit more detailed, and they are explained fully in the AMD64 ABI Reference. You can also get info on them at Wikipedia. The most important points are (again, for 64-bit Linux, not Windows):

From left to right, pass as many parameters as will fit in registers. The order in which registers are allocated, are:
- For integers and pointers, rdi, rsi, rdx, rcx, r8, r9.
- For floating-point (float, double), xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7
Additional parameters are pushed on the stack, right to left, and are removed by the caller after the call.
After the parameters are pushed, the call instruction is made, so when the called function gets control, the return address is at (%rsp), the first memory parameter is at 8(%rsp), etc.
THE STACK POINTER %RSP MUST BE ALIGNED TO A 16-BYTE BOUNDARY BEFORE MAKING A CALL. Fine, but the process of making a call pushes the return address (8 bytes) on the stack, so when a function gets control, %rsp is not aligned. You have to make that extra space yourself, by pushing something or subtracting 8 from %rsp.
The only registers that the called function is required to preserve (the calle-save registers) are: rbp, rbx, r12, r13, r14, r15. All others are free to be changed by the called function.
The callee is also supposed to save the control bits of the XMCSR and the x87 control word, but x87 instructions are rare in 64-bit code so you probably don't have to worry about this.
Integers are returned in rax or rdx:rax, and floating point values are returned in xmm0 or xmm1:xmm0.

This program prints the first few fibonacci numbers, illustrating how registers have to be saved and restored:

fib.s

# -----------------------------------------------------------------------------
# A 64-bit Linux application that writes the first 90 Fibonacci numbers.  It
# needs to be linked with a C library.
#
# Assemble and Link:
#     gcc fib.s
# -----------------------------------------------------------------------------

        .global main

        .text
main:
        push    %rbx                    # we have to save this since we use it

        mov     $90, %ecx               # ecx will countdown to 0
        xor     %rax, %rax              # rax will hold the current number
        xor     %rbx, %rbx              # rbx will hold the next number
        inc     %rbx                    # rbx is originally 1
print:
        # We need to call printf, but we are using eax, ebx, and ecx.  printf
        # may destroy eax and ecx so we will save these before the call and
        # restore them afterwards.

        push    %rax                    # caller-save register
        push    %rcx                    # caller-save register

        mov     $format, %rdi           # set 1st parameter (format)
        mov     %rax, %rsi              # set 2nd parameter (current_number)
        xor     %rax, %rax              # because printf is varargs

        # Stack is already aligned because we pushed three 8 byte registers
        call    printf                  # printf(format, current_number)

        pop     %rcx                    # restore caller-save register
        pop     %rax                    # restore caller-save register

        mov     %rax, %rdx              # save the current number
        mov     %rbx, %rax              # next number is now current
        add     %rdx, %rbx              # get the new next number
        dec     %ecx                    # count down
        jnz     print                   # if not done counting, do some more

        pop     %rbx                    # restore rbx before returning
        ret
format:
        .asciz  "%20ld\n"

$ gcc fib.s && ./a.out
                   0
                   1
                   1
                   2
                   3
                 ...
  420196140727489673
  679891637638612258
 1100087778366101931
 1779979416004714189

Mixing C and Assembly Language

This 64-bit program is a very simple function that takes in three 64-bit integer parameters and returns the maximum value. It shows how to extract integer parameters: They will have been pushed on the stack so that on entry to the function, they will be in rdi, rsi, and rdx, respectively. The return value is an integer so it gets returned in rax.

maxofthree.s

# -----------------------------------------------------------------------------
# A 64-bit function that returns the maximum value of its three 64-bit integer
# arguments.  The function has signature:
#
#   int64_t maxofthree(int64_t x, int64_t y, int64_t z)
#
# Note that the parameters have already been passed in rdi, rsi, and rdx.  We
# just have to return the value in rax.
# -----------------------------------------------------------------------------

        .globl  maxofthree
        
        .text
maxofthree:
        mov     %rdi, %rax                # result (rax) initially holds x
        cmp     %rsi, %rax                # is x less than y?
        cmovl   %rsi, %rax                # if so, set result to y
        cmp     %rdx, %rax                # is max(x,y) less than z?
        cmovl   %rdx, %rax                # if so, set result to z
        ret                               # the max will be in eax

Here is a C program that calls the assembly language function.

callmaxofthree.c

/*
 * callmaxofthree.c
 *
 * A small program that illustrates how to call the maxofthree function we wrote in
 * assembly language.
 */

#include <stdio.h>
#include <inttypes.h>

int64_t maxofthree(int64_t, int64_t, int64_t);

int main() {
    printf("%ld\n", maxofthree(1, -4, -7));
    printf("%ld\n", maxofthree(2, -6, 1));
    printf("%ld\n", maxofthree(2, 3, 1));
    printf("%ld\n", maxofthree(-2, 4, 3));
    printf("%ld\n", maxofthree(2, -6, 5));
    printf("%ld\n", maxofthree(2, 4, 6));
    return 0;
}

To assemble, link and run this two-part program:

$ gcc -std=c99 callmaxofthree.c maxofthree.s && ./a.out
1
2
3
4
5
6

Command Line Arguments

You know that in C, main is just a plain old function, and it has a couple parameters of its own:

    int main(int argc, char** argv)

Here is a program that uses this fact to simply echo the commandline arguments to a program, one per line:

echo.s

# -----------------------------------------------------------------------------
# A 64-bit program that displays its commandline arguments, one per line.
#
# On entry, %rdi will contain argc and %rsi will contain argv.
# -----------------------------------------------------------------------------

        .global main

        .text
main:
        push    %rdi                    # save registers that puts uses
        push    %rsi
        sub     $8, %rsp                # must align stack before call

        mov     (%rsi), %rdi            # the argument string to display
        call    puts                    # print it

        add     $8, %rsp                # restore %rsp to pre-aligned value
        pop     %rsi                    # restore registers puts used
        pop     %rdi

        add     $8, %rsi                # point to next argument
        dec     %rdi                    # count down
        jnz     main                    # if not done counting keep going

        ret
format:
        .asciz	"%s\n"

$ gcc echo.s && ./a.out 25782 dog huh $$
./a.out
25782
dog
huh
9971
$ gcc echo.s && ./a.out 25782 dog huh '$$'
./a.out
25782
dog
huh
$$

Note that as far as the C Library is concerned, command line arguments are always strings. If you want to treat them as integers, call atoi. Here's a little program to compute x^y. Another feature of this example is that it shows how to restrict values to 32-bit ones.

power.s

# -----------------------------------------------------------------------------
# A 64-bit command line application to compute x^y.
#
# Syntax: power x y
# x and y are integers
# -----------------------------------------------------------------------------

        .global main

        .text
main:
        push    %r12                    # save callee-save registers
        push    %r13
        push    %r14
        # By pushing 3 registers our stack is already aligned for calls

        cmp     $3, %rdi                # must have exactly two arguments
        jne     error1

        mov     %rsi, %r12              # argv

# We will use ecx to count down form the exponent to zero, esi to hold the
# value of the base, and eax to hold the running product.

        mov     16(%r12), %rdi          # argv[2]
        call    atoi                    # y in eax
        cmp     $0, %eax                # disallow negative exponents
        jl      error2
        mov     %eax, %r13d             # y in r13d

        mov     8(%r12), %rdi           # argv
        call    atoi                    # x in eax
        mov     %eax, %r14d             # x in r14d

        mov     $1, %eax                # start with answer = 1
check:
        test    %r13d, %r13d            # we're counting y downto 0
        jz      gotit                   # done
        imul    %r14d, %eax             # multiply in another x
        dec     %r13d
        jmp     check
gotit:                                  # print report on success
        mov     $answer, %rdi
        movslq  %eax, %rsi
        xor     %rax, %rax
        call    printf
        jmp     done
error1:                                 # print error message
        mov     $badArgumentCount, %edi
        call    puts
        jmp     done
error2:                                 # print error message
        mov     $negativeExponent, %edi
        call    puts
done:                                   # restore saved registers
        pop     %r14
        pop     %r13
        pop     %r12
        ret

answer:
        .asciz  "%d\n"
badArgumentCount:
        .asciz  "Requires exactly two arguments\n"
negativeExponent:
        .asciz  "The exponent may not be negative\n"

$ ./power 2 19
524288
$ ./power 3 -8
The exponent may not be negative
$ ./power 1 500
1

Exercise: Rewrite this example to use 64-bit numbers everywhere. You will also need to switch from atoi to strtol.

Floating Point Instructions

Floating-point arguments go int the xmm registers. Here is a simple function for summing the values in a double array:

sum.s

# -----------------------------------------------------------------------------
# A 64-bit function that returns the sum of the elements in a floating-point
# array. The function has prototype:
#
#   double sum(double[] array, unsigned length)
# -----------------------------------------------------------------------------

        .global	sum
        .text
sum:
        xorpd	%xmm0, %xmm0            # initialize the sum to 0
        cmp     $0, %rsi                # special case for length = 0
        je      done
next:
	addsd   (%rdi), %xmm0           # add in the current array element
        add     $8, %rdi                # move to next array element
        dec     %rsi                    # count down
        jnz     next                    # if not done counting, continue
done:
        ret                             # return value already in xmm0

A C program that calls it:

callsum.c

/*
 * callsum.c
 *
 * Illustrates how to call the sum function we wrote in assembly language.
 */

#include <stdio.h>

double sum(double[], unsigned);

int main() {
    double test[] = {
        40.5, 26.7, 21.9, 1.5, -40.5, -23.4
    };
    printf("%20.7f\n", sum(test, 6));
    printf("%20.7f\n", sum(test, 2));
    printf("%20.7f\n", sum(test, 0));
    printf("%20.7f\n", sum(test, 3));
    return 0;
}

$ gcc callsum.c sum.s && ./a.out
          26.7000000
          67.2000000
           0.0000000
          89.1000000

Data Sections

The text section is read-only on most operating systems, so you might find the need for a data section. On most operating systems, the data section is only for initialized data, and you have a special .bss section for uninitialized data. Here is a program that averages the command line arguments, expected to be integers, and displays the result as a floating point number.

average.s

# -----------------------------------------------------------------------------
# 64-bit program that treats all its command line arguments as integers and
# displays their average as a floating point number.  This program uses a data
# section to store intermediate results, not that it has to, but only to
# illustrate how data sections are used.
# -----------------------------------------------------------------------------

        .globl  main

        .text
main:
        dec     %rdi                    # argc-1, since we don't count program name
        jz      nothingToAverage
        mov     %rdi, count             # save number of real arguments
accumulate:
        push    %rdi                    # save register across call to atoi
        push    %rsi
        mov     (%rsi,%rdi,8), %rdi     # argv[rdi]
        call    atoi                    # now rax has the int value of arg
        pop     %rsi                    # restore registers after atoi call
        pop     %rdi
        add     %rax, sum               # accumulate sum as we go
        dec     %rdi                    # count down
        jnz     accumulate              # more arguments?
average:
        cvtsi2sd sum, %xmm0
        cvtsi2sd count, %xmm1
        divsd   %xmm1, %xmm0            # xmm0 is sum/count
        mov     $format, %rdi           # 1st arg to printf
        mov     $1, %rax                # printf is varargs, there is 1 non-int argument

        sub     $8, %rsp                # align stack pointer
        call    printf                  # printf(format, sum/count)
        add     $8, %rsp                # restore stack pointer

        ret

nothingToAverage:
        mov     $error, %rdi
        xor     %rax, %rax
        call    printf
        ret

        .data
count:  .quad   0
sum:    .quad   0
format: .asciz  "%g\n"
error:  .asciz  "There are no command line arguments to average\n"

Recursion

Perhaps surprisingly, there's nothing out of the ordinary required to implement recursive functions. You just have to be careful to save registers, as usual. Here's an example. In C:

uint64_t factorial(unsigned n) {
    return (n <= 1) ? 1 : n * factorial(n-1);
}

factorial.s

# ----------------------------------------------------------------------------
# A 64-bit recursive implementation of the function
#
#     uint64_t factorial(unsigned n)
#
# implemented recursively
# ----------------------------------------------------------------------------

        .globl  factorial

        .text
factorial:
        cmp     $1, %rdi                # n <= 1?
        jnbe    L1                      # if not, go do a recursive call
        mov     $1, %rax                # otherwise return 1
        ret
L1:
        push    %rdi                    # save n on stack (also aligns %rsp!)
        dec     %rdi                    # n-1
        call    factorial               # factorial(n-1), result goes in %rax
        pop     %rdi                    # restore n
        imul    %rdi, %rax              # n * factorial(n-1), stored in %rax
        ret

An example caller:

callfactorial.c

/*
 * An application that illustrates calling the factorial function defined elsewhere.
 */

#include <stdio.h>
#include <inttypes.h>

uint64_t factorial(unsigned n);

int main() {
    for (unsigned i = 0; i < 20; i++) {
        printf("factorial(%2u) = %lu\n", i, factorial(i));
    }
}

SIMD Parallelism

The XMM registers can do arithmetic on floating point values one opeation at a time or multiple operations at a time. The operations have the form:

operation  xmmregister_or_memorylocation, xmmregister

For floating point addition, the instructions are:

addpd — do 2 double-precision additions
addps — do just one double-precision addition, using the low 64-bits of the register
addsd — do 4 single-precision additions
addss — do just one single-precision addition, using the low 32-bits of the register

TODO - show a function that processes an array of floats, 4 at a time.

Saturated Arithmetic

The XMM registers can also do arithmetic on integers. The instructions have the form:

operation  xmmregister_or_memorylocation, xmmregister

For integer addition, the instructions are:

paddb — do 16 byte additions
paddw — do 8 word additions
paddd — do 4 dword additions
paddq — do 2 qword additions
paddsb — do 16 byte additions with signed saturation (80..7F)
paddsw — do 8 word additions with unsigned saturation (8000..7FFF)
paddusb — do 16 byte additions with unsigned saturation (00..FF)
paddusw — do 8 word additions with unsigned saturation (00..FFFF)

TODO - SHOW AN EXAMPLE

Local Variables and Stack Frames

First, please read Eli Bendersky's article That overview is more complete than my brief notes.

When a function is called the caller will first put the parameters in the correct registers then issue the call instruction. Additional parameters beyond those covered by the registers will be pushed on the stack prior to the call. The call instruction puts the return address on the top of stack. So if you have the function

long example(long x, long y) {
    long a, b, c;
    b = 7;
    return x * b + y;
}

Then on entry to the function, x will be in %edi, y will be in %esi, and the return address will be on the top of the stack. Where can we put the local variables? An easy choice is on the stack itself, though if you have enough regsters, use those.

If you are running on a machine that respect the standard ABI, you can leave %rsp where it is and access the "extra parameters" and the local variables directly from %rsp for example:

                +----------+
         rsp-24 |    a     |
                +----------+
         rsp-16 |    b     |
                +----------+
         rsp-8  |    c     |
                +----------+
         rsp    | retaddr  |
                +----------+
         rsp+8  | caller's |
                | stack    |
                | frame    |
                | ...      |
                +----------+

So our function looks like this:

        .text
        .globl  example
example:
        movl    $7, -16(%rsp)
        mov     %rdi, %rax
        imul    8(%rsp), %rax
        add     %rsi, %rax
        ret

If our function were to make another call, you would have to adjust %rsp to get out of the way at that time.

On Windows you can't use this scheme because if an interrupt were to occur, everything above the stack pointer gets plastered. This doesn't happen on most other operating systems because there is a "red zone" of 128 bytes past the stack pointer which is safe from these things. In this case, you can make room on the stack immediately:

example:
        sub	$24, %rsp

so our stack looks like this:

                +----------+
         rsp    |    a     |
                +----------+
         rsp+8  |    b     |
                +----------+
         rsp+16 |    c     |
                +----------+
         rsp+24 | retaddr  |
                +----------+
         rsp+32 | caller's |
                | stack    |
                | frame    |
                | ...      |
                +----------+

Here's the function now. Note that we have to remember to replace the stack pointer before returning!

        .text
        .globl  example
example:
        sub     $24, %rsp
        movl    $7, 8(%rsp)
        mov     %rdi, %rax
        imul    8(%rsp), %rax
        add     %rsi, %rax
        add     $24, %rsp
        ret