1. Source code (Go)

package main

import "fmt"

func factorial(n int) int {
	r := 1
	for i := 2; i <= n; i++ {
		r *= i
	}
	return r
}

func main() {
	n := 10
	result := factorial(n)
	fmt.Println(result)
}

Expected output: 3628800


2. Build step (go build)

  • Go compiler translates .go into machine code directly (no separate assembler step like C).
  • Linker links against the Go runtime (no external libc needed; Go has its own fmt, os, syscall, runtime, etc).
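  • Produces a statically linked ELF binary on Linux, unless cgo is involved (either used directly or pulled in by packages such as net or os/user when cgo is enabled).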

So compared to C: the binary is fatter because it embeds the runtime, garbage collector, scheduler, and type metadata.
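One way to check the static-linking claim is to look at the ELF metadata. The following is a minimal sketch using the standard debug/elf package; the helper name inspectELF and the linkInfo type are made up for illustration, and the program assumes it runs on a platform that produces ELF binaries (such as Linux). By default it inspects its own executable.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// linkInfo summarizes how an ELF binary is linked (hypothetical helper type).
type linkInfo struct {
	HasInterp bool     // true if a PT_INTERP segment (dynamic loader) is present
	Libraries []string // DT_NEEDED entries; empty for a fully static binary
}

// inspectELF opens the ELF file at path and reports its dynamic-linking metadata.
func inspectELF(path string) (linkInfo, error) {
	f, err := elf.Open(path)
	if err != nil {
		return linkInfo{}, err
	}
	defer f.Close()

	var info linkInfo
	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			info.HasInterp = true
		}
	}
	libs, err := f.ImportedLibraries()
	if err != nil {
		return linkInfo{}, err
	}
	info.Libraries = libs
	return info, nil
}

func main() {
	// Default to the running executable; a path such as ./fact can be passed instead.
	path, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
		return
	}
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	info, err := inspectELF(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "inspect:", err)
		return
	}
	fmt.Printf("dynamic loader: %v, shared libraries: %v\n", info.HasInterp, info.Libraries)
}
```

For a pure-Go build one would expect no dynamic loader and no shared libraries; a cgo build typically lists libc.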


3. Program start (process creation)

When you run ./fact:

  1. Shell → execve() syscall (same as C).
  2. Kernel maps ELF into memory (text, data, heap, stack).
  3. Entry point is not main.main but the Go runtime's startup code (on linux/amd64 the symbol _rt0_amd64_linux, which leads to runtime.rt0_go).

4. Go runtime startup

Flow (simplified):

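_rt0_amd64_linux (assembly stub, the ELF entry point on linux/amd64)
-> runtime·rt0_go (sets up argc/argv from the stack, the initial g0/m0, and TLS)
-> runtime·args()
-> runtime·osinit() (queries CPU count and other OS parameters)
-> runtime·schedinit() (scheduler init, memory allocator, GC init)
-> runtime·newproc(runtime.main) (create the main goroutine; runtime.main later runs package init functions and calls main.main)
-> runtime·mstart() (start machine, scheduler loop)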

So your main.main runs inside a goroutine, scheduled by Go’s M:N scheduler (goroutines multiplexed onto OS threads).

This is the first big difference from C:

  • In C, main() is just a function on the process's initial thread stack.
  • In Go, main.main is wrapped into a goroutine, with its own tiny growable stack, scheduled by the Go runtime.
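The fact that main.main already runs inside a goroutine can be observed from user code. Here is a small sketch that reads the first line of the current stack trace via runtime.Stack; the helper name goroutineHeader is an assumption for illustration. In a normal program the main goroutine typically reports itself as goroutine 1.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// goroutineHeader returns the first line of the calling goroutine's stack
// trace, which has the form "goroutine <id> [<status>]:".
func goroutineHeader() string {
	buf := make([]byte, 4096)
	n := runtime.Stack(buf, false) // false: only the current goroutine
	header, _, _ := strings.Cut(string(buf[:n]), "\n")
	return header
}

func main() {
	fmt.Println("main.main runs as:", goroutineHeader())
	fmt.Println("goroutines alive: ", runtime.NumGoroutine())

	// A goroutine started with "go" gets its own id and its own stack.
	done := make(chan string)
	go func() { done <- goroutineHeader() }()
	fmt.Println("spawned goroutine:", <-done)
}
```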


5. Memory layout (Go process under Linux)

Pretty similar high-level layout as C (because kernel decides this), but runtime carves things up differently:

0x00400000  ->  .text (Go program + runtime)
                .rodata (string literals, type metadata, itabs, method tables)
                .data / .bss (global vars, runtime state)
                heap (managed by Go runtime, not libc malloc!)
                goroutine stacks (allocated dynamically by the runtime from mmap'd memory)
                ...
0x7fff....  ->  main OS thread stack (threads backing other Ms get their own stacks)
                TLS, signal stacks
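To see the real layout of a running Go process rather than the schematic above, one option on Linux is to read /proc/self/maps. The sketch below parses that file; the mapping type and parseMapsLine helper are illustrative names, and the program simply reports an error on systems without /proc.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// mapping is one parsed line of /proc/<pid>/maps (hypothetical helper type).
type mapping struct {
	Start, End uint64
	Perms      string
	Path       string // empty for anonymous mappings such as the Go heap
}

// parseMapsLine parses a line such as
// "00400000-0048f000 r-xp 00000000 08:01 1234 /tmp/fact".
func parseMapsLine(line string) (mapping, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 {
		return mapping{}, fmt.Errorf("malformed maps line: %q", line)
	}
	lo, hi, ok := strings.Cut(fields[0], "-")
	if !ok {
		return mapping{}, fmt.Errorf("malformed address range: %q", fields[0])
	}
	start, err := strconv.ParseUint(lo, 16, 64)
	if err != nil {
		return mapping{}, err
	}
	end, err := strconv.ParseUint(hi, 16, 64)
	if err != nil {
		return mapping{}, err
	}
	m := mapping{Start: start, End: end, Perms: fields[1]}
	if len(fields) >= 6 {
		m.Path = strings.Join(fields[5:], " ")
	}
	return m, nil
}

func main() {
	f, err := os.Open("/proc/self/maps") // Linux only
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read maps:", err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		m, err := parseMapsLine(sc.Text())
		if err != nil {
			continue // skip lines this sketch does not understand
		}
		name := m.Path
		if name == "" {
			name = "(anonymous)"
		}
		fmt.Printf("%012x-%012x %s %8d KiB  %s\n",
			m.Start, m.End, m.Perms, (m.End-m.Start)/1024, name)
	}
}
```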

Key parts:

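  • Go heap: Managed by Go runtime + GC. Values created with new, make, or composite literals go here when escape analysis decides they may outlive the function; otherwise they can stay on the stack.
  • Goroutine stacks: Start small (2 KB minimum in modern Go) and grow/shrink dynamically. Since Go 1.3 they are contiguous: when a stack runs out of space, the runtime allocates a larger one and copies the frames over (the older segmented "split stacks" design was dropped).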
  • Global data: Includes runtime’s tables for types, GC metadata, scheduler state, etc.
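The heap-versus-stack decision is made by escape analysis, and its effect can be roughly measured. The sketch below compares a function whose local escapes with one whose local does not, counting heap objects via runtime.MemStats; the function names are made up for illustration, and the counts are approximate because other runtime activity can allocate too. The compiler's own decisions can be printed with go build -gcflags=-m.

```go
package main

import (
	"fmt"
	"runtime"
)

// onHeap returns a pointer to a local. The value outlives the call, so
// escape analysis places it on the heap.
//
//go:noinline
func onHeap() *int {
	v := 42
	return &v
}

// onStack takes the address of a local but never lets it leave the frame,
// so the value can stay on the goroutine stack.
//
//go:noinline
func onStack() int {
	v := 42
	p := &v
	return *p
}

// mallocsDuring reports roughly how many heap objects were allocated while
// f ran. Background runtime activity can add noise.
func mallocsDuring(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs
}

var sink *int // package-level sink keeps the heap results reachable

func main() {
	const rounds = 1000
	heap := mallocsDuring(func() {
		for i := 0; i < rounds; i++ {
			sink = onHeap()
		}
	})
	stack := mallocsDuring(func() {
		for i := 0; i < rounds; i++ {
			_ = onStack()
		}
	})
	fmt.Println("approx. heap objects from onHeap: ", heap)
	fmt.Println("approx. heap objects from onStack:", stack)
}
```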

6. Execution of factorial in Go

  • When main.main runs, Go runtime schedules it on an OS thread (M).

  • Arguments (n) are passed according to Go ABI:

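    • In the register-based Go ABI (ABIInternal, Go 1.17+ on amd64), integer args go in registers in the order AX, BX, CX, DI, SI, R8, R9, R10, R11. This differs from the SysV order (DI, SI, DX, CX, R8, R9). Arguments that do not fit in registers are passed on the stack, where spill slots are also reserved for register arguments.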
  • Locals (r and i) typically live in registers, unless escape analysis forces them to heap (not in this case).

  • Return value (int) in AX.

So factorial(10) executes similarly at the machine-code level to C, but the stack is not a big fixed 8 MB chunk like pthreads — it’s a tiny growable goroutine stack. If more space is needed, Go runtime transparently allocates a bigger stack and copies frame data over.
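Stack growth by copying can be made visible with a small experiment. This is a sketch under the assumption that a deep recursion forces the goroutine stack to grow: it records the address of a local in main before and after the recursion. The helper name deep is illustrative, and whether the address changes depends on the runtime's decisions, so no particular output is promised.

```go
package main

import (
	"fmt"
	"unsafe"
)

// deep recurses n levels and returns the number of frames visited (n+1).
// Each frame carries a 256-byte array so the stack fills up quickly.
//
//go:noinline
func deep(n int) int {
	var pad [256]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return int(pad[0])
	}
	return deep(n-1) + int(pad[n%len(pad)])
}

func main() {
	var x int

	// Converting to uintptr records the address without keeping a pointer,
	// so x can remain on the goroutine stack.
	before := uintptr(unsafe.Pointer(&x))
	frames := deep(10000) // far more than the initial stack can hold
	after := uintptr(unsafe.Pointer(&x))

	fmt.Println("frames visited:", frames)
	fmt.Printf("x before: %#x\nx after:  %#x\n", before, after)
	fmt.Println("stack was moved:", before != after)
}
```

If the two addresses differ, the runtime copied main's frame to a new, larger stack while deep was running.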


7. fmt.Println (runtime + syscalls)

This is where the Go standard library and runtime do the heavy lifting:

  • fmt.Println(a...) is a thin wrapper around fmt.Fprintln(os.Stdout, a...), which formats into a buffer and then calls Write on the io.Writer it was given.
  • Default writer is os.Stdout (an *os.File wrapping FD=1).
  • Ultimately calls syscall.Write → makes a raw write(1, buf, len) syscall to the kernel.

So the actual printing mechanism is the same as C (kernel syscall write), but the path is Go standard library code, not libc’s printf.
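The layers described above can be peeled off one at a time. The following sketch prints the same line four ways, each one step closer to the kernel; formatLine is a hypothetical helper, and the direct syscall.Write on descriptor 1 assumes a Unix-like system.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// formatLine renders v the way fmt.Println would: default formatting plus
// a trailing newline.
func formatLine(v int) []byte {
	return []byte(fmt.Sprintln(v))
}

func main() {
	const result = 3628800

	// 1. The convenience wrapper used in the walkthrough.
	fmt.Println(result)

	// 2. What Println forwards to: format, then write to os.Stdout.
	fmt.Fprintln(os.Stdout, result)

	// 3. Skip fmt's writer plumbing: hand formatted bytes to *os.File.
	if _, err := os.Stdout.Write(formatLine(result)); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
	}

	// 4. Skip os.File as well: issue write(2) on file descriptor 1.
	if _, err := syscall.Write(1, formatLine(result)); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
	}
}
```

Each of the four calls ends in the same write syscall; only the amount of Go code in front of it differs.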


8. Go scheduler (M:N)

Goroutines are green threads:

  • G = goroutine (has stack, PC, status).
  • M = machine (OS thread executing Go code).
  • P = processor (logical CPU resource, tied to an M, holds run queues).

Execution flow:

  • On startup, runtime creates a G for main.main.
  • Scheduler picks an M (OS thread), assigns a P, and runs that G.
  • If more goroutines exist, they are queued in a work-stealing scheduler (run queues per P, work stealing for load balance).
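  • During a blocking syscall (like read or write on a regular file) the M stays blocked together with its G; the runtime detaches the P and hands it to another M so other goroutines keep running. Network I/O goes through the netpoller instead, which parks only the goroutine.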

This means your single main.main G runs on one thread, but if you launched many goroutines, the runtime transparently multiplexes them across threads.
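To illustrate the multiplexing, here is a minimal sketch that computes several factorials with one goroutine each and waits for them with sync.WaitGroup. The helper factorialsConcurrently is an illustrative name; how the goroutines are spread across threads is up to the scheduler and not controlled by this code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// factorial is the function from the walkthrough.
func factorial(n int) int {
	r := 1
	for i := 2; i <= n; i++ {
		r *= i
	}
	return r
}

// factorialsConcurrently computes factorial(0) .. factorial(n-1), using one
// goroutine per input. Each goroutine writes only its own slice element,
// so no further locking is needed.
func factorialsConcurrently(n int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = factorial(i)
		}(i)
	}
	wg.Wait() // block until every goroutine has called Done
	return results
}

func main() {
	fmt.Println("logical CPUs:   ", runtime.NumCPU())
	fmt.Println("Ps (GOMAXPROCS):", runtime.GOMAXPROCS(0)) // 0 queries without changing

	res := factorialsConcurrently(11)
	fmt.Println("10! =", res[10])
}
```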


9. Garbage collection impact

Barely visible in this factorial program (fmt.Println makes at most a few small allocations, far too few to trigger a collection), but:

  • Go runtime’s GC runs concurrently.
  • Managed heap is divided into spans & arenas.
  • Write barriers are emitted by the compiler around pointer stores and are active while the GC is marking, so references are tracked during concurrent GC.
  • Goroutine stacks are scanned as GC roots and may be moved to a smaller allocation when the GC shrinks them.

C doesn’t have this — memory management is manual.
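GC activity can be observed through runtime.MemStats. The sketch below allocates short-lived buffers and reports how many collection cycles completed; gcCycles, churn, and forceGC are illustrative helper names, and the cycle counts depend on the heap size and GC settings, so they will vary between runs.

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // storing into a package-level variable forces heap allocation

// gcCycles returns the number of GC cycles completed so far.
func gcCycles() uint32 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.NumGC
}

// churn allocates n short-lived 1 KiB buffers on the heap and returns the
// total number of bytes requested.
func churn(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		sink = make([]byte, 1024) // the previous buffer becomes garbage
		total += len(sink)
	}
	return total
}

// forceGC runs a full collection and waits for it to finish.
func forceGC() {
	runtime.GC()
}

func main() {
	start := gcCycles()
	bytes := churn(200000)
	afterChurn := gcCycles()
	forceGC()
	afterForce := gcCycles()

	fmt.Println("bytes allocated:          ", bytes)
	fmt.Println("GC cycles during churn:   ", afterChurn-start)
	fmt.Println("GC cycles after runtime.GC:", afterForce-start)
}
```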


10. Syscalls and process lifecycle

Syscalls you’d see if you strace ./fact (among others, such as signal setup and thread creation by the runtime):

  • execve (launch binary)
  • mmap (for stacks, heap arena, runtime structures)
  • write (from fmt.Println)
  • exit_group (when program exits)

Go runtime hides all this, but it’s happening under the hood.


11. Comparison: C vs Go path

Stage        | C                                             | Go
Entry point  | _start → __libc_start_main                    | _rt0_amd64_linux → runtime.rt0_go → runtime.main
main runs    | On process main thread stack                  | As a goroutine (small growable stack)
Memory mgmt  | malloc/free (libc, brk/mmap)                  | Go runtime heap + GC
Printing     | printf (libc → write syscall)                 | fmt.Println (Go standard library → write syscall)
Threads      | pthreads (1:1 with OS threads)                | goroutines (M:N scheduler)
Stack        | Fixed size per thread (8 MB default on Linux) | Starts at ~2 KB, grows/shrinks as needed

12. Exit

  • After main.main returns, runtime.main (the function running the main goroutine) calls exit(0); it does not wait for other goroutines to finish.
  • That exit ends up as the exit_group syscall, which terminates every thread in the process.
  • Kernel tears down the process like in C.
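Because the process ends as soon as main.main returns, any goroutine whose result matters has to be waited for explicitly. A minimal sketch of that pattern, using a channel (startWorker is a hypothetical helper):

```go
package main

import (
	"fmt"
	"time"
)

// startWorker launches a goroutine that finishes after delay and reports
// completion on the returned channel.
func startWorker(delay time.Duration) <-chan string {
	done := make(chan string, 1) // buffered so the worker never blocks on send
	go func() {
		time.Sleep(delay)
		done <- "worker finished"
	}()
	return done
}

func main() {
	done := startWorker(50 * time.Millisecond)

	// Without this receive, main.main could return first and the process
	// would exit while the worker is still sleeping.
	fmt.Println(<-done)
}
```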