Process lifecycle - ImaSystems.Engineer

There’s a lot of production confusion that disappears if I can answer one boring question precisely:

“What happened to the process?”

Not “what happened to the code”, not “what happened to the server”, not “what happened to Kubernetes”.

The OS runs my app as a process. And processes have a very small set of ways they can begin, change state, and end. That’s the process lifecycle.

The words I’m grounding here are: process start, exit code, crash, signal, graceful shutdown, and forced shutdown.

Process start is an OS event, not a vibe

A process “starting” isn’t “the app came alive”. It’s the OS creating a process and loading an executable into it.

At process start time, a bunch of concrete things get fixed, mostly chosen by whoever launches the process and enforced by the OS:

what executable is being run
what arguments and environment variables it gets
what its working directory is
what user/permissions it runs with
what file descriptors it starts with (stdout/stderr, network sockets, log pipes, etc.)

This is where a lot of “it works locally” vs “it fails in prod” differences sneak in, because “start” includes identity, permissions, paths, and I/O wiring.
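To make that list less abstract, here is a minimal sketch of a parent process pinning down the executable, arguments, environment, working directory, and stdout wiring for a child. It assumes a POSIX-like system with Python 3; the `launch_and_report` helper and the `APP_MODE` variable are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile


def launch_and_report(workdir, env):
    """Start a child Python process and return what it saw at startup."""
    # The child prints its working directory and one environment variable.
    child_code = "import os; print(os.getcwd()); print(os.environ.get('APP_MODE'))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],  # executable + arguments
        cwd=workdir,                         # working directory
        env=env,                             # environment variables
        stdout=subprocess.PIPE,              # wire the child's stdout to a pipe
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # APP_MODE is a hypothetical variable, used only for this demo.
        env = dict(os.environ, APP_MODE="prod")
        cwd_seen, mode_seen = launch_and_report(tmp, env)
        print("child cwd:", cwd_seen)
        print("child APP_MODE:", mode_seen)
```

Same code, different `cwd` or `env`, different behavior. That is usually the shape of a “works locally, fails in prod” bug.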

“Running” is the boring part (and that’s the point)

Once started, the process is just executing in user space. If it’s a server, it sits in a loop: accept a request, do work, write a response, repeat.

Most of the interesting lifecycle moments happen at the boundaries: how it starts, how it is interrupted, and how it ends.

A process ends by exiting (and it leaves an exit code)

A normal end is: the process returns from main (or equivalent) and exits.

When that happens, the OS records an exit code:

0 usually means “success”
non-zero usually means “some kind of failure”

Exit codes are simple but extremely useful, because they’re a stable interface between “the process” and “the thing supervising it” (a shell, systemd, a container runtime, an orchestrator).
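A small sketch of that interface from the supervisor's side, using Python's `subprocess` module as a stand-in for “the thing supervising it”. The `run_and_get_exit_code` helper is a name I made up for this example.

```python
import subprocess
import sys


def run_and_get_exit_code(child_code):
    """Run a Python snippet as a child process and return its exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", child_code],
        stderr=subprocess.DEVNULL,  # keep the demo output quiet
    )
    return completed.returncode


if __name__ == "__main__":
    # The child finishes normally: exit code 0.
    print(run_and_get_exit_code("pass"))
    # The child chooses a specific non-zero code.
    print(run_and_get_exit_code("import sys; sys.exit(3)"))
    # An uncaught exception makes the Python interpreter exit with code 1.
    print(run_and_get_exit_code("raise RuntimeError('boom')"))
```

The supervisor never looks inside the child. It only sees the number, which is why it is worth choosing exit codes deliberately.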

Crash: the process did not choose to exit

A crash is when a process stops because of an unrecoverable fault, not because the program intentionally finished.

Sometimes that shows up as a non-zero exit code. Sometimes it shows up as termination by a signal (for example, SIGSEGV after an invalid memory access).

From the outside, the point is the same: the process ended abruptly, and whatever was “in memory” at the time is gone.
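Here is a rough sketch of how those two outcomes look to a parent process. It assumes POSIX, where Python's `subprocess` reports “killed by signal N” as a return code of `-N`. The crash is simulated by having the child send itself SIGSEGV, since producing a real invalid memory access from pure Python is awkward. `describe_end` is a hypothetical helper.

```python
import signal
import subprocess
import sys


def describe_end(child_code):
    """Run a Python snippet as a child and describe how the process ended."""
    completed = subprocess.run(
        [sys.executable, "-c", child_code],
        stderr=subprocess.DEVNULL,
    )
    code = completed.returncode
    if code < 0:
        # On POSIX, a negative return code means "terminated by signal -code".
        return "killed by " + signal.Signals(-code).name
    return "exited with code " + str(code)


# Simulated crash: the child sends itself SIGSEGV (default action: terminate).
SIMULATED_SEGFAULT = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

if __name__ == "__main__":
    print(describe_end("import sys; sys.exit(2)"))
    print(describe_end(SIMULATED_SEGFAULT))
```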

Signals: how the OS taps a process on the shoulder

A signal is an asynchronous notification the OS delivers to a process, either on its own behalf or because another process asked it to: “an event happened. React to this”.

Signals are how interrupts and shutdown requests usually arrive.

Some common ones you’ll see in real life:

SIGTERM: “please terminate” (polite shutdown request)
SIGINT: “interrupt” (often from Ctrl+C)
SIGKILL: “stop right now” (cannot be caught, blocked, or ignored; the process just dies)
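The difference between “can be handled” and “cannot be handled” is easy to poke at. This is a minimal sketch, assuming POSIX and that the code runs on the main thread (Python only allows installing signal handlers there). Both helper names are mine.

```python
import os
import signal
import time


def handle_one_signal(sig):
    """Install a handler for `sig`, send `sig` to ourselves, return what the handler saw."""
    received = []

    def handler(signum, frame):
        received.append(signal.Signals(signum).name)

    previous = signal.signal(sig, handler)
    try:
        os.kill(os.getpid(), sig)
        # Python runs the handler between bytecode instructions; wait briefly for it.
        deadline = time.monotonic() + 2
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        if previous is not None:
            signal.signal(sig, previous)  # put the old handler back
    return received


def can_install_handler(sig):
    """Return True if the OS lets this process install a handler for `sig`."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except (OSError, ValueError):
        return False
    if previous is not None:
        signal.signal(sig, previous)
    return True


if __name__ == "__main__":
    print(handle_one_signal(signal.SIGTERM))
    print(can_install_handler(signal.SIGKILL))
```

The process survives its own SIGTERM because it installed a handler. Trying to do the same for SIGKILL is rejected by the OS before the signal is ever sent.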

Graceful vs forced shutdown

When a system wants a process to stop (during a deploy, a restart, scaling down), the best-case path is:

Send a signal that can be handled (often SIGTERM).
The process performs a graceful shutdown: stop accepting new work, finish or abort in-flight work safely, flush buffers, close connections, and exit with a clear code.

If that doesn’t happen fast enough (or the process is wedged), the system escalates to a forced shutdown (often SIGKILL).

This distinction matters because “the process stopped” can mean either:

the app chose a clean exit, leaving the world consistent, or
the OS stopped it mid-flight, and you need to assume work was interrupted
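The escalation path can be sketched from the supervisor's side like this. It is a simplified illustration, not how any particular orchestrator implements it. It assumes POSIX, where `Popen.terminate()` sends SIGTERM and `Popen.kill()` sends SIGKILL. The two child scripts and the helper names are invented for the demo.

```python
import signal
import subprocess
import sys

# A child that handles SIGTERM and exits cleanly with code 0.
COOPERATIVE = """
import signal, sys, time
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
print("ready", flush=True)
while True:
    time.sleep(0.05)
"""

# A child that ignores SIGTERM, standing in for a wedged process.
STUBBORN = """
import signal, time
signal.signal(signal.SIGTERM, signal.SIG_IGN)
print("ready", flush=True)
while True:
    time.sleep(0.05)
"""


def start_child(code):
    """Start a child and block until it reports that its signal setup is done."""
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # waits for the "ready" line
    return proc


def stop_process(proc, grace_seconds):
    """Send SIGTERM, wait, then escalate to SIGKILL. Returns True if escalation was needed."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        escalated = False
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        escalated = True
    proc.stdout.close()
    return escalated


if __name__ == "__main__":
    for name, code in (("cooperative", COOPERATIVE), ("stubborn", STUBBORN)):
        child = start_child(code)
        escalated = stop_process(child, grace_seconds=1.0)
        print(name, "escalated:", escalated, "returncode:", child.returncode)
```

The grace period is the knob here: too short and healthy processes get killed mid-flight, too long and deploys stall on wedged ones.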

A concrete mental model to keep around

For most application work, you don’t need kernel details. You just need to be able to classify the end of a process:

It exited (and has an exit code).
It crashed (faulted) and ended unexpectedly.
It was terminated by a signal (polite or forced).

Once you know which one happened, a lot of “mystery behavior” becomes actionable.
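As a sketch, that classification fits in one small function. It assumes the return-code convention of Python's `subprocess` on POSIX (a negative value `-N` means “terminated by signal N”). Which signals count as “crash” versus “shutdown request” is my own rough grouping, not an official taxonomy.

```python
import signal

# Signals that usually indicate a fault inside the process (assumed grouping).
CRASH_SIGNALS = {signal.SIGSEGV, signal.SIGABRT, signal.SIGBUS,
                 signal.SIGFPE, signal.SIGILL}
# Signals that usually mean "someone asked it to stop".
POLITE_SIGNALS = {signal.SIGTERM, signal.SIGINT}


def classify_end(returncode):
    """Map a subprocess-style return code to one of the lifecycle outcomes."""
    if returncode is None:
        return "still running"
    if returncode == 0:
        return "exited: success"
    if returncode > 0:
        return "exited: failure code " + str(returncode)
    try:
        sig = signal.Signals(-returncode)
    except ValueError:
        # A signal number this Python build has no name for.
        return "terminated by signal " + str(-returncode)
    if sig in CRASH_SIGNALS:
        return "crashed: " + sig.name
    if sig == signal.SIGKILL:
        return "forced shutdown: SIGKILL"
    if sig in POLITE_SIGNALS:
        return "shutdown requested: " + sig.name
    return "terminated by signal " + sig.name


if __name__ == "__main__":
    for code in (0, 2, -signal.SIGSEGV, -signal.SIGTERM, -signal.SIGKILL):
        print(code, "->", classify_end(code))
```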

That’s it. May the force be with you!