It’s 2 AM. The server is down. Your application is hanging, your website is unresponsive, and a single process is mysteriously eating 100% of the CPU. You’re losing money. You’re losing sleep. You ssh in… now what?
Or a simpler, more common nightmare: your Linux desktop is frozen. The mouse moves, but you can’t click anything. An application has gone rogue. Do you pull the plug?
Welcome to the world of Linux Process Management.
Understanding what a “process” is and how to find, inspect, and control it is arguably the most important skill for any Linux user, from a casual desktop user to a high-level systems administrator.
This is not a quick 5-command cheat sheet. This is the definitive guide. By the end of this article, you will be able to look at any Linux system, instantly understand what it’s doing, find the exact program causing a problem, and fix it with surgical precision.
We will go far beyond just ps and kill. We will explore the very philosophy of a process, dissect commands like top and htop column by column, and master the art of sending signals, all while learning why kill -9 should be your absolute, final, last resort.
Let’s get started.
What IS a Process?
Before you can manage processes, you have to understand what they are.
A “program” is a file on your disk. It’s an inert, lifeless set of instructions. For example, /usr/bin/firefox is a program. It just sits there.
A “process” is what happens when you run a program. It’s a program that has been loaded into your computer’s memory (RAM) and is being actively executed by the CPU. It’s a “program in execution.” It is alive.
When you run Firefox, the operating system (Linux) creates a new process. When you open a terminal, that’s a process. When you run ls, that’s a new process that lives for a fraction of a second and then dies. Your entire system is a collection of hundreds or thousands of these processes, all working in parallel.
The Most Important Number: The PID
When Linux creates a process, it gives it a unique number: the PID (Process ID). This is like a social security number for your process. It’s how the system keeps track of it.
If you want to control a process (e.g., kill it, pause it, change its priority), you must know its PID.
The “Family Tree”: PID, PPID, and the Init Process
This is where it gets interesting. Every process is started by another process. This creates a parent-child relationship.
- PID: The process’s own ID.
- PPID: The Parent Process ID. This is the PID of the process that started it.
You can see this in action with the ps command (which we’ll cover in detail later).
# The -f flag shows a "full" format, including PPID
ps -f
Output:
UID PID PPID C STIME TTY TIME CMD
user 1234 1233 0 09:00 pts/0 00:00:00 bash
user 1235 1234 0 09:01 pts/0 00:00:00 ps -f
Look at this. The ps -f command (PID 1235) was started by the bash shell (PID 1234). So, ps’s PPID is 1234.
This creates a giant “tree” of processes on your system. So who is at the top? Who is the ultimate ancestor?
On most Linux systems, this is the init process (on modern distributions, usually systemd). It is always PID 1. It’s the first user-space process the Linux kernel starts when the system boots, and it is the “mother of all processes.” Every user-space process on your system is a descendant of PID 1. (Kernel threads, the entries shown in square brackets in ps output, descend from kthreadd, PID 2, instead.)
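You can trace this family tree by hand. Here is a minimal sketch, assuming a Linux system with /proc mounted. It reads the PPid field from /proc/<pid>/status and walks upward from the current shell. The helper names ppid_of and ancestors are made up for this illustration.

```shell
#!/bin/sh
# Walk the parent chain of a process up to the top of the tree.
# Assumes Linux with /proc mounted; ppid_of and ancestors are illustrative names.

# Print the parent PID of the process given as $1.
ppid_of() {
    [ -r "/proc/$1/status" ] || return 1
    while read -r key value _; do
        if [ "$key" = "PPid:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Print "pid -> parent -> grandparent ..." starting from $1.
# The walk stops when a process reports parent 0 (nothing above it).
ancestors() {
    pid=$1
    chain=$pid
    while parent=$(ppid_of "$pid") && [ "$parent" -gt 0 ]; do
        chain="$chain -> $parent"
        pid=$parent
    done
    echo "$chain"
}

ancestors "$$"
```

On a typical login session the last number printed is 1. Inside a container the chain may end somewhere else, because the container only sees its own slice of the tree.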
[Figure: Linux process flowchart]
The “Living Dead”: Orphan and Zombie Processes
This parent-child relationship is critical for system cleanup. But what happens when a parent or child dies unexpectedly? This creates two special “edge cases” you will see on a running server.
- Orphan Process:
  - What it is: A child process whose parent process has died.
  - What happens: This would be a problem (who “cleans up” the child when it’s done?), but Linux has a built-in solution. Any orphan process is immediately “adopted” by the init process (PID 1), or on some modern setups by a nearer “subreaper” such as the user’s systemd instance. The adopter becomes its new parent and will wait for it to finish, ensuring it gets cleaned up properly.
  - Is it bad? No. This is a normal and healthy part of the system.
- Zombie Process (or “Defunct” Process):
  - What it is: A child process that has finished and died, but its parent process has not yet acknowledged its death.
  - What happens: When a child dies, the kernel keeps its “death certificate” (the exit code) and notifies the parent with a SIGCHLD signal. The parent is supposed to “reap” the child by reading this code, which allows the kernel to fully remove the dead child from the process table. A “zombie” is a process in this limbo state: it’s dead, but its entry in the process table still exists because the (often poorly-coded) parent hasn’t “reaped” it yet.
  - How to see them: In top or ps, you’ll see a Z in the status column.
  - Is it bad? A few zombies are harmless. They use no memory or CPU, only a slot in the process table. A large, persistent number of zombie processes, however, points to a bug in the parent application (a “bad parent”).
  - Can you kill a zombie? No. You cannot kill a process that is already dead. A kill -9 on a zombie will do nothing. If the parent never reaps it, the only way to clear a zombie is to kill its parent process. This orphans the zombie, init (PID 1) adopts it, and init’s job is to immediately reap all its dead children. So, PID 1 will finally reap the zombie, and it will vanish.
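To see a zombie with your own eyes, you can make one on purpose. The sketch below assumes Linux with /proc mounted. The list_zombies helper is a made-up name, and it reads the State and PPid fields from /proc/<pid>/status rather than parsing ps output.

```shell
#!/bin/sh
# Create a zombie on purpose, then find it. Assumes Linux with /proc mounted.

# The inner shell starts a child (sleep 1), then replaces itself with
# "sleep 3", a program that never calls wait() on that child.
sh -c 'sleep 1 & exec sleep 3' &
parent=$!
sleep 2   # the child has exited by now, but nobody has reaped it

# Print "PID PPID" for every process whose state is Z.
list_zombies() {
    for dir in /proc/[0-9]*; do
        [ -r "$dir/status" ] || continue
        state=""
        ppid=""
        while read -r key value _; do
            case $key in
                State:) state=$value ;;
                PPid:)  ppid=$value ;;
            esac
        done 2>/dev/null < "$dir/status"
        if [ "$state" = "Z" ]; then
            echo "${dir#/proc/} $ppid"
        fi
    done
}

zombies=$(list_zombies)
echo "zombies (PID PPID):"
echo "$zombies"

# Once the parent exits, the zombie is re-parented and normally reaped.
wait "$parent"
```

The second column is the interesting one: it names the “bad parent” you would need to fix or restart.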
ps — The Static Process Snapshot
ps (process status) is your foundational tool. It takes a “snapshot” of all the processes running at the exact moment you run the command. It’s not real-time; it’s a picture.
There are two main “flavors” of ps syntax, which is a confusing historical quirk. You will see both, so you must know both.
- BSD Syntax (inherited from BSD Unix; also what macOS uses): No hyphen (-). Example: ps aux
- System V Syntax (inherited from AT&T Unix and standardized by POSIX): Requires a hyphen (-). Example: ps -ef
The ps shipped with practically every modern Linux distribution (Ubuntu, Debian, Red Hat, Fedora, CentOS) accepts both, but they show slightly different information.
The “BSD” Way: ps aux
This is the most common command new users learn. It’s my personal favorite for a quick, “tell me everything” overview.
- a: Show processes for all users (not just you).
- u: Show in a “user-oriented” format (shows the USER and gives more detail).
- x: Show processes that don’t have a “controlling TTY” (this includes system daemons and background processes).
Running ps aux gives you a lot of data. Let’s break it down, column by column.
Command: ps aux
Output (example):
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 169444 11624 ? Ss 08:00 0:01 /sbin/init
root 987 0.0 0.2 1109924 16244 ? Sl 08:00 0:02 /usr/sbin/somedaemon
user 1234 0.0 0.1 25100 8620 pts/0 Ss 09:00 0:00 /bin/bash
user 1245 99.9 2.5 120456 80123 pts/0 R+ 09:01 5:12 ./rogue-script
user 1246 0.0 0.0 17236 3652 pts/0 T 09:01 0:00 vim somefile.txt
user 1247 0.0 0.0 0 0 ? Z 09:01 0:00 [my-app] <defunct>
user 1248 0.0 0.0 21100 4652 pts/1 S+ 09:02 0:00 ps aux
ps aux Command Explanation
- USER: The user who owns this process.
- PID: The Process ID. This is the number you need for kill.
- %CPU: CPU usage. This is a “lifetime average” (CPU time used divided by how long the process has existed), which is why it’s often misleading. top is much better for real-time CPU. (Note our rogue-script at 99.9%!)
- %MEM: Memory (RAM) usage. A percentage of your total physical RAM.
- VSZ (Virtual SiZe): The total virtual memory size of the process, in KiB. This is everything the process thinks it has, including memory that’s been paged out to disk (swap). It’s usually a very large, scary, and not very useful number.
- RSS (Resident Set Size): This is the most important memory number. It’s the actual amount of physical RAM (not on disk) the process is currently using, in KiB. This is what you should look at to see who is “eating your RAM.”
- TTY: The “controlling terminal” for the process.
  - pts/0: A pseudo-terminal (like the terminal window you’re in).
  - ?: No controlling terminal. This usually means it’s a system daemon or background process.
- STAT (STATus): The most important status column. This tells you what the process is doing right now.
  - R (Running): The process is actively running on the CPU or is in the “run queue,” ready to run. Our rogue-script is R.
  - S (Sleeping): The process is in “interruptible sleep.” It’s waiting for something (like user input, or a network connection). The vast majority of your processes will be S. This is normal.
  - D (Disk Sleep): “Uninterruptible sleep.” The process is waiting for I/O (like a disk read/write). Brief D states are normal; a process that stays in D is likely “stuck,” and even kill -9 will not take effect until the I/O completes.
  - T (Stopped): The process has been stopped by a signal (e.g., you hit Ctrl+Z). Our vim process is T.
  - Z (Zombie): It’s dead. Look, we found our zombie from earlier!
  - +: It’s in the “foreground” process group of its terminal (like our rogue-script and ps commands).
  - s: It’s a “session leader” (like our bash shell).
  - l: It’s multi-threaded (like somedaemon, shown as Sl).
  - <: High-priority (a negative nice value; not nice to other processes).
  - N: Low-priority (a positive nice value; “niced”).
  - L: Pages are locked into memory.
- START: The time of day the process was started.
- TIME: The total amount of CPU time this process has used since it started. Our rogue-script has used 5 minutes and 12 seconds of pure CPU time.
- COMMAND: The command that was used to start the process.
The “SysV” Way: ps -ef
This is the other common command, often used by Red Hat/CentOS admins.
- -e: Show every process.
- -f: Show “full” format (includes PPID!).
Command: ps -ef
Output (example):
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 08:00 ? 00:00:01 /sbin/init
root 987 1 0 08:00 ? 00:00:02 /usr/sbin/somedaemon
user 1234 1233 0 09:00 pts/0 00:00:00 /bin/bash
user 1248 1234 0 09:02 pts/0 00:00:00 ps -ef
The columns are different, but you know them all now:
- UID: Same as USER.
- PID: Process ID.
- PPID: Parent Process ID! This is the killer feature. It’s how you trace family trees.
- C: CPU usage (again, an average).
- STIME: Start time.
- TTY: Same as before.
- TIME: Total CPU time.
- CMD: Same as COMMAND.
Which is better? aux or -ef?
- Use ps aux for a quick look at memory (RSS) and CPU (%CPU).
- Use ps -ef when you need to see the parent-child relationships (PPID).
ps Cookbook (Practical Recipes)
- Recipe 1: Find a specific process (the grep pipe). This is the #1 use case for ps. Expect the grep command itself to show up in the results, since its own command line contains the search word.
    ps aux | grep "firefox"
- Recipe 2: See the process tree. This is a fantastic way to visualize the parent-child relationships.
    ps -ef --forest
    # Or, if installed, the dedicated pstree command:
    pstree
- Recipe 3: Find all processes for a specific user.
    ps -u username
- Recipe 4: A custom ps format. You can get exactly the columns you want with -o. This is great for scripting.
    # Show just the PID, %CPU, %MEM, and command name for all processes,
    # sorted by memory usage (highest first)
    ps -eo pid,%cpu,%mem,comm --sort=-%mem
top — The Real-Time Process Dashboard
ps is a static picture. top is a live video.
It’s an interactive, full-screen, real-time dashboard of everything happening on your system right now. When a sysadmin logs into a slow server, top is the first command they run.
Just type top and hit Enter. You’ll be presented with a full-screen UI that updates every few seconds.
Let’s break down this intimidating screen, from top to bottom.
[Figure: top UI]
The Summary (The “Uptime”)
top - 09:30:15 up 1:30, 2 users, load average: 0.15, 0.08, 0.05
- 09:30:15: Current time.
- up 1:30: How long the system has been running (1 hour, 30 mins).
- 2 users: How many users are logged in.
- load average: 0.15, 0.08, 0.05: This is the most important line. It’s the system’s “load” over the last 1, 5, and 15 minutes.
  - What is “load”? It’s a measure of how many processes are running (R state) or waiting for disk I/O (D state).
  - How to read it: On a single-core CPU, a load of 1.0 means it is 100% utilized. A load of 2.0 means it’s 100% utilized, and another 100% worth of work is waiting in the queue.
  - On a multi-core system: This number is relative to your cores. On a 4-core system, a load of 4.0 is 100% utilization. A load of 1.0 is 25% utilization.
  - Rule of Thumb: Look at the 5 and 15-minute averages. If they are consistently higher than your number of CPU cores, your server is overloaded.
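The rule of thumb above is easy to script. This is a rough sketch, assuming Linux (/proc/loadavg) and either nproc or getconf to count cores. load_ratio is a hypothetical helper name, not a standard tool.

```shell
#!/bin/sh
# Relate the load averages to the number of CPU cores.
# Assumes Linux (/proc/loadavg); load_ratio is an illustrative helper name.

# Print load divided by cores, to two decimals.
load_ratio() {
    LC_ALL=C awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
read -r load1 load5 load15 _ < /proc/loadavg

echo "cores:  $cores"
echo "1 min:  $load1 ($(load_ratio "$load1" "$cores") per core)"
echo "5 min:  $load5 ($(load_ratio "$load5" "$cores") per core)"
echo "15 min: $load15 ($(load_ratio "$load15" "$cores") per core)"

# Apply the rule of thumb to the 15-minute figure.
if LC_ALL=C awk -v load="$load15" -v cores="$cores" 'BEGIN { exit !(load > cores) }'; then
    echo "sustained load exceeds the core count: overloaded"
else
    echo "sustained load is within capacity"
fi
```

A per-core figure near 1.00 means the machine is fully busy; the 4-core example from the text (load 1.0) comes out as 0.25.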
The Tasks
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
This is a direct, real-time count of your process STAT codes.
- 1 running: This is the top command itself (or our rogue script!).
- 179 sleeping: Normal.
- 0 stopped, 0 zombie: Good. If you see zombies here, it’s time to investigate.
The CPU(s)
%Cpu(s): 1.2 us, 0.5 sy, 0.0 ni, 98.0 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
This is a deeply informative breakdown of what your CPU is spending its time on.
- us (user): Time spent running user-space code (e.g., your app, bash).
- sy (system): Time spent running kernel code (e.g., system calls, drivers).
- ni (nice): Time spent running low-priority (“niced”) processes.
- id (idle): This is the most important number. It’s how much CPU you have free. 98% idle is a happy, bored CPU. 0% idle is a 100% busy CPU.
- wa (IO-wait): This is the other most important number. It’s the time the CPU spent idle while waiting for disk I/O to complete. If your id is high but wa is also high, it means your CPU is fine, but your hard drives are the bottleneck.
- hi (hardware interrupts), si (software interrupts), st (steal time, for VMs): More advanced, but generally you want them to be low.
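If you are curious where these percentages come from, the kernel publishes the raw counters in /proc/stat. The sketch below, which assumes a reasonably modern Linux kernel, samples the aggregate cpu line twice and turns the difference into percentages. snapshot and cpu_breakdown are illustrative names, and this is only an approximation of what top does, not its actual code.

```shell
#!/bin/sh
# Sample the aggregate "cpu" line of /proc/stat twice and report the share
# of time spent in each state in between, similar to top's %Cpu(s) line.
# Assumes Linux; snapshot and cpu_breakdown are illustrative names.

# Print the counters: user nice system idle iowait irq softirq steal
snapshot() {
    while read -r label user nice system idle iowait irq softirq steal _; do
        if [ "$label" = "cpu" ]; then
            echo "$user $nice $system $idle $iowait $irq $softirq $steal"
            return 0
        fi
    done < /proc/stat
}

# $1 = earlier snapshot, $2 = later snapshot (strings printed by snapshot).
cpu_breakdown() {
    echo "$1 $2" | LC_ALL=C awk '{
        total = 0
        for (i = 1; i <= 8; i++) { d[i] = $(i + 8) - $i; total += d[i] }
        if (total == 0) total = 1
        printf "%.1f us, %.1f sy, %.1f ni, %.1f id, %.1f wa\n",
            100 * d[1] / total, 100 * d[3] / total, 100 * d[2] / total,
            100 * d[4] / total, 100 * d[5] / total
    }'
}

first=$(snapshot)
sleep 1
second=$(snapshot)
cpu_breakdown "$first" "$second"
```

Because it measures only the one-second window between the two samples, the result reflects what the CPU is doing right now rather than an average since boot.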
The Memory
MiB Mem : 8000.0 total, 4000.0 free, 2000.0 used, 2000.0 buff/cache
MiB Swap: 4000.0 total, 4000.0 free, 0.0 used, 6000.0 avail Mem
This is your RAM.
- Mem (Physical RAM): Shows total, free, and used.
- buff/cache: This is a key Linux concept. Linux uses otherwise idle RAM to “cache” files from the disk to make things faster. This is good! Most of this memory is handed back as soon as an application needs it.
- avail Mem (Available): This is the real “free” memory number: the kernel’s estimate of how much memory new applications can claim without swapping, roughly free plus the reclaimable part of buff/cache. This is what you should look at.
- Swap: Your disk-based “emergency” memory. If used swap is high and keeps growing, it means you’ve run out of real RAM and your system is now falling back on the slow hard drive. Heavy, constant swapping is bad and is called “thrashing.”
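The same numbers are available outside of top. As a small sketch, assuming Linux with a kernel recent enough to report MemAvailable, this pulls the values from /proc/meminfo. meminfo_kib is a made-up helper name.

```shell
#!/bin/sh
# Compare "free" with "available" memory, straight from /proc/meminfo.
# Assumes Linux; meminfo_kib is an illustrative helper.
# Values in /proc/meminfo are in KiB.

# Print the value of one /proc/meminfo field, e.g. MemTotal.
meminfo_kib() {
    awk -v key="$1:" '$1 == key { print $2; exit }' /proc/meminfo
}

total=$(meminfo_kib MemTotal)
free=$(meminfo_kib MemFree)
avail=$(meminfo_kib MemAvailable)

echo "total:     $(( ${total:-0} / 1024 )) MiB"
echo "free:      $(( ${free:-0} / 1024 )) MiB"
echo "available: $(( ${avail:-0} / 1024 )) MiB"
```

On a machine that has been up for a while, available is usually much larger than free, which is exactly the point made above.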
The Process List
This is the live, updating ps-like list.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
- PID: Process ID.
- USER: Process owner.
- PR (Priority): The actual priority, as scheduled by the kernel.
- NI (Nice): The “nice” value. This is a user-controllable priority modifier.
  - Ranges from -20 (highest priority) to +19 (lowest priority).
  - Normal processes are 0.
- VIRT: Virtual Memory (same as VSZ in ps). Ignore it.
- RES: Resident Memory (same as RSS in ps). Look at this one.
- SHR: Shared Memory.
- S: Process Status (R, S, D, Z, T).
- %CPU: Real-time CPU usage. This is the column to find CPU hogs.
- %MEM: Real-time memory usage (based on RES).
- TIME+: Total CPU time, with high precision.
- COMMAND: The command.
top Interactive Commands
top is not just for viewing.
- q: Quit.
- P (Shift+p): Sort by Processor (%CPU). This is the default.
- M (Shift+m): Sort by Memory (%MEM). This is the #1 way to find RAM hogs.
- T (Shift+t): Sort by Time (TIME+).
- k: Kill a process.
  1. Press k.
  2. top will ask: PID to kill:.
  3. Type the PID of the rogue process and hit Enter.
  4. top will ask: Send signal (15):.
  5. Just hit Enter to send the default, safe SIGTERM (15). (We’ll cover this in the kill section below.)
- r: Renice a process (change its priority).
  1. Press r.
  2. top will ask: PID to renice:.
  3. Type the PID and hit Enter.
  4. top will ask: Renice value:.
  5. Type 10 to make it lower priority (nicer). Hit Enter.
- s: Change the update speed.
- 1: (The number 1) Toggle “summary” mode. This shows you the CPU usage for every single CPU core individually. Essential for multi-core systems.
htop — The Interactive, Modern Upgrade
top is powerful, but it’s old and clunky. htop is the modern, user-friendly, colorful, and vastly superior alternative.
If top is a DOS program, htop is a modern GUI. It is not installed by default, but it should be the first thing you install on any new server.
# On Ubuntu/Debian
sudo apt install htop
# On Red Hat/Fedora/CentOS (on RHEL and CentOS, htop comes from the EPEL repository)
sudo dnf install htop
Just type htop and see the magic.
[Figure: htop UI]
Why htop is Better:
- Color: It’s colorful and easy to read.
- Visual Graphs: It shows you beautiful text-based graphs for every CPU core, plus your Memory and Swap.
- Mouse Support: You can click on a process to select it.
- Scrolling: You can scroll up and down the process list with your arrow keys or mouse wheel.
- No Magic Keys: All the “interactive commands” are listed at the bottom as F-keys (F1-F10).
  - F9 (Kill): Select a process, press F9. A menu pops up showing you all the signals (SIGTERM, SIGKILL, etc.). Just select one and press Enter. It’s 1000x more intuitive.
  - F7 / F8 (Nice): Press F7 to lower the nice value, which raises the priority (only root can go below 0). Press F8 to raise the nice value, which lowers the priority (be “nicer”).
  - F4 (Filter): Want to find “firefox”? Press F4, type “firefox,” and the list is instantly filtered. This replaces the clunky ps | grep combo.
  - F5 (Tree): Press F5 to see the process list as a pstree-style family tree.
Rule: Learn top because it’s on every system. But use htop every day.
nice and renice — Controlling Priority
We saw the NI (Nice) column. How do we control it? You use the nice and renice commands.
- nice: Starts a new command with a specific priority.
- renice: Changes the priority of an already running process.
Remember the scale: -20 (highest priority) to +19 (lowest priority). Only root can set a negative (higher) priority.
Example: nice
You need to run a massive, CPU-intensive data-processing script, but you don’t want it to slow down the whole server. You can start it with a low priority (a “nice” value of 15).
# The -n flag sets the nice value
nice -n 15 ./my_big_script.py
This script will now run, but whenever the CPU is contended the scheduler will favor other, more important tasks (like the web server or your SSH connection).
Example: renice
You forgot to use nice, and my_big_script.py is now running and hogging the CPU. top tells you its PID is 1245. You can “renice” it while it’s running.
# renice [priority] [PID]
renice 10 1245
# Output: 1245 (process) old priority 0, new priority 10
You’ve just fixed the problem without killing the process!
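Here is a self-contained way to try both commands without touching a real workload. It is a sketch that assumes Linux with /proc mounted, and it uses a harmless sleep as the stand-in for the heavy script. nice_of is a hypothetical helper that reads the nice value from /proc.

```shell
#!/bin/sh
# Start a low-priority process, then lower its priority further while it runs.
# Assumes Linux with /proc mounted; nice_of is an illustrative helper.

# Field 19 of /proc/<pid>/stat is the nice value (the same number that
# "ps -o ni= -p <pid>" prints). Plain field splitting is safe here only
# because the command name ("sleep") contains no spaces.
nice_of() {
    awk '{ print $19 }' "/proc/$1/stat"
}

nice -n 15 sleep 30 &
pid=$!
sleep 1                          # give the child time to exec
before=$(nice_of "$pid")         # 15 if this shell itself runs at nice 0

renice 19 -p "$pid" >/dev/null   # any user may make a process nicer
after=$(nice_of "$pid")

echo "nice value before: $before, after: $after"

kill "$pid"
wait "$pid" 2>/dev/null || true
```

Going the other way (back toward 0 or below) generally requires root, which is why it pays to pick a sensible value the first time.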
jobs, fg, bg — Foreground & Background
This is the other side of process management: controlling processes in your own terminal.
- A foreground process is one that is attached to your terminal. It has your “focus.” When you run vim, it’s a foreground process. You can’t type new commands until it exits.
- A background process is one that is “detached” from your terminal’s input. It’s running, but it’s not “in your way.”
You can move processes between these two states.
1. Start a long-running process, like a test script:
   ./run_tests.sh
2. Oh no, it’s going to take 10 minutes. You need to do something else.
3. Press Ctrl+Z.
   Output: [1]+ Stopped ./run_tests.sh
   You have just sent a SIGTSTP (“terminal stop”) signal to the process. It is paused, frozen in memory.
4. You’re back at your prompt. How do you see your stopped jobs?
   jobs
   # Output: [1]+ Stopped ./run_tests.sh
5. You have two choices:
   - Resume in the Background (bg): You want it to keep running, but in the background.
     bg %1
     # Output: [1]+ ./run_tests.sh &
     The %1 refers to “job 1”. The process is now running again, but you have your terminal back.
   - Resume in the Foreground (fg): You’re ready to bring it back to the front.
     fg %1
     The process is back, and you’re “inside” it again.
Pro-Tip: You can start a process in the background from the start by adding an ampersand (&) to the end of the command.
# This starts the script and immediately gives you your prompt back
./run_tests.sh &
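Inside a script, jobs, fg, and bg are usually not available, because job control is an interactive-shell feature. The scripting equivalent is to remember the PID from $! and collect the result with wait. A minimal sketch, with two throwaway subshells standing in for real tasks:

```shell
#!/bin/sh
# Run two tasks in the background and collect their exit codes.
# The subshells below are placeholders for real long-running commands.

(sleep 1; exit 0) &
first=$!

(sleep 1; exit 3) &
second=$!

echo "started background PIDs $first and $second"

# wait blocks until the given PID finishes and returns its exit status.
wait "$first"
first_status=$?
wait "$second"
second_status=$?

echo "first task exited with $first_status"
echo "second task exited with $second_status"
```

Both tasks run at the same time, so the whole script takes about one second rather than two.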
kill — The Art of Sending Signals
This is the most misunderstood, and most powerful, part of the toolkit.
kill does not mean “kill”. kill means “send a signal”.
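A “signal” is a short message the kernel delivers to a process. With two exceptions (SIGKILL and SIGSTOP, both covered below), the process can choose what to do when it gets a signal.
You can list every signal your system supports by typing: kill -l (on Linux the numbers run up to 64).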
You only need to know about 5 of them.
Signal 1: SIGHUP (1) — The “Reload” Signal
- Signal: SIGHUP (Signal 1)
- Command: kill -HUP [PID] or kill -1 [PID]
- What it does: “Hang Up.” In the old days, this meant the user “hung up” their terminal. Now, by convention, it’s a standard, polite way to tell a daemon to “reload your configuration file.” (A program that does not handle SIGHUP is simply terminated by it, so check the daemon’s documentation first.)
- Use Case: You just edited /etc/nginx/nginx.conf. You don’t need to restart the web server (which would drop connections). You just tell it to reload the config.
    # Find the nginx master process (-o picks the oldest match; the workers are its children)
    pgrep -o nginx
    # Output: 1234
    # Tell it to reload
    sudo kill -HUP 1234
Signal 2: SIGINT (2) — The “Interrupt” Signal
- Signal: SIGINT (Signal 2)
- Command: kill -INT [PID] or kill -2 [PID]
- What it does: “Interrupt.” This is the signal that is sent when you press Ctrl+C.
- Use Case: It’s a polite “please stop what you’re doing and quit.” Most programs “catch” this signal and run a cleanup routine (close files, delete temp files) before exiting.
Signal 3: SIGTERM (15) — The “Polite” Kill
- Signal: SIGTERM (Signal 15)
- Command: kill [PID] (This is the default signal)
- What it does: “Terminate.” This is the standard, generic, “please shut down” signal. It’s not Ctrl+C. It’s a more formal request.
- Use Case: This is your first, default kill command.
    # Find the rogue process PID from 'top'
    kill 1245
  This gives the process a chance to shut down gracefully. It might take a second or two. This is the “right” way to kill a process.
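What does “shut down gracefully” look like from the process’s side? In a shell script, the trap builtin installs a signal handler. The sketch below starts a tiny worker that catches SIGTERM and writes to a marker file before exiting. The worker and the marker file are invented for this illustration.

```shell
#!/bin/sh
# Show a process catching SIGTERM and cleaning up before it exits.
# The worker script and the marker file are made up for this illustration.

marker=$(mktemp)

# The worker installs a SIGTERM handler, then loops until told to stop.
# "$1" inside the worker is the marker file path passed on its command line.
sh -c '
    trap "echo cleaned-up > \"$1\"; exit 0" TERM
    while :; do sleep 1; done
' worker "$marker" &
pid=$!

sleep 1          # let the worker install its handler
kill "$pid"      # plain kill sends SIGTERM
wait "$pid"
status=$?

result=$(cat "$marker")
rm -f "$marker"
echo "worker exit status: $status, marker file says: $result"
```

The handler gets to run because SIGTERM is a request. The shell finishes its current sleep, runs the trap, and exits with status 0 on its own terms.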
Signal 4: SIGSTOP (19) / SIGCONT (18) — The “Pause/Resume”
- Signal: SIGSTOP (19) / SIGCONT (18)
- Command: kill -STOP [PID] and kill -CONT [PID]
- What it does: This is the “pause/resume” pair that job control is built on. Ctrl+Z sends SIGTSTP (20), a catchable cousin of SIGSTOP; fg and bg resume a job by sending SIGCONT.
  - SIGSTOP is an unblockable “pause” button. The process must freeze.
  - SIGCONT tells it to “continue.”
- Use Case: You can pause a process without Ctrl+Z. kill -STOP 1234 will freeze process 1234. kill -CONT 1234 will unfreeze it.
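You can watch the pause button work by checking the process state after each signal. A small sketch, assuming Linux with /proc mounted, using a throwaway sleep as the target. state_of is an illustrative helper name.

```shell
#!/bin/sh
# Pause and resume a background process, checking its state after each step.
# Assumes Linux with /proc mounted; state_of is an illustrative helper.

# Print the one-letter state (R, S, D, T, Z) of the process given as $1.
state_of() {
    while read -r key value _; do
        if [ "$key" = "State:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
}

sleep 30 &
pid=$!
sleep 1

kill -STOP "$pid"
sleep 1
paused=$(state_of "$pid")

kill -CONT "$pid"
sleep 1
resumed=$(state_of "$pid")

echo "after SIGSTOP: $paused, after SIGCONT: $resumed"

kill "$pid"
wait "$pid" 2>/dev/null || true
```

After SIGSTOP the state is T, the same letter you saw for the stopped vim in the ps aux example.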
kill -9 — The Unstoppable Force (and Why It’s a Last Resort)
This is the one everyone knows. And it’s the one you should use least.
Signal 5: SIGKILL (9) — The “Assassination”
- Signal: SIGKILL (Signal 9)
- Command: kill -KILL [PID] or kill -9 [PID]
- What it does: This is not a request. It’s a demand. This signal goes straight to the kernel, not the process. The kernel does not ask the process to shut down. It immediately removes the process from the scheduler, deallocates its memory, and kills it.
- What this means: The process has ZERO chance to clean up.
  - It cannot save its work.
  - It cannot delete its temporary files.
  - It cannot close database connections.
kill -9 is the “rip the power cord out of the wall” of process management.
Why is this bad? You might leave behind corrupted data. You might leave behind lock files that prevent the application from starting again. You might leave open connections that clog the database.
When should you use kill -9? ONLY after you have already tried a polite kill [PID] (SIGTERM) and waited 5-10 seconds, and the process still hasn’t died.
The Professional Sysadmin’s Workflow:
1. top (or htop) to find the PID of the rogue process.
2. kill 1245 (Send SIGTERM).
3. Wait 10 seconds. Watch top.
4. …If it’s still there…
5. kill -9 1245 (Send SIGKILL).
6. Investigate why it got stuck and check for any collateral damage (like lock files).
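The workflow above can be wrapped in a small function. This is a sketch under stated assumptions, not a hardened tool: it assumes Linux with /proc mounted, and stop_gracefully and is_alive are names invented here. The demo at the bottom uses one well-behaved process and one that deliberately ignores SIGTERM.

```shell
#!/bin/sh
# Send SIGTERM, wait for a grace period, and escalate to SIGKILL only if
# the process is still there. Assumes Linux with /proc mounted;
# stop_gracefully and is_alive are illustrative names.

# Succeed if the process exists and is not a zombie.
# (kill -0 alone also succeeds for a dead-but-unreaped child.)
is_alive() {
    kill -0 "$1" 2>/dev/null || return 1
    state=$(sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "/proc/$1/status" 2>/dev/null)
    [ "$state" != "Z" ]
}

# Usage: stop_gracefully PID [GRACE_SECONDS]
# Returns 0 if SIGTERM was enough, 1 if SIGKILL was needed.
stop_gracefully() {
    target=$1
    remaining=${2:-10}
    kill -TERM "$target" 2>/dev/null || return 0   # already gone
    while [ "$remaining" -gt 0 ]; do
        is_alive "$target" || return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    is_alive "$target" || return 0
    echo "PID $target ignored SIGTERM, sending SIGKILL" >&2
    kill -KILL "$target" 2>/dev/null
    return 1
}

# Demo: a polite process and a stubborn one that ignores SIGTERM.
sleep 30 &
polite=$!
sh -c 'trap "" TERM; while :; do sleep 1; done' &
stubborn=$!
sleep 1

stop_gracefully "$polite" 3
polite_result=$?
stop_gracefully "$stubborn" 2
stubborn_result=$?

wait "$polite" 2>/dev/null
polite_status=$?      # 128 + 15: terminated by SIGTERM
wait "$stubborn" 2>/dev/null
stubborn_status=$?    # 128 + 9: terminated by SIGKILL

echo "polite:   needed SIGKILL? $polite_result (exit status $polite_status)"
echo "stubborn: needed SIGKILL? $stubborn_result (exit status $stubborn_status)"
```

The function covers steps 2 through 5. Step 6, working out why the process got stuck, is still your job.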
The Modern Toolkit: killall, pgrep, and pkill
It’s often a pain to type ps aux | grep "firefox" just to find a PID. The “p” suite of commands solves this.
killall
This does exactly what it says. It kills all processes matching a name.
# This will find ALL processes named "firefox" and send SIGTERM
killall firefox
This is convenient, but dangerous. What if you accidentally kill a system process?
pgrep (Process Grep)
This is the safe version. It just finds the PIDs for you.
pgrep firefox
# Output: 1234
# 1236
# 1239
It’s just the PIDs. No other junk. This is perfect for scripting.
pkill (Process Kill)
This is the modern killall. It combines pgrep and kill. It finds processes matching a pattern and sends a signal to them.
# Same as 'killall firefox', but more modern
pkill firefox
Why pkill is a superpower:
- It’s more than just a name. By default, it matches the pattern against the process name (as a regular expression, so pkill fire also hits firefox). But you can make it search the full command line.
    # Kill a specific script, not just "python"
    pkill -f "python my_bad_script.py"
- You can kill by user. This is amazing.
    # Log out a misbehaving user by killing all their processes
    pkill -u username
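The difference between matching the name and matching the full command line trips people up, so here is a rough illustration. It assumes Linux with /proc mounted, where each process exposes a short name in /proc/<pid>/comm and its full arguments in /proc/<pid>/cmdline. match_name and match_cmdline are stand-ins written for this article, not the real pgrep implementation.

```shell
#!/bin/sh
# Illustrate name matching (like pgrep PATTERN) versus full command line
# matching (like pgrep -f PATTERN). Assumes Linux with /proc mounted.
# match_name and match_cmdline are illustrative stand-ins, not real pgrep.

# Print PIDs whose short process name matches the pattern.
match_name() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        [ "$pid" = "$$" ] && continue          # skip this script itself
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        if printf '%s\n' "$name" | grep -Eq -- "$1"; then
            echo "$pid"
        fi
    done
}

# Print PIDs whose full command line matches the pattern.
match_cmdline() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        [ "$pid" = "$$" ] && continue          # skip this script itself
        line=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline") || continue
        if printf '%s\n' "$line" | grep -Eq -- "$1"; then
            echo "$pid"
        fi
    done
}

sleep 31 &
target=$!
sleep 1    # give the child time to exec

by_short_name=$(match_name '^sleep$')
by_name=$(match_name 'sleep 31')
by_cmdline=$(match_cmdline 'sleep 31')

echo "target PID:                 $target"
echo "name matches '^sleep\$':     $(echo $by_short_name)"
echo "name matches 'sleep 31':    $(echo $by_name)"
echo "cmdline matches 'sleep 31': $(echo $by_cmdline)"

kill "$target"
wait "$target" 2>/dev/null || true
```

The pattern “sleep 31” never matches by name, because the name is just “sleep”. It takes the full command line to tell one sleep apart from another, which is the job of the -f flag.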
You Are Now the Process Master
You made it. You have gone from “what is a process” to “how to safely manage a server’s entire workload.”
This is the power of Linux. You are not shielded from the system; you are given a complete toolkit to understand and control it.
Let’s recap your new toolkit:
- The Concept: A process is a “program in execution” with a PID and PPID. It can be R (Running), S (Sleeping), D (Disk Wait), T (Stopped), or Z (Zombie).
- The Snapshot: ps aux (BSD, see RSS memory) and ps -ef (SysV, see PPID parent) give you a static picture.
- The Dashboard: top is the real-time, universal tool. htop is the modern, superior, interactive upgrade you should install immediately.
- The Governor: nice and renice control a process’s priority (NI) so you can make CPU hogs “play nice” with others.
- The Terminal: Ctrl+Z (SIGTSTP), jobs, bg, and fg let you manage your own terminal’s foreground and background tasks.
- The Scalpel: kill is for sending signals.
  - kill [PID] (SIGTERM 15) is the polite first request.
  - kill -HUP [PID] (1) is for reloading configs.
  - kill -9 [PID] (SIGKILL 9) is the impolite last resort, the “assassination” that can cause data corruption.
- The Modern Kit: pgrep finds PIDs. pkill kills processes by name, user, or full command string, and is the modern successor to killall.
You’re no longer the person who pulls the plug. You’re the sysadmin who logs in, runs htop, sorts by %CPU, finds the rogue process, renices it to 19, and then sends a polite SIGTERM to shut it down, all without breaking a sweat.
Go forth and manage.