Processes In Linux - How Linux Works Internally? Explained in Detail
This is the fourth blog in the Linux Masterclass Series.
Processes are programs running on your Linux system, managed by the Kernel. Each Process has a PID (Process ID) that the Kernel uses to identify it. If we execute the command ps, the terminal shows us four columns.
PID - Process ID
TTY – ‘Controlling Terminal’ of the process. Most Processes we run are associated with a terminal, and the name of that terminal appears in the TTY column.
TIME – Total CPU time the Process has consumed so far.
CMD – CMD (Command) is the name of the executable command/program the Process is running.
If you open another terminal window while the first one is still open and run the ps command there, the TTY value will be different, for example pts/1 instead of pts/0.
ps aux command
In the ps aux command, a, u, and x are the flags of the ‘ps’ command.
- a – Processes of all users, not only your own
- u – User-oriented output with more details about each process
- x – Also include processes with no associated terminal (TTY), such as those running in the background.
If you run the ps aux command, it prints a much larger output with more rows and columns.
- USER – Every Process has an owner.
- PID – Process ID.
- %CPU – Percentage of CPU time a Process is taking.
- %MEM – Percentage of Memory a process takes out of the entire physical memory of the system.
- VSZ – Virtual Memory Size in KiB. It is the total virtual address space the Process has mapped, including memory that is swapped out or reserved but never touched. It is not a fixed limit and can grow or shrink while the Process runs.
- RSS – Resident Set Size in KiB. It is the part of the Process's memory that currently sits in physical RAM.
- TTY – Earlier, we saw the TTY column only consisted of values like ‘pts/0’ or ‘pts/1’, but now it also has ‘?’, ‘tty2’, and ‘tty3’ as the values. ‘?’ means no terminal associated. We’ll understand the other values below in the Controlling Terminal section.
- STAT – The process state code. The letters are explained in the States of a Process section below.
- START – The time (or date, for older processes) the process started.
- TIME – Total CPU time the process has consumed so far.
- COMMAND – The command that started the process, with its arguments.
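To compare VSZ and RSS for a single process, ps can be asked for just those columns. The lines below are a minimal sketch, assuming the procps version of ps that ships with most Linux distributions. They inspect the current shell, whose PID the shell exposes as $$.

```shell
# Read VSZ and RSS (both in KiB) for the current shell.
# "-o vsz=" selects a single column; the trailing "=" suppresses the header.
vsz=$(( $(ps -o vsz= -p "$$") ))
rss=$(( $(ps -o rss= -p "$$") ))
echo "shell $$: VSZ=${vsz} KiB, RSS=${rss} KiB"

# The same columns in table form, next to the state and the command name.
ps -o pid,vsz,rss,stat,comm -p "$$"
```

RSS should come out smaller than VSZ, because only part of the mapped address space is resident in RAM at any moment.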
top command
If you run the top command, it shows a real-time version of the ps aux data and much more. Here the values keep fluctuating based on what happens in the system. Press q to get back to the terminal.
The top command helps us find out which processes might be slowing down or affecting the system in any other way.
Controlling Terminal (terminal and pseudo-terminal)
Earlier in the TTY section of ps aux, we saw two types of values.
tty1, tty2 or tty3 – Regular Terminal
pts/0 or pts/1 – Pseudo Terminal
A regular Terminal (also called a virtual console) is a text-only terminal provided by the Kernel itself. It is only a shell where you use the keyboard to type commands, without any GUI elements. Press ‘Ctrl+Alt+F3’ to access tty3, and press ‘Alt+left arrow’ to switch back to the previous terminal. A Pseudo Terminal is a terminal emulated by a program, such as a terminal window on the desktop or an SSH session.
Background processes that are not associated with any terminal are known as Daemon Processes. Many of them start when the system boots up and keep running until it shuts down, providing services such as logging, networking, and scheduling. We do not want them to stop when a terminal closes, which is why they have no controlling terminal.
Most of the processes in the ps aux output with ‘?’ under the TTY column are such daemons (the names in square brackets are Kernel threads). Because no terminal controls them, accidentally closing a terminal window cannot take them down.
Processes are an instance of a Program
Open three terminals and execute the cat command in two of them with or without an argument. In the third one, execute ps aux | grep cat, which pipes the output of ps aux to the grep command. grep searches its input for ‘cat’ and prints every line that contains it.
Notice that there are two instances of cat, one running in pts/1 and the other in pts/2. Although both terminals execute the same command or program, we still have two instances. If you close the other two terminals, there will be no instances of cat running in pseudo-terminals anymore. This example shows that ‘processes’ are instances of a running program. We can have many instances of the same program running in our system at once.
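The same experiment can be repeated from a single terminal. This is a small sketch that uses sleep instead of cat (an arbitrary choice, so that nothing waits for keyboard input) and assumes the procps version of ps.

```shell
# Start the same program twice in the background.
sleep 30 &
first=$!
sleep 30 &
second=$!

# One program, two processes: each instance gets its own PID.
echo "first instance:  PID $first"
echo "second instance: PID $second"
ps -o pid,tty,comm -p "$first,$second"

# Clean up both instances.
kill "$first" "$second"
```

Both rows show the same command name but different PIDs, which is exactly what "two instances of one program" means.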
Who manages the Processes? And how?
By managing, I mean who allocates memory, CPU time etc. for a process to use and makes the decisions about them. The answer is the ‘Kernel’. The Kernel manages all the processes in the system. It loads the code of a program into memory, allocates the resources that the process needs, and keeps all of them in check. It knows the status, the owner, and the resources allocated to and consumed by a process.
When a process ends, the Kernel frees up the resources the process was using and makes them available for other processes to use.
Now let us come to the How part. A new process is always created by an existing process using the ‘Fork System Call’. For example, when you type a command, your shell forks itself. The Fork System Call creates a nearly identical child process (or we can say, forks it), and the Kernel assigns a new Process ID (PID) to it. The original process that this child process was cloned or forked from becomes the Parent Process.
Parent & Child Processes
A Child Process can keep running the same ‘Program’ as its parent process or, in most cases, it can use the ‘execve system call’ to replace that program with a new one.
If you run the command ps -l, it shows a more detailed view of the ps command output where there is a new column called PPID (Parent Process ID).
The ps command has its own ‘PID’ of 13087 and a ‘PPID’ of 11530. This PPID of 11530 is the same as PID 11530 of the bash command. So, we can say the ‘ps’ command process is the child process of the ‘bash’ command. But you can see the ‘bash’ command also has a PPID, which means it also has a parent process. We can find it out by running ps aux | grep 11522 where 11522 is the PPID of bash.
We can see that bash is the child process of /usr/libexec/gnome-terminal-server.
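The parent and child relationship can also be read straight from the shell, which keeps its own PID in $$ and its parent's PID in $PPID. The following is a rough sketch, again assuming the procps version of ps; the PIDs and the parent's name will differ on every system.

```shell
# $$ is the PID of the current shell, $PPID is the PID of its parent.
echo "this shell: PID $$, PPID $PPID"

# A command started from the shell becomes a child process.
# Here the child sh prints its own PID and the PID of the process that started it.
sh -c 'echo "child sh:   PID $$, PPID $PPID"'

# Walk one step up the tree: look up the name of this shell's parent.
parent_name=$(ps -o comm= -p "$PPID" || echo unknown)
echo "parent of this shell: $parent_name (PID $PPID)"
```

Typed at an interactive prompt, the PPID printed by the child matches the PID of the shell, just like ps and bash in the example above.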
Mother Process
Now, you may wonder if every process is the child of its parent process, then there must be a process above all other processes, the Mother process. This Mother Process is known as ‘init’. When the system boots up, the Kernel creates the mother process ‘init’ with the PID of 1. On most modern distributions, the program running as PID 1 is systemd. ‘init’ runs with root privileges, with the ‘root’ user as the owner, and starts most of the processes responsible for running the system.
By running the ps aux command, the first line shows the ‘init’ process with the PID of 1 and owner as ‘root’. Many of the processes below it, with TTY as ‘?’, are the Daemon Processes that keep the system running. They are children or further descendants of the ‘init’ mother process. The entries shown in square brackets are Kernel threads, which descend from the kthreadd process (PID 2) instead.
Termination of a Process
A Process terminates itself using the ‘_exit system call’. The Kernel then frees up all the resources the process was using. When a process terminates, it lets the Kernel know why it is terminating with a ‘Termination Status’. The most common Termination Status is ‘0’, which means ‘the process succeeded’.
When a child process terminates, its parent process has to acknowledge it with the ‘wait system call’, which also hands the child's termination status to the parent.
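The shell offers the same mechanism through its wait builtin. Here is a minimal sketch in which a child exits with the arbitrary status 3 and the parent shell collects it.

```shell
# Run a child in the background that terminates with status 3.
sh -c 'exit 3' &
child=$!

# wait blocks until the child has terminated and returns its termination status.
status=0
wait "$child" || status=$?
echo "child $child terminated with status $status"
```

Once wait has returned, the child has been acknowledged and nothing of it remains in the process table.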
There are different use cases when it comes to the termination of a process. Let us understand them.
Orphan Process
In this case, the parent process terminates before its children do. These Child Processes are known as Orphan Processes. They no longer have a Parent Process that can use the ‘wait system call’ to acknowledge their termination. In such a case, the Kernel puts the Orphan Processes under the control of the ‘init’ mother process, which becomes their new parent.
Zombie Process
In this case, the parent process exists but, for some reason, has not yet acknowledged the termination of the child process with the ‘wait system call’. Until it does, the Kernel keeps such a Child Process around as a ‘Zombie Process’. It is living dead, meaning it has terminated and its resources are freed, but its entry in the process table still exists so that the parent can read its termination status.
When the Parent process finally calls the ‘wait system call’, the zombie disappears. This is known as ‘reaping’. If the parent terminates without ever reaping, the ‘init’ process adopts the zombie process and calls the ‘wait system call’ so the zombie process can rest in peace.
Signal
A Signal is a notification sent to a Process that something has happened. When a user kills or suspends a process, the action takes effect by delivering a signal to that process. The Kernel also uses signals to notify a Process when some hardware or software issue has occurred.
There are many things a Process can do with a signal it receives. It can ignore the signal, let the default action happen, or catch it and run its own signal handler routine. Every signal has a default action (for most signals it is to terminate the process) that applies when the process has not installed a handler.
For example, a program is running and receives a termination signal saying it has to stop working and terminate itself. Its signal handler can save the collected data and then terminate the process. There are different types of signals:
- SIGHUP/HUP/1 – Hang up
- SIGINT/INT/2 – Interrupt
- SIGKILL/KILL/9 – Kill. No process can catch, block, or ignore this signal.
- SIGSEGV/SEGV/11 – Segmentation Fault
- SIGTERM/TERM/15 – Terminate
- SIGSTOP/STOP/19 – Stop. Like SIGKILL, it cannot be caught, blocked, or ignored.
If you run the command kill 15345 where 15345 is the PID, it sends SIGTERM, the default signal of kill, and asks the process to terminate. The process can clean up first and then call the ‘exit system call’. But let us say the process ignores the request and you want to kill it forcefully.
You can run the command kill -9 15345 where 9 is the numerical representation of the KILL signal. It destroys the process immediately. The Kernel still frees the memory and other resources the process held, but the process gets no chance to run its own cleanup, such as saving data or deleting temporary files.
SIGHUP Signal is sent when the terminal on which the process was running is closed. In simple words, it tells the Process that its terminal is gone and it no longer has a medium to work with, so it can hang up.
SIGINT Signal is sent to interrupt the Process when we press Ctrl+C while the command is running. By default, it terminates the Process.
SIGSEGV Signal is sent when the process tries to access an area of memory it is not allowed to access, for example when a buggy program follows an invalid pointer. This is a segmentation fault. It is not the same as a file permission problem: if a normal user runs cat /etc/shadow, the command only fails with a ‘Permission denied’ error and no signal is involved.
SIGTERM Signal asks the Process to terminate. Unlike SIGKILL, the Process can catch it and do its cleanup, like saving data, before it exits.
SIGSTOP Signal only suspends a process. It stays stopped until it receives the SIGCONT signal, after which it resumes.
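The difference between SIGTERM and SIGKILL is easy to observe with the shell's trap builtin, which installs a signal handler. The sketch below is illustrative only: the worker loop and its messages are made up for the demonstration, and the one-second pause is simply a generous guess at how long the worker needs to install its handler.

```shell
# A worker that installs a handler for SIGTERM and then loops forever.
sh -c 'trap "echo worker: cleaning up; exit 0" TERM; while :; do sleep 0.2; done' &
worker=$!
sleep 1                      # give the worker time to install its handler

# Plain kill sends SIGTERM: the handler runs and the worker exits cleanly.
kill "$worker"
term_status=0
wait "$worker" || term_status=$?
echo "after SIGTERM the worker exited with status $term_status"

# SIGKILL cannot be caught, so the process never gets a chance to clean up.
sleep 30 &
victim=$!
kill -9 "$victim"
kill_status=0
wait "$victim" || kill_status=$?
echo "after SIGKILL the status is $kill_status (128 + signal number 9)"
```

The shell reports a process killed by a signal as 128 plus the signal number, which is why the second status is 137 rather than a value chosen by the process itself.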
CPU Time
We generally run multiple programs in our system simultaneously, like Chrome, Terminals, VSCode etc. It may seem like all of them are running at the same moment, but that is not quite the case. Each process uses the CPU for a small amount of time, known as a ‘Time Slice’.
All the Processes use a short amount of CPU time (a time slice of a few milliseconds) one by one in a cycle. Once a process has used its time slice, it is paused and waits while other processes take their turn on the CPU.
This is where the number of cores in the CPU plays a role. A CPU can be dual-core (2), quad-core (4), octa-core (8) etc. The more cores a CPU has, the more Processes it can run simultaneously. For example, if your CPU has four cores and runs 100 Processes, each core handles roughly 25 of them on average.
The Kernel handles this switching of CPU time among the Processes. If the processes themselves had this control, each would tend to keep a disproportionate amount of CPU time. We have a way to influence the amount of CPU time a process can have with a value known as ‘nice’.
nice and renice commands
All the Processes running in the system have a number that determines their priority when the Kernel hands out CPU time.
When you run the command top, there is a column ‘NI’ which contains values like 0, -20, 19, and 5. The value ranges from -20 to 19. The higher the value, the nicer the process is.
High NI Value -> Low Priority
Low NI Value -> High Priority
A process with a Low NI Value is not very nice to the other processes: it gets high priority and a larger share of CPU time. On the other hand, a process with a High NI Value is very nice: it has low priority and lets other processes use the CPU before it.
You can set the NI Value of a new process with the command nice. For example, if you want the CPU to keep a program on High Priority and give it a lot more time so it runs faster, you can start it with the command nice -n -20 [command]. Only root can set a negative value.
For the processes already running on the system, you can use the command renice like renice -n -10 -p [PID] where PID is of the already running process whose NI value you want to change.
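A quick way to watch the NI value change is to let nice report it. This sketch assumes the GNU coreutils version of nice, which prints the current niceness when it is run without arguments, and it only raises the value, since lowering it needs root.

```shell
# Niceness of the current shell (0 unless it was started with a different value).
base=$(nice)

# Start a child with its niceness raised by 5; the child here is nice itself,
# which simply prints the value it is running with.
raised=$(nice -n 5 nice)

echo "niceness of this shell: $base"
echo "niceness of a child started with 'nice -n 5': $raised"

# The NI column of ps shows the same number that top displays.
ps -o pid,ni,comm -p "$$"
```

Note that the number given to nice -n is added to the current niceness, so a shell that already runs at 5 would start this child at 10.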
States of a Process
- R State – Running or Runnable. The process is running or is waiting for the CPU to give it time.
- S State – Interruptible Sleep. The process is waiting for some event, such as input from the user, and a signal can wake it up.
- D State – Uninterruptible Sleep. The process is waiting, usually for disk or other I/O, and does not react to signals until the wait is over. A process stuck in this state cannot be killed, and sometimes the only way to get rid of it is to reboot your system.
- Z State – Zombie State. The process has terminated and is waiting for its parent to collect its termination status with the ‘wait system call’. We understood this in the above ‘Termination of a Process’ section.
- T State – Stopped State. The process has been suspended, for example by the SIGSTOP signal or by pressing Ctrl+Z.
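The state letters can be observed by stopping and resuming a process by hand. The following is a small sketch, assuming the procps version of ps; sleep is just a convenient long-running command, and the short pauses give the Kernel time to update the state.

```shell
# Start a process that spends its life sleeping.
sleep 30 &
pid=$!
sleep 0.2
before=$(ps -o stat= -p "$pid" | tr -d ' ')

# SIGSTOP suspends it: the state changes to T.
kill -s STOP "$pid"
sleep 0.2
stopped=$(ps -o stat= -p "$pid" | tr -d ' ')

echo "state while waiting: $before"
echo "state after SIGSTOP: $stopped"

# SIGCONT resumes the process, then a normal kill terminates it.
kill -s CONT "$pid"
kill "$pid"
```

The first state is normally S, because sleep is waiting for a timer, and the second one starts with T.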
How everything in Linux is a file - the ‘/proc’ directory
In the first blog of this Linux Masterclass series, we talked about how everything, even the commands, processes, and directories, are all files in the Linux system.
‘/proc’ is the directory that contains information about all the processes in the system. If you run the command ls /proc, it will show a long list of entries.
The numbered subdirectories of this /proc directory in the output above are the processes running in the system. Their names are their respective PIDs (Process IDs). These subdirectories contain all the information related to these specific processes.
If you run ls /proc/1, the contents of the first subdirectory, which belongs to the ‘init’ mother process, will show up.
Three of the results we are not permitted to see as they belong to the root. All these results are information about the process in the form of files.
The /proc directory is a window into the Kernel. It is a virtual filesystem: nothing in it is stored on disk. The Kernel already tracks every process in its own internal data structures, and it generates the files and subdirectories under /proc on the fly whenever a program reads them.
The Kernel itself does not need /proc to decide which process gets the next CPU Time Slice or how to allocate resources. Instead, /proc is how tools such as ps and top, and we as users, get to see what the Kernel knows.
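Because the entries under /proc are ordinary readable files, the usual text tools work on them. Here is a minimal sketch that reads a few fields of the current shell's own entry; the selection of fields is arbitrary.

```shell
# Every process has a directory named after its PID; $$ is this shell's PID.
grep -E '^(Name|State|Pid|PPid):' "/proc/$$/status"

# The parent PID recorded by the Kernel matches the shell's own $PPID variable.
proc_ppid=$(awk '/^PPid:/ {print $2}' "/proc/$$/status")
echo "PPid from /proc: $proc_ppid, \$PPID from the shell: $PPID"

# cmdline holds the command and its arguments, separated by NUL bytes.
tr '\0' ' ' < "/proc/$$/cmdline"
echo
```

This is the same information ps prints, just without the formatting.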
Jobs
There can be a situation when you run a process in the terminal, and it takes a lot of time to execute, and until it finishes, you cannot run any other command in the same terminal. This is where the concept of ‘Jobs’ comes in. If you write & at the end of any command, like python3 test.py &, it will run in the background and the terminal will give you a Job ID and a Process ID.
As you can see, the terminal has printed the Job ID ( [1] ) and the PID (21266) of the process now running in the background. We can view all the jobs running in the background with the command jobs. It shows the exact command (process), the status as ‘Running’ and the Job ID as 1.
Now, if we write another command like cat &, it will show the Job ID as 2 with the + sign next to it. The + marks the current (most recent) job and the – marks the previous one, so the python command's sign has changed from + to –. If you write another cat &, its Job ID will be 3 and its sign will be +, while the previous command's sign has changed to – and the python one has gone blank.
To bring a job from the background to the foreground, you can run the command fg %1 where 1 is the Job ID of the Process.
Let us say I run the command python3 test.py and sometime later decide it is taking a lot of time, so I should send it to the background. I can press Ctrl+Z to stop it first. Remember, Ctrl+C terminates a process, but Ctrl+Z only stops (suspends) it. If we now write the command bg, it will send it to the background and make it run once again.
If I want to kill this command midway, one way is to run fg %5 where 5 is the Job ID, bring it to the foreground and press Ctrl+C. The other way to kill the process without bringing it to the foreground is by running the command kill %5 where 5 is the Job ID.
Thanks for reading :)
To read the previous blogs of this Linux Series, go to blog.gauranggaur.com