Processes In Linux - How Linux Works Internally? Explained in Detail

This is the fourth blog in the Linux Masterclass Series.

Processes are programs running on your Linux system, managed by the Kernel. Each Process has a PID (Process ID), which the Kernel uses to identify it. If we execute the command ps, we get output like this:

[Screenshot: output of the ps command]

The terminal shows us four columns.

  • PID - Process ID

  • TTY – The ‘Controlling Terminal’ of the process. Most processes we run are associated with a terminal, and the TTY column shows the name of that terminal (for example, pts/0).

  • TIME – Total CPU time the process has used so far.

  • CMD – The name of the command/program the process is running.

If you open another terminal window while the first one is still open and run ps there, the TTY column shows a different terminal. For example, it might show pts/1 instead of pts/0.

[Screenshot: ps run in a second terminal window]

ps aux command

In the ps aux command, a, u, and x are the flags of the ‘ps’ command.

  • a – All the processes, even the ones run by other users
  • u – User-oriented format, with more details about each process (owner, CPU and memory usage, start time, and so on)
  • x – All the processes, even the ones running in the background with no associated terminal (TTY).

If you run the ps aux command, you get a much larger output with more rows and columns.

[Screenshot: output of ps aux]

  • USER – The owner of the process (every process has one).
  • PID – Process ID.
  • %CPU – Percentage of CPU time the process is using.
  • %MEM – Percentage of the system's physical memory (RAM) the process is using.
  • VSZ – Virtual memory size in kB. This is the total virtual memory the process has mapped, including memory that is not currently in RAM, such as parts of shared libraries it has not touched yet. It is not a fixed limit: it grows and shrinks as the process allocates and frees memory.
  • RSS – Resident set size in kB. This is the part of the process's memory that is currently held in physical RAM, and it is usually much smaller than VSZ.
  • TTY – Earlier, the TTY column only had values like ‘pts/0’ or ‘pts/1’, but now it also shows ‘?’, ‘tty2’, and ‘tty3’. ‘?’ means no terminal is associated with the process. We’ll look at the other values in the Controlling Terminal section below.
  • STAT – The process state (covered in the States of a Process section below).
  • START – The time the process started (for older processes, the date).
  • TIME – Total CPU time the process has used so far.
  • COMMAND – The command, including its arguments, that started the process.
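
The following minimal sketch in Python (Linux only) shows how VSZ and RSS relate. It assumes you can read the same numbers that ps reports from the VmSize and VmRSS lines of /proc/<pid>/status, both in kB. The function names memory_usage_kb and touch_memory are hypothetical and were chosen for this example.

```python
def memory_usage_kb(pid: str = "self") -> tuple[int, int]:
    """Return (VSZ, RSS) of a process in kB, read from /proc/<pid>/status."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.split()  # e.g. "VmRSS:   9876 kB" -> ["9876", "kB"]
    return int(fields["VmSize"][0]), int(fields["VmRSS"][0])


def touch_memory(mib: int) -> bytearray:
    """Allocate `mib` MiB and write to every page, so the pages must be in RAM."""
    buf = bytearray(mib * 1024 * 1024)
    for i in range(0, len(buf), 4096):  # pages are 4 KiB or larger, so this hits each one
        buf[i] = 1
    return buf


vsz, rss = memory_usage_kb()
print(f"before: VSZ={vsz} kB, RSS={rss} kB")
buf = touch_memory(50)
vsz, rss = memory_usage_kb()
print(f"after:  VSZ={vsz} kB, RSS={rss} kB")
```

VSZ counts the 50 MiB buffer as soon as it is allocated. RSS only grows as the pages are actually written and brought into RAM.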

top command

The top command shows a real-time version of the ps aux data, and much more. The values keep changing as things happen in the system. Press q to get back to the terminal.

[Screenshot: output of top]

The top command helps us find out which processes might be slowing down the system or affecting it in other ways.

Controlling Terminal (terminal and pseudo-terminal)

Earlier, in the TTY column of ps aux, we saw two types of values.

tty1, tty2, or tty3 – Regular terminal

pts/0 or pts/1 – Pseudo-terminal

A regular terminal (also called a virtual console) has no graphical interface. It gives you a text-only login and shell, where you type commands with the keyboard. Press Ctrl+Alt+F3 to switch to tty3, and press Alt+Left Arrow to go back. A pseudo-terminal (pts), on the other hand, is emulated by software, such as a terminal window on your desktop or an SSH session.

Processes that are not associated with any terminal are known as Daemon Processes. Most of them start when the system boots up and keep running until it shuts down. Daemons run in the background and provide services that keep the system working properly. We do not want them to stop just because a terminal was closed or a user logged out. Closing a terminal sends a hang-up signal to the processes attached to it, which is why daemons are not associated with a controlling terminal.

In the ps aux output, every process with ‘?’ under the TTY column has no controlling terminal. Many of them are daemons or kernel threads that the system relies on. If they were attached to a terminal, accidentally closing that terminal could stop them and break parts of the system.

Processes are instances of a Program

Open three terminals and run the cat command without an argument in two of them. Without a file argument, cat keeps waiting for input, so it stays running. In the third terminal, run ps aux | grep cat. This pipes the output of ps aux to the grep command, which searches its input for ‘cat’ and prints every line that contains it.

[Screenshot: output of ps aux | grep cat]

Notice that there are two instances of cat, one running in pts/1 and the other in pts/2. Both terminals are running the same program, yet we have two separate processes, each with its own PID. If you close the two terminals running cat, there will be no instances of cat running in pseudo-terminals anymore. This example shows that processes are instances of a running program, and we can run as many instances of a program as the system's resources allow.
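
You can sketch the same experiment in Python (Linux only; start_instances is a hypothetical helper name). The sketch starts two cat processes with their input connected to a pipe, so they keep running. It then reads each one's name from /proc/<PID>/comm, showing the same program under two different PIDs.

```python
import subprocess


def start_instances(program: str, count: int) -> list[subprocess.Popen]:
    """Start `count` separate processes running the same program."""
    # stdin=PIPE keeps cat waiting for input, like cat in a terminal.
    return [subprocess.Popen([program], stdin=subprocess.PIPE) for _ in range(count)]


procs = start_instances("cat", 2)
for p in procs:
    with open(f"/proc/{p.pid}/comm") as f:  # program name recorded by the Kernel
        print(p.pid, f.read().strip())
for p in procs:
    p.stdin.close()  # end of input: cat exits
    p.wait()
```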

Who manages the Processes? And how?

By managing, I mean who reserves and allocates memory, CPU time, and other resources for a process to use, and who makes decisions about them. The answer is the ‘Kernel’. The Kernel manages all the processes in the system. It loads the code of a program into memory, allocates the resources the process needs, and keeps all of them in check. It knows the status and the owner of every process, and the resources allocated to and consumed by it.

When a process ends, the Kernel frees up the resources the process was using and makes them available to other processes.

Now let us come to the How part. A new process is created when an existing process clones itself using the ‘fork system call’. For example, when you run a command, your shell forks. fork creates a nearly identical copy of the calling process, called the child process, and the Kernel assigns it a new Process ID (PID). The original process that the child was cloned, or forked, from becomes its Parent Process.
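
Python exposes the fork system call as os.fork, so the idea fits in a few lines (Linux/Unix only; fork_demo is a hypothetical name). fork returns twice: 0 in the child, and the child's PID in the parent. In this sketch, the child sends its own PID and PPID back through a pipe so the parent can confirm the relationship.

```python
import os


def fork_demo() -> tuple[int, int, int]:
    """Fork once; return (PID returned to parent, child's own PID, child's PPID)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # returns 0 in the child and the child's PID in the parent
    if pid == 0:
        # Child: an almost identical copy of the parent, but with a new PID.
        os.close(read_fd)
        os.write(write_fd, f"{os.getpid()} {os.getppid()}".encode())
        os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as f:
        child_pid, child_ppid = map(int, f.read().split())
    os.waitpid(pid, 0)  # the parent collects the child's exit status
    return pid, child_pid, child_ppid


pid, child_pid, child_ppid = fork_demo()
print(f"parent {os.getpid()} forked child {child_pid} (fork returned {pid}); "
      f"child's PPID is {child_ppid}")
```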

Parent & Child Processes

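A Child Process can keep running the same program as its parent. In most cases, though, it calls the ‘execve system call’, which replaces the program it is running with a new one while keeping the same PID. This is how a shell runs a command: bash forks, and the child calls execve to become, for example, ps. If you run the command ps -l, it shows a more detailed view of the ps output, which includes a new column called PPID (Parent Process ID).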

[Screenshot: output of ps -l]

The ps process has its own PID of 13087 and a PPID of 11530. This PPID matches PID 11530 of the bash process, so we can say the ps process is a child process of bash. But bash also has a PPID, which means it has a parent process too. We can find it by running ps aux | grep 11522, where 11522 is the PPID of bash.

[Screenshot: output of ps aux | grep 11522]

We can see that bash is the child process of /usr/libexec/gnome-terminal-server.
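
You can sketch the fork-then-exec pattern that bash follows with os.fork and os.execvp. os.execvp searches PATH for the program and then calls execve under the hood. This is a simplified illustration, not how bash is actually implemented, and fork_then_exec is a hypothetical name. The child replaces itself with sh, which prints its own PID ($$) and its parent's PID ($PPID).

```python
import os


def fork_then_exec() -> tuple[int, int, int, int]:
    """Return (child PID, our PID, PID reported by sh, PPID reported by sh)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send stdout into the pipe, then replace this program with sh.
        os.dup2(write_fd, 1)
        os.close(read_fd)
        os.close(write_fd)
        try:
            os.execvp("sh", ["sh", "-c", 'echo "$$ $PPID"'])  # $$ = PID, $PPID = parent PID
        finally:
            os._exit(127)  # only reached if exec failed
    os.close(write_fd)
    with os.fdopen(read_fd) as f:
        sh_pid, sh_ppid = map(int, f.read().split())
    os.waitpid(pid, 0)
    return pid, os.getpid(), sh_pid, sh_ppid


child, me, sh_pid, sh_ppid = fork_then_exec()
print(child == sh_pid, me == sh_ppid)  # True True: exec kept the PID, only the program changed
```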

Mother Process

Now, you may wonder: if every process is the child of a parent process, there must be a process above all the others, the Mother Process. This Mother Process is known as ‘init’. When the system boots up, the Kernel starts ‘init’ with the PID of 1. ‘init’ runs with root privileges, with the ‘root’ user as its owner, and it starts most of the processes that keep the system running. On many modern distributions, init is systemd, and ps may show it as /sbin/init, which is a link to systemd.

If you run the ps aux command, the first line shows the ‘init’ process with the PID of 1 and ‘root’ as the owner. Many of the processes below it with ‘?’ in the TTY column are daemons that init started, so their PPID is 1. Others, whose names appear in [square brackets], are kernel threads. These are children of kthreadd (PID 2), not of init.

Termination of a Process

A process terminates by calling the ‘_exit system call’, usually through the C library's exit function. The Kernel then frees up all the resources the process was using. When a process terminates, it tells the Kernel why with a ‘Termination Status’, also called the exit status. By convention, ‘0’ means ‘the process succeeded’, and any non-zero value means some kind of error.

When a child process terminates, its parent process is expected to collect the child's termination status with the ‘wait system call’ or one of its variants, such as waitpid.
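
In a shell, echo $? prints the termination status of the last command. Here is a minimal Python sketch of the same exchange (exit_status_of_child is a hypothetical name). The child terminates with os._exit, which wraps the _exit system call, and the parent collects the status with os.waitpid.

```python
import os


def exit_status_of_child(code: int) -> int:
    """Fork a child that exits with `code`; return the status the parent collects."""
    pid = os.fork()
    if pid == 0:
        os._exit(code)              # child: terminate with this exit status
    _, status = os.waitpid(pid, 0)  # parent: the wait system call
    if not os.WIFEXITED(status):
        raise RuntimeError("child did not exit normally")
    return os.WEXITSTATUS(status)


print(exit_status_of_child(0))  # 0: success
print(exit_status_of_child(3))  # 3: non-zero, i.e. some kind of failure
```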

There are different use cases when it comes to the termination of a process. Let us understand them.

Orphan Process

In this case, the parent process terminates before its children. These children are known as Orphan Processes. They no longer have a parent that can use the ‘wait system call’ to collect their termination status. So the Kernel re-parents orphan processes to the ‘init’ mother process, which collects their status when they terminate. On some systems, a designated ‘subreaper’ process, such as a per-user systemd instance, adopts them instead.
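
Here is a Python sketch of an orphan (Linux only; orphan_demo is a hypothetical name). A middle process forks a child and exits immediately, so the child is orphaned and re-parented. The new PPID is usually 1. In a desktop session or a container, however, it may be a subreaper or the container's own PID 1, so the sketch reports the values instead of assuming 1.

```python
import os
import time


def orphan_demo() -> tuple[int, int]:
    """Return (PPID before, PPID after) for a child whose parent exits first."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Middle process: create a child, then exit without waiting for it.
        parent = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until its original parent is gone.
            while os.getppid() == parent:
                time.sleep(0.01)
            os.write(write_fd, f"{parent} {os.getppid()}".encode())
            os._exit(0)
        os._exit(0)  # the parent dies first, leaving an orphan behind
    os.close(write_fd)
    os.waitpid(middle, 0)  # reap the middle process
    with os.fdopen(read_fd) as f:  # EOF arrives once the grandchild exits
        before, after = map(int, f.read().split())
    return before, after


before, after = orphan_demo()
print(f"PPID was {before}; after the parent exited it is {after}")
```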

Zombie Process

In this case, the parent process exists, but for some reason it does not call the ‘wait system call’ to collect the child's termination status. The Kernel cannot completely remove a terminated child until its status is collected, so it keeps a small entry for it in the process table. Such a child is a ‘Zombie Process’. It is the living dead: it has terminated, but it still exists because its termination status has not been collected.

If the parent process calls the ‘wait system call’ after the child has turned into a zombie, the zombie is removed. This is known as ‘reaping’. If the parent never reaps it, the zombie remains until the parent itself terminates. The zombie then becomes an orphan, and the ‘init’ process adopts it. init calls the ‘wait system call’ to collect its termination status, so the zombie process can finally rest in peace.
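
You can observe the zombie state directly (Linux only; proc_state and zombie_demo are hypothetical helper names). In this sketch, the parent forks a child that exits at once and deliberately delays calling wait. It then reads the child's state from /proc/<PID>/stat, where Z means zombie. After os.waitpid reaps the child, its /proc entry disappears.

```python
import os
import time


def proc_state(pid: int) -> str:
    """Return the one-letter state (R, S, D, Z, T, ...) from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 is "(name)" and may contain spaces; the state follows the last ")".
    return data[data.rindex(")") + 2]


def zombie_demo() -> tuple[str, int, bool]:
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: terminate immediately with status 7
    # Parent: deliberately do NOT call wait yet.
    deadline = time.monotonic() + 2
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    state = proc_state(pid)         # "Z": terminated but not reaped
    _, status = os.waitpid(pid, 0)  # reaping: collect the termination status
    gone = not os.path.exists(f"/proc/{pid}")  # the zombie's entry is removed
    return state, os.WEXITSTATUS(status), gone


print(zombie_demo())
```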

Signal

A Signal is a notification sent to a process that something has happened. When a user kills or suspends a process, the Kernel delivers that request to the process as a signal. Signals are also used when a hardware or software event occurs that the Kernel wants the process to know about.

A process can do different things with a signal it receives, depending on how it is configured. It can let the default action happen (for many signals, the default is to terminate the process), ignore the signal, or catch it and run its own signal handler routine. SIGKILL and SIGSTOP are exceptions: they can be neither caught nor ignored.

For example, suppose a running program receives a termination signal telling it to stop. Its signal handler can save the collected data before the process exits. There are different types of signals:

  • SIGHUP/HUP/1 – Hang up
  • SIGINT/INT/2 – Interrupt
  • SIGKILL/KILL/9 – Kill. No signal handler of any process can catch, block, or ignore this signal.
  • SIGSEGV/SEGV/11 – Segmentation Fault
  • SIGTERM/TERM/15 – Terminate
  • SIGSTOP/STOP/19 – Stop. Like SIGKILL, it cannot be caught or ignored.
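
Here is a Python sketch of the signal handler idea from the example above (graceful_shutdown_demo and can_install_handler are hypothetical names). The first function installs a SIGTERM handler and sends SIGTERM to its own process, just as kill <PID> would. The second shows that the Kernel refuses a handler for SIGKILL and SIGSTOP, which Python reports as an OSError.

```python
import os
import signal
import time


def graceful_shutdown_demo() -> list[str]:
    events: list[str] = []

    def on_sigterm(signum, frame):
        # A real program would save its data and close files here, then exit.
        events.append(f"caught {signal.Signals(signum).name}: saving data")

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    os.kill(os.getpid(), signal.SIGTERM)  # same as running `kill <our PID>`
    deadline = time.monotonic() + 2
    while not events and time.monotonic() < deadline:
        time.sleep(0.01)  # Python runs handlers between bytecodes; give it a moment
    signal.signal(signal.SIGTERM, previous)
    return events


def can_install_handler(sig: signal.Signals) -> bool:
    """Try to install a handler for `sig`; the Kernel refuses for SIGKILL and SIGSTOP."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except OSError:
        return False
    signal.signal(sig, previous)  # put the old handler back
    return True


print(graceful_shutdown_demo())
for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGKILL, signal.SIGSTOP):
    print(sig.name, "can be caught:", can_install_handler(sig))
```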

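If you run the command kill 15345, where 15345 is the PID, kill sends the process SIGTERM, its default signal, asking it to terminate. The process can catch SIGTERM, clean up (for example, save its data), and then exit. After that, the Kernel frees up all the resources it was using. But let us say the process ignores SIGTERM or is stuck, and you want to kill it no matter what.

You can run the command kill -9 15345, where 9 is the numerical representation of the KILL signal. The Kernel destroys the process immediately, without giving it a chance to run any cleanup code. The Kernel still frees the memory and other resources the process owned. However, the process's own cleanup, such as saving data or removing temporary files, never happens.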
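
You can reproduce the difference between kill and kill -9 with Python's subprocess module. send_and_wait is a hypothetical helper that uses sleep 30 as a stand-in for a long-running process. It can optionally start the process with SIGTERM ignored, to simulate a process that does not respond to a plain kill. subprocess reports a process killed by signal N with the return code -N.

```python
import signal
import subprocess
from typing import Optional


def send_and_wait(sig: int, ignore_sigterm: bool = False) -> Optional[int]:
    """Start `sleep 30`, send it `sig`, and return its return code (None if it survived)."""
    # An ignored signal stays ignored across exec, so this simulates a stubborn process.
    setup = (lambda: signal.signal(signal.SIGTERM, signal.SIG_IGN)) if ignore_sigterm else None
    proc = subprocess.Popen(["sleep", "30"], preexec_fn=setup)
    proc.send_signal(sig)  # like `kill -<sig> <PID>`
    try:
        return proc.wait(timeout=1)  # -N means "terminated by signal N"
    except subprocess.TimeoutExpired:
        return None  # the signal had no effect
    finally:
        if proc.poll() is None:  # clean up a survivor with SIGKILL
            proc.kill()
            proc.wait()


print(send_and_wait(signal.SIGTERM))                       # -15
print(send_and_wait(signal.SIGKILL))                       # -9
print(send_and_wait(signal.SIGTERM, ignore_sigterm=True))  # None: SIGTERM was ignored
print(send_and_wait(signal.SIGKILL, ignore_sigterm=True))  # -9: SIGKILL cannot be ignored
```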

SIGHUP is sent when the terminal the process was running on is closed. In simple words, it tells the process that its terminal, the medium it was working through, is gone, so it can hang up.

SIGINT is sent to the process running in the foreground when we press Ctrl+C, asking it to interrupt. By default, it terminates the process.

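SIGSEGV is sent when a process tries to access an area of memory it is not allowed to access, for example when a C program dereferences a NULL or invalid pointer. By default, the process is terminated and the shell reports a ‘Segmentation fault’. This is different from trying to read a file you do not have permission for. Running cat /etc/shadow as a normal user does not cause a segmentation fault: /etc/shadow is not readable by normal users, so cat simply reports ‘Permission denied’.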

SIGTERM asks the process to terminate. Unlike SIGKILL, it can be caught, so the process gets a chance to clean up (for example, save data or remove temporary files) before exiting.

SIGSTOP suspends a process until it receives the SIGCONT signal, after which the process resumes. Like SIGKILL, it cannot be caught or ignored.

CPU Time

We generally run several programs on our system at the same time, like Chrome, terminals, VSCode, etc. It may seem like they all run simultaneously, but a single CPU core can run only one process at any instant. Instead, the processes take turns using the CPU, each for a short amount of time known as a ‘Time Slice’.


The processes use the CPU one after another, each for a short time slice measured in milliseconds. When a process's time slice runs out, or when it has to wait for something such as input, the Kernel switches the CPU to another process.

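This is where the number of cores in the CPU plays a role. A CPU can be dual-core (2), quad-core (4), octa-core (8), etc. Each core runs one process at a time, so the more cores a CPU has, the more processes can truly run simultaneously. For example, if your CPU has four cores and 100 processes are running, each core handles roughly 25 of them. In practice, the Kernel balances the load between the cores dynamically, and most processes are sleeping (waiting for something) rather than waiting for the CPU.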

The Kernel handles this switching of CPU time among the processes. Left alone, a CPU-hungry process would take as much CPU time as it could get. We have a way to influence how much CPU time a process gets, with a value known as ‘nice’.

nice and renice commands

Every process running in the system has a number that determines its priority, that is, how big a share of CPU time the Kernel should give it.

When you run the command top, there is a column ‘NI’ that shows each process's nice value. It ranges from -20 to 19 and is 0 by default. The higher the value, the nicer the process is.

High NI Value -> Low Priority

Low NI Value -> High Priority

A process with a low NI value is not very nice to other processes: the Kernel keeps it on high priority and gives it a larger share of CPU time. On the other hand, a process with a high NI value is very nice: it lets other processes go first, so it gets a smaller share of CPU time when the CPU is busy.

You can set the NI value when starting a program with the command nice. For example, if you want a program to run at high priority and get a larger share of CPU time, you can start it with sudo nice -n -10 [command]. The NI value can only be between -20 and 19, and values below 0 require root privileges.

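For processes that are already running, use the command renice, for example sudo renice -10 -p [PID], where PID is the ID of the running process whose NI value you want to change. Lowering an NI value requires root privileges, but any user can raise the NI value of their own processes.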
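
Python exposes the same mechanism through os.nice, os.getpriority, and os.setpriority, which behave like the nice and renice commands. The sketch below (nice_demo is a hypothetical name) starts sleep 30 with its NI value raised by 5, then raises it further while the process runs. It only increases NI values, so it does not need root.

```python
import os
import subprocess


def nice_demo() -> tuple[int, int, int]:
    """Return (our NI, NI of a child started with +5, NI after renicing it)."""
    base = os.getpriority(os.PRIO_PROCESS, 0)  # 0 means "this process"
    # Like `nice -n 5 sleep 30`: the child adds 5 to its NI value before exec'ing sleep.
    child = subprocess.Popen(["sleep", "30"], preexec_fn=lambda: os.nice(5))
    try:
        started = os.getpriority(os.PRIO_PROCESS, child.pid)
        # Like renice on a running process. Raising the NI value needs no root;
        # lowering it below the current value normally raises PermissionError.
        os.setpriority(os.PRIO_PROCESS, child.pid, min(base + 10, 19))
        reniced = os.getpriority(os.PRIO_PROCESS, child.pid)
    finally:
        child.kill()
        child.wait()
    return base, started, reniced


print(nice_demo())  # typically (0, 5, 10) when started from a normal shell
```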

States of a Process

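  • R State – Running or Runnable. The process is either running or waiting for the CPU to give it a time slice.
  • S State – Interruptible Sleep. The process is waiting for some event, such as input from the user, and can be woken up by a signal.
  • D State – Uninterruptible Sleep. The process is waiting for something, usually disk or network I/O, and cannot be interrupted by signals until that operation completes. This state is normally very short. If a process stays stuck in it (for example, because of a failing disk or an unreachable network file system), rebooting the system may be the only way to get rid of it.
  • Z State – Zombie State. The process has terminated, and its termination status is waiting to be collected by its parent with the ‘wait system call’. We covered this in the ‘Termination of a Process’ section above.
  • T State – Stopped State. The process has been suspended, for example by SIGSTOP or Ctrl+Z.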
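
These letters appear in the STAT column of ps aux, and they come from the /proc/<PID>/stat file. The Python sketch below (Linux only; proc_state, wait_for_state, and states_demo are hypothetical names) moves a sleep process from S to T and back to S using SIGSTOP and SIGCONT. This is similar to how Ctrl+Z and bg work in a shell, which send SIGTSTP and SIGCONT.

```python
import os
import signal
import subprocess
import time


def proc_state(pid: int) -> str:
    """One-letter state from /proc/<pid>/stat (the letter after the last ')')."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    return data[data.rindex(")") + 2]


def wait_for_state(pid: int, wanted: str, timeout: float = 2.0) -> str:
    """Poll until the process reaches `wanted` (signals are delivered asynchronously)."""
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != wanted and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    return state


def states_demo() -> list[str]:
    seen = [proc_state(os.getpid())]  # R: this process is running right now
    child = subprocess.Popen(["sleep", "30"])
    try:
        seen.append(wait_for_state(child.pid, "S"))  # S: sleep waits on a timer
        os.kill(child.pid, signal.SIGSTOP)
        seen.append(wait_for_state(child.pid, "T"))  # T: stopped
        os.kill(child.pid, signal.SIGCONT)
        seen.append(wait_for_state(child.pid, "S"))  # sleeping again
    finally:
        child.kill()
        child.wait()
    return seen


print(states_demo())
```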

How everything in Linux is a file - the ‘/proc’ directory

In the first blog of this Linux Masterclass series, we talked about how everything in Linux, even commands, processes, and directories, is a file.

‘/proc’ is the directory that contains information about all the processes in the system. If you run the command ls /proc, it shows a long list of entries.

[Screenshot: output of ls /proc]

Each numbered subdirectory of /proc in the output above represents a process running in the system, and its name is that process's PID (Process ID). These subdirectories contain all the information related to their specific processes. The other entries, such as cpuinfo and meminfo, hold information about the system as a whole.

If you run ls /proc/1, you will see the contents of the subdirectory for PID 1, which belongs to the ‘init’ mother process.

[Screenshot: output of ls /proc/1]

We are not permitted to see three of the results because they belong to root. All of these files hold information about this one process, for example its status (status) and the command that started it (cmdline).

The /proc directory is not stored on disk. It is a virtual filesystem (procfs): the Kernel generates its files on the fly, from the information it keeps in memory about every process, whenever someone reads them. In other words, /proc is a window through which we can see what the Kernel knows about the system.

This is where tools like ps and top get their data. Much of what the Kernel tracks about a process, such as its state, the CPU time it has used, its memory usage, and its parent, can be read from these files.
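
To see how ps builds its output from /proc, here is a tiny, hypothetical mini_ps in Python (Linux only). It lists each PID together with its PPID and name, read from the /proc/<PID>/status files, and skips processes that exit while it is reading them.

```python
import os


def read_status(pid: str) -> dict[str, str]:
    """Parse /proc/<pid>/status into a {field: value} dictionary."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


def mini_ps() -> list[tuple[int, int, str]]:
    """Return (PID, PPID, name) for every visible process, sorted by PID."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():  # skip cpuinfo, meminfo, self, ...
            continue
        try:
            status = read_status(entry)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or is hidden) while we were looking
        rows.append((int(entry), int(status["PPid"]), status["Name"]))
    return sorted(rows)


print(f"{'PID':>7} {'PPID':>7} NAME")
for pid, ppid, name in mini_ps()[:10]:
    print(f"{pid:>7} {ppid:>7} {name}")
```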

Jobs

Sometimes a process you run in the terminal takes a long time to finish, and until it finishes, you cannot run any other command in that terminal. This is where the concept of ‘Jobs’ comes in. If you add & at the end of a command, like python3 test.py &, the shell runs it in the background and prints its Job ID and Process ID, so you can keep using the terminal.

[Screenshot: running python3 test.py & and then jobs]

As you can see, the terminal has printed the Job ID ([1]) and the PID (21266) of this background process. We can view all the jobs running in the background with the command jobs. It shows the exact command, its status (‘Running’), and its Job ID (1).

Now, if we run another command like cat &, it gets Job ID 2 with a + sign next to it. The + sign marks the current (most recent) job, which fg and bg act on by default, and the – sign marks the previous job. So the python job's sign changes from + to –. If you run another cat &, it gets Job ID 3 and the + sign, the previous cat job gets the – sign, and the python job no longer has a sign.

[Screenshot: jobs output with three background jobs]

To bring a job from the background to the foreground, you can run the command fg %1 where 1 is the Job ID of the Process.

Let us say I run the command python3 test.py and later decide it is taking too long, so I want to send it to the background. I can press Ctrl+Z to stop (suspend) it first. Remember, Ctrl+C terminates a process, but Ctrl+Z only stops it. If we now run the command bg, the shell resumes the job in the background.

[Screenshot: suspending python3 test.py with Ctrl+Z and resuming it with bg]

If I want to kill this command midway, one way is to run fg %5 where 5 is the Job ID, bring it to the foreground and press Ctrl+C. The other way to kill the process without bringing it to the foreground is by running the command kill %5 where 5 is the Job ID.

Thanks for reading :)

To read the previous blogs of this Linux Series, go to blog.gauranggaur.com

To see the video lecture of this blog, go to this youtube video

Follow me on Twitter

Do comment your thoughts, questions or anything unique you learned below!

Did you find this article valuable?

Support Kshitij Sharma by becoming a sponsor. Any amount is appreciated!