Filesystem, Partitions & Inodes in Linux – A Deep Dive

This is the fifth blog in the Linux Masterclass Series.

/dev Directory (Device Files)

When you connect any device to your system, it needs a device driver to function. Programs interact with that driver through device files, which live in the /dev directory. You can view them by running the command ls /dev or, for a detailed view, ls -l /dev.

Screenshot from 2022-11-10 10-43-32.png

You may notice that the output of this command differs from the usual ls -l output for a regular directory like /home. Let us go through the unusual columns in the above screenshot.

Permissions

A permission string is made up of 10 characters. We know from the previous blog of this Linux Masterclass Series that the first character gives the type: ‘d’ represents a directory, ‘-’ represents a regular file, and ‘l’ represents a link. That blog also covers the ‘rwx (read-write-execute)’ and ‘ugo (user-group-others)’ dynamic in great detail.

But here we can see more types: ‘c’ represents a character device, ‘b’ represents a block device, ‘s’ represents a socket, and ‘p’ represents a pipe.

c -> Character Device

This device transfers data one character at a time. A character is one byte (8 bits), so a character device moves data byte by byte as a stream rather than in blocks. Keyboards and terminals are typical examples.

b -> Block Device

This device transfers data in large fixed-size blocks. Storage devices like hard disks (HDD) and SSDs are block devices.

p -> Pipe Device

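A pipe allows two processes to communicate by taking the output of one and passing it as input to the other. The | symbol in the shell creates an anonymous pipe between two commands; the ‘p’ entries you see in a listing are named pipes (FIFOs) that exist as files.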

s -> Socket Device

This device also facilitates communication between processes like the pipe device but can communicate with multiple processes simultaneously.
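To see the type character without reading it off a screenshot, here is a minimal Python sketch. It relies on the standard library's stat.filemode, which builds the same 10-character string that ls -l prints; the helper name mode_string and the file name notes.txt are made up for this example.

```python
import os
import stat
import tempfile


def mode_string(path):
    """Return the 10-character type-and-permission string, as ls -l shows it."""
    # lstat() does not follow symbolic links, so a link reports itself as 'l'.
    return stat.filemode(os.lstat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        regular = os.path.join(tmp, "notes.txt")  # hypothetical file name
        open(regular, "w").close()
        for path in (tmp, regular, "/dev/null"):
            if os.path.exists(path):
                print(mode_string(path), path)
```

The first character of each printed string is the type: d for the directory, - for the regular file, and c for /dev/null on a typical Linux system.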

Number of Hard Links

This column tells us how many hard links point to this file. Hard links are discussed in much greater detail in the last section of this blog.

Owner

Group

Major Device Number, Minor Device Number

Some of the entries have only one number in this column, and some have two separated by a comma. For device files, the two numbers are the major and minor device numbers, and this split is used for ‘Device Characterization’.

The Major Device Number identifies the device driver in use. For example, in the first line of the above output screenshot, the autofs device uses device driver 10. And in the third line, the btrfs-control device uses the same one.

The Minor Device Number tells the kernel which specific device it is. The kernel uses this number to differentiate between devices that share the same driver.

Screenshot from 2022-11-10 11-29-17.png

For example, in the above screenshot, the system has two NVMe drives, nvme0n1 and nvme1n1 (one runs Windows, the other runs Linux). Both have the same Major Device Number, 259, because they use the same driver. But as you notice, the Minor Device Numbers are all different: 0, 1, 2, 3, 4, 5, 6, and 7. The kernel uses these unique minor device numbers to identify each entry. In case you are wondering what p1, p2, p3, and p4 are, they are the partitions: nvme0n1 has 4, and nvme1n1 has 2.
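If you would rather read the two numbers from a program than from ls -l, the sketch below uses Python's os.major and os.minor. The helper names split_device_number and device_numbers are invented here, and the example only runs on Unix-like systems.

```python
import os


def split_device_number(dev):
    """Split a combined device number into (major, minor)."""
    return os.major(dev), os.minor(dev)


def device_numbers(path):
    """Return (major, minor) for a device file such as /dev/null."""
    # For device files, st_rdev holds the number of the device they represent.
    return split_device_number(os.stat(path).st_rdev)


if __name__ == "__main__":
    if os.path.exists("/dev/null"):
        print("/dev/null ->", device_numbers("/dev/null"))
```

The kernel stores both numbers packed into one value; os.makedev(259, 1) builds such a value, and split_device_number takes it apart again.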

Timestamp

Device Name

Some devices in the above list are not physically connected to the system/motherboard. They are known as ‘Pseudo Devices’. Examples are /dev/null, /dev/zero, and /dev/random. Anything you write to the ‘null’ device vanishes forever and never returns; it is like the black hole of your computer system. The ‘zero’ device returns an endless stream of zero bytes when you read from it. The ‘random’ device is used to generate random numbers. All three pseudo devices are character devices and transfer data one character at a time.
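Since these pseudo devices are just files, a program can open them like any other file. Here is a small Python sketch of that idea; the function names are made up for the example, and it assumes a Linux system where these three paths exist.

```python
def read_zero(count):
    """Read `count` bytes from /dev/zero; every byte that comes back is 0."""
    with open("/dev/zero", "rb") as dev:
        return dev.read(count)


def discard(data):
    """Write bytes to /dev/null and return how many bytes were accepted."""
    with open("/dev/null", "wb") as dev:
        return dev.write(data)


def read_random(count):
    """Read up to `count` random bytes from /dev/random."""
    # buffering=0 asks the device only for the bytes we actually want.
    with open("/dev/random", "rb", buffering=0) as dev:
        return dev.read(count)
```

Calling discard(b"gone") reports 4 bytes written, yet nothing is stored anywhere, which is exactly the black hole behaviour described above.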

/sys Directory

The /sys directory gives detailed information about the devices behind the /dev directory. The two are related but have some major differences. While the /dev directory allows programs to access devices, the /sys directory lets you view information about devices and manage them. It exposes device information like the model, state, hierarchy, manufacturer, where the device is plugged into the system, etc.

Screenshot from 2022-11-10 12-06-23.png

The entries inside /sys/dev/block are named after the Major and Minor Device Numbers, separated by a colon. The first number is the Major, the second is the Minor. They are shown in light blue because they are links. If you run ls -l /sys/dev/block/7:0, you can see where the link points and find more information about the device with those numbers.

Screenshot from 2022-11-10 12-10-50.png

Filesystem (fs) Hierarchy

If you run the command ls -l /, where / represents the root directory, it will show a detailed list of everything in the root folder. The whole tree of the filesystem hierarchy in a Linux system starts from this root (/) directory.

Screenshot from 2022-11-10 13-51-42.png

This list also includes /sys and /dev, which we covered above. Let us understand the other important directories and files in this root folder.

  • bin directory -> It contains essential, ready-to-run binaries for commands like mkdir, ls, cat, and many more.

  • boot directory -> It contains the files needed to boot the system, such as the bootloader files and the kernel image.

  • dev directory -> It contains the device files

  • etc directory -> It is the core system config directory. It contains the system-wide configuration files of the programs you run. Most programs have configurations/settings that you can edit to change how they work.

  • home directory -> It contains the personal directories of regular users, one per user (for example, /home/username). This is where your general-purpose directories like Desktop, Documents, Downloads, and Pictures live, and it is the location where the terminal opens by default.

  • lib directory -> It contains the library files that the binaries might use. Binaries of commands often depend on shared libraries to execute their code, much like a web project depends on external packages.

  • media directory -> It is the attachment point for removable media. If you connect a USB drive to the system, this is where it gets mounted and accessed.

  • mnt directory -> It is a mount point for temporarily mounted filesystems. When you mount a filesystem by hand for a short while, this is the conventional place to attach it.

  • opt directory -> It contains optional software packages. For example, if you are using Google Chrome, you will find its name inside it if you run the command ls /opt

  • proc directory -> It contains information about the running processes.

  • root directory -> It is like a home directory for the root user just like ‘/home’ is for a regular user.

  • run directory -> It contains information about the running system since boot. From the moment you start your system, whatever information the system collects gets stored in this /run directory.

  • sbin directory -> It contains the system binaries. The earlier binaries directory was for commands that any user could run, while the system binaries are meant for system administration and usually need root privileges to do their work.

  • sys directory -> It exposes information about devices and helps in managing them.

  • tmp directory -> It is the storage for temporary files.

  • usr directory -> It contains most of the installed software, utilities, and libraries used by the users of the system.

  • var directory -> var stands for Variable. System logs, user tracking, caches, and many other constantly changing things are maintained in this variable directory.

Filesystem (fs) Types

There are different implementations of filesystems. Some are faster than others, and some support larger storage. Different filesystems have different ways of structuring their data on disk, so the structure is not the same on every system. It might be different based on the requirements.

Let us say we have an app that reads and writes files. The app should not have to know which filesystem it is running on or how that filesystem structures its data. This is handled by an abstraction layer between the app and the filesystem, known as the ‘Virtual File System’. The app makes its file operations through the Virtual File System, and it is the job of the Virtual File System to carry them out on the underlying filesystem, no matter its structure.

ls -la etc  less (3) (1).png

You can have multiple filesystems on your machine, depending on how you have created the partitions. We saw in the above section that there were two drives, nvme0n1 and nvme1n1, one for Windows and the other for Linux.

Journaling

Let us say you have a large file and are copying it from one place to another. Since this is a large file, the copy will take a long time to finish. Suppose the battery runs out or your system shuts down for some other reason, and the large file you were copying gets corrupted. Because of this, there will be an inconsistency in the filesystem. When you restart your system, it will perform some filesystem checks to make sure everything is okay.

You might have noticed that a computer that shut down unexpectedly takes a long time to boot back up. Filesystem checks are one of the reasons why.

A process known as ‘Journaling’ helps you avoid such situations. It is part of most filesystems in use these days. With Journaling, before your system begins an operation such as copying a file, it writes down what it is about to do in a log. This log is known as a ‘Journal’. Once the task is done, it is marked as ‘completed’ in the journal.

With the Journaling feature, your filesystem remains consistent. If your system suddenly shuts down, it can check the journal to find out what it was doing before powering off, instead of checking the whole filesystem. This makes the boot-up much faster.
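To make the idea concrete, here is a toy journal in Python. It is only a sketch of the "write down the intent, then mark it completed" pattern; real filesystems journal at the block level inside the kernel, and the class name ToyJournal, the record format, and the file name journal.log are all invented for this example.

```python
import json
import os
import tempfile


class ToyJournal:
    """Hypothetical append-only journal stored as one JSON record per line."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push the record to disk before continuing

    def begin(self, op_id, description):
        """Record the intent before the real work starts."""
        self._append({"id": op_id, "state": "started", "what": description})

    def complete(self, op_id):
        """Mark an earlier operation as finished."""
        self._append({"id": op_id, "state": "completed"})

    def unfinished(self):
        """Return the ids of operations that were started but never completed."""
        if not os.path.exists(self.path):
            return []
        pending = {}
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                record = json.loads(line)
                if record["state"] == "started":
                    pending[record["id"]] = record["what"]
                else:
                    pending.pop(record["id"], None)
        return sorted(pending)


if __name__ == "__main__":
    journal = ToyJournal(os.path.join(tempfile.mkdtemp(), "journal.log"))
    journal.begin(1, "copy big.iso to backup/")
    journal.complete(1)
    journal.begin(2, "copy video.mp4 to backup/")  # pretend the power fails here
    print("needs attention after restart:", journal.unfinished())
```

After a crash, only the operations returned by unfinished() need to be looked at, which is why a journaled filesystem does not have to scan everything on boot.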

Desktop Filesystem Types

  • ext4 –> This is one of the latest versions of the Linux filesystem and is backward compatible with the earlier ext3 and ext2. It supports disk space of up to 1 exabyte and a file size of up to 16 terabytes. ext4 is the standard choice for a Linux filesystem.

For Reference

8 bits -> 1 Byte

1024 Bytes -> 1 Kilobyte (KB)

1024 KB -> 1 Megabyte (MB)

1024 MB -> 1 Gigabyte (GB)

1024 GB -> 1 Terabyte (TB)

1024 TB -> 1 Petabyte (PB)

1024 PB -> 1 Exabyte (EB)

1024 EB -> 1 Zettabyte (ZB)

1024 ZB -> 1 Yottabyte (YB)

1024 YB -> 1 Brontobyte (informal name, not a standard unit)

1024 Brontobytes -> 1 Geopbyte (informal name, not a standard unit)

  • Btrfs (B-tree filesystem, often pronounced Butter FS or Better FS) -> This is also one of the newer Linux filesystems and comes with features like snapshots, incremental backups, and good performance. It is younger than ext4, and some of its features are still considered less mature.

  • XFS –> It is a high-performance journaling filesystem. It is generally a good fit for servers that store and serve large media files.

  • NTFS & FAT –> These are used by the Windows Operating System.

  • HFS+ -> It is used by macOS, although newer versions of macOS use APFS by default.

You can check which filesystems your computer uses by running the command df -T, where the T flag adds a column with the filesystem type.

Partition Table

We already know from the above sections that HDDs and SSDs are block devices. We can create partitions inside a block device to separate data. Instead of having one filesystem on the entire HDD, SSD, or NVMe drive, you can create multiple partitions that act as individual block devices. This lets you have more than one filesystem.

For example, on your 1 TB SSD, you can have two 500 GB partitions, one where Windows is running and the other for Linux. Any space that is not part of a partition is declared as Free Space.

To check what partitions you have created, you can look at the ‘Partition Table’. It also tells you where each partition begins and ends, which partitions are bootable, which sectors are allocated to a partition, etc.

There are two Partition Schemes:

  1. MBR or Master Boot Record

  2. GPT or GUID Partition Table

MBR

MBR is the traditional partition table that used to be the standard early on. It only supports disks up to 2 TB. Another problem with MBR is that it has a limit of four partitions, known as ‘Primary Partitions’. To get around this, you can turn one of the four into an ‘Extended Partition’. Inside this Extended Partition, you can create ‘Logical Partitions’ that behave like Primary Partitions. But only one of the four partitions can be an Extended Partition.
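The 2 TB figure is simple arithmetic. The sketch below assumes the classic MBR layout, where sector addresses are stored in 32 bits and a sector is 512 bytes; both numbers are stated here as assumptions rather than taken from the text above, and the function name is made up.

```python
SECTOR_SIZE = 512          # bytes per sector (assumed classic sector size)
ADDRESS_BITS = 32          # bits MBR uses for a sector address (assumed)


def mbr_max_bytes(sector_size=SECTOR_SIZE, address_bits=ADDRESS_BITS):
    """Largest disk size that can be addressed with the given parameters."""
    return (2 ** address_bits) * sector_size


if __name__ == "__main__":
    print(mbr_max_bytes() / 2 ** 40, "TB (counting 1 TB as 1024 GB)")
```

With these values, the limit works out to 2 to the power of 32 sectors of 512 bytes each, which is exactly 2 TB in the 1024-based units from the reference list above.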

GPT

GPT is becoming the new standard for disk partitioning. Unlike MBR, GPT has only one partition type and no four-partition limit, so there is no need for extended or logical partitions (a typical GPT has room for 128 partitions). Each partition has a Globally Unique ID (GUID). GPT is used with UEFI-based booting.

Filesystem Structure

A filesystem is an organized collection of files and directories. It holds both the actual files and a database-like structure to manage them. Any Filesystem Structure has four major components:

  • Boot Block – It is in the first few sectors of the filesystem. It is only used to boot the Operating System.

  • Super Block – It is the single block that comes after the boot block and contains information about the filesystem itself, like the size of the filesystem and the size of the Inode tables.

  • Inode Table – It is like a database that manages our files. Each file or directory has a unique entry in the Inode table that contains information about them.

  • Data Blocks – These blocks contain the actual data of the files and directories.

We can run the command sudo parted -l to view the partition details of a system.

Screenshot from 2022-11-10 17-23-57.png

As you can see, the Partition Table is ‘gpt’, which I explained earlier. Inside the table, the first entry is the boot partition, which starts at 1024kb and ends at 538MB. The second entry is the partition dedicated to the Linux OS, with an ext4 filesystem.

There are different disk partitioning tools you can use:

  • fdisk – It is a CLI-based tool. Older versions did not support GPT, but current versions do.

  • parted – It is also a CLI-based tool. It supports both MBR and GPT.

  • GParted – This is the GUI-based version of parted.

  • gdisk – It is also CLI-based and is designed for GPT disks rather than MBR.

/etc/fstab File

If you want a filesystem to be mounted automatically at system startup, you can add it to the /etc/fstab file. You cannot read or browse a filesystem without mounting it first. For example, the Linux filesystem is mounted on this system, and that is how I am able to use it on the machine. Earlier, we saw the ‘mnt’ directory, which is the usual place to mount a filesystem temporarily; filesystems listed in /etc/fstab are mounted at the directory given in their entry.

If you run cat /etc/fstab, it will show a bunch of information. Let us go through some data points inside it.

Screenshot from 2022-11-11 11-22-48.png

The first field is UUID=4b6264c6-9bea-42a9-8f11-956f53700ea7, the unique identifier of the Linux filesystem. Next to it is the directory / (root), where the Linux filesystem is mounted. Then comes the filesystem type, ext4, followed by the mount options and the dump and pass columns. The dump column tells the dump utility whether to back up this filesystem. The pass column sets the order in which filesystems are checked at boot, in case some discrepancy occurs.
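The six columns are easier to remember once you split a line into named fields. Below is a minimal Python sketch that does this. The UUID is the one from this blog, but the mount options (errors=remount-ro, sw) and the mount point none for swap are sample values assumed for illustration, and the names FstabEntry and parse_fstab are made up.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str       # what to mount, e.g. UUID=... or /swapfile
    mount_point: str  # where it gets mounted
    fs_type: str      # filesystem type, e.g. ext4 or swap
    options: str      # comma-separated mount options
    dump: int         # should the dump utility back it up?
    fsck_pass: int    # order of the filesystem check at boot


def parse_fstab(text):
    """Turn fstab-style text into a list of FstabEntry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        fields += ["0"] * (6 - len(fields))  # dump and pass default to 0
        device, mount_point, fs_type, options, dump, fsck_pass = fields[:6]
        entries.append(
            FstabEntry(device, mount_point, fs_type, options, int(dump), int(fsck_pass))
        )
    return entries


# Sample content; the options and the swap mount point are assumed values.
SAMPLE = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=4b6264c6-9bea-42a9-8f11-956f53700ea7 / ext4 errors=remount-ro 0 1
/swapfile none swap sw 0 0
"""

if __name__ == "__main__":
    for entry in parse_fstab(SAMPLE):
        print(entry.mount_point, entry.fs_type, entry.fsck_pass)
```

To try it on a real system, you could pass the contents of /etc/fstab to parse_fstab instead of SAMPLE.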

The last line mentions /swapfile and swap. Let us understand them.

Swap Memory

We use swap to give your system some virtual memory on disk. If you are low on memory or have a small amount of RAM, you can create a swap partition or a swap file (like the /swapfile above). The system uses this swap space to move memory pieces of idle processes to the disk.

The system takes out some pieces of memory that idle processes occupy in the RAM and saves them into the swap space on the disk. This frees up memory to be used by the active processes.

Disk Free and Disk Usage (‘df -h’ and ‘du -h’ Commands)

If you run the command df -h, where the h flag shows a human-readable version of the output, it will show the utilization of the currently mounted filesystems. It includes their size, usage, availability, use percentage, and where each filesystem is mounted. For example, the Linux filesystem is /dev/sda2 in the third row, with a size of 916 GB, mounted at the / (root) directory.

Screenshot from 2022-11-11 12-17-09.png

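If you want to see what is taking up how much space, you can run the command du -h. It reports the usage of the current directory and everything below it, so run it from the directory you want to inspect or pass that directory as an argument.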
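Here is a rough Python equivalent of what df -h reports for one mount point. It uses shutil.disk_usage from the standard library; the helper names human_readable and disk_free are made up, and the formatting only imitates the spirit of the -h flag rather than matching df character for character.

```python
import shutil


def human_readable(num_bytes):
    """Format a byte count with 1024-based units, e.g. 1536 -> '1.5K'."""
    units = ["B", "K", "M", "G", "T", "P", "E"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f}{unit}"
        value /= 1024


def disk_free(path="/"):
    """Return total, used and free space of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free in bytes
    return {name: human_readable(getattr(usage, name)) for name in ("total", "used", "free")}


if __name__ == "__main__":
    print(disk_free("/"))
```

The division by 1024 at each step is the same ladder as in the For Reference list: bytes to kilobytes to megabytes and so on.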

Inode Table

In the Filesystem Structure section, I explained that Inode Tables are like a database to manage files. In this table, each file/directory has an ‘Inode’ that stores all the information about it.

That information includes the file type, owner, group, permissions, number of hard links, size, number of blocks allocated, and the pointers to the data blocks of the file. It is everything except the filename and the file contents: the name lives in the directory entry, and the contents live in the data blocks, not in the inode table.

When we create a filesystem, some space for Inodes is allocated as well. There is an algorithm that determines how much inode space would be needed based on the size of the disk. You might be familiar with the error that says, ‘You are out of disk space’, but when you check your disk usage, you notice that there is still a good chunk of it left. It means your disk has space, but the space allocated for the inode table has run out.

Data storage on our disks depends on both the data and the database (Inode Table). If you run the command df -i, it will show the total inodes available for each filesystem, how many you have used, how many are free, and the usage percentage.

Screenshot from 2022-11-11 13-23-38.png

Now, let us assume that you created as many empty files and directories as there are available inodes for the Linux filesystem. Even though the files are empty and your disk is far from full, you would not be able to create or download any more files and directories, because there would be no room left in the inode table for new entries.
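The same numbers that df -i prints can be read from a program. The sketch below uses Python's os.statvfs, whose f_files and f_ffree fields hold the total and free inode counts; the function name inode_usage is made up, and some filesystems report 0 total inodes, which the code handles.

```python
import os


def inode_usage(path="/"):
    """Return inode counts for the filesystem that holds `path`."""
    info = os.statvfs(path)
    total = info.f_files   # total number of inodes
    free = info.f_ffree    # inodes that are still free
    used = total - free
    # Some filesystems report 0 total inodes, so avoid dividing by zero.
    percent = round(100 * used / total, 1) if total else 0.0
    return {"total": total, "used": used, "free": free, "percent": percent}


if __name__ == "__main__":
    print(inode_usage("/"))
```

If free ever reaches 0, creating a new file fails even when plenty of bytes are available, which is the situation described above.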

Inode Information

We identify Inodes by numbers. Whenever a new file/directory gets created, it is assigned an inode number. Inode numbers are usually handed out in sequential order, but sometimes you may get a lower inode number. That is because when you delete a file/directory, its inode number gets freed, and a new file/directory may be assigned that same free inode number.

You can check the inode number of any file/directory by running the ls -li command. The first column shows the assigned inode numbers. To see a more detailed view of a file or directory, you can use the command stat [name or path].

If I run stat /, it will show me the information of my root directory. It consists of the size, Blocks, UID, GID, Access (permissions) in both numeric and symbolic form, type (which is directory), Inode number, number of links, and some timestamps.
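The stat command is a thin wrapper around a system call that Python exposes as os.stat. Here is a short sketch that collects the same fields into a dictionary; the function name describe and the dictionary keys are invented for this example.

```python
import os
import stat
import tempfile


def describe(path):
    """Collect the inode information that the stat command shows for `path`."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,
        "links": info.st_nlink,
        "size": info.st_size,
        "uid": info.st_uid,
        "gid": info.st_gid,
        "access_symbolic": stat.filemode(info.st_mode),
        "access_numeric": oct(stat.S_IMODE(info.st_mode)),
    }


if __name__ == "__main__":
    for key, value in describe("/").items():
        print(f"{key}: {value}")
```

Notice that the filename is not part of what comes back; you supply the name, and the inode supplies everything else.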

Screenshot from 2022-11-11 13-53-21.png

How does an Inode locate a file?

Let’s understand this with an example. You created two files: ‘First Name’ and ‘Last Name’. You wrote ‘Elon’ into the ‘First Name’ file, and it gets stored in block 1 of the main Data Storage. Then, you wrote ‘Musk’ into the ‘Last Name’ file, and it gets stored in block 2 of Data Storage. Then, you wrote ‘Mark’ into the ‘First Name’ file, and it gets stored in block 3. Finally, you wrote ‘Cuban’ into the ‘Last Name’ file, and it gets stored in block 4.

Now, Blocks 1 & 3 belong to the ‘First Name’ file and Blocks 2 & 4 belong to the ‘Last Name’ file. You can notice that each file’s data is not stored sequentially. We know this data is somewhere in the Data Storage, but we don’t know exactly where, because it was not stored sequentially.

ls -la etc  less (4) (1).png

In such a situation, Inodes help in finding the desired data. In the classic ext2/ext3 layout, each Inode contains 15 pointers. (This is not the same thing as the ‘Super Block’ from the Filesystem Structure section, which describes the filesystem as a whole.) Each pointer holds the address of a block, and a block typically has a size of 4 KB. The first 12 pointers point directly to the data blocks of the file. For the ‘First Name’ file, the pointers will point to the 1st & 3rd data blocks, and for the ‘Last Name’ file, the pointers will point to the 2nd & 4th data blocks.

But if a single block is 4 KB, does that mean a single Inode can only address 12 × 4 KB = 48 KB of data? The answer is no. This is when the 13th, 14th, and 15th pointers of an Inode come into the picture. Instead of pointing at data, the 13th pointer points to a block that is itself filled with pointers to data blocks. The 14th pointer adds one more level: it points to a block of pointers, each of which points to another block of pointers, which finally point to data blocks. The 15th pointer adds a third level in the same way.

For example, if a 4 KB block holds 1024 pointers, the 13th pointer alone reaches 1024 data blocks, the 14th reaches 1024 × 1024, and the 15th reaches 1024 × 1024 × 1024. This is how a single Inode can point to a much larger number of data blocks than 15.

ls -la etc  less (5).png

Earlier I told you that an ext4 filesystem supports 1 exabyte of disk space and a file size of 16 terabytes. A tree of pointers like the one above is what lets a single Inode describe a file that runs into terabytes. (ext4 itself uses a more compact scheme called extents by default, but the idea of an Inode leading to the data blocks is the same.)

All the pointer blocks that sit between the Inode and the data are known as ‘Indirect Blocks’.
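You can work out how far the 15 pointers reach with a few lines of Python. This is a back-of-the-envelope sketch, assuming 4 KB blocks and 4-byte block pointers (the pointer size is an assumption, not something stated above); the function names are made up.

```python
BLOCK_SIZE = 4096       # bytes per block (4 KB)
POINTER_SIZE = 4        # bytes per block pointer (assumption)
DIRECT_POINTERS = 12    # the first 12 pointers go straight to data blocks


def addressable_blocks(block_size=BLOCK_SIZE, pointer_size=POINTER_SIZE):
    """Count the data blocks reachable through each kind of pointer."""
    per_block = block_size // pointer_size  # pointers that fit in one indirect block
    return {
        "direct": DIRECT_POINTERS,
        "single_indirect": per_block,        # 13th pointer
        "double_indirect": per_block ** 2,   # 14th pointer
        "triple_indirect": per_block ** 3,   # 15th pointer
    }


def max_file_bytes(block_size=BLOCK_SIZE, pointer_size=POINTER_SIZE):
    """Total bytes a single inode can address through all 15 pointers."""
    return sum(addressable_blocks(block_size, pointer_size).values()) * block_size


if __name__ == "__main__":
    for kind, count in addressable_blocks().items():
        print(f"{kind}: {count} blocks")
    print("total:", round(max_file_bytes() / 2 ** 40, 2), "TB")
```

Under these assumptions the 12 direct pointers cover only 48 KB, while the three indirect pointers together push the total to roughly 4 TB, almost all of it through the 15th pointer.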

Links

If you have used Windows, you might be familiar with shortcuts. When we install a program, Windows asks whether we want a shortcut for it on the Desktop. These shortcuts are aliases of the directory/file/program, placed at a particular location (like the Desktop).

If you move or delete the original file/directory, it will affect the shortcut as well and might break it. If the original file ceases to exist, where will the shortcut lead to?

In Linux, Symlinks/Symbolic Links/Soft Links are the equivalent of shortcuts in Windows. To create a soft link of a file, you can run the command ln -s [filename] [SoftLink name]. Here ln stands for link, and the -s flag stands for a soft (symbolic) link.

Screenshot from 2022-11-11 16-40-26.png

As you can see, there is one more file created. It is a soft link and points to the test1 file. And in the permissions column, instead of -, it has l, which means it is a link type. Permissions are covered in depth in the previous blog of this series.

To create a hard link, you only have to remove the -s flag from the above command. Run a command like ln test2 hard2 to create a hard link.

Screenshot from 2022-11-11 17-02-04.png

You can notice that earlier, the number column after the Permission column showed only the digit ‘1’, but now test2 and hard2 have ‘2’ in their rows. That is because this column tells us how many hard links to an Inode number exist in the system. Now that we have created a hard link of the ‘test2’ file, the number has changed from ‘1’ to ‘2’.

Notice that the Inode number of hard2 and test2 is the same, contrary to soft1 and test1, which have unique Inode numbers. But earlier, we learned that each file/directory we create should have a unique Inode number. Let us understand what is happening here.

There is an Inode number, and a filename points to it. If we now create a soft link, it will point to the filename instead of the Inode number. But if we create a hard link, it points directly to the same Inode number the file is pointing to. If the original file gets deleted, the soft link will stop working. But since the hard link points directly to the Inode, it will still work.

This does not mean that changes made to the original file will not be reflected in the hard link. They will, because the changes are made in the data storage, and both the file and the hard link point to the same data blocks via the Inode.

If you make any changes to the soft link file, they will reflect in the original file as well.

If you make any changes in the hard link file, they will reflect in both the original file and the soft link file.

If you delete the original file, then the soft link file will not work, but the hard link file will work properly.
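You can check all of these statements yourself with a short Python sketch. It creates a file, a hard link, and a soft link in a temporary directory and then deletes the original. The file names test, hard, and soft and the function name link_demo are made up for this example; os.link and os.symlink do the same job as ln and ln -s.

```python
import os
import tempfile


def link_demo(workdir):
    """Create a file, a hard link and a soft link, then delete the original."""
    original = os.path.join(workdir, "test")
    hard = os.path.join(workdir, "hard")
    soft = os.path.join(workdir, "soft")

    with open(original, "w") as fh:
        fh.write("hello")

    os.link(original, hard)      # like: ln test hard
    os.symlink(original, soft)   # like: ln -s test soft

    report = {
        # The hard link shares the inode of the original file.
        "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
        # lstat() looks at the link itself, which has its own inode.
        "soft_has_own_inode": os.lstat(soft).st_ino != os.stat(original).st_ino,
        "link_count": os.stat(original).st_nlink,
    }

    os.remove(original)          # delete the original name
    with open(hard) as fh:
        report["hard_after_delete"] = fh.read()
    # exists() follows the link, so a broken soft link reports False.
    report["soft_works_after_delete"] = os.path.exists(soft)
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for key, value in link_demo(tmp).items():
            print(f"{key}: {value}")
```

The data survives the delete because removing test only removes one name; the inode and its data blocks stay around as long as at least one hard link still points to them.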

Thanks for reading :)

To read the previous blogs of this Linux Series, go to blog.gauranggaur.com

To see the video lecture of this blog, go to this youtube video

Follow me on Twitter

Do comment your thoughts, questions or anything unique you learned below!
