📁 Hard Disks and File Systems

During a forensic investigation, storage devices such as Hard Disk Drives (HDDs) and Solid-State Drives (SSDs) are valuable sources of data. The investigator should locate the data on these devices and protect it as evidence, so the investigator must be familiar with the structure and behavior of storage devices. The file system is equally important, because it determines how data is stored and organized on a device.

3.1 Describe Different Types of Disk Drives and their Characteristics

Understanding Hard Disk Drive
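- HDD is a non-volatile digital data storage device that records data magnetically on one or more rotating platters coated with magnetic material
- The read/write performance of an HDD depends heavily on the RPM (revolutions per minute) of the drive platters: a higher RPM shortens the wait for data to rotate under the head and raises the transfer rate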
- Since an HDD contains moving parts, it is susceptible to physical damage, and the moving parts wear out over time
- Tracks
  - Tracks are the concentric circles on platters where all the information is stored
  - From any one position, the drive head can access only one of these circular rings at a time
  - Tracks are numbered for identification purposes
  - Reads and writes are performed by moving the heads radially across the disk, between the innermost and outermost tracks
  - Track Numbering
    - Track numbering on a hard disk begins at 0 from the outer edge and moves towards the center. The number of tracks on a hard disk depends on the size of the disk
    - The read/write heads on both surfaces of a platter are tightly packed and locked together on an assembly of head arms
    - The arms move in and out together to physically locate all heads at the same track number
    - Therefore, a track location is often referred to by a cylinder number rather than a track number
    - A cylinder is the group of all tracks, one on each platter surface, located at the same head position
- Sectors
  - A sector is the smallest physical storage unit on the disk platter
  - Each sector holds a fixed amount of data: 512 bytes for traditional HDDs and 2048 bytes for CD-ROMs and DVD-ROMs. Newer HDDs use 4096-byte (4 KB) sectors.
  - Each disk sector is labelled using the factory track-positioning data
  - The optimal way to store a file on a disk is in a contiguous series of sectors
  - For example, if the file size is 600 bytes, two 512-byte sectors (1,024 bytes) are allocated for the file, leaving 424 bytes of the second sector unused
  - Sector Addressing
    - Cylinders, heads, and sectors (CHS) determine the address of the individual sectors on the disk
    - When a disk is formatted, it is divided into tracks and sectors
    - For example, the formatted disk might contain 50 tracks, each of which is divided into 10 sectors
    - Track and sector numbers are used by the OS and disk drive to identify the stored information
- 4K Sectors
  - New hard drives use 4096-byte (4 KB or 4K) advanced format sectors
  - Generation-one Advanced Format, also called 4K sector technology, uses the storage surface of a disk more efficiently by merging eight 512-byte sectors into a single 4096-byte sector
  - The merged 4K sector retains the key design elements of the traditional 512-byte sector format
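The sector arithmetic above can be checked with a short Python sketch. It computes how many sectors a file occupies and how much of the last sector is left unused, and, assuming a 512-byte-emulation (512e) drive whose logical and physical sectors are aligned, which 4K physical sector holds a given 512-byte logical sector. The function names are illustrative, not part of any tool.

```python
def sectors_needed(file_size: int, sector_size: int = 512) -> int:
    """Whole sectors needed to hold file_size bytes (ceiling division)."""
    return (file_size + sector_size - 1) // sector_size


def unused_bytes(file_size: int, sector_size: int = 512) -> int:
    """Bytes left unused at the end of the last allocated sector."""
    return sectors_needed(file_size, sector_size) * sector_size - file_size


def logical_to_physical_4k(logical_lba: int) -> tuple:
    """Map a 512-byte logical sector number to (4K physical sector, slot 0-7).

    Assumption: a 512e drive whose logical sector 0 starts at the beginning
    of physical sector 0, so eight logical sectors fill each physical sector.
    """
    return divmod(logical_lba, 4096 // 512)


print(sectors_needed(600), unused_bytes(600))              # 2 424
print(sectors_needed(600, 4096), unused_bytes(600, 4096))  # 1 3496
print(logical_to_physical_4k(13))                          # (1, 5)
```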
Data Density on a Hard Disk
- Data is recorded onto a hard disk using a method called zoned bit recording (also known as multiple zone recording)
- In this technique, tracks are grouped into zones according to their distance from the center of the disk
- Each zone is assigned a number of sectors per track; because outer tracks are longer, outer zones hold more sectors per track than inner zones
- Types of data densities on a hard disk
  - Track Density
    - It is defined as the number of tracks per inch (TPI) measured across the radius of a platter, which depends on how closely the tracks are spaced
  - Areal Density
    - It is defined as the number of bits per square inch on a platter
  - Bit Density
    - It is the bits per unit length of track
CHS (Cylinder-Head-Sector) Data Addressing and Disk Capacity Calculation
- The CHS addressing method addresses each physical block of data on a hard disk by specifying the cylinder (radius), head (platter side), and sector (angular position)
  - Example of Disk Capacity Calculation: A disk drive has 16,384 cylinders, 80 heads, and 63 sectors per track. Assume that a sector holds 512 bytes. What is the capacity of such a disk?
    - Total Size of the Disk = No. of Cylinders * No. of Heads * No. of Sectors per Track * 512 bytes per Sector
    - Total Size of the Disk = (16,384 cylinders) * (80 heads) * (63 sectors / track) * (512 bytes / sector) = 42,278,584,320 bytes (about 42.3 GB, or 39.375 GiB)
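The capacity formula translates directly into code. The sketch below also converts a CHS address to a logical block address (LBA) using the standard formula LBA = (C × heads + H) × sectors per track + (S - 1); sectors are numbered from 1, while cylinders and heads start at 0. The function names are illustrative.

```python
def chs_capacity(cylinders: int, heads: int, sectors_per_track: int,
                 bytes_per_sector: int = 512) -> int:
    """Total disk size in bytes from its CHS geometry."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    """Convert a CHS address to an LBA.

    Cylinders and heads are numbered from 0, but sectors from 1, hence (s - 1).
    """
    if not 1 <= s <= sectors_per_track:
        raise ValueError("sector number must be in 1..sectors_per_track")
    if not 0 <= h < heads:
        raise ValueError("head number must be in 0..heads-1")
    return (c * heads + h) * sectors_per_track + (s - 1)


size = chs_capacity(16_384, 80, 63)
print(size)            # 42278584320
print(size / 2**30)    # 39.375
print(chs_to_lba(0, 0, 1, heads=80, sectors_per_track=63))  # 0 (sector zero, the MBR)
print(chs_to_lba(1, 0, 1, heads=80, sectors_per_track=63))  # 5040 (first sector of cylinder 1)
```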
Measuring the Hard Disk Performance
- Data is stored on the hard disk in the form of files
- When a running program requests a file, the hard disk retrieves the file's contents and transfers them to the CPU for further processing
- Hard disk performance is measured by these factors
  - Data rate→the number of bytes per second that the hard disk can transfer to the CPU
  - Seek time→the time the drive needs to move its read/write heads to the track holding the requested data; together with rotational latency, it determines how long the CPU waits for the first byte of a file after requesting it
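Seek time is only part of the wait for the first byte: the platter must also rotate the target sector under the head, which on average takes half a revolution. The sketch below estimates that rotational latency from the RPM and combines it with seek time and data rate into a rough read-time estimate. It is a simplified model (it ignores caching, controller overhead, and fragmentation), and the drive figures in the usage line are hypothetical.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for the target sector to reach the head: half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def estimated_read_ms(file_bytes: int, seek_ms: float, rpm: int,
                      data_rate_bytes_per_s: float) -> float:
    """Rough time to read one contiguous file: seek + rotational latency + transfer."""
    transfer_ms = file_bytes / data_rate_bytes_per_s * 1000
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


for rpm in (5400, 7200, 15000):
    print(rpm, round(avg_rotational_latency_ms(rpm), 2))
# 5400 5.56
# 7200 4.17
# 15000 2.0

# Hypothetical drive: 9 ms average seek, 7200 RPM, 150 MB/s sustained data rate
print(round(estimated_read_ms(1_000_000, 9.0, 7200, 150e6), 2))  # 19.83
```

The model makes the link to the RPM point above explicit: doubling the RPM halves the average rotational latency, but the seek time is unaffected.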
Understanding Solid-State Drive (SSD)
- SSD is a non-volatile storage device that uses NAND flash memory chips to store digital data
- SSDs are faster than HDDs because they have no moving parts; their read/write performance depends largely on the drive's data connection (host interface)
- Components
  - NAND Flash Memory
    - It is the main data storage unit, made up of floating-gate transistors that retain their charge state even without power
  - Controller
    - It is a processor that acts as a bridge between the flash memory components and the computer (host) by executing firmware-level software
  - DRAM
    - It is volatile memory that the controller uses as a cache, providing faster read/write performance
  - Host Interface
    - An SSD connects to the host machine using an interface. The commonly used SSD interfaces are SATA, PCIe, SCSI, etc.
Disk Interfaces
- ATA/PATA (IDE/EIDE)
  - ATA (Advanced Technology Attachment) is the official ANSI (American National Standards Institute) name of Integrated Drive Electronics (IDE), a standard interface between a motherboard’s data bus and storage disks
- Serial ATA/SATA (AHCI)
  - It is an advancement of ATA and uses serial signaling, unlike IDE’s parallel signaling
- Serial Attached SCSI
  - SAS (Serial Attached SCSI) is the successor and an advanced alternative to parallel SCSI in enterprise environments
- PCIe SSD
  - A PCIe (Peripheral Component Interconnect Express) SSD is a high-speed serial expansion card that connects flash memory directly to the motherboard through the PCIe bus
- SCSI
  - SCSI (Small Computer System Interface) refers to a set of ANSI standard interfaces based on the parallel bus structure and designed to connect multiple peripherals to a computer

3.2 Explain the Logical Structure of a Disk

Logical Structure of Disks
- The logical structure of a hard disk is the file system and software utilized to control access to the storage on the disk
- A hard disk’s logical structure has a significant influence on the performance, consistency, expandability, and compatibility of the storage subsystem of the hard disk
- Different OSes have different file systems and use various methods of arranging and controlling access to data on the hard disk
Clusters
- A cluster is the smallest logical storage unit on a hard disk. It is a set of contiguous sectors, ranging from 2 to 32 or more sectors depending on the formatting scheme in use
- The file system divides the storage on a disk volume into discrete chunks of data for efficient disk usage and performance. These chunks are called clusters
- The process by which files are allocated to clusters is called allocation; therefore, clusters are also known as allocation units
- In the File Allocation Table (FAT) file system, the chain of clusters that holds a file's data is recorded in the disk's file allocation table
- Cluster Size
  - Cluster sizing has a significant impact on the performance of an OS and disk utilization
  - Cluster size can be altered for optimum disk storage
  - The size of a cluster depends on the size of the disk partition and type of file system installed on the partition
  - A large cluster size (greater than one sector) has the following effects
    - Minimizes the fragmentation problem
    - Increases the probability of unused space in the cluster
    - Reduces the disk storage area in which information can be saved
    - Reduces the number of clusters the file system must track, which keeps allocation structures such as the FAT smaller
Lost Clusters
- When the OS marks clusters as used but does not allocate them to any file, such clusters are known as lost clusters
- A lost cluster is a FAT file system error that results from the manner in which the FAT file system allocates space and chains files together
- It is mainly the result of a logical structure error and not a physical disk error
- Lost clusters usually result from interrupted file operations, for example when a file is not closed properly, so the clusters involved are never correctly linked to a file
- CHKDSK is a Windows system tool that verifies the file system integrity of a volume and repairs logical file system errors
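The idea behind lost-cluster detection can be sketched in a few lines: follow every file's cluster chain from its starting cluster, then report clusters that the FAT marks as used but that no chain reaches. This is a simplified model, not how CHKDSK is implemented. It assumes FAT32-style entry values already masked to 28 bits (0 = free, 0x0FFFFFF7 = bad, 0x0FFFFFF8 and above = end of chain) and takes the starting clusters from directory entries as a plain list.

```python
FREE = 0x00000000
BAD = 0x0FFFFFF7
EOC = 0x0FFFFFFF
EOC_MIN = 0x0FFFFFF8  # 0x0FFFFFF8-0x0FFFFFFF all mark the end of a chain


def follow_chain(fat: list, start: int) -> list:
    """Return a file's clusters by following its chain through the FAT."""
    chain, seen, cluster = [], set(), start
    while cluster < EOC_MIN:
        if cluster < 2 or cluster >= len(fat) or cluster in seen or cluster == BAD:
            raise ValueError(f"broken or looping chain at cluster {cluster}")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def find_lost_clusters(fat: list, start_clusters: list) -> list:
    """Clusters marked in use in the FAT but not reachable from any file."""
    reachable = set()
    for start in start_clusters:
        if start >= 2:  # a starting cluster of 0 means an empty file
            reachable.update(follow_chain(fat, start))
    in_use = {n for n in range(2, len(fat)) if fat[n] not in (FREE, BAD)}
    return sorted(in_use - reachable)


# Toy FAT: file A occupies 2 -> 3 -> 4; clusters 6 -> 7 and 9 are allocated
# but no directory entry points to them.
fat = [0x0FFFFFF8, 0x0FFFFFFF,  # entries 0 and 1 are reserved
       3, 4, EOC, FREE, 7, EOC, FREE, EOC]
print(follow_chain(fat, 2))          # [2, 3, 4]
print(find_lost_clusters(fat, [2]))  # [6, 7, 9]
```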
Slack Space
- Slack space is the storage area of a disk between the end of a file and the end of a cluster
- If the file size is less than the cluster size, a full cluster is still assigned to that file. The remaining unused space is called slack space.
- For example, on a partition that uses 32 KB clusters, a file that requires only 10 KB is still allocated an entire 32 KB cluster, leaving 22 KB of slack space.
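Slack space follows directly from the cluster size, as in the example of a 10 KB file stored in 32 KB clusters. A minimal sketch; the function names are illustrative, and it assumes an empty file is allocated no cluster.

```python
def clusters_allocated(file_size: int, cluster_size: int) -> int:
    """Whole clusters allocated to a file (ceiling division)."""
    return (file_size + cluster_size - 1) // cluster_size


def slack_space(file_size: int, cluster_size: int) -> int:
    """Bytes between the end of the file and the end of its last cluster."""
    return clusters_allocated(file_size, cluster_size) * cluster_size - file_size


KB = 1024
print(clusters_allocated(10 * KB, 32 * KB))   # 1
print(slack_space(10 * KB, 32 * KB) // KB)    # 22
print(clusters_allocated(70 * KB, 32 * KB))   # 3
print(slack_space(70 * KB, 32 * KB) // KB)    # 26
```

Because slack lies inside allocated clusters, it can still hold remnants of whatever was stored there before, which is why investigators examine it.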
Master Boot Record (MBR)
- A master boot record (MBR) is the first sector (“sector zero”) of a data storage device such as a hard disk
- The MBR holds information about how the disk is partitioned (the location and size of each partition) along with the code that starts the boot process
- In practice, MBR almost always refers to the 512-byte boot sector (or partition sector) of a disk
- MBR is used for the following
  - Holding a partition table which refers to the partitions of a hard disk
  - Bootstrapping an OS
  - Uniquely identifying individual hard disk media with a 32-bit disk signature
Structure of a Master Boot Record
- The structure of MBR consists of three parts
  - Master Boot Code or Boot Strap→executable code responsible for loading the OS into computer memory. It occupies the first 446 bytes of the MBR (in modern MBRs, bytes 440 to 443 of this area hold the 32-bit disk signature)
  - Partition Table→maintains the data of all the hard disk partitions. It is 64 bytes long: four 16-byte entries, one per primary partition
  - Boot Signature→located at the end of the MBR, it contains only 2 bytes of data (0x55 0xAA). The BIOS checks it during booting to confirm that the sector is a valid boot sector
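The MBR layout can be parsed with Python's struct module. The sketch assumes the standard offsets: bootstrap code in bytes 0 to 439, the 32-bit disk signature at 440, four 16-byte partition entries starting at 446, and the 0x55AA boot signature at 510. Within each entry, byte 0 is the boot indicator (0x80 = active), byte 4 is the partition type, and bytes 8 to 15 hold the starting LBA and sector count as little-endian 32-bit integers. The example builds a synthetic sector rather than reading a real disk.

```python
import struct


def parse_mbr(sector: bytes) -> dict:
    """Split a 512-byte MBR into boot code, disk signature, and partition entries."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[510:512] != b"\x55\xAA":
        raise ValueError("missing 0x55AA boot signature")
    disk_signature = struct.unpack_from("<I", sector, 440)[0]
    partitions = []
    for i in range(4):
        entry = sector[446 + 16 * i:446 + 16 * (i + 1)]
        status, ptype = entry[0], entry[4]
        if ptype == 0x00:  # type 0x00 marks an unused entry
            continue
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        partitions.append({"index": i, "bootable": status == 0x80, "type": ptype,
                           "start_lba": start_lba, "sectors": num_sectors})
    return {"boot_code": sector[:440], "disk_signature": disk_signature,
            "partitions": partitions}


# Synthetic MBR: one bootable NTFS (type 0x07) partition at LBA 2048, 204,800 sectors long
mbr = bytearray(512)
struct.pack_into("<I", mbr, 440, 0x1234ABCD)
mbr[446] = 0x80
mbr[446 + 4] = 0x07
struct.pack_into("<II", mbr, 446 + 8, 2048, 204_800)
mbr[510:512] = b"\x55\xAA"

info = parse_mbr(bytes(mbr))
print(hex(info["disk_signature"]))  # 0x1234abcd
print(info["partitions"])
# [{'index': 0, 'bootable': True, 'type': 7, 'start_lba': 2048, 'sectors': 204800}]
```

To examine an evidence image instead, pass the first 512 bytes of the image file to `parse_mbr`.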
Disk Partitions
- Disk partitioning is the creation of logical divisions on a storage device (HDD/SSD) to allow the user to apply OS-specific logical formatting
- The disk-partitioning process is the same for both HDDs and SSDs
- Primary Partition
  - It is a partition that holds the OS, the system area, and other information required for booting
  - In MS-DOS and earlier versions of Microsoft Windows systems, the first partition (C:) must be a “primary partition”
- Extended Partition
  - It is a container partition that holds one or more logical drives, which store data and files; it allows an MBR disk to have more than four partitions
BIOS Parameter Block (BPB)
- The BIOS parameter block (BPB) is a data structure in the partition boot sector
- It describes the physical layout of a data storage volume, such as the number of heads and the number of sectors per track on the drive
- BPB in file systems such as FAT12 (except in DOS 1.x), FAT16, FAT32, HPFS (High Performance File System), and NTFS (New Technology File System) defines the filesystem structure
- The BPB length varies for FAT16, FAT32, and NTFS boot sectors due to different types of fields and the amount of data stored in them
- The BPB helps investigators locate the file table on the hard drive (for example, the FAT or the NTFS Master File Table)
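As an example of how the BPB leads an investigator to the file table, the sketch below reads an NTFS boot sector and computes the byte offsets of the $MFT and $MFTMirr. It assumes the standard NTFS boot sector offsets: OEM ID "NTFS    " at 0x03, bytes per sector at 0x0B, sectors per cluster at 0x0D (assumed to be 128 or less), the $MFT and $MFTMirr starting cluster numbers at 0x30 and 0x38, and the MFT record size code at 0x40, where a negative value -n means 2^n bytes. FAT boot sectors use the same 0x0B and 0x0D fields. Cluster numbers are relative to the start of the volume, so the partition's byte offset is added. All values in the example are synthetic.

```python
import struct


def locate_mft(boot_sector: bytes, volume_offset: int = 0) -> dict:
    """Read the NTFS BPB and compute where the $MFT and $MFTMirr start."""
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector = struct.unpack_from("<H", boot_sector, 0x0B)[0]
    sectors_per_cluster = boot_sector[0x0D]
    mft_lcn, mftmirr_lcn = struct.unpack_from("<QQ", boot_sector, 0x30)
    size_code = struct.unpack_from("<b", boot_sector, 0x40)[0]
    cluster_size = bytes_per_sector * sectors_per_cluster
    # Negative code -n: record size is 2**n bytes; positive: that many clusters
    record_size = 2 ** (-size_code) if size_code < 0 else size_code * cluster_size
    return {
        "cluster_size": cluster_size,
        "mft_offset": volume_offset + mft_lcn * cluster_size,
        "mftmirr_offset": volume_offset + mftmirr_lcn * cluster_size,
        "mft_record_size": record_size,
    }


# Synthetic boot sector: 512-byte sectors, 8 sectors per cluster, $MFT at cluster 786432
bs = bytearray(512)
bs[3:11] = b"NTFS    "
struct.pack_into("<H", bs, 0x0B, 512)
bs[0x0D] = 8
struct.pack_into("<QQ", bs, 0x30, 786_432, 2)
struct.pack_into("<b", bs, 0x40, -10)

# Volume assumed to start at LBA 2048 of the disk (2048 * 512 bytes)
layout = locate_mft(bytes(bs), volume_offset=2048 * 512)
print(layout)
# {'cluster_size': 4096, 'mft_offset': 3222274048, 'mftmirr_offset': 1056768, 'mft_record_size': 1024}
```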
Globally Unique Identifier (GUID)
- The Globally Unique Identifier (GUID) is a 128-bit unique reference number used as an identifier in computer software
- In general, GUIDs are displayed as 32 hexadecimal digits split into five hyphen-separated groups (8-4-4-4-12), for example EBD0A0A2-B9E5-4433-87C0-68B6B72699C7
- Common Uses
  - In Windows Registry, GUIDs are used to identify COM (Component Object Model) DLLs (dynamic-link libraries)
  - In database tables, GUIDs are used as primary key values
  - In some instances, a website may assign a GUID to a user’s browser to record and track the session
  - Windows assigns a GUID to a username to identify user accounts
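Python's standard uuid module can illustrate both the textual GUID format and a detail that matters when reading raw disk structures: GPT and other Microsoft on-disk structures store the first three GUID fields in little-endian byte order, so the bytes seen in a hex editor do not match the printed string. The sketch also shows how a version-1 GUID embeds a creation timestamp and a node ID (often a MAC address), which is one reason GUIDs can carry investigative value; version-4 GUIDs, such as the partition type GUID used here, are random and carry no such information. The helper name uuid1_timestamp is illustrative.

```python
import datetime
import uuid

# Basic Data Partition type GUID in its usual text form
guid = uuid.UUID("EBD0A0A2-B9E5-4433-87C0-68B6B72699C7")
print(len(guid.bytes) * 8, guid.version)  # 128 4

# On disk, the first three fields are stored little-endian ("mixed-endian")
on_disk = guid.bytes_le
print(on_disk.hex().upper())              # A2A0D0EBE5B9334487C068B6B72699C7
assert uuid.UUID(bytes_le=on_disk) == guid


def uuid1_timestamp(u: uuid.UUID) -> datetime.datetime:
    """Creation time embedded in a version-1 GUID."""
    if u.version != 1:
        raise ValueError("only version-1 GUIDs embed a timestamp")
    # u.time counts 100-nanosecond intervals since 1582-10-15 00:00 UTC
    epoch = datetime.datetime(1582, 10, 15, tzinfo=datetime.timezone.utc)
    return epoch + datetime.timedelta(microseconds=u.time // 10)


v1 = uuid.uuid1()
print(uuid1_timestamp(v1), hex(v1.node))  # current time; node is often a MAC address
```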
GUID Partition Table (GPT)
- Unified Extensible Firmware Interface (UEFI) replaces legacy BIOS firmware interfaces
- UEFI is a specification that defines a software interface between an OS and platform firmware
- It uses a partition system known as GUID Partition Table (GPT), which replaces the traditional MBR
- Advantages of GPT disk layout
  - Supports up to 128 partitions and uses 64-bit Logical Block Addresses (LBAs)
  - Supports partitions far beyond the 2 Tebibyte (TiB) limit of MBR, up to 8 Zebibytes (ZiB) with 512-byte sectors (2^64 sectors × 512 bytes)
  - Provides primary and backup partition tables for redundancy
- Protective MBR
  - A disk formatted with a GPT disk layout has a protective MBR located at Logical Block Address (LBA) 0
  - Protective MBR provides compatibility with legacy tools that fail to understand the GPT format
  - It is similar to the “legacy” MBR in functionality but has only one partition of type 0xEE (EFI_GPT_DISK)
  - This partition reserves the entire disk for the formal GPT structure
    - Note: The UEFI Firmware does not execute the MBR Boot Code (the first 440 bytes)
  - The Get-MBR cmdlet displays the MBR Partition Table of a GPT-formatted disk
- GUID Partition Table
  - The formal GPT starts at LBA 1, where the GPT header is found
- GPT Header:
  - It is a pointer to the partition table and defines the complete logical layout
  - It contains information such as the “EFI PART” signature and a unique GUID of the disk
  - The firmware detects GPT corruption using the CRC32 values
  - The MyLBA value, which is always 1 for the primary header, gives the location of the GPT header itself, while the AlternateLBA value gives the location of the backup GPT header, which occupies the last sector of the disk
  - The backup GPT is used in place of the primary GPT when the primary is corrupted
  - The FirstUsableLBA and LastUsableLBA values point to the disk portion the partitions can use
  - The PartitionEntryLBA value gives the start of the partition entry array, while the NumberOfPartitionEntries and SizeOfPartitionEntry values together define the size of that array (number of entries × bytes per entry)
- GPT Partition Array
  - The GPT header points to the partition array via the PartitionEntryLBA value
  - The size and number of partition entries are defined in the GPT header
- Each partition entry contains the following
  - Two GUIDs, where one represents the type of partition, and the other uniquely identifies the partition
  - Partition StartingLBA and EndingLBA values describing the location and size of the partition
  - Partition-type-specific attributes
  - A user-defined partition name of up to 36 UTF-16 characters
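The GPT header and partition entry fields described above can be parsed and validated with Python's struct and zlib modules (zlib's CRC-32 is the same algorithm UEFI uses). The sketch assumes the UEFI layout: a 92-byte header at LBA 1 (signature, revision, header size, header CRC32, reserved field, MyLBA, AlternateLBA, FirstUsableLBA, LastUsableLBA, disk GUID, PartitionEntryLBA, NumberOfPartitionEntries, SizeOfPartitionEntry, partition array CRC32) and 128-byte entries (type GUID, unique GUID, StartingLBA, EndingLBA, attributes, 72-byte UTF-16LE name). The header CRC32 is computed with its own CRC field zeroed, which is how firmware detects a corrupted header. The example disk, GUIDs, and partition name are synthetic.

```python
import struct
import uuid
import zlib

HEADER_FMT = "<8sIIIIQQQQ16sQIII"  # the 92 defined bytes of the GPT header


def parse_gpt_header(lba1: bytes) -> dict:
    (signature, _revision, header_size, header_crc, _reserved,
     my_lba, alternate_lba, first_usable, last_usable, disk_guid,
     entry_lba, num_entries, entry_size, entries_crc) = struct.unpack_from(HEADER_FMT, lba1, 0)
    if signature != b"EFI PART":
        raise ValueError("no 'EFI PART' signature at LBA 1")
    check = bytearray(lba1[:header_size])
    check[16:20] = b"\x00\x00\x00\x00"  # CRC is computed with its own field zeroed
    return {"crc_ok": zlib.crc32(check) == header_crc,
            "my_lba": my_lba, "alternate_lba": alternate_lba,
            "first_usable_lba": first_usable, "last_usable_lba": last_usable,
            "disk_guid": uuid.UUID(bytes_le=disk_guid), "entry_lba": entry_lba,
            "num_entries": num_entries, "entry_size": entry_size,
            "entries_crc": entries_crc}


def parse_gpt_entries(array: bytes, num_entries: int, entry_size: int) -> list:
    entries = []
    for i in range(num_entries):
        raw = array[i * entry_size:(i + 1) * entry_size]
        type_guid = uuid.UUID(bytes_le=raw[0:16])
        if type_guid.int == 0:  # an all-zero type GUID marks an unused entry
            continue
        first_lba, last_lba, attributes = struct.unpack_from("<QQQ", raw, 32)
        entries.append({"type": type_guid, "unique": uuid.UUID(bytes_le=raw[16:32]),
                        "first_lba": first_lba, "last_lba": last_lba,
                        "attributes": attributes,
                        "name": raw[56:128].decode("utf-16-le").rstrip("\x00")})
    return entries


# Synthetic 1,000,000-sector disk with one Basic Data partition named "Evidence"
entry = bytearray(128)
entry[0:16] = uuid.UUID("EBD0A0A2-B9E5-4433-87C0-68B6B72699C7").bytes_le
entry[16:32] = uuid.UUID("12345678-1234-4234-8234-123456789abc").bytes_le
struct.pack_into("<QQQ", entry, 32, 2048, 206_847, 0)
entry[56:56 + 16] = "Evidence".encode("utf-16-le")
array = bytes(entry) + bytes(128 * 127)  # 128 entries of 128 bytes

hdr = bytearray(512)
struct.pack_into(HEADER_FMT, hdr, 0, b"EFI PART", 0x00010000, 92, 0, 0,
                 1, 999_999, 34, 999_966,
                 uuid.UUID("0f0e0d0c-0b0a-4908-8706-050403020100").bytes_le,
                 2, 128, 128, zlib.crc32(array))
struct.pack_into("<I", hdr, 16, zlib.crc32(hdr[:92]))

h = parse_gpt_header(bytes(hdr))
parts = parse_gpt_entries(array, h["num_entries"], h["entry_size"])
print(h["crc_ok"], zlib.crc32(array) == h["entries_crc"])       # True True
print(parts[0]["name"], parts[0]["first_lba"], parts[0]["last_lba"])  # Evidence 2048 206847
```

Flipping any byte of the header changes its CRC32, so a tampered or damaged header is reported with crc_ok set to False; the backup header at AlternateLBA can then be parsed the same way.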

3.3 Understand Booting Process of Windows, Linux and Mac Operating Systems

What is the Booting Process?
- Booting refers to the process of starting or restarting the OS when the user turns on a computer system
- It loads the OS from the hard disk into RAM (working memory)
- Types of Booting
  - Cold boot (Hard boot)
    - It is the process of starting a computer from a powered-down or off state
  - Warm boot (Soft boot)
    - It is the process of restarting a computer that is already turned on. A warm boot might occur when the system encounters a program error or requires a restart to make certain changes after installing a program, etc.
Essential Windows System Files
Windows Boot Process: BIOS-MBR Method
- Windows XP, Vista, and 7 OSes power on and start up using the traditional BIOS-MBR method
- Windows 8 and later versions use either the traditional BIOS-MBR method or the newer UEFI-GPT method, depending on how the system is configured
- Identifying the MBR Partition
Windows Boot Process: UEFI-GPT
- Identifying the GUID Partition Table (GPT)
  - Get-GPT
    - It parses the GPT data structure contained within the first few sectors of the device specified
    - It requires the use of the -Path parameter, which takes the Win32 device namespace (e.g., \\.\PHYSICALDRIVE1) for the device from which the GPT should be parsed
    - If Get-GPT is run against a disk formatted with an MBR, it will throw an error prompting to use Get-MBR instead
  - Alternate Method
    - Open the Computer Management application and click Disk Management in the left pane. Right-click the primary disk (here, Disk 0) and then click Properties.
    - In the Device Properties window, click the Volumes tab to view the partition style
  - Get-BootSector
    - It reviews the hard drive’s first sector and determines if the disk is formatted using the MBR or GPT partitioning scheme; once done, it acts just as Get-MBR or Get-GPT would, respectively
  - Get-BootSector run against a disk formatted using the GPT partitioning scheme
  - Get-BootSector run against a disk formatted using the MBR partitioning scheme
  - Get-PartitionTable
    - It determines the type of boot sector (MBR or GPT) and returns the correct partition object (PartitionEntry or GuidPartitionTableEntry)
  - Get-PartitionTable run against an MBR-formatted disk, returning a PartitionEntry object
  - Get-PartitionTable run against a GPT-formatted disk, returning an array of GuidPartitionTableEntry Objects
- Analyzing the GPT Header and Entries
  - Most OSes that support GPT disk access provide a basic partitioning tool, which displays details about GPTs
  - Example: DiskPart tool (Windows), OS X Disk utility (Mac), GNU Parted tool (Linux)
  - Sleuthkit (mmls command) can be used to view the detailed partition layout for a GPT disk
  - Alternatively, details about the GPT header and partition entries can be obtained via manual analysis using a hex editor
- GPT Artifacts
  - Deleted and Overwritten GUID Partitions
    - Case 1
      - If an MBR disk is repartitioned or converted to GPT, sector zero is generally overwritten with a protective MBR
      - To recover data from the previously MBR-partitioned volumes, investigators can use standard forensic techniques that search the entire disk for file system structures
    - Case 2
      - If a GPT disk is repartitioned or converted to MBR, the GPT header and tables may remain intact, depending on the tool used
      - Generic partition deletion tools run on a GPT disk might delete only the protective MBR, which can easily be recreated to restore access to the disk
    - As per UEFI specifications, if all the fields in a partition entry are zeroed, it implies that the entry is not in use. In this case, data recovery from deleted GUID partition entries is not possible
  - GUID Identifiers
    - The GPT scheme provides GUIDs of investigative value as they are unique and hold potentially useful information within them
    - GUIDs possess unique identifying information for both disks and individual partitions
    - Investigators can use tools such as uuid to decode various versions of GUID/UUID
  - Hidden Information on GPT Disks
    - Intruders may hide data on GPT disks just as they do on traditional MBR disks
    - Locations on GPT disks where data may be hidden include inter-partition gaps, unpartitioned space towards the end of the disk, the GPT header, and reserved areas
    - Current forensic methods and tools offer only limited support for analyzing these GPT areas
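The decision Get-BootSector makes can be approximated in a few lines, which also helps when checking the GPT artifacts described above. This is a hypothetical sketch, not the PowerForensics implementation: it checks the 0x55AA signature in sector zero, looks for a protective-MBR entry of type 0xEE, and then looks for the "EFI PART" signature at LBA 1. A disk with a classic MBR but a surviving GPT header at LBA 1 is flagged, matching Case 2 above. The input is assumed to be the first two sectors of a forensic image.

```python
def partition_style(first_two_sectors: bytes, sector_size: int = 512) -> str:
    """Classify a disk from LBA 0 and LBA 1 as GPT, MBR, or unknown."""
    lba0 = first_two_sectors[:sector_size]
    lba1 = first_two_sectors[sector_size:2 * sector_size]
    if len(lba0) < 512 or lba0[510:512] != b"\x55\xAA":
        return "unknown"
    types = [lba0[446 + 16 * i + 4] for i in range(4)]  # partition type bytes
    has_gpt_header = lba1[:8] == b"EFI PART"
    if 0xEE in types:  # protective MBR entry
        return "GPT" if has_gpt_header else "GPT (protective MBR, header missing or damaged)"
    if has_gpt_header:
        return "MBR (residual GPT header at LBA 1)"
    return "MBR"


gpt_disk = bytearray(1024)
gpt_disk[446 + 4] = 0xEE
gpt_disk[510:512] = b"\x55\xAA"
gpt_disk[512:520] = b"EFI PART"
print(partition_style(bytes(gpt_disk)))  # GPT

mbr_disk = bytearray(1024)
mbr_disk[446 + 4] = 0x07
mbr_disk[510:512] = b"\x55\xAA"
print(partition_style(bytes(mbr_disk)))  # MBR
```

Reading from a disk image rather than a live device avoids altering evidence; on Windows, opening a raw device path such as \\.\PHYSICALDRIVE1 also requires administrative rights.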
Macintosh Boot Process
Linux Boot Process

3.4 Understand Various File Systems of Windows, Linux and Mac Operating Systems

Windows File Systems
- File Allocation Table (FAT)
  - The FAT file system is used with DOS, and it was the first file system used with the Windows OS
  - It is named for its method of organization, the file allocation table, which resides at the beginning of the volume
  - FAT has three versions (FAT12, FAT16, and FAT32), which differ in terms of the size of the entries in the FAT structure
- FAT File System Layout
  - Reserved Area→typically 1 sector in size on FAT12/16 (larger on FAT32); it contains file-system-category data such as the boot sector
  - FAT Area→contains the FAT structures. Its size is determined by the number of structures and their sizes.
  - Data Area→contains the clusters allocated to store the contents of files and directories
- FAT Partition Boot Sector
  - Boot Sector is the first sector (512 bytes) of a FAT file system
  - FAT partition boot sector holds data used by the file system to access the partition or volume
  - On x86-based computer systems, the MBR code loads this boot sector from the system partition, and the boot sector code in turn loads the system kernel files
- FAT Folder Structure
  - FAT file systems have a set of 32-byte folder entries for every file
  - The following are the folder entries in the FAT system
- Directory Entries and Cluster Chains
  - Directory entry→a 32-byte data structure allotted for each file and directory
  - It contains information about a file such as attributes, size, starting cluster, and dates and times
- Filenames on FAT Volumes
  - Whenever users create or rename a file on a FAT volume, Windows creates an eight-plus-three-character (8.3) short name for the file and uses a special attribute-bit combination to mark additional entries that hold the long filename
  - Windows creates one or more of these secondary folder entries for the file, each holding up to 13 characters of the long name
  - The diagram below shows all of the folder entries for the file Thequi~1.fox, which has the long name The quick brown.fox
- FAT32
  - FAT32 file system is derived from the FAT file system and supports drives up to 2 TB (terabytes) in size
  - It uses drive space efficiently and uses small clusters
  - It creates a backup of the file allocation table instead of maintaining only the default copy
- New Technology File System (NTFS)
  - NTFS is the standard file system of Windows NT and its descendants Windows XP, Vista, 7, 8.1, 10, Server 2003, Server 2008, Server 2012, Server 2016 and Server 2019
  - From Windows NT 3.1 onwards, it is the default file system of the Windows NT family
  - It has several improvements over FAT such as improved support for metadata and the use of advanced data structures to improve performance, reliability, and disk-space utilization, as well as additional extensions such as security access-control lists and file system journaling
- NTFS Architecture
- NTFS System Files
- NTFS Partition Boot Sector
  - When a volume is formatted as an NTFS volume, the format program allocates the first 16 sectors for the boot sector and the bootstrap code
  - Partition type identifier: 0x07 (MBR); EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, the Basic Data Partition GUID (GPT)
- Cluster Sizes of NTFS Volume
  - A cluster is the smallest allocation unit on the hard disk that is used to hold a file
  - NTFS uses clusters of different sizes to hold the files, depending on the size of the NTFS volume
  - List of the default cluster sizes for an NTFS volume
- NTFS Master File Table (MFT)
  - Each file on an NTFS volume is represented by a record in a special file called the Master File Table (MFT)
  - It reserves the first 16 records of the table for special information
  - The first record of this table describes the MFT itself, followed by an MFT mirror record
  - If the first MFT record is corrupted, NTFS reads the second record to find the MFT mirror file, whose first record is identical to the first record of the MFT
  - The locations of the data segments for both the MFT and the MFT mirror file are recorded in the boot sector. A duplicate of the boot sector is stored at the end of the volume (Windows NT 3.51 and earlier placed it at the logical center of the disk)
  - The third record of the MFT is the log file, which is used for file recovery. The seventeenth and subsequent records of the MFT are for each file and directory (also viewed as a file by NTFS) on the volume
  - MFT is a relational database that consists of information related to files and the file attributes
  - The rows consist of file records, and the columns consist of file attributes
  - It has information on every file on the NTFS volume including its own information
  - It has 16 records reserved for system files
  - For a small folder, MFT is represented as follows:
  - Structure of an MFT on an NTFS volume
- Metadata Files Stored in the MFT
- NTFS Attributes
  - Every file has unique attributes such as name, security information, and metadata of the file system
  - Every attribute is identified by an attribute type code and attribute name
  - There are two categories of attributes
    - Resident attributes
      - These are attributes whose contents are stored entirely within the file's MFT record
    - Non-resident attributes
      - These are attributes whose contents are stored in one or more clusters allocated outside the MFT record
- NTFS Data Stream
  - An NTFS data stream is a $DATA attribute of a file; a file's main content is stored in its unnamed data stream
  - NTFS supports multiple data streams per file, where the stream name identifies a new data attribute on the file
  - The following command creates a data stream in an existing file on an NTFS volume: C:\>ECHO text_message > myfile.txt:stream1
  - The following command displays the contents of the data stream: C:\>MORE < myfile.txt:stream1
  - A data stream does not appear when the file is opened in a text editor. To detect one, examine the MFT entry for the file or use a command that lists streams, such as dir /r (Windows Vista and later).
  - When you copy an NTFS file to a FAT volume such as a floppy disk, data streams and other attributes not supported by FAT are lost
- NTFS Compressed Files
  - The compressed files present on an NTFS volume can be read and written by any Windows-based application without first being decompressed by another program
  - NTFS supports compression of individual files, all the files within a folder, and all the files/folders within an NTFS volume
  - The file is automatically decompressed by the filter driver when Windows applications request access
  - NTFS compression algorithms support cluster sizes of up to 4 KB
  - Setting the Compression State of a Volume
    - Right-click on the drive that is to be compressed and click Properties
    - On the General tab, check “Compress this drive to save disk space” and click Apply
    - In the Confirm Attribute Changes dialog box, choose an option and click OK
- Encrypting File Systems (EFS)
  - Encrypting File System (EFS) was first introduced in version 3.0 of NTFS and offers file system-level encryption
  - The encryption is transparent to the user who encrypted the file: that user can open and modify the file without manually decrypting it
  - After the user finishes working with the file, it remains encrypted on disk
  - When any unauthorized user tries to access an encrypted file, they are denied access
  - To enable the encryption and decryption facilities, a user must set the encryption attributes of the files and folders they wish to encrypt or decrypt
- Components of EFS
- EFS Attribute
  - NTFS sets a flag for a file once you encrypt it and creates an EFS attribute in which it stores the Data Decryption Field (DDF) and the Data Recovery Field (DRF)
  - This attribute has the attribute type code 0x100 ($LOGGED_UTILITY_STREAM) in NTFS
- Sparse Files
  - Sparse files provide a method of saving disk space for files by allowing the I/O subsystem to allocate only meaningful (nonzero) data
  - If NTFS marks a file as sparse, it assigns a hard disk cluster only for the data defined by the application
  - Non-defined data of the file are represented by non-allocated space on the disk
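The MFT, attribute, and data stream descriptions in this section can be tied together with a small parser that walks the attributes of a single MFT record. The sketch assumes the standard record layout ("FILE" signature, first-attribute offset at byte 20, flags at byte 22 with 0x01 = in use and 0x02 = directory) and standard attribute headers (type code, length, non-resident flag at byte 8, name length and offset at bytes 9 and 10; content size at byte 16 for resident attributes, real size at byte 48 for non-resident ones). It also assumes the update sequence fixups have already been applied, which real records require. The synthetic record holds an unnamed $DATA attribute plus a named one, "stream1", which is how a stream such as myfile.txt:stream1 is stored. The helper resident_attr is hypothetical and exists only to build test data.

```python
import struct

ATTR_TYPES = {0x10: "$STANDARD_INFORMATION", 0x30: "$FILE_NAME",
              0x80: "$DATA", 0x100: "$LOGGED_UTILITY_STREAM"}


def walk_mft_record(record: bytes) -> dict:
    """List the attributes of one MFT record (fixups assumed already applied)."""
    if record[:4] != b"FILE":
        raise ValueError("not an MFT FILE record")
    first_attr, flags = struct.unpack_from("<HH", record, 20)
    attrs, offset = [], first_attr
    while offset + 16 <= len(record):
        type_code, length = struct.unpack_from("<II", record, offset)
        if type_code == 0xFFFFFFFF or length == 0:  # end-of-attributes marker
            break
        non_resident, name_len = record[offset + 8], record[offset + 9]
        name_off = struct.unpack_from("<H", record, offset + 10)[0]
        name = record[offset + name_off:offset + name_off + 2 * name_len].decode("utf-16-le")
        if non_resident:
            size = struct.unpack_from("<Q", record, offset + 48)[0]  # real data size
        else:
            size = struct.unpack_from("<I", record, offset + 16)[0]  # content size
        attrs.append({"type": ATTR_TYPES.get(type_code, hex(type_code)),
                      "name": name, "resident": not non_resident, "size": size})
        offset += length
    return {"in_use": bool(flags & 0x01), "is_directory": bool(flags & 0x02),
            "attributes": attrs}


def resident_attr(type_code: int, content: bytes, name: str = "") -> bytes:
    """Hypothetical helper: build a minimal resident attribute for the demo."""
    name_bytes = name.encode("utf-16-le")
    name_off = 24                          # resident header is 24 bytes
    content_off = name_off + len(name_bytes)
    content_off += (-content_off) % 8      # align content to 8 bytes
    length = content_off + len(content)
    length += (-length) % 8                # attributes are 8-byte aligned
    buf = bytearray(length)
    struct.pack_into("<IIBBHHH", buf, 0, type_code, length, 0, len(name), name_off, 0, 0)
    struct.pack_into("<IH", buf, 16, len(content), content_off)
    buf[name_off:name_off + len(name_bytes)] = name_bytes
    buf[content_off:content_off + len(content)] = content
    return bytes(buf)


rec = bytearray(1024)
rec[0:4] = b"FILE"
struct.pack_into("<HH", rec, 20, 56, 0x0001)  # first attribute at 56; flags: in use
pos = 56
for attr in (resident_attr(0x80, b"visible text"),
             resident_attr(0x80, b"hidden text", name="stream1")):
    rec[pos:pos + len(attr)] = attr
    pos += len(attr)
struct.pack_into("<I", rec, pos, 0xFFFFFFFF)

for a in walk_mft_record(bytes(rec))["attributes"]:
    print(a)
# {'type': '$DATA', 'name': '', 'resident': True, 'size': 12}
# {'type': '$DATA', 'name': 'stream1', 'resident': True, 'size': 11}
```

An in_use value of False on a record that still holds attributes indicates a deleted file whose metadata may still be recoverable.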
Linux File Systems
- Linux File System Architecture
- Filesystem Hierarchy Standard (FHS)
  - The Filesystem Hierarchy Standard (FHS) defines the directory structure and its contents in Linux and Unix-like OSes
  - In the FHS, all files and directories are present under the root directory (represented by /)
- Extended File System (ext)
  - The extended file system (ext) was the first file system created specifically for the Linux OS, designed to overcome certain limitations of the Minix file system
  - It has a maximum partition size of 2 GB and a maximum filename size of 255 characters
  - It removes the two major Minix file system limitations: a maximum partition size of 64 MB and short filenames
  - The major limitation of this file system is that it does not support separate access, inode modification, and data-modification timestamps
  - It was replaced by the second extended file system (ext2)
- Second Extended File System (ext2)
  - ext2 is a standard file system that uses improved algorithms compared to ext, which greatly enhances its speed; further, it maintains additional time stamps
  - It maintains a special field in the superblock that keeps track of the file system status and identifies it as either clean or dirty
  - Its major shortcomings are the lack of journaling and the resulting risk of file system corruption if the system crashes while writing to ext2
  - ext2 Inode
    - The inode is a basic building block of the ext2 file system
    - Each file and directory is described by a single inode
    - The inodes for each block group are placed together in an inode table
  - ext2 Directories
    - ext2 directories are special files that map filenames to inode numbers, recording the access path to each file in the file system
    - These files contain a list of directory entries with the following information
      - Inode number of the entry
      - Length of the filename
      - Name of the file or directory
- Third Extended File System (ext3)
  - ext3 is a journaling version of the ext2 file system and is widely used in the Linux OS
  - It is an enhanced version of the ext2 file system
  - It uses file system maintenance utilities (such as fsck) for maintenance and repair, as in the ext2 file system
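  - The following command converts an ext2 file system to ext3 by adding a journal (replace <device> with the partition to convert, for example /dev/sda1)
    - /sbin/tune2fs -j <device>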
  - ext3 Features
    - Data Integrity
      - It provides stronger data integrity than ext2 for events that occur due to computer system shutdowns
    - Speed
      - Because ext3's journaling optimizes hard drive head motion, it has higher throughput than ext2 in most cases
    - Easy Transition
      - The user can easily convert an existing ext2 file system to ext3 without reformatting, gaining journaling and its performance benefits
- Journaling File System
  - Journaling file systems ensure data integrity on a computer
  - These file systems consist of a journal that records all the information on the updates that are ready to be applied to the file system before they are applied. This mechanism is referred to as journaling.
  - Journaling prevents data corruption by restoring the data on the hard disk to the state it existed in before the occurrence of a system crash or power failure. This helps the system to resume the completion of tasks or updates that were interrupted by an unexpected event.
  - After a system recovers from a crash or power failure, a journaling file system replays the committed updates recorded in the journal so that they are written to their intended locations
  - Journaling file systems provide great reliability in terms of minimizing data loss
  - ext3, ext4, ZFS, and XFS are examples of journaling file systems in Linux. Because of its stability, ext4 is the most commonly implemented file system on Linux systems.
- Fourth Extended File System (ext4)
  - ext4 is a journaling file system developed as the replacement of the commonly used ext3 file system
  - With the incorporation of new features, ext4 has significant advantages over ext3 and ext2 file systems, particularly in terms of performance, scalability, and reliability
  - It is supported by the Linux kernel from version 2.6.19 onwards
  - Key Features
    - File System Size→supports a maximum individual file size of 16TB and overall maximum ext4 file system size of 1EB (exabyte)
    - Extents→replaces the block mapping scheme used by ext2 and ext3, improving large-file performance and reducing fragmentation
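    - Delayed allocation→improves performance and reduces fragmentation by deferring block allocation until data is written to disk, so that larger amounts of data can be allocated at a time
    - Multi-block allocation→allocates multiple blocks in a single operation, helping files to be stored contiguously on a disk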
    - fsck speed→supports faster file system checking
    - Journal checksumming→uses checksums in the journal to improve reliability
    - Persistent pre-allocation→pre-allocates on-disk space for a file
    - Improved Timestamps→provides timestamps measured in nanoseconds
    - Backwards compatibility→makes it possible to mount ext3 and ext2 as ext4
Understanding Superblocks, Inodes, and Data Blocks
- Superblock
  - Superblock→stores information pertaining to the characteristics of a file system
  - It stores information such as file system size, file system type, blocks of the file system that are empty or filled, locations of inode tables and their sizes, block size of the file system, etc.
  - A superblock is highly critical to the working of a file system. If a superblock becomes corrupt, the OS will not be able to access the file system associated with it.
  - Therefore, ext2/3/4 file systems keep backup copies of the superblock in several block groups, so the file system can still be recovered if the primary superblock becomes corrupt
  - To view superblock information of a file system, use the command
    - dumpe2fs /dev/sda1 | grep -i superblock
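To see what `dumpe2fs` is reading, the Python sketch below decodes a few superblock fields directly from a raw image. It assumes the published ext2/3/4 on-disk layout (primary superblock 1024 bytes into the file system, little-endian fields, magic `0xEF53` at offset `0x38`) and the `sparse_super` feature for backup placement; `parse_ext_superblock`, `read_ext_superblock`, and `sparse_backup_groups` are hypothetical helper names, not part of any tool.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes into the file system
EXT_MAGIC = 0xEF53         # s_magic value for ext2/ext3/ext4


def parse_ext_superblock(sb: bytes) -> dict:
    """Decode selected fields of a raw 1024-byte ext2/3/4 superblock (little-endian)."""
    if len(sb) < 1024:
        raise ValueError("need 1024 bytes of superblock data")
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04X}: not an ext2/3/4 superblock")
    inodes_count, blocks_count_lo = struct.unpack_from("<II", sb, 0x00)
    first_data_block, log_block_size = struct.unpack_from("<II", sb, 0x14)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 0x20)
    (inodes_per_group,) = struct.unpack_from("<I", sb, 0x28)
    label = sb[0x78:0x88].split(b"\x00", 1)[0].decode("utf-8", "replace")
    return {
        "inodes_count": inodes_count,
        "blocks_count_lo": blocks_count_lo,       # low 32 bits of the block count
        "first_data_block": first_data_block,
        "block_size": 1024 << log_block_size,     # stored as log2(block size) - 10
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
        "volume_label": label,
    }


def read_ext_superblock(image_path: str, partition_offset: int = 0) -> dict:
    """Read the primary superblock from a raw (dd) image; partition_offset is in bytes."""
    with open(image_path, "rb") as f:
        f.seek(partition_offset + SUPERBLOCK_OFFSET)
        return parse_ext_superblock(f.read(1024))


def sparse_backup_groups(group_count: int) -> list:
    """Block groups holding superblock backups under sparse_super: 1 and powers of 3, 5, 7."""
    groups = set()
    for base in (3, 5, 7):
        g = 1
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)
```

A backup in group `g` starts at block `g * blocks_per_group + first_data_block`; with 4 KB blocks and 32768 blocks per group, the first backup is at block 32768, which can be passed to `e2fsck -b 32768` when working on a copy of the image.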
- Inode
  - Inode→stores metadata pertaining to a file or directory on the Linux filesystem
  - Each inode is identified by an inode number (index number). Every file is assigned an inode number when it is created, and that number is unique within its file system.
  - The inode stores attributes such as the file size, file type, permissions and access control, owner, date/time stamps, link count, and pointers to the data blocks that hold the file's contents. It does not store the file name, which is kept in the directory entry.
  - To view the assigned inode numbers of files or directories, run the command
    - ls -il
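The same inode metadata that `ls -il` prints is available programmatically. The sketch below uses Python's standard `os.lstat` and `stat` modules; `inode_info` is a hypothetical helper. Creating a hard link shows that two names can share one inode, because names live in directory entries rather than in the inode.

```python
import os
import stat


def _file_type(mode: int) -> str:
    """Translate the type bits of st_mode into a readable label."""
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    return "other"


def inode_info(path: str) -> dict:
    """Return inode metadata for path, similar to what `ls -il` and `stat` display."""
    st = os.lstat(path)  # lstat describes a symlink itself instead of its target
    return {
        "inode": st.st_ino,
        "type": _file_type(st.st_mode),
        "permissions": stat.filemode(st.st_mode),
        "links": st.st_nlink,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "ctime_ns": st.st_ctime_ns,
    }


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, inode_info(p))
```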
- Data Blocks
  - Data block→stores the actual contents of a file
  - A data block can be allocated only to one file in a file system
  - If a data block is not allocated to any file, the system treats it as available and allocates it to a file when needed
  - When a file is deleted, its data blocks are marked as free and can be allocated to another file. Their contents are not erased until they are overwritten, which is why deleted data can often be recovered.
  - Apart from file contents, data blocks also store directory contents, that is, the entries that map file names to inode numbers
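Because a freed block keeps its old contents until reused, investigators often read a specific block straight from an image. The Python sketch below converts a block number into a byte offset, assuming a raw image, a partition that starts at a known sector (for example, as reported by TSK's `mmls`), and a block size taken from the superblock. `block_offset` and `read_block` are hypothetical names; TSK's `blkcat` performs a similar job.

```python
def block_offset(block_number: int, block_size: int,
                 partition_start_sector: int = 0, sector_size: int = 512) -> int:
    """Byte offset of a file system block inside a whole-disk raw image."""
    return partition_start_sector * sector_size + block_number * block_size


def read_block(image_path: str, block_number: int, block_size: int,
               partition_start_sector: int = 0, sector_size: int = 512) -> bytes:
    """Return the raw contents of one data block, whether it is allocated or not."""
    with open(image_path, "rb") as f:
        f.seek(block_offset(block_number, block_size, partition_start_sector, sector_size))
        return f.read(block_size)
```

For example, block 1000 of a file system with 4096-byte blocks in a partition starting at sector 2048 lies at byte 2048 × 512 + 1000 × 4096 = 5,144,576 of the disk image.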
Mac OS X File Systems
- Hierarchical File System Plus (HFS+)
  - HFS+ is the successor to HFS and was the primary file system on Macintosh computers until APFS replaced it
  - It is also called Mac OS Extended (HFS Extended) and is one of the formats used in the Apple iPod
  - It supports large files and uses Unicode for naming items (files and folders)
  - HFS Plus allows users to
    - Use hard disk space efficiently
    - Use international-friendly (Unicode) filenames
    - Boot non-Mac OSes more easily
- HFS Plus Volumes
  - HFS+ volumes are divided into logical blocks (sectors) of 512 bytes each
  - These sectors are clustered into allocation blocks
  - The total number of allocation blocks depends on the volume size
  - The bulk of an HFS+ volume consists of seven types of areas
    - User file fork
    - Allocation file
    - Catalog file
    - Extents overflow file
    - Attributes file
    - Startup file
    - Unused space
- HFS Plus Journal
  - An HFS+ volume has an optional journal, which is replayed when the volume is mounted after a system crash
  - The journal restores the volume structures to a trustworthy state without scanning all of the structures. It consists of a journal header followed by a journal buffer that holds the transactions.
  - The journal info block (.journal_info_block) is stored as a file on the HFS+ volume’s root directory
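The volume header ties these pieces together: it records the allocation block size, the number of allocation blocks, whether the volume is journaled, and which allocation block holds the journal info block. The Python sketch below assumes the layout in Apple's HFS Plus volume format technical note (TN1150): a 512-byte big-endian header 1024 bytes into the volume, signature `H+` (or `HX` for HFSX), and journaling flagged by bit 13 of the attributes field. Function names are hypothetical.

```python
import struct

HFS_PLUS_HEADER_OFFSET = 1024   # volume header sits 1024 bytes from the volume start
JOURNALED_BIT = 13              # kHFSVolumeJournaledBit in the attributes field


def parse_hfsplus_header(hdr: bytes) -> dict:
    """Decode selected fields of a 512-byte HFS+/HFSX volume header (big-endian)."""
    signature = hdr[0:2]
    if signature not in (b"H+", b"HX"):
        raise ValueError(f"unexpected signature {signature!r}")
    version, attributes, _last_mounted, journal_info_block = struct.unpack_from(
        ">HIII", hdr, 2)
    file_count, folder_count, block_size, total_blocks, free_blocks = struct.unpack_from(
        ">IIIII", hdr, 32)
    return {
        "signature": signature.decode("ascii"),
        "version": version,
        "file_count": file_count,
        "folder_count": folder_count,
        "block_size": block_size,                  # allocation block size in bytes
        "total_blocks": total_blocks,
        "free_blocks": free_blocks,
        "volume_bytes": block_size * total_blocks,
        "journaled": bool(attributes >> JOURNALED_BIT & 1),
        # allocation block holding .journal_info_block, and its byte offset in the volume
        "journal_info_block": journal_info_block,
        "journal_info_offset": journal_info_block * block_size,
    }


def read_hfsplus_header(image_path: str, volume_offset: int = 0) -> dict:
    """Read the volume header from a raw image; volume_offset is in bytes."""
    with open(image_path, "rb") as f:
        f.seek(volume_offset + HFS_PLUS_HEADER_OFFSET)
        return parse_hfsplus_header(f.read(512))
```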
- Apple File System (APFS)
  - APFS is a file system developed by Apple and introduced in 2017 with iOS 10.3 and macOS High Sierra; it is used in those and later versions
  - It replaced HFS+ as the default file system and is suitable for all Apple OSes, including iOS, watchOS, tvOS, and macOS
  - APFS consists of two layers
    - The container layer→organizes file system layer information and stores higher-level information such as volume metadata, encryption state, and snapshots of the volume
    - The file system layer→consists of data structures that store information such as file metadata, file content, and directory structures
  - APFS supports TRIM operations, extended file attributes, sparse files, fast directory sizing, snapshots, cloning, greater timestamp granularity, and the copy-on-write metadata feature
  - It overcomes the disadvantages of the older file system, HFS+, which include limited functionality, weak security, limited capacity, and poor optimization for SSDs
  - Drawbacks of APFS
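    - Because of its copy-on-write design, APFS is prone to file fragmentation and can perform poorly on HDDs; it is optimized primarily for SSDs
    - Its initial release lacked NVRAM (non-volatile RAM) support and did not support Apple Fusion Drives (Fusion Drive support was added in macOS Mojave)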
- Major Components of APFS
  - Container Superblock→the highest level in the file system; it holds information such as the block size, the total number of blocks, and previous checkpoints
  - Checkpoint Superblock Descriptor (CSBD)→the block preceding the Checkpoint Superblock (CSB); it contains data about APFS metadata structures, including the location of the Bitmap Structure (BMS), which is important during a forensic investigation
  - Bitmap Structures→record used and unused blocks. A single bitmap structure covers the whole container and is shared by all volumes in it.
  - Volume Superblock→the highest level in a volume; it holds data about that volume
  - File and folder B-Tree→functions similarly to the catalog file in HFS Plus and records the files and folders in the volume
  - Extents B-Tree→records extents, which are references to file content giving the block where the data starts and its length in blocks. It is a separate structure and is part of the snapshot feature.
  - Snapshots→represent the state of a volume at a specific point in time
  - Checkpoints→every container superblock has checkpoints. The difference between a checkpoint and a snapshot is that users can restore the file system from stored snapshots using the file system API (Application Programming Interface)
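The container superblock and its checkpoints can be recognized by their magic value and transaction ID (xid): the valid copy with the highest xid is the most recent checkpoint. The Python sketch below assumes the layout in Apple's APFS reference (a 32-byte object header with the xid at offset 16, then the `NXSB` magic, block size, and block count, all little-endian). It skips checksum verification, which real tools perform first; function names are hypothetical.

```python
import struct

NX_MAGIC = b"NXSB"   # container superblock magic; volume superblocks use b"APSB"


def parse_apfs_container_superblock(block: bytes) -> dict:
    """Decode a few fields of an APFS container superblock (little-endian)."""
    # Each APFS object starts with a 32-byte header: checksum, object ID, xid, type, subtype.
    if block[32:36] != NX_MAGIC:
        raise ValueError("NXSB magic not found at offset 32")
    oid, xid = struct.unpack_from("<QQ", block, 8)
    (block_size,) = struct.unpack_from("<I", block, 36)
    (block_count,) = struct.unpack_from("<Q", block, 40)
    return {
        "oid": oid,
        "xid": xid,   # transaction ID; a higher xid means a more recent checkpoint
        "block_size": block_size,
        "block_count": block_count,
        "container_bytes": block_size * block_count,
    }


def latest_superblock(blocks):
    """Pick the most recent container superblock among candidate blocks (highest xid)."""
    parsed = []
    for b in blocks:
        try:
            parsed.append(parse_apfs_container_superblock(b))
        except ValueError:
            continue  # not a container superblock
    return max(parsed, key=lambda sb: sb["xid"], default=None)
```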
- APFS vs. HFS Plus
CD-ROM/DVD File System
- The ISO (International Organization for Standardization) 9660→defines a file system for CD-ROM and DVD-ROM media
- It enables data exchange between OSes such as Microsoft Windows, Mac OS, and UNIX-based OSes
- Common extensions to ISO 9660 were developed to provide the following features:
  - Rock Ridge supports longer ASCII-coded names and UNIX permissions
  - Joliet supports Unicode filenames (such as those in non-Roman scripts)
  - El Torito supports bootable CDs
- ISO 13490 extends ISO 9660 with multisession support on a disc
- Windows supports two types of file systems on CD-ROMs and Digital Versatile Disks (DVDs)
  - Universal Disk Format (UDF)
  - Compact Disc File System (CDFS)
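On an ISO 9660 disc, the extensions above are visible in the volume descriptor set that starts at sector 16: a Boot Record descriptor indicates El Torito, and a Supplementary Volume Descriptor usually carries Joliet names. The Python sketch below walks that set and reads the Primary Volume Descriptor, assuming 2048-byte sectors and the standard descriptor layout (identifier `CD001` at bytes 1 to 5, volume name at bytes 40 to 71, both-endian size fields). Function names are hypothetical.

```python
import struct

SECTOR_SIZE = 2048          # logical sector size on CD/DVD media
FIRST_DESCRIPTOR = 16       # volume descriptors start at sector 16
DESCRIPTOR_TYPES = {
    0: "Boot Record (e.g. El Torito)",
    1: "Primary Volume Descriptor",
    2: "Supplementary Volume Descriptor (e.g. Joliet)",
    3: "Volume Partition Descriptor",
    255: "Volume Descriptor Set Terminator",
}


def list_volume_descriptors(f) -> list:
    """Walk the ISO 9660 volume descriptor set of a binary file object."""
    found = []
    sector = FIRST_DESCRIPTOR
    while True:
        f.seek(sector * SECTOR_SIZE)
        desc = f.read(SECTOR_SIZE)
        if len(desc) < SECTOR_SIZE or desc[1:6] != b"CD001":
            break  # not (or no longer) a valid descriptor
        found.append((sector, desc[0], DESCRIPTOR_TYPES.get(desc[0], "unknown")))
        if desc[0] == 255:
            break
        sector += 1
    return found


def parse_primary_volume_descriptor(desc: bytes) -> dict:
    """Extract the volume name and size from a Primary Volume Descriptor."""
    if desc[0] != 1 or desc[1:6] != b"CD001":
        raise ValueError("not an ISO 9660 Primary Volume Descriptor")
    # Numeric fields are stored twice (little-endian, then big-endian); read the LE copy.
    (volume_blocks,) = struct.unpack_from("<I", desc, 80)
    (block_size,) = struct.unpack_from("<H", desc, 128)
    return {
        "system_id": desc[8:40].decode("ascii", "replace").rstrip(" \x00"),
        "volume_id": desc[40:72].decode("ascii", "replace").rstrip(" \x00"),
        "volume_blocks": volume_blocks,
        "block_size": block_size,
        "volume_bytes": volume_blocks * block_size,
    }
```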
Virtual File System (VFS) and Universal Disk Format (UDF) File System
- Virtual File System (VFS)
  - A VFS is a software layer that forms an interface between the OS kernel and the concrete file systems
  - VFS acts as an abstraction layer and gives client applications access to the various concrete file systems of local and network storage devices
  - Examples of file systems accessed through such a layer include VMware Virtual Machine File System (VMFS), New Technology File System (NTFS), Global File System (GFS), and Oracle Cluster File System (OCFS)
- Universal Disk Format File System (UDF)
  - UDF is a file system specification defined by the Optical Storage Technology Association (OSTA), aimed at replacing the ISO 9660 file system on optical media and also FAT on removable media
  - It is an open, vendor-neutral specification based on the ISO/IEC 13346 and ECMA-167 standards, which define how data is stored and interchanged on a wide variety of optical media

3.5 Examine File System Using Autopsy and The Sleuth Kit Tools

File System Analysis Using Autopsy
- Autopsy→is a digital forensics platform and graphical interface to The Sleuth Kit (TSK) and other digital forensics tools. It can be used to investigate activities on a computer.
- Some of its modules provide the following functions
  - Timeline analysis
    - Advanced graphical event viewing interface
  - Hash filtering
    - Flags known bad files and ignores known good files
  - Keyword search
    - Indexed keyword search to find files that mention relevant terms
  - Web artifacts
    - Extracts history, bookmarks, and cookies from Firefox, Chrome, and Internet Explorer
  - Data carving
    - Recovers deleted files from unallocated space using PhotoRec
  - Multimedia
    - Extracts EXIF metadata from pictures and videos
  - Indicators of compromise
    - Scans a computer using Structured Threat Information Expression (STIX)
File System Analysis Using the Sleuth Kit (TSK)
- The Sleuth Kit (TSK) is a library and a collection of command-line tools that allow the investigation of volume and file system data
- The file system tools allow you to examine file systems of a suspect computer in a non-intrusive fashion
- The volume system (media management) tools allow you to examine the layout of disks and other media
- It supports DOS partitions, BSD partitions (disk labels), Mac partitions, Sun slices (Volume Table of Contents), and GPT disks
- It analyzes disk images and file systems in raw (e.g., dd), Expert Witness (e.g., EnCase), and AFF formats
- It supports the NTFS, FAT, ExFAT, UFS 1, UFS 2, ext2, ext3, ext4, HFS, ISO 9660, and YAFFS2 file systems
- The Sleuth Kit (TSK): fsstat
  - The fsstat tool in TSK→retrieves and shows details associated with a file system
    - Syntax
      - fsstat [-f fstype ] [-i imgtype] [-o imgoffset] [-b dev_sector_size] [-tvV] image [images]
- The Sleuth Kit (TSK): istat
  - The istat tool in TSK→displays the uid, gid, mode, size, link number, MAC times, and all the disk units a structure has allocated
    - Syntax
      - istat [-B num ] [-f fstype ] [-i imgtype] [-o imgoffset] [-b dev_sector_size] [-vV] [-z zone ] [-s seconds ] image [images] inode
- The Sleuth Kit (TSK): fls and img_stat
  - The fls tool in TSK→lists file and directory names in a disk image
    - Syntax
      - fls [-adDFlpruvV] [-m mnt ] [-z zone ] [-f fstype ] [-s seconds ] [-i imgtype ] [-o imgoffset ] [-b dev_sector_size] image [images] [ inode ]
  - The img_stat tool→displays details of an image file
    - Syntax
      - img_stat [-i imgtype] [-b dev_sector_size] [-tvV] image [images]
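The `-m` option of fls writes its listing in the body-file format read by TSK's `mactime` (for example, `fls -r -m / image.dd > body.txt`, then `mactime -b body.txt`). The Python sketch below mimics the core of that timeline step. It assumes the TSK 3.x and later body-file layout `MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime` with Unix epoch seconds; `parse_body_line` and `timeline` are hypothetical names, and each output row is `(timestamp, macb flags, name)`.

```python
from collections import defaultdict

# Fields after the file name in a TSK 3.x+ body-file line.
FIELDS = ("inode", "mode", "uid", "gid", "size", "atime", "mtime", "ctime", "crtime")
TIME_FLAGS = (("m", "mtime"), ("a", "atime"), ("c", "ctime"), ("b", "crtime"))


def parse_body_line(line: str) -> dict:
    """Split one body-file line; names may contain '|', so fields are taken from both ends."""
    _md5, rest = line.rstrip("\n").split("|", 1)
    name, *values = rest.rsplit("|", len(FIELDS))
    record = dict(zip(FIELDS, values))
    record["name"] = name
    return record


def timeline(lines) -> list:
    """Group MAC(B) times per file and timestamp, in the spirit of TSK's mactime."""
    grouped = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        rec = parse_body_line(line)
        for flag, key in TIME_FLAGS:
            ts = int(float(rec[key]))
            if ts > 0:  # 0 means the time was not recorded
                grouped[(ts, rec["name"])].add(flag)
    return [
        (ts, "".join(f if f in flags else "." for f in "macb"), name)
        for (ts, name), flags in sorted(grouped.items())
    ]


if __name__ == "__main__":
    import sys
    from datetime import datetime, timezone

    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as body:
            for ts, flags, name in timeline(body):
                print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(), flags, name)
```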
3.6 Understand Storage Systems
RAID Storage System
- Redundant Array of Independent Disks (RAID) is a technology that simultaneously uses multiple smaller disks, which function as a single large volume
- By spreading data (and, at most RAID levels, redundant information) across several hard disks, it reduces the risk of losing all data when a disk fails or is damaged
- This technology is developed to
  - Maintain a large amount of data storage
  - Achieve a greater level of input/output performance
  - Achieve a greater reliability through data redundancy
- Levels of RAID Storage System
  - RAID 0
    - Data is split into blocks and written evenly across multiple hard drives
    - It improves I/O performance by spreading the I/O load across many channels and disk drives
    - If any drive fails, data recovery is not possible
    - It does not provide data redundancy
    - It requires a minimum of two drives for setting up
  - RAID 1
    - It consists of two disks for each volume and is designed for data recovery in the event of disk failure
    - The contents of the two disks are identical
    - It ensures that data is not lost and assists in preventing computer downtime
  - RAID 2
    - It provides rapid access and increased storage by configuring two or more disks as one large volume, similar to RAID 0
    - Data is striped across the disks at the bit level
    - Error correcting code (ECC) is used to verify whether the writes are successful
    - It has better data-integrity checking but is slower than RAID 0
  - RAID 3
    - It uses data striping and dedicated parity, and requires at least three disks
    - Data is striped at a byte level across multiple drives, and one drive is set to store parity information
    - If any drive fails, data recovery and error correction are possible through the parity drive
  - RAID 5
    - Data is striped at the block level across multiple drives, and parity information is distributed among all member drives
    - Writes are slower because parity must be recalculated and written for every change
    - It requires a minimum of three drives for setup
  - RAID 10 or Mirrored Striping
    - It is a combination of RAID 0 (striping of volume data) and RAID 1 (disk mirroring) and requires at least four drives to implement
    - It has the same fault tolerance as RAID 1 and the same overhead as mirroring alone
    - It allows mirroring of disks in pairs for redundancy and improved performance, and then data is striped across multiple disks for maximum performance
  - RAID 6
    - RAID 6 (also known as double-parity RAID) is a type of RAID level in which data is striped across various physical drives and it uses dual parity to achieve better data redundancy than RAID 5
    - RAID 6 is similar to RAID 5 but offers high fault and drive-failure tolerance
    - Its configuration requires a minimum of 4 drives; each stripe stores two parity blocks distributed across the disks, so the array can survive a double disk failure without data loss
  - RAID 1E
    - It is a combination of RAID 1 (data mirroring) and RAID 0 (data striping)
    - It requires a minimum of 3 drives
    - RAID 1E extends RAID 1 mirroring across an odd number of disks; it survives any single drive failure and, depending on which drives fail, some multiple-drive failures
  - RAID 5E
    - It is similar to RAID 5 but includes an integrated hot spare drive that can be used for input/output operations
    - It requires a minimum of 4 drives
  - RAID 5EE
    - It is similar to RAID level 5E and includes an additional hot spare drive for input/output operations
    - It requires a minimum of 4 drives
    - In RAID 5EE, the spare area is distributed next to parity stripes
  - RAID 50
    - It is a combination of RAID 5 (striping with parity) and RAID 0 (disk striping)
    - Its configuration requires a minimum of 6 drives
    - It provides a high degree of fault tolerance since one drive in each sub-array may fail without the loss of data
  - RAID 60
    - RAID 60 is a combination of RAID 6 (dual distributed parity) and RAID 0 (disk striping)
    - Its configuration requires a minimum of 8 drives (two RAID 6 sub-arrays of at least 4 drives each)
    - It provides a high degree of fault tolerance since each RAID 6 sub-array can survive a double disk failure without losing any data
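Reconstructing an array from individual member images is a common forensic task. The Python sketch below illustrates two of the mechanisms above: locating and de-striping RAID 0 data, and rebuilding a missing block with XOR parity as used by RAID 3 and RAID 5. It assumes the member order and chunk size are already known, that member images contain no controller metadata before the data area, and it does not model RAID 5 parity rotation; all function names are hypothetical.

```python
from functools import reduce


def raid0_locate(logical_block: int, disks: int, chunk_blocks: int) -> tuple:
    """Map a RAID 0 logical block to (disk index, block on that disk)."""
    chunk = logical_block // chunk_blocks   # which chunk of the volume
    within = logical_block % chunk_blocks   # position inside that chunk
    disk = chunk % disks                    # chunks rotate round-robin over the disks
    row = chunk // disks                    # stripe row on every disk
    return disk, row * chunk_blocks + within


def destripe(disk_images: list, chunk_size: int) -> bytes:
    """Rebuild a RAID 0 volume from member images given in array order."""
    rows = min(len(d) for d in disk_images) // chunk_size
    out = bytearray()
    for row in range(rows):
        for disk in disk_images:
            out += disk[row * chunk_size:(row + 1) * chunk_size]
    return bytes(out)


def xor_blocks(blocks) -> bytes:
    """Byte-wise XOR of equal-length blocks: the parity used by RAID 3 and RAID 5."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))
```

Because XOR is its own inverse, XOR-ing the surviving data blocks of a stripe with its parity block yields the missing block.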
- Just a Bunch of Drives/Disks (JBOD)
  - JBOD (an acronym for 'Just a Bunch Of Drives/Disks') is a storage configuration that combines multiple hard disks without using a RAID array
  - It is defined as the concatenation of multiple hard disks of varying capacities and specifications into a single, large logical drive. This integration of disks is referred to as “Spanning”.
  - Alternatively, each drive in a JBOD can be presented to the host computer as a separate volume
  - It does not support redundancy, parity check, or striping, unlike RAID configurations
  - In the event of a single disk failure, the entire system does not fail, and the data available on the other disks remain intact
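In a spanned JBOD, disks are simply concatenated, so a volume sector maps to exactly one disk and a failed disk takes out only its own contiguous range. The Python sketch below shows that mapping, assuming the disk order and sizes (in sectors) are known and no per-disk metadata precedes the data; the function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def span_locate(lba: int, disk_sizes: list) -> tuple:
    """Map a sector of a spanned (concatenated) volume to (disk index, sector on that disk)."""
    starts = [0, *accumulate(disk_sizes)]   # starts[i] = first volume sector stored on disk i
    if not 0 <= lba < starts[-1]:
        raise ValueError("sector is beyond the end of the spanned volume")
    disk = bisect_right(starts, lba) - 1
    return disk, lba - starts[disk]


def sectors_on_disk(disk: int, disk_sizes: list) -> range:
    """Volume sectors that become unreadable if only this disk fails."""
    start = sum(disk_sizes[:disk])
    return range(start, start + disk_sizes[disk])
```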
- Host Protected Areas (HPA) and Device Configuration Overlays (DCO)
  - Host Protected Areas (HPA) and Device Configuration Overlays (DCO) are the hidden areas of a hard disk
  - HPA
    - HPA is the reserved area on an HDD meant to store data in a way that the user, BIOS, or OS cannot modify, change, or access it
    - HDD utilities, diagnostic tools, boot sector code, etc. may be stored in this area
  - DCO
    - DCO is an additional hidden area available on modern hard disks, which enables system vendors to buy HDDs of varying sizes from different manufacturers and configure all of them to have an equal number of sectors
    - It can also be used to enable/disable features on the HDD
  - To hide information, intruders may use specialized tools to modify and write to the HPA and DCO areas on the HDD
  - HPAs and DCOs are of concern during an investigation as many tools fail to detect their presence
  - Investigators can use tools such as EnCase, TAFT (an ATA (IDE) forensics tool), TSK, etc. to detect and image HPAs and/or DCOs
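One quick HPA check on Linux, not listed above, is `hdparm -N /dev/sdX`, which reports the current and native maximum sector counts; a difference between them indicates an HPA. Given without a value, `-N` only reads, while supplying a value changes the HPA, so only the read form belongs in an investigation. The Python sketch below parses that report, assuming the typical `max sectors = current/native` output line, which may differ between hdparm versions. `hpa_from_hdparm` is hypothetical, and this check does not detect a DCO.

```python
import re

MAX_SECTORS = re.compile(r"max sectors\s*=\s*(\d+)/(\d+)")


def hpa_from_hdparm(output: str, sector_size: int = 512) -> dict:
    """Estimate HPA size from `hdparm -N` text (assumed format: 'max sectors = current/native')."""
    match = MAX_SECTORS.search(output)
    if match is None:
        raise ValueError("no 'max sectors' line found")
    visible, native = (int(g) for g in match.groups())
    hidden = native - visible
    return {
        "visible_sectors": visible,
        "native_sectors": native,
        "hidden_sectors": hidden,
        "hidden_bytes": hidden * sector_size,
        "hpa_present": hidden > 0,
    }
```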
NAS/SAN Storage
- Network-Attached Storage (NAS)
  - Network-Attached Storage (NAS)→is a centralized storage device in which one or more servers with dedicated multiple hard drives in a RAID configuration are used to store and share data with clients on a shared network
  - It is present on a Local Area Network (LAN) as an independent network node with its own unique IP address, and clients access the shared storage through a standard Ethernet connection
- Storage Area Network (SAN)
  - A Storage Area Network (SAN)→is a dedicated high-speed network that provides access to consolidated block-level storage
  - Its architecture makes a network of storage devices accessible to multiple servers as if they were locally attached drives, avoiding bottlenecks on the general-purpose network
  - It consists of components such as interconnected hosts, multiple switches, and storage devices, typically connected via Fibre Channel technology because it supports high data rates and uninterrupted data access
  - Benefits
    - Highly scalable
    - Serverless backup
    - Faster data rate but expensive to maintain
- Differences between NAS and SAN

3.7 Understand Encoding Standards and Hex Editors

Character Encoding Standard: ASCII
- American Standard Code for Information Interchange (ASCII)→is a character encoding scheme developed from telegraphic codes
- ASCII encodes 128 specified characters into 7-bit integers. The encoded characters include the following
  - Numbers 0 to 9
  - Lowercase letters a to z
  - Uppercase letters A to Z
  - Basic punctuation symbols
  - Control codes that originated with Teletype machines
  - Space
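- The ASCII table is commonly divided into three ranges: non-printable control codes (0 to 31, plus 127), printable lower ASCII (32 to 126), and higher or extended ASCII (128 to 255). Codes 128 to 255 are not part of the 7-bit standard, and their meaning depends on the code page in use.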
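A hex editor shows exactly these code values. The Python sketch below pairs each byte with its hex value and its range; `ascii_class` and `describe_bytes` are hypothetical helper names.

```python
def ascii_class(code: int) -> str:
    """Classify a byte value as a control code, printable 7-bit ASCII, or extended."""
    if not 0 <= code <= 255:
        raise ValueError("not a byte value")
    if code < 32 or code == 127:
        return "control (non-printable)"
    if code < 128:
        return "printable ASCII"
    return "extended (code-page dependent)"


def describe_bytes(data: bytes) -> list:
    """Pair each byte with its hex value and class, as a hex editor would display it."""
    return [(f"0x{b:02X}", ascii_class(b)) for b in data]
```

For instance, the letter `A` is 65 decimal, shown as `41` in a hex editor.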
Character Encoding Standard: UNICODE
- UNICODE is an international encoding standard which supports consistent encoding, representation, and management of text expressed in many writing systems
- It provides a unique number for every character, irrespective of the platform, program, and language
- UTF-8, UTF-16, and UTF-32 are the most widely used Unicode Transformation Format (UTF) encodings
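The same characters produce different byte sequences under each encoding, which matters when searching a disk image for text. The Python sketch below uses the standard codecs with an explicit byte order (so no byte order mark is added); `show_encodings` is a hypothetical name.

```python
def show_encodings(text: str) -> dict:
    """Hex bytes of text under common Unicode encodings (explicit byte order, no BOM)."""
    return {
        codec: text.encode(codec).hex(" ")
        for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le")
    }
```

For `"A€"`, UTF-8 gives `41 e2 82 ac`, while UTF-16LE gives `41 00 ac 20`. Windows stores many strings, including NTFS file names, as UTF-16LE, so an ASCII letter typically appears in a hex editor followed by a `00` byte.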
OFFSET
- In computing, an offset is the distance (usually in bytes) from a reference point, typically the start of a file or a base memory address
- It is the value added to the base address to derive the address of a specific element in the same object/dataset
- Example:
  - If “A” denotes address 80, then the expression A+20 implies the address 100, where 20 in the expression is the offset
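The base-plus-offset idea is how investigators jump to a location in a file or disk image. The Python sketch below computes a sector's byte offset and reads bytes at a given offset; `sector_to_offset` and `read_at` are hypothetical names, and 512-byte sectors are assumed by default.

```python
def sector_to_offset(sector: int, sector_size: int = 512, base: int = 0) -> int:
    """Byte offset of a sector: the base address plus sector number times sector size."""
    return base + sector * sector_size


def read_at(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting `offset` bytes from the beginning of a file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

With a base of 80 and an offset of 20 (one-byte units), `sector_to_offset(20, sector_size=1, base=80)` returns 100, matching the example above; sector 63 with 512-byte sectors starts at byte 32256.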
Understanding Hex Editors
- Hex editors→are programs used to examine or modify the physical (i.e., byte per byte) structure of a binary file
- Usually, hex editors have three areas
  - Address area→located on the left and displays the address of the first byte of each line, usually in hexadecimal format
  - Hexadecimal area→located at the center and lists each byte of the file in a table, usually with 16 bytes per line
  - Character area→located on the right and displays the ASCII representation of each of the bytes in the hexadecimal area
- In forensics, hex editors are used to view stored or deleted data from both files and disk sectors
- In general, investigators use hex editors to examine evidence in specific parts of a disk
- Apart from the hexadecimal view of the data, many hex editors also display data both in binary and ASCII forms
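The three areas of a hex editor can be reproduced in a few lines. The Python sketch below renders bytes with an address column, 16 hex bytes per line, and a character column in which non-printable bytes appear as dots; `hexdump` is a hypothetical name, not a specific editor's output format.

```python
def hexdump(data: bytes, width: int = 16, start: int = 0) -> str:
    """Render data in the three-area layout used by hex editors."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        address = f"{start + i:08X}"                                          # address area
        hex_area = " ".join(f"{b:02X}" for b in chunk)                        # hexadecimal area
        char_area = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)  # character area
        lines.append(f"{address}  {hex_area:<{width * 3 - 1}}  {char_area}")
    return "\n".join(lines)
```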
Understanding Hexadecimal Notation
- Hexadecimal numeral system, also known as hex, is a numeral system with base 16
- In hexadecimal notation, the digits 0 to 9 represent the values zero to nine, and the letters A, B, C, D, E, and F represent the values ten to fifteen
- Example:
  - 2BA in hexadecimal is the same as 0010 1011 1010 in binary
- Because 16 = 2^4, each hexadecimal digit corresponds to exactly four binary digits, so hexadecimal is a compact way to write binary values
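Working the example by hand: 2BA = 2 × 16² + 11 × 16 + 10 = 512 + 176 + 10 = 698 in decimal, and expanding each digit into four bits gives 0010 1011 1010. The Python sketch below performs both conversions; the function names are hypothetical.

```python
def hex_to_decimal(hex_str: str) -> int:
    """Positional evaluation in base 16: each digit is worth 16 times the one to its right."""
    total = 0
    for digit in hex_str:
        total = total * 16 + int(digit, 16)
    return total


def hex_to_nibbles(hex_str: str) -> str:
    """Expand each hex digit into its 4-bit binary group."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_str)
```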

3.8 Analyze Popular File Formats Using Hex Editor

Image File Analysis: JPEG
- The JPEG (Joint Photographic Experts Group) is a commonly used method to compress photographic images
- It uses lossy compression to reduce file size; stronger compression produces smaller files at the cost of some image quality
Image File Analysis: BMP
- The Bitmap (BMP) is a standard file format for a Windows Device Independent Bitmap (DIB) file
- The dimensions and color depth of these images vary; color depth ranges from 1 bit per pixel (black and white) to 24 bits per pixel (about 16.7 million colors)
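In a hex editor, a BMP file starts with the bytes `42 4D` ("BM") followed by fixed-position header fields. The Python sketch below decodes the file header and the start of the common 40-byte BITMAPINFOHEADER, assuming little-endian fields at their standard offsets (file size at 2, pixel data offset at 10, width at 18, height at 22, bits per pixel at 28); `parse_bmp_header` is a hypothetical name.

```python
import struct


def parse_bmp_header(data: bytes) -> dict:
    """Decode the BITMAPFILEHEADER and the start of a BITMAPINFOHEADER (little-endian)."""
    if data[:2] != b"BM":
        raise ValueError("missing 'BM' signature")
    (file_size,) = struct.unpack_from("<I", data, 2)
    (pixel_offset,) = struct.unpack_from("<I", data, 10)
    dib_size, width, height, _planes, bpp = struct.unpack_from("<IiiHH", data, 14)
    return {
        "file_size": file_size,
        "pixel_data_offset": pixel_offset,
        "dib_header_size": dib_size,                # 40 for a BITMAPINFOHEADER
        "width": width,
        "height": abs(height),
        "top_down": height < 0,                     # negative height: rows stored top-down
        "bits_per_pixel": bpp,
        "max_colors": 2 ** bpp,
        "row_stride": (bpp * width + 31) // 32 * 4,  # each pixel row is padded to 4 bytes
    }
```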
Hex View of Popular Image File Formats
- GIF is a file format that uses up to 8 bits per pixel, allowing up to 256 colors per frame
- It uses lossless (LZW) data compression to maintain the visual quality of the image
- The PNG is a lossless data compression image format, intended to replace the GIF and TIFF formats
- It supports the following:
  - Indexed/Palette-based images (24-bit RGB or 32-bit RGBA colors)
  - Grayscale images (with or without alpha channel)
  - Transparency (both normal and alpha)
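When examining these formats in a hex editor, the first bytes (the file signature or "magic number") identify the format regardless of the file extension: `FF D8 FF` for JPEG, `89 50 4E 47 0D 0A 1A 0A` for PNG, `GIF87a` or `GIF89a` for GIF, and `BM` for BMP. The Python sketch below checks these signatures and flags files whose extension does not match; the names are hypothetical, and the two-byte BMP signature can produce false positives.

```python
import os

SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),   # only two bytes, so expect false positives
)
EXTENSIONS = {"PNG": {".png"}, "JPEG": {".jpg", ".jpeg"}, "GIF": {".gif"}, "BMP": {".bmp"}}


def identify_image(data: bytes):
    """Return the format whose signature matches the first bytes, or None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def extension_mismatch(filename: str, data: bytes) -> bool:
    """True when the header identifies a known image format that the extension does not match."""
    kind = identify_image(data)
    suffix = os.path.splitext(filename)[1].lower()
    return kind is not None and suffix not in EXTENSIONS[kind]
```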
PDF File Analysis
- Attackers use PDF (Portable Document Format) and Microsoft Office (Word, PowerPoint, and Excel) files as attack vectors because of their wide usage by individuals and organizations
- Thus, it is essential for an investigator to understand the PDF and Microsoft Office file formats and structures, which may assist during malicious document analysis
- PDF File Structure
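A PDF file consists of a header (`%PDF-` plus a version), a body of numbered objects (`n g obj ... endobj`), a cross-reference table (`xref`), and a trailer ending with `startxref` and `%%EOF`. Incremental updates append new sections, so several `%%EOF` markers can indicate earlier revisions. The Python sketch below is a crude byte scan of that structure and of two keys often abused in malicious documents (`/JavaScript` or `/JS`, and `/OpenAction`). It is an illustration under stated assumptions: it misses objects inside compressed object streams and obfuscated names, and `pdf_overview` is a hypothetical name.

```python
import re


def pdf_overview(data: bytes) -> dict:
    """Crude byte-level survey of a PDF's structure and of keys often abused by malware."""
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    if header is None:
        raise ValueError("no %PDF- header in the first 1024 bytes")
    return {
        "version": header.group(1).decode("ascii"),
        # body: indirect objects written as "<number> <generation> obj ... endobj"
        "objects": len(re.findall(rb"\d+\s+\d+\s+obj\b", data)),
        # each incremental update appends a new xref/trailer section ending in %%EOF
        "eof_markers": data.count(b"%%EOF"),
        "startxref_offsets": [int(n) for n in re.findall(rb"startxref\s+(\d+)", data)],
        "javascript": bool(re.search(rb"/(?:JavaScript|JS)\b", data)),
        "open_action": b"/OpenAction" in data,
    }
```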
Word File Analysis
- MS Office Documents – File Format
  - Binary File Format
    - Usually, attackers target MS Office documents of the binary format because they are still in use and the file structures are highly complex
- Microsoft Word Binary File Structure (.doc)
  - Word Document Stream/Main Stream
    - The main stream (WordDocument) contains the binary data of the Word document and the Word file header, known as the File Information Block (FIB), located at offset 0
    - The FIB contains information about the document, specifies the file length, and holds pointers to elements in the document file
  - Summary Information Streams
    - The summary information is stored in two streams: SummaryInformation and DocumentSummaryInformation
  - Table Stream (0Table or 1Table)
    - It contains data referenced from the FIB and other parts of the file
    - It stores various PLCs (plexes of character positions) and tables that define the document's structure
    - It has a predefined structure only for encrypted files
  - Data Stream (Optional)
    - It has no predefined structure
    - It contains data referenced from the FIB in the main stream or from other parts of the file
  - Object Streams
    - They hold binary data for embedded OLE objects within the .doc file
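The streams above live inside an OLE2 Compound File Binary container, whose header begins with `D0 CF 11 E0 A1 B1 1A E1`. The newer .docx, .xlsx, and .pptx formats are instead ZIP packages (starting with `50 4B 03 04`) containing `[Content_Types].xml`, and macro-enabled packages store VBA code in a `vbaProject.bin` part. The Python sketch below tells the containers apart without trusting the file extension. Listing the streams inside an OLE2 file would need a compound-file parser such as the third-party olefile package. `office_container` is a hypothetical name.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")   # Compound File Binary (OLE2) header
ZIP_MAGIC = b"PK\x03\x04"
OOXML_PARTS = (("word/", "Word"), ("xl/", "Excel"), ("ppt/", "PowerPoint"))


def office_container(data: bytes) -> dict:
    """Classify an Office file by its container type, ignoring the file extension."""
    if data.startswith(OLE_MAGIC):
        return {"container": "OLE2 compound file (binary .doc/.xls/.ppt)"}
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
        if "[Content_Types].xml" not in names:
            return {"container": "ZIP archive (not OOXML)"}
        application = "unknown"
        for prefix, label in OOXML_PARTS:
            if any(n.startswith(prefix) for n in names):
                application = label
                break
        return {
            "container": "OOXML package (ZIP)",
            "application": application,
            "has_vba_project": any(n.endswith("vbaProject.bin") for n in names),
        }
    return {"container": "unknown"}
```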
PowerPoint File Analysis
- Microsoft PowerPoint Binary File Structure (.ppt)
  - Current User Stream
    - It holds the CurrentUserAtom record, which identifies the name of the last user who opened or modified the PPT file and the location of the most recent user edit
  - PowerPoint Document Stream
    - It contains information about the presentation layout and its contents
  - Pictures Stream (Optional)
    - It contains information about image files embedded in the presentation
  - Summary Information Streams (Optional)
    - The summary information is stored in two storage streams: SummaryInformation and DocumentSummaryInformation
Excel File Analysis
- Microsoft Excel File Structure (.xls/.xlsx)
Hex View of Other Popular File Formats
Hex View of Popular Video File Formats
Hex View of Popular Audio File Formats

John Tai
