What is a File?
A file is a named collection of related data kept on secondary storage. The file system makes that storage persistent and organizes files into directories.
File Attributes
- Name.
- Identifier / inode number.
- Type (text, executable, directory).
- Location (disk blocks).
- Size.
- Protection (rwx for owner/group/others).
- Timestamps (create, modify, access).
- Owner / group.
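On POSIX systems most of these attributes can be read with stat(2). The following is a minimal sketch, assuming a POSIX environment; print_attributes is a hypothetical helper name, not a standard function.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Print a few attributes of the file at `path`.
 * Returns 0 on success, -1 if stat() fails.
 * (print_attributes is an illustrative helper, not a standard function.) */
int print_attributes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    printf("inode number : %llu\n", (unsigned long long)st.st_ino);
    printf("type         : %s\n",
           S_ISDIR(st.st_mode) ? "directory" :
           S_ISREG(st.st_mode) ? "regular file" : "other");
    printf("size (bytes) : %lld\n", (long long)st.st_size);
    printf("permissions  : %03o\n", (unsigned)(st.st_mode & 0777));
    printf("owner / group: %u / %u\n", (unsigned)st.st_uid, (unsigned)st.st_gid);
    printf("hard links   : %lu\n", (unsigned long)st.st_nlink);
    printf("modified     : %lld\n", (long long)st.st_mtime);
    return 0;
}
```

The name is missing from the output on purpose: in UNIX the name lives in the directory entry, not in the structure that stat() reads.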
File Operations
create, open, read, write, seek, close, delete, truncate, rename, link.
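Several of these operations map directly onto POSIX system calls. The sketch below walks through create, write, seek, read, close, and delete in one function; roundtrip is a hypothetical name and the message text is arbitrary.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path`, write a short message, seek back to the start, read it
 * into `out` (capacity `cap`), then close and delete the file.
 * Returns the number of bytes read back, or -1 on error.
 * (roundtrip is a hypothetical helper name.) */
ssize_t roundtrip(const char *path, char *out, size_t cap)
{
    const char msg[] = "hello, file";
    size_t len = strlen(msg);

    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);  /* create + open */
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    if (write(fd, msg, len) == (ssize_t)len                 /* write */
        && lseek(fd, 0, SEEK_SET) == 0)                     /* seek  */
        n = read(fd, out, cap);                             /* read  */

    close(fd);                                              /* close  */
    unlink(path);                                           /* delete */
    return n;
}
```

Calling it with a path in a writable directory and a buffer of at least 11 bytes should return 11, the length of the message.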
Access Methods
- Sequential — read/write in order. Simple, good for tapes and logs.
- Direct (random) — access block n directly. Needed for databases.
- Indexed — an index maps keys to block numbers.
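Direct access can be sketched with the POSIX calls pread and pwrite, which take an explicit byte offset. The block size of 512 bytes and the helper names write_block and read_block are assumptions made for this example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512   /* assumed block size for this sketch */

/* Write BLOCK_SIZE bytes from `buf` as logical block n of the file,
 * creating the file if needed. Returns 0 on success, -1 on error. */
int write_block(const char *path, off_t n, const void *buf)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    ssize_t w = pwrite(fd, buf, BLOCK_SIZE, n * (off_t)BLOCK_SIZE);
    close(fd);
    return w == BLOCK_SIZE ? 0 : -1;
}

/* Read logical block n into `buf` without touching blocks 0..n-1.
 * Returns 0 on success, -1 on error or if the block is past end of file. */
int read_block(const char *path, off_t n, void *buf)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t r = pread(fd, buf, BLOCK_SIZE, n * (off_t)BLOCK_SIZE);
    close(fd);
    return r == BLOCK_SIZE ? 0 : -1;
}
```

Sequential access is the special case where the offset is implicit: read() and write() advance the file position by themselves, so the caller never computes n * BLOCK_SIZE.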
Directory Structures
- Single-level — one flat list; naming collisions.
- Two-level — per-user directory.
- Tree — arbitrary hierarchy; absolute/relative paths.
- Acyclic graph — allow shared files via links.
- General graph — cycles allowed; need garbage collection.
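For the tree structure, resolving an absolute path means walking one component at a time from the root. The sketch below uses a hypothetical in-memory struct node; real file systems store directory entries on disk and also handle ".", "..", links, and mount points, all of which are left out here.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8   /* arbitrary limit for this toy tree */

struct node {
    const char  *name;
    int          is_dir;
    struct node *child[MAX_CHILDREN];
    size_t       nchild;
};

/* Resolve an absolute path such as "/home/ann/notes.txt" starting at
 * `root`. Returns the matching node, or NULL if any component is missing.
 * Simplified: no ".", "..", links, or mount points. */
struct node *resolve(struct node *root, const char *path)
{
    struct node *cur = root;
    if (*path != '/')
        return NULL;                        /* only absolute paths */

    while (*path) {
        while (*path == '/')
            path++;                         /* skip separators */
        size_t len = strcspn(path, "/");    /* length of next component */
        if (len == 0)
            break;                          /* end of path or trailing '/' */
        if (!cur->is_dir)
            return NULL;                    /* cannot descend into a file */

        struct node *next = NULL;
        for (size_t i = 0; i < cur->nchild; i++) {
            const char *n = cur->child[i]->name;
            if (strlen(n) == len && strncmp(n, path, len) == 0) {
                next = cur->child[i];
                break;
            }
        }
        if (!next)
            return NULL;
        cur = next;
        path += len;
    }
    return cur;
}
```

A relative path would be resolved the same way, except that the walk starts at the current directory instead of the root.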
File Protection
Access control: 9-bit rwx permissions (UNIX), or Access Control Lists (ACLs) for fine-grained, per-user or per-group entries. Permissions must be enforced by the kernel (or by the server, for network file systems), never by the client alone.
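The 9-bit check can be sketched as follows. Exactly one class applies to a request: owner, else group, else others. The function may_access and the WANT_* constants are hypothetical names; the sketch ignores supplementary groups, ACLs, and the superuser.

```c
#include <stdbool.h>
#include <sys/types.h>

enum { WANT_R = 4, WANT_W = 2, WANT_X = 1 };

/* Decide whether a process running as (uid, gid) may access a file whose
 * mode holds the nine rwx bits. Only one class is consulted: owner if the
 * uid matches, otherwise group if the gid matches, otherwise others.
 * Simplified: no supplementary groups, no ACLs, no root override. */
bool may_access(mode_t mode, uid_t file_uid, gid_t file_gid,
                uid_t uid, gid_t gid, int want)
{
    unsigned bits;
    if (uid == file_uid)
        bits = (mode >> 6) & 7;   /* owner bits  */
    else if (gid == file_gid)
        bits = (mode >> 3) & 7;   /* group bits  */
    else
        bits = mode & 7;          /* others bits */
    return (bits & (unsigned)want) == (unsigned)want;
}
```

With mode 0640, the owner may read and write, a group member may only read, and everyone else is refused. Note that classes do not fall through: with mode 0077 the owner is denied even though others are allowed.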
File Allocation Methods
Contiguous Allocation
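Each file occupies a run of consecutive blocks, so the directory entry needs only a start block and a length. Pros: fast sequential and random access; simple. Cons: external fragmentation, and files are hard to grow in place. Used on CD-ROMs (ISO 9660), where files are written once and never grow.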
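Random access is fast here because locating a block is a single addition. A minimal sketch, with struct extent and contiguous_lookup as illustrative names:

```c
#include <stdint.h>

/* Directory information for a contiguously allocated file
 * (illustrative layout, not a real on-disk format). */
struct extent {
    uint32_t start;    /* first disk block of the file */
    uint32_t length;   /* number of blocks in the file */
};

/* Map logical block i of the file to a disk block number.
 * Returns -1 if i lies beyond the end of the file. */
int64_t contiguous_lookup(struct extent f, uint32_t i)
{
    if (i >= f.length)
        return -1;
    return (int64_t)f.start + i;
}
```

Growing the file is the hard part: block start + length may already belong to another file, in which case the whole file has to be moved.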
Linked Allocation
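Each block holds a pointer to the next. Pros: no external fragmentation, files can grow. Cons: slow random access (reaching block n means following n pointers), pointer overhead, and reliability risk (one damaged pointer loses the rest of the file).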
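The cost of random access shows up when the chain is walked in code. In this sketch the pointers are kept in a separate table next[], the way FAT does it; with pointers stored inside the data blocks, each step of the loop would be a disk read. The names linked_lookup and END_OF_CHAIN are assumptions for the example.

```c
#include <stdint.h>

#define END_OF_CHAIN UINT32_MAX   /* marker for "no next block" */

/* next[b] is the block that follows block b in its file, or END_OF_CHAIN.
 * Returns the disk block holding logical block i of the file that starts
 * at `first`, or -1 if the file has fewer than i + 1 blocks.
 * The loop runs i times: access cost grows linearly with the offset. */
int64_t linked_lookup(const uint32_t *next, uint32_t first, uint32_t i)
{
    uint32_t b = first;
    while (i > 0 && b != END_OF_CHAIN) {
        b = next[b];   /* follow one pointer */
        i--;
    }
    return b == END_OF_CHAIN ? -1 : (int64_t)b;
}
```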
Indexed Allocation
An index block holds pointers to all file blocks. Pros: fast random access. Cons: index overhead; for large files, use multi-level or combined indexing. UNIX inodes combine direct, single-indirect, double-indirect, and triple-indirect pointers.
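The combined scheme can be made concrete by asking which pointer class covers a given logical block. The sketch assumes 12 direct pointers, as in classic UNIX layouts, and takes the number of pointers per index block as a parameter; inode_level and inode_max_blocks are hypothetical names.

```c
#include <stdint.h>

#define NDIRECT 12   /* direct pointers in the inode (assumed) */

/* Which pointer class of a UNIX-style inode covers logical block n?
 * ptrs_per_block = block size / pointer size (e.g. 4096 / 4 = 1024).
 * Returns 0 = direct, 1 = single-indirect, 2 = double-indirect,
 * 3 = triple-indirect, -1 = beyond the maximum file size. */
int inode_level(uint64_t n, uint64_t ptrs_per_block)
{
    uint64_t p = ptrs_per_block;
    if (n < NDIRECT)
        return 0;
    n -= NDIRECT;
    if (n < p)
        return 1;
    n -= p;
    if (n < p * p)
        return 2;
    n -= p * p;
    if (n < p * p * p)
        return 3;
    return -1;
}

/* Largest file, in blocks, that such an inode can address. */
uint64_t inode_max_blocks(uint64_t ptrs_per_block)
{
    uint64_t p = ptrs_per_block;
    return NDIRECT + p + p * p + p * p * p;
}
```

With 4 KiB blocks and 4-byte pointers (p = 1024) the limit is 1,074,791,436 blocks, slightly over 4 TiB. Small files need no index block at all, which is the point of keeping direct pointers in the inode.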
UNIX inode
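#include <stdint.h>
struct inode {                         /* simplified sketch, not a real on-disk layout */
    uint16_t file_type, permissions;   /* kind of file; rwx bits */
    uint32_t owner, group;             /* user ID and group ID */
    uint64_t size;                     /* length in bytes */
    int64_t  timestamps[3];            /* access, modify, change */
    uint32_t reference_count;          /* number of hard links */
    uint32_t direct_ptr[12];           /* block numbers of the first 12 data blocks */
    uint32_t single_ind, double_ind, triple_ind;  /* indirect index blocks */
};
Free-Space Management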
- Bit vector (bitmap) — one bit per block; simple; easy to find contiguous runs.
- Linked list — free blocks chained; low overhead but slow to traverse.
- Grouping — first free block stores addresses of n other free blocks.
- Counting — store {start, count} records of contiguous free runs.
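The bit vector is the easiest of these to sketch. The example below uses a 64-block toy disk where a set bit means "free"; the convention, the size, and the function names are all assumptions of the sketch (some systems use the opposite bit sense).

```c
#include <stdint.h>

#define NBLOCKS 64   /* size of the toy disk (assumed) */

/* Bit b of the map is 1 when block b is free, 0 when it is allocated. */

/* Allocate the first free block: clear its bit and return its number,
 * or -1 when no block is free. */
int bitmap_alloc(uint8_t *map)
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (map[b / 8] & (1u << (b % 8))) {
            map[b / 8] &= (uint8_t)~(1u << (b % 8));
            return b;
        }
    }
    return -1;
}

/* Mark block b as free again. */
void bitmap_free(uint8_t *map, int b)
{
    map[b / 8] |= (uint8_t)(1u << (b % 8));
}

/* Return the start of the first run of `len` consecutive free blocks,
 * or -1 if there is none. */
int bitmap_find_run(const uint8_t *map, int len)
{
    if (len <= 0)
        return -1;
    int run = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        run = (map[b / 8] & (1u << (b % 8))) ? run + 1 : 0;
        if (run == len)
            return b - len + 1;
    }
    return -1;
}
```

The run search is what makes bitmaps attractive for contiguous or extent-based allocation: a linked free list would have to be sorted first to answer the same question.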
File System Mounting
Before a file system can be used, it must be mounted onto a directory in the existing tree (the mount point). UNIX uses the mount command; Windows typically assigns each volume a drive letter.
Journaling
A crash in the middle of a metadata update can corrupt the file system. Journaling file systems (ext4, NTFS, XFS) first write intent to a log and only then perform the actual update. After a crash, replay the log to recover.
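The log-then-update idea can be modeled in memory. This is a toy sketch, not how ext4, NTFS, or XFS lay out their journals: the "disk" is an array of integer cells, and the commit record that marks a transaction as complete is a common design assumed here rather than something stated above. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NCELLS   8    /* metadata cells on the toy "disk" */
#define LOG_CAP 16    /* journal capacity in records */

enum rec_kind { REC_UPDATE, REC_COMMIT };

struct log_rec {
    enum rec_kind kind;
    int txid;
    int cell;     /* REC_UPDATE only: which cell to change */
    int value;    /* REC_UPDATE only: its new value */
};

struct fs {
    int            disk[NCELLS];
    struct log_rec log[LOG_CAP];
    size_t         log_len;
};

/* Step 1: record the intent in the journal; the disk is not touched. */
bool log_update(struct fs *fs, int txid, int cell, int value)
{
    if (fs->log_len == LOG_CAP || cell < 0 || cell >= NCELLS)
        return false;
    fs->log[fs->log_len++] = (struct log_rec){REC_UPDATE, txid, cell, value};
    return true;
}

/* Step 2: a commit record marks the transaction as complete. */
bool log_commit(struct fs *fs, int txid)
{
    if (fs->log_len == LOG_CAP)
        return false;
    fs->log[fs->log_len++] = (struct log_rec){REC_COMMIT, txid, 0, 0};
    return true;
}

static bool is_committed(const struct fs *fs, int txid)
{
    for (size_t i = 0; i < fs->log_len; i++)
        if (fs->log[i].kind == REC_COMMIT && fs->log[i].txid == txid)
            return true;
    return false;
}

/* Recovery: apply the updates of committed transactions in log order,
 * ignore uncommitted ones, then empty the journal. */
void replay(struct fs *fs)
{
    for (size_t i = 0; i < fs->log_len; i++) {
        const struct log_rec *r = &fs->log[i];
        if (r->kind == REC_UPDATE && is_committed(fs, r->txid))
            fs->disk[r->cell] = r->value;
    }
    fs->log_len = 0;
}
```

If a crash happens after transaction 1 commits but while transaction 2 is still being logged, replay() applies all of transaction 1 and none of transaction 2, so the metadata is never left half-updated.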
Log-Structured and Copy-on-Write
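Modern file systems (ZFS, Btrfs, APFS) never overwrite live data: every change is written to new blocks, and the pointers to them are then switched atomically. Snapshots are cheap because the old blocks stay intact, and block checksums (in ZFS and Btrfs, for example) detect silent corruption, improving reliability.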
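A toy model shows why snapshots cost so little under copy-on-write. Here a "file" is a root structure holding four pointers to immutable one-integer blocks; struct root and cow_write are hypothetical names, and reclaiming unreferenced blocks is left out of the sketch.

```c
#include <stdlib.h>

#define NPTRS 4   /* blocks per file in this toy model */

struct root {
    const int *block[NPTRS];   /* pointers to immutable data blocks */
};

/* Copy-on-write update: neither the old block nor the old root is
 * modified. Returns a new root that shares every unchanged block with
 * `old`, or NULL on a bad index or allocation failure. */
struct root *cow_write(const struct root *old, int idx, int value)
{
    if (idx < 0 || idx >= NPTRS)
        return NULL;

    int *data = malloc(sizeof *data);
    struct root *r = malloc(sizeof *r);
    if (!data || !r) {
        free(data);
        free(r);
        return NULL;
    }
    *data = value;           /* 1. write the new data to a fresh block */
    *r = *old;               /* 2. copy the pointer block              */
    r->block[idx] = data;    /* 3. point the copy at the new data      */
    return r;                /* 4. caller switches to the new root     */
}
```

Keeping a pointer to the old root is the snapshot: it still sees the old value, while the new root sees the new one, and the three untouched blocks are shared between them. A crash before step 4 leaves the old root, and therefore a consistent file, in place.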
Common File Systems
- FAT12/16/32, exFAT — simple, wide support.
- NTFS — Windows.
- ext2/3/4 — Linux; ext4 is the workhorse.
- XFS, Btrfs, ZFS — advanced features.
- APFS — macOS/iOS.
Summary
The file system turns raw storage blocks into named files and hierarchical directories, with allocation, free-space, and protection mechanisms beneath. Journaling and copy-on-write bring crash resilience that matches today's expectations.
Important Questions
- Define file and list its attributes.
- Compare sequential and direct access methods.
- Differentiate tree and acyclic graph directory structures.
- Compare contiguous, linked, and indexed file allocation.
- Explain the UNIX inode and multi-level indexing.
- Describe free-space management techniques.
- What is journaling? Why is it important?
- List four common file systems and their usage.