Linux IO data structure

Linux IO data structure: three tables

These three tables are very important: they are the foundation for understanding Linux IO.

File descriptor table



File descriptor


1、File Descriptor Flags

2、File pointer

File table


1、OS wide

File table entry

open(2) calls this an "open file description".


1、file mode/file status flag

2、current file offset
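
The split between per-descriptor flags and per-entry status flags can be observed from user space. The following is a minimal sketch, assuming Linux and Python's `os` and `fcntl` modules; the function name `flags_demo` is made up for illustration. It relies on the documented behaviour that Python creates descriptors with `FD_CLOEXEC` set, so clearing the flag on one descriptor shows that the other is unaffected.

```python
import fcntl
import os
import tempfile


def flags_demo():
    """Return (cloexec_on_fd, cloexec_on_dup, append_on_fd, append_on_dup)."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        dup_fd = os.dup(fd)  # second descriptor, same file table entry
        try:
            # File descriptor flag (lives in the descriptor table):
            # clear FD_CLOEXEC on dup_fd only; fd keeps its own copy.
            os.set_inheritable(dup_fd, True)
            cloexec_fd = bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
            cloexec_dup = bool(fcntl.fcntl(dup_fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)

            # File status flag (lives in the file table entry):
            # set O_APPEND through fd, then read it back through dup_fd.
            status = fcntl.fcntl(fd, fcntl.F_GETFL)
            fcntl.fcntl(fd, fcntl.F_SETFL, status | os.O_APPEND)
            append_fd = bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_APPEND)
            append_dup = bool(fcntl.fcntl(dup_fd, fcntl.F_GETFL) & os.O_APPEND)
        finally:
            os.close(fd)
            os.close(dup_fd)
    return cloexec_fd, cloexec_dup, append_fd, append_dup


if __name__ == "__main__":
    print(flags_demo())
```

The first pair of values differs because the flag belongs to each descriptor; the second pair agrees because both descriptors point at one file table entry.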

inode table


APUE 3.10 File Sharing

We'll examine the data structures used by the kernel for all I/O. The following description is conceptual; it may or may not match a particular implementation.

The kernel uses three data structures to represent an open file, and the relationships among them determine the effect one process has on another with regard to file sharing.



wikipedia File descriptor

In the traditional implementation of Unix, file descriptors index into a per-process file descriptor table maintained by the kernel, that in turn indexes into a system-wide table of files opened by all processes, called the file table. This table records the mode with which the file (or other resource) has been opened: for reading, writing, appending, and possibly other modes. It also indexes into a third table called the inode table that describes the actual underlying files.


table 1: file descriptors index into a per-process file descriptor table

table 2: system-wide table of files opened by all processes, called the file table

table 3: inode table that describes the actual underlying file
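
As a small illustration of the chain from table 1 to table 3, the sketch below opens one path twice and compares what `fstat` reports for each descriptor. It assumes Linux and Python's `os` module; `same_inode_demo` is a hypothetical name used only here.

```python
import os
import tempfile


def same_inode_demo():
    """Open one path twice and compare descriptors and inode identity."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd_a = os.open(tmp.name, os.O_RDONLY)
        fd_b = os.open(tmp.name, os.O_RDONLY)
        try:
            st_a, st_b = os.fstat(fd_a), os.fstat(fd_b)
            return {
                # Two slots in the per-process descriptor table.
                "different_descriptors": fd_a != fd_b,
                # Both chains end at the same underlying file.
                "same_inode": (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino),
            }
        finally:
            os.close(fd_a)
            os.close(fd_b)


if __name__ == "__main__":
    print(same_inode_demo())
```

The two descriptors are distinct indexes, yet the device and inode numbers match, which is what the third table predicts.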

wikipedia Process control block

File descriptor is a reference to an open file description

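1. This statement comes from open(2). I think it summarizes the relationship very well, and it is the basis for understanding the later section on system calls and IO data structures.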

3. close(2) describes when the Linux kernel frees "the underlying open file description". This suggests that the kernel manages the memory of its IO data structures with a technique similar to reference counting:

1、A file descriptor is a reference to an open file description.

2、Once every file descriptor that refers to an open file description has been closed with close(2), its reference count drops to 0 and the kernel can free "the underlying open file description".
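
The reference-counting idea can be sketched with a toy model. This is a conceptual illustration only, not how the kernel is implemented; the class names `OpenFileDescription` and `ToyProcess` and all their fields are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class OpenFileDescription:
    """Toy stand-in for a file table entry."""
    inode: int
    status_flags: int = 0
    offset: int = 0
    refcount: int = 0


@dataclass
class ToyProcess:
    """Toy per-process descriptor table: fd number -> description."""
    fd_table: dict = field(default_factory=dict)

    def _install(self, ofd):
        # Use the lowest free descriptor number, as open(2) and dup(2) do.
        fd = 0
        while fd in self.fd_table:
            fd += 1
        self.fd_table[fd] = ofd
        ofd.refcount += 1
        return fd

    def open(self, inode):
        # Every open creates a fresh description.
        return self._install(OpenFileDescription(inode=inode))

    def dup(self, fd):
        # A duplicate is a second reference to the same description.
        return self._install(self.fd_table[fd])

    def close(self, fd):
        # Drop one reference; True means the description can be freed.
        ofd = self.fd_table.pop(fd)
        ofd.refcount -= 1
        return ofd.refcount == 0


if __name__ == "__main__":
    proc = ToyProcess()
    fd = proc.open(inode=42)
    copy = proc.dup(fd)
    print(proc.close(fd))    # one reference is left
    print(proc.close(copy))  # last reference is gone
```

Closing the first descriptor leaves the description alive because the duplicate still refers to it; only the second close reports that it can be freed.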

System call and IO data structure

The relationship between system calls and the IO data structures still needs to be summarized.




close() closes a file descriptor, so that it no longer refers to any file and may be reused.

If fd is the last file descriptor referring to the underlying open file description (see open(2) ), the resources associated with the open file description are freed; if the descriptor was the last reference to a file which has been removed using unlink(2) the file is deleted.
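
The unlink(2) clause above can be tried out directly. The sketch below assumes Linux with a typical local filesystem and uses only Python's `os` and `tempfile` modules; `unlink_while_open` is a hypothetical helper name. It removes the name of a file while a descriptor is still open and then keeps reading through that descriptor.

```python
import os
import tempfile


def unlink_while_open():
    """Return (path_exists, link_count, data) after unlinking an open file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)  # removes the name, not the open file
        path_exists = os.path.exists(path)
        link_count = os.fstat(fd).st_nlink
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)
    finally:
        os.close(fd)  # last reference: the kernel can now delete the file
    return path_exists, link_count, data


if __name__ == "__main__":
    print(unlink_while_open())
```

The path disappears immediately, but the data stays readable until the final close, which is when the storage is actually released. On typical local Linux filesystems the reported link count is 0 at that point; network filesystems may behave differently.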



2、The figures below are drawn with reference to "APUE 3.10 File Sharing".


Source: APUE Figure 3.7 Kernel data structures for open files

file descriptor 1 and file descriptor 2 refer to different files.



Source: APUE Figure 3.8 Two independent processes with the same file open


Source: APUE Figure 8.2 Sharing of open files between parent and child after fork
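
The sharing after fork can be checked with a short experiment. This is a sketch under the assumption of a Unix system where `os.fork` is available; `offset_after_child_seek` is a made-up name. The child moves the file offset and exits, and the parent then reads the offset through its own copy of the descriptor.

```python
import os
import tempfile


def offset_after_child_seek():
    """Let a child process move the offset, then read it in the parent."""
    with tempfile.TemporaryFile() as tmp:
        fd = tmp.fileno()
        pid = os.fork()
        if pid == 0:
            # Child: same descriptor number, same file table entry.
            try:
                os.lseek(fd, 5, os.SEEK_SET)
            finally:
                os._exit(0)  # leave without running the parent's cleanup
        os.waitpid(pid, 0)
        # Parent: query the current offset without moving it.
        return os.lseek(fd, 0, os.SEEK_CUR)


if __name__ == "__main__":
    print(offset_after_child_seek())
```

The parent observes the offset the child set, because fork copies the descriptor table while both copies keep pointing at one file table entry.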

A process calls one of the dup family of functions to clone a file descriptor.

Source: APUE Figure 3.9 Kernel data structures after dup(1)
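
A minimal sketch of the same situation from user space, assuming Python's `os.dup` as a thin wrapper over dup(2); the name `dup_shares_offset` is hypothetical. It writes through the original descriptor and then asks the duplicate for the current offset.

```python
import os
import tempfile


def dup_shares_offset():
    """Write via one descriptor, read the offset via its duplicate."""
    with tempfile.TemporaryFile() as tmp:
        fd = tmp.fileno()
        copy = os.dup(fd)  # new descriptor, same file table entry
        try:
            os.write(fd, b"hello")  # advances the shared offset by 5
            return os.lseek(copy, 0, os.SEEK_CUR)
        finally:
            os.close(copy)


if __name__ == "__main__":
    print(dup_shares_offset())
```

The duplicate reports the offset left by the write on the original, since there is only one current file offset for both.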


Pass file descriptor

Race condition

Race conditions are closely related to file sharing. The following sections discuss race conditions:




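From Book-APUE/3-File-IO/3.10-File-Sharing we can conclude:

1. When different processes each open the same file, a race condition exists:

The current file offset lives in the file table entry, and every open(2) creates a new entry, so each process advances its own offset. A write from one process can therefore land on top of data written by another (overwrite).

2. When the file table entry is shared, this race does not occur: every write(2) advances the same offset, so one write does not land on top of another.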
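
Both cases can be reproduced deterministically inside a single process, using two descriptors as a stand-in for two processes. This is only a sketch: the helper `write_twice` and its `share_entry` parameter are invented for illustration, and a real race between processes would additionally depend on scheduling.

```python
import os
import tempfile


def write_twice(share_entry):
    """Write b"AAAA" then b"BB" through two descriptors; return the file."""
    fd_a, path = tempfile.mkstemp()
    try:
        if share_entry:
            fd_b = os.dup(fd_a)  # shared file table entry
        else:
            fd_b = os.open(path, os.O_WRONLY)  # separate entry, offset 0
        try:
            os.write(fd_a, b"AAAA")
            os.write(fd_b, b"BB")
        finally:
            os.close(fd_b)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd_a)
        os.unlink(path)


if __name__ == "__main__":
    print(write_twice(share_entry=False))  # b'BBAA': second write overwrote
    print(write_twice(share_entry=True))   # b'AAAABB': writes follow each other
```

With separate entries the second write starts at its own offset 0 and clobbers the first two bytes; with a shared entry it starts where the first write stopped.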

stackoverflow two file descriptors to same file


It depends on where you got the two file descriptors. If they come from a dup(2) call, then they share file offset and status, so doing a write(2) on one will affect the position on the other. If, on the other hand, they come from two separate open(2) calls, each will have their own file offset and status.


The second case described above (two separate open(2) calls) is the one that produces the race condition.

A file descriptor is mostly just a reference to a kernel file structure, and it is that kernel structure that contains most of the state. When you open(2) a file, you get a new kernel file structure and a new file descriptor that refers to it. When you dup(2) a file descriptor (or pass a file descriptor through sendmsg), you get a new reference to the same kernel file struct.


For more on "pass a file descriptor", see the Pass-file-descriptor section.
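
Passing a descriptor through sendmsg can be sketched with Python's `socket.send_fds` and `socket.recv_fds`, which wrap the `SCM_RIGHTS` mechanism and are available from Python 3.9 on Unix. Both ends live in one process here purely to keep the example short, and `pass_descriptor` is a hypothetical name.

```python
import os
import socket
import tempfile


def pass_descriptor():
    """Send a descriptor over a Unix socket pair; inspect the received one."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with left, right, tempfile.TemporaryFile() as tmp:
        fd = tmp.fileno()
        os.write(fd, b"hello")  # offset is now 5
        # At least one byte of ordinary data must accompany the descriptor.
        socket.send_fds(left, [b"x"], [fd])
        _data, fds, _flags, _addr = socket.recv_fds(right, 1, 1)
        received = fds[0]
        try:
            # New descriptor number, but the same kernel file structure.
            return received != fd, os.lseek(received, 0, os.SEEK_CUR)
        finally:
            os.close(received)


if __name__ == "__main__":
    print(pass_descriptor())
```

The receiver gets a different descriptor number whose offset already reflects the earlier write, matching the description of "a new reference to the same kernel file struct".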