This post explores the code flow in the FreeBSD kernel when the open system call is issued.
Open System Call
For most file systems, a program initializes access to a file by issuing the open system call. This allocates the resources associated with the file (the file descriptor) and returns a handle that the process uses to refer to that file. In some cases, the open is performed implicitly by the first access.
Bird's-Eye View
The open system call goes through a lot of twists and turns and exercises some of the most complicated code paths in the kernel. At a very high level, it does the following:
- Allocate a file structure and a file descriptor
- Associate the file structure with the file descriptor
- Look up the pathname, which returns a vnode
- Associate the vnode with the file structure
- Return the file descriptor to the user
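The five steps above can be sketched in a few lines of user-space C. This is a minimal toy model, not kernel code: `toy_vnode`, `toy_file`, `toy_proc`, `toy_lookup()` and `toy_open_flow()` are all hypothetical, and the real path adds locking, reference counting and much more error handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for kernel objects. */
struct toy_vnode { const char *name; };
struct toy_file  { struct toy_vnode *f_vnode; unsigned f_count; };

#define TOY_NFILES 8

struct toy_proc {
    struct toy_file *fd_ofiles[TOY_NFILES];  /* descriptor -> open file */
};

/* Hypothetical "filesystem": a fixed array of vnodes. */
static struct toy_vnode toy_fs[] = { { "/etc/passwd" }, { "/tmp/x" } };

static struct toy_vnode *toy_lookup(const char *path)
{
    for (size_t i = 0; i < sizeof(toy_fs) / sizeof(toy_fs[0]); i++)
        if (strcmp(toy_fs[i].name, path) == 0)
            return &toy_fs[i];
    return NULL;
}

/* Returns a descriptor >= 0, or -1 on failure. */
static int toy_open_flow(struct toy_proc *p, const char *path)
{
    struct toy_file *fp;
    struct toy_vnode *vp;
    int fd = -1;

    assert(p != NULL && path != NULL);

    /* Step 1: allocate a file structure and the lowest free descriptor. */
    fp = calloc(1, sizeof(*fp));
    if (fp == NULL)
        return -1;
    fp->f_count = 1;
    for (int i = 0; i < TOY_NFILES && fd < 0; i++)
        if (p->fd_ofiles[i] == NULL)
            fd = i;
    if (fd < 0) {
        free(fp);
        return -1;
    }

    /* Step 2: associate the file structure with the descriptor. */
    p->fd_ofiles[fd] = fp;

    /* Step 3: pathname lookup returns a vnode. */
    vp = toy_lookup(path);
    if (vp == NULL) {
        p->fd_ofiles[fd] = NULL;     /* undo steps 1-2 on failure */
        free(fp);
        return -1;
    }

    /* Step 4: associate the vnode with the file structure. */
    fp->f_vnode = vp;

    /* Step 5: hand the descriptor back to the caller. */
    return fd;
}
```

As in the kernel, the descriptor is reserved before the lookup, so a failed lookup has to release it again.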
Map - Open
Every system call is mapped to a system call number in sys/kern/syscalls.master.
System Call Number | Audit Event | System Call Signature |
---|---|---|
5 | AUE_OPEN_RWTC | int open(char *path, int flags, int mode); |
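In FreeBSD, syscalls.master is processed into a table of `struct sysent` entries indexed by system call number. The sketch below imitates that idea with a hypothetical table of function pointers (`toy_sysent`, `toy_syscall()` and friends are made up); slots 0 to 4 are placeholders, not the real system calls with those numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical argument block and handler type; the real table and
 * argument structs are generated from syscalls.master. */
struct toy_args { const char *path; int flags; int mode; };
typedef int (*toy_sy_call_t)(struct toy_args *uap, long *retval);

static int toy_sys_open(struct toy_args *uap, long *retval)
{
    (void)uap;
    *retval = 3;                 /* pretend descriptor */
    return 0;                    /* 0 means success */
}

static int toy_nosys(struct toy_args *uap, long *retval)
{
    (void)uap;
    (void)retval;
    return ENOSYS;
}

/* Indexed by system call number: slot 5 is open, as in the table above. */
static const toy_sy_call_t toy_sysent[] = {
    toy_nosys, toy_nosys, toy_nosys, toy_nosys, toy_nosys, toy_sys_open,
};

static int toy_syscall(unsigned code, struct toy_args *uap, long *retval)
{
    if (code >= sizeof(toy_sysent) / sizeof(toy_sysent[0]))
        return ENOSYS;           /* number out of range */
    return toy_sysent[code](uap, retval);
}
```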
More details about how a system call is executed can be found here.
File Structure & File Descriptor
The open system call starts in open(td, uap), which calls kern_open(), which in turn calls kern_openat(). kern_openat() is where all the action starts.
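The wrapper chain can be pictured as follows. This is a sketch under assumptions: the `toy_*` names and `TOY_AT_FDCWD` are hypothetical, and the only point illustrated is that open(2), having no directory argument, is treated as openat(2) relative to the current working directory.

```c
#include <assert.h>

#define TOY_AT_FDCWD (-100)     /* hypothetical "relative to cwd" marker */

/* Hypothetical recorders so the call chain can be observed. */
static int last_dirfd;
static const char *last_path;

static int toy_kern_openat(int dirfd, const char *path, int flags, int mode)
{
    (void)flags;
    (void)mode;
    last_dirfd = dirfd;         /* where the lookup would start */
    last_path = path;
    return 0;
}

/* open(2) has no directory argument: forward to openat() with the cwd. */
static int toy_kern_open(const char *path, int flags, int mode)
{
    return toy_kern_openat(TOY_AT_FDCWD, path, flags, mode);
}

/* System call entry point: unpack the user-supplied arguments. */
struct toy_open_args { const char *path; int flags; int mode; };

static int toy_open_syscall(struct toy_open_args *uap)
{
    return toy_kern_open(uap->path, uap->flags, uap->mode);
}
```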
kern_openat() first allocates a file structure and a file descriptor by calling falloc().
falloc(struct thread *td, struct file **resultfp, int *resultfd) returns a pointer to the new file entry through resultfp and the new descriptor through resultfd.
File Structure Allocation
Space for the file structure could be allocated with malloc(), but here uma_zalloc() is used because file structures come from a dedicated memory zone, file_zone. The kernel keeps separate zones for different kinds of objects, such as file_zone, proc_zone, and thread_zone. Besides the zone, we pass flags that say how the allocation should behave: M_WAITOK means we are willing to wait, so if no memory is available the thread goes to sleep until it is, which makes this a synchronous call; M_ZERO asks for zero-filled memory.
```c
fp = uma_zalloc(file_zone, M_WAITOK | M_ZERO);
```
uma_zalloc() returns a pointer to the new file entry. Zone limits are set at boot time, and every allocation and free is accounted against that limited pool.
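To make the zone idea concrete, here is a toy, single-threaded zone allocator. Everything in it (`toy_zone`, `toy_zcreate()`, `toy_zalloc()`, `toy_zfree()`, `TOY_M_ZERO`) is hypothetical and far simpler than UMA: there are no per-CPU caches and nothing can sleep, so where uma_zalloc() with M_WAITOK would put the thread to sleep, this sketch simply returns NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_M_ZERO 0x1          /* hypothetical flag: zero-fill the item */

/* Freed items are kept on a list and reused by later allocations. */
struct toy_item { struct toy_item *next; };

struct toy_zone {
    size_t size;                /* bytes per item */
    unsigned limit;             /* max items outstanding, fixed at creation */
    unsigned used;              /* items currently allocated */
    struct toy_item *free;      /* cached free items */
};

static void toy_zcreate(struct toy_zone *z, size_t size, unsigned limit)
{
    z->size = size < sizeof(struct toy_item) ? sizeof(struct toy_item) : size;
    z->limit = limit;
    z->used = 0;
    z->free = NULL;
}

/* Returns NULL once the limit is reached (a real M_WAITOK caller would
 * sleep instead). */
static void *toy_zalloc(struct toy_zone *z, int flags)
{
    void *item;

    if (z->used >= z->limit)
        return NULL;
    if (z->free != NULL) {
        item = z->free;                 /* reuse a cached item */
        z->free = z->free->next;
    } else {
        item = malloc(z->size);
        if (item == NULL)
            return NULL;
    }
    if (flags & TOY_M_ZERO)
        memset(item, 0, z->size);
    z->used++;
    return item;
}

static void toy_zfree(struct toy_zone *z, void *item)
{
    struct toy_item *it = item;

    assert(z->used > 0);
    it->next = z->free;                 /* cache it for the next toy_zalloc() */
    z->free = it;
    z->used--;
}
```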
The file structure looks as follows:
```c
// https://github.com/coolgoose85/FreeBSD/blob/master/sys/sys/file.h#L116
struct file {
    void                 *f_data;          /* file descriptor specific data */
    struct fileops       *f_ops;           /* File operations */
    struct ucred         *f_cred;          /* associated credentials. */
    struct vnode         *f_vnode;         /* NULL or applicable vnode */
    short                 f_type;          /* descriptor type */
    short                 f_vnread_flags;  /* (f) Sleep lock for f_offset */
    volatile u_int        f_flag;          /* see fcntl.h */
    volatile u_int        f_count;         /* reference count */
    /*
     * DTYPE_VNODE specific fields.
     */
    int                   f_seqcount;      /* Count of sequential accesses. */
    off_t                 f_nextoff;       /* next expected read/write offset. */
    struct cdev_privdata *f_cdevpriv;      /* (d) Private data for the cdev. */
    /*
     * DFLAG_SEEKABLE specific fields
     */
    off_t                 f_offset;
    /*
     * Mandatory Access control information.
     */
    void                 *f_label;         /* Place-holder for MAC label. */
};
```
Now that the file structure has been allocated, it has to be initialized:
- Set the reference count (f_count) to 1.
- Store the caller's credentials in f_cred.
- Set f_ops to badfileops; the real file operations are filled in later, once we know what the file refers to.
- Set f_data to NULL.
- Leave f_vnode as NULL, since no vnode is associated yet.
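The initialization list above maps onto a handful of assignments. In this sketch, the reduced structures and `toy_crhold()` / `toy_file_alloc()` are hypothetical; `toy_crhold()` only mimics the idea of taking a reference on the credentials so the file can keep them after the opening thread changes its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced versions of struct ucred, fileops and file. */
struct toy_ucred { unsigned cr_ref; };
struct toy_fileops { const char *name; };
struct toy_file {
    void *f_data;
    const struct toy_fileops *f_ops;
    struct toy_ucred *f_cred;
    void *f_vnode;
    unsigned f_count;
};

/* Placeholder ops until the real type is known, like badfileops. */
static const struct toy_fileops toy_badfileops = { "bad" };

/* Take a reference on the credentials and return them. */
static struct toy_ucred *toy_crhold(struct toy_ucred *cr)
{
    cr->cr_ref++;
    return cr;
}

static struct toy_file *toy_file_alloc(struct toy_ucred *cred)
{
    struct toy_file *fp = calloc(1, sizeof(*fp));   /* zeroed, like M_ZERO */

    if (fp == NULL)
        return NULL;
    fp->f_count = 1;                    /* the caller holds one reference */
    fp->f_cred = toy_crhold(cred);      /* remember who opened the file */
    fp->f_ops = &toy_badfileops;        /* replaced once the type is known */
    fp->f_data = NULL;                  /* no type-specific data yet */
    fp->f_vnode = NULL;                 /* no vnode associated yet */
    return fp;
}
```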
File Descriptor
Next, falloc() allocates the file descriptor: it takes the file descriptor table lock and calls fdalloc().
```c
// falloc() https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/kern_descrip.c#L1446
if ((error = fdalloc(td, 0, &i))) {
    ...
}
```
Here, i is the file descriptor to be returned.
Once fdalloc() returns a descriptor, falloc() associates it with the file structure allocated above:
```c
// falloc() https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/kern_descrip.c#L1453
p->p_fd->fd_ofiles[i] = fp;
```
Here, p is the process structure. Its file descriptor table, p->p_fd, keeps track of every file the process has open, and fd_ofiles is indexed by descriptor number.
Finally, falloc() returns the file structure pointer and the descriptor to kern_openat().
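The falloc() contract (return an error code, hand back both the file and the descriptor through out-parameters, undo the allocation on failure) can be sketched as below. `toy_falloc()` and the reduced structures are hypothetical; the descriptor search is a plain linear scan rather than fdalloc()'s bitmap, and the lock is only marked in comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_NFILES 4

/* Hypothetical reduced structures. */
struct toy_file { unsigned f_count; };
struct toy_filedesc { struct toy_file *fd_ofiles[TOY_NFILES]; };
struct toy_proc { struct toy_filedesc *p_fd; };

/* Returns 0 and fills *resultfp / *resultfd, or an errno value. */
static int toy_falloc(struct toy_proc *p, struct toy_file **resultfp,
                      int *resultfd)
{
    struct toy_file *fp = calloc(1, sizeof(*fp));
    int i;

    if (fp == NULL)
        return ENOMEM;
    fp->f_count = 1;

    /* The real code holds the descriptor-table lock from here ... */
    for (i = 0; i < TOY_NFILES; i++)
        if (p->p_fd->fd_ofiles[i] == NULL)
            break;
    if (i == TOY_NFILES) {
        free(fp);                       /* undo the allocation */
        return EMFILE;
    }
    p->p_fd->fd_ofiles[i] = fp;         /* associate descriptor and file */
    /* ... to here. */

    if (resultfp != NULL)
        *resultfp = fp;
    if (resultfd != NULL)
        *resultfd = i;
    return 0;
}
```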
How a file descriptor is allocated (fdalloc)
fdalloc() searches the descriptor bitmap for a free descriptor. If none is free, it grows the file table, doubling its size but never beyond maxfd, and searches again; once the limit is reached, it fails with EMFILE.
```c
// fdalloc() https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/kern_descrip.c#L1352
for (;;) {
    fd = fd_first_free(fdp, minfd, fdp->fd_nfiles);
    if (fd >= maxfd)
        return (EMFILE);
    if (fd < fdp->fd_nfiles)
        break;
    fdgrowtable(fdp, min(fdp->fd_nfiles * 2, maxfd));
}
...
```
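The search-then-grow loop can be reproduced in user space. This is a minimal sketch, assuming a table that is just a bitmap of `unsigned long` words; `toy_fdtable`, `toy_first_free()`, `toy_growtable()` and `toy_fdalloc()` are hypothetical stand-ins for the kernel's fd_first_free(), fdgrowtable() and fdalloc(), which also maintain the file pointer array and other bookkeeping.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NDSLOTBITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical descriptor table: a bitmap of used slots plus its size. */
struct toy_fdtable {
    unsigned long *map;     /* bit set => descriptor in use */
    int nfiles;             /* slots currently covered by map */
};

static int toy_isset(const struct toy_fdtable *t, int fd)
{
    return (t->map[fd / TOY_NDSLOTBITS] >> (fd % TOY_NDSLOTBITS)) & 1UL;
}

/* Lowest free descriptor >= minfd, or nfiles if every slot is used. */
static int toy_first_free(const struct toy_fdtable *t, int minfd)
{
    for (int fd = minfd; fd < t->nfiles; fd++)
        if (!toy_isset(t, fd))
            return fd;
    return t->nfiles;
}

/* Grow the table to nfiles slots, zeroing any newly added words. */
static int toy_growtable(struct toy_fdtable *t, int nfiles)
{
    size_t oldwords = (t->nfiles + TOY_NDSLOTBITS - 1) / TOY_NDSLOTBITS;
    size_t newwords = (nfiles + TOY_NDSLOTBITS - 1) / TOY_NDSLOTBITS;
    unsigned long *map = realloc(t->map, newwords * sizeof(*map));

    if (map == NULL)
        return ENOMEM;
    memset(map + oldwords, 0, (newwords - oldwords) * sizeof(*map));
    t->map = map;
    t->nfiles = nfiles;
    return 0;
}

/* Allocate the lowest free descriptor >= minfd that is below maxfd. */
static int toy_fdalloc(struct toy_fdtable *t, int minfd, int maxfd, int *result)
{
    int fd;

    for (;;) {
        fd = toy_first_free(t, minfd);
        if (fd >= maxfd)
            return EMFILE;              /* limit reached */
        if (fd < t->nfiles)
            break;                      /* found a free slot */
        /* Table full: double it (at least 1 slot), capped at maxfd. */
        int want = t->nfiles > 0 ? t->nfiles * 2 : 1;
        if (want > maxfd)
            want = maxfd;
        if (toy_growtable(t, want) != 0)
            return ENOMEM;
    }
    t->map[fd / TOY_NDSLOTBITS] |= 1UL << (fd % TOY_NDSLOTBITS);  /* mark used */
    *result = fd;
    return 0;
}
```

Doubling keeps the number of grow operations logarithmic in the number of open files, and the cap at maxfd is what eventually turns "table full" into EMFILE.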
Pathname Lookup
Next, kern_openat() picks up the mode supplied by the caller. The mode is used only if the file has to be created.
kern_openat() then calls vn_open(), passing the lookup state, the flags, the creation mode, and the file structure.
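As far as I can tell from the FreeBSD source this post follows, the creation mode is derived roughly as `cmode = ((mode &~ fdp->fd_cmask) & ALLPERMS) &~ S_ISTXT;`: bits cleared by the process umask are dropped, only permission bits are kept, and the sticky bit is cleared. The sketch below reproduces that arithmetic with local, hypothetical constant names (`TOY_ALLPERMS`, `TOY_S_ISTXT`) so it builds on systems whose `<sys/stat.h>` lacks `S_ISTXT`.

```c
#include <assert.h>

/* Local copies of the permission masks (octal, as in <sys/stat.h>). */
#define TOY_ALLPERMS 07777      /* setuid, setgid, sticky, rwx for all */
#define TOY_S_ISTXT  01000      /* sticky bit */

/* Apply the umask, keep only permission bits, clear the sticky bit. */
static unsigned toy_creation_mode(unsigned mode, unsigned cmask)
{
    return ((mode & ~cmask) & TOY_ALLPERMS) & ~TOY_S_ISTXT;
}
```

For example, `open(path, O_CREAT, 0666)` under a umask of 022 yields 0644.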
```c
// vn_open() https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_vnops.c#L89
error = vn_open(&nd, &flags, cmode, fp);
```
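Here, nd is a struct nameidata: it carries the pathname into the lookup and, on success, the resulting vnode (nd.ni_vp) back out.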
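Pathname lookup walks the path one component at a time, starting from a directory and asking it for the next name. The sketch below shows only that core loop over a hypothetical in-memory tree (`toy_vnode`, `toy_dir_lookup()`, `toy_namei()` are made up); the real namei() also handles `.` and `..`, symbolic links, mount points, locking and the name cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory vnode: a name, and children if it is a directory. */
struct toy_vnode {
    const char *name;
    struct toy_vnode **children;    /* NULL-terminated; NULL for files */
};

/* Look up one component (comp, len) in directory dvp. */
static struct toy_vnode *toy_dir_lookup(struct toy_vnode *dvp,
                                        const char *comp, size_t len)
{
    if (dvp->children == NULL)
        return NULL;                            /* not a directory */
    for (struct toy_vnode **c = dvp->children; *c != NULL; c++)
        if (strlen((*c)->name) == len && strncmp((*c)->name, comp, len) == 0)
            return *c;
    return NULL;
}

/* Walk an absolute path from root, one component at a time. */
static struct toy_vnode *toy_namei(struct toy_vnode *root, const char *path)
{
    struct toy_vnode *vp = root;
    const char *p = path;

    while (*p != '\0') {
        while (*p == '/')
            p++;                                /* skip repeated slashes */
        if (*p == '\0')
            break;
        size_t len = strcspn(p, "/");           /* length of this component */
        vp = toy_dir_lookup(vp, p, len);
        if (vp == NULL)
            return NULL;                        /* "no such file" */
        p += len;
    }
    return vp;
}
```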
Associate vnode with file structure
If vn_open() returns without error, nd holds the vnode for the file in nd.ni_vp. All kern_openat() has to do is extract it and tie it to the file structure fp:
```c
// Assign vnode https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_syscalls.c#L1131
error = vn_open(&nd, &flags, cmode, fp);
...
vp = nd.ni_vp;
fp->f_vnode = vp;
```
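If the file operations have not been set by this point, that is, if f_ops is still badfileops, kern_openat() initializes them with finit(), marking the file as DTYPE_VNODE and pointing it at vnops: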
```c
// badfileops https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_syscalls.c#L1138
if (fp->f_ops == &badfileops) {
    KASSERT(vp->v_type != VFIFO, ("Unexpected fifo."));
    fp->f_seqcount = 1;
    finit(fp, flags & FMASK, DTYPE_VNODE, vp, &vnops);
}
```
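Why start with badfileops at all? Every operation on an open file is dispatched through its f_ops table, so a placeholder table that fails every call keeps a half-built file harmless until finit() installs the real one. The sketch uses a hypothetical one-entry operations vector (`toy_fileops`, `toy_finit()`, `toy_fo_read()` and the rest are made up); the EBADF result of the placeholder is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_file;

/* Hypothetical, cut-down operations vector (the real one has many entries). */
struct toy_fileops {
    int (*fo_read)(struct toy_file *fp, char *buf, size_t len);
};

#define TOY_DTYPE_VNODE 1

struct toy_file {
    const struct toy_fileops *f_ops;
    short f_type;
    unsigned f_flag;
    void *f_data;
};

/* Placeholder: every operation fails until the file is initialized. */
static int toy_badfo_read(struct toy_file *fp, char *buf, size_t len)
{
    (void)fp; (void)buf; (void)len;
    return EBADF;
}
static const struct toy_fileops toy_badfileops = { toy_badfo_read };

/* Stand-in for the vnode operations: copy one byte from f_data. */
static int toy_vn_read(struct toy_file *fp, char *buf, size_t len)
{
    if (len > 0)
        buf[0] = *(const char *)fp->f_data;
    return 0;
}
static const struct toy_fileops toy_vnops = { toy_vn_read };

/* Like finit(): set flags, type, private data, and operations. */
static void toy_finit(struct toy_file *fp, unsigned flag, short type,
                      void *data, const struct toy_fileops *ops)
{
    fp->f_flag = flag;
    fp->f_type = type;
    fp->f_data = data;
    fp->f_ops = ops;
}

/* Callers always go through f_ops, whatever the file type. */
static int toy_fo_read(struct toy_file *fp, char *buf, size_t len)
{
    return fp->f_ops->fo_read(fp, buf, len);
}
```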
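So far, we have done all the bookkeeping. What remains depends on the open flags: the file may have to be locked and, if O_TRUNC was given, truncated. Based on what kind of locking is requested, this involves:
- Exclusive lock
- Shared lock
- Write lock
- Read lock
- Lease to client
An exclusive lock request (O_EXLOCK) is taken as a write lock and a shared request (O_SHLOCK) as a read lock. For O_TRUNC, a write lease is obtained and the file is truncated by setting its size attribute to 0: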
```c
// file locking https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_syscalls.c#L1145
if (flags & (O_EXLOCK | O_SHLOCK)) {
    ...
    if ((error = VOP_ADVLOCK(vp, (caddr_t)fp, F_SETLK, &lf, type)) != 0) {
        ...
    }
    ...
}
if (flags & O_TRUNC) {
    if ((error = vn_start_write(vp, &mp, V_WAIT | PCATCH)) != 0)
        goto bad;
    VOP_LEASE(vp, td, td->td_ucred, LEASE_WRITE);
    ...
    vat.va_size = 0;
    ...
    error = VOP_SETATTR(vp, &vat, td->td_ucred);
    ...
}
```
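The decision part of that code, which lock to take, whether to wait for it, and whether to truncate, can be isolated into a small pure function. This is a sketch under assumptions: the `TOY_O_*` flag values and `toy_plan_open()` are hypothetical (O_EXLOCK and O_SHLOCK are BSD extensions and do not exist everywhere), and the "wait unless non-blocking" rule reflects my reading of the elided lines, not something shown above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real ones come from <fcntl.h>. */
#define TOY_O_NONBLOCK 0x0004
#define TOY_O_SHLOCK   0x0010
#define TOY_O_EXLOCK   0x0020
#define TOY_O_TRUNC    0x0400

enum toy_lock_type { TOY_NO_LOCK, TOY_RDLCK, TOY_WRLCK };

struct toy_open_plan {
    enum toy_lock_type lock;    /* advisory lock to take, if any */
    bool wait;                  /* block until the lock is granted */
    bool truncate;              /* set the file size to 0 */
};

/* Exclusive -> write lock, shared -> read lock, wait unless non-blocking. */
static struct toy_open_plan toy_plan_open(int flags)
{
    struct toy_open_plan plan = { TOY_NO_LOCK, false, false };

    if (flags & (TOY_O_EXLOCK | TOY_O_SHLOCK)) {
        plan.lock = (flags & TOY_O_EXLOCK) ? TOY_WRLCK : TOY_RDLCK;
        plan.wait = (flags & TOY_O_NONBLOCK) == 0;
    }
    plan.truncate = (flags & TOY_O_TRUNC) != 0;
    return plan;
}
```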
Return file descriptor back to the user
Finally, kern_openat() stores the descriptor (indx) in td->td_retval[0], from where it is handed back to the user, and returns 0 to indicate success.
```c
// Return fd https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_syscalls.c#L1184
td->td_retval[0] = indx;
return (0);
```
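Note the two separate channels: the C return value of kern_openat() is an error number (0 for success), while the descriptor travels in td_retval[0]. On the way out, the system call return path copies one or the other into registers, and libc turns an error into -1 with errno set. The sketch below models that convention with hypothetical names (`toy_thread`, `toy_openat_result()`, `toy_open_from_user()`); the fixed descriptor 3 is just a placeholder.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical thread with a return-value slot, like td->td_retval. */
struct toy_thread { long td_retval[2]; };

/* Kernel side: return an errno (0 = success); the result goes in td_retval. */
static int toy_openat_result(struct toy_thread *td, int fail)
{
    int indx = 3;               /* placeholder descriptor */

    if (fail)
        return ENOENT;
    td->td_retval[0] = indx;
    return 0;
}

/* User-visible side: -1 with errno on failure, else td_retval[0]. */
static long toy_open_from_user(int fail)
{
    struct toy_thread td = { { 0, 0 } };
    int error = toy_openat_result(&td, fail);

    if (error != 0) {
        errno = error;
        return -1;
    }
    return td.td_retval[0];
}
```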