This post explores the code flow in the FreeBSD kernel when the open system call is issued.

Table of Contents

  1. Open System Call
  2. Bird Eye View
  3. Map - Open
  4. File Structure & File Descriptor
    1. File Structure Allocation
    2. File Descriptor
    3. How file descriptor is allocated (fdalloc)
  5. Call Tree
  6. Pathname Lookup
  7. Associate vnode with file structure
  8. Return file descriptor back to the user

Open System Call

For most file systems, a program initializes access to a file using the open system call. This allocates the resources associated with the file (the file descriptor) and returns a handle that the process uses to refer to that file. In some cases the open is performed by the first access.
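
Seen from user space, the handle is just a small integer that later calls take as an argument. The sketch below is a minimal illustration of that idea, not part of the kernel code discussed in this post; read_prefix is a hypothetical helper name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open `path`, read up to `len` bytes through the returned descriptor, and
 * close it again. Returns the byte count from read(), or -1 if open() or
 * read() fails. read_prefix is a hypothetical name used only in this sketch.
 */
ssize_t read_prefix(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);      /* the handle for all later calls */
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);     /* refer to the file by descriptor */
    close(fd);                          /* release the descriptor */
    return n;
}
```

For example, read_prefix("/dev/zero", buf, 4) fills buf with four zero bytes, while reading from /dev/null reports end of file right away.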


Bird Eye View

The open system call goes through a lot of twists and turns and exercises some of the most complicated code paths in the kernel. At a very high level, it performs the following tasks:

  1. Allocate file structure and file descriptor
  2. Associate file structure with file descriptor
  3. Look up the pathname, which returns a vnode
  4. Associate vnode with file structure
  5. Return file descriptor back to the user



Map - Open

Every system call is mapped to a system call number in syscalls.master (under sys/kern).

System Call Number | System Call   | System Call Signature
5                  | AUE_OPEN_RWTC | int open(char *path, int flags, int mode);

More details about how a system call is executed can be found here.
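
To make the idea of a number-to-handler mapping concrete, here is a toy dispatch table. It is only a sketch of the concept: every name in it (toy_sysent, toy_dispatch, toy_open, toy_nosys) is hypothetical and does not come from the FreeBSD sources; only the number 5 is taken from the table above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: opaque argument block in, errno-style value out. */
typedef int (*toy_syscall_fn)(void *args);

static int toy_nosys(void *args) { (void)args; return ENOSYS; }
static int toy_open(void *args)  { (void)args; return 0; }

#define TOY_SYS_OPEN   5   /* the number listed for open in the table above */
#define TOY_SYS_MAXNUM 8   /* arbitrary table size for this sketch */

/* Table indexed by system call number; slots not listed stay NULL. */
static const toy_syscall_fn toy_sysent[TOY_SYS_MAXNUM] = {
    [TOY_SYS_OPEN] = toy_open,
};

/* Look up the handler for `num` and call it; unknown numbers get ENOSYS. */
int toy_dispatch(int num, void *args)
{
    if (num < 0 || num >= TOY_SYS_MAXNUM || toy_sysent[num] == NULL)
        return toy_nosys(args);
    return toy_sysent[num](args);
}
```

Calling toy_dispatch(5, NULL) reaches toy_open, and any number without an entry falls back to toy_nosys.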



File Structure & File Descriptor

The open system call starts in the function open(td, uap), which calls kern_open(), which in turn calls kern_openat(). This is where all the action starts.

In kern_openat() we allocate the file descriptor by calling falloc().

falloc(struct thread *td, struct file **resultfp, int *resultfd) returns a pointer to the file entry in resultfp and the file descriptor in resultfd.


File Structure Allocation

Space for the file structure could be allocated in the kernel with malloc, but here uma_zalloc is used because the memory comes from a dedicated zone, file_zone. There are multiple zones, such as file_zone, proc_zone and thread_zone. We request memory from the file zone and pass a flag that says whether we are willing to wait if no space is available in the zone. It is a synchronous call: if memory is not available and M_WAITOK is set, the thread goes to sleep until it is.

     fp = uma_zalloc(file_zone, M_WAITOK | M_ZERO);

We get back a pointer to the file entry. Zone limits are set at boot time, and every allocation and free happens within that limit.
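
The notion of a zone with a fixed limit can be sketched in a few lines of user-space C. This is a toy model under simplifying assumptions, not the UMA allocator: all names are hypothetical, the limit is a compile-time constant, and instead of sleeping like M_WAITOK it simply returns NULL when the zone is exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_ZONE_LIMIT 4            /* fixed up front; arbitrary for the sketch */

struct toy_file { int f_count; void *f_data; };

/* A zone: a fixed array of items plus a map of which ones are in use. */
struct toy_zone {
    struct toy_file items[TOY_ZONE_LIMIT];
    unsigned char   used[TOY_ZONE_LIMIT];
};

/* Return a zeroed item, or NULL when the zone is exhausted (no sleeping here). */
struct toy_file *toy_zalloc(struct toy_zone *z)
{
    for (size_t i = 0; i < TOY_ZONE_LIMIT; i++) {
        if (!z->used[i]) {
            z->used[i] = 1;
            memset(&z->items[i], 0, sizeof(z->items[i]));   /* like M_ZERO */
            return &z->items[i];
        }
    }
    return NULL;
}

/* Give an item back; assumes fp was handed out by this same zone. */
void toy_zfree(struct toy_zone *z, struct toy_file *fp)
{
    size_t i = (size_t)(fp - z->items);
    assert(i < TOY_ZONE_LIMIT && z->used[i]);
    z->used[i] = 0;
}
```

Once TOY_ZONE_LIMIT items are handed out, the next toy_zalloc() fails until something is freed, which is the point where the real allocator would put a thread with M_WAITOK to sleep.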

The file structure looks as follows:

// https://github.com/coolgoose85/FreeBSD/blob/master/sys/sys/file.h#L116

struct file {
     void              *f_data;     /* file descriptor specific data */
     struct fileops     *f_ops;          /* File operations */
     struct ucred       *f_cred;     /* associated credentials. */
     struct vnode      *f_vnode;     /* NULL or applicable vnode */
     short          f_type;          /* descriptor type */
     short          f_vnread_flags; /* (f) Sleep lock for f_offset */
     volatile u_int     f_flag;          /* see fcntl.h */
     volatile u_int      f_count;     /* reference count */
     /*
      *  DTYPE_VNODE specific fields.
      */
     int          f_seqcount;     /* Count of sequential accesses. */
     off_t          f_nextoff;     /* next expected read/write offset. */
     struct cdev_privdata *f_cdevpriv; /* (d) Private data for the cdev. */
     /*
      *  DFLAG_SEEKABLE specific fields
      */
     off_t          f_offset;
     /*
      * Mandatory Access control information.
      */
     void          *f_label;     /* Place-holder for MAC label. */
};

We have created a file structure; now we have to initialize it. To do so, we:

  • Set the reference count of the file structure to 1.
  • Assign the input credentials.
  • Initialize the file ops to badfileops; we will update them later, once we know the real ones.
  • Set the data pointer to NULL.
  • Leave the vnode unset, since no vnode is associated yet.

File Descriptor

Next, allocate the file descriptor: acquire the lock and call fdalloc() to allocate the fd.

// fdalloc()  https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/kern_descrip.c#L1446 
if ((error = fdalloc(td, 0, &i))) {

Here ‘i’ is the file descriptor to be returned.

Once fdalloc() returns the file descriptor, we associate it with the file structure allocated above.

// fdalloc()  https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/kern_descrip.c#L1453
p->p_fd->fd_ofiles[i] = fp;

Here p is the process structure, which keeps track of all the files the process has open.

falloc() then returns the fd and the file structure pointer to kern_openat().


How file descriptor is allocated (fdalloc)

In fdalloc() we search the bitmap for a free descriptor. If none is found, we grow the file table and retry, until the limit is hit.

// fdalloc() https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/kern_descrip.c#L1352
for (;;) {
     fd = fd_first_free(fdp, minfd, fdp->fd_nfiles);
     if (fd >= maxfd)
          return (EMFILE);
     if (fd < fdp->fd_nfiles)
          break;
     fdgrowtable(fdp, min(fdp->fd_nfiles * 2, maxfd));
}
...
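
The search-then-grow loop can be tried out in user space with a toy table. The sketch below keeps the same loop shape as the excerpt but makes simplifying assumptions: one byte per slot stands in for the kernel's bitmap, there is no locking, and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy descriptor table: one byte per slot stands in for the kernel's bitmap. */
struct toy_fdtable {
    unsigned char *used;   /* used[i] != 0 means descriptor i is taken */
    int            nfiles; /* current table size */
};

/* First free slot at or above minfd, or t->nfiles if none is free. */
static int toy_first_free(const struct toy_fdtable *t, int minfd)
{
    for (int fd = minfd; fd < t->nfiles; fd++)
        if (!t->used[fd])
            return fd;
    return t->nfiles;
}

/* Grow the table to `newsize` slots; the new slots start out free. */
static int toy_growtable(struct toy_fdtable *t, int newsize)
{
    unsigned char *p = realloc(t->used, (size_t)newsize);
    if (p == NULL)
        return ENOMEM;
    memset(p + t->nfiles, 0, (size_t)(newsize - t->nfiles));
    t->used = p;
    t->nfiles = newsize;
    return 0;
}

/* Search, check the limit, grow, retry. Returns 0 and stores the fd on success. */
int toy_fdalloc(struct toy_fdtable *t, int minfd, int maxfd, int *result)
{
    int fd;

    assert(maxfd > 0 && result != NULL);
    for (;;) {
        fd = toy_first_free(t, minfd);
        if (fd >= maxfd)
            return EMFILE;              /* table is full up to the limit */
        if (fd < t->nfiles)
            break;                      /* found a free slot */
        int newsize = t->nfiles > 0 ? t->nfiles * 2 : 1;
        if (newsize > maxfd)
            newsize = maxfd;            /* never grow past the limit */
        if (toy_growtable(t, newsize) != 0)
            return ENOMEM;
    }
    t->used[fd] = 1;
    *result = fd;
    return 0;
}
```

Starting from an empty table with maxfd set to 4, successive calls hand out 0, 1, 2 and 3; clearing a slot makes that number the next one returned, and a call on a full table fails with EMFILE.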



Call Tree

[Figure: call tree for the open system call]



Pathname Lookup

Next we pick up the mode supplied by the caller. The mode value is used only if the file has to be created.

Then we call vn_open() and pass it the input information.

// vn_open() https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_vnops.c#L89

error = vn_open(&nd, &flags, cmode, fp); 

Here, nd is a struct nameidata. If vn_open() succeeds, the result of the lookup, including the vnode, is available in nd.



Associate vnode with file structure

If vn_open() returns without error, nd holds the vnode-related information; all we do is extract it and tie it to the file structure pointer fp.

// Assign vnode https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_syscalls.c#L1131
error = vn_open(&nd, &flags, cmode, fp); 
...
...
vp = nd.ni_vp;
fp->f_vnode = vp;

If the file operations have not been set by this point, i.e. f_ops is still badfileops, we initialize them.

// badfileops https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_syscalls.c#L1138
if (fp->f_ops == &badfileops) {
     KASSERT(vp->v_type != VFIFO, ("Unexpected fifo."));
     fp->f_seqcount = 1;
     finit(fp, flags & FMASK, DTYPE_VNODE, vp, &vnops);
}
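
The placeholder-then-replace pattern behind badfileops can be modeled with a table of function pointers. The following is a toy sketch, not the kernel's definitions: every toy_ name is hypothetical, only a read operation is modeled, and returning EBADF from the placeholder is an assumption made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_file;

/* Table of operations a descriptor supports; only read is modeled here. */
struct toy_fileops {
    int (*fo_read)(struct toy_file *fp);
};

struct toy_file {
    const struct toy_fileops *f_ops;
    void                     *f_data;
    short                     f_type;
};

static int toy_badfo_read(struct toy_file *fp) { (void)fp; return EBADF; }
static int toy_vn_read(struct toy_file *fp)    { (void)fp; return 0; }

static const struct toy_fileops toy_badfileops = { toy_badfo_read };
static const struct toy_fileops toy_vnops      = { toy_vn_read };

#define TOY_DTYPE_VNODE 1

/* Put a fresh file structure into the "not yet usable" state. */
void toy_file_reset(struct toy_file *fp)
{
    fp->f_ops  = &toy_badfileops;
    fp->f_data = NULL;
    fp->f_type = 0;
}

/* Install the real type, data and operations in one place. */
void toy_finit(struct toy_file *fp, short type, void *data,
               const struct toy_fileops *ops)
{
    assert(ops != NULL);
    fp->f_type = type;
    fp->f_data = data;
    fp->f_ops  = ops;
}

/* Same check as the excerpt: initialize only if the placeholder is still set. */
void toy_finish_open(struct toy_file *fp, void *vnode)
{
    if (fp->f_ops == &toy_badfileops)
        toy_finit(fp, TOY_DTYPE_VNODE, vnode, &toy_vnops);
}
```

Before toy_finish_open() runs, any read through f_ops fails with the placeholder's error; afterwards the same call site reaches the vnode operations without the caller changing.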

So far we have done all the bookkeeping. What remains is to act on the remaining open flags. First the file is locked, depending on the kind of locking requested:

  1. Exclusive lock
  2. Shared lock
  3. Write lock
  4. Read lock
  5. Lease to client

Then, if O_TRUNC was requested, we start a write on the vnode and set the file's size attribute to zero.

// file locking https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_syscalls.c#L1145

if (flags & (O_EXLOCK | O_SHLOCK)) {
     ...
     if ((error = VOP_ADVLOCK(vp, (caddr_t)fp, F_SETLK, &lf, type)) != 0) {
          ...
     }
     ...
}

if (flags & O_TRUNC) {
     if ((error = vn_start_write(vp, &mp, V_WAIT | PCATCH)) != 0)
          goto bad;
     VOP_LEASE(vp, td, td->td_ucred, LEASE_WRITE);
     ...
     vat.va_size = 0;
     ...
     error = VOP_SETATTR(vp, &vat, td->td_ucred);
     ...
}
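
The effect of the O_TRUNC branch is easy to observe from user space. The sketch below is an illustration using only standard POSIX calls; trunc_demo and the scratch path under /tmp are assumptions made for the example. It writes five bytes to a file, reopens it with O_TRUNC, and reports the size seen afterwards.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp on strict compilers */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a scratch file holding 5 bytes, reopen it with O_TRUNC, and
 * return the size seen afterwards (or -1 on any error).
 * trunc_demo is a hypothetical name used only in this sketch.
 */
long trunc_demo(void)
{
    char path[] = "/tmp/trunc_demo_XXXXXX";
    struct stat st;
    long size = -1;

    int fd = mkstemp(path);             /* creates and opens the scratch file */
    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    fd = open(path, O_WRONLY | O_TRUNC);    /* open itself truncates the file */
    if (fd >= 0) {
        if (fstat(fd, &st) == 0)
            size = (long)st.st_size;
        close(fd);
    }
    unlink(path);                       /* clean up the scratch file */
    return size;
}
```

No write is issued after the second open, so a reported size of zero comes from the open call alone.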



Return file descriptor back to the user

Finally, kern_openat() stores the file descriptor (indx) in td->td_retval[0] and returns 0, which indicates success.

 // Return fd https://github.com/coolgoose85/FreeBSD/blob/master/sys/kern/vfs_syscalls.c#L1184
 	td->td_retval[0] = indx;
	return (0);
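
From the caller's side, this pair of values surfaces as the usual open() convention: a non-negative descriptor on success, or -1 with errno set on failure. The helper below is a small user-space sketch of that convention; open_status is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open `path` read-only. On success store the descriptor in *fd_out
 * and return 0; on failure return the errno value that open() reported.
 * open_status is a hypothetical name used only in this sketch.
 */
int open_status(const char *path, int *fd_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;       /* open() returned -1; errno says why */
    *fd_out = fd;           /* the descriptor the kernel handed back */
    return 0;
}
```

Opening /dev/null yields 0 and a valid descriptor, while a path whose directory does not exist yields ENOENT.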