Debug FreeBSD - Open


Open System Call

For most file systems, a program initializes access to a file using the open system call. This allocates the resources associated with the file (the file descriptor) and returns a handle that the process will use to refer to that file. In some cases the open is performed by the first access.

More information can be found in the Wikipedia article or the Linux manual page.

This post explores the code flow when the open system call is issued.

Bird's-Eye View

The open system call goes through a lot of twists and turns and exercises some of the most complicated code paths. At a very high level it performs the following tasks.

  1. Allocate file structure and file descriptor
  2. Associate file structure with file descriptor
  3. Look up the pathname, which returns a vnode
  4. Associate vnode with file structure
  5. Return file descriptor back to the user
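
Seen from user space, the five steps above collapse into a single call. The following is a minimal sketch, assuming a POSIX system; the helper name open_for_write is made up for illustration and is not part of any FreeBSD API.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel for a descriptor on `path`.
 * On success the kernel has allocated the file structure, looked up
 * the vnode and tied both to the returned descriptor.
 * Returns the descriptor, or -1 with errno set.
 */
static int
open_for_write(const char *path)
{
	/* The mode (0644) only matters if O_CREAT creates the file. */
	int fd = open(path, O_WRONLY | O_CREAT, 0644);

	if (fd == -1)
		perror("open");
	return (fd);
}
```

The returned value is a small non-negative integer that the process then passes to write(2), close(2) and similar calls.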

Map - Open

All system calls are mapped to system call numbers in syscalls.master (kern).

For the open system call:

System Call Number   System Call      System Call Signature
5                    AUE_OPEN_RWTC    int open(char *path, int flags, int mode);

More details about how a system call is executed can be found here.

File Structure & File Descriptor

The open system call starts in the function open(td, uap), which calls kern_open(), which in turn calls kern_openat(). This is the function where all the action starts.

In kern_openat() we allocate the file descriptor. To do so we call falloc().

falloc(struct thread *td, struct file **resultfp, int *resultfd) returns a pointer to the file entry and the file descriptor.

File Structure Allocation

Space for the file structure is allocated in the kernel. Instead of malloc we use zalloc here, because the memory is assigned from the file zone. There are multiple zones, such as file_zone, proc_zone, and thread_zone. We ask the file zone for memory and pass a flag that says whether we are willing to wait if no space is available in the zone. It is a synchronous call: if memory is not available and M_WAITOK is set, the thread goes to sleep.

falloc() code
     fp = uma_zalloc(file_zone, M_WAITOK | M_ZERO);

We get back a pointer to the file entry. Zone limits are set at boot time, and every allocation and free is accounted against that limit.

The file structure looks as follows.
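
To make the idea of a limited zone concrete, here is a small user-space sketch. It is not the UMA implementation: toy_zone, toy_zalloc and toy_zfree are hypothetical names, and where the kernel would put the thread to sleep under M_WAITOK, this toy version simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a zone: fixed item size, hard item limit. */
struct toy_zone {
	size_t	item_size;	/* size of every item in this zone */
	size_t	limit;		/* maximum number of live items */
	size_t	in_use;		/* items currently handed out */
};

/*
 * Hypothetical allocator. Returns a zeroed item (similar in spirit
 * to M_ZERO), or NULL when the zone is at its limit. The kernel call
 * would instead sleep when M_WAITOK is set.
 */
static void *
toy_zalloc(struct toy_zone *z)
{
	void *item;

	if (z->in_use >= z->limit)
		return (NULL);
	item = calloc(1, z->item_size);
	if (item != NULL)
		z->in_use++;
	return (item);
}

/* Return an item to the zone, making room under the limit again. */
static void
toy_zfree(struct toy_zone *z, void *item)
{
	if (item == NULL)
		return;
	free(item);
	z->in_use--;
}
```

With a limit of 2, the third toy_zalloc() fails until one of the earlier items is freed, which is the situation in which a kernel thread with M_WAITOK would wait.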

file structure code

struct file {
     void              *f_data;     /* file descriptor specific data */
     struct fileops     *f_ops;          /* File operations */
     struct ucred       *f_cred;     /* associated credentials. */
     struct vnode      *f_vnode;     /* NULL or applicable vnode */
     short          f_type;          /* descriptor type */
     short          f_vnread_flags; /* (f) Sleep lock for f_offset */
     volatile u_int     f_flag;          /* see fcntl.h */
     volatile u_int      f_count;     /* reference count */
     /*
      *  DTYPE_VNODE specific fields.
      */
     int          f_seqcount;     /* Count of sequential accesses. */
     off_t          f_nextoff;     /* next expected read/write offset. */
     struct cdev_privdata *f_cdevpriv; /* (d) Private data for the cdev. */
     /*
      *  DFLAG_SEEKABLE specific fields
      */
     off_t          f_offset;
     /*
      * Mandatory Access control information.
      */
     void          *f_label;     /* Place-holder for MAC label. */
};

We have created a file structure; now we have to initialize it. So we:

  • Set the reference count of the file structure to 1.
  • Assign the input credentials.
  • Initialize the file ops to badfileops; we will update them later once we know the file type.
  • Set the data pointer to NULL.
  • Leave the vnode pointer empty, since no vnode is associated yet.
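
The initial state listed above can be sketched in user space as follows. The toy_ types are invented for illustration and only mirror the kernel field names; the real code also manages a reference on the credentials, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy types; the names mirror the kernel's but the layout is made up. */
struct toy_fileops { int unused; };
struct toy_ucred { unsigned int cr_uid; };

struct toy_file {
	void			 *f_data;
	const struct toy_fileops *f_ops;
	struct toy_ucred	 *f_cred;
	void			 *f_vnode;
	unsigned int		  f_count;
};

/* Sentinel ops table meaning "no real operations installed yet". */
static const struct toy_fileops toy_badfileops = { 0 };

/* Apply the initial state to a freshly allocated structure. */
static void
toy_file_init(struct toy_file *fp, struct toy_ucred *cred)
{
	fp->f_count = 1;		/* one reference: the caller's */
	fp->f_cred = cred;		/* credentials of the opener */
	fp->f_ops = &toy_badfileops;	/* replaced once the type is known */
	fp->f_data = NULL;		/* no descriptor-specific data */
	fp->f_vnode = NULL;		/* no vnode until lookup succeeds */
}
```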

File Descriptor

Next, we allocate the file descriptor. To do that we take the lock and call fdalloc() to allocate the fd.

fdalloc() code
if ((error = fdalloc(td, 0, &i))) {

Here ‘i’ is the file descriptor to be returned.

Once we get the file descriptor from fdalloc(), we associate it with the file structure we allocated above.

fdalloc() code
p->p_fd->fd_ofiles[i] = fp;

Here p is the process structure, which keeps track of all the files the process has opened.

falloc() then returns the fd and the file structure pointer to kern_openat().

How file descriptor is allocated (fdalloc)

In fdalloc() we search the bitmap for a free descriptor. If we can't find one, we grow the file table until the limit is hit.

fdalloc() code
for (;;) {
          fd = fd_first_free(fdp, minfd, fdp->fd_nfiles);
          if (fd >= maxfd)
               return (EMFILE);
          if (fd < fdp->fd_nfiles)
               break;          /* found a free slot in the current table */
          fdgrowtable(fdp, min(fdp->fd_nfiles * 2, maxfd));
}
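
The search-then-grow loop can be tried out in user space with a toy table. This is only a sketch of the technique: it uses a byte array rather than the kernel's bitmap, all toy_ names are hypothetical, and it assumes the table starts with at least one slot.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy descriptor table: used[i] != 0 means descriptor i is taken. */
struct toy_fdtable {
	unsigned char	*used;
	int		 nfiles;	/* current table size, assumed >= 1 */
};

/* First free slot at or above minfd, or nfiles when none is free. */
static int
toy_first_free(const struct toy_fdtable *t, int minfd)
{
	int fd;

	for (fd = minfd; fd < t->nfiles; fd++)
		if (!t->used[fd])
			return (fd);
	return (t->nfiles);
}

/* Grow the table to `want` slots; new slots start out free. */
static int
toy_grow(struct toy_fdtable *t, int want)
{
	unsigned char *n = realloc(t->used, (size_t)want);

	if (n == NULL)
		return (ENOMEM);
	memset(n + t->nfiles, 0, (size_t)(want - t->nfiles));
	t->used = n;
	t->nfiles = want;
	return (0);
}

/* Allocate the lowest free descriptor >= minfd; 0 on success, else errno. */
static int
toy_fdalloc(struct toy_fdtable *t, int minfd, int maxfd, int *result)
{
	int fd, error, want;

	for (;;) {
		fd = toy_first_free(t, minfd);
		if (fd >= maxfd)
			return (EMFILE);	/* limit reached */
		if (fd < t->nfiles)
			break;			/* found a free slot */
		/* Table is full: double it, but never past maxfd. */
		want = t->nfiles * 2 < maxfd ? t->nfiles * 2 : maxfd;
		if ((error = toy_grow(t, want)) != 0)
			return (error);
	}
	t->used[fd] = 1;
	*result = fd;
	return (0);
}
```

Starting from a two-slot table with maxfd set to 4, the third allocation triggers a grow to four slots and the fifth fails with EMFILE.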

Call Tree

Pathname Lookup

Now we pick up the mode information for the file provided as input. The mode value is used only if we have to create the file.

Then we call vn_open() and pass it the input information.

vn_open() code
error = vn_open(&nd, &flags, cmode, fp); 

Here, nd is a struct nameidata; on success it carries the vnode found by the lookup.

Associate vnode with file structure

If vn_open() returns without error, the node-related information is in nd. All we do is extract it and tie it to the file structure pointer fp.

Assign vnode code
error = vn_open(&nd, &flags, cmode, fp); 
vp = nd.ni_vp;
fp->f_vnode = vp;

If the file operations have not been set by this time, i.e. if they are still badfileops, we initialize them.

badfileops code
if (fp->f_ops == &badfileops) {
          KASSERT(vp->v_type != VFIFO, ("Unexpected fifo."));
          fp->f_seqcount = 1;
          finit(fp, flags & FMASK, DTYPE_VNODE, vp, &vnops);
}
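
The point of the badfileops placeholder is easier to see with a toy operations table. The sketch below is an assumption-laden illustration, not kernel code: the toy_ names and the TOY_DTYPE_VNODE value are invented, and the table has a single entry point instead of the full set of file operations.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_file;

/* Toy operations table with a single entry point. */
struct toy_fileops {
	int (*fo_read)(struct toy_file *fp);
};

struct toy_file {
	const struct toy_fileops *f_ops;
	void			 *f_vnode;
	int			  f_type;
};

#define	TOY_DTYPE_VNODE	1	/* illustrative value, not the kernel's */

/* Placeholder operation: any use before initialization fails. */
static int
toy_bad_read(struct toy_file *fp)
{
	(void)fp;
	return (EBADF);
}

/* Vnode-backed operation: succeeds once a vnode is attached. */
static int
toy_vn_read(struct toy_file *fp)
{
	return (fp->f_vnode != NULL ? 0 : EBADF);
}

static const struct toy_fileops toy_badfileops = { toy_bad_read };
static const struct toy_fileops toy_vnops = { toy_vn_read };

/* Install vnode operations only if nobody set real ones already. */
static void
toy_attach_vnode(struct toy_file *fp, void *vp)
{
	fp->f_vnode = vp;
	if (fp->f_ops == &toy_badfileops) {
		fp->f_type = TOY_DTYPE_VNODE;
		fp->f_ops = &toy_vnops;
	}
}
```

Comparing against the address of the placeholder table lets code that ran earlier install its own operations without them being overwritten here.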

So far we have done all the bookkeeping. What remains is to lock the file, if locking was requested, and to truncate it, if O_TRUNC was passed. The locking depends on the kind requested, i.e.

  1. Exclusive lock
  2. Shared Lock
  3. Write Lock
  4. Read Lock
  5. Lease to client

So we lock the file. Then, for O_TRUNC, we start a write on the file and set its size attribute to 0.

file locking code

if (flags & (O_EXLOCK | O_SHLOCK)) {
          if ((error = VOP_ADVLOCK(vp, (caddr_t)fp, F_SETLK, &lf,type)) != 0) {

if (flags & O_TRUNC) {
          if ((error = vn_start_write(vp, &mp, V_WAIT | PCATCH)) != 0)
               goto bad;
          VOP_LEASE(vp, td, td->td_ucred, LEASE_WRITE);
          vat.va_size = 0;
          error = VOP_SETATTR(vp, &vat, td->td_ucred);
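
The effect of the O_TRUNC branch can be observed from user space. Here is a minimal sketch, assuming a POSIX system; size_after_trunc is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a few bytes to `path`, reopen it with
 * O_TRUNC, and report the size seen through the new descriptor.
 * Returns the size, or -1 on any error.
 */
static off_t
size_after_trunc(const char *path)
{
	struct stat sb;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd == -1)
		return (-1);
	if (write(fd, "hello", 5) != 5) {
		close(fd);
		return (-1);
	}
	close(fd);

	/* O_TRUNC makes the kernel set the file size to 0 during open. */
	fd = open(path, O_WRONLY | O_TRUNC);
	if (fd == -1)
		return (-1);
	if (fstat(fd, &sb) == -1) {
		close(fd);
		return (-1);
	}
	close(fd);
	return (sb.st_size);
}
```

The size reported after the second open is 0 even though five bytes were written before, because the truncation happens inside the open call itself.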

Return file descriptor back to the user

Finally, kern_openat() stores the fd (indx) as the system call's return value and returns 0, which means success.

Return fd code
     td->td_retval[0] = indx;
     return (0);
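
From user space, the value placed in td_retval[0] arrives as the return value of open(). When the kernel function returns a nonzero error instead, the caller sees -1 and the error code in errno. A small sketch, assuming a POSIX system; try_open is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open `path` read-only.
 * Returns 0 on success, otherwise the errno value reported by open(2).
 */
static int
try_open(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return (errno);	/* the kernel returned an error, no fd */
	close(fd);
	return (0);
}
```

Opening a path whose parent directory does not exist yields ENOENT, while opening an existing path such as "/" succeeds.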