
Fixing an Infinite Loop on Unix

Symlinks, Firmlinks, /dev, FIFO, and other magic words in Unix

Karol Mazurek
10 min read · Nov 15, 2024

Writing software often means hitting roadblocks you never see coming, and sometimes the solution is much simpler than you think. Recently, I encountered an issue with my MachOFileFinder on macOS. It led me down an exciting path involving links, the recursive loops they create, and various file structures, flags, and other things related to the Unix file system.

I would like to thank Gergely Kalman, who showed me where the source of the problem might be. I found out later that the root cause was different, but I learned some new things and refreshed my knowledge about Unix.

I am sure that many of you will find the rabbit holes described here and the solution to the problem both interesting and valuable.

Here’s the story of the bug, the solution, and what it taught me about macOS.

The story behind the bug

The tool I was working on is designed to locate Mach-O files in a macOS filesystem, identify their types, and optionally filter for ARM64 files.

I had just updated it to make it much faster, and I described the whole process in a separate article. However, I had not properly tested it on the whole system.

It worked smoothly most of the time when used for a single directory, and then Gergely asked me if I could run it in the root directory on the updated macOS to compare the findings with his tool. We wanted to ensure we gathered all Mach-O files on the latest macOS version.

I started the MachOFileFinder and went AFK for about two hours. When I returned, I noticed it got stuck, but by checking the outputs, it looked like the tool had finished its job. I exited it using CTRL+C in the terminal and sent the outputs to Gergely. I was surprised that a few files were missing!

So, I ran it for the second time and left the tool overnight. However, this time I used the time command to check the user and system times.

The problem

When I checked it in the morning, it was stuck again. I exited it with CTRL+C once more and saw that it had used only about 16 minutes of user and system time:

In reality, the tool did nothing for over 940 minutes! It gathered the same output as Gergely in just a few minutes and then froze for hours.
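The gap between those two numbers is what gives a blocked process away. Here is a minimal sketch of the same measurement from inside Python, where the hypothetical `measure` helper and the `time.sleep` call stand in for the stuck tool:

```python
import time

def measure(func):
    """Return (wall_seconds, cpu_seconds) spent while running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    return (time.perf_counter() - wall_start,
            time.process_time() - cpu_start)

# A blocked call (simulated here with sleep) burns wall-clock time
# but almost no user or system CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall: {wall:.2f}s, cpu: {cpu:.2f}s")
```

When wall time keeps growing while CPU time stays flat, the process is most likely waiting on something rather than looping.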

Clearly, something was wrong. Gergely mentioned that the issue might be related to symlinks and directory loops in Unix, so I started to dig.

Fixing Infinite Loop

In macOS, some directories are structured in a way that causes them to point back to parent folders. For instance, the paths below can return to each other, creating an endless cycle when traversing recursively:

/System/Volumes/Data
'/System/Volumes/Data/Volumes/Macintosh HD/System/Volumes/Data'

At first, I thought the tool followed these paths, leading to infinite traversal of the same directories. So, I started to read how my code handled symlinks.

Rabbit Hole I: Symlinks

A Symbolic Link is a pointer to another location in the filesystem. It is a helpful feature, but it can cause an infinite loop during a recursive directory scan.

macOS, for example, has system-level paths that point back and forth to each other. For instance, /System/Volumes/Data/Volumes/Macintosh HD is symlinked so that it can point back to a higher-level directory, creating the possibility of an infinite loop if we are recursively traversing directories.

Python offers a simple way to avoid this trap: call os.walk with the followlinks parameter set to False:

os.walk(path, followlinks=False)

In a simple example, if followlinks were set to True, we would fall into a loop because symlink_dir inside sub_dir points back to dir:
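A minimal sketch of that layout, built in a temporary directory. The `count_walked_dirs` helper and its `limit` cap are my own additions so the demonstration always terminates:

```python
import os
import tempfile
from itertools import islice

def count_walked_dirs(base, followlinks, limit=20):
    """Count directories yielded by os.walk, giving up after `limit`."""
    return sum(1 for _ in islice(os.walk(base, followlinks=followlinks), limit))

with tempfile.TemporaryDirectory() as tmp:
    top = os.path.join(tmp, "dir")
    sub = os.path.join(top, "sub_dir")
    os.makedirs(sub)
    # dir/sub_dir/symlink_dir -> dir closes the loop
    os.symlink(top, os.path.join(sub, "symlink_dir"))

    without_links = count_walked_dirs(top, followlinks=False)
    with_links = count_walked_dirs(top, followlinks=True)

print(without_links, with_links)
```

With followlinks=False the walk sees only dir and sub_dir. With followlinks=True it keeps descending through dir/sub_dir/symlink_dir/sub_dir/... until the cap stops it.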

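However, this should not be the issue in my case because followlinks is False by default. According to the official documentation, os.walk does not walk down into symbolic links that resolve to directories unless followlinks is set to True.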

Any recursive file traversal that follows links can revisit the same set of directories indefinitely. Still, that should not happen in my case, as the tool used os.walk with the default setting. So why?

Design issue with os.walk

When you pass a symlink as the base path to os.walk, it will follow that initial symlink, regardless of the followlinks setting. For example:

import os

# Define the path to test
base_path = '/System/Volumes/Data/Volumes/'

# Walk through the directory without following symlinks
for root, dirs, files in os.walk(base_path, followlinks=False):
    print("Currently in directory:", root)

The output below is cut short because I stopped the program with CTRL+C. As we can see, it resolved the initial symlink and traversed the directories:

So, if we want to avoid following symlinks when using os.walk we need to check if the initial path is also a symlink. For example:

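if os.path.islink(base_path):
    print(f"{base_path} is a symlink. Skipping traversal.")
else:
    # Walk through the directory without following symlinks for subdirectories
    for root, dirs, files in os.walk(base_path, followlinks=False):
        print("Currently in directory:", root)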

I did some testing, and this was not the issue, as the base path in MachOFileFinder is set only once at the start. Honestly, I also do not want to change the standard way os.walk resolves the initial symlink in my tool.
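For anyone who does want that guard, here is a self-contained sketch of the behavior on a throwaway directory. The `is_symlink_base` helper is hypothetical. It strips a trailing separator first, on the assumption that a path such as "link/" is usually resolved to the target directory before the check can see the link:

```python
import os
import tempfile

def is_symlink_base(path):
    """Check whether the walk's starting path is itself a symlink."""
    return os.path.islink(path.rstrip(os.sep) or os.sep)

with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "real_dir")
    os.makedirs(os.path.join(real, "child"))
    link = os.path.join(tmp, "link_to_real")
    os.symlink(real, link)

    # followlinks=False does not stop os.walk from entering the base symlink
    walked = [root for root, _, _ in os.walk(link, followlinks=False)]

    # The guard skips the traversal when the base path is a symlink
    guarded = [] if is_symlink_base(link + os.sep) else walked

print(len(walked), len(guarded))
```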

Rabbit Hole II: Firmlinks

Gergely showed me some code on how he handled these shortcuts on macOS by implementing a blacklist. After a closer look, I saw that his list resembled the Firmlinks that macOS lists in the /usr/share/firmlinks file:

macOS defines Firmlinks via flags in stat.h, specifically SF_FIRMLINK.

bsd/sys/stat.h#L512

However, the stat command does not reliably report the SF_FIRMLINK flag, which makes it challenging to identify firmlinks from the command line:

The stat command returned 0x100000 (SF_NOUNLINK), which suggests that SF_FIRMLINK is not exposed via stat.

This is a problem for Python tools, as they rely on the same API that the stat command uses.
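To make the flag values concrete, here is a small sketch that decodes st_flags in Python. The constants are my assumption of the values in XNU's bsd/sys/stat.h, so verify them against your SDK. The `describe_flags` and `flags_for` helpers are hypothetical:

```python
import os

# Assumed values, taken from XNU's bsd/sys/stat.h
SF_FLAGS = {
    0x00080000: "SF_RESTRICTED",
    0x00100000: "SF_NOUNLINK",
    0x00800000: "SF_FIRMLINK",
}

def describe_flags(st_flags):
    """Return the names of the known super-user flags set in st_flags."""
    return [name for bit, name in SF_FLAGS.items() if st_flags & bit]

def flags_for(path):
    """Decode the flags of a path; st_flags exists only on BSD-like systems."""
    return describe_flags(getattr(os.lstat(path), "st_flags", 0))

print(describe_flags(0x100000))  # the value stat returned above
```

Running `flags_for` on a firmlinked directory should reproduce the observation above if the flag is indeed hidden from this API.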

Detecting Firmlinks

There is a tricky way to determine from the terminal whether a given directory is a Firmlink by using ls. Most Firmlinks have the sunlnk flag set:

/bin/ls -laO /

We can see this thanks to the -O option described in man pages of ls:

Unfortunately, the macOS man page for chflags says nothing about sunlnk. I found it in the BSD documentation online. Still, it does not say much:

Directories marked with this flag cannot be removed, but files inside them can be created, modified, or removed. This flag is related to SIP on macOS.

This detection method is not bulletproof, though, as some of the Firmlinks from the list do not have sunlnk set and carry the restricted flag instead:

/System/Library/Assets
/System/Library/PreinstalledAssets
/System/Library/AssetsV2
/System/Library/PreinstalledAssetsV2
/System/Library/CoreServices/CoreTypes.bundle/Contents/Library

Another method is to read the /usr/share/firmlinks file and build a blacklist from it, but there is a catch! One more file should be taken into account: /etc/synthetic.conf. However, it does not exist by default.

man synthetic.conf
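A rough sketch of the blacklist approach. It assumes each line of /usr/share/firmlinks holds a firmlink path followed by a tab and its target, which you should confirm on your own system. All function names are hypothetical, and /etc/synthetic.conf is not handled here:

```python
import os

def parse_firmlinks(text):
    """Collect the first tab-separated column of every non-empty line."""
    paths = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entry = line.split("\t")[0].rstrip("/")
        if entry:
            paths.add(entry)
    return paths

def load_blacklist(firmlinks_file="/usr/share/firmlinks"):
    """Read the firmlinks file, or return an empty set if it is missing."""
    try:
        with open(firmlinks_file, encoding="utf-8") as handle:
            return parse_firmlinks(handle.read())
    except OSError:
        return set()

def is_blacklisted(path, blacklist):
    """True if path is a blacklisted directory or lies inside one."""
    path = os.path.normpath(path)
    return any(path == entry or path.startswith(entry + "/")
               for entry in blacklist)

# Sample content in the assumed format, not copied from a real system
sample = "/Applications\tApplications\n/usr/local\tusr/local\n"
blacklist = parse_firmlinks(sample)
print(is_blacklisted("/usr/local/bin", blacklist))
```

Matching on `entry + "/"` rather than a bare prefix keeps a path like /usr/localhost from being caught by the /usr/local entry.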

The third way is to use ctypes and port getattrlistbulk() which, according to the comment below, returns the SF_FIRMLINK flag:

Maybe there are more ways. I stopped digging after this, as I finally debugged the issue properly, the way every Python expert should: using print()!

The root cause of the Infinite Loop: /dev

I thought the source of the problem was shortcuts, but I handled Symlinks well, implemented a blacklist for Firmlinks, and the issue persisted. Why?

Then, I did something that I should have done at the beginning of this journey: I debugged the tool instead of mindlessly betting on what the problem might be.

I started simply by putting the print() function here and there. That was enough to find out that the problem was the /dev directory, as the tool froze there:

Debugging further showed that we cannot call f.read() on files in the /dev directory, as the program hangs exactly at that moment:

MachOFileFinder.py#L53

We cannot read from files such as /dev/tty. They are interactive device files that wait for user input and block indefinitely if none is provided.

The /dev directory was the source of the problem.

Go to jail /dev!

The solution is simple: make a blacklist for /dev. However, I prepared more robust code for those who cannot blacklist /dev for any reason:

# Use stat to check if the file is a character device
try:
    if stat.S_ISCHR(os.stat(file_path).st_mode):
        print(f"Skipping character device: {file_path}")
        return None
except (OSError, IOError) as e:
    print(f"Error checking file type for {file_path}: {e}")
    return None

In MachOFileFinder, I wanted to implement a blacklist because it is more time efficient, and I want the tool to be as fast as possible.

One more Edge case: FIFO

However, it is good that I first tested the robust version of the fix before implementing the blacklist for /dev as I found out one more edge case:

Another infinite loop, this time because of the FIFO file structure:

FIFOs (named pipes) are special files used for one-way communication between processes, allowing them to send data sequentially. For example:

Like a character device, a FIFO waits for the other side to interact, so f.read() gets stuck.
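A small POSIX-only sketch of this behavior using a throwaway FIFO. The O_NONBLOCK flag is my addition so that the demonstration itself cannot hang:

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    fifo_path = os.path.join(tmp, "demo_fifo")
    os.mkfifo(fifo_path)  # available on Unix-like systems only

    is_fifo = stat.S_ISFIFO(os.stat(fifo_path).st_mode)

    # A plain open(fifo_path, "rb") would block here until a writer shows up.
    # O_NONBLOCK makes the open return immediately instead.
    fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        data = os.read(fd, 4)  # no writer is connected, so nothing arrives
    finally:
        os.close(fd)

print(is_fifo, data)
```

A scanner that opens every path in blocking mode has no such escape hatch, which is why filtering by file type before opening is the safer route.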

Solution

Firstly, I wanted to make a blacklist, so the tool would not lose speed, but this second issue showed that I needed a more general fix. The FIFO edge case fix would be similar to the character devices, but now we filter FIFO:

file_mode = os.stat(file_path).st_mode
if stat.S_ISFIFO(file_mode):
    print(f"Skipping named pipe (FIFO): {file_path}")
    return None

I tested this on macOS 15.1. The tool finally exited normally when run on the / root directory! The infinite loop is gone, but there are more special file types. They may also freeze the tool, and they are certainly never Mach-O files.
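To see which of these types a given path belongs to, the predicates in Python's stat module can be combined into one lookup. This is only an illustrative sketch, and `file_kind` is a hypothetical helper:

```python
import os
import stat

# Each predicate from the stat module paired with a readable name
_CHECKS = [
    (stat.S_ISREG, "regular file"),
    (stat.S_ISDIR, "directory"),
    (stat.S_ISLNK, "symbolic link"),
    (stat.S_ISCHR, "character device"),
    (stat.S_ISBLK, "block device"),
    (stat.S_ISFIFO, "named pipe (FIFO)"),
    (stat.S_ISSOCK, "socket"),
]

def file_kind(path):
    """Name the file type of path without following a final symlink."""
    mode = os.lstat(path).st_mode
    for check, name in _CHECKS:
        if check(mode):
            return name
    return "unknown"

print(file_kind(os.devnull))
```

Filtering out each special type one by one means adding a new branch for every entry in this table, which is what motivates a single positive check.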

The most future-proof solution is to check for a regular file, as a Mach-O file is always a regular file:

The final solution is a new isRegularFile() that checks if a file is regular before entering the getMachoInfo() so f.read() will not get stuck:

    def isRegularFile(self, file_path):
        """Check if the specified file is a regular file."""
        try:
            return stat.S_ISREG(os.stat(file_path).st_mode)
        except (OSError, IOError) as e:
            # print(f"Error checking file type for {file_path}: {e}")
            return False
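To show how such a check fits into a scan, here is a self-contained sketch. It is not the actual MachOFileFinder code. The function names are hypothetical, and the magic list is my assumption of the common thin and fat Mach-O signatures (the fat one is shared with Java class files, so a real tool needs further parsing):

```python
import os
import stat

# Assumed 4-byte signatures: thin 32/64-bit Mach-O in both byte orders, plus fat
MACHO_MAGICS = {
    bytes.fromhex("feedface"), bytes.fromhex("cefaedfe"),
    bytes.fromhex("feedfacf"), bytes.fromhex("cffaedfe"),
    bytes.fromhex("cafebabe"), bytes.fromhex("bebafeca"),
}

def is_regular_file(path):
    """True only for regular files; devices, FIFOs, and sockets are skipped."""
    try:
        return stat.S_ISREG(os.stat(path).st_mode)
    except OSError:
        return False

def find_macho_candidates(root):
    """Yield regular files under root whose first four bytes match a magic."""
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_regular_file(path):
                continue  # never open something that could block
            try:
                with open(path, "rb") as handle:
                    magic = handle.read(4)
            except OSError:
                continue  # unreadable file, for example permission denied
            if magic in MACHO_MAGICS:
                yield path
```

Usage would look like `for path in find_macho_candidates("/usr/bin"): print(path)`. The type check runs before open(), so a FIFO or device node never reaches the read call.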

Here is the commit with a fix, and here is the proof that the tool is still fast:

Looking at how simple the solution was and then at how much time it took me to find the issue, I really started to wonder if I should work in IT.

FINAL WORDS

The journey of fixing this infinite loop issue taught me a lot about filesystem nuances. If you work with Unix-based systems, be mindful of symlinks and diverse file structures, and ensure your code is as well.

Key Takeaways

  • Knowing the role of special files like symlinks, firmlinks, and device files in Unix-based systems is crucial for reliable tools. If not managed correctly, these can cause hangs and unexpected issues.
  • Always verify file types before reading them in directory-scanning scripts. As this article shows, checking if a file is regular can save your tool from hanging on device files or named pipes.
  • When code behaves unexpectedly, add print() to track progress. This often reveals the root cause faster than guessing.
  • Instead of fixed blacklists, implement adaptable and future-proof checks. Keep tools flexible in changing environments.

References

These three PDFs from Gergely Kalman are a must-read if you want to work with the macOS file system, especially for vulnerability research:

Here are some links related to file system programming:

Thanks for reading. I hope you learned something new! For more in-depth discussions and tips, feel free to explore my other blog posts.

Sssssssssstay tuned.
