Broken loops

I use shellcheck now. Over the past few weeks, I've come to realize that some things in sh or bash can be weird. Once you've figured them out they can feel obvious, but not at first.

A problem

One of these oddities showed up when I tried to work with the output of a find command. It made shellcheck angry:

SC2044: For loops over find output are fragile. Use find -exec or a while read loop.

For example, if I do this:

touch file1 file2 'file 3'
find .

Here's the find output:

.
./file2
./file 3
./file1

Now in a for loop:

for file in $(find .); do echo $file; done

It gives me:

.
./file2
./file
3
./file1

It's broken.

The reason is an internal variable called IFS (Internal Field Separator), which defines the characters used to split words. By default it contains a space, a tab and a newline. Because $(find .) is unquoted, its output is split on all of them: in the example above, ./file 3 contains a space, so it becomes two separate words, ./file and 3.
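Here is a minimal sketch of that splitting, assuming bash; count_words is a hypothetical helper that just prints how many arguments it receives. The unquoted expansion is split on the space, the quoted one is not:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_words() { echo "$#"; }

name='./file 3'

# Unquoted: word splitting on the default IFS turns one name into two words.
unquoted=$(count_words $name)
# Quoted: the expansion stays a single word, space included.
quoted=$(count_words "$name")

echo "unquoted: $unquoted"   # prints "unquoted: 2"
echo "quoted: $quoted"       # prints "quoted: 1"
```

Quoting doesn't rescue the for loop, though: "$(find .)" would be a single word holding every file name at once, so the loop would run only once.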

A fix

If you work with find, the first fix to consider is -exec. Depending on what you want to do with the output, a loop may still be the better option. When it is, use this instead:

while IFS= read -r -d '' ITEM; do
  echo "${ITEM}"
  stat "${ITEM}"
  # do stuff
done < <(find . -name "*.md" -type f -print0)

Weird, right? This is the fix shellcheck suggests, and it relies on a few shell tricks.
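For the cases where a loop isn't needed, here is a minimal sketch of the -exec route mentioned at the start of this section, assuming bash. The scratch directory and file names are hypothetical. find replaces {} with each path as a single argument, so spaces are not a problem:

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory with one name containing a space.
dir=$(mktemp -d)
touch "$dir/notes.md" "$dir/my notes.md"

# "{}" becomes one path per command, "\;" runs the command once per file.
listed=$(find "$dir" -name "*.md" -type f -exec printf '[%s]\n' {} \;)
printf '%s\n' "$listed"

# "+" passes as many paths as possible to a single stat invocation.
find "$dir" -name "*.md" -type f -exec stat {} +
```

With \; the command runs once per file; with + it runs once (or a few times) for many files, which is usually faster.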

An explanation

We still want to work on the output of find. The find command is still there, but at the very end of the block, right after done. Its options (filters and output modifiers) can be whatever you need, with one change: each file name must be terminated by a NUL character instead of a newline. That's what -print0 does.
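To see why the NUL character matters, here is a minimal sketch assuming bash and a scratch directory holding one hypothetical file whose name contains a newline, which is a legal character in Linux file names. With -print, that name can't be told apart from two separate files; with -print0 it can:

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory holding ONE file whose name contains a newline.
cd "$(mktemp -d)" || exit 1
touch $'line1\nline2'

# -print (the default) ends each name with a newline: one file, two lines.
newlines=$(find . -type f -print | wc -l)
# -print0 ends each name with a NUL byte: exactly one terminator for one file.
nuls=$(find . -type f -print0 | tr -cd '\0' | wc -c)

echo "newline terminators: $((newlines))"   # prints "newline terminators: 2"
echo "NUL terminators: $((nuls))"           # prints "NUL terminators: 1"
```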

The output to work on sits inside <(). This is called process substitution: the shell runs the command and makes its output available as a file, through what's called a file descriptor.
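A minimal sketch of process substitution on its own, assuming bash (the alpha and beta lines are made up). It prints the file name the shell substitutes, then reads from it like any other file:

```bash
#!/usr/bin/env bash
# <(...) expands to a file name wired to the command's output.
# On Linux this is usually something like /dev/fd/63.
fd_path=$(echo <(true))
echo "process substitution path: $fd_path"

# Any command that takes a file name can read from it like a regular file.
first=$(head -n 1 <(printf '%s\n' alpha beta))
echo "first line: $first"   # prints "first line: alpha"
```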

That “file” is then fed to the loop with <, a shell redirection that sends the contents of a file (here, a file descriptor, on the right-hand side) to the standard input of a command (here, the while loop, on the left-hand side). The space between the two < is required.
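A minimal sketch, assuming bash, of feeding a while loop with < <(...), next to the pipe version you might write first. The counters are made up for illustration; the point is that with the redirection the loop runs in the current shell, so its variables survive:

```bash
#!/usr/bin/env bash
# 1. Loop fed through "< <(...)": it runs in the current shell.
count=0
while IFS= read -r line; do
  count=$((count + 1))
done < <(printf '%s\n' one two three)
echo "redirect: $count"   # prints "redirect: 3"

# 2. Loop fed through a pipe: bash runs it in a subshell (lastpipe is off
#    by default), so the increments are lost once the pipeline ends.
piped=0
printf '%s\n' one two three | while IFS= read -r line; do
  piped=$((piped + 1))
done
echo "pipe: $piped"       # prints "pipe: 0"
```

This is one reason to prefer the redirection over find ... -print0 | while ...: both read the same names, but with the pipe anything the loop sets is gone after done.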

Back at the beginning of the block, the while loop keeps running as long as the following command succeeds, and each run stores the next file name in ITEM:

IFS= read -r -d '' ITEM

It starts with IFS=, which sets the internal field separator to an empty string, for this read command only. With an empty IFS, read doesn't split anything and doesn't strip leading or trailing whitespace, so file names that start or end with spaces are kept intact.

Then read comes along and deals with whatever input it is given. Usually, read is used to process input typed by a user, who ends it by pressing Enter. Here, it's a bit different:

  • First, the input is already provided by the shell redirection, which feeds the output of find to the while loop.
  • We also told read, with -d '', that the delimiter is “nothing” instead of the newline a user would type. In bash, an empty delimiter means the ASCII NUL character, so read stops at the end of each name written by -print0. This is why the find output needed to be formatted with -print0.
  • Because IFS is empty, each NUL-terminated name is stored in ITEM as a whole, even if it contains spaces, tabs or newlines.
  • And if the output of find contains backslashes, -r tells read to keep them as they are instead of treating them as escape characters.
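A minimal sketch, assuming bash, that feeds three made-up NUL-terminated names to read, first with IFS=, -r and -d '', then with only -d '' for comparison:

```bash
#!/usr/bin/env bash
# Three made-up names separated by NUL bytes, the format find -print0 produces:
# leading spaces, an embedded newline, and a backslash.
records=()
while IFS= read -r -d '' item; do
  records+=("$item")
done < <(printf '%s\0' '  spaced' $'two\nlines' 'back\slash')

printf 'records read: %d\n' "${#records[@]}"   # prints "records read: 3"
printf '[%s]\n' "${records[@]}"

# Same input without IFS= and -r: leading spaces are trimmed,
# and the backslash is treated as an escape character and dropped.
sloppy=()
while read -d '' item; do
  sloppy+=("$item")
done < <(printf '%s\0' '  spaced' $'two\nlines' 'back\slash')
printf '[%s]\n' "${sloppy[@]}"
```

The first loop keeps all three names byte for byte; the second one prints [spaced] and [backslash] for the first and last names.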

Now you can do whatever you need with "${ITEM}" inside the loop, as you usually would.
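Putting it together, a minimal sketch that reruns the first example with the fixed loop, assuming bash and a throwaway directory from mktemp:

```bash
#!/usr/bin/env bash
# Recreate the three files from the first example in a scratch directory.
cd "$(mktemp -d)" || exit 1
touch file1 file2 'file 3'

seen=()
while IFS= read -r -d '' file; do
  seen+=("$file")
  printf '%s\n' "$file"    # "./file 3" comes out on a single line
done < <(find . -type f -print0)

printf 'files: %d\n' "${#seen[@]}"   # prints "files: 3"
```

-type f leaves out the . entry from the first find output, so exactly three names are read and ./file 3 stays in one piece.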

Julien Pericat
Linux Sysadmin, SysOps & DevOps friendly

Happily automating and putting things in containers.
