Revealing Why 'find' with No Match and Combined with 'xargs' Can Still List Files

Problem Description

Using find alone to search for matching files or directories returns no matches, as follows:

  1. List the files under /tmp for demonstration[^1]
cd /tmp && ls -l
# cd /tmp/ && ls -l
total 64
srwxrwx--- 1 netdata netdata    0 Sep  1 19:08 netdata-ipc
drwxrwxr-x 2 portage portage   40 Sep  1 19:08 portage
drwxrwxr-x 2 root    utmp      40 Sep  1 20:08 screen
-rw-r--r-- 1 root    root    1351 Sep  2 00:00 tmp0cznslq0
-rw-r--r-- 1 root    root    1315 Sep  3 00:00 tmp170scwam
-rw-r--r-- 1 root    root    1259 Sep  3 00:00 tmp3a69_fzu
-rw-r--r-- 1 root    root    1268 Sep  3 00:00 tmp6d_oxnxz
-rw-r--r-- 1 root    root    1240 Sep  3 00:00 tmp7iyopakk
-rw-r--r-- 1 root    root    1297 Sep  3 00:00 tmp7y_n8v26
-rw-r--r-- 1 root    root    1328 Sep  2 00:00 tmp9rnmmbff
-rw-r--r-- 1 root    root    1329 Sep  2 00:00 tmp9y8grm2d
-rw-r--r-- 1 root    root    1289 Sep  3 00:00 tmpbzop6hd7
-rw-r--r-- 1 root    root    1250 Sep  2 00:00 tmpe4isdmly
-rw-r--r-- 1 root    root    1284 Sep  2 00:00 tmppuf5jmy_
-rw-r--r-- 1 root    root    1237 Sep  2 00:00 tmpqf8tkzv2
-rw-r--r-- 1 root    root    1291 Sep  3 00:00 tmpq_le14ux
-rw-r--r-- 1 root    root    1259 Sep  3 00:00 tmpu2azbu5n
-rw-r--r-- 1 root    root    1270 Sep  2 00:00 tmpxjfh_00o
-rw-r--r-- 1 root    root    1259 Sep  2 00:00 tmpxp6854h3
  2. Find directories whose name contains abc
find /tmp -type d -name "*abc*" -print
The result is empty.

But if the output of the find above is piped to xargs and then listed with ls -l:

find /tmp -type d -name "*abc*" -print | xargs ls -l
# find /tmp -type d -name "*abc*" -print | xargs ls -l
total 64
srwxrwx--- 1 netdata netdata    0 Sep  1 19:08 netdata-ipc
drwxrwxr-x 2 portage portage   40 Sep  1 19:08 portage
drwxrwxr-x 2 root    utmp      40 Sep  1 19:08 screen
-rw-r--r-- 1 root    root    1351 Sep  2 00:00 tmp0cznslq0
-rw-r--r-- 1 root    root    1315 Sep  3 00:00 tmp170scwam
-rw-r--r-- 1 root    root    1259 Sep  3 00:00 tmp3a69_fzu
-rw-r--r-- 1 root    root    1268 Sep  3 00:00 tmp6d_oxnxz
-rw-r--r-- 1 root    root    1240 Sep  3 00:00 tmp7iyopakk
-rw-r--r-- 1 root    root    1297 Sep  3 00:00 tmp7y_n8v26
-rw-r--r-- 1 root    root    1328 Sep  2 00:00 tmp9rnmmbff
-rw-r--r-- 1 root    root    1329 Sep  2 00:00 tmp9y8grm2d
-rw-r--r-- 1 root    root    1289 Sep  3 00:00 tmpbzop6hd7
-rw-r--r-- 1 root    root    1250 Sep  2 00:00 tmpe4isdmly
-rw-r--r-- 1 root    root    1284 Sep  2 00:00 tmppuf5jmy_
-rw-r--r-- 1 root    root    1237 Sep  2 00:00 tmpqf8tkzv2
-rw-r--r-- 1 root    root    1291 Sep  3 00:00 tmpq_le14ux
-rw-r--r-- 1 root    root    1259 Sep  3 00:00 tmpu2azbu5n
-rw-r--r-- 1 root    root    1270 Sep  2 00:00 tmpxjfh_00o
-rw-r--r-- 1 root    root    1259 Sep  2 00:00 tmpxp6854h3

Note: no matches were piped from find to xargs, and yet all files under /tmp were listed.

The Question

  1. If find is piped to xargs with rm -rf to delete the matches, and nothing matches, will it delete all files under the starting directory? Wouldn't that be dangerous?
  2. Would using find together with -exec rm -rf be relatively safer?

Verify the hypothesis

  1. Using find together with -exec to delete directories whose names contain abc
find /tmp -type d -name "*abc*" -exec rm -rf {} \;
Result: as expected, nothing matched, nothing was deleted.
  2. Use find piped to xargs rm -rf to delete directories under /tmp whose names contain abc
find /tmp -type d -name "*abc*" | xargs rm -rf
# find /tmp -type d -name "*abc*" | xargs rm -rf
#
# ls -l
total 64
srwxrwx--- 1 netdata netdata    0 Sep  1 19:08 netdata-ipc
drwxrwxr-x 2 portage portage   40 Sep  1 19:08 portage
drwxrwxr-x 2 root     utmp      40 Sep  1 19:08 screen
-rw-r--r-- 1 root     root     1351 Sep  2 00:00 tmp0cznslq0
-rw-r--r-- 1 root     root     1315 Sep  3 00:00 tmp170scwam
-rw-r--r-- 1 root     root     1259 Sep  3 00:00 tmp3a69_fzu
-rw-r--r-- 1 root     root     1268 Sep  3 00:00 tmp6d_oxnxz
-rw-r--r-- 1 root     root     1240 Sep  3 00:00 tmp7iyopakk
-rw-r--r-- 1 root     root     1297 Sep  3 00:00 tmp7y_n8v26
-rw-r--r-- 1 root     root     1328 Sep  2 00:00 tmp9rnmmbff
-rw-r--r-- 1 root     root     1329 Sep  2 00:00 tmp9y8grm2d
-rw-r--r-- 1 root     root     1289 Sep  3 00:00 tmpbzop6hd7
-rw-r--r-- 1 root     root     1250 Sep  2 00:00 tmpe4isdmly
-rw-r--r-- 1 root     root     1284 Sep  2 00:00 tmppuf5jmy_
-rw-r--r-- 1 root     root     1237 Sep  2 00:00 tmpqf8tkzv2
-rw-r--r-- 1 root     root     1291 Sep  3 00:00 tmpq_le14ux
-rw-r--r-- 1 root     root     1259 Sep  3 00:00 tmpu2azbu5n
-rw-r--r-- 1 root     root     1270 Sep  2 00:00 tmpxjfh_00o
-rw-r--r-- 1 root     root     1259 Sep  2 00:00 tmpxp6854h3

Surprised? Not surprised? Nothing happened??? If you want to know why, read on.

The Reveal

By running two simple commands, the picture becomes clear at a glance.

cd /tmp && ls -l

The output won't be pasted—it's the same handful of content, no matter what.

cd /tmp && rm -rf

Then the answer to the mystery explored in this article will be:

The issue isn't that find mistakenly passes its starting directory (/tmp in this case) through the pipe to xargs when its own stdout is empty (i.e., no match), nor that xargs mistakes anything of find's for its own stdin. Rather, GNU xargs runs the command once even when its input is empty, so the trailing ls or rm is executed with no file arguments at all, and the result depends on how each command handles that. When ls is given no file argument, it lists the files in the current directory (pwd); whereas rm -rf with no operand simply returns 0 with no output, because -f suppresses the "missing operand" error.
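The behavior described above can be reproduced safely in a scratch directory. This is a minimal sketch that assumes GNU xargs (findutils); the scratch directory and the file name keep-me are made up for the demonstration. It also shows the -r (--no-run-if-empty) option, which tells xargs not to run the command when there is no input.

```shell
# Assumes GNU xargs; BSD xargs already skips the command on empty input.
scratch=$(mktemp -d)            # hypothetical scratch directory for the demo
touch "$scratch/keep-me"
cd "$scratch" || exit 1

# Empty stdin: xargs still runs "ls" once, with no operands, so ls lists $PWD.
without_r=$(printf '' | xargs ls)

# With -r (--no-run-if-empty), xargs does not run ls at all.
with_r=$(printf '' | xargs -r ls)

echo "without -r: [$without_r]"
echo "with -r:    [$with_r]"
```

Under the same assumption, a commonly used defensive form for deletions is find ... -print0 | xargs -0 -r rm -rf --, which also copes with file names containing spaces.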

To confirm this, run the same find | xargs ls -l command from some other directory that contains no files:

# cd /root/Documents/
# ls -l
total 0
# find /tmp -type d -name "*abc*" -print | xargs ls -l
total 0
#
#

Having mastered the core technology, I am not surprised at all by these layers of mystery: what gets listed is the /root/Documents/ directory, exactly as it should be.

Digression


Q1: Earlier in this article, the directory was changed to /tmp in advance. In actual use, if the directory passed to find is different from the current working directory, is there a risk of deleting the wrong files?

In theory, no: with nothing matched, rm -rf receives no operands, so the current working directory never comes into play.

Just to be safe, one more verification is needed:

# pwd
/root/Documents
# ls /tmp
netdata-ipc  tmp170scwam  tmp7y_n8v26  tmpe4isdmly  tmpu2azbu5n
portage      tmp3a69_fzu  tmp9rnmmbff  tmppuf5jmy_  tmpxjfh_00o
screen       tmp6d_oxnxz  tmp9y8grm2d  tmpqf8tkzv2  tmpxp6854h3
tmp0cznslq0  tmp7iyopakk  tmpbzop6hd7  tmpq_le14ux
#
# find /tmp -type d -name "*abc*" | xargs rm -rf
#
#
# ls /tmp
netdata-ipc  tmp170scwam  tmp7y_n8v26  tmpe4isdmly  tmpu2azbu5n
portage      tmp3a69_fzu  tmp9rnmmbff  tmppuf5jmy_  tmpxjfh_00o
screen       tmp6d_oxnxz  tmp9y8grm2d  tmpqf8tkzv2  tmpxp6854h3
tmp0cznslq0  tmp7iyopakk  tmpbzop6hd7  tmpq_le14ux
#

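Sure enough, as expected. But if a * is mistakenly added after rm -rf, the consequences would be catastrophic: the shell expands * to everything in the current working directory before xargs even starts, so those files are deleted whether or not find matches anything.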
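To see what a stray * would do without deleting anything, echo can stand in for the real command. This is a sketch under the assumption of GNU xargs (which runs the command once even on empty input); the scratch directory and the file names one, two, and three are invented for the demonstration.

```shell
# Assumes GNU xargs, which runs the command once even on empty input.
scratch=$(mktemp -d)            # hypothetical scratch directory
cd "$scratch" || exit 1
touch one two three

# "echo" stands in for the real command, so nothing is deleted here.
# The shell expands * to the files in the current directory before xargs starts.
would_run=$(printf '' | xargs echo rm -rf *)
echo "$would_run"
```

The printed command line names every file in the scratch directory, even though nothing arrived on stdin.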


Q2: When find's matches are piped to xargs, does it send them one at a time as they're found, or wait until all are found and send them in a batch? [^2]

In batches: xargs collects as many arguments as fit on one command line and runs the command once per batch. For a small number of matches, that means all at once.

Premise: find looking for regular files starting with "tmp" under /tmp will match multiple files. If they are passed in one batch, a single ls prints the names together in columns; if they are passed one at a time, each ls prints its one file on its own line.

# ls tmp6d_oxnxz tmp7iyopakk
tmp6d_oxnxz  tmp7iyopakk
#
#
# ls tmp6d_oxnxz; ls tmp7iyopakk
tmp6d_oxnxz
tmp7iyopakk

Verification:

find /tmp -type f -name "tmp*" | xargs ls  # Note: it's ls, without the -l option
# find /tmp -type f -name "tmp*" | xargs ls
/tmp/tmp0cznslq0  /tmp/tmp7iyopakk  /tmp/tmpbzop6hd7  /tmp/tmpq_le14ux
/tmp/tmp170scwam  /tmp/tmp7y_n8v26  /tmp/tmpe4isdmly  /tmp/tmpu2azbu5n
/tmp/tmp3a69_fzu  /tmp/tmp9rnmmbff  /tmp/tmppuf5jmy_   /tmp/tmpxjfh_00o
/tmp/tmp6d_oxnxz  /tmp/tmp9y8grm2d  /tmp/tmpqf8tkzv2  /tmp/tmpxp6854h3
#

In the output above, the matching files are listed together in columns, so it's clear that xargs handed the whole batch to a single ls.
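The batching can be made visible with the standard -n option, which caps the number of arguments per invocation. This is a small sketch with made-up input (the letters a through e) rather than the files from this article.

```shell
# Five names on stdin, at most two arguments per invocation (-n 2):
# echo runs three times, printing one line per run.
batched=$(printf '%s\n' a b c d e | xargs -n 2 echo)
echo "$batched"

# Without -n, everything fits on one command line, so echo runs once.
single=$(printf '%s\n' a b c d e | xargs echo)
echo "$single"
```

The first command prints three lines (a b, c d, e); the second prints all five names on one line.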


Q3: When using find with the -exec parameter to operate on matches, does it process them one at a time, or batch them all and process at once?

One at a time, as they come.

Verification:

# find /tmp -type f -name "tmp*" -exec ls {} \;
/tmp/tmp7y_n8v26
/tmp/tmp3a69_fzu
/tmp/tmp6d_oxnxz
/tmp/tmpu2azbu5n
/tmp/tmpq_le14ux
/tmp/tmpbzop6hd7
/tmp/tmp7iyopakk
/tmp/tmp170scwam
/tmp/tmpe4isdmly
/tmp/tmppuf5jmy_
/tmp/tmpxjfh_00o
/tmp/tmp9rnmmbff
/tmp/tmp0cznslq0
/tmp/tmpxp6854h3
/tmp/tmpqf8tkzv2
/tmp/tmp9y8grm2d

The ls command was run without the -l flag, yet each file occupied its own line, which means ls was invoked once per file, in the order find encountered them: FIFO (First In First Out; hope I'm showing off correctly) handling.

This raises a question about cleanup: find with -exec ... \; deletes files one by one, while xargs deletes them in batches. So:

  • Time using find with -exec ... \;: time to find n files + time to delete n files one by one.
  • Time using find piped to xargs: time to find n files + time to delete n files in one go.

So the question becomes: when using rm to handle files, is it faster to delete them one by one in a for loop, or to delete them all at once? Logically, a single sweep should win. Deleting one by one starts a separate rm process for every file, and that startup must carry extra performance overhead. Of course, this is all guesswork; I have no intention of verifying it. Because right after, there's a "but."

But when you choose to use find to delete files, you usually face a large or even massive number of matches. Whether rm can keep up performance-wise is one thing, but appending that many file paths directly after a single rm command will simply fail with "Argument list too long". xargs avoids this by splitting its input into as many command lines as needed, each within the system limit.
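The limit and the splitting can both be observed directly. This is a rough sketch: getconf ARG_MAX reports the system limit, and the 200000 generated numbers are just stand-ins for a very long list of file paths. The exact number of batches depends on the xargs implementation and its defaults, so none is claimed here.

```shell
# Upper bound, in bytes, on the arguments plus environment passed to one exec call.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max"

# 200000 short arguments (roughly 1.3 MB of text) exceed what xargs puts on
# one command line by default, so echo is run several times; each run prints
# one line, and wc -l counts the runs.
batches=$(seq 1 200000 | xargs echo | wc -l | tr -d ' ')
echo "echo was invoked $batches times"
```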


Q4: Although find's -exec parameter passes matches one by one, what if I want to bundle them all in one go?

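There is a way: the -exec command {} + form.

Note: unlike the ; in the -exec command {} ; form, which must be escaped (\;) or quoted because ; is a shell metacharacter, the + has no special meaning to the shell and does not need escaping. Writing \+ is harmless but unnecessary.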

Example:

# find /tmp -type f -name "tmp7*" -exec ls {} \+
/tmp/tmp7iyopakk  /tmp/tmp7y_n8v26
#

Same familiar directories and familiar files. The change to tmp7* was meant to cut down the output — after all, padding the article with command output over and over eventually weighs on the conscience. Watch: replace ; with + and the whole thing goes in one sweep.
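The difference between the two -exec forms can be checked by letting the executed command report how many arguments it received. This is a small sketch using a scratch directory with three invented files (f1, f2, f3); sh -c 'echo "$#"' is only a counting helper, not something from the original example.

```shell
scratch=$(mktemp -d)            # hypothetical scratch directory
touch "$scratch/f1" "$scratch/f2" "$scratch/f3"

# \; form: one sh process per match; each prints its own argument count (1).
per_file=$(find "$scratch" -type f -exec sh -c 'echo "$#"' sh {} \;)

# + form: the matches are appended to a single command line; sh runs once.
batched=$(find "$scratch" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "with \\; :"; echo "$per_file"
echo "with +  :"; echo "$batched"
```

The \; form prints 1 three times, while the + form prints 3 once.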

Leaving a thread open

Everyone says that when deleting massive numbers of files, using -delete is more efficient than -exec rm -rf and | xargs rm -rf. Could it be that find has some cache or message queue, where the main process feeds them in, and -delete pulls them out? It's a bit like those physics or math problems where water is poured into a pool and drained at the same time—I used to think those were utterly stupid. Could this actually be the clever part of it?

One More Thing

On September 12, a certain fruit company's annual fall launch event is just around the corner. I wonder whether this year's "one more thing" will bring any surprises. I won't wait—I'll set it up myself first.

On using -delete with -depth and -prune together

  1. -depth:
  • Process the files inside a directory before processing the directory itself;
  • Using -delete automatically activates -depth (perhaps because -delete itself cannot delete a non-empty directory);
  • The manual recommends that when you test a find command line (for example with -print) that you later intend to use with -delete, you explicitly specify -depth, to avoid surprises later (mastery of the core tech doesn't help here either);
  2. -prune:
  • Protects the specified directory from accidental operations by stopping find from descending into it.
  • If -depth is given, -prune has no effect.
  • Because -delete implies the -depth option, using -delete and -prune together also makes the latter ineffective.
  • Example: find . -path ./src/emacs -prune -o -print (copied from the man page, not really used).
  3. -delete:
  • Cannot delete non-empty directories (wondering if specifying -depth will automatically clear files before deleting the directory??)
  • Used with -ignore_readdir_race, this parameter ignores errors when a file matched by find has already disappeared by the time the action is performed on it. It does not output an error, and does not set the command's return value to non-zero (0 means success). If there are no other errors, -delete still returns true.
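The interaction between -prune and -depth noted above can be observed with -print alone, so nothing is deleted. This is a sketch in a scratch tree; the directories src/emacs and src/vim and the files a.el and b.vim are invented to mirror the man page example.

```shell
scratch=$(mktemp -d)            # hypothetical scratch tree
mkdir -p "$scratch/src/emacs" "$scratch/src/vim"
touch "$scratch/src/emacs/a.el" "$scratch/src/vim/b.vim"
cd "$scratch" || exit 1

# Without -depth: find does not descend into ./src/emacs, so a.el is not printed.
pruned=$(find . -path ./src/emacs -prune -o -print)

# With -depth (which -delete implies): -prune has no effect, so a.el is printed.
not_pruned=$(find . -depth -path ./src/emacs -prune -o -print)

echo "--- without -depth"; echo "$pruned"
echo "--- with -depth";    echo "$not_pruned"
```

In the second listing, the file under the directory that -prune was meant to protect is visited anyway, which is why testing with an explicit -depth before switching to -delete is worthwhile.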

  1. In Linux, everything is a file. Henceforth in this article, "file" means both files and directories; no need to belabor that point.

  2. For how xargs splits the stdin it receives from the pipe (by default, on blanks and newlines), refer to the xargs man page.

norvyn

An independent iOS developer and writer, slowly making small, certain things in a city by the sea.
