Revealing Why 'find' with No Match and Combined with 'xargs' Can Still List Files
Problem Description
Using find alone to search for matching files or directories returns no matches, as follows:
- List the files under /tmp for demonstration[^1]
cd /tmp && ls -l
# cd /tmp/ && ls -l
total 64
srwxrwx--- 1 netdata netdata 0 Sep 1 19:08 netdata-ipc
drwxrwxr-x 2 portage portage 40 Sep 1 19:08 portage
drwxrwxr-x 2 root utmp 40 Sep 1 20:08 screen
-rw-r--r-- 1 root root 1351 Sep 2 00:00 tmp0cznslq0
-rw-r--r-- 1 root root 1315 Sep 3 00:00 tmp170scwam
-rw-r--r-- 1 root root 1259 Sep 3 00:00 tmp3a69_fzu
-rw-r--r-- 1 root root 1268 Sep 3 00:00 tmp6d_oxnxz
-rw-r--r-- 1 root root 1240 Sep 3 00:00 tmp7iyopakk
-rw-r--r-- 1 root root 1297 Sep 3 00:00 tmp7y_n8v26
-rw-r--r-- 1 root root 1328 Sep 2 00:00 tmp9rnmmbff
-rw-r--r-- 1 root root 1329 Sep 2 00:00 tmp9y8grm2d
-rw-r--r-- 1 root root 1289 Sep 3 00:00 tmpbzop6hd7
-rw-r--r-- 1 root root 1250 Sep 2 00:00 tmpe4isdmly
-rw-r--r-- 1 root root 1284 Sep 2 00:00 tmppuf5jmy_
-rw-r--r-- 1 root root 1237 Sep 2 00:00 tmpqf8tkzv2
-rw-r--r-- 1 root root 1291 Sep 3 00:00 tmpq_le14ux
-rw-r--r-- 1 root root 1259 Sep 3 00:00 tmpu2azbu5n
-rw-r--r-- 1 root root 1270 Sep 2 00:00 tmpxjfh_00o
-rw-r--r-- 1 root root 1259 Sep 2 00:00 tmpxp6854h3
- Find directories whose name contains abc
find /tmp -type d -name "*abc*" -print
Result is empty.
But if the output of the find above is piped to xargs and then listed with ls:
find /tmp -type d -name "*abc*" -print | xargs ls -l
# find /tmp -type d -name "*abc*" -print | xargs ls -l
total 64
srwxrwx--- 1 netdata netdata 0 Sep 1 19:08 netdata-ipc
drwxrwxr-x 2 portage portage 40 Sep 1 19:08 portage
drwxrwxr-x 2 root utmp 40 Sep 1 19:08 screen
-rw-r--r-- 1 root root 1351 Sep 2 00:00 tmp0cznslq0
-rw-r--r-- 1 root root 1315 Sep 3 00:00 tmp170scwam
-rw-r--r-- 1 root root 1259 Sep 3 00:00 tmp3a69_fzu
-rw-r--r-- 1 root root 1268 Sep 3 00:00 tmp6d_oxnxz
-rw-r--r-- 1 root root 1240 Sep 3 00:00 tmp7iyopakk
-rw-r--r-- 1 root root 1297 Sep 3 00:00 tmp7y_n8v26
-rw-r--r-- 1 root root 1328 Sep 2 00:00 tmp9rnmmbff
-rw-r--r-- 1 root root 1329 Sep 2 00:00 tmp9y8grm2d
-rw-r--r-- 1 root root 1289 Sep 3 00:00 tmpbzop6hd7
-rw-r--r-- 1 root root 1250 Sep 2 00:00 tmpe4isdmly
-rw-r--r-- 1 root root 1284 Sep 2 00:00 tmppuf5jmy_
-rw-r--r-- 1 root root 1237 Sep 2 00:00 tmpqf8tkzv2
-rw-r--r-- 1 root root 1291 Sep 3 00:00 tmpq_le14ux
-rw-r--r-- 1 root root 1259 Sep 3 00:00 tmpu2azbu5n
-rw-r--r-- 1 root root 1270 Sep 2 00:00 tmpxjfh_00o
-rw-r--r-- 1 root root 1259 Sep 2 00:00 tmpxp6854h3
Note: no matches were piped from find to xargs, and yet all files under /tmp were listed.
The Question
- Using find to look up files and then piping to xargs together with rm -rf to delete: if the match fails, will it delete all files under the starting directory? Wouldn't that be dangerous?
- Would using find together with -exec rm -rf be relatively safer?
Verify the hypothesis
- Using find together with -exec to delete directories whose names contain abc
find /tmp -type d -name "*abc*" -exec rm -rf {} \;
Result: as expected, nothing matched, nothing was deleted.
- Use find combined with xargs to run rm -rf from /tmp and delete directories whose names contain abc
find /tmp -type d -name "*abc*" | xargs rm -rf
# find /tmp -type d -name "*abc*" | xargs rm -rf
#
# ls -l
total 64
srwxrwx--- 1 netdata netdata 0 Sep 1 19:08 netdata-ipc
drwxrwxr-x 2 portage portage 40 Sep 1 19:08 portage
drwxrwxr-x 2 root utmp 40 Sep 1 19:08 screen
-rw-r--r-- 1 root root 1351 Sep 2 00:00 tmp0cznslq0
-rw-r--r-- 1 root root 1315 Sep 3 00:00 tmp170scwam
-rw-r--r-- 1 root root 1259 Sep 3 00:00 tmp3a69_fzu
-rw-r--r-- 1 root root 1268 Sep 3 00:00 tmp6d_oxnxz
-rw-r--r-- 1 root root 1240 Sep 3 00:00 tmp7iyopakk
-rw-r--r-- 1 root root 1297 Sep 3 00:00 tmp7y_n8v26
-rw-r--r-- 1 root root 1328 Sep 2 00:00 tmp9rnmmbff
-rw-r--r-- 1 root root 1329 Sep 2 00:00 tmp9y8grm2d
-rw-r--r-- 1 root root 1289 Sep 3 00:00 tmpbzop6hd7
-rw-r--r-- 1 root root 1250 Sep 2 00:00 tmpe4isdmly
-rw-r--r-- 1 root root 1284 Sep 2 00:00 tmppuf5jmy_
-rw-r--r-- 1 root root 1237 Sep 2 00:00 tmpqf8tkzv2
-rw-r--r-- 1 root root 1291 Sep 3 00:00 tmpq_le14ux
-rw-r--r-- 1 root root 1259 Sep 3 00:00 tmpu2azbu5n
-rw-r--r-- 1 root root 1270 Sep 2 00:00 tmpxjfh_00o
-rw-r--r-- 1 root root 1259 Sep 2 00:00 tmpxp6854h3
Surprised? Not surprised? Nothing happened??? If you want to know why, read on.
The Reveal
By running two simple commands, the picture becomes clear at a glance.
cd /tmp && ls -l
The output won't be pasted—it's the same handful of content, no matter what.
cd /tmp && rm -rf
Then the answer to the mystery explored in this article will be:
The issue isn't that find mistakenly passes its starting point (/tmp in this case) through the pipe to xargs when its own stdout is empty (i.e., no match), nor that xargs mistakenly treats find's input as its own stdin. Rather, GNU xargs runs its command once even when it reads nothing from stdin, so what matters is how the trailing ls and rm behave with no file operands. When ls is given no file argument, it lists the current directory (pwd); rm -rf given no operand simply returns 0 with no output, because -f suppresses the missing-operand error.
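The "runs once on empty input" behavior can be checked directly. A minimal sketch, assuming GNU findutils: -r (--no-run-if-empty) is a GNU extension, and BSD xargs already skips the command on empty input. The scratch directory and variable names are made up for illustration:

```shell
# Scratch directory so nothing real is touched (created by mktemp).
demo=$(mktemp -d)
touch "$demo/file1" "$demo/file2"

# No directory matches "*abc*", so find prints nothing.
# GNU xargs still runs `ls` once with no operands, which lists the current directory.
without_r=$(cd "$demo" && find . -type d -name "*abc*" -print | xargs ls)

# With -r, xargs does not run `ls` at all when its input is empty.
with_r=$(cd "$demo" && find . -type d -name "*abc*" -print | xargs -r ls)

echo "without -r: [$without_r]"
echo "with -r:    [$with_r]"
rm -r "$demo"
```

For destructive commands, adding -r to xargs is a cheap way to make "no match" mean "do nothing", regardless of how the trailing command treats an empty argument list.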
If you want to be safe, run the find | xargs ls -l command in some other directory with no files:
# cd /root/Documents/
# ls -l
total 0
# find /tmp -type d -name "*abc*" -print | xargs ls -l
total 0
#
#
Having mastered the core technology, I am not surprised in the slightest by these layers of mystery: what gets listed is the /root/Documents/ directory, exactly as it should be.
Digression
Q1. Earlier in this article, the directory was changed to /tmp in advance. In actual use, if the directory passed to find is different from the current working directory, is there a risk of deleting the wrong files?
In theory, no: rm -rf with no operands deletes nothing, whatever the current directory is.
Just to be safe, one more verification is needed:
# pwd
/root/Documents
# ls /tmp
netdata-ipc tmp170scwam tmp7y_n8v26 tmpe4isdmly tmpu2azbu5n
portage tmp3a69_fzu tmp9rnmmbff tmppuf5jmy_ tmpxjfh_00o
screen tmp6d_oxnxz tmp9y8grm2d tmpqf8tkzv2 tmpxp6854h3
tmp0cznslq0 tmp7iyopakk tmpbzop6hd7 tmpq_le14ux
#
# find /tmp -type d -name "*abc*" | xargs rm -rf
#
#
# ls /tmp
netdata-ipc tmp170scwam tmp7y_n8v26 tmpe4isdmly tmpu2azbu5n
portage tmp3a69_fzu tmp9rnmmbff tmppuf5jmy_ tmpxjfh_00o
screen tmp6d_oxnxz tmp9y8grm2d tmpqf8tkzv2 tmpxp6854h3
tmp0cznslq0 tmp7iyopakk tmpbzop6hd7 tmpq_le14ux
#
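Sure enough, as expected. But if a * is mistakenly added after rm -rf, the consequences would be catastrophic: the shell expands * to every non-hidden entry in the current directory before xargs runs, so rm receives them all as operands.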
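A harmless way to see what a stray * would do is to put echo in front of the destructive command, so the command line is printed instead of executed. This is only a sketch: the scratch directory and the keep1/keep2 file names are invented, and it assumes GNU xargs, which runs the command once even on empty input:

```shell
demo=$(mktemp -d)
touch "$demo/keep1" "$demo/keep2"

# `echo` stands in for the real deletion, so this only prints.
# find matches nothing, yet the shell has already turned * into keep1 keep2
# before xargs is started.
would_run=$(cd "$demo" && find . -type d -name "*abc*" | xargs echo rm -rf *)

echo "$would_run"
# rm -rf keep1 keep2
rm -r "$demo"
```

The printed line is exactly what rm would have been given: every file in the current directory, even though find found no match.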
Q2: When find's matches are piped to xargs, does it send them one at a time as they're found, or wait until all are found and send them in a batch? [2]
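In batches: xargs collects the paths from its stdin and hands as many as fit to each invocation of the command, so a short list like this one goes out in a single batch.
Premise: find looking for regular files starting with "tmp" under /tmp will match several. If they are passed in one batch, ls lays the names out together in columns; if they are passed one at a time, each name appears on its own line.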
# ls tmp6d_oxnxz tmp7iyopakk
tmp6d_oxnxz tmp7iyopakk
#
#
# ls tmp6d_oxnxz; ls tmp7iyopakk
tmp6d_oxnxz
tmp7iyopakk
Verification:
find /tmp -type f -name "tmp*" | xargs ls # Note: it's ls, without the -l option
# find /tmp -type f -name "tmp*" | xargs ls
/tmp/tmp0cznslq0 /tmp/tmp7iyopakk /tmp/tmpbzop6hd7 /tmp/tmpq_le14ux
/tmp/tmp170scwam /tmp/tmp7y_n8v26 /tmp/tmpe4isdmly /tmp/tmpu2azbu5n
/tmp/tmp3a69_fzu /tmp/tmp9rnmmbff /tmp/tmppuf5jmy_ /tmp/tmpxjfh_00o
/tmp/tmp6d_oxnxz /tmp/tmp9y8grm2d /tmp/tmpqf8tkzv2 /tmp/tmpxp6854h3
#
In the output above, the matching files are laid out in columns in a single listing, so ls was invoked once with the whole batch of paths that xargs collected from find.
Q3: When using find with the -exec parameter to operate on matches, does it process them one at a time, or batch them all and process at once?
One at a time, as they come.
Verification:
# find /tmp -type f -name "tmp*" -exec ls {} \;
/tmp/tmp7y_n8v26
/tmp/tmp3a69_fzu
/tmp/tmp6d_oxnxz
/tmp/tmpu2azbu5n
/tmp/tmpq_le14ux
/tmp/tmpbzop6hd7
/tmp/tmp7iyopakk
/tmp/tmp170scwam
/tmp/tmpe4isdmly
/tmp/tmppuf5jmy_
/tmp/tmpxjfh_00o
/tmp/tmp9rnmmbff
/tmp/tmp0cznslq0
/tmp/tmpxp6854h3
/tmp/tmpqf8tkzv2
/tmp/tmp9y8grm2d
The ls command was run without the -l flag, yet each file sits on its own line, which shows that ls was invoked once per match, in the order find came across them (FIFO, First In First Out, if I'm showing off correctly).
So the question is, if you're cleaning up files, find with -exec deletes them one by one, while xargs deletes them in batches. So:
- Time using find alone: time to find n files + time to delete n files one by one.
- Time using find piped to xargs: time to find n files + time to delete n files in one go.
So the question becomes: when using rm to handle files, is it faster to delete them one by one, as in a for loop, or all at once? Logically a single sweep should win, and it feels cleaner: -exec ... \; starts a separate rm process for every match, which has to carry extra overhead. Of course, I have not measured this and have no intention of doing so, because right after comes a "but."
But when you choose find to delete files, you are usually facing a large or even massive number of matches. Whether rm can keep up performance-wise is one thing; the bigger worry is that putting that many paths on a single rm command line fails with "Argument list too long". xargs avoids this by splitting its input across as many rm invocations as needed to stay under the limit.
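On the "too many arguments" worry, it helps to watch xargs split its input across several invocations. The sketch below forces tiny batches with -n so the splitting is visible; seq and the batch size of 2 are only there for illustration:

```shell
# Five input items, at most two arguments per invocation of echo.
batches=$(seq 1 5 | xargs -n 2 echo)
echo "$batches"
# 1 2
# 3 4
# 5

# The system limit on argument bytes that xargs stays within (value varies by system).
getconf ARG_MAX
```

Without -n, xargs picks the batch size itself based on the command-line length limit, so a huge file list turns into several rm runs rather than one oversized one.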
Q4: Although find's -exec parameter passes matches one by one, what if I want to bundle them all in one go?
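There is also a way: the -exec command {} + form.
Note: in the -exec command {} ; form, the ; must be escaped as \; (or quoted) so that the shell does not treat it as a command separator. + is not special to the shell, so escaping it as \+ is harmless but not required.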
Example:
# find /tmp -type f -name "tmp7*" -exec ls {} \+
/tmp/tmp7iyopakk /tmp/tmp7y_n8v26
#
Same familiar directories and familiar files. The change to tmp7* was meant to cut down the output — after all, padding the article with command output over and over eventually weighs on the conscience. Watch: replace ; with + and the whole thing goes in one sweep.
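To make the difference between the two -exec forms countable rather than visual, each invocation can report how many file arguments it received. A small sketch in a scratch directory (the tmp1..tmp3 file names and the sh -c helper are illustrative, not part of the original experiment):

```shell
demo=$(mktemp -d)
touch "$demo/tmp1" "$demo/tmp2" "$demo/tmp3"

# sh -c 'echo "$#"' prints the number of arguments one invocation received.
# The extra "sh" fills $0, so the matched paths land in $1, $2, ...
per_file=$(find "$demo" -type f -name "tmp*" -exec sh -c 'echo "$#"' sh {} \;)
batched=$(find "$demo" -type f -name "tmp*" -exec sh -c 'echo "$#"' sh {} +)

echo "terminated by semicolon:"
echo "$per_file"    # three lines, each "1": one invocation per match
echo "terminated by plus:"
echo "$batched"     # a single "3": one invocation for all matches
rm -r "$demo"
```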
Leaving a thread open
Everyone says that when deleting massive numbers of files, using -delete is more efficient than -exec rm -rf and | xargs rm -rf. Could it be that find has some cache or message queue, where the main process feeds them in, and -delete pulls them out? It's a bit like those physics or math problems where water is poured into a pool and drained at the same time—I used to think those were utterly stupid. Could this actually be the clever part of it?
One More Thing
On September 12, a certain fruit company's annual fall launch event is just around the corner. I wonder whether this year's "one more thing" will bring any surprises. I won't wait—I'll set it up myself first.
On using -delete with -depth and -prune together
-depth:
- Process the files inside a directory before processing the directory itself;
- Using -delete automatically activates -depth (perhaps because -delete itself cannot delete a non-empty directory);
- The manual recommends that when you test a command line (for example by listing matches with -print) that you later intend to run with -delete, you should explicitly specify -depth, to avoid later surprises (mastery of the core tech doesn't help here either);
-prune:
- Do not descend into the specified directory, which protects it from accidental operations.
- If it appears with -depth, then -prune has no effect.
- Because -delete implies the -depth option, using -delete and -prune together also makes the latter ineffective.
- Example: find . -path ./src/emacs -prune -o -print (copied from the man page, not really used).
-delete
- Cannot delete non-empty directories (wondering if specifying -depth will automatically clear files before deleting the directory??)
- Used with -ignore_readdir_race, errors from -delete are ignored when a file matched by find has already disappeared by the time the action is performed on it: no error message is printed, the command's return value is not set to non-zero (0 means success), and, if there are no other errors, -delete still returns true.
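On the question in the -delete notes above (does -depth clear a directory out first?), a quick experiment suggests the answer is no: -depth only changes the traversal order, and -delete still removes only the entries that match the expression. A sketch assuming GNU find, using an invented abc/sub/file tree in a scratch directory:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/abc/sub"
touch "$demo/abc/sub/file"

# Only the directory named abc matches; its contents do not, so they are kept
# and removing abc fails ("Directory not empty"), giving a non-zero exit status.
status_filtered=0
find "$demo" -depth -type d -name "abc" -delete 2>/dev/null || status_filtered=$?

# Starting at abc with no name filter, every entry matches. -delete implies
# -depth, so file, then sub, then abc itself are removed in that order.
status_all=0
find "$demo/abc" -delete || status_all=$?

if [ -e "$demo/abc" ]; then left="still there"; else left="gone"; fi
echo "filtered: exit $status_filtered, unfiltered: exit $status_all, abc is $left"
rm -r "$demo"
```

So to remove a whole matching directory tree with -delete, the expression has to match the directory's contents as well, for example by making the directory itself the starting point.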