blog advertising is good for you



Handling Filenames with Spaces in Bash

This is a quickie. On the Mac you regularly handle files with spaces in the Finder without issue, and even on the command line when you put quotes around the name or let tab-completion escape it properly. However, if you try to do things in a shell script, like a for loop, on filenames that involve a space, you’re going to hit a wall. For splits items on a space, regardless of whether they’re quoted (if they’re stored in a variable). However, the read command does not. Observe.

find ~ -name '* *' | while read FILE
do
        echo $FILE rocks.
done

And that’s that. Run the command and pipe to the while stanza and it works like a charm.
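
To see the difference for yourself, here is a minimal, self-contained sketch. It assumes bash and a scratch directory from mktemp whose path contains no spaces; the file names are made up for the demonstration. It counts how many items each kind of loop sees when find reports a single matching file.

```bash
#!/bin/bash
# Scratch directory with one spaced name and one plain name (hypothetical names).
demo_dir=$(mktemp -d)
touch "$demo_dir/Application Support" "$demo_dir/plain"

# for over an unquoted command substitution: the shell splits find's output on whitespace.
for_count=0
for FILE in $(find "$demo_dir" -name '* *'); do
    for_count=$((for_count + 1))
done

# while read: each line of find's output becomes one item.
read_count=0
while read FILE; do
    read_count=$((read_count + 1))
done < <(find "$demo_dir" -name '* *')

echo "for saw $for_count items, while read saw $read_count"
rm -rf "$demo_dir"
```

The while loop reads from a process substitution rather than a pipe only so that read_count survives the loop; piping find into while, as in the example above, reads the lines the same way.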

About Adam Knight
Author Biography

Adam Knight is one of the founders of Mac Geekery and is a geek at heart. Programmer by day, hacker by night, his daily life revolves around the Macintosh platform, which he has used and programmed for since the early days of System 7, when his LCII replaced his Apple //c.

In-between tech jobs, he’s managed to learn the basics of any web hacker: PHP, MySQL, Perl, Apache, Linux, *BSD, and the intricacies of ./configure --prefix=~/bombshelter/. Today, codepoet is concentrating on blogging again, writing some software for the Mac by himself (including Notae) and for his company (such as Switchblade) and has a few other toys coming out soon.

Bug him over AIM or email [link fixed].

That example is faulty. Your problem does not show itself with the * operator, as the following will attest:

for file in *
do
    echo "$file" rocks
done

That will have the exact same output. Your problem is with a variable containing newline-delimited files, like so:

files=$'foo\nbar\nblah yarr\ngreen   blue'

In any case, that’s not necessary. If you set IFS to $'\n' then it will only split on newlines, not spaces. And you can enclose the entire thing in a subshell using ( foo ) to scope IFS to just that for loop.

Example:

(
    IFS=$'\n'
    for file in $files
    do
        echo "$file" rocks.
    done
)
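
A quick way to check that the IFS change really stays inside the subshell is to count the items both ways. This is only a sketch under the assumption of bash; it uses a command substitution, which is also a subshell, so that the count can be captured in a variable.

```bash
#!/bin/bash
files=$'foo\nbar\nblah yarr\ngreen   blue'

# Inside a subshell, with IFS limited to newline: one item per line.
inside=$(
    IFS=$'\n'
    n=0
    for file in $files; do n=$((n + 1)); done
    echo "$n"
)

# Back in the parent shell IFS is untouched, so spaces split the names again.
outside=0
for file in $files; do outside=$((outside + 1)); done

echo "newline IFS: $inside items; default IFS: $outside items"
```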

Adam Knight's picture

The example you gave will, indeed, show the issue if there are files with spaces. If listing a folder with “Application Support” inside you’ll get:

Application rocks.
Support rocks.

As for the newlines, the variable does not contain them. The shell, or something, makes them spaces such that “echo $FILES” will give you a space-delimited list of names with a space. I suggest you test it.

cp

The variable does contain the newlines; I suspect you’re leaving off the quotes somewhere important.

Femto% string=$'a\nstring\nwith\nnewlines'
Femto% echo $string
a
string
with
newlines
Femto% touch $string
Femto% touch "a string with spaces"
Femto% ls -l  | cat
total 0
-rw-r--r--   1 jtl  staff  0 Feb  6 01:57 a
string
with
newlines
-rw-r--r--   1 jtl  staff  0 Feb  6 02:02 a string with spaces

We now have a file with newlines in its name, and one with spaces; the cat was required to keep ls from being helpful and filtering out unprintable characters.

Now, we try the original suggestion:
Femto% find . -name '*' | while read FILE ; do echo $FILE rocks.; done
. rocks.
./a rocks.
string rocks.
with rocks.
newlines rocks.
./a string with spaces rocks.

Well, that worked for files with spaces, but not with newlines, so it’s still dangerous.

Now, we try the simplest possible method, but with proper quoting, as Eridius suggested.

Femto% for FILE in * ; do echo "$FILE" rocks.; done                   
a
string
with
newlines rocks.
a string with spaces rocks.

Hey, that worked perfectly.

Adam Knight's picture

You misunderstand the original issue. I’m taking the output of find and looping over it. for will always break in this case and your example does nothing to resolve that. This is not a matter of quoting but a matter of where for breaks up results.

$ for FILE in `find Library/Application\ Support -name '* *'`; do echo "$FILE"; done
Library/Application
Support
Library/Application
Support/Adium
2.0
Library/Application
Support/Adium
2.0/Contact
List

Even if I saved the output of find in something like $FILES it would have the same effect. read, however, works perfectly fine there.

cp

The read solution isn’t a solution, though; it still fails on filenames with newlines. And the problem isn’t with for itself, it’s with word separation.

“Don’t use find” is one solution, as Eridius and I have shown, or “use find’s -0 safety option” as Rae showed.

Saying “For splits items on a space, regardless of whether they’re quoted (if they’re stored in a variable).” is just incorrect and confusing.

Femto% string='a string with spaces'
Femto% for w in $string ; do echo "$w" ; done
a
string
with
spaces
Femto:~/t jtl$ for w in "$string" ; do echo "$w" ; done
a string with spaces
Femto% for w in 'a string with spaces' ; do echo "$w" ; done
a string with spaces

Obviously quotes do matter.

They matter on the echo line, too; without the quotes, the arguments to echo are resplit, losing any non-single-space separators:

Femto:~/t jtl$ string='a  string  with  multiple  spaces'
Femto:~/t jtl$ echo $string
a string with multiple spaces
Femto:~/t jtl$ echo "$string"
a  string  with  multiple  spaces
Femto:~/t jtl$ string=$'a\nstring\nwith\nnewlines'
Femto:~/t jtl$ echo $string
a string with newlines
Femto:~/t jtl$ echo "$string"
a
string
with
newlines
Adam Knight's picture

This really isn’t a matter for a holy war. Not using find is not the answer when find is the program you need to use and parse the results of. While filesystems allow new-lines in the filenames, it’s rare and stupid enough that I don’t care if it breaks on those files.

For the love of Steve, all I’m doing is showing a way to use find with filenames with spaces in it because for loops don’t work. It’s handy, it works, and when it’s needed it’s damned useful. Adding nulls and piping to xargs is handy when possible, but when you need to perform a number of tricks on a file that find found, that’s just not very appropriate. Possible, yes, but not appropriate.

This is an 80% solution. Hell, it’s a 95% solution. In this specific case, a for loop is a 0% solution because of how it interprets the output of find. Thus, this is better than nothing for what I was trying to do and I felt cause to share it. If you think something else is better then do something else.

To properly handle spaces, you need to make sure $FILE is double-quoted anywhere it is used. Inside the loop, it should say echo "$FILE". Otherwise, bash thinks that you are trying to pass multiple arguments to echo. The difference would be apparent if a filename had multiple consecutive spaces or if you were performing an operation like cp instead of echo.
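
A small sketch of that difference, assuming bash; count_args is a hypothetical helper that just reports how many arguments it was given, standing in for a command like cp:

```bash
#!/bin/bash
# Hypothetical helper: prints the number of arguments it received.
count_args() { echo "$#"; }

FILE='two  spaces.txt'
unquoted=$(count_args $FILE)      # split into: two, spaces.txt
quoted=$(count_args "$FILE")      # one argument, both spaces preserved

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```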

Adam Knight's picture

Ahh, very true. My intent was to demo the looping more than run an actual command, but if you were running a command later you would certainly need the quotes around the variable.

cp

rae's picture

find . -name '* *' -print0 | xargs -0 -n 1 -I % echo % rocks

You need the "-print0" and "-0" to delimit things with NUL characters instead of whitespace.

You need "-n 1" to do them one at a time instead of as many as will fit in a command line.

You need "-I %" to put the argument somewhere other than the end of the command.
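
To check what the NUL delimiters buy you, here is a rough sketch that counts items both ways. It assumes bash and a scratch directory from mktemp; the file names, one of which contains a newline, are invented for the test.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
touch "$demo_dir/with space" "$demo_dir/odd"$'\n'"name here"

# Newline-delimited output: the odd name spills over two lines.
line_count=$(( $(find "$demo_dir" -name '* *' | wc -l) ))

# NUL-delimited output handed to xargs -0: one invocation per real file.
nul_count=$(( $(find "$demo_dir" -name '* *' -print0 |
    xargs -0 -n 1 sh -c 'echo item' _ | wc -l) ))

echo "lines: $line_count, NUL-delimited items: $nul_count"
rm -rf "$demo_dir"
```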

This is the most robust and elegant solution for my purposes, thanks for the tip!

www.scrambledchannel.org

rae's picture

If my find piped to xargs solution doesn’t allow for complex enough operations, you can use a shell function. e.g.:

$ lsd() { for i in "$@"; do ls -ld "$i"; done; }
$ find . -name '* *' -print0 | xargs -0 lsd

Note that no "-n 1" or "-I %" arguments to xargs are needed, since that is all handled inside the shell function, which can also span multiple lines if needs be.

rae wrote:

If my find piped to xargs solution doesn’t allow for complex enough operations, you can use a shell function. e.g.:

lsd() { for i in "$@"; do ls -ld "$i"; done; }
find . -name '* *' -print0 | xargs -0 lsd

That does not work in a shell script sir.
(I really wish it did). Alas, it merely barks…

xargs: lsd: No such file or directory

Nice try.

You need to make the subroutine, as rae did, on the line before.

lsd() { for i in "$@"; do ls -ld "$i"; done; }

and make sure your shebang line reads #!/bin/bash.
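
For what it's worth, xargs runs external commands, so it cannot see a function that only exists in the calling shell. One workaround, which is not what anyone above posted, is to export the function and have xargs start a new bash that inherits it. A sketch, assuming bash is on the PATH and using made-up file names in a scratch directory:

```bash
#!/bin/bash
# The function from the comments above; xargs cannot call it directly.
lsd() { for i in "$@"; do ls -ld "$i"; done; }
export -f lsd    # bash-only: pass the function to child bash processes

demo_dir=$(mktemp -d)
touch "$demo_dir/a b" "$demo_dir/c d"

# xargs starts a new bash, which inherits lsd and receives the names as "$@".
# The lone _ fills $0 so that the first file name is not swallowed.
listing=$(find "$demo_dir" -name '* *' -print0 | xargs -0 bash -c 'lsd "$@"' _)
echo "$listing"
rm -rf "$demo_dir"
```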

First off, the suggestion to “not use find” was absolutely absurd. Get real.

Second, the context here is “if you try to do things in a shell script” and that’s
somewhat different from just typing commands in Terminal. The problem with
piping to xargs -0 (or even using -exec {} \;) is that, in a shell script, they
expect some command in the environment’s $PATH to execute. But how can we
make a self-contained script… and have the find results sent to a function inside
the script? Something like xargs -0 myScriptFunction doesn’t fly.

But… find blah blah blah | myScriptFunction where myScriptFunction() is built with:

while IFS= read -r inputListItem
do
      printf '%s\n' "$inputListItem"    # placeholder: do whatever we want with "$inputListItem"
done

does work.

As has been noted, the only “drawback” is filenames with newline chars (and,
as I’ve discovered \ backslashes as well) don’t parse properly. Not a mission
critical issue for most users, I wouldn’t imagine. This was a great tip… which
(unfortunately) got taken out of context.

FOR what it’s worth,

HI

EDIT: a little googling has revealed that adding the -r option to read
will cure the backslash problem. i.e., while read -r

EDIT#2: YIKES! Speaking of spaces: if an item’s name ends with a space,
read again misbehaves. More googling produced this sublime solution:

while IFS= read -r

Seems like the field separator only affects the one read line (not the entire do loop).

?
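
The two edits above can be checked with a single made-up line that has both a backslash and a trailing space. This is a sketch assuming bash (it uses here-strings); it also confirms that IFS is back to its default after the read.

```bash
#!/bin/bash
# One line with a backslash and a trailing space, as find might print it.
line='back\slash name '

read plain <<< "$line"            # backslash eaten, trailing space trimmed
IFS= read -r raw <<< "$line"      # kept exactly as given

echo "read:         [$plain]"
echo "IFS= read -r: [$raw]"

# The IFS= prefix applied only to that one read command.
[ "$IFS" = $' \t\n' ] && echo "IFS is still the default"
```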

Wow, I bet Adam is sorry he brought this up. No good deed goes unpunished. Good discussion anyway.

I was able to use this information to solve my unix-newb problem, which was how to loop over files that included spaces in their names.

for f in c:/somepath/z*.html; do
md5sum "$f"
done

works fine for me, while my original

for f in `ls -1 c:/somepath/z*.shtml`; do
md5sum "$f"
done

dies in flames due to the way ‘for’ splits.

Thanks everyone.

-==-
The real problem is that some supergenius decided to allow whitespace in file names in the first place ;)

Hmmm… I think there is still a problem when trying to script. Let’s say that you are passing in an argument list, where the arguments are files that have spaces.
% do_something_to_files /Users/me/directory\ with\ spaces/file1 /Users/me/directory\ with\ spaces/file2

First, you would probably want to shift away $1. Your argument list, $*, would then have a bunch of text and spaces. How would you loop through the files?

No, that’s trivial:

do_something:

#!/bin/sh
for arg
do
    echo "$arg"
done

That’s it!
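
A for loop with no "in" list walks the positional parameters one by one, as if you had written for arg in "$@", so each file stays whole. Here is the same idea as a sketch you can paste into a terminal, with do_something written as a function instead of a script and a counter added purely for illustration:

```bash
#!/bin/bash
# Stand-in for the do_something script, written as a function.
do_something() {
    count=0
    for arg            # same as: for arg in "$@"
    do
        count=$((count + 1))
        echo "$arg"
    done
}

do_something "/Users/me/directory with spaces/file1" "/Users/me/directory with spaces/file2"
echo "saw $count arguments"
```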

luke

I just tested the original assertion under bash on the Mac (and on Linux), since it sounded quite wrong to me.

The test confirmed that it was indeed wrong.

“for” does not break tokens with spaces. (The shell does this job, not each of the built-in commands, like “for”!) Also, the shell “tokenises” only before command execution and after variable expansion. It does it after variable expansion so that you can do things like: for f in $list …

In short, the string for f in * is broken into 4 tokens; the * is expanded into all the files; and that’s that. Filenames with spaces or whatever are not then further broken up.

You must quote any variable when you use it to avoid exposing yourself to this problem. That’s probably what tripped you up and confused you. I.e. it’s bad shell programming to ever use an unquoted variable, unless you specifically want the resulting variable to be tokenised when it has spaces.
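
A sketch of that distinction, assuming bash and a scratch directory from mktemp with invented file names: the glob yields one item per file, while the same names pulled out of an unquoted variable are split again on the space.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
touch "$demo_dir/one two" "$demo_dir/three"

# Glob expansion: each file name is one token, spaces and all.
glob_count=0
for f in "$demo_dir"/*; do glob_count=$((glob_count + 1)); done

# Unquoted variable: the expanded text is tokenised on whitespace.
list=$(ls "$demo_dir")
var_count=0
for f in $list; do var_count=$((var_count + 1)); done

echo "glob: $glob_count items, unquoted variable: $var_count items"
rm -rf "$demo_dir"
```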

You’ve made the thing into a much bigger problem than it really is!

luke

no good answers

need better shell

hmm, i didn’t know we had this many different users of this site

better shell, Hmm is zsh better?

It seems that several people came up with fine solutions, even options for using find, or not. What is the problem? These aren’t new questions, they are old, very well documented UNIX questions. Read the man page for find. Here are some other useful options for doing other things (not using loops, for quick solutions):

-delete - delete all found files
-execdir command {} \; - run specific command on every found file (in dir of file)
-okdir command {} \; - just like execdir, asking user if they are sure

Here are 2 ways to delete every file that has evil in the name:

find . -name '*evil*' -delete
find . -name '*evil*' -okdir rm {} \;

Look here: http://wooledge.org:8000/BashFAQ

find . -print0 | while read -d $'\0' file; do mv "$file" "${file// /_}"; done
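
The same NUL-delimited read can be tried safely without renaming anything. This is a sketch, assuming bash and a scratch directory from mktemp with made-up names; it adds IFS= and -r so that leading or trailing spaces and backslashes are kept, and collects the names in an array instead of calling mv.

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
touch "$demo_dir/has space" "$demo_dir/has"$'\n'"newline"

# -d '' makes read stop at each NUL byte, so even the newline name stays whole.
names=()
while IFS= read -r -d '' file; do
    names+=("$file")
done < <(find "$demo_dir" -type f -print0)

echo "found ${#names[@]} files"
rm -rf "$demo_dir"
```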

This discussion helped me achieve what I wanted.

Identify, Quantify, and then Eliminate dumb temporary files from Microsoft Office applications.

What are the files? (human readable list, newline delimited, assume no files with newlines in the name)

find . \( -name '*.TMP' -or -name '*.tmp' -or -name '~*.doc' -or -name '~*.wbk' -or -name '~*.xar'  -or -name '~*.dot'  -or -name '~*' \) -print > /tmp/cleanuplist.txt

How much space do these files consume?

( IFS=$'\n'; du -csh $(cat /tmp/cleanuplist.txt ) )

Delete them.

( IFS=$'\n'; rm $(cat /tmp/cleanuplist.txt ) )
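
If the list does not have to be human readable, the same cleanup can be done without the temporary file and without the newline assumption by passing NUL-delimited names straight to xargs. A sketch on a scratch directory, assuming bash; the file names and the shortened pattern list are only for illustration:

```bash
#!/bin/bash
demo_dir=$(mktemp -d)
touch "$demo_dir/~lock file.doc" "$demo_dir/OLD.TMP" "$demo_dir/keep.txt"

# How much space do the temporary files consume?
find "$demo_dir" -type f \( -name '*.TMP' -o -name '~*' \) -print0 | xargs -0 du -csh

# Delete them.
find "$demo_dir" -type f \( -name '*.TMP' -o -name '~*' \) -print0 | xargs -0 rm

remaining=$(ls "$demo_dir")
echo "left behind: $remaining"
rm -rf "$demo_dir"
```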

For what it’s worth I owe a big debt of gratitude to the original author. When in a bind, that solution worked well for me in cygwin on Windows Vista. I didn’t understand any of the comments below that. :)
