Removing leading/trailing spaces in the shell
Posted on 2020-01-16 (Updated on 2021-12-13)
Whitespace causes lots of interesting issues with the command line - whether it is present in file names, arguments, or any other data flowing between commands. Quoting can help, but there are some very particular edge cases I've encountered in my scripts.
Leading and trailing whitespace
If you are piping output directly into another command, occasionally you can get whitespace characters before or after your data. You have to use quotes to make sure internal whitespace is preserved, but that also leaves any whitespace characters at the beginning and/or end of your data.
I use three methods of removing leading or trailing whitespace: echo, xargs, and sed.
The echo method involves echoing a value without quotes, and re-assigning it to a variable:
TEST=' lousy spaces! '
TEST="$( echo $TEST )"
echo "$TEST"
# Prints:
#lousy spaces!
Benefits:
- Does not call an external command - echo is a bash builtin
- Short and sweet
Drawbacks:
- Collapses internal spaces as well
- Doesn't lend itself to chaining/pipes
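The collapsing comes from word splitting: the unquoted $TEST is split on the characters in IFS (space, tab and newline by default), and echo prints the resulting words separated by single spaces. That same unquoted expansion also undergoes pathname expansion, so a value like ' * ' could be replaced by the file names in the current directory. Here's a minimal sketch of the same trick with globbing switched off, assuming bash with the default IFS; trim_words is a hypothetical function name, and it uses printf rather than echo so a value such as -n gets printed instead of being read as an option.

```shell
#!/usr/bin/env bash

# trim_words is a hypothetical helper: the same word-splitting trick as the
# echo method, but with pathname expansion disabled so * and ? are left alone.
# Assumes the default IFS (space, tab, newline).
trim_words() {
  # The ( ) subshell keeps `set -f` and `set --` from leaking into the caller.
  (
    set -f              # disable pathname expansion (globbing)
    set -- $1           # unquoted on purpose: split the value into words
    printf '%s\n' "$*"  # join the words with the first IFS character (a space)
  )
}

TEST=' lousy   spaces * '
TEST="$(trim_words "$TEST")"
echo "$TEST"
# Prints:
#lousy spaces *
```

Internal runs of spaces still collapse to one, exactly like the echo method; only the glob and option surprises are gone.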
The xargs method is my personal favorite, although it's a bizarre one.
echo ' lousy spaces! ' | xargs
# Prints:
#lousy spaces!
The xargs command, as a side-effect, strips out leading and trailing whitespace. If you think about what it's doing, it makes sense: it splits its input into unescaped "words" and passes them as arguments to its default command, echo, which writes them to stdout (or wherever) separated by single spaces. The whitespace between words only serves as a delimiter, so it gets discarded along the way.
Benefits:
- Can be part of a piped chain of commands
- xargs is part of findutils, and installed by default on most Debian-based systems
- Using a command that begins with 'x' automatically gives you a +1 to your charisma stats
Drawbacks:
- Collapses internal spaces as well
- Runs an external program
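To use the xargs trick on a variable rather than literal text, you can feed the value in with printf. This is a sketch assuming bash, and trim_xargs is a hypothetical function name. Two caveats worth knowing: xargs reads all of its input, so short multi-line input comes out joined onto a single line, and by default it treats quotes and backslashes specially, so a value with an unmatched apostrophe (like don't) makes it report an error instead of trimming.

```shell
#!/usr/bin/env bash

# trim_xargs is a hypothetical helper wrapping the xargs trick.
# xargs splits its input into words and runs its default command, echo,
# with those words as arguments; echo rejoins them with single spaces.
trim_xargs() {
  # printf '%s' writes the value exactly as stored, with no option parsing
  printf '%s' "$1" | xargs
}

TEST=' lousy   spaces! '
TEST="$(trim_xargs "$TEST")"
echo "$TEST"
# Prints:
#lousy spaces!
```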
The sed method is not something I've used personally. It's what sed was created for - stream editing - but it seems like a costly solution to the problem.
It relies on regular expressions, which are usually slower than other forms of text manipulation. This method is probably plenty performant, but I tend to leave any regex as a last resort.
echo ' lousy spaces! ' | sed 's/^[[:space:]]*//' \
| sed 's/[[:space:]]*$//'
I've included the long [[:space:]] format because it's compatible with non-GNU sed, but if you're rockin' the GNU toolchain, you can use \s as a terse and sensible replacement.
Benefits:
- Only strips spaces off of the beginning and end of a string, leaving multiple spaces inside intact
- Can be quickly adjusted to strip off other leading or trailing strings
Drawbacks:
- Not zero, not one, but two command invocations
- Regex, and all its baggage
- sed may not be installed by default on the system you're working on
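Both substitutions can also go to a single sed process as separate -e expressions, which gets rid of the second invocation. A minimal sketch, assuming a POSIX sed; trim_sed is a hypothetical function name. It uses * (zero or more) rather than \+, because \+ is a GNU extension in basic regular expressions and isn't guaranteed on other seds.

```shell
#!/usr/bin/env bash

# trim_sed is a hypothetical helper: one sed process, two expressions.
# [[:space:]]* matches zero or more whitespace characters, so a line with
# no leading or trailing whitespace passes through unchanged.
trim_sed() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

echo ' lousy   spaces! ' | trim_sed
# Prints:
#lousy   spaces!
```

Since sed works line by line, each line of multi-line input is trimmed on its own, and internal spaces are left alone, which is the main difference from the echo and xargs methods.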
The real solution would be to quit doing so much data processing in the shell, and move to an actual scripting/programming language. However, there's a delicate balance between what belongs in the shell, and what needs its own fully-fledged script; and these techniques have let me accomplish a lot in scripts that didn't need a full programming language backing them.
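Staying on the shell side of that balance, there's one more option the three methods above don't cover: parameter expansion. It trims a variable without any external command and without collapsing internal spaces. This is a different technique from the ones above, shown as a sketch assuming bash (the expansions themselves are POSIX); trim_pe is a hypothetical function name.

```shell
#!/usr/bin/env bash

# trim_pe is a hypothetical helper: trims using only parameter expansion,
# so no subprocess runs and internal whitespace is left intact.
trim_pe() {
  local s=$1
  # ${s%%[![:space:]]*} is the run of leading whitespace; strip it from the front.
  s=${s#"${s%%[![:space:]]*}"}
  # ${s##*[![:space:]]} is the run of trailing whitespace; strip it from the end.
  s=${s%"${s##*[![:space:]]}"}
  printf '%s\n' "$s"
}

trim_pe '  lousy   spaces!  '
# Prints:
#lousy   spaces!
```

The inner expansions are quoted so their result is matched literally rather than as a pattern. Like the echo method, it doesn't fit naturally into a pipe.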
Update
hackerdefo has submitted several other excellent methods for your consideration. I'm particularly fond of the awk implementation myself.
Note: The tr method removes all spaces and tabs, not just leading and trailing ones.
echo " Fragmented Development " | tr -d "[:blank:]"
echo " Fragmented Development " | awk '{$1=$1};1'
echo " Fragmented Development " | ruby -pe 'gsub(/^\s+/, ""); gsub(/\s+$/, $/)'
echo " Fragmented Development " | perl -ple 's/^\s+//; s/\s+$//; s/\s+/ /g'
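The methods differ mainly in what they do with whitespace inside the data. Here's a small comparison script that makes that visible, assuming bash with sed, awk, tr and xargs on the PATH (ruby and perl are left out so it runs on a minimal system). The input string and the show helper are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical test input: leading, trailing and doubled internal spaces.
input='  Fragmented   Development  '

# show is a hypothetical helper: prints a label and the result between
# | markers so any surviving whitespace is visible.
show() {
  printf '%-6s |%s|\n' "$1" "$2"
}

show sed   "$(printf '%s\n' "$input" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
show awk   "$(printf '%s\n' "$input" | awk '{$1=$1};1')"
show tr    "$(printf '%s\n' "$input" | tr -d '[:blank:]')"
show xargs "$(printf '%s\n' "$input" | xargs)"
# Prints:
#sed    |Fragmented   Development|
#awk    |Fragmented Development|
#tr     |FragmentedDevelopment|
#xargs  |Fragmented Development|
```

Only sed keeps the internal spacing as-is; awk and xargs collapse it to one space, and tr removes it entirely.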
Comments
You can have multiple pattern commands with sed, so you only need to call it once. If sed isn't installed, the "actual" programming language might not be either.
Nat!