Using the find command

The find command is used to recursively locate files in a directory hierarchy. Since programmers and system administrators spend a great deal of time working with files, familiarity with this command can make each more efficient at the terminal.

The command is composed of four parts:

  • the command name
  • options that control how the command searches (optional)
  • the path(s) to search (required)
  • expressions (composed of “primaries” and “operators”) that filter files (optional)

A primary is a switch, such as -name or -regex, that may or may not have additional arguments. An operator, such as -or or -not, combines expressions in logical ways.

Basic usage

Finding files by name is perhaps the most common use of the find command. Its output consists of all paths in which the file name appears within the directory structure it searches.

$ find . -name README.md
./node_modules/hexo-renderer-stylus/README.md
./node_modules/is-extendable/README.md
./node_modules/striptags/README.md
...
./README.md

NOTE: All terminal examples were generated within the directory structure of my blog, created by the Hexo static site generator.

The path given to find in the first argument is the path prepended to each result. The . path instructs find to search under the working directory and generate relative paths in the output. If an absolute path to the working directory is used, however, the full path will appear in results. Command substitution may be used to strike a compromise between brevity and more detailed output.

$ find "$(pwd)" -name README.md
/Users/nicholascloud/projects/nicholascloud.com/node_modules/hexo-renderer-stylus/README.md
/Users/nicholascloud/projects/nicholascloud.com/node_modules/is-extendable/README.md
/Users/nicholascloud/projects/nicholascloud.com/node_modules/striptags/README.md
...
/Users/nicholascloud/projects/nicholascloud.com/README.md
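The effect of the starting path can be reproduced in a scratch directory. The sketch below is only an illustration; the directory layout and variable names are made up and are not part of my blog's structure.

```shell
# Build a small scratch tree (hypothetical names) and search it two ways.
demo=$(mktemp -d)
mkdir -p "$demo/docs"
touch "$demo/README.md" "$demo/docs/README.md"

# A relative starting path yields results that begin with ./
relative=$(cd "$demo" && find . -name README.md)
# An absolute starting path yields results that begin with the full directory.
absolute=$(cd "$demo" && find "$(pwd)" -name README.md)

printf '%s\n' "$relative"
printf '%s\n' "$absolute"
rm -rf "$demo"
```

Quoting the command substitution keeps the search working even when the working directory contains spaces.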

If a filtering expression is omitted, find will return all file paths within its search purview.

$ find .
.
./scaffolds
./scaffolds/draft.md
./scaffolds/post.md
./scaffolds/page.md
...

Other scenarios

The find command can do much more than locate files by exact name, however. It can find files according to a wide swath of criteria in many different scenarios.

We may not know the exact name of the file for which we are searching

The -name primary supports wildcard searches.

  • an asterisk (*) can replace any consecutive number of characters
  • a question mark (?) can replace any single character

$ find . -name '*javascript*.md'
./source/_posts/javascript-frameworks-for-modern-web-dev.md
./source/_posts/l33t-literals-in-javascript.md
./source/_posts/historical-javascript-objects.md
./source/_posts/maintainable-javascript-book-review.md

By default, the -name primary is case-sensitive. To conduct a case-insensitive search, we can use the -iname primary.

$ find . -iname '*JAVA*.md'
./source/_posts/java-4-ever.md
./source/_posts/javascript-frameworks-for-modern-web-dev.md
./source/_posts/l33t-literals-in-javascript.md
./source/_posts/historical-javascript-objects.md
./source/_posts/maintainable-javascript-book-review.md
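To see the difference between -name and -iname without depending on my blog's files, here is a minimal sketch that uses a throwaway directory. The file names are invented for the example.

```shell
demo=$(mktemp -d)
touch "$demo/java-4-ever.md" "$demo/JAVA-notes.md" "$demo/python.md"

# Quote the pattern so the shell passes it to find unexpanded.
sensitive=$(find "$demo" -name 'java*.md' | wc -l | tr -d ' ')     # exact case only
insensitive=$(find "$demo" -iname 'java*.md' | wc -l | tr -d ' ')  # any case

echo "name: $sensitive, iname: $insensitive"
rm -rf "$demo"
```

The case-sensitive search should count one file and the case-insensitive search two.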

For more complex searches, we can use the power of regular expressions with the -regex and -iregex (case-insensitive) primaries.

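NOTE: On BSD and macOS find, use the -E option to specify that extended regular expressions should be used instead of basic regular expressions. (GNU find has no -E option; it selects the dialect with -regextype instead.)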

$ find -E . -regex '.*package(-lock)?\.json'
./node_modules/hexo-renderer-stylus/package.json
./node_modules/is-extendable/package.json
./node_modules/striptags/package.json
./node_modules/babylon/package.json
...
./package-lock.json
./package.json
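Because -E belongs to BSD find, a script that must also run under GNU find needs a different spelling. The following is a rough sketch, assuming that GNU find answers --version while BSD find rejects it; the scratch files are hypothetical.

```shell
demo=$(mktemp -d)
touch "$demo/package.json" "$demo/package-lock.json" "$demo/packages.json"

# Assumption: only GNU find accepts --version, so use it to pick the syntax.
if find --version >/dev/null 2>&1; then
  found=$(find "$demo" -regextype posix-extended -regex '.*/package(-lock)?\.json')
else
  found=$(find -E "$demo" -regex '.*/package(-lock)?\.json')
fi

printf '%s\n' "$found"
rm -rf "$demo"
```

Note that -regex is matched against the whole path, which is why the pattern begins with .*/ rather than with the file name.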

We may want to limit our results to a specific path, or a pattern that matches multiple paths

To filter results by a path mask, we can specify a pattern with -path and -ipath (case-insensitive). Both support the asterisk and question mark wildcards.

$ find . -path './node_modules/*/lib/*' -regex '.*hexo.*'
./node_modules/hexo-renderer-stylus/lib/renderer.js
./node_modules/hexo-renderer-marked/lib/renderer.js
./node_modules/hexo-generator-archive/lib/generator.js
./node_modules/hexo-migrator-wordpress/node_modules/async/lib/async.js
./node_modules/hexo-log/lib/log.js
./node_modules/hexo-generator-category/lib/generator.js
./node_modules/hexo-i18n/lib/i18n.js
./node_modules/hexo-pagination/lib/pagination.js
./node_modules/hexo-generator-index/lib/generator.js
./node_modules/hexo-util/lib/pattern.js

Using the -path primary does not change the top-level directory in which find begins its search; it merely filters results by sub-directory.

We may want detailed information about the files we find

To see detailed information about a file, in a manner similar to ls -l, the -ls primary may be appended to the list of primaries.

$ find . -name '*.md' -path '*/_posts/*' -ls
8636130637 8 -rw-r--r-- 1 nicholascloud staff 490 Oct 24 12:25 ./source/_posts/strange-loop-2010-video-release-schedule-posted.md
8637286945 8 -rw-r--r-- 1 nicholascloud staff 1570 Oct 24 12:45 ./source/_posts/god-mode-in-windows-7-not-as-cool-as-rise-of-the-triad.md
8636130716 8 -rw-r--r-- 1 nicholascloud staff 204 Oct 24 12:25 ./source/_posts/what-writing-fiction-taught-me-about-writing-software.md
...

(As an alternative to the -ls primary, the -exec primary may be used to invoke ls, or the xargs command may be used for the same purpose.)
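Both alternatives might look something like the sketch below. The scratch files are made up, and the {} + terminator and the -print0 / xargs -0 pairing are my choices for the illustration rather than the only way to do it.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/_posts"
printf 'hello\n' > "$demo/_posts/first.md"
printf 'hi\n' > "$demo/_posts/second.md"

# -exec ... {} + hands the matched paths to a single ls invocation.
via_exec=$(find "$demo" -name '*.md' -exec ls -l {} +)

# The xargs route; -print0 and -0 delimit the paths with NUL characters.
via_xargs=$(find "$demo" -name '*.md' -print0 | xargs -0 ls -l)

printf '%s\n' "$via_exec"
rm -rf "$demo"
```

Since ls sorts its arguments, both variables should end up holding the same listing.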

We may want to stop descending into a hierarchy once we’ve found the file(s) for which we’ve searched

The -prune primary causes find to stop descending into a directory once that directory matches the expression. The matching directory itself is still reported, and find continues to search at the same directory level for other potential matches.

$ find . -regex '.*middleware.*' -prune
./source/_posts/new-appendto-blog-post-streams-and-middleware-in-strata-js.md
./node_modules/stylus/lib/middleware.js
./node_modules/hexo-server/lib/middlewares
./public/2013/06/new-appendto-blog-post-streams-and-middleware-in-strata-js
./.deploy_git/2013/06/new-appendto-blog-post-streams-and-middleware-in-strata-js

By using the diff tool and IO redirection we can compare the output of a “pruned” result set with the output of unpruned results to see what paths were omitted. For example, in the diff below, the remaining paths that matched /node_modules/hexo-server/lib/middlewares/* were omitted once /node_modules/hexo-server/lib/middlewares had been added to the result set.

$ diff <(find . -regex '.*middleware.*') <(find . -regex '.*middleware.*' -prune)
4,9d3
< ./node_modules/hexo-server/lib/middlewares/route.js
< ./node_modules/hexo-server/lib/middlewares/redirect.js
< ./node_modules/hexo-server/lib/middlewares/logger.js
< ./node_modules/hexo-server/lib/middlewares/gzip.js
< ./node_modules/hexo-server/lib/middlewares/header.js
< ./node_modules/hexo-server/lib/middlewares/static.js
11d4
< ./public/2013/06/new-appendto-blog-post-streams-and-middleware-in-strata-js/index.html
13d5
< ./.deploy_git/2013/06/new-appendto-blog-post-streams-and-middleware-in-strata-js/index.html
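A common use of -prune is to skip a directory entirely while searching for something else. Here is a small sketch of that idiom with an invented tree; it pairs -prune with the -o (or) operator so that one branch prunes and the other prints.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/node_modules/pkg" "$demo/source"
touch "$demo/README.md" "$demo/source/post.md" "$demo/node_modules/pkg/README.md"

# Left branch: do not descend into node_modules.
# Right branch: print Markdown files found anywhere else.
kept=$(cd "$demo" && find . -name node_modules -prune -o -name '*.md' -print | LC_ALL=C sort)

printf '%s\n' "$kept"
rm -rf "$demo"
```

The explicit -print on the right-hand branch matters: without it, find would also print the pruned node_modules directory itself.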

We may only want to search to a particular depth OR search beyond a particular depth

Several primaries control depth traversal, or how far find will go to locate results.

-maxdepth controls the path depth to which find will traverse before stopping.

$ find . -name '*.css' -maxdepth 3
./public/fancybox/jquery.fancybox.css
./public/css/style.css
./.deploy_git/fancybox/jquery.fancybox.css
./.deploy_git/css/style.css

-mindepth controls the path depth at which find will start to search.

$ find . -name '*.css' -mindepth 6
./node_modules/hexo/node_modules/hexo-cli/assets/themes/landscape/source/fancybox/jquery.fancybox.css
./node_modules/hexo/node_modules/hexo-cli/assets/themes/landscape/source/fancybox/helpers/jquery.fancybox-thumbs.css
./node_modules/hexo/node_modules/hexo-cli/assets/themes/landscape/source/fancybox/helpers/jquery.fancybox-buttons.css
./themes/landscape/source/fancybox/helpers/jquery.fancybox-thumbs.css
./themes/landscape/source/fancybox/helpers/jquery.fancybox-buttons.css

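-depth n matches only files at exactly depth n. (This form is specific to BSD find, the version shipped with macOS. Without an argument, -depth instead tells find to visit a directory's contents before the directory itself.)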

$ find . -name '*.css' -depth 5
./node_modules/async-limiter/coverage/lcov-report/prettify.css
./node_modules/async-limiter/coverage/lcov-report/base.css
./themes/landscape/source/fancybox/jquery.fancybox.css
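Where a numeric argument to -depth is not available, setting -mindepth and -maxdepth to the same value should give an equivalent result. A minimal sketch, using a made-up directory tree:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/top.css" "$demo/a/mid.css" "$demo/a/b/deep.css"

# Depth 0 is the starting point itself, so top.css sits at depth 1.
# Setting both limits to 2 keeps only entries exactly two levels down.
exact=$(cd "$demo" && find . -mindepth 2 -maxdepth 2 -name '*.css')

echo "$exact"
rm -rf "$demo"
```

Only ./a/mid.css should be printed; the files one level above and one level below are excluded.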

We may want to find files that are newer/older relative to another file

The -newer primary will find files that are newer than the specified file by comparing the modification times of each.

$ ls -l source/_drafts/
-rw-r--r-- 1 nicholascloud staff 189 Nov 7 11:51:20 2018 the-importance-of-names.md
-rw-r--r-- 1 nicholascloud staff 353 Nov 7 11:50:49 2018 the-most-satisfying-thing.md
-rw-r--r-- 1 nicholascloud staff 10812 Nov 8 19:13:09 2018 using-the-find-command.md

$ find . -newer source/_drafts/the-importance-of-names.md -path '*_drafts*'
./source/_drafts/using-the-find-command.md

For more fine-grained control, use the -newer[XY] primary, where values of X and Y represent different kinds of file timestamps (see table below). The timestamp for X applies to the files that find evaluates; that of Y applies to the file path argument supplied for comparison.

X/Y flag     Value
a            access time
B            inode creation time
c            change time (file attributes)
m            modification time (file contents)
t (Y only)   file is interpreted as a date understood by cvs(1)

For example, the command find . -neweram foo.txt will find all files that have a newer access time than the modification time of foo.txt.

For each X flag there are shortcut primaries that make a comparison against the modification time of the file argument.

  • -anewer compares the access time of each file in the result set to the modification time of the specified file.
  • -Bnewer compares the inode creation time of each file in the result set to the modification time of the specified file.
  • -cnewer compares the change time of each file in the result set to the modification time of the specified file.
  • -mnewer compares the modification time of each file in the result set to the modification time of the specified file, and is identical to -newer.
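The basic -newer comparison, and its negation for the "older" case, can be tried with files whose modification times are set explicitly. This is only a sketch; the file names and timestamps are invented.

```shell
demo=$(mktemp -d)
# touch -t sets an explicit modification time (format: CCYYMMDDhhmm).
touch -t 201811071150 "$demo/older.md"
touch -t 201811071151 "$demo/reference.md"
touch -t 201811081913 "$demo/newer.md"

newer=$(cd "$demo" && find . -type f -newer reference.md)
# Negating -newer selects files modified at or before the reference,
# which includes the reference file itself.
not_newer=$(cd "$demo" && find . -type f ! -newer reference.md | LC_ALL=C sort)

echo "newer: $newer"
printf 'not newer: %s\n' $not_newer
rm -rf "$demo"
```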

In Unix-like systems, “everything is a file”, and these files have types. The find command can detect file type, and filter results accordingly. Regular files (for which we search most often) have a type of f; directories have a type of d. Block files – disks, for example – have a type of b.

In OSX it is easy to find all block files that represent disks (physical and logical).

$ find /dev -name 'disk*' -type b
/dev/disk0
/dev/disk0s1
/dev/disk0s2
/dev/disk1
/dev/disk1s2
/dev/disk1s3
/dev/disk1s1
/dev/disk1s4

The table below lists each file type that the find command may detect.

Flag   Meaning
b      block special
c      character special
d      directory
f      regular file
l      symbolic link
p      FIFO
s      socket
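Three of these types are easy to create by hand, so a quick experiment is possible without touching /dev. The names below are hypothetical.

```shell
demo=$(mktemp -d)
mkdir "$demo/posts"
touch "$demo/posts/draft.md"
ln -s posts/draft.md "$demo/latest"   # symbolic link pointing at the draft

files=$(cd "$demo" && find . -type f)
dirs=$(cd "$demo" && find . -mindepth 1 -type d)   # -mindepth 1 skips "." itself
links=$(cd "$demo" && find . -type l)

echo "f: $files | d: $dirs | l: $links"
rm -rf "$demo"
```

By default find does not follow symbolic links, so the link is reported as type l rather than as the regular file it points to.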

We may want to search for files that a particular user or group owns (or inversely, that are not owned by a known user or group)

Users and groups are identified by name and numeric ID on Unix-like systems. In OSX the id command tells me my user ID and group ID(s).

$ id
uid=501(nicholascloud) gid=20(staff)...

The find command accepts primaries that filter file results by user and/or group.

  • -uid <uid> and -user <username> filter results by owning user. If the argument to -user is numeric, and no user exists with that name, it is assumed to be a user ID.
  • -gid <id> and -group <groupname> filter results by owning group. The same caveat applies to groupname as username.

I write code for a website called OwlEyes.org, which is a PHP application served by the apache2 web server. If I search for files in my home directory owned by the www-data user (the typical apache2 user), I see some interesting results.

$ find . -user www-data
./projects/enotes/owleyesorg/app/logs/apache-error.log
./projects/enotes/owleyesorg/app/logs/apache-custom.log

Every other file in my project directory is owned by my user, but the apache2 log files are written by the web server, and are therefore owned by its user.

To find files that aren’t owned by any known user and/or group, the inverse primaries may be used.

  • -nouser (which takes no argument) shows results that do not belong to a known user.
  • -nogroup (which takes no argument) shows results that do not belong to a known group.
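A small sketch of the ownership primaries follows. It uses the numeric ID reported by id -u so that it does not depend on any particular account name; the scratch file is invented.

```shell
demo=$(mktemp -d)
touch "$demo/mine.txt"

# id -u prints the numeric ID of the current user.
mine=$(find "$demo" -type f -uid "$(id -u)")
# Negate the test to list files owned by anyone else.
others=$(find "$demo" -type f ! -uid "$(id -u)")
# -nouser stands alone; it matches files whose owner has no account.
orphans=$(find "$demo" -type f -nouser)

echo "mine: $mine"
echo "others: ${others:-none}"
rm -rf "$demo"
```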

We may want to find empty files or directories

To find empty files (zero bytes) or empty directories (no entries), append the -empty primary to the find command. This can be useful, for example, to see what log files are empty on your system.

$ cd /var/log && sudo find . -empty
./appfirewall.log
./ppp
./alf.log
./apache2
./com.apple.xpc.launchd
./cups
./CoreDuet
./uucp
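Since -empty matches both files and directories, it often helps to combine it with -type. A minimal sketch with made-up names:

```shell
demo=$(mktemp -d)
mkdir "$demo/empty-dir" "$demo/full-dir"
: > "$demo/empty.log"                          # zero-byte file
printf 'entry\n' > "$demo/full-dir/app.log"    # non-empty file

empty_files=$(cd "$demo" && find . -type f -empty)
empty_dirs=$(cd "$demo" && find . -type d -empty)

echo "empty files: $empty_files"
echo "empty directories: $empty_dirs"
rm -rf "$demo"
```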

We may want to run a utility on the files identified by find

The -exec and -ok primaries may be used to run a command on each file in find’s result set. The two primaries are identical, except that -ok will request user confirmation for each file before executing the specified command.

The syntax for executing a command with find is:

find <path(s)> <expression(s)> -exec <command> \;

The command is written in standard form, as you would type it in a terminal. If the string '{}' appears anywhere in the command, it will be replaced by the file path of each result as find iterates over them. Commands must be terminated by a \;. (The escape character is necessary when executing within a shell environment.)

The command find . -newer db.json -type f -exec cp '{}' ~/tmp \;:

  • starts in the current directory
  • finds files that were modified after db.json (the database file that stores blog post information)
  • finds files of type “regular file”
  • and copies each one to ~/tmp

$ ls -lh db.json
-rw-r--r-- 1 nicholascloud staff 2.6M Nov 8 19:15 db.json

$ find . -newer db.json -type f -exec cp '{}' ~/tmp \;

$ ls -l ~/tmp
-rw-r--r-- 1 nicholascloud staff 61B Nov 9 10:46 README.md
-rw-r--r-- 1 nicholascloud staff 14K Nov 9 10:46 using-the-find-command.md

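Two corresponding primaries, -execdir and -okdir, do the same thing as -exec and -ok, except that the command is run from the directory that contains each matched file, so '{}' is replaced with the file name alone rather than its full path. (To replace '{}' with as many file paths as possible in a single invocation, akin to xargs, terminate the command with + instead of \;.) For example, to archive a file in a find result set by name, one could use -execdir to invoke tar. Note that because the command below ends with \;, tar runs once per file, and each run overwrites back.tar.gz.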

$ find . -newer db.json -type f -execdir tar cvzf ~/tmp/back.tar.gz '{}' \;
a using-the-find-command.md
a README.md
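The difference between the \; and + terminators can be observed by having each invocation report how many paths it received. This is a sketch with invented files; the sh -c wrapper is just a counting device.

```shell
demo=$(mktemp -d)
touch "$demo/a.md" "$demo/b.md" "$demo/c.md"

# "$#" is the number of paths the shell snippet was handed.
per_file=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} \;)
batched=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "invocations with the semicolon form: $(printf '%s\n' "$per_file" | wc -l | tr -d ' ')"
echo "paths in the single batched invocation: $batched"
rm -rf "$demo"
```

For a task like building one tarball from every match, the + form is probably the better fit, since a single tar invocation then receives all of the paths.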

We may want to format find’s output

The output from find can be formatted in two ways.

By specifying the -print primary, the file path of each result in find’s result set is printed to standard output, terminated by a newline. This is the way find displays results by default. However, when an action primary such as -exec is present, find no longer prints each file to the terminal on its own. The command find . -newer db.json -type f -exec cp '{}' ~/tmp \; will copy all files newer than db.json to ~/tmp, but the output will remain empty (the default behavior of the cp command). To force each file to be displayed, the -print primary may be added before -exec.

$ find . -newer db.json -type f -print -exec cp '{}' ~/tmp \;
./source/_drafts/using-the-find-command.md
./README.md

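The -print0 primary terminates each file path with a NUL character instead of a newline, and can be useful when piping the output of find to xargs -0 or some similar command that expects input in such a format. Because a NUL can never appear in a file name, paths that contain spaces or newlines pass through intact.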
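Here is a short sketch of why the NUL delimiter matters when a file name contains a space. The file names are hypothetical, and the example assumes the temporary directory path itself contains no whitespace.

```shell
demo=$(mktemp -d)
touch "$demo/my draft.md" "$demo/notes.md"

# Whitespace splitting: xargs treats "my" and "draft.md" as separate arguments.
split=$(find "$demo" -name '*.md' | xargs -n 1 echo | wc -l | tr -d ' ')
# NUL-delimited: each path arrives as exactly one argument.
intact=$(find "$demo" -name '*.md' -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "arguments without -print0: $split"
echo "arguments with -print0: $intact"
rm -rf "$demo"
```

Two files go in, but the plain pipeline hands xargs three arguments; the NUL-delimited pipeline hands it two.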

By default, primaries are joined by an implicit -and, so a file must satisfy all of them to appear in the results. Operators change the way expressions are applied. If two expressions are separated by the -or operator, then they will be applied in a boolean OR fashion; results will be returned that match either expression, or both.

$ find . -name '*eclipse*' -or -name '*clean*'
./source/images/2011/10/eclipse-example.png
./source/images/2011/10/eclipse-example-150x127.png
...
./source/images/2011/08/clean-coders1.png
./source/images/2011/08/clean-coders-150x117.png
...

If the -not (or !) operator precedes an expression, it will negate it and remove matching file paths from the result set.

$ find . -name '*eclipse*'
./source/images/2011/10/eclipse-example.png
./source/images/2011/10/eclipse-example-150x127.png
./source/images/2011/10/eclipse-example-2-300x97.png
./source/images/2011/10/eclipse-example-2.png

$ find -E . -name '*eclipse*' ! -regex '.*[0-9]+x[0-9]+.*'
./source/images/2011/10/eclipse-example.png
./source/images/2011/10/eclipse-example-2.png

(Recall that the -E option in the example above forces find to use extended regular expressions when evaluating the -regex primary.)
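When -or is mixed with other primaries, escaped parentheses control how the expressions are grouped, because the implicit "and" between adjacent primaries binds more tightly than -or. A sketch with invented names:

```shell
demo=$(mktemp -d)
mkdir "$demo/eclipse-workspace"
touch "$demo/eclipse.png" "$demo/clean.png"

# Ungrouped: -type f applies only to the second -name,
# so the eclipse-workspace directory still matches the first branch.
ungrouped=$(cd "$demo" && find . -name '*eclipse*' -or -name '*clean*' -type f | LC_ALL=C sort)

# Grouped: the parentheses make -type f apply to both names.
grouped=$(cd "$demo" && find . \( -name '*eclipse*' -or -name '*clean*' \) -type f | LC_ALL=C sort)

printf 'ungrouped:\n%s\n' "$ungrouped"
printf 'grouped:\n%s\n' "$grouped"
rm -rf "$demo"
```

The parentheses are escaped with backslashes so that the shell passes them to find instead of interpreting them itself.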

We may want to delete found files

While it is possible to use -execdir rm '{}' \; to delete files in a result set, find supports a shorter primary, -delete, that accomplishes the same task. By default, -delete will not show output for each file that is removed; use the -print primary in conjunction with -delete to see which files were removed from the file system.
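Because deletion cannot be undone, one cautious workflow is to run the expression with -print alone first and add -delete only once the list looks right. A minimal sketch, confined to a throwaway directory with made-up file names:

```shell
demo=$(mktemp -d)
touch "$demo/keep.md" "$demo/old.tmp" "$demo/older.tmp"

# Dry run: list what the expression matches before deleting anything.
find "$demo" -type f -name '*.tmp' -print

# Same expression, now printing each path as it is removed.
removed=$(find "$demo" -type f -name '*.tmp' -print -delete | wc -l | tr -d ' ')
remaining=$(find "$demo" -type f)

echo "removed $removed files; remaining: $remaining"
rm -rf "$demo"
```

Placing -delete at the very end of the expression matters: the filtering primaries in front of it decide which files it ever sees.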

ES2015 Generators

I have written a guest blog post, “ES2015 Generators”, on the eNotes developer blog:

Recently I had the opportunity to re-write the content tree control that we use to manage content nodes in www.enotes.com. We’ve all worked with the DOM, which represents HTML nodes in a tree structure but has some challenging deficiencies and a relatively grumpy API, the tortures of which prompted me to take a stab at a smoother tree-like design. During this process I experimented with several approaches to managing nodes in a tree structure, the first of which was a “flat hierarchy” which is as obtuse as it sounds and didn’t get much traction. I then opted for the more traditional parent/child approach, but still wanted a way to treat an entire tree of nodes in a “flat” manner. The ES2015 generator function was my solution…

Read the rest of this article on the eNotes developer blog.

LaunchCode 101, Unit 1

The first unit of LaunchCode 101 is almost complete. Class size has probably halved (based on visual inspection, not exact numbers) since the beginning because, though the material is introductory, it is also challenging. Those who remain have made serious personal effort to stay on track, and the buds of knowledge are finally beginning to bloom.

What’s really interesting to me is how different minds approach the same problems. Just tonight I reviewed three assignments wherein each student achieved the same goal by taking a unique approach. And all three were different than my solution.

The class also stretches my communication skills. When I explain things to other seasoned programmers, I can make assumptions about the knowledge that we already share and talk at a higher conceptual level. For students who have no prior programming knowledge, it is necessary to build a hierarchy of concepts from the ground up. Talking about complicated things in simple terms is both trying and rewarding.

This unit has also been more math intensive than I anticipated, which is good for *me* because I am mathematically weak. Working problems along with students gives me an opportunity to expand my own knowledge beyond the realm of code. And that’s music to a nerd’s ears.

Other than some technical difficulties, LC101 has been a success. Everyone is ready for winter break, but I think students and TFs will return for a strong start in January.

The necessity and beauty of motion

I get lost in my own inner philosophical world when I ride my bike. I don’t know, maybe it’s the scenery. Maybe it’s just all that fresh air rushing by, or the pulse of blood through every inch of my body. It gets me thinking, whatever it is. And today, as I rode, I thought about motion.

Have you ever tried to stand still on your bike and maintain balance? One might think that standing still is the safest way to be, that moving on a two-wheeled bit of aluminum at high speeds is the danger. But the truth is that you only have stability when you’re moving. When you stop, no matter how hard you try to stay upright, you’ll be putting a foot down to stop the fall eventually. Maybe life is a bit like that. Standing still is deceptively comfortable, but it’s really the most unstable state in which one can be. Movement, motion, forward momentum–those are the soul’s lusts.

I tried to imagine a world with no motion, and it came to me that all of our senses–sight, smell, touch, taste, hearing–all of them require motion to convey anything to our minds. You only feel what moves against your skin. You only smell what wafts your way. Sight is light mercilessly barraging your eyeballs and hearing is vibrations and pressure changes stroking your eardrum. The chemical interactions that produce vibrant flavors are taste to your tongue. All of it requires motion.

So motion gives us stability and knowledge of the world. And attempts to stop motion are frowned on by nature. Inertia compels us in the directions we move. When I ride the switchbacks to the river my brakes remind me the whole way down how much work they do to stop my forward motion. It makes me feel unstoppable and free, and I think that is something humans need to feel often.

JavaScript Frameworks for Modern Web Dev

I am thrilled to announce the publication of my newest book, JavaScript Frameworks for Modern Web Dev, available now from Apress! A year ago my friend and former co-worker Tim Ambler approached me with a project: a list of strong frameworks and libraries in the JavaScript space that can each be effectively introduced in a single chapter. Over the last twelve months we wrote numerous drafts and revisions, and created a somewhat frightening amount of source code for these sixteen topics:

  1. Bower
  2. Grunt
  3. Yeoman
  4. PM2
  5. RequireJS
  6. Browserify
  7. Knockout
  8. AngularJS
  9. Kraken
  10. Mach
  11. Mongoose
  12. Knex and Bookshelf
  13. Faye
  14. Q
  15. Async.js
  16. Underscore and Lodash

Obviously, the scope of each topic is greater than its chapter, but the book’s goal is to be a quick but thorough introduction to the core concepts in each framework and library. The source code is non-trivial and executable, so a reader can see concepts in action while following along in the text. Some of the technologies covered have aged (like fine wine!), and some are much younger, but we believe that each has staying power and stands well among its peers. Between us we have used all of these technologies in our own projects and in production deployments, and while we cannot claim complete expertise, we humbly submit that, to the best of our knowledge, the information here is both sound and practical. We sincerely hope that this book brings much value to you and your team!

Learning Gulp? Start here.

My only beef with Bleeding Edge Press is that I was asked to review Developing a Gulp Edge after I had already spent weeks poring over documentation, GitHub issues, and StackOverflow questions ad nauseam while trying to learn Gulp on my own. Developing a Gulp Edge is the book with which I should have started.

This book covers all the Gulp basics by having the reader create and run standard tasks in a demo application, publicly available on GitHub. Anyone familiar with build systems will find common case examples aplenty in this book, from SASS compilation and JavaScript concatenation to linting and watching. Each example clearly covers the Gulp plugins necessary to accomplish each task, and examples build on each other as the reader progresses.

The real value of this book, however, lies in the fact that it helps the reader navigate a rapidly shifting Gulp landscape by identifying community-blessed alternatives to blacklisted plugins, such as gulp-browserify. This, more than anything else, caused me a great deal of confusion when I initially started using gulp. Google search results lack historical context, so articles that often appeared first in search results actually delivered bad or outdated advice. This book avoids all that by keeping the reader on the straight-and-narrow, by explaining why certain practices have been abandoned in favor of others.

The last three chapters are particularly strong. They move the reader beyond basics into advanced Gulp automation. This includes tasks which enrich the development environment itself, such as running a development server, syncing browsers across devices, and automatic cache busting. The reader is also shown how to speed up builds with caches, so that continuous integration processes and watches run as fast as possible.

The last chapter is a full tutorial on building custom Gulp plugins. This is where the authors really let it all hang out, as the reader is introduced to vinyl File objects, through streams and buffer manipulation techniques. Once the reader has finished implementing each example he is left with a useful plugin accompanied by unit tests, a code coverage implementation, and a Travis CI configuration–all project features that the Gulp community strongly encourages, nay outright demands, of any Gulp plugin author or contributor.

Developing a Gulp Edge has a single Appendix containing a long and very useful list of Gulp plugins for most any development task imaginable. I will refer to this Appendix often, and with fondness.

While this book does have a few spelling and grammatical errors, its overall style is clean, friendly, and concise, and the content is structured quite well. Each chapter, each example, builds on previous knowledge in such a way that the reader has a strong grasp on what Gulp is, what it does, and how to leverage it for daily use.