2025-02-09

Offline Wikipedia (and more!) with Kiwix

Kiwix is a multi-platform content browser that is designed to support offline access to large content websites, like Wikipedia or Stack Overflow, in under-developed countries. These sites have a significant amount of content, and are invaluable for researchers, professionals, or hobbyists. Offline access guarantees that the wealth of knowledge they contain is available in unpredictable circumstances, or when online access is not guaranteed, such as during power outages, long airplane trips, remote ventures, etc. More importantly, to the paranoid data hoarder, Kiwix offers an opportunity to scratch the itch of possessing the Library of Alexandria, served from a spare budget computer squirreled away in the dark corner of your basement.

The Kiwix server reads ZIM files (a file format designed to store large-scale Wiki websites) and serves their content over HTTP to any connecting browser. The Kiwix organization provides pre-packaged ZIM files in an online library. Because content size can be quite large (for example, a full Wikipedia ZIM file is ~100GB), different ZIM configurations exist for various sources. For example, a pared-down Wikipedia exists without images, which reduces the file size while preserving valuable written content. A configured Kiwix server can serve content from a variety of ZIM files.

Kiwix installers exist for Android, Windows, Mac, iOS, Linux and Raspberry Pi. A small suite of command line tools can be downloaded for headless servers, and a Docker image is available for machines running Docker containers.

For my personal setup I installed the command line tools on a spare Debian server I run for experimental projects. In my setup:

the Kiwix server is run automatically when the system is fully booted, launched as a systemd service
the server runs under a special kiwix system user account, and only has access to the directory that contains the Kiwix library file (an XML catalog of available ZIMs) and the downloaded ZIM files

In Debian it is relatively easy to create a system user and directory for the Kiwix libraries:

$ sudo groupadd --gid 200 kiwix
$ sudo adduser --system --no-create-home --disabled-password --disabled-login --uid 200 --gid 200 kiwix
$ sudo mkdir -p /var/local/kiwix
$ sudo chown -R kiwix:kiwix /var/local/kiwix

Now to download some ZIMs. I browsed the Kiwix online library and found the Wikipedia ZIM, and downloaded the file to my data directory. Since the download was sizable, I used aria2 to download the ZIM file because of its support for resumable downloads. I throttled the download to keep bandwidth usage low (the download server is provided gratis, after all).

$ cd /var/local/kiwix
$ sudo -u kiwix aria2c --max-download-limit=512K \
  https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
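
For a rough sense of what that throttle costs in time, the arithmetic can be sketched in Node. The figures are assumptions rather than measurements: the ~100GB size is the approximate number quoted earlier, not the exact size of this ZIM; GB is taken to mean 10^9 bytes; and aria2's K suffix is taken as 1024 bytes.

```javascript
// Rough download-time estimate for a throttled transfer.
// Assumptions (not measured): ~100 GB file, GB = 10^9 bytes,
// and 512K meaning 512 * 1024 bytes per second.
function estimateHours(sizeBytes, bytesPerSecond) {
    return sizeBytes / bytesPerSecond / 3600;
}

const zimSizeBytes = 100e9;              // approximate full-Wikipedia ZIM size
const throttleBytesPerSec = 512 * 1024;  // --max-download-limit=512K

const hours = estimateHours(zimSizeBytes, throttleBytesPerSec);
console.info(`about ${hours.toFixed(0)} hours (${(hours / 24).toFixed(1)} days)`);
```

Under those assumptions the download takes a little over two days, which is where resumable downloads earn their keep.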

Once the ZIM file finished downloading, I created a library file with kiwix-manage and added the Wikipedia file as an entry. (By convention the library file created by kiwix-manage should be named library_zim.xml.)

$ sudo -u kiwix kiwix-manage /var/local/kiwix/library_zim.xml \
  add /var/local/kiwix/wikipedia_en_all_maxi_2024-01.zim

Later I experimented with adding additional ZIMs in exactly the same way. (To remove a ZIM file from the library, the remove keyword is used instead of add.)

Before setting up my systemd service, I ran the kiwix-serve command directly to see if I could actually connect to my Kiwix process.

$ sudo -u kiwix kiwix-serve --library --port=8000 --verbose \
  /var/local/kiwix/library_zim.xml

I pointed my desktop web browser at my Debian server (http://192...:8000) and verified that my Wikipedia instance could be browsed.

Wikipedia, along with other ZIMs I installed later.

Browsing the Wikipedia ZIM file.

Having verified that my Kiwix instance was configured correctly, I created a systemd service file to launch the instance at machine boot, running as the kiwix user…

$ sudo vim /etc/systemd/system/kiwix.service

…which looks like this:

[Unit]
Description=Kiwix service
After=network.target network-online.target

[Service]
Type=simple
Restart=always
RestartSec=15
User=kiwix
Group=kiwix
WorkingDirectory=/var/local/kiwix
ExecStart=/usr/bin/kiwix-serve --library --port=8000 --verbose /var/local/kiwix/library_zim.xml

[Install]
WantedBy=multi-user.target

The User, Group, and ExecStart options control how the kiwix-serve process starts. The After and WantedBy options control when, and under what conditions, the service is actually started – in this case after the server has network access, and when it is running in multi-user mode (so normal operating conditions).

The systemd daemon then needed to be told about this new service, and the service itself needed to be enabled.

$ sudo systemctl daemon-reload
$ sudo systemctl enable kiwix.service
$ sudo systemctl start kiwix.service
$ sudo journalctl -u kiwix

The journalctl command shows the systemd journal output of the service during initialization. If the kiwix-serve command failed for any reason, or if there was a path/permission issue, it was recorded here. (I did have a few false starts.) Refreshing my desktop browser confirmed that the service was actually running.

Overall Kiwix is a very neat project, and I’m particularly impressed that the service runs so fast while reading data from a single enormous file (or several!). Unfortunately the data refresh period seems to be a bit slow. The latest full Wikipedia ZIM file is over a year old, which is an eternity in the information age. But for someone who loves to read and references Wikipedia often, having offline information available in a pinch is nevertheless a luxury.

2023-02-24

Teaming.com - ProductHunt launch

Today my company, Teaming, launched our application on Product Hunt.

When I evaluate new technologies and tools for my own use, I like to look for a few things.

First, which tools give me the simplest and most obvious means to accomplish my tasks and achieve my goals? Second, how easy is it for me to discover additional functionality once I’ve mastered a tool’s basics? And finally, does a tool free me by removing obstacles or hinder me by spawning more?

At Teaming, we ask ourselves these same questions about the software we produce.

We recognize the real work isn’t what happens in the machine, it’s what happens between the members of healthy teams, the relationships they build, and the trust they establish with each other. We strive to help people win the war of attrition they fight every day against the unnecessary busy-work that dilutes natural chemistry and personal connections.

You don’t need Yet Another App hogging your diminishing screen real-estate. You need a real force multiplier that helps you do heavy lifting.

Our software focuses on your teammates, how they work, and what makes them tick. It provides resources for conducting meaningful one-on-ones, and tools for measuring overall team health as it varies over time. Goals, key results, and action items turn decisions into measurable action. Our rich collection of meeting templates and topics keeps your team focused and reduces ambiguity in meetings.

Archimedes claimed that the simplicity of the lever would let him move the world. We think the simplicity of Teaming can move your peers, team, and organization closer to real success.

Kick the tires, and let us know what you think!

2023-01-05

ChatGPT isn't wrong...

A better plot than anything coming out of Hollywood right now.

2022-09-21

Finding files with different extensions

In a previous blog post I wrote about using the find command to search a file system. I recently had need to locate a number of ebook files with different extensions in my personal library, and wanted a quick way to enumerate them all with a single script. On BSD and OSX systems, the find command processes regular expressions differently than on Linux systems, so using it wouldn’t work for me as I wanted my script to work on both OSX and Linux. Fortunately Alvin Alexander’s blog post “Using the Linux ‘find’ command with multiple filename patterns” pointed me in the right direction. The -name filter can be chained to create the search pattern I needed:

find . -type f \( -name "*.pdf" -o -name "*.epub" -o -name "*.mobi" \) > index.txt
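
The same idea, matching any one of several extensions, can also be sketched in Node for cases where a portable script is preferable to find. This is only an illustration and not part of the original workflow: isEbook and findEbooks are hypothetical names, and the matching is case-insensitive, which differs slightly from -name.

```javascript
const fs = require("fs");
const path = require("path");

// Extensions to match; mirrors the three -name patterns in the find command.
const EBOOK_EXTENSIONS = new Set([".pdf", ".epub", ".mobi"]);

// True when the file name ends in one of the wanted extensions.
// (Case-insensitive here, which is closer to -iname than to -name.)
function isEbook(fileName) {
    return EBOOK_EXTENSIONS.has(path.extname(fileName).toLowerCase());
}

// Recursively collect matching regular files under rootDir.
function findEbooks(rootDir) {
    const matches = [];
    for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
        const fullPath = path.join(rootDir, entry.name);
        if (entry.isDirectory()) {
            matches.push(...findEbooks(fullPath));
        } else if (entry.isFile() && isEbook(entry.name)) {
            matches.push(fullPath);
        }
    }
    return matches;
}
```

Calling findEbooks(".") and joining the result with newlines gives roughly the list that the find command redirects into index.txt.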

2022-05-11

Unix bin directories

Understanding the bin, sbin, usr/bin, usr/sbin split is both a fun stroll through UNIX history and also a good illustration of how far-reaching decisions are sometimes made based on short-term limitations, creating long-term complexity where it otherwise might not exist. This is probably unavoidable, but nudges me toward the conclusion that all decisions – behind which there may be serious conviction, or the best of intentions – should be reviewed on some routine basis (once a generation? every other generation?) to determine if their underlying prompts have changed significantly.

Even Paul Atreides struggled with future decisions. And he was Muad’dib!

2022-04-14

The Jester

by: dialup (Nicholas Cloud), AO The Citadel, St. Charles MO

(This article originally appeared in the March 27th edition of the St. Louis F3 Accelerator. F3 is a “national network including 3,198 free, peer-led workouts for men in 242 regions [whose] mission is to plant, grow and serve small workout groups for men for the invigoration of male community leadership.” All opinions expressed in this article are my own.)

Doing hard things to gain small improvements over time is a core F3 principle. It is a core masculine principle, fundamental to building the character of a man. And there is no harder thing than building good character through constant vigilance against one’s Jester – that inner seductive voice that assures a man that short-term indulgences will produce long-term happiness and fulfillment in his life.

From a man’s perspective, the Jester is immortal. It exists as long as a man is above ground, breathing. It is an unwelcome but consistent companion along life’s entire journey. And it is a very old companion. The Jester has an enormous advantage over a man because it relies on a few fundamental biological and psychological truths to spin its yarn.

Humanity is relatively immature: we claim to rationally value things like temperance, compassion, charity, etc., but are biologically ill-suited for these values. The parts of a man’s brain that are responsible for long-term thinking and planning are relatively young and weak (in the scope of evolutionary history), and the parts of his brain that are responsible for short-term gratification are very old and powerful. Instead of temperance, our bodies value excess; instead of compassion, war; instead of charity, exploitation. Emotion and desire are such fundamental parts of the human decision-making process that James Clear devotes entire chapters to how to game them to change negative behavior in his book Atomic Habits[1]. Our psychologies favor immediate certainties over eventual possibilities, and the Jester exploits this characteristic.

The Jester also exploits the brain’s chemical reward system. Certain actions lead to significant biological gratification. Any man who indulges in promiscuity, pornography, alcohol, weed, unnecessarily risky behavior, excessive media consumption, etc. does so because there is a very visceral sense of pleasure gained from each of these. That sense of pleasure is the brain’s way of rewarding behavior that it has summarily blessed as valuable. In these situations, the Jester has tricked the brain, however. The brain “thinks” it is rewarding sexual connection, culinary prowess, bravery, and intelligence – it assumes each behavior is exercised for a man’s benefit, because it assumes a) that a man’s life is “nasty, brutish, and short”[2], and b) that any pleasure a man can get from life should be rewarded because it gives him an incentive to enjoy living and reproduce as much as possible (in the face of very depressing and demotivating circumstances).

And because the brain is a jerk, it intentionally decreases the amount of reward doled out for rewardable actions, so that we work for them more often, to a more intense degree, to recapture the same euphoric feelings we experienced when we first acted. This gives us the motivation to double-down on good behaviors: the more we exercise, the more we have to exercise to achieve the next gain. The more we have sex with our Ms, the more we desire sex with our Ms (and the more refined our technique becomes). But this system also has a dark side. When we listen to the Jester and trick the brain with negative behaviors, the more we have to perform those negative behaviors to keep the brain’s rewards coming. And because these negative behaviors are destructive, our obsession becomes our own demise. It is a terrible form of insanity.

The brain’s reward system is maladapted to long lifespans and prosperity. That puts it directly at odds with modern living. Every day demands long-term planning and deliberate lifestyle choices which require significant mental focus. And those choices must be more important to us than what the Jester promises in lieu of them. A man’s character is a composite of his choices and actions, guaranteed through repetition. Henry Hazlitt, an early 20th century author, concluded that will-power over the Jester is “the desire to become a certain type of character.”

“When popular language says that a man is the master of his own desires, that he holds them in leash and under his control, it means that this desire to be a certain kind of character is at all times vivid and powerful enough to be acted upon in preference to any other fleeting or recurrent desire that may beckon him… if your ability to refuse to yield to [a] particular impulse becomes in your mind a challenge to and a test of your entire character, you have thrown into the scale a mighty force to ensure your taking the right action.”[3]

Denying the Jester is not a negative pattern of abstinence, then, but a positive matter of construction; of constructing the kind of character that ensures the long-term success of a man, his family, and his community. This is a large task, and the Jester uses this fact to overwhelm us with a sense of impossibility. Indulging in a small vice seems comparably insignificant and much less daunting than “building character”. But this is misdirection. Character is an accumulation of consistent decisions and actions, made one at a time. The only decision that matters in the moment is the next one we make. This is how our character gets 1% better, and our Jester gets 1% weaker, every day.

[1] Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones, James Clear
[2] The philosopher Thomas Hobbes understood that the general mode of living for many throughout history has been difficult and ugly.
[3] The Way to Will-Power, Henry Hazlitt

2021-04-10

HyperNormalisation (2016) by Adam Curtis

I’ve watched a few BBC documentaries by Adam Curtis now, and I have a lot of respect for his ability to articulate ideas, and delve into history. A friend sent HyperNormalisation my way a little while ago. I watched it last night, and this is my initial impression.

Curtis starts in the mid-70s and moves up to present day, bouncing back and forth between economic changes in New York, and political changes in the Middle East. The two seem completely unrelated, until he starts to develop his thesis: that those in power made a conscious decision to move away from the old political norms to a mode of “managed perception”, in which they represent reality as something other than it really is, to keep those under their power in a constant state of unease, and disorientation (which of course prevents them from actually questioning or challenging power). It’s a fascinating thesis, and he does a good job of presenting it.

I wrote my friend:

I watched HyperNormalisation last night. Jesus there’s a LOT to unpack. I’m going to have to watch it again and jot down some notes. It REALLY helped fill in my knowledge of Middle East politics and American involvement. I knew bits and pieces of some things, but not how it all fit together. Curtis does a great job connecting dots. While I’m not sure about his interpretation of all the details, the general premise of “managed perception” is something that’s been fomenting in the back of my mind for a while, I just didn’t have a term for it, much less a theory of how it came about or was being systematically leveraged, but it makes total sense. Especially in light of the 9/11 and Iraq War years. How many WTF moments I had during that period where things just did not line up, but everyone in power just pretended they did. Makes total sense now.
His interpretation of Trump’s campaign was interesting too (and the history of Trump’s dealings in NY real estate). I always just assumed Trump lied because, well that’s just what he did (and it was obvious that he did). Thinking of it as a genuine, concrete strategy though is very interesting. Another thing that occurred to me is that manipulating a certain group of people by giving them a specific impression of the world does not mean that the opposite impression of the world is correct. In fact, the opposite is probably just as bad as the intended. Curtis made a point to bring up the complexity of the real world, and how people just don’t know how (or want) to deal with it. So taking that as a given, we could say that if an idea or concept seems “too simple” or “too straightforward”, there’s a good reason to be suspicious of it – it is either an over-generalization, or calculated, managed perception. Very interesting. I’ll write you again after I watch it a second time. Thanks again for recommending!

One piece of advice: take the time to watch the whole thing (almost 3 hours) in one sitting. Trust me, it won’t seem like 3 hours. But you need to carry his continuity of thought without interruption or you’ll probably lose the plot and have to backtrack.

If you are interested in history (in this case, America’s history in the Middle East), politics, and culture, I highly recommend it.

As a follow-up, his documentary The Century of the Self is also excellent.

2021-01-31

Goodbye Evernote

I’ve been a paying Evernote user for years; and before that, a “free” Evernote user for even longer. Evernote has some seriously powerful features, among which are the Evernote Web Clipper extension available on all major browsers, the excellent PDF markup features, flawless sync across devices, etc. It is solid software, backed by a solid service.

But I’ve left Evernote, likely for good.

In the wake of efforts by “Big Tech” companies to censor, deplatform, or control the data that belongs to customers, I’ve been cutting ties with as many Big Tech companies as I’m able. And though I have never experienced any negative service from Evernote, it does not offer end-to-end encryption for its services (meaning notes stored on Evernote servers are accessible by Evernote employees), and that has become a deal-breaker for me. I value my privacy, and my personal notes are where I can explore ideas and record my thoughts. And I don’t want to use any service that doesn’t respect and protect that.

But what to do with GIGABYTES of notes, clipped articles, recipes, photos, and annotated PDFs?

When I ditched GMail for ProtonMail, it took me months to dig through all my archived mail. I deleted, exported, or printed each piece, then had to update the sender’s settings with my new email address (or unsubscribe, if it was a newsletter). I transferred all of my contacts to ProtonMail, then reviewed them all to put the exported information into ProtonMail’s custom contact fields. It was a long, tedious, tiresome process, but I did it.

I expected my transition away from Evernote to be just as challenging. I came up with a list of goals for my new notebook scheme:

My notes should be plain text (well, technically Markdown) files that link to any relevant external assets, such as images, PDF files, etc.
Markdown files and assets should be organized in a uniform way.
Markdown files should have a uniform naming convention (all lower-case words, separated by hyphens).
I should be able to easily search for note content.
My notes should be available on multiple devices.
I should be able to clip content from the web and easily add it as a Markdown file to my notebook.

Step 1: Get my notes out of Evernote

There are two Evernote applications you can use on your device: the slick, newer version, and the old, legacy version. The new version looks nice, but it dropped a significant feature I thought would be available to me, and that is the ability to export all notes at once in HTML format. Even the legacy version seems to lack this feature (maybe just on OSX?), though in the legacy version you can still export individual notebooks as HTML collections. In the newer version you can only export notebooks as Evernote’s own ENEX file format (a kind of XML archive of notebook content). This seemed like it was going to be a show-stopper, since I had no clue how I would convert ENEX files into Markdown files, but a friend pointed me to the excellent utility evernote2md which does exactly that. Since I had 132 individual notebooks, it took me a while to export them all, then to convert them to Markdown with this utility, but once done, I had all of my saved notes in Markdown format (along with their attachments).

The total size of my exported Markdown notes and attachments is around 2GB.

Step 2: Fixing file names

I noticed a few things pretty quickly after my initial export:

Most Markdown file names were derived from the Evernote note title, which means they were typically in Title Case with underscores for space separators.
MANY Markdown files, for one reason or another, had leading or trailing underscores.
Some Markdown files – mostly ones clipped from Reddit threads – had strange naming conventions, e.g., _r_<subreddit-name>_<super-long-post-title>. This is because Evernote automatically uses the webpage title tag as the title of a note imported with its web clipper.
MANY Markdown files were named “untitled-XX.md” (where XX is some number). Did these notes not have titles?
MANY Markdown files had the word undefined randomly peppered through the file name, e.g., A_Historyundefined_and_Timeline_of_the_World.md. (I later realized this only occurred immediately preceding the word and, from which I speculate that an ampersand was actually used in the original title and that evernote2md has a bug that does not translate it correctly.)

So I had my work cut out for me. The first thing I decided to tackle was normalizing the file name case and space separator concerns. I prefer all lower-case file names, with hyphens for space separators. I hacked together a simple node.js script to traverse all of my exported notebooks and make this change.

// npm install globby@11 (globby 12 and later are ESM-only and cannot be loaded with require)
const globby = require("globby");
const path = require("path");
const renameSync = require("fs").renameSync;

const dirs = [
    // top-level notebook names omitted for REASONS
].map(dir => path.join(__dirname, dir, "**/*.md"));

(async () => {
    const filePaths = await globby(dirs);
    const fixedPaths = filePaths.map(filePath => {
        const pathDir = path.dirname(filePath);
        const pathName = path.basename(filePath, ".md");
        let newPathName = (pathName.replace(/[^\w\d-]/g, "") + ".md").toLowerCase();
        newPathName = newPathName.replace(/_/g, "-");
        const newPath = path.join(pathDir, newPathName);
        return {
            oldPath: filePath,
            newPath,
        };
    });
    fixedPaths.forEach(fixedPath => {
        console.info(`fixing path ${fixedPath.oldPath}...`);
        try {
            renameSync(fixedPath.oldPath, fixedPath.newPath);
        } catch (e) {
            console.error(e);
            process.exit(1);
        }
    });
    console.info("all done.");
    process.exit(0);
})();

This script worked well and addressed the first naming problem – all file names are lowercase, and instead of underscores, hyphens delimit words – but there were still problems to address.
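
To see what the two replace calls do to a given title, the renaming rule can be pulled out into a pure function and tried on sample names. This is a sketch under assumptions: normalizeName and findCollisions are hypothetical helper names, and the sample titles are invented. It also flags a case the script above does not handle, namely two different titles collapsing to the same new name, where a rename would overwrite the earlier file.

```javascript
// Same rule as the rename script: drop characters other than word
// characters and hyphens, lower-case, then turn underscores into hyphens.
function normalizeName(baseName) {
    return (baseName.replace(/[^\w\d-]/g, "") + ".md")
        .toLowerCase()
        .replace(/_/g, "-");
}

// Group old names by their normalized form; any group with more than one
// member would clobber a file if renamed blindly.
function findCollisions(baseNames) {
    const byNewName = new Map();
    for (const name of baseNames) {
        const newName = normalizeName(name);
        byNewName.set(newName, [...(byNewName.get(newName) || []), name]);
    }
    return [...byNewName].filter(([, oldNames]) => oldNames.length > 1);
}

console.info(normalizeName("_r_AskHistorians_Some Title!"));
console.info(findCollisions(["My_Note", "my_note!", "Other"]));
```

The first call shows where the leading dashes come from: the leading underscore survives the character filter and is then converted to a hyphen.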

My initial gut instinct was to begin modifying my script to handle the remaining naming problems, but that just made me tired, so I turned to the INTERNET to figure out if there was a better way to do this.

Sweet Baby Jesus there is.

There is a wonderful utility called perl-rename that uses Perl’s regular expression engine to bulk rename files in-place. It’s very similar to how Vim performs find/replace, and it helped me solve two of my other problems in quick order.

Getting rid of undefined

To get rid of the pesky word undefined in my note file names, I used the find command to traverse my entire notebook structure, find all the Markdown files that contained that word, then pass along those file paths to the perl-rename utility which renamed the file without its troublesome intruder.

cd $NOTEBOOK
find . -iname "*undefined*.md" -exec perl-rename --verbose --dry-run -- 's/undefined//g' '{}' \;

The actual heavy lifting is done in the substitution string: s/undefined//g, which reads like this: <substitute>/<the word undefined>/<with nothing>/<anywhere in the file name (globally)>.

(Note that the --dry-run flag will show you what would happen if the perl-rename command succeeded; to actually make the changes permanent the flag must be removed from the command.)
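
The substitution and the dry-run idea translate directly to JavaScript, which can be handy for checking a pattern before pointing it at real files. A minimal sketch, assuming the names are plain strings: planRenames is a hypothetical helper that only reports what would change and never touches the disk.

```javascript
// Apply a substitution to each name and report only the names that would
// change, like a dry run. Nothing is renamed here.
function planRenames(names, pattern, replacement) {
    return names
        .map(oldName => ({ oldName, newName: oldName.replace(pattern, replacement) }))
        .filter(({ oldName, newName }) => oldName !== newName);
}

// /undefined/g is the JavaScript spelling of s/undefined//g:
// the g flag replaces every occurrence, not just the first.
const plan = planRenames(
    ["a-historyundefined-and-timeline-of-the-world.md", "recipes.md"],
    /undefined/g,
    ""
);
console.info(plan);
```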

So far so good – no more undefined in file names. What about leading and trailing dashes (the former underscores)? Easy peasy.

cd $NOTEBOOK
find . -iname "*.md" -exec perl-rename --verbose --dry-run -- 's/^-//' '{}' \;
find . -iname "*.md" -exec perl-rename --verbose --dry-run -- 's/-$//' '{}' \;

Again, the magic is in the substitution.

In the first command, the substitution reads: <substitute>/<a dash at the beginning of the file name>/<with nothing>. (The caret ^ symbol represents the beginning of a series of characters.)
In the second command, the substitution reads: <substitute>/<a dash at the end of the file name>/<with nothing>. (The dollar sign $ symbol represents the end of a series of characters.)

Now for those pesky Reddit notes. Since I’d eliminated leading dashes in file names, clipped notes from Reddit would now have a file name like r-<subreddit>-<note-title>. I still wanted to know these notes were from Reddit, so I decided the following substitution was best.

cd $NOTEBOOK
find . -iname "r-*.md" -exec perl-rename --verbose --dry-run -- 's/^r-/reddit-/' '{}' \;

The substitution reads (as you probably know by now): <substitute>/<an r- at the beginning of the file name>/<with reddit- >.
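
One detail worth checking with anchored patterns is which string the pattern actually sees. find hands over paths such as ./notebook/-note.md, so against the full path ^ refers to the start of the path and $ to the end of the extension; whether a given rename tool anchors against the full path or only the file name depends on the tool and its options. The sketch below sidesteps the question by applying the same three substitutions to the file name stem only. It is an illustration under assumptions: cleanStem is a hypothetical helper and the sample path is invented.

```javascript
const path = require("path");

// Apply the three anchored substitutions to the stem (the file name without
// its directory or extension), then reassemble the path.
function cleanStem(filePath) {
    const { dir, name, ext } = path.parse(filePath);
    const cleaned = name
        .replace(/^-/, "")           // s/^-//          one leading dash
        .replace(/-$/, "")           // s/-$//          one trailing dash
        .replace(/^r-/, "reddit-");  // s/^r-/reddit-/  mark Reddit clippings
    return path.join(dir, cleaned + ext);
}

console.info(cleanStem("notebook/-r-askhistorians-some-title-.md"));
```

Running the dry run first, as in the commands above, is the quickest way to confirm that a pattern matches the part of the name it is meant to.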

But Nick, what about all those untitled-XX.md notes?

I’m glad you asked. There’s nothing to do with those notes but manually examine them and rename them according to their content. Which would absolutely be a pain in the ass if not for the terminal file manager ranger.

Step 3: Renaming untitled notes

Ever since I watched Luke Smith demonstrate the ranger file manager I’ve had a major boner for it, and wanted a real chance to kick its tires. The challenge of renaming all these untitled files gave me the opportunity.

Briefly, ranger is a terminal file manager that emulates some of Vim’s modal editor behavior. For example, to move through directories you use the home-row keys h, j, k, and l. To run commands you press the colon key, then enter the command name. It’s both sexy and dangerous, and since I’m a kinky guy it was love at first sight.

Since I had never used ranger for any serious file system work before, this was a great way to get used to its navigation controls and command capabilities. I quickly figured out that the home row was my navigation center, but ALSO that I wanted to move through pages of files at a time rather than just hitting j and k repeatedly. Turns out if you hold shift and hit those same keys ranger will move you half-page at a time. Excellent. I traversed each notebook and used ranger’s find command – hitting / followed by a file name string – to quickly jump to the first instance of a file named untitled.... Ranger has a great file preview pane that immediately let me inspect the contents of each file, from which I could easily determine what the real file name should be. Renaming each file was easy enough – I typed the command :rename <new-file-name> and that did the trick. If I perchance needed to edit the file, I simply hit the l key to enter the file itself, which opened my default text editor (set by the EDITOR and VISUAL environment variables) for immediate access. Quitting the editor returned me immediately to ranger. Hitting the n key repeated my search. And so it went, until I had renamed all untitled-XX.md files in each notebook directory.

Occasionally I realized that a note I was viewing in ranger really needed to be in another directory (notebook). So I initiated an external shell command by typing ! (alternatively I could have typed :shell) and then typed my typical shell command: mv <file-name> <other-directory>/.

All without leaving ranger.

Step 4: Prune unused assets

By far the bulk of the disk space in each notebook is allotted to assets attached to notes – be they images, or PDFs, or audio files. Markdown files, being plain text, require little space to store – but assets, being binary, are pigs.

When I exported my notes to markdown, evernote2md created two directories in each notebook for assets: file and image. This was uniform across notebooks, which worked to my advantage. After I exported my notes I started rummaging through each notebook directory, purging notes that were either no longer important, or too badly mangled by the export process to be of any value. But how to remove their assets as well? I hacked together another node.js script to help me find assets that were no longer referenced by any notes in a given notebook.

#!/usr/bin/env node
const path = require("path");
const execSync = require("child_process").execSync;

const args = process.argv.slice(2);
const assetDir = path.resolve(args[0] || "."); // e.g., file, or image -- assume this script is executed in an asset directory
const mdDir = path.resolve(args[1] || "..");

const lsResults = execSync(`ls ${ assetDir }`).toString().split("\n").filter(n => !!n);

const noResults = [];
lsResults.forEach(a => {
  const cmd = `grep -c -H -l "${ a }" ${ mdDir }/*.md`;
  let grepResults = [];
  try {
    grepResults = execSync( cmd ).toString().split("\n").filter(n => !!n);
  } catch ( e ) {
    // empty
  }
  console.info(">>", a);
  console.info(grepResults);
  if ( grepResults.length === 0 ) { // does not appear in any file
    noResults.push( a );
  }
});

console.info( noResults );
if ( noResults.length > 0 ) {
  console.info(`to remove - rm ${ noResults.join(" ") }`);
}

This script uses the grep command to determine if an asset filename appears in the text content of any note; if it does not, it is included in output at the end that builds up a long rm command string that can be copied and then run to eliminate unused assets for a given notebook directory.

The grep command flags are important here:

-c means to generate a count of the matching lines in a file for the given search string (in this case, the asset file name)
-H means to print the file name in which the match occurred
-l means to restrict output to matching file names only (instead of matching lines within a file)

This combination of flags produces one line of output per note file in which the asset name is found, allowing the script to know how many notes reference the asset. If it isn’t referenced at all, it’s safe to delete. And so it goes.
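
The same check can be done without shelling out to grep, which avoids quoting problems when asset names contain spaces or shell metacharacters. A minimal sketch, assuming the note contents have already been read into memory: findUnreferencedAssets is a hypothetical helper and the sample data is invented.

```javascript
// Return the asset names that appear in none of the notes' text.
// notes is an array of { file, text } objects.
function findUnreferencedAssets(assetNames, notes) {
    return assetNames.filter(asset =>
        !notes.some(note => note.text.includes(asset))
    );
}

const notes = [
    { file: "soup.md", text: "![bowl](image/bowl.png)\nSimmer for an hour." },
    { file: "trip.md", text: "See [itinerary](file/itinerary.pdf)." },
];
console.info(findUnreferencedAssets(["bowl.png", "itinerary.pdf", "old-scan.pdf"], notes));
```

Like the grep version, this is a plain substring match, so an asset whose name is contained in another asset's name (scan.pdf inside old-scan.pdf, say) counts as referenced whenever the longer name appears.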

This process is still a work in progress. As I review each notebook, I’m pruning its assets, and keeping track of those I’ve completed.

Step 5: Add front-matter to notebooks

Several Evernote alternatives (e.g., Boostnote) and many static website generators use YAML metadata markup in Markdown files to render them appropriately. This front-matter appears at the top of the file, and follows a schema similar to the following:

---
link:
title:
description:
keywords:
author:
date:
publisher:
stats:
tags:
---

My exported Evernote notes do not have this front-matter, but as will be demonstrated later, it is critically important for targeted note searches.

So since you’re wondering, yes, I did hack together another script to inject this front-matter into every existing Markdown note within a notebook directory.

#!/usr/bin/env node
const path = require("path");
const execSync = require("child_process").execSync;
const { writeFileSync, readFileSync } = require("fs");

const args = process.argv.slice(2);
const mdDir = path.resolve(args[0] || "."); // assume this command is in a notebook directory


const frontMatter = `
---
link:
title: <title>
description:
keywords:
author:
date:
publisher:
stats:
tags:
---
`.trim();

const capitalize = (s) => {
  return s.charAt(0).toUpperCase() + s.slice(1);
};

const lsResults = execSync(`ls ${ mdDir }/*.md`).toString().split("\n").filter(n => !!n);
console.info(lsResults);

lsResults.forEach(m => {
  const fileContent = readFileSync( m ).toString();
  if ( fileContent.startsWith("---\n") ) {
    console.info(`${ m } has front matter, skipping...`);
    return;
  }

  // file name without its directory or the trailing .md extension
  const fileName = path.basename( m, ".md" );
  const formattedFrontMatter = frontMatter.replace(
      '<title>',
      capitalize(fileName.replace(/-/g, " "))
  );

  // prepend the front-matter to the existing note content
  const newContent = `${ formattedFrontMatter }\n${ fileContent }`;

  writeFileSync( m, newContent );
});

Since most notes already had filenames derived from their Evernote titles, I took advantage of that fact and turned those filenames into the note’s front-matter title – sans hyphens, and with sentence case. It’s rough, I know, but better than nothing. The rest of the information I will have to add manually. The most important fields to me are title, author, and tags. (Are tags different than keywords? I don’t know.) On these fields – and the note’s file name – I will most frequently perform targeted searches.
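As a quick check of that filename-to-title transformation, here is a minimal standalone sketch. The function name titleFromFileName and the sample path are hypothetical, chosen only for illustration.

```javascript
const path = require("path");

// Mirrors the script's transformation: drop the directory and the .md
// extension, swap hyphens for spaces, and upper-case the first letter.
function titleFromFileName(filePath) {
  const base = path.basename(filePath, ".md");
  const spaced = base.replace(/-/g, " ");
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

console.log(titleFromFileName("economics/the-history-of-economics.md"));
// The history of economics
```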

Step 6: Searching for notes

Searching for files by name is easy. If I want to search for a file with the word taxes in it, I simply use the find command:

cd $NOTEBOOK
find . -iname "*taxes*"

This will give me a list of file paths in which the word taxes appears. I try to name my notes intelligently so this kind of search can be productive. But sometimes I want to be more specific.

In that case I can rely on the tags front-matter that I’ve added to each note. For example, I have a recipe for a mixed drink that my brother recommends. I’ve tagged this mixed drink with alcohol, and can quickly find it using the ack command (you could use grep as well, but I prefer ack):

$ ack '^tags.*alcohol.*'
alcohol/super-complex-highly-rewarding-concotion-to-drink.md
10:tags: alcohol, don-julio, grand-marnier, kombucha, cocktail

This simple command reveals that the file I’m looking for is alcohol/super-complex-highly-rewarding-concotion-to-drink.md.

In fact, I could use ack to search for any front-matter field, simply by using the correct search expression. (The observant reader will notice that the expression resembles those I used when renaming files with perl-rename. The syntax is very similar.) In this case, the search expression reads: files that contain a line that begins with 'tags' (^tags) followed by any other characters (.*) but ALSO that has the word 'alcohol' in it, followed by any other characters (.*).

If I want to cast a wider net, I can also use ack to search for any term that occurs in any of my notes with the simple command: ack <search-term>.
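To see why the ^tags anchor matters, the same expression can be tried as a JavaScript regular expression. This is only a sketch: the note paths and contents below are made up, and notesTagged is a hypothetical helper, not part of ack.

```javascript
// The expression given to ack, as a JavaScript regular expression.
// The "m" flag makes ^ match at the start of every line, not just the
// start of the whole string; "." never crosses a line break here.
const tagLine = /^tags.*alcohol.*/m;

// Hypothetical note contents, for illustration only.
const notes = {
  "alcohol/cocktail.md":
    "---\ntitle: Cocktail\ntags: alcohol, kombucha\n---\nMix well.",
  "economics/smith.md":
    "---\ntitle: Adam Smith\ntags: economics\n---\nHe disliked alcohol taxes.",
};

// Return the paths of notes that contain a line matching the pattern.
function notesTagged(notesByPath, pattern) {
  return Object.keys(notesByPath).filter((p) => pattern.test(notesByPath[p]));
}

console.log(notesTagged(notes, tagLine)); // [ 'alcohol/cocktail.md' ]
```

The second note mentions alcohol in its body, but not on a line that begins with tags, so it is left out. A bare search for alcohol would return both.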

Step 7: Creating new notes

Now that my notes are exported, cleaned, organized, and “front-mattered”, how do I add new notes to my notebooks?

Adding a new Markdown file is as simple as using your favorite text editor to save a file with the .md extension. Because the evernote2md export favored file and image directories for external assets, I use those same conventions for my own notes. If I had a notebook directory called economics, for example, and I had a note called the-history-of-economics.md, I might reference assets like this:

The Author Adam Smith wrote the seminal work, The Wealth of Nations.

<!-- this is an image of Adam Smith -->
![Picture of Adam Smith](image/adam-smith.png)

<!-- this is a link to the das-kapital.pdf file -->
Later, Karl Marx challenged Adam Smith's ideas in his work, [Das Kapital](file/das-kapital.pdf).

Now, the vast majority of my notes are articles clipped from the Internet with the Evernote Web Clipper. As I’ve stated, this is one of the strongest features of Evernote, and the one I’ll probably miss the most.

However, I’ve since discovered the clean-mark npm package, which does the same job: it a) exports a web page as well-formatted Markdown, and b) adds front-matter by default. This is now my go-to method of snipping articles from the Internet. The only caveat is that assets referenced by an article are not downloaded; they remain linked by their original Internet URLs. If images or external files are an integral part of an article, it will be up to me to download them manually and adjust the links accordingly.
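Finding which images still point at the Internet is easy to script. The following is a rough sketch, not part of clean-mark: listRemoteImages is a hypothetical helper, the sample Markdown is invented, and the pattern only handles plain inline image syntax with http(s) URLs.

```javascript
// List the URLs of Markdown images that point at http(s) addresses,
// i.e. the assets that would still need to be downloaded by hand.
function listRemoteImages(markdown) {
  // ![alt text](https://...) ; local paths such as image/foo.png are skipped
  const imagePattern = /!\[[^\]]*\]\((https?:\/\/[^\s)]+)/g;
  return Array.from(markdown.matchAll(imagePattern), (m) => m[1]);
}

// Made-up note content, for illustration only.
const sample = [
  "![Picture of Adam Smith](https://example.com/adam-smith.png)",
  "![Local copy](image/adam-smith.png)",
  "See [the source](https://example.com/article).",
].join("\n");

console.log(listRemoteImages(sample));
// [ 'https://example.com/adam-smith.png' ]
```

Ordinary links are deliberately ignored, since only embedded images need a local copy for a note to render offline.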

Step 8: Accessing notes from multiple devices

So far I’ve entertained two methods for accessing my notes on multiple devices.

I use the MegaSync cloud storage service to back up and synchronize files across devices. It is, by far, the best cloud storage service I’ve used. Mega has clients that work on Windows, OSX, Linux, and Android – which is awesome since I have devices that run each of those. (It also supports iOS, but I have an Android phone so I don’t care :D ). Synchronizing notes and files is flawless – the only downside is that the Android client does not render Markdown files, or show their plain text content, which obviously makes it a non-ideal mobile client solution. This is my only real gripe about Mega’s mobile offering. It’s 2021: everything should render Markdown.

As an alternative, I’ve also contemplated using GitHub to manage all of my notes. I am very familiar with Git, and letting it manage versions of my files and track individual commits is very appealing to me. Synchronizing across devices is trivial, as GitHub’s web interface will render Markdown files (including embedded images) in any web browser – mobile or not. My only hesitation is that GitHub (unlike Mega) does not offer end-to-end encryption (my original issue with Evernote), so it does not give me the measure of privacy I desire.

This is the last big issue I need to solve before I have a complete Evernote replacement that meets all of my needs.

Conclusion

Leaving Evernote has been an adventure, but I’ve learned a lot along the way – mostly that the tools I need to achieve my values are already within my reach, and they demand nothing but the time to learn. It’s amazing how much of our personal lives we heft into the “cloud”, to Big Tech services that don’t actually give a crap about our privacy, and will use our own data against us when we don’t bring our thoughts in line with whatever pre-established narrative to which they beat their drums. If you don’t control your data, you roll the dice on ever more tenuous odds.

Reclaiming my data – and making it my own again – has been one of the most humanizing experiences I’ve had in a long time. I hope this inspires others to embark on a similar quest, for freedom – for knowledge – for autonomy.

2020-08-04

Black Rednecks and White Liberals by Thomas Sowell

This morning I finished reading Thomas Sowell’s book Black Rednecks and White Liberals.

Sowell has spent a lifetime (he’s 90 years old) studying the root causes of racial tension throughout the world, especially in the United States. Written in 2005, this book is far more relevant today – fifteen years later – than when he first wrote it. But Sowell saw the fomenting racial tensions in this country and vocally protested the policies and ideas that have continued to push us all into corners defined by the past, instead of securing solutions in the present. He offers solutions – not based on popular slogans or emotional sentiment – but based on the hard facts of history; based on how other peoples in other times successfully adapted to each other to make the lives of each more prosperous. If we can’t do that, he warns, the result will be pain and misery for all.

“While the lessons of history can be valuable, the twisting of history and the mining of the past for grievances can tear a society apart. Past grievances, real or imaginary, are equally irremediable in the present, for nothing that is done among living contemporaries can change in the slightest the sins and the sufferings of generations who took those sins and sufferings to the grave with them in centuries past. Galling as it may be to be helpless to redress the crying injustices of the past, symbolic expiation in the present can only create new injustices among the living and new problems for the future, when newborn babies enter the world with pre-packaged grievances against other babies born the same day. Both have their futures jeopardized, not only by their internal strife but also by the increased vulnerability of a disunited society to external dangers… To be relevant to our times, history must not be controlled by our times. Its integrity as a record of the past is what allows us to draw lessons from it.”

2020-06-10

Scott Hanselman is Wrong

If you’ve done any Microsoft development in the last two decades you probably know the name Scott Hanselman, and are probably familiar with his blog at hanselman.com. I used to enjoy reading Hanselman’s articles, back when I wrote code for the Microsoft platform, and generally considered him to be a pretty even-keeled individual with generally insightful thoughts and balanced opinions.

That was back before virtue signaling was all the rage of course. Now it appears that he’s fallen into the trap of proving how woke he is by showing us how to change our initial git repository name from master to main.

Why make this change, you ask?

I’ll quote from his blog:

The Internet Engineering Task Force (IEFT) points out that “Master-slave is an oppressive metaphor that will and should never become fully detached from history” as well as “In addition to being inappropriate and arcane, the master-slave metaphor is both technically and historically inaccurate.” There’s lots of more accurate options depending on context and it costs me nothing to change my vocabulary, especially if it is one less little speed bump to getting a new person excited about tech.

Aside from completely misunderstanding the meaning of master in git (think: master copy, not slave-driver), Scott has managed to lower himself to the point of sweating over a metaphor that encompasses all of human history, not just that of the American Antebellum South. (Check out Thomas Sowell’s chapter The Real History of Slavery in his book Black Rednecks and White Liberals. Sowell is a black, award-winning economist who grew up in Harlem.)

The word master has quite a few other definitions that would be blotted out if we just tossed it to the side because wokeness. For example:

having complete control over a situation
learning to do something properly
the title of famous painters
the first and original copy of a recording
the head of a ship that carries passengers or goods
a college degree
a revered religious teacher
an abstract thing that has power or influence
a main or principal thing
etc.

Should we expunge these parts of our language because Scott feels bad about things he didn’t do, hundreds of years before he was born? Should we ditch any word that acts as a trigger mechanism for someone else’s discomfort?

I can tell you right now, moist isn’t making the keep list.

It makes me sad to see Hanselman become a useful idiot, and it makes me sad that, hundreds of years after the destruction of the Western slave trade (by the West!), white people are stooping to this level of bullshit because they’ve adopted some perverse doctrine of racial original sin, and are constantly trying to atone for it.

If Scott thinks self-flagellation will fix any actual racial problems in this country, he’s deluded. Changing a git repo name won’t eliminate qualified immunity, bring corrupt cops to justice, or stop our elected officials from shredding the Bill of Rights. It won’t fix fatherlessness in black America, or the blight of narcissistic, checked-out parents in white. It won’t provide healthcare, it won’t feed the poor, it won’t address mental health issues. It won’t protect us from white school shooters, and it won’t save us from black gang bangers. It’s a token gesture without teeth; emotions on meth.

People like this black American nationalist give me hope, though. He knows that real change will come through brutal honesty and personal responsibility, from all sides, and that’s the only way America can get over the blood feud we’ve been nursing for roughly two hundred years. He gets it, and he doesn’t need the contrition of false white guilt to satisfy a grudge. It’s a breath of fresh air, hearing this man speak, and I hope for all our sakes – even for the sake of Scott Hanselman – that many more like him will bring sanity to the sanitarium that is present-day America.

nicholascloud.com

There is no such thing as strong coffee, only weak people.