Automate Your Way to Self-Assembling Documentation

Documentation is what makes it possible for people to use your software without having to put in almost as much work to understand it as you did to write it. It's also one of the dreariest chores of maintaining code, the kind of housekeeping work programmers are notoriously averse to. I'm no exception to that rule, but at the same time I run a moderately popular library, Massive.js, which absolutely needs docs if it's to be useful to anyone else on the planet. So in the spirit of Larry Wall's first virtue, I've gone to considerable lengths to do as little as possible about it.

What is Documentation?

Documentation has taken many forms over the years, from actual dead-tree books to man pages to API documentation sites generated from specially formatted comments and everything in between. There are various advantages and disadvantages to each: anything else beats the book in terms of searchability, but if you need a more structured introduction to something, or are working behind an air gap, books absolutely have their place. Format is something of an independent concern.

A more important question is: what makes documentation good? This is naturally subjective, but a few basic principles make sense:

  • Good documentation is current: new features and changes are documented at the time they're integrated, and documentation for the latest release is always up-to-date
  • Good documentation is complete: it covers every notable API function, configuration setting, option, and gotcha in the system that end users can expect to deal with
  • Good documentation is readable, even -- especially -- for people with limited experience (they need it more than the experts will!)
  • Good documentation takes as little time and effort to maintain as possible, without sacrificing too much of the above three

Since the only ways to get Massive are from npm or from GitHub, it's a fairly safe assumption that anyone who needs the documentation will be online. This makes things easier: I can provide documentation as a static site. By "static", I don't mean that it's eternally unchanging, but that it's just plain HTML and CSS, maybe a little JavaScript to liven things up a bit. There's no database, no backend API, no server-side processing.

Full Automation

The absolute easiest way to get something up is to use a documentation generator. These have been around for ages; perldoc and JavaDoc are probably the best-known, but JSDoc has existed for almost 20 years too. With it, I can decorate every function and module with a comment block containing detailed usage information, then run a program which assembles those blocks into a static website.

The JSDoc comment blocks, like JavaDoc's, are indicated by a /** header. The block below documents a function, with @param and @return tags describing its arguments and return value respectively. Other tags cover attributes of modules and classes, or provide hints that change how JSDoc organizes the generated pages (distinguishing entities can be tricky in a language like JavaScript!).

/**
 * Perform a full-text search on queryable fields. If options.document is true,
 * looks in the document body fields instead of the table columns.
 *
 * @param {Object} plan - Search definition.
 * @param {Array} plan.fields - List of the fields to search.
 * @param {String} plan.term - Search term.
 * @param {Object} [options] - {@link https://massivejs.org/docs/options-objects|Select options}.
 * @return {Promise} An array containing any query results.
 */
Queryable.prototype.search = function (plan, options = {}) {

I don't need a complicated .jsdoc.json config for this:

{
  "source": {
    "include": ["index.js", "lib", "README.md"]
  },
  "opts": {
    "recurse": true
  }
}

All that's left is to add a script in my package.json to run JSDoc:

"docs": "rm -rf ./docs/api && jsdoc -d ./docs/api -c ./.jsdoc.json -r"

Now npm run docs generates a fresh API documentation site -- all I have to do is keep my comment blocks up to date and remember to run it!

There are two problems with this picture:

First, that particular bit of documentation raises as many questions as it answers. What are document body fields? I'm just assuming people know what those are. And the description of the options object is -- well, that's getting a bit ahead of myself. Queryable.search doesn't exist in a void: in order to understand what that function does, a developer needs to understand what the options object can do and what documents and their body fields are. That's a lot to dump into a single JSDoc comment. Especially when you consider that the options object applies to most of Massive's data access functions, many of which concern documents! Clearly, I need a second level of documentation which serves as a conceptual rather than a purely technical reference. But: I can't generate something like that automatically.

Second, I have to remember to run it. It's a one-line shell script. I shouldn't have to remember to run it. Let's get that one out of the way first:

Lifecycle Events

Several npm commands provide hooks which let you execute scripts from your package.json before or after the command itself runs. Some, like npm test, require you to implement the task itself as a script. One such command with hooks is npm version. The preversion script executes before it bumps the version number; the version script executes after the bump, but before it commits the changed package definition into source control; and the postversion script executes after the commit.

I really only have to make sure the API documentation is up to date when I'm releasing a new version. Running JSDoc in preversion is perfect. If I want to keep the documentation update separate from the version bump, I can just put together a shell script that runs in the hook:

#!/bin/bash

echo "regenerating API docs"

npm run docs

echo "committing updated API docs"

git add docs/api

git commit -m "regenerate api docs"

Conceptual Reference: Jekyll and GitHub Pages

JSDoc is a great tool, but it can't introduce and connect the concepts users need to understand in order to work with Massive. The only way that's happening is if I write it myself, but I don't want to write raw HTML when I could work with the much friendlier Markdown instead. Fortunately, there's no shortage of static site generators which can convert Markdown to HTML. I use Fledermaus for my blog. Or I could go back to ReadTheDocs, the documentation-focused generator-as-a-service where the legacy docs are already hosted. But it's pretty much just me on Massive, so I want to keep everything in one place, and since GitHub Pages uses Jekyll, that's an easy decision.

I think the hardest part of using Jekyll is deciding on a theme. Other than that, the _config.yml is pretty basic, and once I figure out that I can customize the layout by copying the theme's base template to my own _layouts/default.html, and get the path to my stylesheet straightened out, all that's left is writing the content.
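The _config.yml doesn't need to be much more than a theme and some site metadata. It winds up looking something like this (the theme and description here are placeholders rather than what massivejs.org actually uses):

theme: jekyll-theme-minimal
title: Massive.js
description: A data mapper for Node.js and PostgreSQL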

Pages in a Jekyll site, like articles on dev.to and (probably) other platforms, are Markdown files with an optional "front matter" section at the top of the file (the front matter is required for blog posts).
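A hypothetical page in the conceptual reference might start out like this, with the front matter (the block between the triple-dash delimiters) setting the layout and title; the page name and content are made up for illustration:

---
layout: default
title: Working with Documents
---

Some Markdown content explaining documents and their body fields...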

Seeing what the documentation looks like locally takes a few steps:

  1. Install Ruby via package manager
  2. gem install bundler
  3. Create a Gemfile which pulls in the github-pages Ruby gem (a minimal example follows this list)
  4. bundle install
  5. Then, unless I add more dependencies to the Gemfile, I can bundle exec jekyll serve and point my browser to the local address Jekyll is running on
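The Gemfile can be as minimal as the one below; the github-pages gem pins Jekyll and its plugins to the versions GitHub Pages itself runs:

source "https://rubygems.org"

gem "github-pages", group: :jekyll_plugins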

At this point, I have a docs/ directory in my working tree:

docs
├── api                 # JSDoc output
├── assets
│   └── css
│       └── style.scss  # Jekyll handles processing SCSS
├── _config.yml         # Main Jekyll config
├── Gemfile             # Jekyll dependency management
├── Gemfile.lock        # Auto-generated Jekyll dependency manifest
├── index.md            # Documentation landing page
├── _layouts
│   └── default.html    # Customized HTML layout template
├── some-docs.md        # Some documentation!
└── _site               # Jekyll output (this is .gitignored)

GitHub Pages can build a site from the root of the master branch, from a docs directory in master, or from a separate gh-pages branch. While I do have a docs directory, I don't want my documentation to update every time I land a commit on master. Massive's docs need to be current for the version of the library people get from npm install, not for every little change I make. So I create a gh-pages branch, clean it out, and copy my docs directory into the root (minus _site, since GitHub Pages runs Jekyll itself). The JSDoc output is included so the static site is complete, containing both the conceptual and the technical references.
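The one-time setup amounts to something like this (a sketch, assuming master is checked out with up-to-date docs; I also add a customized .gitignore on the new branch afterwards):

git checkout --orphan gh-pages   # new branch with no history
git rm -rf .                     # clear out the files inherited from master
git checkout master -- docs      # grab just the docs directory
mv docs/* . && rm -r docs        # move its contents to the root
git add .
git commit -m "initial documentation site"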

After pushing and a bit of trial and error, I have the site up and working! But I really, really don't want to have to do all this manually every time I cut a release.

Automating Documentation Management

My script for the preversion lifecycle event lets me basically ignore the generated API docs as long as I keep my comment blocks up to date. If I can script out the steps to update the gh-pages branch, I can use another lifecycle event to take the work out of managing the rest of it. Since everything's happening in another branch, kicking off after the version bump with postversion is sufficient.

First things first: what version am I updating the docs for? That information is in a couple of places: I could look for the latest git tag, or I could pull it out of package.json. Which to use is mostly a matter of taste. I'm pretty familiar with jq (think sed for JSON), so I go with that over git describe:

type jq >/dev/null 2>&1 && { VERSION=$(jq -r .version package.json); } || exit 1

This line first ensures that jq exists on the system. If it does, it sets the VERSION variable to the version field in package.json (the -r flag strips the JSON string's quotes); otherwise, it aborts with a nonzero exit code to stop execution.
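For comparison, the git tag route would be something like:

VERSION=$(git describe --tags --abbrev=0)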

The next step is to get the current branch name and the commit SHA for the version bump:

BRANCH=$(git symbolic-ref --short HEAD)
COMMIT=$(git rev-parse --short "$BRANCH")

Then, it's time to git checkout gh-pages and get to work. I want to make sure no old files are present in the working tree, but I do have a customized .gitignore that I need to keep.

git clean -dfqx
git ls-tree --name-only gh-pages | grep -v "\(.gitignore\)" | xargs -I {} rm -r {}

git clean deletes all untracked files from the working tree (the -d flag includes directories and -x includes ignored files). Then I git ls-tree the branch's root directory, perform an inverse grep to filter out my .gitignore, and pass every other file in it into rm -r with xargs. At the end of this, the working tree should be completely empty except for the .gitignore. Now to pull the up-to-date documentation over from the original branch:

git checkout "$BRANCH" -- docs

mv docs/* .

rm -r docs

Fairly straightforward: it checks out only the docs directory, moves its contents into the working tree root, and cleans up the now-empty directory. This is the home stretch.

git add .

git commit -m "regenerate documentation for $VERSION ($BRANCH $COMMIT)"

git checkout "$BRANCH"

Add the files, commit them with the new version number and source commit information, and with that all done, check out the original branch again. I could push gh-pages, but I'm a little paranoid about automating uploads, so my script just echoes a reminder to do that manually.
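The reminder is nothing fancier than something along these lines:

echo "done! remember to publish the documentation:"
echo "git push origin gh-pages"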

This all goes in another shell script, and then I just have to make sure that script runs on postversion!
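The scripts section of my package.json now ties everything together; the two shell script filenames below are placeholders for wherever the scripts actually live in the repository:

"scripts": {
  "docs": "rm -rf ./docs/api && jsdoc -d ./docs/api -c ./.jsdoc.json -r",
  "preversion": "./update-api-docs.sh",
  "postversion": "./publish-docs.sh"
}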

Start to Finish

Now, when I npm version to create a new release of Massive, my scripts fire on the lifecycle events. The preversion script updates my API documentation and commits it before anything else happens. The standard version functionality takes over at that point, setting the new version in package.json, committing the change, and tagging it with the new version. Finally, my postversion script assembles the latest documentation and commits it to the gh-pages branch. The only thing left for me to do manually is to push that branch along with master and the new tag. As long as I keep my JSDoc comments and reference documentation up to date, the rest of it takes care of itself!