Automate Your Way to Self-Assembling Documentation
Documentation is what makes it possible for people to use your software without having to put in almost as much work to understand it as you did to write it. It's also one of the dreariest chores of maintaining code, the kind of housekeeping work programmers are notoriously averse to. I'm no exception to that rule, but at the same time I run a moderately popular library, Massive.js, which absolutely needs docs if it's to be useful to anyone else on the planet. So in the spirit of Larry Wall's first virtue, I've gone to considerable lengths to do as little as possible about it.
What is Documentation?
Documentation has taken many forms over the years, from actual dead-tree books to man pages to API documentation sites generated from specially formatted comments and everything in between. There are various advantages and disadvantages to each: anything else beats the book in terms of searchability, but if you need a more structured introduction to something, or are working behind an air gap, books absolutely have their place. Format is something of an independent concern.
A more important question is: what makes documentation good? This is naturally subjective, but a few basic principles make sense:
- Good documentation is current: new features and changes are documented at the time they're integrated, and documentation for the latest release is always up-to-date
- Good documentation is complete: it covers every notable API function, configuration setting, option, and gotcha in the system that end users can expect to deal with
- Good documentation is readable, even -- especially -- for people with limited experience (they need it more than the experts will!)
- Good documentation takes as little time and effort to maintain as possible, without sacrificing too much of the first three principles
Since the only ways to get Massive are from npm or from GitHub, it's a fairly safe assumption that anyone who needs the documentation will be online. This makes things easier: I can provide documentation as a static site. By "static", I don't mean that it's eternally unchanging, but that it's just plain HTML and CSS, maybe a little JavaScript to liven things up a bit. There's no database, no backend API, no server-side processing.
Full Automation
The absolute easiest way to get something up is to use a documentation generator. These have been around for ages; perldoc and JavaDoc are probably the best-known, but JSDoc has existed for almost 20 years too. With it, I can decorate every function and module with a comment block containing detailed usage information, then run a program which assembles those blocks into a static website.
JSDoc comment blocks, like JavaDoc's, are indicated by a `/**` header. This one shows a function, with `@param` and `@return` tags indicating its arguments and return value respectively. Other tags cover attributes of modules and classes, or provide hints for the JSDoc compiler to change how it organizes pages (distinguishing entities can be tricky in a language like JavaScript!).
```javascript
/**
 * Perform a full-text search on queryable fields. If options.document is true,
 * looks in the document body fields instead of the table columns.
 *
 * @param {Object} plan - Search definition.
 * @param {Array} plan.fields - List of the fields to search.
 * @param {String} plan.term - Search term.
 * @param {Object} [options] - {@link https://massivejs.org/docs/options-objects|Select options}.
 * @return {Promise} An array containing any query results.
 */
Queryable.prototype.search = function (plan, options = {}) {
```
I don't need a complicated `.jsdoc.json` config for this:
```json
{
  "source": {
    "include": ["index.js", "lib", "README.md"]
  },
  "opts": {
    "recurse": true
  }
}
```
All that's left is to add a script in my `package.json` to run JSDoc:
"docs": "rm -rf ./docs/api && jsdoc -d ./docs/api -c ./.jsdoc.json -r"
Now `npm run docs` generates a fresh API documentation site -- all I have to do is keep my comment blocks up to date and remember to run it!
There are two problems with this picture:
First, that particular bit of documentation raises as many questions as it answers. What are document body fields? I'm just assuming people know what those are. And the description of the `options` object is -- well, that's getting a bit ahead of myself. `Queryable.search` doesn't exist in a void: in order to understand what that function does, a developer needs to understand what the `options` object can do and what documents and their body fields are. That's a lot to dump into a single JSDoc comment, especially when you consider that the `options` object applies to most of Massive's data access functions, many of which concern documents! Clearly, I need a second level of documentation which serves as a conceptual rather than a purely technical reference. But I can't generate something like that automatically.
Second, I have to remember to run it. It's a one-line shell script. I shouldn't have to remember to run it. Let's get that one out of the way first:
Lifecycle Events
Several `npm` tasks provide hooks for you to execute scripts from your `package.json` before or after they run. Some, like `npm test`, require you to implement the task itself as a script. One such task with hooks is `npm version`. The `preversion` script executes before it bumps the version number; the `version` script executes after the bump, but before it commits the changed package definition into source control; and the `postversion` script executes after the commit.
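Wired into `package.json`, the hooks might look something like this -- the script paths here are hypothetical stand-ins, each pointing at a small shell script:

```json
{
  "scripts": {
    "docs": "rm -rf ./docs/api && jsdoc -d ./docs/api -c ./.jsdoc.json -r",
    "preversion": "./scripts/update-api-docs.sh",
    "postversion": "./scripts/update-gh-pages.sh"
  }
}
```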
I really only have to make sure the API documentation is up to date when I'm releasing a new version. Running JSDoc in `preversion` is perfect. If I want to keep the documentation update separate from the version bump, I can just put together a shell script that runs in the hook:
```shell
#!/bin/bash

echo "regenerating API docs"
npm run docs

echo "committing updated API docs"
git add docs/api
git commit -m "regenerate api docs"
```
Conceptual Reference: Jekyll and GitHub Pages
JSDoc is a great tool, but it can't introduce and connect the concepts users need to understand in order to work with Massive. The only way that's happening is if I write it myself, but I don't want to write raw HTML when I could work with the much friendlier Markdown instead. Fortunately, there's no shortage of static site generators which can convert Markdown to HTML. I use Fledermaus for my blog; or there's ReadTheDocs, a documentation-focused generator-as-a-service, where Massive's legacy docs are already hosted. But it's pretty much just me on Massive, so I want to centralize. GitHub Pages uses Jekyll, which makes that an easy decision.
I think the hardest part of using Jekyll is deciding on a theme. Other than that, the `_config.yml` is pretty basic; once I figure out that I can customize the layout by copying the theme's base to my own `_layouts/default.html`, and get the path to my stylesheet straightened out, all that's left is writing the content.
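A minimal `_config.yml` might look something like this -- the theme and description here are placeholder examples, not Massive's actual settings:

```yaml
title: Massive.js
description: A data mapper for Node.js and PostgreSQL
theme: jekyll-theme-minimal
```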
Pages in a Jekyll site, like articles on dev.to and (probably) other platforms, are Markdown files with an optional "front matter" section at the top of the file (the front matter is required for blog posts).
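The front matter is a YAML block fenced by triple dashes at the very top of the file. A hypothetical `some-docs.md` might begin like this:

```markdown
---
layout: default
title: Some Docs
---

# Some Docs

Regular Markdown content starts after the closing delimiter.
```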
Seeing what the documentation looks like locally takes a few steps:
- Install Ruby via package manager, then `gem install bundler`
- Create a `Gemfile` which pulls in the `github-pages` Ruby gem, then `bundle install`
- Then, unless I add more dependencies to the `Gemfile`, I can `bundle exec jekyll serve` and point my browser to the local address Jekyll is running on
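The `Gemfile` itself is tiny; a minimal sketch of the standard GitHub Pages setup:

```ruby
source 'https://rubygems.org'

# The github-pages gem pins Jekyll and its plugins to the
# versions GitHub Pages itself runs
gem 'github-pages', group: :jekyll_plugins
```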
At this point, I have a `docs/` directory in my working tree:
```
docs
├── api # JSDoc output
├── assets
│   └── css
│       └── style.scss # Jekyll handles processing SCSS
├── _config.yml # Main Jekyll config
├── Gemfile # Jekyll dependency management
├── Gemfile.lock # Auto-generated Jekyll dependency manifest
├── index.md # Documentation landing page
├── _layouts
│   └── default.html # Customized HTML layout template
├── some-docs.md # Some documentation!
└── _site # Jekyll output (this is .gitignored)
```
GitHub Pages can host an entire repository from the `master` branch, a docs directory in `master`, or a separate `gh-pages` branch. While I do have a docs directory, I don't want my documentation to update every time I land a commit on `master`. Massive's docs need to be current for the version of the library people get from `npm install`, not for every little change I make. So I create a `gh-pages` branch, clean it out, and copy my docs directory into the root (minus `_site`, since GitHub Pages runs Jekyll itself). The JSDoc output is included so the static site is complete, containing both the conceptual and the technical references.
After pushing and a bit of trial and error, I have the site up and working! But I really, really don't want to have to do all this manually every time I cut a release.
Automating Documentation Management
My script for the `preversion` lifecycle event lets me basically ignore the JSDoc as long as I keep the comments up to date. If I can script out the steps to update the `gh-pages` branch, I can use another lifecycle event to take the work out of managing the rest of it. Since everything's happening in another branch, kicking off after the version bump with `postversion` is sufficient.
First things first: what version am I updating the docs for? That information is in a couple of places: I could look for the latest git tag, or I could pull it out of package.json. Which to use is mostly a matter of taste. I'm pretty familiar with `jq` (think `sed` for JSON), so I go with that over `git describe`:
```shell
# -r makes jq print the raw string instead of a quoted JSON value
type jq >/dev/null 2>&1 && { VERSION=$(jq -r .version package.json); } || exit 1
```
This line first ensures that `jq` exists on the system. If it does, it sets the `VERSION` variable to the `version` field in package.json; otherwise, it aborts with a failing exit code to stop execution.
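To see the extraction in action against a hypothetical package.json (the `-r` flag makes `jq` emit the raw string rather than a quoted JSON value, which matters when the version lands in a commit message):

```shell
# A hypothetical package.json, just for illustration
printf '{"name":"massive","version":"6.0.0"}\n' > /tmp/package.json

# Same guard as in the script: abort unless jq is installed
type jq >/dev/null 2>&1 && { VERSION=$(jq -r .version /tmp/package.json); } || exit 1

echo "$VERSION"   # -> 6.0.0
```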
The next step is to get the current branch name and the commit SHA for the version bump:
```shell
BRANCH=$(git symbolic-ref --short HEAD)
COMMIT=$(git rev-parse --short "$BRANCH")
```
Then, it's time to `git checkout gh-pages` and get to work. I want to make sure no old files are present in the working tree, but I do have a customized `.gitignore` that I need to keep.
```shell
git clean -dfqx
git ls-tree --name-only gh-pages | grep -v "\(.gitignore\)" | xargs -I {} rm -r {}
```
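The `grep -v` filter in that pipeline can be sanity-checked in isolation with a stand-in file list:

```shell
# Simulate the branch's root listing and drop the .gitignore entry
printf '%s\n' .gitignore api assets index.md | grep -v "\(.gitignore\)"
# -> api, assets, index.md: everything except .gitignore survives
```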
`git clean` deletes all untracked files from the working tree. Then I `git ls-tree` the branch's root directory, perform an inverse grep to filter out my `.gitignore`, and pass every other file into `rm -r` with `xargs`. At the end of this, the working tree should be completely empty except for the `.gitignore`. Now to pull the up-to-date documentation over from the original branch:
```shell
git checkout "$BRANCH" -- docs
mv docs/* .
rm -r docs
```
Fairly straightforward: it checks out only the docs directory, moves its contents into the working tree root, and cleans up the now-empty directory. This is the home stretch.
```shell
git add .
git commit -m "regenerate documentation for $VERSION ($BRANCH $COMMIT)"
git checkout "$BRANCH"
```
Add the files, and commit them with the new version number and source commit information. With that all done, check out the original branch again. I could push `gh-pages` too, but I'm a little paranoid about automating uploads, so my script just `echo`es a reminder to do that manually. This all goes in another shell script, and then I just have to make sure that script runs on `postversion`!
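Put together, the whole `postversion` script looks something like this -- a sketch assembled from the fragments above, with the final `echo` standing in for the manual-push reminder:

```shell
#!/bin/bash
set -e

# Get the new version number, aborting if jq is missing
type jq >/dev/null 2>&1 && { VERSION=$(jq -r .version package.json); } || exit 1

# Remember where we came from
BRANCH=$(git symbolic-ref --short HEAD)
COMMIT=$(git rev-parse --short "$BRANCH")

git checkout gh-pages

# Empty the working tree, keeping only the branch's .gitignore
git clean -dfqx
git ls-tree --name-only gh-pages | grep -v "\(.gitignore\)" | xargs -I {} rm -r {}

# Bring over the current docs and move them into the root
git checkout "$BRANCH" -- docs
mv docs/* .
rm -r docs

git add .
git commit -m "regenerate documentation for $VERSION ($BRANCH $COMMIT)"
git checkout "$BRANCH"

echo "done -- remember to push gh-pages!"
```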
Start to Finish
Now, when I `npm version` to create a new release of Massive, my scripts fire on the lifecycle events. The `preversion` script updates my API documentation and commits it before anything else happens. The standard `version` functionality takes over at that point, setting the new version in package.json, committing the change, and tagging it with the new version. Finally, my `postversion` script assembles the latest documentation and commits it to the `gh-pages` branch. The only thing left for me to do manually is to push that branch along with `master` and the new tag. As long as I keep my JSDoc comments and reference documentation up to date, the rest of it takes care of itself!