check-spelling-docs

Documentation for check-spelling

Just in time for Halloween 🎃 (2021), a spooky 👻 fast 🏃 speed improvement.

Background

In the distant past, I ran into versions of xargs which would somehow fail because the constructed command-line exceeded the available space.

In response, I made check-spelling absolutely pessimistic about how xargs worked.

With the introduction of parallelism (initially via parallel, but more recently directly via xargs), check-spelling was also wary of leaving one job with a long list of work while the other workers had already finished. I did a bit of testing and settled on a batch size of 8 as better than 1 and unlikely to leave a huge imbalance.

The GitHub Actions runtime has a fairly large default environment, which reduces the argument capacity available for running subtasks.
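To make the effect of the environment concrete, here is a rough Python sketch of how much space is left for arguments once the environment has been counted. The function names are hypothetical, and the arithmetic is only an approximation: real kernels also charge for pointers and other overhead, and xargs keeps its own headroom on top of that.

```python
import os


def remaining_arg_space(arg_max, env):
    """Estimate the bytes left for arguments after the environment is counted."""
    # Each variable is passed as "NAME=value\0", so it costs
    # len(name) + len(value) + 2 bytes (the "=" and the trailing NUL).
    env_bytes = sum(
        len(os.fsencode(name)) + len(os.fsencode(value)) + 2
        for name, value in env.items()
    )
    return arg_max - env_bytes


def remaining_arg_space_here():
    """Same estimate for the current process (POSIX systems only)."""
    return remaining_arg_space(os.sysconf("SC_ARG_MAX"), dict(os.environ))
```

The larger the environment, the smaller the result, which is why a runner with many predefined variables leaves less room for file names on each command line.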

With all this, the parallel task was instructed to run the actual core spell checker on files in batches of 8.

Each time the spell checker runs, it opens the dictionary (a text file) and reads its contents into a hash map (a standard Perl hash). With batches of 8, that load is repeated once for every 8 files checked.
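As an illustration of why the batch size matters, here is a minimal Python analogue of one spell checker run. It is not the actual Perl code: `load_dictionary`, `check_batch`, and the simple letters-only tokenizer are assumptions made for this sketch. The point is that the dictionary is built once per invocation, however many files the invocation is given.

```python
import re


def load_dictionary(lines):
    """Build the word set from dictionary lines (one word per line)."""
    return {line.strip() for line in lines if line.strip()}


def check_batch(dictionary_lines, documents):
    """Check a batch of documents (name -> text) against one loaded dictionary."""
    # This load is the fixed cost paid once per invocation.
    words = load_dictionary(dictionary_lines)
    unknown = {}
    for name, text in documents.items():
        # Hypothetical tokenizer: runs of ASCII letters and apostrophes.
        tokens = set(re.findall(r"[A-Za-z']+", text))
        misses = sorted(token for token in tokens if token not in words)
        if misses:
            unknown[name] = misses
    return unknown
```

Calling `check_batch` once with 80 documents loads the dictionary once; calling it ten times with 8 documents each loads it ten times for the same result.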

Changes

Instead of checking 8 files at a time, I'm splitting the work into approximately 4 rounds per thread.
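One way to read "approximately 4 rounds per thread" is sketched below in Python. The exact formula check-spelling uses is not given here, so `batch_size`, `make_batches`, and the rounding are assumptions for illustration only.

```python
import math


def batch_size(file_count, workers, rounds_per_worker=4):
    """Pick a batch size so each worker handles roughly `rounds_per_worker` batches."""
    target_batches = max(1, workers * rounds_per_worker)
    return max(1, math.ceil(file_count / target_batches))


def make_batches(files, size):
    """Split the file list into consecutive batches of at most `size` files."""
    return [files[i:i + size] for i in range(0, len(files), size)]
```

Under these assumptions, 1000 files on 2 workers are checked in batches of 125, so the dictionary is loaded 8 times in total. With fixed batches of 8, the same run would load it 125 times.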

Future work

I've been toying with sorting the check list to put the largest files first. Configurable file size limits mean that the meta analyzer potentially has access to this information before it starts assigning files to workers.

In theory this could reduce the likelihood that the last worker will be stuck working on a really large file.
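The idea can be tried out with a small Python simulation. This is a sketch under stated assumptions, not anything check-spelling does today: it assumes the checking time is proportional to the file size, that each free worker simply takes the next file in the list, and it ignores batching. The names `largest_first` and `finish_time` are hypothetical.

```python
import heapq


def largest_first(sizes):
    """Order file names by descending size (ties broken by name)."""
    return sorted(sizes, key=lambda name: (-sizes[name], name))


def finish_time(order, sizes, workers):
    """Simulate workers that each take the next file as soon as they are free."""
    loads = [0] * workers
    heapq.heapify(loads)
    for name in order:
        # The least-loaded worker is the next one to become free.
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + sizes[name])
    return max(loads)


sizes = {"a": 1, "b": 1, "c": 1, "d": 1, "big": 4}
unsorted_time = finish_time(["a", "b", "c", "d", "big"], sizes, workers=2)
sorted_time = finish_time(largest_first(sizes), sizes, workers=2)
```

In this toy case the unsorted order leaves one worker finishing the big file alone (a finish time of 6), while the largest-first order balances the two workers (a finish time of 4).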