If you’ve ever checked your computer’s resource usage while waiting for a script that just wouldn’t finish, you may have been dismayed to find that while one CPU core was chugging along at top speed, the others were sitting idle. If they just worked together, you may have thought, I’d be done by now.
Enter parallelization!
Parallelization is a programming paradigm in which a problem is split into small parts to be executed simultaneously – i.e., in parallel. This allows multiple processors, or even multiple computers, to contribute to solving the problem, cutting execution time dramatically (Adefemi). You’ve probably benefited from your browser performing tasks in the background to avoid slowing down a webpage, or from your CPU offloading graphics processing to the GPU to get hyper-realistic graphics at 60 FPS – both examples of parallel computing. However, applications of parallelization run the gamut from these personal uses to scientific research on supercomputers at my alma mater, or even to distributed networks of thousands or millions of volunteer computers worldwide, such as the network Folding@home uses to simulate protein folding and understand its role in disease.
Unfortunately, parallelization has some drawbacks. First, the mechanisms to transfer data between threads or processors and to synchronize results introduce overhead costs in time, memory, and program complexity. Second, not all programs can be parallelized. Code that relies on earlier results must wait until those results are calculated. Thus, highly sequential programs cannot be parallelized, or can be parallelized so minimally that the costs outweigh the gains. The problems that benefit most from parallel programming have many parts that can be run independently in no particular order (Adefemi, Factors Impacting Parallelization).
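To make that distinction concrete, here’s a toy illustration (the functions and data are hypothetical stand-ins for real work):

```js
// Hypothetical stand-ins for real work:
const step = (x) => x * 1.0001;
const processItem = (x) => x * x;
const items = [1, 2, 3, 4];

// Inherently sequential: iteration i needs the result of iteration i - 1,
// so extra cores can't help.
let value = 1;
for (let i = 0; i < 1000; i++) {
  value = step(value);
}

// Embarrassingly parallel: each item is independent of the others, so the
// work could be split across as many workers as we have.
const results = items.map(processItem);
```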
With these constraints in mind, let’s see how to use parallelization in JavaScript. JavaScript is single-threaded, but it has some features, such as web workers, that allow us to split processes off into separate threads. If we delegate parts of our workload to several of those web workers, they can handle our calculations in parallel and give us the results when they finish.
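For a sense of what that looks like at the lowest level, here’s a minimal sketch of the raw Web Worker API (worker.js is a hypothetical file name, and the computation is a stand-in):

```js
// main.js – create a worker and hand it a job
const worker = new Worker('worker.js');
worker.onmessage = (event) => console.log('result:', event.data);
worker.postMessage({ n: 8, col: 0 });

// worker.js – runs on its own thread, in its own global scope
self.onmessage = (event) => {
  const { n, col } = event.data;
  const result = n + col; // stand-in for the real computation
  self.postMessage(result); // send the answer back to the main thread
};
```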
Fortunately, we don’t have to create and manage those web workers ourselves. Instead, we can use parallel.js, a library which does exactly what we want. It creates web workers, sets up an environment for them, assigns jobs, and allows access to the processed data. We’ll use parallel.js to parallelize my solution to the n-queens problem to see how it works.
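Its basic usage looks something like this – a minimal sketch, with square() standing in for real work:

```js
// A toy parallel.js example: square each number on worker threads.
function square(x) {
  return x * x;
}

const p = new Parallel([1, 2, 3, 4]);
p.map(square).then(function () {
  console.log(p.data); // [1, 4, 9, 16] – the results replace the inputs in place
});
```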
First, we have to decide how to split up the problem into chunks that can run in parallel. Since the first queen has to go somewhere in the first row, I can refactor my code into a function that takes a column position for the first queen and solves the rest of the board from there. Since this is a class assignment, I won’t show the whole algorithm, but the function signature is useful to see:
```js
function countSolutionsFromStartingColumn(n, col) {
  // ...
}
```
To solve the whole board, I simply iterate over every possible first column position. (Actually, I can do better than that – I can exploit the left-right symmetry of the board to iterate over half of the columns and multiply by two. Odd-sized boards take a little more finesse to avoid double-counting the center column, but proceed similarly.)
```js
const solveQueensInSeries = function (n) {
  const halfwayPoint = Math.floor(n / 2) - 1;
  let solutionCount = 0;
  // count the solutions for the first half of the board
  for (let i = 0; i <= halfwayPoint; i++) {
    solutionCount += countSolutionsFromStartingColumn(n, i);
  }
  // multiply by 2 to count the mirrored second half of the board
  solutionCount *= 2;
  // if n is odd, count the solutions for the center column (this doesn't get multiplied by 2)
  if (n % 2 === 1) {
    solutionCount += countSolutionsFromStartingColumn(n, halfwayPoint + 1);
  }
  return solutionCount;
};
```
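As a quick sanity check (assuming countSolutionsFromStartingColumn is fully implemented), the serial solver should reproduce the well-known count for the standard board:

```js
console.log(solveQueensInSeries(8)); // 92 – the known solution count for 8 queens
```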
In the parallel case, I can divide these starting columns among my workers. Parallel.js has a handy .map() function that, like the native .map(), applies a callback function to each item of an array. We create the array of starting columns, and the library takes care of assigning jobs to workers. (Note that the results get written into the input array in place.)
```js
async function solveQueensInParallel(n, maxWorkers) {
  const halfwayPoint = Math.floor(n / 2) - 1;

  // build the list of starting columns: the first half of the board,
  // plus the center column if n is odd
  let startingColumns = [];
  for (let i = 0; i <= halfwayPoint; i++) {
    startingColumns.push(i);
  }
  if (n % 2 === 1) {
    startingColumns.push(halfwayPoint + 1);
  }

  // one job per starting column; `env` passes n into the workers' scope
  let p = new Parallel(startingColumns, {
    env: {
      n: n
    },
    evalPath: '/eval.js',
    maxWorkers: maxWorkers
  }).require(countSolutionsFromStartingColumn, "src/BitwiseBoardCopy.js");

  // workers read n from global.env, since they can't see the main thread's scope
  function curriedSolver(col) {
    return countSolutionsFromStartingColumn(global.env.n, col);
  }

  await p.map(curriedSolver);

  // p.data now holds each column's solution count in place of its input
  let solutionCount = 0;
  for (let i = 0; i <= halfwayPoint; i++) {
    solutionCount += p.data[i];
  }
  solutionCount *= 2;
  if (n % 2 === 1) {
    solutionCount += p.data[halfwayPoint + 1];
  }

  console.log(solutionCount);
  return solutionCount;
}
```
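Calling it looks something like this (the board size and worker count here are just for illustration):

```js
// Board size and worker count chosen just for illustration:
solveQueensInParallel(12, 4).then(function (count) {
  console.log(`Found ${count} solutions`);
});
```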
I did have some problems setting up the environment for the workers. Web workers operate in their own global scope (either shared among several workers or specific to one), not the window object of the main script, so I had to pass in functions and external scripts using .require() and variables with the env object in the constructor options. I actually settled on a less efficient solution that had fewer dependencies, so I didn’t have to figure out how to give the workers access to other libraries.
So, after all of this restructuring, did the parallel solver do any better? I timed my solvers by simply subtracting the start time from the end time using Date.now(). I don’t think this is the most precise benchmarking method, but it’s good enough for a single, long-running function like these. My results were:
| Run | Time in seconds | Approx. time in minutes |
|---|---|---|
| No parallelization | 1164.807 | 19.4 |
| 2 cores | 748.732 | 12.5 |
| 4 cores | 588.162 | 9.8 |
| 8 cores | 444.225 | 7.4 |
(These are each recorded from one run. It would be better to do several runs and take the average, but given that even the fastest took several minutes to complete, I opted not to.)
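For concreteness, the timing approach amounted to something like this sketch (timeRun is a helper name I’m introducing here, not part of my original code):

```js
// Rough timing harness built on Date.now(); not precise, but fine for
// functions that run for minutes.
async function timeRun(n, maxWorkers) {
  const start = Date.now();
  await solveQueensInParallel(n, maxWorkers);
  console.log(`${maxWorkers} workers: ${(Date.now() - start) / 1000} seconds`);
}
```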
So the parallel version is faster, but it’s far from the “divide the runtime by the number of workers” rule that I expected. I thought perhaps the columns took different amounts of time to run, so the work wasn’t really being evenly distributed across the workers. Timing each starting column individually (without the parallel wrapper) gave me:
| Column | Time in seconds |
|---|---|
| 0 | 111.319 |
| 1 | 128.719 |
| 2 | 140.936 |
| 3 | 147.07 |
| 4 | 153.836 |
| 5 | 157.853 |
| 6 | 161.02 |
| 7 | 164.17 |
| Sum | 1164.923 |
| Sum of worst 2 | 325.19 |
| Sum of worst 4 | 636.879 |
So there is some variance in the runtimes, but not enough to explain the discrepancy. Even a worst-case split – the four slowest columns all landing on one of two workers – would finish in about 637 seconds, yet the two-worker run took 749. The 8-worker case is particularly surprising: each worker should handle exactly one column, so I’d expect the total runtime to be approximately the same as the slowest column. However, the actual runtime is more than twice the length of column 7. I don’t understand what’s causing so much delay. Is that simply the expected overhead of parallelization? Perhaps my CPU gets less efficient the more cores are in use, or perhaps the web worker setup is more expensive than I expected.
Despite some lingering questions, I have dramatically improved my runtime for this problem, and I’m confident I will find more use cases for parallel programming in the future. If you think you might too, I absolutely recommend the article What Every Computer Scientist Needs to Know About Parallelization by Temitayo Adefemi, which I referenced several times above. It goes into much greater detail, especially on implementation and on the types of problems that benefit from parallelization, and it was a very thorough introduction to the topic.