Chapter 10: Neural Networks
The human brain has 100 billion neurons,
each neuron connected to 10 thousand
other neurons. Sitting on your shoulders
is the most complicated object
in the known universe.
—Michio Kaku
Khipu on display at the Machu Picchu Museum, Cusco, Peru (photo by Pi3.124)
The khipu (or quipu) is an ancient Incan device used for recordkeeping and communication. It comprised a complex system of knotted cords to encode and transmit information. Each colored string and knot type and pattern represented specific data, such as census records or calendrical information. Interpreters, known as quipucamayocs, acted as a kind of accountant and decoded the stringed narrative into understandable information.
I began with inanimate objects living in a world of forces, and I gave them desires, autonomy, and the ability to take action according to a system of rules. Next, I allowed those objects, now called creatures, to live in a population and evolve over time. Now I’d like to ask, What is each creature’s decision-making process? How can it adjust its choices by learning over time? Can a computational entity process its environment and generate a decision?
To answer these questions, I’ll once again look to nature for inspiration—specifically, the human brain. A brain can be described as a biological neural network, an interconnected web of neurons transmitting elaborate patterns of electrical signals. Within each neuron, dendrites receive input signals, and based on those inputs, the neuron fires an output signal via an axon (see Figure 10.1). Or something like that. How the human brain actually works is an elaborate and complex mystery, one that I’m certainly not going to attempt to unravel in rigorous detail in this chapter.
Figure 10.1: A neuron with dendrites and an axon connected to another neuron
Fortunately, as you’ve seen throughout this book, developing engaging animated systems with code doesn’t require scientific rigor or accuracy. Designing a smart rocket isn’t rocket science, and neither is designing an artificial neural network brain science. It’s enough to simply be inspired by the idea of brain function.
In this chapter, I’ll begin with a conceptual overview of the properties and features of neural networks and build the simplest possible example of one, a network that consists of a single neuron. I’ll then introduce you to more complex neural networks by using the ml5.js library. This will serve as a foundation for Chapter 11, the grand finale of this book, where I’ll combine GAs with neural networks for physics simulation.
Introducing Artificial Neural Networks
Computer scientists have long been inspired by the human brain. In 1943, Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, developed the first conceptual model of an artificial neural network. In their paper “A Logical Calculus of the Ideas Immanent in Nervous Activity,” they describe a **neuron** as a single computational cell living in a network of cells that receives inputs, processes those inputs, and generates an output.
Their work, and the work of many scientists and researchers who followed, wasn’t meant to accurately describe how the biological brain works. Rather, an artificial neural network (hereafter referred to as just a neural network) was intended as a computational model based on the brain, designed to solve certain kinds of problems that were traditionally difficult for computers.
Some problems are incredibly simple for a computer to solve but difficult for humans like you and me. Finding the square root of 964,324 is an example. A quick line of code produces the value 982, a number my computer can compute in less than a millisecond, but if you asked me to calculate that number myself, you’d be in for quite a wait. On the other hand, certain problems are incredibly simple for you or me to solve, but not so easy for a computer. Show any toddler a picture of a kitten or puppy, and they’ll quickly be able to tell you which one is which. Listen to a conversation in a noisy café and focus on just one person’s voice, and you can effortlessly comprehend their words. But need a machine to perform one of these tasks? Scientists have spent entire careers researching and implementing complex solutions, and neural networks are one of them.
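The “quick line of code” mentioned above might look like the following in plain JavaScript, using the built-in `Math.sqrt()`. The variable name `root` is just for illustration.

```javascript
// Square root of 964,324 using JavaScript's built-in Math.sqrt()
let root = Math.sqrt(964324);
console.log(root);
```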
Here are some of the easy-for-a-human, difficult-for-a-machine applications of neural networks in software today:
- Pattern recognition: Neural networks are well suited to problems when the aim is to detect, interpret, and classify features or patterns within a dataset. This includes everything from identifying objects (like faces) in images, to optical character recognition, to more complex tasks like gesture recognition.
- Time-series prediction and anomaly detection: Neural networks are utilized both in forecasting, such as predicting stock market trends or weather patterns, and in recognizing anomalies, which can be applied to areas like cyberattack detection and fraud prevention.
- Control and adaptive decision-making systems: These applications range from autonomous vehicles like self-driving cars and drones to adaptive decision-making used in game playing, pricing models, and recommendation systems on media platforms.
- Signal processing and soft sensors: Neural networks play a crucial role in devices like cochlear implants and hearing aids by filtering noise and amplifying essential sounds. They’re also involved in soft sensors, software systems that process data from multiple sources to give a comprehensive analysis of the environment.
- Natural language processing (NLP): One of the biggest developments in recent years has been the use of neural networks for processing and understanding human language. They’re used in various tasks including machine translation, sentiment analysis, and text summarization, and are the underlying technology behind many digital assistants and chatbots.
- Generative models: The rise of novel neural network architectures has made it possible to generate new content. These systems can synthesize images, enhance image resolution, transfer style between images, and even generate music and video.
Covering the full gamut of applications for neural networks would merit an entire book (or series of books), and by the time that book was printed, it would probably be out of date. Hopefully, this list gives you an overall sense of the features and possibilities.
How Neural Networks Work
In some ways, neural networks are quite different from other computer programs. The computational systems I’ve been writing so far in this book are procedural: a program starts at the first line of code, executes it, and goes on to the next, following instructions in a linear fashion. By contrast, a true neural network doesn’t follow a linear path. Instead, information is processed collectively, in parallel, throughout a network of nodes, with each node representing a neuron. In this sense, a neural network is considered a **connectionist** system.
In other ways, neural networks aren’t so different from some of the programs you’ve seen. A neural network exhibits all the hallmarks of a complex system, much like a cellular automaton or a flock of boids. Remember how each individual boid was simple to understand, yet by following only three rules—separation, alignment, cohesion—it contributed to complex behaviors? Each individual element in a neural network is equally simple to understand. It reads an input (a number), processes it, and generates an output (another number). That’s all there is to it, and yet a network of many neurons can exhibit incredibly rich and intelligent behaviors, echoing the complex dynamics seen in a flock of boids.
In fact, a neural network isn’t just a complex system, but a complex adaptive system, meaning it can change its internal structure based on the information flowing through it. In other words, it has the ability to learn. Typically, this is achieved by adjusting weights. In Figure 10.2, each arrow represents a connection between two neurons and indicates the pathway for the flow of information. Each connection has a weight, a number that controls the signal between the two neurons. If the network generates a good output (which I’ll define later), there’s no need to adjust the weights. However, if the network generates a poor output—an error, so to speak—then the system adapts, altering the weights with the hope of improving subsequent results.
Neural networks may use a variety of strategies for learning, and I’ll focus on one of them in this chapter:
- Supervised learning: Essentially, this strategy involves a teacher that’s smarter than the network itself. Take the case of facial recognition. The teacher shows the network a bunch of faces, and the teacher already knows the name associated with each face. The network makes its guesses; then the teacher provides the network with the actual names. The network can compare its answers to the known correct ones and make adjustments according to its errors. The neural networks in this chapter follow this model.
- Unsupervised learning: This technique is required when you don’t have an example dataset with known answers. Instead, the network works on its own to uncover hidden patterns in the data. An application of this is clustering: a set of elements is divided into groups according to an unknown pattern. I won’t be showing any instances of unsupervised learning, as the strategy is less relevant to the book’s examples.
- Reinforcement learning: This strategy is built on observation: a learning agent makes decisions and looks to its environment for the results. It’s rewarded for good decisions and penalized for bad decisions, such that it learns to make better decisions over time. I’ll discuss this strategy in more detail in Chapter 11.
The ability of a neural network to learn, to make adjustments to its structure over time, is what makes it so useful in the field of machine learning. This term can be traced back to the 1959 paper “Some Studies in Machine Learning Using the Game of Checkers,” in which computer scientist Arthur Lee Samuel outlines a “self-learning” program for playing checkers. The concept of an algorithm enabling a computer to learn without explicit programming is the foundation of machine learning.
Think about what you’ve been doing throughout this book: coding! In traditional programming, a computer program takes inputs and, based on the rules you’ve provided, produces outputs. Machine learning, however, turns this approach upside down. Instead of you writing the rules, the system is given example inputs and outputs, and generates the rules itself! Many algorithms can be used to implement machine learning, and a neural network is just one of them.
Machine learning is part of the broad, sweeping field of artificial intelligence (AI), although the terms are sometimes used interchangeably. In their thoughtful and friendly primer A People’s Guide to AI, Mimi Onuoha and Diana Nucera (aka Mother Cyborg) define AI as “the theory and development of computer systems able to perform tasks that normally require human intelligence.” Machine learning algorithms are one approach to these tasks, but not all AI systems feature a self-learning component.
Machine Learning Libraries
Today, leveraging machine learning in creative coding and interactive media isn’t only feasible but increasingly common, thanks to third-party libraries that handle a lot of the neural network implementation details under the hood. While the vast majority of machine learning development and research is done in Python, the world of web development has seen the emergence of powerful JavaScript-based tools. Two libraries of note are TensorFlow.js and ml5.js.
TensorFlow.js is an open source library that lets you define, train, and run neural networks directly in the browser using JavaScript, without the need to install or configure complex environments. It’s part of the TensorFlow ecosystem, which is maintained and developed by Google. TensorFlow.js is a powerful tool, but its low-level operations and highly technical API can be intimidating to beginners. Enter ml5.js, a library built on top of TensorFlow.js and designed specifically for use with p5.js. Its goal is to be beginner friendly and make machine learning approachable for a broad audience of artists, creative coders, and students. I’ll demonstrate how to use ml5.js in “Machine Learning with ml5.js.”
A benefit of libraries like TensorFlow.js and ml5.js is that you can use them to run pretrained models. A machine learning model is a specific setup of neurons and connections, and a pretrained model is one that has already been prepared for a particular task. For example, popular pretrained models are used for classifying images, identifying body poses, recognizing facial landmarks or hand positions, and even analyzing the sentiment expressed in a text. You can use such a model as is or treat it as a starting point for additional learning (commonly referred to as transfer learning).
Before I get to exploring the ml5.js library, however, I’d like to try my hand at building the simplest of all neural networks from scratch, using only p5.js, to illustrate how the concepts of neural networks and machine learning are implemented in code.
The Perceptron
A perceptron is the simplest neural network possible: a computational model of a single neuron. Invented in 1957 by Frank Rosenblatt at the Cornell Aeronautical Laboratory, a perceptron consists of one or more inputs, a processor, and a single output, as shown in Figure 10.3.
Figure 10.3: A simple perceptron with two inputs and one output
A perceptron follows the feed-forward model: data passes (feeds) through the network in one direction. The inputs are sent into the neuron, are processed, and result in an output. This means the one-neuron network diagrammed in Figure 10.3 reads from left to right (forward): inputs come in, and output goes out.
Say I have a perceptron with two inputs, the values 12 and 4. In machine learning, it’s customary to denote each input with an $x$, so I’ll call these inputs $x_0$ and $x_1$:
| Input | Value |
|---|---|
| $x_0$ | 12 |
| $x_1$ | 4 |
Perceptron Steps
To get from these inputs to an output, the perceptron follows a series of steps.
Step 1: Weight the Inputs
Each input sent into the neuron must first be weighted, meaning it’s multiplied by a value, often a number from –1 to +1. When creating a perceptron, the inputs are typically assigned random weights. I’ll call my weights $w_0$ and $w_1$:
| Weight | Value |
|---|---|
| $w_0$ | 0.5 |
| $w_1$ | –1 |
Each input needs to be multiplied by its corresponding weight:
| Input | Weight | Input × Weight |
|---|---|---|
| 12 | 0.5 | 6 |
| 4 | –1 | –4 |
Step 2: Sum the Inputs
The weighted inputs are then added together:
$$6 + (-4) = 2$$
Step 3: Generate the Output
The output of a perceptron is produced by passing the sum through an activation function that reduces the output to one of two possible values. Think of this binary output as an LED that’s only off or on, or as a neuron in an actual brain that either fires or doesn’t fire. The activation function determines whether the perceptron should “fire.”
Activation functions can get a little bit hairy. If you start reading about them in an AI textbook, you may soon find yourself reaching in turn for a calculus textbook. However, your new friend the simple perceptron provides an easier option that still demonstrates the concept. I’ll make the activation function the sign of the sum. If the sum is a positive number, the output is 1; if it’s negative, the output is –1:
$$\text{sign}(2) = +1$$
Putting It All Together
Putting the preceding three parts together, here are the steps of the perceptron algorithm:
1. For every input, multiply that input by its weight.
2. Sum all the weighted inputs.
3. Compute the output of the perceptron by passing that sum through an activation function (the sign of the sum).
I can start writing this algorithm in code by using two arrays of values, one for the inputs and one for the weights:
let inputs = [12, 4];
let weights = [0.5, -1];
The “for every input” in step 1 implies a loop that multiplies each input by its corresponding weight. To obtain the sum, the results can be added up in that same loop:
let sum = 0;
for (let i = 0; i < inputs.length; i++) {
  sum += inputs[i] * weights[i];
}
With the sum, I can then compute the output:
let output = activate(sum);
function activate(sum) {
  // Return 1 if the sum is positive, -1 otherwise.
  if (sum > 0) {
    return 1;
  } else {
    return -1;
  }
}
You might be wondering how I’m handling the value of 0 in the activation function. Is 0 positive or negative? The deep philosophical implications of this question aside, I’m choosing here to arbitrarily return a –1 for 0, but I could easily change the > to >= to go the other way. Depending on the application, this decision could be significant, but for demonstration purposes here, I can just pick one.
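To make the zero case concrete, here’s a minimal sketch comparing the two choices. The names `activateStrict()` and `activateInclusive()` are hypothetical, introduced only for this comparison.

```javascript
// Variant used in this chapter: a sum of exactly 0 maps to -1.
function activateStrict(sum) {
  return sum > 0 ? 1 : -1;
}

// Alternative: changing > to >= makes a sum of exactly 0 map to +1.
function activateInclusive(sum) {
  return sum >= 0 ? 1 : -1;
}

// The two variants disagree only when the sum is exactly 0.
for (let sum of [-2, 0, 2]) {
  console.log(sum, activateStrict(sum), activateInclusive(sum));
}
```

For any nonzero sum, the two functions return the same value, so the choice matters only for inputs that land exactly on the boundary.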
Now that I’ve explained the computational process of a perceptron, let’s look at an example of one in action.
Simple Pattern Recognition Using a Perceptron
I’ve mentioned that neural networks are commonly used for pattern recognition. The scenarios outlined earlier require more complex networks, but even a simple perceptron can demonstrate a fundamental type of pattern recognition in which data points are classified as belonging to one of two groups. For instance, imagine you have a dataset of plants and want to identify them as either xerophytes (plants that have evolved to survive in an environment with little water and lots of sunlight, like the desert) or hydrophytes (plants that have adapted to living submerged in water, with reduced light). That’s how I’ll use my perceptron in this section.
One way to approach classifying the plants is to plot their data on a 2D graph and treat the problem as a spatial one. On the x-axis, plot the amount of daily sunlight received by the plant, and on the y-axis, plot the amount of water. Once all the data has been plotted, it’s easy to draw a line across the graph, with all the xerophytes on one side and all the hydrophytes on the other, as in Figure 10.4. (I’m simplifying a little here. Real-world data would probably be messier, making the line harder to draw.) That’s how each plant can be classified. Is it below the line? Then it’s a xerophyte. Is it above the line? Then it’s a hydrophyte.
Figure 10.4: A collection of points in 2D space divided by a line, representing plant categories according to their water and sunlight intake
In truth, I don’t need a neural network—not even a simple perceptron—to tell me whether a point is above or below a line. I can see the answer for myself with my own eyes, or have my computer figure it out with simple algebra. But just like solving a problem with a known answer—“to be or not to be”—was a convenient first test for the GA in Chapter 9, training a perceptron to categorize points as being on one side of a line versus the other will be a valuable way to demonstrate the algorithm of the perceptron and verify that it’s working properly.
To solve this problem, I’ll give my perceptron two inputs: $x_0$ is the x-coordinate of a point, representing a plant’s amount of sunlight, and $x_1$ is the y-coordinate of that point, representing the plant’s amount of water. The perceptron then guesses the plant’s classification according to the sign of the weighted sum of these inputs. If the sum is positive, the perceptron outputs a +1, signifying a hydrophyte (above the line). If the sum is negative, it outputs a –1, signifying a xerophyte (below the line). Figure 10.5 shows this perceptron (note the shorthand of $w_0$ and $w_1$ for the weights).
Figure 10.5: A perceptron with two inputs ($x_0$ and $x_1$), a weight for each input ($w_0$ and $w_1$), and a processing neuron that generates the output
This scheme has a pretty significant problem, however. What if my data point is (0, 0), and I send this point into the perceptron as inputs $x_0 = 0$ and $x_1 = 0$? No matter what the weights are, multiplication by 0 is 0. The weighted inputs are therefore still 0, and their sum will be 0 too. And the sign of 0 is . . . hmmm, there’s that deep philosophical quandary again. Regardless of how I feel about it, the point (0, 0) could certainly be above or below various lines in a 2D world. How is the perceptron supposed to interpret it accurately?
To avoid this dilemma, the perceptron requires a third input, typically referred to as a bias input. This extra input always has the value of 1 and is also weighted. Figure 10.6 shows the perceptron with the addition of the bias.
Figure 10.6: Adding a bias input, along with its weight, to the perceptron
How does this affect point (0, 0)?
| Input | Weight | Result |
|---|---|---|
| 0 | $w_0$ | 0 |
| 0 | $w_1$ | 0 |
| 1 | $w_\text{bias}$ | $w_\text{bias}$ |
The sum of the weighted results is then $0 + 0 + w_\text{bias}$, and the output is the sign of that sum. Therefore, the bias by itself answers the question of where (0, 0) is in relation to the line. If the bias’s weight is positive, (0, 0) is above the line; if negative, it’s below. The extra input and its weight bias the perceptron’s understanding of the line’s position relative to (0, 0)!
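Here’s a small sketch of that idea in code. The `weightedSum()` helper and the specific weight values are assumptions chosen for illustration: whatever the first two weights are, the sum for the point (0, 0) comes out equal to the bias weight alone.

```javascript
// Hypothetical helper: multiply each input by its weight and add the results.
function weightedSum(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum;
}

// The point (0, 0) plus the bias input, which is always 1
let origin = [0, 0, 1];

// Two made-up sets of weights that differ only in the sign of the bias weight
let positiveBias = [0.5, -1, 0.3];
let negativeBias = [0.5, -1, -0.3];

console.log(weightedSum(origin, positiveBias)); // 0.3, so (0, 0) counts as above the line
console.log(weightedSum(origin, negativeBias)); // -0.3, so (0, 0) counts as below the line
```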
The Perceptron Code
I’m now ready to assemble the code for a Perceptron class. The perceptron needs to track only the input weights, which I can store using an array:
class Perceptron {
  constructor() {
    this.weights = [];
  }
The constructor can receive an argument indicating the number of inputs (in this case, three: $x_0$, $x_1$, and a bias) and size the weights array accordingly, filling it with random values to start:
  constructor(n) {
    this.weights = [];
    for (let i = 0; i < n; i++) {
      this.weights[i] = random(-1, 1);
    }
  }
A perceptron’s job is to receive inputs and produce an output. These requirements can be packaged together in a feedforward() method. In this example, the perceptron’s inputs are an array (which should be the same length as the array of weights), and the output is a number, +1 or –1, as returned by the activation function based on the sign of the sum:
  feedforward(inputs) {
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    return this.activate(sum);
  }
}
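Since the perceptron assumes the inputs array matches the weights array in length, it may help to see what a guard against a mismatch could look like. This is a standalone sketch rather than part of the class: the `feedforwardChecked()` function and its error message are hypothetical, and it takes the weights as an argument so that it can run outside p5.js.

```javascript
// Hypothetical standalone version of the feed-forward step with a length check
function feedforwardChecked(inputs, weights) {
  if (inputs.length !== weights.length) {
    throw new Error(
      `Expected ${weights.length} inputs but received ${inputs.length}`
    );
  }
  let sum = 0;
  for (let i = 0; i < weights.length; i++) {
    sum += inputs[i] * weights[i];
  }
  // Sign activation, with 0 treated as -1 as in the chapter
  return sum > 0 ? 1 : -1;
}

// The worked example from earlier: inputs 12 and 4, weights 0.5 and -1
console.log(feedforwardChecked([12, 4], [0.5, -1])); // the sum is 2, so the output is 1
```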
Presumably, I could now create a Perceptron object and ask it to make a guess for any given point, as in Figure 10.7.
Figure 10.7: An (x, y) coordinate from the 2D space is the input to the perceptron.
Here’s the code to generate a guess:
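// Create the perceptron with three inputs: x, y, and the bias.
let perceptron = new Perceptron(3);
// The input is a point (50, -12) plus the bias input of 1.
let inputs = [50, -12, 1];
let guess = perceptron.feedforward(inputs);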
Did the perceptron get it right? Maybe yes, maybe no. At this point, the perceptron has no better than a 50/50 chance of arriving at the correct answer, since each weight starts out as a random value. A neural network isn’t a magic tool that can automatically guess correctly on its own. I need to teach it how to do so!
To train a neural network to answer correctly, I’ll use the supervised learning method I described earlier in the chapter. Remember, this technique involves giving the network inputs with known answers. This enables the network to check whether it has made a correct guess. If not, the network can learn from its mistake and adjust its weights. The process is as follows:
1. Provide the perceptron with inputs for which there is a known answer.
2. Ask the perceptron to guess an answer.
3. Compute the error. (Did it get the answer right or wrong?)
4. Adjust all the weights according to the error.
5. Return to step 1 and repeat!
This process can be packaged into a method on the Perceptron class, but before I can write it, I need to examine steps 3 and 4 in more detail. How do I define the perceptron’s error? And how should I adjust the weights according to this error?
The perceptron’s error can be defined as the difference between the desired answer and its guess:
$$\text{error} = \text{desired output} - \text{guess output}$$
Does this formula look familiar? Think back to the formula for a vehicle’s steering force that I worked out in Chapter 5:
$$\text{steering} = \text{desired velocity} - \text{current velocity}$$
This is also a calculation of an error! The current velocity serves as a guess, and the error (the steering force) indicates how to adjust the velocity in the correct direction. Adjusting a vehicle’s velocity to follow a target is similar to adjusting the weights of a neural network toward the correct answer.
For the perceptron, the output has only two possible values: +1 or –1. Therefore, only three errors are possible. If the perceptron guesses the correct answer, the guess equals the desired output and the error is 0. If the correct answer is –1 and the perceptron guessed +1, then the error is –2. If the correct answer is +1 and the perceptron guessed –1, then the error is +2. Here’s that process summarized in a table:
| Desired | Guess | Error |
|---|---|---|
| –1 | –1 | 0 |
| –1 | +1 | –2 |
| +1 | –1 | +2 |
| +1 | +1 | 0 |
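The same table can be produced with a couple of loops, a quick way to confirm that the only possible error values are –2, 0, and +2. The `errors` array is just a convenience for collecting the results.

```javascript
// Enumerate every combination of desired output and guess.
let errors = [];
for (let desired of [-1, 1]) {
  for (let guess of [-1, 1]) {
    let error = desired - guess;
    errors.push(error);
    console.log(`desired: ${desired}, guess: ${guess}, error: ${error}`);
  }
}
```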
The error is the determining factor in how the perceptron’s weights should be adjusted. For any given weight, what I’m looking to calculate is the change in weight, often called $\Delta\text{weight}$ (or delta weight, $\Delta$ being the Greek letter delta):
$$\text{new weight} = \text{weight} + \Delta\text{weight}$$
To calculate $\Delta\text{weight}$, I need to multiply the error by the input:
$$\Delta\text{weight} = \text{error} \times \text{input}$$
Therefore, the new weight is calculated as follows:
$$\text{new weight} = \text{weight} + \text{error} \times \text{input}$$
To understand why this works, think again about steering. A steering force is essentially an error in velocity. By applying a steering force as an acceleration (or $\Delta\text{velocity}$), the velocity is adjusted to move in the correct direction. This is what I want to do with the neural network’s weights. I want to adjust them in the right direction, as defined by the error.
With steering, however, I had an additional variable that controlled the vehicle’s ability to steer: the maximum force. A high maximum force allowed the vehicle to accelerate and turn quickly, while a lower force resulted in a slower velocity adjustment. The neural network will use a similar strategy with a variable called the learning constant:
$$\text{new weight} = \text{weight} + (\text{error} \times \text{input}) \times \text{learning constant}$$
A high learning constant causes the weight to change more drastically. This may help the perceptron arrive at a solution more quickly, but it also increases the risk of overshooting the optimal weights. A small learning constant will adjust the weights more slowly and require more training time, but will allow the network to make small adjustments that could improve overall accuracy.
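To get a feel for the effect, here’s a sketch of a single weight update computed with two learning constants. The starting weight, input, and error are made-up values, and `updateWeight()` is a hypothetical helper that simply applies the formula above.

```javascript
// Hypothetical helper: new weight = weight + (error * input) * learning constant
function updateWeight(weight, error, input, learningConstant) {
  return weight + error * input * learningConstant;
}

let weight = 0.5; // made-up starting weight
let input = 12; // made-up input value
let error = -2; // the desired output was -1, but the guess was +1

// A larger learning constant moves the weight much further in one step.
let bigStep = updateWeight(weight, error, input, 0.1); // roughly -1.9
let smallStep = updateWeight(weight, error, input, 0.001); // roughly 0.476

console.log(bigStep, smallStep);
```

With the larger constant, one update swings the weight from 0.5 to about –1.9, while the smaller constant nudges it only slightly, to about 0.476.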
Assuming the addition of a learningConstant property to the Perceptron class, I can now write a training method for the perceptron following the steps I outlined earlier:
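train(inputs, desired) {
  // Step 2: Guess according to the inputs.
  let guess = this.feedforward(inputs);
  // Step 3: Compute the error (the difference between desired and guess).
  let error = desired - guess;
  // Step 4: Adjust all the weights according to the error and learning constant.
  for (let i = 0; i < this.weights.length; i++) {
    this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
  }
}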
Here’s the Perceptron class as a whole:
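class Perceptron {
  // The learning constant defaults to 0.01 if the caller doesn't supply one.
  constructor(totalInputs, learningConstant = 0.01) {
    this.weights = [];
    this.learningConstant = learningConstant;
    // The weights start as random values.
    for (let i = 0; i < totalInputs; i++) {
      this.weights[i] = random(-1, 1);
    }
  }
  // Return the perceptron's guess (+1 or -1) for the given inputs.
  feedforward(inputs) {
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    return this.activate(sum);
  }
  // The activation function is the sign of the sum (0 counts as -1).
  activate(sum) {
    if (sum > 0) {
      return 1;
    } else {
      return -1;
    }
  }
  // Adjust the weights according to the error for one training example.
  train(inputs, desired) {
    let guess = this.feedforward(inputs);
    let error = desired - guess;
    for (let i = 0; i < this.weights.length; i++) {
      this.weights[i] = this.weights[i] + error * inputs[i] * this.learningConstant;
    }
  }
}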
To train the perceptron, I need a set of inputs with known answers. However, I don’t happen to have a real-world dataset (or time to research and collect one) for the xerophytes and hydrophytes scenario. In truth, though, the purpose of this demonstration isn’t to show you how to classify plants. It’s about how a perceptron can learn whether points are above or below a line on a graph, and so any set of points will do. In other words, I can just make up the data.
What I’m describing is an example of synthetic data, artificially generated data that’s often used in machine learning to create controlled scenarios for training and testing. In this case, my synthetic data will consist of a set of random input points, each with a known answer indicating whether the point is above or below a line. To define the line and generate the data, I’ll use simple algebra. This approach allows me to clearly demonstrate the training process and show how the perceptron learns.
The question therefore becomes, how do I pick a point and know whether it’s above or below a line (without a neural network, that is)? A line can be described as a collection of points, where each point’s y-coordinate is a function of its x-coordinate:
$$y = f(x)$$
For a straight line (specifically, a linear function), the relationship can be written like this:
$$y = mx + b$$
Here m is the slope of the line, and b is the value of y when x is 0 (the y-intercept). Here’s a specific example, with the corresponding graph in Figure 10.8.
$$y = \frac{1}{2}x - 1$$
Figure 10.8: A graph of $y = \frac{1}{2}x - 1$
I’ll arbitrarily choose that as the equation for my line, and write a function accordingly:
// A function to calculate y based on x along a line
function f(x) {
  return 0.5 * x - 1;
}
Now there’s the matter of the p5.js canvas defaulting to (0, 0) in the top-left corner with the y-axis pointing down. For this discussion, I’ll assume I’ve built the following into the code to reorient the canvas to match a more traditional Cartesian space.
translate(width / 2, height / 2);
scale(1, -1);
I can now pick a random point in the 2D space:
let x = random(-100, 100);
let y = random(-100, 100);
How do I know if this point is above or below the line? The line function f(x) returns the y value on the line for that x-position. I’ll call that $y_\text{line}$:
let yline = f(x);
If the y value I’m examining is above the line, it will be greater than $y_\text{line}$, as in Figure 10.9.
Figure 10.9: If $y_\text{line}$ is less than y, the point is above the line.
Here’s the code for that logic:
// Start with a value of -1.
let desired = -1;
if (y > yline) {
  // The answer becomes +1 if y is above the line.
  desired = 1;
}
I can then make an input array to go with the desired output:
let trainingInputs = [x, y, 1];
Assuming that I have a perceptron variable, I can train it by providing the inputs along with the desired answer:
perceptron.train(trainingInputs, desired);
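The labeling steps above can be gathered into one function. This is a sketch under my own naming: `makeTrainingExample()` is a hypothetical helper, and it assumes the line function f(x) defined earlier.

```javascript
// The line from the text: y = 0.5x - 1
function f(x) {
  return 0.5 * x - 1;
}

// Hypothetical helper: bundle a point's inputs (with the bias) and its known answer.
function makeTrainingExample(x, y) {
  let desired = y > f(x) ? 1 : -1;
  return { inputs: [x, y, 1], desired: desired };
}

console.log(makeTrainingExample(0, 0).desired); // f(0) is -1, and 0 > -1, so +1
console.log(makeTrainingExample(10, 0).desired); // f(10) is 4, and 0 < 4, so -1
```

A point that sits exactly on the line, such as (2, 0), is labeled –1 here because the comparison uses > rather than >=.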
If I train the perceptron on a new random point (and its answer) for each cycle through draw(), it will gradually get better at classifying the points as above or below the line.
let perceptron;
let training = [];
let count = 0;
function f(x) {
return 0.5 * x + 1;
}
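function setup() {
  createCanvas(640, 240);
  // The perceptron has three inputs (including the bias) and a learning constant of 0.0001.
  perceptron = new Perceptron(3, 0.0001);
  // Make 2,000 training data points.
  for (let i = 0; i < 2000; i++) {
    let x = random(-width / 2, width / 2);
    let y = random(-height / 2, height / 2);
    training[i] = [x, y, 1];
  }
}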
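function draw() {
  background(255);
  // Reorient the canvas to match a traditional Cartesian plane.
  translate(width / 2, height / 2);
  scale(1, -1);

  // Draw the target line.
  stroke(0);
  strokeWeight(2);
  line(-width / 2, f(-width / 2), width / 2, f(width / 2));

  // Get the current (x, y) of the training data.
  let x = training[count][0];
  let y = training[count][1];
  // What is the desired output for this point?
  let desired = -1;
  if (y > f(x)) {
    desired = 1;
  }
  // Train the perceptron on one point per frame.
  perceptron.train(training[count], desired);
  // Move on to the next point, looping back to the start at the end.
  count = (count + 1) % training.length;

  // Draw all the points, colored by the perceptron's current guess.
  for (let dataPoint of training) {
    let guess = perceptron.feedforward(dataPoint);
    if (guess > 0) {
      fill(127);
    } else {
      fill(255);
    }
    strokeWeight(1);
    stroke(0);
    circle(dataPoint[0], dataPoint[1], 8);
  }
}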
In Example 10.1, the training data is visualized alongside the target solution line. Each point represents a piece of training data, and its color is determined by the perceptron’s current classification—gray for +1 or white for –1. I use a small learning constant (0.0001) to slow down how the system refines its classifications over time.
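If you'd like to watch the numbers without a canvas, here's a rough sketch of the same training loop in plain JavaScript. The `SketchPerceptron` class is my stand-in for the chapter's Perceptron, written only to match the `train()` and `feedforward()` calls used above, so treat its internals as an assumption. The helper names (`targetLine`, `randomBetween`, `runTraining`), the 500 points, and the 100 passes through the data are likewise arbitrary choices for illustration.

```javascript
// Stand-in for the chapter's Perceptron class (an assumption, not the book's code).
class SketchPerceptron {
  constructor(totalInputs, learningConstant) {
    this.learningConstant = learningConstant;
    // Start with random weights between -1 and 1.
    this.weights = [];
    for (let i = 0; i < totalInputs; i++) {
      this.weights.push(Math.random() * 2 - 1);
    }
  }

  // Weighted sum of the inputs, passed through the sign activation.
  feedforward(inputs) {
    let sum = 0;
    for (let i = 0; i < this.weights.length; i++) {
      sum += inputs[i] * this.weights[i];
    }
    return sum > 0 ? 1 : -1;
  }

  // Nudge each weight in proportion to the error and its input.
  train(inputs, desired) {
    const error = desired - this.feedforward(inputs);
    for (let i = 0; i < this.weights.length; i++) {
      this.weights[i] += error * inputs[i] * this.learningConstant;
    }
  }
}

// The line from earlier in the section: y = 0.5x - 1
function targetLine(x) {
  return 0.5 * x - 1;
}

function randomBetween(min, max) {
  return min + Math.random() * (max - min);
}

// Train on random labeled points, then report the fraction classified correctly.
function runTraining(totalPoints, passes) {
  const width = 640;
  const height = 240;
  const training = [];
  for (let i = 0; i < totalPoints; i++) {
    const x = randomBetween(-width / 2, width / 2);
    const y = randomBetween(-height / 2, height / 2);
    training.push({ inputs: [x, y, 1], desired: y > targetLine(x) ? 1 : -1 });
  }
  const perceptron = new SketchPerceptron(3, 0.0001);
  for (let pass = 0; pass < passes; pass++) {
    for (const point of training) {
      perceptron.train(point.inputs, point.desired);
    }
  }
  let correct = 0;
  for (const point of training) {
    if (perceptron.feedforward(point.inputs) === point.desired) {
      correct++;
    }
  }
  return correct / training.length;
}

console.log("fraction classified correctly:", runTraining(500, 100));
```

Because the weights and points are random, the reported fraction varies from run to run. It measures accuracy on the training points themselves, which is also what the colors in the example visualize.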
An intriguing aspect of this example lies in the relationship between the perceptron’s weights and the characteristics of the line dividing the points—specifically, the line’s slope and y-intercept (the m and b in y = mx + b). The weights in this context aren’t just arbitrary or “magic” values; they bear a direct relationship to the geometry of the dataset. In this case, I’m using just 2D data, but for many machine learning applications, the data exists in much higher-dimensional spaces. The weights of a neural network help navigate these spaces, defining hyperplanes or decision boundaries that segment and classify the data.
Exercise 10.1
Modify the code from Example 10.1 to also draw the perceptron’s current decision boundary during the training process—its best guess for where the line should be. Hint: Use the perceptron’s current weights to calculate the line’s equation.
While this perceptron example offers a conceptual foundation, real-world datasets often feature more diverse and dynamic ranges of input values. For the simplified scenario here, the range of values for x is larger than that for y because of the canvas size of 640×240. Despite this, the example still works. After all, the sign activation function doesn’t rely on specific input ranges, and it’s such a straightforward binary classification task.
However, real-world data often has much greater complexity in terms of input ranges. To this end, data normalization is a critical step in machine learning. Normalizing data involves mapping the training data to ensure that all inputs (and outputs) conform to a uniform range—typically 0 to 1, or perhaps –1 to 1. This process can improve training efficiency and prevent individual inputs from dominating the learning process. In the next section, using the ml5.js library, I’ll build data normalization into the process.
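As a small illustration of what that mapping looks like, here's a sketch of min-max normalization to the range 0 to 1 in plain JavaScript. The function names `normalize` and `normalizePoint` are hypothetical, and I'm assuming the inputs are the canvas-centered coordinates from Example 10.1. This isn't how ml5.js does it internally, just the underlying arithmetic.

```javascript
// Hypothetical helper: linearly map a value from the range [min, max] to [0, 1].
function normalize(value, min, max) {
  return (value - min) / (max - min);
}

// Normalize a canvas-centered point; the bias input stays at 1.
function normalizePoint(x, y, width, height) {
  return [
    normalize(x, -width / 2, width / 2),
    normalize(y, -height / 2, height / 2),
    1,
  ];
}

console.log(normalizePoint(160, -60, 640, 240)); // logs: [ 0.75, 0.25, 1 ]
```

After this mapping, x and y both span the same range, so neither input dominates simply because the canvas is wider than it is tall.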
Exercise 10.2
Instead of using supervised learning, can you train the neural network to find the right weights by using a GA?
Exercise 10.3
Incorporate data normalization into the example. Does this improve the learning efficiency?
Putting the “Network” in Neural Network
A perceptron can have multiple inputs, but it’s still just a single, lonely neuron. Unfortunately, that limits the range of problems it can solve. The true power of neural networks comes from the network part. Link multiple neurons together and you’re able to solve problems of much greater complexity.
If you read an AI textbook, it will say that a perceptron can solve only linearly separable problems. If a dataset is linearly separable, you can graph it and classify it into two groups simply by drawing a straight line (see Figure 10.10, left). Classifying plants as xerophytes or hydrophytes is a linearly separable problem.
Figure 10.10: Data points that are linearly separable (left) and data points that are nonlinearly separable, as a curve is required to separate the points (right)
Now imagine you’re classifying plants according to soil acidity (x-axis) and temperature (y-axis). Some plants might thrive in acidic soils but only within a narrow temperature range, while other plants prefer less acidic soils but tolerate a broader range of temperatures. A more complex relationship exists between the two variables, so a straight line can’t be drawn to separate the two categories of plants, acidophilic and alkaliphilic (see Figure 10.10, right). A lone perceptron can’t handle this type of nonlinearly separable problem. (Caveat here: I’m making up these scenarios. If you happen to be a botanist, please let me know if I’m anywhere close to reality.)
One of the simplest examples of a nonlinearly separable problem is XOR (exclusive or). This is a logical operator, similar to the more familiar AND and OR. For A AND B to be true, both A and B must be true. With OR, either A or B (or both) can be true. These are both linearly separable problems. The truth tables in Figure 10.11 show their solution space. Each true or false value in the table shows the output for a particular combination of true or false inputs. See how you can draw a straight line to separate the true outputs from the false ones?
Figure 10.11: Truth tables for the AND and OR logical operators. The true and false outputs can be separated by a line.
The XOR operator is the equivalent of (OR) AND (NOT AND). In other words, A XOR B evaluates to true only if exactly one of the inputs is true. If both inputs are false or both are true, the output is false. To illustrate, let’s say you’re having pizza for dinner. You love pineapple on pizza, and you love mushrooms on pizza, but put them together: yech! And plain pizza, that’s no good either!
The XOR truth table in Figure 10.12 isn’t linearly separable. Try to draw a straight line to separate the true outputs from the false ones—you can’t!
Figure 10.12: The truth tables for whether you want to eat the pizza (left) and XOR (right). Note how the true and false outputs can’t be separated by a single line.
The fact that a perceptron can’t even solve something as simple as XOR may seem extremely limiting. But what if I made a network out of two perceptrons? If one perceptron can solve the linearly separable OR and one perceptron can solve the linearly separable NOT AND, then two perceptrons combined can solve the nonlinearly separable XOR.
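Here's a minimal sketch of that idea in plain JavaScript. I'm assuming inputs of 1 for true and 0 for false, a step activation that outputs 1 or 0 (rather than the +1 and -1 used earlier), and weights picked by hand instead of learned. In this sketch, the step that combines the OR and NOT AND results is itself a third neuron that computes AND.

```javascript
// A single neuron with a step activation: outputs 1 if the weighted sum is positive.
function neuron(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum > 0 ? 1 : 0;
}

// Hand-picked weights (illustrative, not learned).
// The last weight in each array multiplies the constant bias input of 1.
const OR_WEIGHTS = [1, 1, -0.5];
const NAND_WEIGHTS = [-1, -1, 1.5];
const AND_WEIGHTS = [1, 1, -1.5];

// XOR is (A OR B) AND (NOT (A AND B)).
function xor(a, b) {
  const either = neuron([a, b, 1], OR_WEIGHTS);
  const notBoth = neuron([a, b, 1], NAND_WEIGHTS);
  return neuron([either, notBoth, 1], AND_WEIGHTS);
}

for (const [a, b] of [[0, 0], [0, 1], [1, 0], [1, 1]]) {
  console.log(a, b, "->", xor(a, b));
}
// logs:
// 0 0 -> 0
// 0 1 -> 1
// 1 0 -> 1
// 1 1 -> 0
```

Each individual neuron still draws only a straight line through its inputs. It's the layering of one neuron's output into another that produces the nonlinear result.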
When you combine multiple perceptrons, you get a multilayered perceptron, a network of many neurons (see Figure 10.13). Some are input neurons and receive the initial inputs, some are part of what’s called a hidden layer (as they’re connected to neither the inputs nor the outputs of the network directly), and then there are the output neurons, from which the results are read.
Up until now, I’ve been visualizing a singular perceptron with one circle representing a neuron processing its input signals. Now, as I move on to larger networks, it’s more typical to represent all the elements (inputs, neurons, outputs) as circles, with arrows that indicate the flow of data. In Figure 10.13, you can see the inputs and bias flowing into the hidden layer, which then flows to the output.
Figure 10.13: A multilayered perceptron has the same inputs and output as the simple perceptron, but now it includes a hidden layer of neurons.
Training a simple perceptron is pretty straightforward: you feed the data through and evaluate how to change the input weights according to the error. With a multilayered perceptron, however, the training process becomes more complex. The overall output of the network is still generated in essentially the same manner as before: the inputs multiplied by the weights are summed and fed forward through the various layers of the network. And you still use the network’s guess to calculate the error (desired result – guess). But now so many connections exist between layers of the network, each with its own weight. How do you know how much each neuron or connection contributed to the overall error of the network, and how it should be adjusted?
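To make the feed-forward part concrete, here's a rough sketch in plain JavaScript of a network with two inputs, a hidden layer of two neurons, and one output. The weights are arbitrary numbers I made up, the function names (`feedLayer`, `feedForward`) are hypothetical, and I'm assuming a sigmoid activation, one classic choice. The sketch only computes a guess and its error; it doesn't attempt to answer the question of how to adjust the weights.

```javascript
// Sigmoid activation: squashes any number into the range 0 to 1.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Feed inputs through one layer. Each row of layerWeights belongs to one neuron,
// and the last weight in each row multiplies the constant bias input of 1.
function feedLayer(inputs, layerWeights) {
  const withBias = inputs.concat(1);
  return layerWeights.map((weights) => {
    let sum = 0;
    for (let i = 0; i < weights.length; i++) {
      sum += withBias[i] * weights[i];
    }
    return sigmoid(sum);
  });
}

// Feed inputs through every layer in order; each layer's output is the next layer's input.
function feedForward(inputs, layers) {
  let signal = inputs;
  for (const layerWeights of layers) {
    signal = feedLayer(signal, layerWeights);
  }
  return signal;
}

// Arbitrary illustrative weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
const network = [
  [[0.5, -0.4, 0.1], [-0.3, 0.8, 0.2]],
  [[1.2, -0.7, 0.05]],
];

const guess = feedForward([1, 0], network)[0];
const desired = 1;
const error = desired - guess;
console.log("guess:", guess, "error:", error);
```

Notice that the error is a single number at the output, while nine separate weights helped produce it. That mismatch is exactly the difficulty the paragraph above describes.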
The solution to optimizing the weights of a multilayered network is backpropagation. This process takes the error and feeds it backward through the network so it can adjust the weights of all the connections in proportion to how much they’ve contributed to the total error. The details of backpropagation are beyond the scope of this book. The algorithm uses a variety of activation functions (one classic example is the sigmoid function) as well as some calculus. If you’re interested in continuing down this road and learning more about how backpropagation works, you can find my “Toy Neural Network” project at the Coding Train website with accompanying video tutorials. They go through all the steps of solving XOR using a multilayered feed-forward network with backpropagation. For this chapter, however, I’d instead like to get some help and phone a friend.
Machine Learning with ml5.js
That friend is ml5.js. This machine learning library can manage the details of complex processes like backpropagation so you and I don’t have to worry about them. As I mentioned earlier in the chapter, ml5.js aims to provide a friendly entry point for those who are new to machine learning and neural networks, while still harnessing the power of Google’s TensorFlow.js behind the scenes.
To use ml5.js in a sketch, you must import it via a <script> element in your index.html file, much as you did with Matter.js and Toxiclibs.js in Chapter 6:
HTML
<script src="https://unpkg.com/ml5@1/dist/ml5.min.js"></script>
My goal for the rest of this chapter is to introduce ml5.js by developing a system that can recognize mouse gestures. This will prepare you for Chapter 11, where I’ll add a neural network “brain” to an autonomous steering agent and tie machine learning back into the story of the book. First, however, I’d like to talk more generally through the steps of training a multilayered neural network model using supervised learning. Outlining these steps will highlight important decisions you’ll have to make before developing a learning model, introduce the syntax of the ml5.js library, and provide you with the context you’ll need before training your own machine learning models.
The Machine Learning Life Cycle
The life cycle of a machine learning model is typically broken into seven steps:
1. Collect the data. Data forms the foundation of any machine learning task. This stage might involve running experiments, manually inputting values, sourcing public data, or a myriad of other methods (like generating synthetic data).
2. Prepare the data. Raw data often isn’t in a format suitable for machine learning algorithms. It might also have duplicate or missing values, or contain outliers that skew the data. Such inconsistencies may need to be manually adjusted. Additionally, as I mentioned earlier, neural networks work best with normalized data, which has values scaled to fit within a standard range. Another key part of preparing data is separating it into distinct sets: training, validation, and testing. The training data is used to teach the model (step 4), while the validation and testing data (the distinction is subtle; more on this later) are set aside and reserved for evaluating the model’s performance (step 5).
3. Choose a model. Design the architecture of the neural network. Different models are more suitable for certain types of data and outputs.
4. Train the model. Feed the training portion of the data through the model and allow the model to adjust the weights of the neural network based on its errors. This process is known as optimization: the model tunes the weights so they result in the fewest number of errors.
5. Evaluate the model. Remember the testing data that was set aside in step 2? Since that data wasn’t used in training, it provides a means to evaluate how well the model performs on new, unseen data.
6. Tune the parameters. The training process is influenced by a set of parameters (often called hyperparameters) such as the learning rate, which dictates how much the model should adjust its weights based on errors in prediction. I called this the learningConstant in the perceptron example. By fine-tuning these parameters and revisiting steps 4 (training), 3 (model selection), and even 2 (data preparation), you can often improve the model’s performance.
7. Deploy the model. Once the model is trained and its performance is evaluated satisfactorily, it’s time to use the model out in the real world with new data!
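The data-splitting part of step 2 can be sketched in a few lines of plain JavaScript. The function names (`shuffle`, `splitData`) and the 80/10/10 proportions below are my own assumptions for illustration; the right split depends on how much data you have and what you're trying to do.

```javascript
// Return a shuffled copy of an array (Fisher-Yates shuffle).
function shuffle(items) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Hypothetical helper: split data into training, validation, and testing sets.
// Whatever is left after the first two fractions becomes the testing set.
function splitData(items, trainFraction, validationFraction) {
  const shuffled = shuffle(items);
  const trainEnd = Math.round(shuffled.length * trainFraction);
  const validationEnd = trainEnd + Math.round(shuffled.length * validationFraction);
  return {
    training: shuffled.slice(0, trainEnd),
    validation: shuffled.slice(trainEnd, validationEnd),
    testing: shuffled.slice(validationEnd),
  };
}

// Stand-in data: the numbers 0 through 99.
const data = Array.from({ length: 100 }, (_, i) => i);
const sets = splitData(data, 0.8, 0.1);
console.log(sets.training.length, sets.validation.length, sets.testing.length); // logs: 80 10 10
```

Shuffling before splitting matters when the data was collected in some order, since otherwise the testing set might contain only the most recent (or only one kind of) example.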
These steps are the cornerstone of supervised machine learning. However, even though 7 is a truly excellent number, I think I missed one more critical step. I’ll call it step 0.
0. Identify the problem. This initial step defines the problem that needs solving. What is the objective? What are you trying to accomplish or predict with your machine learning model?
This zeroth step informs all the other steps in the process. After all, how are you supposed to collect your data and choose a model without knowing what you’re even trying to do? Are you predicting a number? A category? A sequence? Is it a binary choice, or are there many options? These sorts of questions often boil down to choosing between two types of tasks that the majority of machine learning applications fall into: classification and regression.
Classification and Regression
Classification is a type of machine learning problem that involves predicting a label (also called a category or class) for a piece of data. If this sounds familiar, that’s because it is: the simple perceptron in Example 10.1 was trained to classify points as above or below a line. To give another example, an image classifier might try to guess if a photo is of a cat or a dog and assign the corresponding label (see Figure 10.14).
Figure 10.14: Labeling images as cats or dogs
Classification doesn’t happen by magic. The model must first be shown many examples of dogs and cats with the correct labels in order to properly configure the weights of all the connections. This is the training part of supervised learning.
The classic “Hello, world!” demonstration of machine learning and supervised learning is a classification problem of the MNIST dataset. Short for Modified National Institute of Standards and Technology, MNIST is a dataset that was collected and processed by Yann LeCun (Courant Institute, NYU), Corinna Cortes (Google Labs), and Christopher J.C. Burges (Microsoft Research). Widely used for training and testing in the field of machine learning, this dataset consists of 70,000 handwritten digits from 0 to 9; each is a 28×28-pixel grayscale image (see Figure 10.15 for examples). Each image is labeled with its corresponding digit.
Figure 10.15: A selection of handwritten digits 0–9 from the MNIST dataset (courtesy of Suvanjanprasai)
MNIST is a canonical example of a training dataset for image classification: the model has a discrete number of categories to choose from (10 to be exact—no more, n