Published on January 4, 2026 12:58 AM GMT
This post continues from the concepts in Zoom Out: Distributions in Semantic Spaces. I will be considering semantic spaces (input, output, and latent) from an informal topological perspective.
Topology and geometry
An extremely terse explanation of topology is that it is the math focused on what it means for a space to be continuous, abstracted from familiar geometric properties. You may be familiar with the example that the surface of a doughnut is homeomorphic to the surface of a coffee mug. What this means is that any image you could put on the surface of a doughnut could be put on the surface of a mug without changing which points of the image connect to which others. The image will be warped, parts of it getting stretched, scaled, or squished, but those are all geometric properties, not topological properties.
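To make the geometric/topological distinction concrete, here is a minimal numpy sketch (the particular warp is my own toy choice, not part of the original example): a continuous, invertible stretch of a circle into an ellipse changes the distances between points, but because the map has a continuous inverse, nothing about which points connect to which is lost.

```python
import numpy as np

# Points along a circle, evenly spaced.
theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)

def warp(points):
    """A homeomorphism of the plane: stretch x by 3, squish y by a half."""
    return points * np.array([3.0, 0.5])

def unwarp(points):
    """Its continuous inverse; nothing is lost by warping."""
    return points / np.array([3.0, 0.5])

ellipse = warp(circle)

# Geometric properties change: the gaps between consecutive points are no longer uniform.
print(np.linalg.norm(np.diff(circle, axis=0), axis=1).std())   # ~0
print(np.linalg.norm(np.diff(ellipse, axis=0), axis=1).std())  # clearly > 0

# But the warp is invertible, so the circle (and which points neighbour which)
# is recovered exactly.
print(np.allclose(unwarp(ellipse), circle))                    # True
```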
Applying this idea to semantic spaces suggests the hypothesis that, for a neural network, the input space may be homeomorphic to the output space.
For example, looking at the cat-dog-labeling net again, the input is the space of possible images, and within that space is the continuous distribution of images of cats and/or dogs. The distribution is continuous because some images may contain both cats and dogs, while other images may contain animals that look ambiguous, maybe a cat, maybe a dog. It is possible that this same distribution from the net’s input space is also found in the net’s output space.
Geometrically the distribution would be different, since the input space maps its dimensions to RGB pixel values while the output space maps its dimensions to whether the image is of a cat or a dog. But these are geometric properties; topologically the distribution could be the same, meaning that for any path you could move along in the cat-dog distribution, you can move along that exact same path in the image space (input) and the label space (output). Furthermore, any other space that looks at cat-dog space from any other geometric perspective also contains that same path.
The distribution is the same in every space, but the geometry in which it is embedded allows us to see different aspects of that distribution.
Each layer does nothing or throws something away
But… that isn’t quite true, because it is possible for a network to perform geometric transformations that lose some of the information that existed in the input distribution. I identify two possible ways to lose information:
Projecting into lower dimensional spaces
The first way of losing information is projecting into a lower dimensional space. Imagine squishing a sphere in 3d space into a circle in 2d space so that the two sides of the sphere meet. It is now impossible to recover which side of the sphere a point within the circle came from.
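Here is a minimal sketch of that loss of information, assuming nothing beyond numpy: dropping the z coordinate projects points on the sphere onto a disc, and two points that differ only in the sign of z become indistinguishable.

```python
import numpy as np

# Two points on the unit sphere that differ only in which hemisphere they
# are on (the sign of their z coordinate).
north = np.array([0.6, 0.0, 0.8])
south = np.array([0.6, 0.0, -0.8])

def project_to_plane(point):
    """Forget the z coordinate: a projection from 3d down to 2d."""
    return point[:2]

print(project_to_plane(north))  # [0.6 0. ]
print(project_to_plane(south))  # [0.6 0. ]  -- identical; the hemisphere is unrecoverable
```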
To justify this in our cat distribution example, suppose I fix a cat in a specific pose at a specific distance from a camera and view the cat from every possible angle. (This is a thought experiment, please do not try this with a real cat.) The space of possible viewing angles can be parameterized as a hypersphere in 4d (unit quaternions, which encode 3d rotations, live on this hypersphere). Normalizing so the cat is always upright leaves an ordinary sphere in 3d, which is easier to think about.
Now, just as before, this sphere from image space may be projected down to a circle, line, or point in the labelling space. It may be that some angles make the cat look more or less dog-like, so it still spans some of the labelling space, rather than being projected to a single point representing confidence in the “cat” label[1].
But regardless of the exact details of how information is lost while projecting, it is no longer possible to recover which exact image, in image space, corresponds to a given position in labelling space. Instead, each point in labelling space corresponds to a region in image space.
In a neural network, projection to a lower dimensional space occurs whenever a layer has more inputs than outputs[2].
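As a small sketch of why this happens (the specific weight matrix below is made up purely for illustration): any linear layer mapping 3 inputs to 2 outputs has a non-trivial null space, so distinct inputs can produce identical outputs and the layer cannot be inverted.

```python
import numpy as np

# A linear "layer" with 3 inputs and 2 outputs (hypothetical weights).
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

x = np.array([0.2, 0.5, 0.1])
null_direction = np.array([1.0, 1.0, -1.0])   # W @ null_direction == 0

# Moving the input anywhere along the null direction leaves the output
# unchanged, so the original input cannot be recovered from the output.
print(W @ x)                           # [0.3 0.6]
print(W @ (x + 2.0 * null_direction))  # [0.3 0.6]
```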
Folding space on itself
The other way to lose information is to fold space on itself. This is what is happening in all examples of projecting into a lower dimensional space, but it is worth examining this more general case because of how it relates to activation functions.
If an activation function is strictly monotonic (and therefore invertible), as in leaky ReLU, then the space will not fold on itself, and so information is not lost: the input and output of the function will be homeomorphic. If, on the other hand, an activation function is not injective, as in ReLU, which maps every negative input to zero, then some unrelated parts of the input space will get folded together into the same parts of the output space[3].
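A small numpy sketch of the difference (the particular inputs and the leaky-ReLU slope are arbitrary choices): leaky ReLU can be undone exactly, while ReLU collapses every negative input to zero.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_inverse(y, alpha=0.01):
    return np.where(y >= 0, y, y / alpha)

xs = np.array([-2.0, -0.5, 0.0, 1.5])

# ReLU folds: both -2.0 and -0.5 land on 0.0, so they can no longer be told apart.
print(relu(xs))                            # [0.  0.  0.  1.5]

# Leaky ReLU is strictly increasing, so it can be undone (up to floating point).
print(leaky_relu_inverse(leaky_relu(xs)))  # [-2.  -0.5  0.   1.5]
```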
Interpolated geometric transformation
Hopefully this is making sense and you are understanding how a semantic space could be the same topological space in the input and output of a neural network, even though it is very different geometrically.
I’ll now state a hypothesis for the purpose of personal exploration:
Claim 1: The series of latent spaces between a network’s input and output spaces are geometric transformations, spaced at roughly equal distances along a continuous path from the geometry of the input space to the geometry of the output space.
This might be the sort of thing that someone has proven mathematically. If so, I’d like to find and understand their proof. Alternatively, this may be a novel direction of thought. If you are interested, please reach out or ask questions.
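For what it’s worth, here is a purely illustrative sketch of how one might start poking at claim 1 empirically. The toy random network, the leaky-ReLU choice, and using correlation of pairwise-distance matrices as a crude proxy for “geometric similarity” are all my own assumptions, not anything established in this post; a real test would need a trained network and a more careful notion of distance between geometries.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, alpha=0.1):
    return np.where(x >= 0, x, alpha * x)

def pairwise_distances(points):
    """Pairwise Euclidean distances between the rows of `points`."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

def geometry_similarity(a, b):
    """Crude proxy: correlation between two pairwise-distance matrices."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# A toy random "network": a stack of linear layers with leaky ReLU.
widths = [16, 12, 8, 4, 2]
weights = [rng.normal(size=(widths[i + 1], widths[i])) for i in range(len(widths) - 1)]

x = rng.normal(size=(200, widths[0]))   # a batch of 200 random inputs
activations = [x]
for W in weights:
    activations.append(leaky_relu(activations[-1] @ W.T))

input_geo = pairwise_distances(activations[0])
output_geo = pairwise_distances(activations[-1])
for i, acts in enumerate(activations):
    layer_geo = pairwise_distances(acts)
    print(f"layer {i}: similarity to input {geometry_similarity(layer_geo, input_geo):.2f}, "
          f"to output {geometry_similarity(layer_geo, output_geo):.2f}")
```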
Input subspace isomorphism
A weaker claim I can make based on the topological examination of semantic space is:
Claim 2: The output space of any neural network is homeomorphic to some subspace of the input space.
I find it difficult to imagine that this claim isn’t true, but as before, I’m not aware of a proof. Regardless, I think the implications for thinking about networks as mapping semantic topologies from one geometric space to another are quite interesting.
What’s it mean for interpretability?
One implication I noticed about claim 1 is that it suggests how the difficulty of interpreting latent spaces should relate to the difficulty of interpreting input and output spaces.
I suspect lower dimensional spaces are usually easier to interpret than higher dimensional spaces, but let’s set that aside for now.
Image space is easy to interpret as images, but difficult to interpret in terms of labels, so I imagine layers closer to an image space would be easier to interpret in some visual way. On the other hand, layers closer to label space should be harder to interpret visually, but easier to interpret in terms of labels.
This leaves aside the question of “what is a space that is halfway between being an image and being a label?” I think that is an interesting question to explore, but it implies that layers halfway between modalities will likely be unfamiliar, and therefore the hardest to interpret.
This idea of semantically similar spaces implies a possible failure mode when inquiring about latent spaces: trying to analyze them with the incorrect modality. For example, one might want to apply labels to features in spaces with visual semantic geometry without realizing this is as difficult a problem as applying labels to features in the image space itself. To put that another way, if you do not expect to be able to find meaningful feature directions in the high dimensional space of RGB image pixels, you should not necessarily expect to find meaningful feature directions in the latent space formed by the activations of the first layer of a network processing that image space[4].
Even though claim 1 and claim 2 imply some interpretability work may be more difficult than expected, the claims also offer hope that there should be a familiar (topological) structure inside every semantic space. I think Mingwei Li’s Toward Comparing DNNs with UMAP Tour provides a very compelling exploration of this idea. If you haven’t already viewed the UMAP Tour, I highly recommend it.
[1] Other reasons the distribution may span some amount of labelling space include the way the net is trained or the architecture of the net, rather than any property of the idealized cat-dog distribution.
[2] Note that just because a sphere sits in 3d doesn’t mean it can’t be projected down to 2d and reconstructed in 3d; after all, world maps exist. But this only reconstructs the surface of the sphere embedded in 3d. It isn’t possible to reconstruct the entirety of the 3d space once it has been projected to 2d, or, more generally, to reconstruct n dimensions once projected down to n-1 dimensions.
[3] Specifically, ReLU clamps each negative coordinate to zero, so every orthant with any negative components gets folded onto the boundary it shares with the all-positive orthant.
[4] What you should expect to see are things more like “colours”, as I discuss in N Dimensional Interactive Scatter Plot (ndisp).