On Monday, June 6th, 2011, right after Steve Jobs’ last public appearance as a keynote speaker, the “Developer Tools Kickoff” session took place at Apple’s annual Worldwide Developers Conference, also known as WWDC. That day, Chris Lattner, creator of the LLVM compiler infrastructure and the Swift programming language, introduced a new feature of the Objective-C language to thunderous applause. This feature, still present in Swift, is known as “Automatic Reference Counting”, or ARC.
Thankfully we can refer to the transcript of Chris Lattner’s words that day:
> So ARC is based on a really simple idea. Let’s take all the advantages of retain and release programming without those little disadvantages, like having to actually call retain and release. So to do this, we’re taking memory management and pulling it into Objective-C, the language. Well, what does that mean? Well, that means primarily that retain and release are just going away. You don’t have to worry about this anymore.
What is that “retain and release” thing Mr. Lattner was talking about? In short, yet another approach to make software less forgetful. Because since the dawn of time, developers have had to come up with curious mechanisms to keep track of things scattered in the memory of their computers. Entire manuals have been published about memory management. Whole magazine issues have been devoted to the subject. Countless papers have been written. Videos have been produced.
This single task of managing memory has proven to be one of the most difficult, not only to grasp and understand, but most importantly, to get right.
Because not getting this right meant crashes, security issues, resource shortages, unhappy customers, and lots of white hair. To make things worse, pretty much every programming language these days comes with its own ideas of how to keep track of things on the heap.
Yes, the heap. Have you not heard about it? One of the first things that puzzles younger software developers the most when exposed to lower-level programming languages (read: compiled or non-scripting) is the fact that memory is “segmented”, with two major sections called “stack” and “heap”, each with their own shenanigans and whims. Sometimes there are even other sections called “static” and “text”, and at that point, brains are spinning out of control.
A Segment About Segments
Let us just focus on the stack and the heap, shall we?
Users double-click on the icon of your application; what is an operating system to do in that case? Well, it allocates a certain amount of memory for the process, and copies the code of the program into the “text” segment, giving the CPU the information required to start executing it. The rest of the memory allocated to the program is roughly split into two other segments, called “stack” and “heap”.
(I am totally aware of the tremendous oversimplification of this scenario. Bear with me.)
Stacks are managed in quite an automatic way; every time your program encounters an opening curly bracket (let me simplify things here, please), it will add a new “frame” to that stack. All variables defined in that section of your program (usually a function or subroutine) will go there, and as soon as you return from that subsection, the stack frame is “popped” and all of those values are gone from memory.
This is very convenient, and it constitutes an automatic form of memory management; just put stuff on the stack, and let the system get rid of those things once your calculation has taken place. The problem is that the stack is not very big, so if you really need a lot of memory… well, you might encounter a “stack overflow” very quickly.
Enter the heap. That is the other segment available to your application; it is vast and wild and free, and you can put whatever you want in it: arrays, documents, videos, large language models, you name it.
But here is the catch: the heap is dark, dauntingly dark, just like an open field in the middle of a stormy night, and you had better have a torch at hand before leaving the cabin.
Or, in computer terms, you had better have a pointer on the stack to find your stuff. Lose that torch pointer, and you will encounter that good old fiend called the “memory leak”. All of a sudden, not only is your object lost in that darkness, you also have less memory available; because the all-powerful operating system will respect your memory space, and will not clean it before your application exits.
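The torch metaphor can be sketched in C++ (used here purely for illustration); `make_buffer` and the surrounding functions are hypothetical names for the sake of the example:

```cpp
#include <cstdlib>

// A heap allocation is only reachable through the pointer we keep on the
// stack: the "torch". Lose it, and the block is still allocated -- a leak.
int* make_buffer(std::size_t count) {
    // the buffer lives on the heap, far beyond this function's stack frame
    return static_cast<int*>(std::malloc(count * sizeof(int)));
}

void leak_buffer() {
    // BUG (on purpose): the pointer goes out of scope at the closing brace,
    // but the heap block stays allocated until the process exits.
    int* lost = make_buffer(1024);
    (void)lost;
}

void use_and_free_buffer() {
    int* kept = make_buffer(1024);  // keep the torch at hand...
    kept[0] = 42;
    std::free(kept);                // ...and give the block back when done
}
```

Note that the operating system will not complain about `leak_buffer()`; the memory simply stays unavailable until the process exits.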
Given this conundrum, programming languages since the dawn of time (well, since the 1950s, really) have tried to provide mechanisms to keep track of those things you left in the wilderness of the heap.
Manual Management: C, C++, & Turbo Pascal
The C programming language came up together with Unix at the beginning of the roaring seventies, with its `malloc()` and `free()` functions, accompanied by the eternal rule that “for every `malloc` there must be a `free` somewhere”.
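A minimal sketch of that eternal rule, in C++ for illustration; the `tracked_malloc` wrapper and its counter are hypothetical helpers that merely make the bookkeeping visible:

```cpp
#include <cstdlib>

// The "eternal rule" made visible: every tracked_malloc() must be balanced
// by a tracked_free(). A non-zero counter at exit means we leaked.
static int live_allocations = 0;

void* tracked_malloc(std::size_t size) {
    ++live_allocations;             // one more thing out in the heap
    return std::malloc(size);
}

void tracked_free(void* ptr) {
    --live_allocations;             // one thing brought back from the dark
    std::free(ptr);
}
```

Real-world leak detectors (Valgrind, AddressSanitizer) are essentially industrial-strength versions of this counter.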
C is like a mountain ranger giving you access to the whole Heap National Park together with a map and a torch, but it is your responsibility to keep track of what is where. Not only that, but it gave you several flavors of `malloc` to use depending on your needs: `calloc`, `alloca`, `realloc`, `reallocarray`, `emalloc`, `ecalloc`, and if that was not enough, you could probably use jemalloc instead.
A decade later, C++ added the `new` and `delete` keywords to the mix (featuring the same eternal rule mentioned above), followed by Turbo Pascal’s own `New()` and `Dispose()` procedures (same eternal rule applies here), this time for keeping track of those things called “objects” uncontrollably popping up on the heap.
(Of course, objects created on the stack, albeit small, were taken care of automatically. Nothing to see here.)
But for C++ developers, all of a sudden memory management became even more complicated, because you had to remember to make your destructors `virtual`, and also to beware of leaks due to exceptions; and if this last sentence does not make sense, well, you should consider yourself lucky.
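For the lucky ones, a short C++ sketch of why that `virtual` matters; the `Resource` types below are hypothetical:

```cpp
// Deleting a derived object through a base pointer only runs the derived
// destructor if the base destructor is virtual; otherwise cleanup is skipped.
struct Resource {
    virtual ~Resource() = default;    // remove 'virtual' and the leak appears
};

struct FileResource : Resource {
    bool* closed;                     // observer flag, set when destroyed
    explicit FileResource(bool* flag) : closed(flag) {}
    ~FileResource() override { *closed = true; }
};

bool destroy_through_base() {
    bool closed = false;
    Resource* r = new FileResource(&closed);
    delete r;      // runs ~FileResource only because ~Resource is virtual
    return closed; // true: the derived cleanup actually happened
}
```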
(Honorable mention to all those C++ developers who went down the rabbit hole of implementing their own allocators, overriding the `new` operator for whatever reason. Another shoutout: this time to C++ developers who had to write and debug COM objects, featuring the all-powerful `IUnknown` interface and the `AddRef` and `Release` functions therein (we will get back to this idea later). I hope you are all doing great. Oh, by the way, did you know that you could develop COM objects on the Mac back in the day? I bet you did not.)
Not satisfied with C++ being already complicated enough, template metaprogramming (also called “Generics” in some modern languages) came onto the scene in the nineties, and with it we not only got unreadable compiler errors, but also some new ideas for managing memory: in this case, “smart pointers”. A name that is highly ironic if you think about it, but quite appropriate at the same time.
The result is that if you read the latest C++ best practices, you will hardly see a single `new` or `delete` statement thrown around; they say you should be using `unique_ptr`, `shared_ptr`, and `weak_ptr` instead. Those smart pointers are kept on the stack, and as soon as their surrounding frame is gone, poof, their associated object in the heap is also automatically removed. Smart, huh? But these days C++ also comes bundled with other things called “move semantics” and RAII, and seriously, let us not go down this rabbit hole.
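A quick sketch of those smart pointers in action, assuming a hypothetical `Document` type:

```cpp
#include <memory>

// shared_ptr keeps a reference count; weak_ptr observes without owning.
struct Document { int pages = 10; };

long count_owners() {
    auto first = std::make_shared<Document>();
    auto second = first;          // second owner: the count goes to 2
    return first.use_count();     // how many hands hold the Document
}

bool observer_outlives_owner() {
    std::weak_ptr<Document> observer;
    {
        auto owner = std::make_shared<Document>();
        observer = owner;         // watch the object, but do not own it
    }                             // frame popped -> Document freed, poof
    return observer.expired();    // true: the object is gone
}
```

`unique_ptr` is the even simpler case: exactly one owner, no count at all.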
Garbage Collection: Java, C#, Go, & D
In light of all this madness, around 1995 Java mass-marketed the (already existing) idea of a “garbage collector”, a staple of languages like Lisp and Smalltalk. Just like the name implies, a garbage collector acts like a valet with a broom, stopping the execution of the code every so often, verifying that each thing on the heap is still being referenced by somebody on the stack, and, on behalf of the running program itself, wiping clean whatever is not.
Interestingly enough, Stroustrup never really ruled out the possibility of garbage collectors for C++, and there have even been some proposals going around, but they never got off the ground, really. However, as explained by Dennis Ritchie himself,
> Similarly, C itself provides two durations of storage: “automatic” objects that exist while control resides in, or below, a procedure, and “static,” existing throughout execution of a program. Off-stack, dynamically allocated storage is provided only by a library routine and the burden of managing it is placed on the programmer: C is hostile to automatic garbage collection.
(Ritchie, Dennis M. “The Development of the C Programming Language.” In “History of Programming Languages II”, edited by Thomas J. Bergin and Richard G. Gibson. ACM, 1996. https://doi.org/10.1145/234286.1057834.)
Pay attention to that phrase above: “stopping the execution”. Garbage collectors seem like a great idea, but they can really be a PITA for many performance-sensitive applications. Those literal hiccups during execution can seem pathological after a while. And there is another hidden issue here: with Java you cannot know exactly when an object is going to be disposed of; and this is a level of control that C++ is very happy to provide to you.
To solve precisely this issue, the C# language and the .NET infrastructure chose to introduce the `using` keyword, in order to provide some determinism in the way resources are used and disposed of. Its usage is trivial: just wrap the code using a resource like a network or file handle, and be sure that it will be released as soon as you finish, well, using it.
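C++ obtains the same determinism from destructors (the RAII idea); here is a rough analogue of the `using` pattern, sketched with a hypothetical `FileHandle` type:

```cpp
// C++ analogue of C#'s 'using': the destructor releases the resource
// deterministically at the closing brace, no garbage collector involved.
struct FileHandle {
    bool* released;
    explicit FileHandle(bool* flag) : released(flag) { *released = false; }
    ~FileHandle() { *released = true; }   // runs exactly when scope ends
};

bool released_at_scope_exit() {
    bool released = false;
    {
        FileHandle handle(&released);     // like: using (var handle = ...)
        // ... work with the handle ...
    }                                     // disposed here, deterministically
    return released;
}
```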
Of course, just like with many other things in life, there are garbage collectors and garbage collectors. The Go programming language features one that is apparently much less obnoxious than Java’s or C#’s, and many developers are very pleased to use it, and rightfully so.
Another language that benefitted from a built-in garbage collector was D, albeit not without its share of criticism:
> 3.4.2 Automatic Memory Management. Walter (Bright)’s experience implementing a garbage collector for Symantec’s Java implementation had convinced him of the benefits of garbage collection and motivated him to make it an integral part of D’s initial design. (…)
>
> Garbage collection is not a mandatory feature in D. If one does not allocate memory via new, directly call one of the GC’s allocation functions, or make use of a language feature that allocates from the GC memory pool, then no scanning of memory or collecting of garbage will ever take place. (…)
>
> Garbage collection in D would become a perennial point of criticism, often cited as a reason to avoid the language.
(Bright, Walter, Andrei Alexandrescu, and Michael Parker. “Origins of the D Programming Language.” Proceedings of the ACM on Programming Languages 4, no. HOPL (2020): 1–38. https://doi.org/10.1145/3386323.)
(Automatic) Reference Counting: Objective-C & Swift
More or less at the same time as C++ started adding bloat keywords, Objective-C took a different approach. First, it encapsulated good old `malloc` and `free` into things called the `alloc` and `dealloc` messages, adding a new concept to the mix: “retain counts”; second, it progressively moved all objects to the heap.
As a result, developers not only had to be good software engineers; all of a sudden they had to be good accountants as well, keeping track of some hidden value that indicated how many hands were holding a single object in memory at the same time, using the `retain` and `release` messages to respectively increase and decrease said value.
(Remember the eternal rule mentioned above? The same idea applies here: for every `retain`, you should have a `release` somewhere.)
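What a retain count amounts to can be hand-rolled in a few lines; this C++ sketch (a hypothetical `Counted` type, the same shape as COM’s `AddRef`/`Release`) is for illustration only:

```cpp
// Minimal sketch of Objective-C style retain counts: the object is freed
// when the last holder releases it.
struct Counted {
    int retain_count = 1;           // alloc hands you an object at count 1
    void retain() { ++retain_count; }
    bool release() {                // true when the object would be freed
        if (--retain_count == 0) {
            // in real code, the object would deallocate itself here
            return true;
        }
        return false;
    }
};
```

Forget one `release` and the count never reaches zero: a leak. Call one too many and you free an object somebody still holds: a crash.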
As mentioned at the beginning of the article, and after unsuccessfully toying with the idea of a Java-like garbage collector for a couple of years (and suffering its associated backlash), Objective-C adopted in 2011 this idea called “ARC”, aka “automatic reference counting”: an evolution of the idea of `retain` and `release`, but with the bookkeeping entirely managed by the compiler, thankyousomuch.
Nevertheless, and as convenient as it is, ARC introduces the problem of two objects strongly referencing one another, in which case you have a “reference cycle” and, boom, yet another memory leak. The solution is still in the hands of developers: remember to make references `weak` depending on “who owns whom” at any given time. Finally, and unsurprisingly, this idea of ARC made its way into Swift, and quite successfully so.
Modern Approaches: Rust & Zig
During the past decade, a new breed of “modern” programming languages has brought new ideas to the drawing board of memory management techniques; most notably, we will mention Rust and Zig, both built on the powerful LLVM infrastructure created by the aforementioned Chris Lattner.
Rust has a powerful, albeit confusing at first, concept of ownership, one reminiscent of those weak references we mentioned previously. At all points in time, some variable on the stack is the owner of something somewhere else; that ownership can be transferred (“moved”) to others, or access to it can be lent temporarily (“borrowing”), in a controlled way with very clear rules. The compiler keeps track of all the bookkeeping, and will greet the developer with daunting yet precise error messages in case those rules are broken.
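For C++ readers, the closest everyday analogue of that ownership transfer is moving a `unique_ptr`: a loose analogy only, since Rust enforces these rules at compile time rather than by convention:

```cpp
#include <memory>
#include <utility>

// One owner at a time; ownership is transferred explicitly with std::move.
std::unique_ptr<int> take_ownership(std::unique_ptr<int> value) {
    return value;   // ownership moves out again, to the caller
}

bool ownership_moves() {
    auto owner = std::make_unique<int>(7);
    auto new_owner = take_ownership(std::move(owner));  // explicit transfer
    // 'owner' is now empty; in Rust, merely *using* it would not compile
    return owner == nullptr && *new_owner == 7;
}
```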
On the other side, Zig prefers to stay closer to the C programming language, operating through a family of allocators, each with specific rules, thereby simplifying the task… and at the same time ensuring compatibility with existing C code, a feature which Rust does not offer quite as seamlessly. Also worthy of mention is the `defer` keyword, similar to the ones available under the same name in both Go and Swift, used to schedule cleanup code, such as explicitly freeing some object, to run when the current scope finishes executing.
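A destructor-based scope guard approximates `defer` in C++; this `Defer` helper is a hypothetical sketch, not a standard facility:

```cpp
#include <functional>

// Like Zig/Go/Swift 'defer': the stored action runs when the current scope
// finishes executing, no matter how we leave it.
struct Defer {
    std::function<void()> action;
    ~Defer() { action(); }              // fired at the closing brace
};

int deferred_cleanups() {
    int cleanups = 0;
    {
        Defer d{[&] { ++cleanups; }};   // roughly: defer cleanups += 1
        // ... allocate, work, maybe bail out early ...
    }                                   // the deferred action runs here
    return cleanups;
}
```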
Trends
So many different ideas to solve the same daunting problem. Keeping track of things in memory has been, historically, one of the most delicate problems developers have had to face, and every decade or so, a new solution appears on the horizon.
The trend, however, is quite clear at this point: to remove memory management from human hands as much as possible, preferably at compile time, but also at run time if all else fails. The tradeoffs are simple to understand: compile-time memory management makes languages more complicated to learn and understand, but provides better performance and safety; run-time memory management is the staple of simpler, easier-to-maintain source code, but at the cost of performance penalties.
Pick your battles wisely, and let somebody else manage memory for you, will you?
On the acclaimed 2000 album “The Night” by the former jazz band Morphine, the raucous and remote voice of the late Mark Sandman evokes the anguish of forgetfulness in a song aptly named “Souvenir”.
> I remember meeting you, we were super low
> Surrounded by the sounds of saxophones
> And I remember being this close, but never alone
> You gave me a little something to take home
> I dropped it on the floor
> I dropped it on the floor
> Dropped it on the floor
> I dropped it.
>
> If I can only remember the name that’s enough for me because names hold the key.
Somehow the lyrics (and, why not, the somber tones of the music behind them) feel eerily familiar to this programmer, who has had to hunt memory leaks at various points in his career.
Cover photo by Artem Maltsev on Unsplash.