Book Version
Dive into Systems — Version 1.2
Copyright
© 2020 Dive into Systems, LLC
License: CC BY-NC-ND 4.0
Disclaimer
The authors made every effort to ensure that the information in this book was correct. The programs in this book have been included for instructional purposes only. The authors do not offer any warranties with respect to the programs or contents of this book. The authors do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause.
The views expressed in this book are those of the authors and do not reflect the official policy or position of the Department of the Army, Department of Defense, or the U.S. Government.
Acknowledgements
The authors would like to acknowledge the following individuals for helping make Dive into Systems a success:
Formal Reviewers
Each chapter in Dive into Systems was peer-reviewed by several CS professors around the United States. We are extremely grateful to those faculty who served as formal reviewers. Your insight, time, and recommendations have improved the rigor and precision of Dive into Systems. Specifically, we would like to acknowledge the contributions of:
Jeannie Albrecht (Williams College) for her review and feedback on Chapter 15.
John Barr (Ithaca College) for his review and feedback on chapters 6, 7, and 8, and providing general advice for the x86_64 chapter.
Jon Bentley for providing review and feedback on section 5.1, including line-edits.
Anu G. Bourgeois (Georgia State University) for her review and feedback on Chapter 4.
Martina Barnas (Indiana University Bloomington) for her review and insightful feedback on Chapter 14, especially section 14.4.
David Bunde (Knox College) for his review, comments and suggestions on Chapter 14.
Stephen Carl (Sewanee: The University of the South) for his careful review and detailed feedback on chapters 6 and 7.
Bryan Chin (U.C. San Diego) for his insightful review of the ARM assembly chapter (chapter 9).
Amy Csizmar Dalal (Carleton College) for her review and feedback on Chapter 5.
Debzani Deb (Winston-Salem State University) for her review and feedback on Chapter 11.
Saturnino Garcia (University of San Diego) for his review and feedback on Chapter 5.
Tim Haines (University of Wisconsin) for his comments and review of Chapter 3.
Bill Jannen (Williams College) for his detailed review and insightful comments on Chapter 11.
Ben Marks (Swarthmore College) for comments on chapters 1 and 2.
Alexander Mentis (West Point) for insightful comments and line-edits of early drafts of this book.
Rick Ord (U.C. San Diego) for his review and suggested corrections for the Preface, and reviewing over 60% (!!) of the book, including chapters 0, 1, 2, 3, 4, 6, 7, 8 and 14. His feedback has helped us keep our notation and code consistent over the different chapters!
Joe Politz (U.C. San Diego) for his review and detailed suggestions for strengthening Chapter 12.
Brad Richards (University of Puget Sound) for his rapid feedback and suggestions for Chapter 12.
Kelly Shaw (Williams College) for her review and suggestions for Chapter 15.
Simon Sultana (Fresno Pacific University) for his review and suggested corrections for Chapter 1.
Cynthia Taylor (Oberlin College) for her review and suggested corrections of Chapter 13.
David Toth (Centre College) for his review and suggested corrections for Chapters 2 and 14.
Bryce Wiedenbeck (Davidson College) for his review and suggested corrections for Chapter 4.
Daniel Zingaro (University of Toronto Mississauga) for catching so many typos.
Additional Feedback
The following people caught random typos and other sundries. We are grateful for your help in finding typos!
Kevin Andrea (George Mason University)
Tanya Amert (Denison University)
Ihor Beliuha
Christiaan Biesterbosch
Daniel Canas (Wake Forest University)
Chien-Chung Shen (University of Delaware)
Vasanta Chaganti (Swarthmore College)
Stephen Checkoway (Oberlin College)
John DeGood (The College of New Jersey)
Joe Errey
Artin Farahani
Sat Garcia (University of San Diego)
Aaron Gember-Jacobson (Colgate University)
Stephen Gilbert
Arina Kazakova (Swarthmore College)
Akiel Khan
Deborah Knox (The College of New Jersey)
Kevin Lahey (Colgate University)
Raphael Matchen
Sivan Nachaum (Smith College)
Aline Normolye (Bryn Mawr College)
SaengMoung Park (Swarthmore College)
Rodrigo Piovezan (Swarthmore College)
Roy Ragsdale (West Point) who gave advice for restructuring the guessing game for the ARM buffer overflow exploit in chapter 9.
Zachary Robinson (Swarthmore College)
Joel Sommers (Colgate University)
Peter Stenger
Richard Weiss (Evergreen State College)
David Toth (Centre College)
Alyssa Zhang (Swarthmore College)
Early Adopters
An alpha release of Dive into Systems was piloted at West Point in Fall 2018; the beta release of the textbook was piloted at West Point and Swarthmore College in Spring 2019. In Fall 2019, Dive into Systems launched its Early Adopter Program, which enabled faculty around the United States to pilot the stable release of Dive into Systems at their institutions. The Early Adopter Program is a huge help to the authors, as it gives us valuable insight into student and faculty experiences with the textbook. We use the feedback we receive to improve and strengthen the content of Dive into Systems, and are very thankful to everyone who completed our student and faculty surveys.
2019-2020 Early Adopters
The following individuals piloted Dive into Systems as a textbook at their institutions during the Fall 2019-Spring 2020 academic year:
John Barr (Ithaca College) - Computer Organization & Assembly Language (Comp 210)
Chris Branton (Drury University) - Computer Systems Concepts (CSCI 342)
Dick Brown (St. Olaf College) - Hardware Design (CSCI 241)
David Bunde (Knox College) - Introduction to Computing Systems (CS 214)
Bruce Char (Drexel University) - Systems Programming (CS 283)
Vasanta Chaganti (Swarthmore College) - Introduction to Computer Systems (CS 31)
Bryan Chin (U.C. San Diego) - Computer Organization and Systems Programming (CSE 30)
Stephen Carl (Sewanee: The University of the South) - Computer Systems and Organization (CSci 270)
John Dougherty (Haverford College) - Computer Organization (cs240)
John Foley (Smith College) - Operating Systems (CSC 262)
Elizabeth Johnson (Xavier University) - Programming in C
Alexander Kendrowitch (West Point) - Computer Organization (CS380)
Bill Kerney (Clovis Community College) - Assembly Programming (CSCI 45)
Deborah Knox (The College of New Jersey) - Computer Architecture (CSC 325)
Doug MacGregor (Western Colorado University) - Operating Systems/Architecture (CS 330)
Jeff Matocha (Ouachita Baptist University) - Computer Organization (CSCI 3093)
Keith Muller (U.C. San Diego) - Computer Organization and Systems Programming (CSE 30)
Crystal Peng (Park University) - Computer Architecture (CS 319)
Leo Porter (U.C. San Diego) - Introduction to Computer Architecture (CSE 141)
Lauren Provost (Simmons University) - Computer Architecture and Organization (CS 226)
Kathleen Riley (Bryn Mawr College) - Principles of Computer Organization (CMSC B240)
Roger Shore (High Point University) - Computer Systems (CSC-2410)
Tony Tong (Wheaton College, Norton MA) - Advanced Topics in Computer Science: Parallel and Distributed Computing (COMP 398)
Brian Toone (Samford University) - Computer Organization and Architecture (COSC 305)
David Toth (Centre College) - Systems Programming (CSC 280)
Bryce Wiedenbeck (Davidson College) - Computer Organization (CSC 250)
Richard Weiss (The Evergreen State College) - Computer Science Foundations: Computer Architecture (CSF)
Preface
In today’s world, much emphasis is placed on learning to code, and programming is touted as a golden ticket to a successful life. Despite all the code boot camps and programming being taught in elementary schools, the computer itself is often treated as an afterthought — it’s increasingly becoming invisible in the discussions of raising the next generations of computer scientists.
The purpose of this book is to give readers a gentle and accessible introduction to computer systems. To write effective programs, programmers must understand a computer’s underlying subsystems and architecture. However, the expense of modern textbooks often limits their availability to the students who can afford them. This free online textbook seeks to make computer systems concepts accessible to everyone. It is targeted toward students with an introductory knowledge of computer science who have some familiarity with Python. If you’re looking for a free book to introduce you to basic computing principles in Python, we encourage you to read How To Think Like a Computer Scientist with Python first.
If you’re ready to proceed, please come in — the water is warm!
What This Book Is About
Our book is titled Dive into Systems and is meant to be a gentle introduction to topics in computer systems, including C programming, architecture fundamentals, assembly language, and multithreading. The ocean metaphor is very fitting for computer systems. As modern life is thought to have risen from the depths of the primordial ocean, so has modern programming risen from the design and construction of early computer architecture. The first programmers studied the hardware diagrams of the first computers to create the first programs.
Yet as life (and computing) began to wander away from the oceans from which they emerged, the ocean began to be perceived as a foreboding and dangerous place, inhabited by monsters. Ancient navigators used to place pictures of sea monsters and other mythical creatures in the uncharted waters. Here be dragons, the text would warn. Likewise, as computing has wandered ever further away from its machine-level origins, computer systems topics have often emerged as personal dragons for many computing students.
In writing this book, we hope to encourage students to take a gentle dive into computer systems topics. Even though the sea may look like a dark and dangerous place from above, there is a beautiful and remarkable world to be discovered for those who choose to peer just below the surface. So too can a student gain a greater appreciation for computing by looking below the code and examining the architectural reef below.
We are not trying to throw you into the open ocean here. Our book assumes only CS1-level knowledge and is designed to be a first exposure to many computer systems topics. We cover topics such as C programming, logic gates, binary, assembly, the memory hierarchy, threading, and parallelism. Our chapters are written to be as independent as possible, with the goal of being widely applicable to a broad range of courses.
Lastly, a major goal for us writing this book is for it to be freely available. We want our book to be a living document, peer reviewed by the computing community, and evolving as our field continues to evolve. If you have feedback for us, please drop us a line. We would love to hear from you!
Ways to Use This Book
Our textbook covers a broad range of topics related to computer systems, specifically targeting intermediate-level courses such as introduction to computer systems or computer organization. It can also be used to provide background reading for upper-level courses such as operating systems, compilers, parallel and distributed computing, and computer architecture.
This textbook is not designed to provide complete coverage of all systems topics. It does not include advanced or full coverage of operating systems, computer architecture, or parallel and distributed computing topics, nor is it designed to be used in place of textbooks devoted to advanced coverage of these topics in upper-level courses. Instead, it focuses on introducing computer systems, common themes in systems in the context of understanding how a computer runs a program, and how to design programs to run efficiently on systems. The topic coverage provides a common knowledge base and skill set for more advanced study in systems topics.
Our book’s topics can be viewed as a vertical slice through a computer. At the lowest layer we discuss binary representation of programs and circuits designed to store and execute programs, building up a simple CPU from basic gates that can execute program instructions. At the next layer we introduce the operating system, focusing on its support for running programs and for managing computer hardware, particularly on the mechanisms of implementing multiprogramming and virtual memory support. At the highest layer, we present the C programming language and how it maps to low-level code, how to design efficient code, compiler optimizations, and parallel computing. A reader of the entire book will gain a basic understanding of how a program written in C (and Pthreads) executes on a computer and, based on this understanding, will know some ways in which they can change the structure of their program to improve its performance.
Although as a whole the book provides a vertical slice through the computer, the book chapters are written as independently as possible so that an instructor can mix and match chapters for their particular needs. The chapter dependency graph is shown below, though individual sections within chapters may not have as deep a dependency hierarchy as the entire chapter.

Summary of Chapter Topics
Chapter 0, Introduction: Introduction to computer systems and some tips for reading this book.
Chapter 1, Introduction to C Programming: Covers C programming basics, including compiling and running C programs. We assume readers of this book have had an introduction to programming in some programming language. We compare example C syntax to Python syntax so that readers familiar with Python can see how they may translate. However, Python programming experience is not necessary for reading or understanding this chapter.
Chapter 2, A Deeper Dive into C: Covers most of the C language, notably pointers and dynamic memory. We also elaborate on topics from Chapter 1 in more detail and discuss some advanced C features.
Chapter 3, C Debugging Tools: Covers common C debugging tools (GDB and Valgrind) and illustrates how they can be used to debug a variety of applications.
Chapter 4, Binary and Data Representation: Covers encoding data into binary, binary representation of C types, arithmetic operations on binary data, and arithmetic overflow.
Chapter 5, Gates, Circuits, and Computer Architecture: Covers the von Neumann architecture from logic gates to the construction of a basic CPU. We characterize clock-driven execution and the stages of instruction execution through arithmetic, storage, and control circuits. We also briefly introduce pipelining, some modern architecture features, and a short history of computer architecture.
Chapters 6-10, Assembly Programming: Covers translating C into assembly code from basic arithmetic expressions to functions, the stack, and array and struct access. In three separate chapters we cover assembly from three different instruction set architectures: 32-bit x86, 64-bit x86, and 64-bit ARM.
Chapter 11, Storage and the Memory Hierarchy: Covers storage devices, the memory hierarchy and its effects on program performance, locality, caching, and the Cachegrind profiling tool.
Chapter 12, Code Optimization: Covers compiler optimizations, designing programs with performance in mind, tips for code optimization, and quantitatively measuring a program’s performance.
Chapter 13, Operating Systems: Covers core operating system abstractions and the mechanisms behind them. We primarily focus on processes, virtual memory, and interprocess communication (IPC).
Chapter 14, Shared Memory Parallelism: Covers multicore processors, threads and Pthreads programming, synchronization, race conditions, and deadlock. This chapter includes some advanced topics on measuring parallel performance (speed-up, efficiency, Amdahl’s law), thread safety, and cache coherence.
Chapter 15, Advanced Parallel Systems and Programming Models: Introduces the basics of distributed memory systems and the Message Passing Interface (MPI), hardware accelerators and CUDA, and cloud computing and MapReduce.
Example Uses of This Book
Dive into Systems can be used as a primary textbook for courses that introduce computer systems topics, or individual chapters can be used to provide background information in courses that cover topics in more depth.
As examples from the authors’ two institutions, we have been using it as the primary textbook for two different intermediate-level courses:
Introduction To Computer Systems at Swarthmore College. Chapter ordering: 4, 1 (some 3), 5, 6, 7, 10, 2 (more 3), 11, 13, 14.
Computer Organization at West Point. Chapter ordering: 1, 4, 2 (some 3), 6, 7, 10, 11, 12, 13, 14, 15.
Additionally, we use individual chapters as background reading in many of our upper-level courses, including:
| Upper-level Course Topic | Chapters for Background Readings | 
|---|---|
| Architecture | 5, 11 | 
| Compilers | 6, 7, 8, 9, 10, 11, 12 | 
| Database Systems | 11, 14, 15 | 
| Networking | 4, 13, 14 | 
| Operating Systems | 11, 13, 14 | 
| Parallel and Distributed Systems | 11, 13, 14, 15 | 
Finally, Chapters 2 and 3 are used as C programming and debugging references in many of our courses.
0. Introduction
Dive into the fabulous world of computer systems! Understanding what a computer system is and how it runs your programs can help you to design code that runs efficiently and that can make the best use of the power of the underlying system. In this book, we take you on a journey through computer systems. You will learn how your program written in a high-level programming language (we use C) executes on a computer. You will learn how program instructions translate into binary and how circuits execute their binary encoding. You will learn how an operating system manages programs running on the system. You will learn how to write programs that can make use of multicore computers. Throughout, you will learn how to evaluate the systems costs associated with program code and how to design programs to run efficiently.
What Is a Computer System?
A computer system combines the computer hardware and special system software that together make the computer usable by users and programs. Specifically, a computer system has the following components (see Figure 1):
Input/output (I/O) ports enable the computer to take information from its environment and display it back to the user in some meaningful way.
A central processing unit (CPU) runs instructions and computes data and memory addresses.
Random access memory (RAM) stores the data and instructions of running programs. The data and instructions in RAM are typically lost when the computer system loses power.
Secondary storage devices like hard disks store programs and data even when power is not actively being provided to the computer.
An operating system (OS) software layer lies between the hardware of the computer and the software that a user runs on the computer. The OS implements programming abstractions and interfaces that enable users to easily run and interact with programs on the system. It also manages the underlying hardware resources and controls how and when programs execute. The OS implements abstractions, policies, and mechanisms to ensure that multiple programs can simultaneously run on the system in an efficient, protected, and seamless manner.
The first four of these define the computer hardware component of a computer system. The last item (the operating system) represents the main software part of the computer system. There may be additional software layers on top of an OS that provide other interfaces to users of the system (e.g., libraries). However, the OS is the core system software that we focus on in this book.

Figure 1. The layered components of a computer system
We focus specifically on computer systems that have the following qualities:
They are general purpose, meaning that their function is not tailored to any specific application.
They are reprogrammable, meaning that they support running a different program without modifying the computer hardware or system software.
To this end, many devices that may “compute” in some form do not fall into the category of a computer system. Calculators, for example, typically have a processor, limited amounts of memory, and I/O capability. However, calculators typically do not have an operating system (advanced graphing calculators like the TI-89 are a notable exception to this rule), do not have secondary storage, and are not general purpose.
Another example that bears mentioning is the microcontroller, a type of integrated circuit that has many of the same capabilities as a computer. Microcontrollers are often embedded in other devices (such as toys, medical devices, cars, and appliances), where they control a specific automatic function. Although microcontrollers are general purpose and reprogrammable, contain a processor, internal memory, and secondary storage, and are I/O capable, they lack an operating system. A microcontroller is designed to boot and run a single specific program until it loses power. For this reason, a microcontroller does not fit our definition of a computer system.
What Do Modern Computer Systems Look Like?
Now that we have established what a computer system is (and isn’t), let’s discuss what computer systems typically look like. Figure 2 depicts two types of computer hardware systems (excluding peripherals): a desktop computer (left) and a laptop computer (right). A U.S. quarter on each device gives the reader an idea of the size of each unit.

Figure 2. Common computer systems: a desktop (left) and a laptop (right) computer
Notice that both contain the same hardware components, though some of the components may have a smaller form factor or be more compact. The DVD/CD bay of the desktop was moved to the side to show the hard drive underneath — the two units are stacked on top of each other. A dedicated power supply helps provide the desktop power.
In contrast, the laptop is flatter and more compact (note that the quarter in this picture appears a bit bigger). The laptop has a battery and its components tend to be smaller. In both the desktop and the laptop, the CPU is obscured by a heavyweight CPU fan, which helps keep the CPU at a reasonable operating temperature. If the components overheat, they can become permanently damaged. Both units have dual inline memory modules (DIMM) for their RAM units. Notice that laptop memory modules are significantly smaller than desktop modules.
In terms of weight and power consumption, desktop computers typically consume 100 - 400 W of power and typically weigh anywhere from 5 to 20 pounds. A laptop typically consumes 50 - 100 W of power and uses an external charger to supplement the battery as needed.
The trend in computer hardware design is toward smaller and more compact devices. Figure 3 depicts a Raspberry Pi single-board computer. A single-board computer (SBC) is a device in which the entirety of the computer is printed on a single circuit board.

Figure 3. A Raspberry Pi single-board computer
The Raspberry Pi SBC contains a system-on-a-chip (SoC) processor with integrated RAM and CPU, which encompasses much of the laptop and desktop hardware shown in Figure 2. Unlike laptop and desktop systems, the Raspberry Pi is roughly the size of a credit card, weighs 1.5 ounces (about as much as a slice of bread), and consumes about 5 W of power. The SoC technology found on the Raspberry Pi is also commonly found in smartphones. In fact, the smartphone is another example of a computer system!
Lastly, all of the aforementioned computer systems (Raspberry Pi and smartphones included) have multicore processors. In other words, their CPUs are capable of executing multiple programs simultaneously. We refer to this simultaneous execution as parallel execution. Basic multicore programming is covered in Chapter 14 of this book.
All of these different types of computer hardware systems can run one or more general-purpose operating systems, such as macOS, Windows, or Unix. A general-purpose operating system manages the underlying computer hardware and provides an interface for users to run any program on the computer. Together these different types of computer hardware running different general-purpose operating systems make up a computer system.
What You Will Learn In This Book
By the end of this book, you will know the following:
How a computer runs a program: You will be able to describe, in detail, how a program expressed in a high-level programming language gets executed by the low-level circuitry of the computer hardware. Specifically, you will know:
how program data gets encoded into binary and how the hardware performs arithmetic on it
how a compiler translates C programs into assembly and binary machine code (assembly is the human-readable form of binary machine code)
how a CPU executes binary instructions on binary program data, from basic logic gates to complex circuits that store values, perform arithmetic, and control program execution
how the OS implements the interface for users to run programs on the system and how it controls program execution on the system while managing the system’s resources.
How to evaluate systems costs associated with a program’s performance: A program runs slowly for a number of reasons. It could be a bad algorithm choice or simply bad choices on how your program uses system resources. You will understand the memory hierarchy and its effects on program performance, and the operating systems costs associated with program performance. You will also learn some valuable tips for code optimization. Ultimately, you will be able to design programs that use system resources efficiently, and you will know how to evaluate the systems costs associated with program execution.
How to leverage the power of parallel computers with parallel programming: Taking advantage of parallel computing is important in today’s multicore world. You will learn to exploit the multiple cores on your CPU to make your program run faster. You will know the basics of multicore hardware, the OS’s thread abstraction, and issues related to multithreaded parallel program execution. You will have experience with parallel program design and writing multithreaded parallel programs using the POSIX thread library (Pthreads). You will also have an introduction to other types of parallel systems and parallel programming models.
Along the way, you will also learn many other important details about computer systems, including how they are designed and how they work. You will learn important themes in systems design and techniques for evaluating the performance of systems and programs. You’ll also master important skills, including C and assembly programming and debugging.
Getting Started with This Book
A few notes about languages, book notation, and recommendations for getting started reading this book:
Linux, C, and the GNU Compiler
We use the C programming language in examples throughout the book. C is a high-level programming language like Java and Python, but it is less abstracted from the underlying computer system than many other high-level languages. As a result, C is the language of choice for programmers who want more control over how their program executes on the computer system.
The code and examples in this book are compiled using the C compiler from the GNU Compiler Collection (GCC) and run on the Linux operating system. Although not the most common mainstream OS, Linux is the dominant OS on supercomputing systems and is arguably the most commonly used OS by computer scientists.
Linux is also free and open source, which contributes to its popular use in these settings. A working knowledge of Linux is an asset to all students in computing. Similarly, GCC is arguably the most common C compiler in use today. As a result, we use Linux and GCC in our examples. However, other Unix systems and compilers have similar interfaces and functionality.
In this book, we encourage you to type along with the listed examples. Linux commands appear in blocks like the following:
$
The $ represents the command prompt. If you see a box that looks like
$ uname -a
this is an indication to type uname -a on the command line. Make sure that you don’t type the $ sign!
The output of a command is usually shown directly after the command in a command line listing. As an example, try typing in uname -a. The output of this command varies from system to system. Sample output for a 64-bit system is shown here.
$ uname -a
Linux Fawkes 4.4.0-171-generic #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019
x86_64 x86_64 x86_64 GNU/Linux
The uname command prints out information about a particular system. The -a flag prints out all relevant information associated with the system in the following order:
The kernel name of the system (in this case Linux)
The hostname of the machine (e.g., Fawkes)
The kernel release (e.g., 4.4.0-171-generic)
The kernel version (e.g., #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019)
The machine hardware (e.g., x86_64)
The type of processor (e.g., x86_64)
The hardware platform (e.g., x86_64)
The operating system name (e.g., GNU/Linux)
You can learn more about the uname command or any other Linux command by prefacing the command with man, as shown here:
$ man uname
This command brings up the manual page associated with the uname command. To quit out of this interface, press the q key.
While detailed coverage of Linux is beyond the scope of this book, readers can get a good introduction in the online Appendix 2 - Using UNIX. There are also several online resources that can give readers a good overview. One recommendation is “The Linux Command Line” [1].
Other Types of Notation and Callouts
Aside from the command line and code snippets, we use several other types of “callouts” to represent content in this book.
The first is the aside. Asides are meant to provide additional context to the text, usually historical. Here’s a sample aside:
The second type of callout we use in this text is the note. Notes are used to highlight important information, such as the use of certain types of notation or suggestions on how to digest certain information. A sample note is shown below:
How to do the readings in this book
As a student, it is important to do the readings in the textbook. Notice that we say “do” the readings, not simply “read” the readings. To “read” a text typically implies passively imbibing words off a page. We encourage students to take a more active approach. If you see a code example, try typing it in! It’s OK if you type in something wrong, or get errors; that’s the best way to learn! In computing, errors are not failures; they are simply experience.
The last type of callout that students should pay specific attention to is the warning. The authors use warnings to highlight things that are common “gotchas” or a common cause of consternation among our own students. Although all warnings may not be equally valuable to all students, we recommend that you review warnings to avoid common pitfalls whenever possible. A sample warning is shown here:
Warning: This book contains puns. The authors (especially the first author) are fond of puns and musical parodies related to computing (and not necessarily good ones). Adverse reactions to the authors’ sense of humor may include (but are not limited to) eye-rolling, exasperated sighs, and forehead slapping.
If you are ready to get started, please continue on to the first chapter as we dive into the wonderful world of C. If you already know some C programming, you may want to start with Chapter 4 on binary representation, or continue with more advanced C programming in Chapter 2.
We hope you enjoy your journey with us!
References
William Shotts. “The Linux Command Line”, LinuxCommand.org, https://linuxcommand.org/
1. By the C, by the C, by the Beautiful C
“By the Beautiful Sea”, Carroll and Atteridge, 1914
This chapter presents an overview of C programming written for students who have some experience programming in another language. It’s specifically written for Python programmers and uses a few Python examples for comparison purposes (Appendix 1 is a version of Chapter 1 for Java programmers). However, it should be useful as an introduction to C programming for anyone with basic programming experience in any language.
C is a high-level programming language like other languages you might know, such as Python, Java, Ruby, or C++. It’s an imperative and a procedural programming language, which means that a C program is expressed as a sequence of statements (steps) for the computer to execute and that C programs are structured as a set of functions (procedures). Every C program must have at least one function, the main function, which contains the set of statements that execute when the program begins.
The C programming language is less abstracted from the computer’s machine language than some other languages with which you might be familiar. This means that C doesn’t support object-oriented programming (as Python, Java, and C++ do), and it lacks a rich set of high-level programming abstractions (such as strings, lists, and dictionaries in Python). As a result, if you want to use a dictionary data structure in your C program, you need to implement it yourself, as opposed to just importing the one that is part of the programming language (as in Python).
C’s lack of high-level abstractions might make it seem like a less appealing programming language to use. However, being less abstracted from the underlying machine makes it easier for a C programmer to see and understand the relationship between a program’s code and the computer’s execution of it. C programmers retain more control over how their programs execute on the hardware, and they can write code that runs more efficiently than equivalent code written using the higher-level abstractions provided by other programming languages. In particular, they have more control over how their programs manage memory, which can have a significant impact on performance. Thus, C remains the de facto language for computer systems programming where low-level control and efficiency are crucial.
We use C in this book because of its expressiveness of program control and its relatively straightforward translation to assembly and machine code that a computer executes. This chapter introduces programming in C, beginning with an overview of its features. Chapter 2 then describes C’s features in more detail.
1.1. Getting Started Programming in C
Let’s start by looking at a “hello world” program that includes an example of calling a function from the math library. In Table 1 we compare the C version of this program to the Python version. The C version might be put in a file named hello.c (.c is the suffix convention for C source code files), whereas the Python version might be in a file named hello.py.
Python version (hello.py):

```
'''
    The Hello World Program in Python
'''

# Python math library
from math import *

# main function definition:
def main():
    # statements on their own line
    print("Hello World")
    print("sqrt(4) is %f" % (sqrt(4)))

# call the main function:
main()
```

C version (hello.c):

```
/*
    The Hello World Program in C
 */

/* C math and I/O libraries */
#include <math.h>
#include <stdio.h>

/* main function definition: */
int main(void) {
    // statements end in a semicolon (;)
    printf("Hello World\n");
    printf("sqrt(4) is %f\n", sqrt(4));

    return 0;  // main returns value 0
}
```
Notice that both versions of this program have similar structure and language constructs, albeit with different language syntax\. In particular:
**Comments:**
- In Python, multiline comments begin and end with `'''`, and single-line comments begin with `#`\.
- In C, multiline comments begin with `/*` and end with `*/`, and single-line comments begin with `//`\.

**Importing library code:**
- In Python, libraries are included \(imported\) using `import`\.
- In C, libraries are included \(imported\) using `#include`\. All `#include` statements appear at the top of the program, outside of function bodies\.

**Blocks:**
- In Python, indentation denotes a block\.
- In C, blocks \(for example, function, loop, and conditional bodies\) start with `{` and end with `}`\.

**The main function:**
- In Python, `def main():` defines the main function\.
- In C, `int main(void){ }` defines the main function\. The `main` function returns a value of type `int`, which is C’s name for specifying the signed integer type \(signed integers are values like -3, 0, 1234\)\. The `main` function returns the `int` value 0 to signify running to completion without error\. The `void` means it doesn’t expect to receive a parameter\. Future sections show how `main` can take parameters to receive command line arguments\.

**Statements:**
- In Python, each statement is on a separate line\.
- In C, each statement ends with a semicolon `;`\. In C, statements must be within the body of some function \(in `main` in this example\)\.

**Output:**
- In Python, the `print` function prints a formatted string\. Values for the placeholders in the format string follow a `%` symbol in a comma-separated list of values \(for example, the value of `sqrt(4)` will be printed in place of the `%f` placeholder in the format string\)\.
- In C, the `printf` function prints a formatted string\. Values for the placeholders in the format string are additional arguments separated by commas \(for example, the value of `sqrt(4)` will be printed in place of the `%f` placeholder in the format string\)\.
There are a few important differences to note in the C and Python versions of this program:
**Indentation:** In C, indentation doesn’t have meaning, but it’s good programming style to indent statements based on the nested level of their containing block\.
**Output:** C’s `printf` function doesn’t automatically print a newline character at the end like Python’s `print` function does\. As a result, C programmers need to explicitly specify a newline character \(`\n`\) in the format string when a newline is desired in the output\.
**`main` function:**
- A C program must have a function named `main`, and its return type must be `int`\. This means that the `main` function returns a signed integer type value\. Python programs don’t need to name their main function `main`, but they often do by convention\.
- The C `main` function has an explicit `return` statement to return an `int` value \(by convention, `main` should return `0` if the main function is successfully executed without errors\)\.
- A Python program needs to include an explicit call to its `main` function to run it when the program executes\. In C, its `main` function is automatically called when the C program executes\.
#### 1\.1\.1\. Compiling and Running C Programs
Python is an interpreted programming language, which means that another program, the Python interpreter, runs Python programs: the Python interpreter acts like a virtual machine on which Python programs are run\. To run a Python program, the program source code \(`hello.py`\) is given as input to the Python interpreter program that runs it\. For example \(`$` is the Linux shell prompt\):
$ python hello.py
The Python interpreter is a program that is in a form that can be run directly on the underlying system \(this form is called **binary executable**\) and takes as input the Python program that it runs \([Figure 4](#FigPythonExecution)\)\.

Figure 4\. A Python program is directly executed by the Python interpreter, which is a binary executable program that is run on the underlying system \(OS and hardware\)
To run a C program, it must first be translated into a form that a computer system can directly execute\. A C **compiler** is a program that translates C source code into a **binary executable** form that the computer hardware can directly execute\. A binary executable consists of a series of 0’s and 1’s in a well-defined format that a computer can run\.
For example, to run the C program `hello.c` on a Unix system, the C code must first be compiled by a C compiler \(for example, the [GNU C compiler](https://gcc.gnu.org), GCC\) that produces a binary executable \(by default named `a.out`\)\. The binary executable version of the program can then be run directly on the system \([Figure 5](#FigCCompilation)\):
$ gcc hello.c
$ ./a.out
Note that some C compilers might need to be explicitly told to link in the math library by adding the `-lm` option:
$ gcc hello.c -lm

Figure 5\. The C compiler \(gcc\) builds C source code into a binary executable file \(a\.out\)\. The underlying system \(OS and hardware\) directly executes the a\.out file to run the program\.
##### Detailed Steps
In general, the following sequence describes the necessary steps for editing, compiling, and running a C program on a Unix system:
1. Using a [text editor](https://www.cs.swarthmore.edu/help/editors.html) \(for example, `vim`\), write and save your C source code program in a file \(e\.g\., `hello.c`\):
$ vim hello.c
2. Compile the source to an executable form, and then run it\. The most basic syntax for compiling with `gcc` is:
$ gcc <input_source_file>
If compilation yields no errors, the compiler creates a binary executable file named `a.out`\. The compiler also allows you to specify the name of the binary executable file to generate using the `-o` flag:
$ gcc -o <output_executable_file> <input_source_file>
For example, this command instructs `gcc` to compile `hello.c` into an executable file named `hello`:
$ gcc -o hello hello.c
We can invoke the executable program using `./hello`:
$ ./hello
After any change to the C source code \(the `hello.c` file\), the program must be recompiled with `gcc` to produce a new version of `hello`\. If the compiler detects any errors during compilation, the `hello` file won’t be created or re-created \(but beware: an older version of the file from a previous successful compilation might still exist\)\.
Often when compiling with `gcc`, you want to include several command line options\. For example, these options enable more compiler warnings and build a binary executable with extra debugging information:
$ gcc -Wall -g -o hello hello.c
Because the `gcc` command line can be long, frequently the `make` utility is used to simplify compiling C programs and for cleaning up files created by `gcc`\. [Using make and writing Makefiles](https://www.cs.swarthmore.edu/~newhall/unixhelp/howto_makefiles.html) are important skills that you will develop as you build up experience with C programming\.
We cover compiling and linking with C library code in more detail at the end of [Chapter 2](#_compilation_steps_)\.
#### 1\.1\.2\. Variables and C Numeric Types
Like Python, C uses variables as named storage locations for holding data\. Thinking about the **scope** and **type** of program variables is important to understand the semantics of what your program will do when you run it\. A variable’s **scope** defines when the variable has meaning \(that is, where and when in your program it can be used\) and its lifetime \(that is, it could persist for the entire run of a program or only during a function activation\)\. A variable’s **type** defines the range of values that it can represent and how those values will be interpreted when performing operations on its data\.
In C, all variables must be declared before they can be used\. To declare a variable, use the following syntax:
type_name variable_name;
A variable can have only a single **type**\. The basic C types include `char`, `int`, `float`, and `double`\. By convention, C variables should be declared at the beginning of their scope \(at the top of a `{ }` block\), before any C statements in that scope\.
Below is an example C code snippet that shows declarations and uses of variables of some different types\. We discuss types and operators in more detail after the example\.
vars\.c
```
{
    /* 1. Define variables in this block's scope at the top of the block. */

    int x;        // declares x to be an int type variable and allocates space for it

    int i, j, k;  // can define multiple variables of the same type like this

    char letter;  // a char stores a single-byte integer value
                  // it is often used to store a single ASCII character
                  // value (the ASCII numeric encoding of a character)
                  // a char in C is a different type than a string in C

    float winpct; // winpct is declared to be a float type
    double pi;    // the double type is more precise than float

    /* 2. After defining all variables, you can use them in C statements. */

    x = 7;        // x stores 7 (initialize variables before using their value)
    k = x + 2;    // use x's value in an expression

    letter = 'A';        // a single quote is used for single character value
    letter = letter + 1; // letter stores 'B' (ASCII value one more than 'A')

    pi = 3.1415926;

    winpct = 11 / 2.0;   // winpct gets 5.5, winpct is a float type
    j = 11 / 2;          // j gets 5: int division truncates after the decimal
    x = k % 2;           // % is C's mod operator, so x gets 9 mod 2 (1)
}
```
Note the semicolons galore\. Recall that C statements are delimited by `;`, not by line breaks: C expects a semicolon after every statement\. You’ll forget some, and `gcc` doesn’t always point directly at the missing semicolon, even though that might be the only syntax error in your program\. In fact, often when you forget a semicolon, the compiler indicates a syntax error on the line *after* the one with the missing semicolon: the reason is that `gcc` interprets the next line as part of the statement from the previous line\. As you continue to program in C, you’ll learn to correlate `gcc` errors with the specific C syntax mistakes that they describe\.
#### 1\.1\.3\. C Types
C supports a small set of built-in data types, and it provides a few ways in which programmers can construct basic collections of types \(arrays and structs\)\. From these basic building blocks, a C programmer can build complex data structures\.
C defines a s