M05 - Pointers, References, and Constants

Introduction

Memory & Variables

"What does memory look like?"
Declarations allocate memory. Variables are references to memory.
Bindings!

References

We may want to have/share a *direct reference to an object* in memory.

Pointers

We may want to have/share an *indirect reference to an object* in memory.

Const-ness

We can mark objects as constant
We can also express that a pointer:
won't change (i.e., a constant pointer),
points to something that won't change (a pointer to a constant object),
or both: it always points to the same object, which itself won't change.

So far, we have seen how to declare and use variables in C++. In this module, we're going to take a closer look into the magic that goes on when declaring variables, how we can share data, and how we can protect data from being modified.

Bindings

Declarations

Computers have memory organized in a linear fashion. Memory is divided into bytes, each byte having a unique address. Let us consider the following code:

char letter = 'A'; // 1 byte
double pi = 3.14159; // 8 bytes
int x = 5; // 4 bytes

The code above declares three variables: letter, pi, and x. Variable declarations instruct the compiler to:

allocate necessary memory to store an object of the declared type, and
bind the variable to the object in memory.

In the case of the code above, the compiler allocates 1 byte for letter, 8 bytes for pi, and 4 bytes for x. Additionally, the variables letter, pi, and x are bound to the memory locations where the objects are stored. The binding of a variable to an object is what allows us to refer to the object by its name.

Static Typing & the sizeof Operator

Notice that the size of each variable must be known at compile time for the compiler to be able to allocate memory for it.

This is possible because C++ is a statically typed language: every variable's type, and therefore its size, is fixed in the source code. In contrast, dynamically typed languages like Python let us create variables without specifying their types. In such languages, the size of the object a variable refers to is generally not known ahead of time and must be determined at runtime instead. This is one of the reasons why C++ programs typically run faster than equivalent Python programs.

The sizeof operator is a compile-time operator that returns the size of a type or variable in bytes. For example, sizeof(char) returns 1 (this is guaranteed by the language), while sizeof(double) returns 8 and sizeof(int) returns 4 on most modern platforms. You can apply sizeof to variables as well (e.g., sizeof(letter) returns 1, sizeof(pi) returns 8, and sizeof(x) returns 4 on such platforms).

Assignment

In assignment, the value of the object is copied from the right-hand side to the left-hand side. References are not re-bound in assignment (in fact, references can never be re-bound). For example:

int x = 5;
int y = x; // y is a copy of x
int z = 10;

y = 15; // y is now 15, x is still 5

z = x; // z is now 5
z = 20; // z is now 20, x is still 5

In the code above, y is a copy of x's value. Changing y does not affect x. Similarly, z is assigned the value of x, hence changing z does not affect x.

References

References enable us to create a direct (named) reference to an object in memory. They are direct in the sense that they are bound to the object they refer to, so you can treat them as if they were the object itself.

To declare a reference to an object, we declare a variable of the same type as the object and add an ampersand (&) after the type to indicate that the variable is a reference to an existing object.

int x = 5;
int &y = x; // y is a reference to x

x = 10; // x and y are now both 10
y = 20; // x and y are now both 20

The code above declares and initializes an integer variable x with the value 5. We then declare an integer reference y and bind it to (the object of) x. Now, x and y both refer to the same object in memory. So, changing the value of the object through x also changes the value of the object through y, and vice versa.

Pass-by-Reference

References are useful for sharing objects between functions. Consider the following function:

Player getHighestScorePlayer(std::vector<Player> players) { ... }

The function above takes a vector of players and returns the player with the highest score. However, the function makes a copy of the vector of players, which can be inefficient if the vector is large.

A better approach would be to pass in a reference to the existing vector of players:

// Pass by reference (notice the &)
Player getHighestScorePlayer(std::vector<Player> &players) { ... }

We call this passing by reference. By passing by reference, we avoid making a copy of the vector of players, which can be more efficient.

Addresses

So far we know that variables are bound to objects in memory; but where exactly in memory are these objects stored?

Let's consider the following variables:

int x = 5;
char letter = 'A';
double number = 3.14159;

[Diagram: Memory Layout]

As we can see in the diagram above, the variables x, letter, and number are bound to objects in memory that have unique addresses. For instance, the addresses of x, letter, and number are 0x1000, 0x1004, and 0x1005.

Depending on the system, the size of the memory address can vary. For example, a 32-bit system uses 4 bytes to store memory addresses, while a 64-bit system uses 8 bytes. In the example above, we are using 4-byte (32-bit) memory addresses since each address can be represented using 8 hexadecimal digits.

Address-of Operator

The address-of operator (&) returns the memory address of an object in memory.

For example, &x returns 0x1000, &letter returns 0x1004, and &number returns 0x1005.

Pointers

Pointers enable us to create an indirect reference to an object in memory. They are indirect in the sense that they store the address of the object they refer to, so you must dereference them to access the object.

To declare a pointer to an object, we declare a variable of the same type as the object and add an asterisk (*) after the type to indicate that the variable is a pointer.

int x = 5; // at address 0x0100
int *y = &x; // y is a pointer to x

Here, y is a pointer to x. That is, y stores the address of x.

Uninitialized Pointers

Similarly to how uninitialized variables have undefined values, uninitialized pointers (which are just variables) have undefined values.

Player *ptr; // uninitialized pointer, probably pointing to garbage

Dereferencing an uninitialized pointer is undefined behavior and can result in your program attempting to access memory it doesn't have access to or reading garbage values.

To avoid this, always initialize pointers to nullptr if you don't have a valid address to assign to them.

Player *player = loadPlayer(); // pointer is now possibly pointing to a valid object

if (player != nullptr) {
    // Dereference the pointer
    std::cout << "Player's name: " << player->getName() << std::endl;
} else {
    std::cout << "Failed to load player." << std::endl;
}

Dereferencing

We can dereference a pointer to access the object it points to. To dereference a pointer, we use the asterisk (*) operator before the pointer variable.

int x = 5;
int *y = &x;

*y = 10; // x is now 10 (and y remains unchanged)

Here, we assigned the object pointed to by y the value 10.

Pass-by-Pointer

Pointers are useful when we want to share a reference whose underlying object may change.

Consider the following program:

int main() {
    // Load players from a file
    std::vector<Player *> players = loadPlayers();

    Player *lastWinner = getHighestScoringPlayer(players);
    while (true) {
        // Play a round
        ...

        // Report the winner
        Player *winner = getHighestScoringPlayer(players);
        if (winner != lastWinner) {
            std::cout << "New winner!" << std::endl;
            std::cout << "Last round's winner lost with a score of " << (*lastWinner).getScore() << " points." << std::endl;
            lastWinner = winner;
        } else {
            std::cout << "Winner is on a streak!" << std::endl;
        }
    }
}

At the end of every round we want to report if there is a new winner. We can use pointers to keep track of the last winner. This way, we can compare the last winner with the current winner to determine if there is a new winner or they're on a streak.

Notice that since we're storing a pointer to the last winner, we can access the last winner's current score without having to find them in the players vector.

Arrow Operator

The arrow operator (->) is a shorthand for dereferencing a pointer and accessing a member of the object it points to.

For example, (*lastWinner).getScore() is equivalent to lastWinner->getScore().

Constants

Constant Objects

We can mark objects as constant by using the const keyword. The notion of a constant object has two implications:

The object cannot be reassigned a new value through assignment.
The object cannot be internally modified.

int const x = 5;
x = 10; // Error: assignment of read-only variable 'x'
*(&x) = 10; // Error: assignment of read-only location '*(&x)'

Notice that the const keyword is placed after the type. In general, const applies to whatever is immediately to its left; here that is int, so x is a constant int. The const keyword can also be placed before the type (as in const int x), but we will soon see why this is an exception, not the rule.

Pointers, just like other objects, can be marked as constant. A constant pointer cannot be reassigned a new address. This means that the pointer itself is constant, not the object it points to. We can mark a pointer as constant by placing the const keyword after the asterisk, that is, after the full pointer type (int *).

int x = 5;
int y = 10;

int *const ptr = &x; // ptr is a constant pointer to x
*ptr = 10; // x is now 10

ptr = &y; // Error: assignment of read-only variable 'ptr'

Notice that the object pointed to by ptr can still be modified, but the pointer itself cannot be reassigned a new address.

In general, appending the const keyword after the type makes the object constant (of that type).

Constant References & Pointers

There are often cases where we want to share immutable references to mutable objects. For example, we may want to share an object with a function but we don't want the function to modify the object. At the same time, you may still want or need the object to be mutable elsewhere.

For example, consider the following function:

void reportTransaction(Transaction *transaction) { ... }

This function takes a pointer to a transaction (more specifically, a pointer to a mutable object). This allows the reportTransaction function to potentially modify the transaction, which may not be desirable.

We can specify that there is no need to modify the transaction by marking the parameter as a pointer to a constant object:

void reportTransaction(const Transaction *transaction) {
    transaction->withdraw(100); // Error: passing 'const Transaction' as 'this' argument discards qualifiers
    transaction = nullptr; // (this is fine)
}

This way, the reportTransaction function can still access the transaction but cannot modify it. It can however modify the pointer itself (e.g., incrementing the pointer to point to the next transaction).

We can also prevent the pointer from being modified by marking it as a constant pointer to a constant object:

void reportTransaction(const Transaction *const transaction) {
    transaction->withdraw(100); // Error: passing 'const Transaction' as 'this' argument discards qualifiers
    transaction = nullptr; // Error: assignment of read-only variable 'transaction'
}

As for references, they can never be reassigned, so marking a reference as constant is redundant. However, we can mark a reference as a (constant) reference to a constant object:

int x = 5;

const int &ref = x; // ref is a reference to a constant int, bound to x
ref = 10; // Error: assignment of read-only reference 'ref'

In this case, ref is a reference to a constant object, bound to x. This means that the object cannot be modified through ref, although x itself remains mutable and can still be changed directly.

Verbology

It is often easiest to read declarations that involve the const keyword from right to left. For example:

T *foo - foo is a pointer to a T object
- foo can be reassigned
- what foo points to can be modified
T *const foo - foo is constant and is a pointer to a T object
- foo cannot be reassigned
- what foo points to can be modified
T const *foo - foo is a pointer to a constant T object
- foo can be reassigned
- what foo points to cannot be modified
T const *const foo (or const T *const foo) - foo is constant and is a pointer to a constant T object
- foo cannot be reassigned
- what foo points to cannot be modified
