Today we continue the series by introducing variables and how they’re initialized. This is another basic topic with surprising complexity for C# developers.

Declaration

The basic form of a variable declaration should be very familiar to C# developers:

int x;

Just like in C#, we state the type of the variable, the variable’s name, and end with a semicolon. We can also declare multiple variables in one statement:

int x, y, z;

Also like C#, these variables do not yet have a value. Consider trying to read the value of such a variable:

int x;
int y = x;

In C#, this would result in a compiler error on the second line. The compiler knows that x doesn’t have a value, so it can’t be read and assigned to y. In C++, reading an uninitialized variable like this is “undefined behavior.” When a program contains undefined behavior, the compiler is free to generate arbitrary code for the entire executable. It may or may not produce a warning or error, meaning it may silently produce an executable that doesn’t do what the author thinks it should do. It is very important to never invoke undefined behavior, and tools such as compiler warnings and sanitizers have been written to help avoid it.

This undefined behavior does have a purpose: speed. Consider this:

int localPlayerHealth;
foreach (Player p in players)
{
    if (p.IsLocal)
    {
        localPlayerHealth = p.Health;
        break;
    }
}
Debug.Log(localPlayerHealth);

We know that one player has to be the local player because that’s how our game was designed, so it’s safe to not initialize localPlayerHealth before the loop. Initializing it to 0 would be wasteful in this case. However, the C# compiler doesn’t know about our game design, so it can’t prove that we’ll always find the local player and forces us to initialize anyway.

In C++, we’re free to skip this initialization and assume the risk of undefined behavior if it turns out there really wasn’t a local player in the players array. Alternatively, we can replicate the C# approach and just initialize the variable to be safe.

Initialization

C++ provides a lot of ways to initialize variables. We’ve already seen one above where a value is copied:

int x = y;

There are also some other ways that aren’t in C#:

int x{}; // x is filled with zeroes, so x == 0
int x{123};
int x(123);

Many more types of initialization exist, but are specific to certain types such as arrays and classes. We’ll cover these later in the series.

All of these initialization strategies can be combined when declaring multiple variables in one statement:

int a, b = 123, c{}, d{456}, e(789);

This results in these values:

| Variable | Value |
|---|---|
| a | (Unknown) |
| b | 123 |
| c | 0 |
| d | 456 |
| e | 789 |

Type Deduction

In C#, we can use var to avoid needing to specify the type of our variables. Similarly, C++ has the auto keyword:

auto x = 123;
auto x{123};
auto x(123);

Also similar to C#, we can only use auto when there is an initializer. The following isn't allowed:

auto x;
auto x{};

It's important to remember that x is just as strongly-typed as if int were explicitly specified. All that's happening here is that the compiler is figuring out what type the variable should be rather than us typing it out manually.

An alternative approach, much less frequently seen, is to use the decltype specifier. When given the name of a variable, it resolves to that variable's declared type:

int x;
decltype(x) y = 123; // y is an int

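Lastly, the register keyword was deprecated in C++11 and removed as a storage class specifier in C++17:

register int x = 123;

It used to request that the variable be placed into a CPU register rather than in RAM, such as on the stack. Compilers have long ignored this request. Since C++17 the keyword is reserved but unused, so this declaration is no longer valid standard C++, though some compilers may still accept it with a warning. It's best to avoid this keyword now.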

Identifiers

The rules for naming C++ identifiers are similar to the rules for C#. They must begin with a letter, an underscore, or another permitted non-digit Unicode character. After that, they can also contain digits. Most Unicode letters are allowed, but characters such as punctuation and whitespace are not.

Additionally, there are some restrictions on the names we can choose:

| Restriction | Example | Where |
|---|---|---|
| All keywords | int for | All code |
| operator then an operator symbol | operator+ | All code |
| ~ then a class name | ~MyClass | All code |
| Any name beginning with double underscores | int __x | All code except the Standard Library |
| Any name beginning with an underscore then a capital letter | int _X | All code except the Standard Library |
| Any name beginning with an underscore | int _x | All code in the global namespace except the Standard Library |

There is no equivalent to C#'s "verbatim identifiers" (e.g. int @for) to work around the keyword restriction.

Pointers

Like C#, at least when "unsafe" features are enabled, C++ has pointer types. The syntax is even similar:

int* x;
int * x;
int *x;

The placement of the * is flexible, just like in C#. However, declaring multiple variables in one line is different in C++. Consider this declaration:

int* x, y, z;

The types of y and z differ between the languages since the * only attaches to one variable in C++:

| Language | Type of x | Type of y | Type of z |
|---|---|---|---|
| C# | int* | int* | int* |
| C++ | int* | int | int |

To make all three variables into pointers in C++, add a * to each:

int *x, *y, *z;

Or omit a * so that only some are pointers:

int *x, y, *z; // x and z are int*, y is int

We'll cover how to actually use pointers more in depth later in the series.

References

C++ has two kinds of references: "lvalue" and "rvalue." Just like with pointers, these are an annotation on another type:

// lvalue references
int& x;
int & x;
int &x;
 
// rvalue references
int&& x;
int && x;
int &&x;

When declaring more than one variable per statement, the same rule applies here: & or && only attaches to one variable:

int &x, y; // x is an int&, y is an int

Taken all together, this means we can declare several variables per statement and each can have their own modifier on the stated type:

int a, *b, &c, &&d;

The variables get these types:

| Variable | Type |
|---|---|
| a | int |
| b | int* |
| c | int& |
| d | int&& |

We'll dive into the details of how lvalue references and rvalue references work later in the series. For now, it's important to know that they are like non-nullable pointers. This means we must initialize them when they are declared. All of the above lines will fail to compile since we didn't. So let's correct that:

int x = 123;
int& y = x;
 
int&& z = 456;

Here we have y as an "lvalue reference to an int." We initialize it to an lvalue, which is essentially anything with a name. x has a name and is the right type: int. The result is that y now references x.

z is an "rvalue reference to an int." An rvalue is essentially anything without a name. We initialize it to 456 which has no name but does have the right type: int. This means that z now references a temporary int holding 456, and that temporary stays alive for as long as z does.

Putting this back together, we end up with multiple variables being declared and initialized when required like this:

int x = 123;
int a, *b, &c = x, &&d = 456;

Conclusion

At a high level, variables in C++ are similar to C#. In the details though, there are very important differences. The undefined behavior that comes from reading uninitialized variables, the pointer and reference symbols that apply to only one variable in a multi-variable declaration, the various new kinds of initialization syntax, and the presence of both lvalue and rvalue references all make for a pretty different landscape even in this basic category of variables.

Later in the series, we'll expand this topic when we discuss classes, arrays, function pointers, lambdas, and all kinds of other exotic topics. Stay tuned!