JacksonDunstan.com

Today we’ll begin exploring the C++ Standard Library. As C++ is mostly a superset of C, the C++ Standard Library is mostly a superset of the C Standard Library. So we’ll begin there!

Background

First, a word of caution: the C Standard Library is very old. Most of it dates back at least 30 years and even the newer parts are about 10 years old and built to fit in with the original design. The C language itself is also very simple. Its lack of features impacts the library design.

For example, there are “families” of functions that all do the same thing but on different data types. To take an absolute value of a floating point value we call fabs for double, fabsf for float, and fabsl for long double. In C++, we’d just overload abs with different parameter types and the compiler would choose the right one to call.

The C++ Standard Library includes many more modern designs that rely on C++ language features. It has that abs overloaded function, for example. The C Standard Library is included in the C++ Standard Library largely as part of C++’s broad goal to maintain a high degree of compatibility with C code. There are a few parts of it that are genuinely useful on their own, but these are few and far between.
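
As a small sketch of that difference, the following compares the C-style family with the overloaded std::abs. The function name AbsMatchesFamily is made up for illustration and is not part of either library:

```cpp
#include <math.h>   // ::fabsf, ::fabs, ::fabsl (the C-style family)
#include <cmath>    // floating-point overloads of std::abs
#include <cstdlib>  // integer overloads of std::abs

// Hypothetical helper: checks that the overloaded std::abs agrees with
// the C-style family for each floating-point type
bool AbsMatchesFamily()
{
    float f = -1.5f;
    double d = -2.5;
    long double ld = -3.5L;

    // C style: one function name per type
    // C++ style: one name, and the compiler picks the overload
    return std::abs(f) == fabsf(f)
        && std::abs(d) == fabs(d)
        && std::abs(ld) == fabsl(ld)
        && std::abs(-10) == 10;
}
```

Calling AbsMatchesFamily() should return true: the overloads do the same work as the family, just under a single name.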

Still, 30+ years of momentum is a powerful force and it’s extremely common to see the C Standard Library in use even when more modern alternatives are available. That makes it important for us to understand, as many C++ codebases include some C Standard Library usage.

We’re not going to go in depth and cover every little corner of the C Standard Library today, but we’ll survey its highlights.

General Purpose

As for composition, the C++ Standard Library is made up of header files. As of C++20, modules are also available. The C Standard Library is available only as header files. C Standard Library header files are named with a .h extension: math.h. These can be included directly into C++ files: #include <math.h>. They are also wrapped by the C++ Standard Library. The wrapped versions begin with a c and drop the .h extension, so we can #include <cmath>. These wrapped header files place everything in the std namespace and may also place everything in the global namespace so both std::fabs and ::fabs work.

There’s one truly general purpose header file in the C Standard Library: stdlib.h/cstdlib. Unlike a more focused header file like math.h/cmath that obviously focuses on mathematics, a variety of utilities are provided by this header. Some of the basics include size_t, the type that the sizeof operator evaluates to, and NULL, a null pointer constant widely used before the advent of nullptr in C++11. The broad nature of this header file makes it hard to compare to C#, but it can roughly be thought of as the System namespace:

#include <stdlib.h>
 
// sizeof() evaluates to size_t
size_t intSize = sizeof(int);
DebugLog(intSize); // Maybe 4
 
// NULL can be used as a pointer to indicate "null"
int* ptr = NULL;
 
// It's vulnerable to accidental misuse in arithmetic
int sum = NULL + NULL;
 
// nullptr isn't: this is a compiler error
int sum2 = nullptr + nullptr;

Before C++ introduced the new and delete operators for dynamic memory allocation, C code would use the malloc, calloc, realloc, and free functions. The C# equivalent of malloc is Marshal.AllocHGlobal, realloc is Marshal.ReallocHGlobal, and free is Marshal.FreeHGlobal:

// Allocate 1 KB of uninitialized memory
// Returns null upon failure
// Memory is untyped, so casting is required to read or write
void* memory = malloc(1024);
 
// Reading it before initialization is undefined behavior
int firstInt = ((int*)memory)[0];
 
// Release the memory. Failing to do so is a memory leak.
free(memory);
 
// Allocate and initialize to all zeroes 1 KB of memory: 256 x 4 bytes
memory = calloc(256, 4);
 
// Re-allocate previously-allocated memory to get more or less
// Old memory is not freed if allocation fails
memory = realloc(memory, 2048);
 
// Also need to release memory from calloc and realloc
free(memory);
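
Because realloc leaves the old memory allocated when it fails, assigning its result straight back to the only pointer we hold would leak that memory. One way to guard against this might look like the following sketch. The helper name Resize is hypothetical and the sketch assumes newSize is greater than zero:

```cpp
#include <cstdlib>

// Hypothetical helper: grows (or shrinks) a malloc'd buffer without
// leaking it when realloc fails. Returns true on success.
bool Resize(void*& memory, std::size_t newSize)
{
    // Keep the old pointer: realloc returns null on failure and
    // leaves the original allocation untouched
    void* resized = std::realloc(memory, newSize);
    if (resized == nullptr)
    {
        return false; // 'memory' is still valid and still owned by the caller
    }
    memory = resized;
    return true;
}
```

The caller still has to free the memory exactly once, whether or not the resize succeeded.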

There are some functions to parse numbers from strings, similar to int.Parse, float.Parse, etc.:

// Parse a double
double d = atof("3.14");
DebugLog(d); // 3.14
 
// Parse an int
int i = atoi("123");
DebugLog(i); // 123
 
// Parse a float and get a pointer to its end in a string
const char* floatStr = "2.2 123.456";
char* pEnd;
float f = strtof(floatStr, &pEnd);
DebugLog(f); // 2.2
 
// Use the end pointer to parse more
f = strtof(pEnd, &pEnd);
DebugLog(f); // 123.456
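
The end pointer is also the usual way to detect bad input, since atoi and atof return 0 when nothing could be parsed and that looks the same as a valid "0". Here is a minimal sketch using strtol, with ParseLong being a made-up helper name:

```cpp
#include <cstdlib>

// Hypothetical helper: parses a base-10 long and reports whether the
// whole string was consumed
bool ParseLong(const char* str, long& out)
{
    char* end = nullptr;
    long value = std::strtol(str, &end, 10);

    // No digits were parsed if 'end' never moved.
    // Trailing garbage remains if 'end' doesn't point at the terminating NUL.
    if (end == str || *end != '\0')
    {
        return false;
    }
    out = value;
    return true;
}
```

This sketch does not check for out-of-range values. Those are reported through errno rather than through the end pointer.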

Some generic algorithms are provided, similar to the C# Array class as well as Random and Math:

// Seed the global randomizer
// This is not thread-safe
srand(123);
 
// Use the global randomizer to generate a random number 
int r = rand();
DebugLog(r); // Maybe 440
 
// Compare pointers to ints
auto compare = [](const void* a, const void* b) {
    return *(int*)a - *(int*)b;
};
 
// Sort an array
int a[] = { 4, 2, 1, 3 };
qsort(a, 4, sizeof(int), compare);
DebugLog(a[0], a[1], a[2], a[3]); // 1, 2, 3, 4
 
// Binary search the array for 2
int valToFind = 2;
int* pVal = (int*)bsearch(&valToFind, a, 4, sizeof(int), compare);
int index = pVal - a;
DebugLog(index); // 1
 
// Take an absolute value
DebugLog(abs(-10)); // 10
 
// Divide and also get the remainder
// stdlib.h/cstdlib also provides the div_t struct type
div_t d = div(11, 3);
DebugLog(d.quot, d.rem); // 3, 2
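
The subtraction-based comparison above is fine for small values, but it can overflow for operands that are far apart, such as INT_MIN and 1. A sketch of a comparison that avoids this, using the hypothetical names CompareInts and SmallestAfterSort, might look like this:

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical comparator: avoids the signed overflow that subtraction
// can hit for operands such as INT_MIN and 1
int CompareInts(const void* a, const void* b)
{
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs); // -1, 0, or 1
}

// Hypothetical helper: sorts with qsort and returns the first element
int SmallestAfterSort(int* values, std::size_t count)
{
    std::qsort(values, count, sizeof(int), CompareInts);
    return values[0];
}
```

The same comparator can be passed to bsearch, since both functions expect the same signature.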

Finally, there’s some OS-related functionality:

// Run a system command
int exitCode = system("ping example.com");
DebugLog(exitCode); // 0 if successful
 
// Get an environment variable
char* path = getenv("PATH");
DebugLog(path); // Path to executables
 
// Register a function (lambda in this case) to be called when the program exits
atexit([]{ DebugLog("Exiting..."); });
 
// Explicitly exit the program with an exit code
exit(1); // Exiting...

Math and Numbers

The next category of header in the C Standard Library relates to mathematics. One we’ve seen throughout the series is stdint.h/cstdint, which provides integer types via typedef. Basic types like int have guaranteed sizes in C#, but this header file goes above and beyond to also define types that fulfill particular requirements:

#include <stdint.h>
 
int32_t i32; // Always signed 32-bit
int_fast32_t if32; // Fastest signed integer type with at least 32 bits
intptr_t ip; // Signed integer that can hold a pointer
int_least32_t il; // Smallest signed integer with at least 32 bits
intmax_t imax; // Biggest available signed integer
 
// Range of 32-bit integer values
DebugLog(INT32_MIN, INT32_MAX); // -2147483648, 2147483647
 
// Biggest size_t
DebugLog(SIZE_MAX); // Maybe 18446744073709551615

There are also some types in stddef.h/cstddef. Some of these are more types that satisfy particular requirements. Unusually, there are also types that are C++-specific in the cstddef version of this header:

#include <cstddef>
 
// C and C++ types
std::max_align_t ma; // Type with the biggest alignment
std::ptrdiff_t pd; // Big enough to hold the subtraction of two pointers
 
// C++-specific types
std::nullptr_t np = nullptr; // The type of nullptr
std::byte b; // An "enum class" version of a single byte

limits.h/climits also has some maximum and minimum macros, equivalent to int.MaxValue and similar in C#:

#include <limits.h>
 
// Range of int values
DebugLog(INT_MIN, INT_MAX); // Maybe -2147483648, 2147483647
 
// Range of char values
DebugLog(CHAR_MIN, CHAR_MAX); // Maybe -128, 127

The inttypes.h/cinttypes header also has integer-related utilities. These are needed because conversions to and from strings aren’t built into the language as they are in C# with functions like int.Parse:

#include <inttypes.h>
 
// Parse a hexadecimal string to an int
// The nullptr means we don't want to get a pointer to the end
intmax_t i = strtoimax("f0a2", nullptr, 16);
DebugLog(i); // 61602
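
This header also defines format macros such as PRId64 for printing the fixed-width types with the printf family, since the right length modifier for a type like int64_t differs between platforms. A rough sketch, where FormatInt64 is a made-up helper:

```cpp
#include <cinttypes> // PRId64 and friends
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit value portably
std::string FormatInt64(std::int64_t value)
{
    char buf[32]; // Plenty for 19 digits, a sign, and the NUL

    // PRId64 expands to the right printf length modifier for int64_t,
    // such as "ld" or "lld" depending on the platform
    std::snprintf(buf, sizeof(buf), "%" PRId64, value);
    return buf;
}
```

The macro is a string literal, so it relies on adjacent string literals being concatenated by the compiler.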

Similarly, float.h/cfloat provides a bunch of floating point macros similar to what C# provides via constants like float.MaxValue:

#include <float.h>
 
// Biggest float
float f = FLT_MAX;
DebugLog(f); // 3.40282e+38
 
// Difference between 1.0 and the next larger float
float ep = FLT_EPSILON;
DebugLog(ep); // 1.19209e-07

fenv.h/cfenv gives us fine-grain control over how the CPU deals with floating point numbers. There’s no real equivalent to this in C#:

#include <fenv.h>
 
// Clear CPU float exceptions. Different than C++ exceptions.
feclearexcept(FE_ALL_EXCEPT);
 
// Divide by zero
// Use volatile to prevent the compiler from removing this
volatile float n = 1.0f;
volatile float d = 0.0f;
volatile float q = n / d;
 
// Check float exceptions to see if this was a divide by zero or produced
// an inexact result
int divByZero = fetestexcept(FE_DIVBYZERO);
int inexact = fetestexcept(FE_INEXACT);
DebugLog(divByZero != 0); // true
DebugLog(inexact != 0); // false
 
// Clear float exceptions
feclearexcept(FE_ALL_EXCEPT);
 
// Perform a division whose quotient can't be represented exactly
d = 10.0f;
q = n / d;
 
// Check float exceptions
divByZero = fetestexcept(FE_DIVBYZERO);
inexact = fetestexcept(FE_INEXACT);
DebugLog(divByZero != 0); // false
DebugLog(inexact != 0); // true

Strings and Arrays

The next category of headers deals with strings and arrays. Let’s start with string.h/cstring which has a lot of operations that are built into the string class, managed arrays, and Buffer in C#:

#include <string.h>
 
// Compare strings: 0 for equality, negative for less than, positive for greater than
DebugLog(strcmp("hello", "hello")); // 0
DebugLog(strcmp("goodbye", "hello")); // Negative, maybe -1
 
// Copy a string
char buf[32];
strcpy(buf, "hello");
DebugLog(buf); // hello
 
// Concatenate strings
strcat(buf + 5, " world");
DebugLog(buf); // hello world
 
// Count characters in a string (its length)
// This iterates until NUL is found
DebugLog(strlen(buf)); // 11
 
// Get a pointer to the first occurrence of a character in a string
DebugLog(strchr(buf, 'o')); // o world
 
// Get a pointer to the first occurrence of a string in a string
DebugLog(strstr(buf, "ll")); // llo world
 
// Get a pointer to the next "token" in a string, separated by a delimiter
// Stores state globally: not thread-safe
char* next = strtok(buf, " ");
DebugLog(next); // hello
next = strtok(nullptr, ""); // null means to continue the global state
DebugLog(next); // world
 
// Copy the first three bytes of buf ("hel") to later in the buffer
memcpy(buf + 3, buf, 3);
DebugLog(buf); // helhelworld
 
// Set all bytes in buf to 65 and put a NUL at the end
memset(buf, 65, 31);
buf[31] = 0;
DebugLog(buf); // AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
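
The memcpy call above is safe because the source and destination ranges don't overlap. When they do overlap, memcpy is undefined behavior and memmove is the function to reach for. A minimal sketch, assuming a hypothetical helper named ShiftRight and a buffer with enough spare room:

```cpp
#include <cstring>

// Hypothetical helper: shifts a NUL-terminated string right by 'count'
// bytes within its own buffer. Assumes the buffer has room for
// strlen(buf) + count + 1 bytes.
void ShiftRight(char* buf, std::size_t count)
{
    // Source and destination overlap, so memcpy would be undefined behavior.
    // memmove copies as if through a temporary buffer.
    std::memmove(buf + count, buf, std::strlen(buf) + 1); // +1 copies the NUL
}
```

The first 'count' bytes of the buffer keep their old contents, so the caller decides what to write there.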

wchar.h/cwchar is the equivalent for “wide” characters. Support for various character types in C# is provided by the System.Text namespace, which has similar functionality to what’s in this header:

#include <wchar.h>
 
wchar_t input[] = L"foo,bar,baz";
 
// Get the first token, using 'state' to hold the tokenization state
wchar_t* state;
wchar_t* token = wcstok(input, L",", &state);
DebugLog(token); // foo
 
// Get the second token
token = wcstok(nullptr, L",", &state);
DebugLog(token); // bar
 
// Get the third token
token = wcstok(nullptr, L",", &state);
DebugLog(token); // baz

ctype.h/cctype has functions related to just characters. These functions return int rather than bool, so 0 is used instead of false and non-0 is used instead of true. C# doesn’t use ASCII natively, so this is approximated by ASCIIEncoding there:

#include <ctype.h>
 
// Check for alphabetical characters
DebugLog(isalpha('a') != 0); // true
DebugLog(isalpha('9') != 0); // false
 
// Check for digit characters
DebugLog(isdigit('a') != 0); // false
DebugLog(isdigit('9') != 0); // true
 
// Change to uppercase
DebugLog(toupper('a')); // A

wctype.h/cwctype is the equivalent for “wide” characters. A lot of this is built into the char type in C#:

#include <wctype.h>
 
// Check for alphabetical characters
DebugLog(iswalpha(L'a') != 0); // true
DebugLog(iswalpha(L'9') != 0); // false
 
// Check for digit characters
DebugLog(iswdigit(L'a') != 0); // false
DebugLog(iswdigit(L'9') != 0); // true
 
// Change to uppercase
DebugLog(towupper(L'a')); // A

uchar.h/cuchar has character conversion functions. The “encoding” classes in C#’s System.Text namespace provide for these conversions in .NET:

#include <stdint.h>
#include <stdlib.h>
#include <uchar.h>
 
// Convert to UTF-16
char input[] = "A";
char16_t output;
mbstate_t state{};
size_t len = mbrtoc16(&output, input, MB_CUR_MAX, &state);
DebugLog(len); // 1
uint8_t* outputBytes = (uint8_t*)&output;
DebugLog(outputBytes[0], outputBytes[1]); // 65, 0 on a little-endian CPU

Language Tools

This category of header files includes a range of tools that aren’t part of the C or C++ language, but are closely tied to it or would be built into other languages.

First up is stdarg.h/cstdarg. This header contains the required types and macros to implement variadic functions. These are uncommonly used in C++ since variadic templates are available, easier to use, and type-safe. In C#, we’d use the params keyword to have the compiler generate a managed array of arguments at the call site. Here’s how to use the va_ macros to implement a variadic function:

#include <stdarg.h>
 
// The "..." indicates a variadic function
void PrintLogs(int count, ...)
{
    // A va_list holds the state
    va_list args;
 
    // Use the "va_start" macro to start getting args
    va_start(args, count);
 
    for (int i = 0; i < count; ++i)
    {
        // Use the "va_arg" macro to get the next arg
        const char* log = va_arg(args, const char*);
        DebugLog(log);
    }
 
    // Use the "va_end" macro to stop getting args
    va_end(args);
}
 
// Call the variadic function
PrintLogs(3, "foo", "bar", "baz"); // foo, bar, baz
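
For comparison, here is a rough sketch of the variadic template approach mentioned above. JoinLogs is a made-up name, and it returns a joined string rather than logging so that the result is easy to inspect:

```cpp
#include <string>

// Hypothetical variadic template: joins any number of arguments that
// can be appended to a std::string. No count parameter is needed
// and a wrong argument type is a compiler error.
template <typename... TArgs>
std::string JoinLogs(const TArgs&... args)
{
    std::string result;

    // Pack expansion inside a braced list runs the expression once per
    // argument, in order (C++17 fold expressions can do this more tersely)
    int expand[] = { 0, ((result += args), (result += ','), 0)... };
    (void)expand;

    if (!result.empty())
    {
        result.pop_back(); // Drop the trailing comma
    }
    return result;
}
```

With this version, JoinLogs("foo", "bar", "baz") produces "foo,bar,baz", while passing a type that cannot be appended to a std::string fails to compile instead of misbehaving at runtime.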

Next there is assert.h/cassert containing the assert macro. If the NDEBUG preprocessor symbol isn’t defined, this checks a condition and, if it’s false, calls std::abort to end the program, possibly with additional debugging steps such as breaking in an interactive debugger. If the condition is true, nothing happens. If NDEBUG is defined, the condition itself is stripped out of the program and not compiled. In C#, we’d use the [Conditional] attribute to build an assert or make use of the existing Debug.Assert:

#include <assert.h>
 
assert(2 + 2 == 4); // OK
assert(2 + 2 == 5); // Calls std::abort and maybe more

Then we have setjmp.h/csetjmp, used to implement a high-powered version of goto. This can jump out of a function, but because it bypasses the normal language rules, the destructors that would normally clean up local objects are not called. None of this is available in C#:

#include <setjmp.h>
 
// Saved execution state
jmp_buf buf;
 
// Use volatile to prevent the compiler from optimizing this away
volatile int count = 0;
 
void Goo()
{
    count++;
    DebugLog("Goo calling longjmp with", count);
 
    // Go to the saved execution state and pass 'count' as the 'status'
    longjmp(buf, count);
}
 
void Foo()
{
    DebugLog("Foo");
 
    // Save the execution state
    // When longjmp is called, execution goes here
    // The passed 'status' is "returned" from setjmp
    int status = setjmp(buf);
    DebugLog("Foo got status", status);
    if (status >= 3)
    {
        return;
    }
    DebugLog("Foo calling Goo");
    Goo();
}

This prints the following:

Foo
Foo got status, 0
Foo calling Goo
Goo calling longjmp with, 1
Foo got status, 1
Foo calling Goo
Goo calling longjmp with, 2
Foo got status, 2
Foo calling Goo
Goo calling longjmp with, 3
Foo got status, 3

Lastly, there’s errno.h/cerrno. This header provides the errno macro that holds an error flag used by several C Standard Library functions. It was originally a single global value, but has been thread-local since C++11. This is generally considered to be a poor way of handling errors as the caller needs to know to check something that isn’t part of the function signature, and a later library call may overwrite the value before it’s checked. It’s never used in C#, so there’s really no equivalent. It is widely used in the C Standard Library though, so let’s see how it works:

#include <errno.h>
#include <math.h>
 
// Pass an invalid argument to sqrt (from math.h)
float root = sqrt(-1.0f);
 
// It returns NaN
DebugLog(root); // NaN
 
// Implementations that report math errors via errno set it to EDOM (out of domain)
DebugLog(errno); // Maybe 33
 
// Check that this is what was set
DebugLog(errno == EDOM); // true
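
One detail worth sketching: library functions never reset errno to 0, so a stale value from an earlier call can look like a fresh error. The usual pattern is to clear it before the call we care about. OverflowsLong is a hypothetical helper built on strtol, which reports out-of-range input via ERANGE:

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: returns true if 'str' holds a number too large
// or too small to fit in a long
bool OverflowsLong(const char* str)
{
    // Library functions never reset errno to 0, so clear any stale
    // error before the call
    errno = 0;
    long ignored = std::strtol(str, nullptr, 10);
    (void)ignored; // Only the error flag matters here

    // strtol reports out-of-range input by setting errno to ERANGE
    return errno == ERANGE;
}
```

The check has to happen right after the call, before any other library function gets a chance to change errno.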

System Integration

The last category of header files deals with the system on which we run our programs. Let’s start with time.h/ctime which is like a basic version of DateTime in C#:

#include <math.h>
#include <time.h>
 
// Get the time in the return value and in the pointer we pass
time_t t1{};
time_t t2 = time(&t1);
DebugLog(t1, t2); // Maybe 1612052060, 1612052060
 
// Get the amount of CPU time the program has used
// Not in relation to any particular time (like the UNIX epoch)
clock_t c1 = clock();
 
// Do something expensive we want to benchmark
volatile float f = 123456;
for (int i = 0; i < 1000000; ++i)
{
    f = sqrtf(f);
}
 
// Check the clock again
clock_t c2 = clock();
double secs = ((double)(c2) - c1) / CLOCKS_PER_SEC;
DebugLog("Took", secs, "seconds"); // Maybe: Took 0.011 seconds

We also have signal.h/csignal to deal with OS signals. This allows us to deal with signals such as being terminated by the OS and to raise such signals ourselves. This isn’t normally done with C# as the .NET environment our program is running in handles such signals:

#include <signal.h>
 
signal(SIGTERM, [](int val){DebugLog("terminated with", val); });
raise(SIGTERM); // Maybe: terminated with 15
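
Signal handlers are restricted in what they can safely do, so a common pattern is to have the handler only set a flag that the rest of the program checks later. Here is a minimal sketch of that pattern, where gotSignal, OnTerminate, and RaiseAndCheck are all made-up names:

```cpp
#include <csignal>

// Flag type that is safe to write from a signal handler
volatile std::sig_atomic_t gotSignal = 0;

// Hypothetical handler: only records that the signal arrived
void OnTerminate(int)
{
    gotSignal = 1;
}

// Hypothetical helper: installs the handler, raises SIGTERM, and
// reports whether the handler ran
bool RaiseAndCheck()
{
    gotSignal = 0;
    std::signal(SIGTERM, OnTerminate);
    std::raise(SIGTERM); // Returns after the handler returns
    return gotSignal == 1;
}
```

In a real program the flag would typically be polled from a main loop, which then shuts down cleanly outside of the handler.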

Many C Standard Library functions use a global “locale” setting to determine how they work. The locale.h/clocale header file has functions to change this setting. It’s similar to the thread-specific CultureInfo in C#:

#include <locale.h>
 
// Set the locale for everything to Japanese
// This is global: not thread-safe
setlocale(LC_ALL, "ja_JP.UTF-8");
 
// Get the global locale
lconv* lc = localeconv();
DebugLog(lc->currency_symbol); // ¥

And finally, we’ll end with the header that enables “Hello, world!” in C: stdio.h/cstdio. This is like Console in C#. There’s also file system access, similar to the methods of File in C#:

#include <stdio.h>
 
// Output a formatted string to stdout
// The first string is the "format string" with value placeholders: %s %d
// Subsequent values must match the placeholders' types
// This is a variadic function
printf("%s %d\n", "Hello, world!", 123); // Hello, world! 123
 
// Read a value from stdin
// The same "format string" is used to accept different types
int val;
int numValsRead = scanf("%d", &val);
DebugLog(numValsRead); // {1 if the user entered a number, else 0}
if (numValsRead == 1)
{
    DebugLog(val); // {Number the user typed}
}
 
// Open a file, seek to its end, get the position, and close it
FILE* file = fopen("/path/to/myfile.dat", "r");
fseek(file, 0, SEEK_END);
long len = ftell(file);
fclose(file);
DebugLog(len); // {Number of bytes in the file}
 
// Delete a file
int deleted = remove("/path/to/deleteme.dat");
DebugLog(deleted == 0); // True if the file was deleted
 
// Rename a file
int renamed = rename("/path/to/oldname.dat", "/path/to/newname.dat");
DebugLog(renamed == 0); // True if the file was renamed
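
The file snippet above assumes the file exists. fopen returns a null pointer when it can't open the file, and passing that to fseek is undefined behavior, so real code would check it first. A sketch of a checked version, with FileSize being a hypothetical helper name:

```cpp
#include <cstdio>

// Hypothetical helper: returns the size of a file in bytes or -1 on failure
long FileSize(const char* path)
{
    // "rb" opens in binary mode so the position counts bytes
    std::FILE* file = std::fopen(path, "rb");
    if (file == nullptr)
    {
        return -1; // Missing file, no permission, etc.
    }

    long size = -1;
    if (std::fseek(file, 0, SEEK_END) == 0)
    {
        size = std::ftell(file); // ftell itself returns -1 on failure
    }
    std::fclose(file);
    return size;
}
```

Returning a special value like -1 follows the C Standard Library’s own conventions here. It keeps the sketch short, but the caller has to remember to check for it.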

Conclusion

The C Standard Library is very old, but still very commonly used. Being so old and based on the much less powerful C, its design leaves a lot to be desired. The global state used by functions like rand and strtok isn’t thread-safe, and side channels like errno are difficult to use correctly. Special int return values, applied inconsistently from function to function, are similarly harder to use than more structured outputs like exceptions and enumerations.

Regardless of any complaints we may have about the C Standard Library’s design, we still need to know how to use it. The C++ Standard Library offers alternatives to much of what we’ve seen here today, but that’s not always the case. Sure, we can swap in <random> for rand, <chrono> for time, and <filesystem> for remove, but assert and stdint.h remain the most modern standardized ways of achieving those areas of functionality.
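
As a small taste of one of those swaps, here is a sketch of generating random numbers with <random> instead of srand and rand. RollDie is a made-up helper, and the choice of std::mt19937 as the engine is just an assumption for illustration:

```cpp
#include <random>

// Hypothetical helper: rolls a six-sided die using a caller-owned engine
// instead of the global state behind srand and rand
int RollDie(std::mt19937& engine)
{
    // Produces integers in the closed range [1, 6]
    std::uniform_int_distribution<int> dist(1, 6);
    return dist(engine);
}
```

Because the engine is an ordinary object, each thread can own its own instead of sharing one hidden global.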

From here on we’ll be covering the C++ part of the C++ Standard Library. We’ll see a lot more modern designs for areas like containers, algorithms, I/O, strings, math, and threading!

C++ For C# Developers: Part 38 – C Standard Library