Measuring Error in Float Functions

As I have written about before, floating-point calculations accumulate rounding error, and that error can be distributed unevenly across a function's input range depending on the distribution of intermediate results in each calculation.

Checking this error at run-time would involve a lot of overhead, but having a method to analyse a function before the program is released is useful for detecting edge cases and getting an idea of the minimum and maximum error, which helps you choose a correct epsilon for handling errors. (Ask me about epsilons in physics libraries to get my full rant on this...)

Luckily, this kind of function checking is very simple. We just need to create a replacement 'float' type which mirrors every floating-point calculation at a higher precision and stores the difference in the type. The type can then combine its own error with that of others when they are used together, giving an understanding of the cumulative offset.

A class which replaces a standard type like this needs the correct constructors and operator overloads to cover all the use cases, so that we don't get compile errors or type mismatches - or, even worse, blindly build a new instance and lose our cumulative error count.

The main elements of this class are the 'float' value it is wrapping and a high precision error counter. We can't use float for this error counter as it will be vulnerable to the same level of rounding error as the float that it is wrapping.

(Optionally: I added a string holding a name for each float, which makes debugging and checking that we are counting the error correctly much easier.)

With that in mind, you should end up with a class that looks something like this:

#include <cmath>
#include <string>

class FloatError
{
public:

	FloatError() : FloatError(0.f) {}

	FloatError(float _x)
	{
		static int floatIDCounter = 0;

		m_name = "FloatID_";
		m_name.append(std::to_string(floatIDCounter++));

		m_value = _x;
		m_error = 0.0;
	}

	// Used by the operators to carry an accumulated error into a new result.
	FloatError(float _x, long double _err) : FloatError(_x)
	{
		m_error = _err;
	}

	/* other constructors go here... */

	//Operator overloads
	FloatError operator+(const FloatError& _in) const
	{
		float val = m_value + _in.m_value;
		long double highResResult = (long double)m_value + (long double)_in.m_value;

		// Rounding error introduced by this single addition.
		long double error = std::abs(highResResult - (long double)val);

		// Combine it with the error already carried by both operands.
		return FloatError(val, error + m_error + _in.m_error);
	}

	/* Others go here... */

	std::string m_name;
	float m_value;
	long double m_error;
};

The operator+ overload is the important part here. Each operation is repeated while cast to the higher-precision type, and the difference between the two results gives us a good approximation of the rounding error incurred by that single operation. This is then combined with the errors already carried by the two numbers being summed, giving the total error in the final result.

It would be tempting in this implementation to not take the absolute value, storing the signed offset from the correct answer so that positive and negative errors can cancel each other out. However, that would only be correct when values are added or subtracted. To handle all operations we would also have to model how the offset carried into a multiply or divide scales with the number it is applied to - a more complex problem than we are addressing here. In this instance we are only analysing the total rounding error - for now.
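
For illustration, a multiplication overload in the same style might look like the sketch below. Note the caveat above still applies: summing the operands' accumulated errors understates how error scales through a multiply, so this is only tracking accumulated rounding error.

	FloatError operator*(const FloatError& _in) const
	{
		float val = m_value * _in.m_value;
		long double highResResult = (long double)m_value * (long double)_in.m_value;

		// Rounding error introduced by this single multiply; the operands'
		// accumulated errors are summed, not scaled (see the caveat above).
		long double error = std::abs(highResResult - (long double)val);
		return FloatError(val, error + m_error + _in.m_error);
	}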

With this all in place we want to be able to use the class on a function taking floats. The example we will use is a function which takes a fixed array of floating-point numbers, adds them all together and returns the sum.

#define float FloatError
float Sum(std::array<float, 200>& _in)
{
	float err(0.0f);
	for (auto& i : _in)
	{
		err = err + i;
	}
	return err;
}
#undef float

In the code example you will see that we have wrapped the function between '#define float FloatError' and '#undef float', so that inside it every 'float' is replaced with our FloatError type. We can then call this function from a test function which generates arrays of random values in a fixed range and runs Sum over them, trying to find the maximum and minimum error in that range. (Note: for any array of random values in this example there should exist values that produce zero error.)

void TestFloatError(const float _min = 0.f, const float _max = 1000.f, const int _sampleCount = 50)
{
	std::array<FloatError, 200> numbers;
	float lo = _min;
	float hi = _max;
	long double minerr = 1000000.f;
	long double maxerr = 0.f;

	for (int j = 0; j < _sampleCount; j++)
	{
		for (int i = 0; i < 200; i++)
		{
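			// getRandom is an assumed helper returning a uniformly random float in [lo, hi].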
			numbers[i] = getRandom(lo, hi);
		}

		FloatError errRes = Sum(numbers);

		std::cout << "ERROR: " << errRes.m_error << ".\n";

		minerr = std::min(minerr, errRes.m_error);
		maxerr = std::max(maxerr, errRes.m_error);
	}
	std::cout << "MIN ERROR: " << minerr << ".\n";
	std::cout << "MAX ERROR: " << maxerr << ".\n";
}

Once it has tested the function multiple times it will output the min and max error, having also printed the error for each sample as it ran. For the range (0, 1000.f) the output ends up looking like this:

ERROR: 0.219635.
ERROR: 0.202362.
ERROR: 0.211487.
ERROR: 0.231384.
ERROR: 0.206512.

... Many results ...

ERROR: 0.218292.
ERROR: 0.212372.
ERROR: 0.235107.
ERROR: 0.181488.
ERROR: 0.217163.
ERROR: 0.231659.
ERROR: 0.227386.
ERROR: 0.22171.
ERROR: 0.2005.
ERROR: 0.221191.
ERROR: 0.23172.
ERROR: 0.243927.
ERROR: 0.213196.
ERROR: 0.214783.
ERROR: 0.219055.
ERROR: 0.229004.
MIN ERROR: 0.181488.
MAX ERROR: 0.243927.

So we can see that the total error possible in even this small sample is not insignificant - and that is only from additions of float values which already sit exactly on floating-point steps.

I hope this was useful to you and that you can apply it in your own code to help visualise the error that is hidden in even your simplest algorithms!


Defining Approximate Type Systems To Allow for Controlled Precision

Many programmers are unaware of the precision at which their applications actually run. Convenience, the absence of visible errors and the lack of any large performance impact dictate many of the choices of types and algorithms used to get the desired results.

Quite often a programmer will use an 'int' array to store numbers only ranging from zero to four. In this situation the programmer has given far too much precision to the type, and is losing memory performance by having to load all the extra bits representing numbers above four. Similarly, a programmer may use a math function such as 'std::sinf' when they are only interested in the result to three decimal places - in which case 'std::sinf' is doing far more work than is necessary, operating at a much higher precision in order to conform to the C Math Specification.

The need to run at a lower precision can come from all sorts of requirements, from the need to fit the calculation within a set number of cycles to simply reducing the power consumption of the device. Many of the techniques employed to get lower-precision or estimated results fall under the category of Approximate Computation.

In the case of types, most modern programming languages do not provide a type system which allows a clear selection of the precision for most basic types. The options are often quantised to 8/16/32/64/128-bit representations to make an easier match to the common hardware available. This is known as Software-Hardware Co-design. 

However, in the case of functions we are given full control to calculate to the precision we need and then store the result in one of the predefined types. There are all sorts of numerical techniques for calculating the result of different mathematical functions to different precisions, and some of these can be seen in the differences between math libraries which do not conform to the C Math Specification.

We can conclude that we are, for practical reasons, stuck with standard types, but we can allow any function to work at a given precision. How would we go about it if we wanted to implement our own approximate 'std::sinf' for a small section of a large codebase already using the standard library implementation?
We could add our new function like any other - but what if we wanted to pass a 'float' to it? We would need to create our own namespace, or rename the function, and that would mean significant rewrites across the codebase.
We would also want to make the use of this approximate variation safe: the result of a lower-precision sin mixed with a full-precision sin could cause trouble, as any trigonometric identities would also only hold approximately. Basically, we want to ensure we don't mix an approximate result with a precise one without the programmer explicitly stating to do so.

To meet the requirements above we can use a wrapper type around the basic types with a templated precision flag. This type can then be used to overload the call to 'sinf', so no function calls need to change, and through the type system a 'float' cannot be mixed with an 'ApproxType<float>' without the programmer explicitly requesting the base 'float' from the wrapper. The programmer has to change the types of the objects that are going to be less accurate, but none of the functional logic. Additionally, the wrapper lets the user specify the maximum level of error the type accepts; with that we can statically switch calls made with the approximate type to the implementation which provides the requested level of precision.

For example, a float which says it accepts 10% error in its result that is passed to 'sinf' will be forwarded to an implementation of 'sinf' that at minimum meets that requirement.

So, with those rules and a plan - this is how to implement such a design in C++!

We want to start with the approximate wrapper type. This class essentially stands in for the type normally passed to the function, but is statically marked up with the level of accuracy desired. It is also given a 'Get' function to allow the data to be taken out of the object - this is how the programmer expresses that the accuracy is final and the value is safe to use outside the type-safety. Ideally we would also want the type to specify minimum and maximum values and include all the copy rules to make that work, but for showing the concept this basic implementation will do.

template< typename T, int accNumurator >
class ApproxType
{
public:
	ApproxType(T _in)
	{
		m_data = _in;
	}

	T& Get()
	{
		return m_data;
	}

	T	m_data;
};

In this example I am using a Taylor approximation of sin generated through my build system. It looks a bit scary, but it is just a simple polynomial.

constexpr float ApproxSin_Order7(const float _x) noexcept
{
	return ((-0.00001633129816711214847969706187580f*(1.0f)) + (1.00037993800406788125201273942366242f*(_x)) 
			+ (-0.00211754585380569430516639606310036f*(_x* _x)) + (-0.16175605700248782414796266948542325f*(_x* _x * _x)) 
			+ (-0.00577124859440479066885476555626155f*(_x* _x * _x * _x)) + (0.01203154857279084034848981588083916f*(_x* _x * _x * _x * _x)) 
			+ (-0.00127402703149161102696984571025496f*(_x* _x * _x * _x * _x * _x)) + (-0.00000043655425207306325578100769137f*(_x* _x * _x * _x * _x * _x * _x)));
}

Now we want to overload 'sinf' with a version which takes our wrapped approximate type. In this function we specify how accurate our approximate implementation is (accuracy = 8) and use a 'std::conditional' to select which function should be called. If the approximate type that is passed in allows this level of accuracy then it calls the Taylor approximation of sin, otherwise it simply calls 'std::sinf'.

This logic is all performed with constexpr and static types where possible, so that the selection can be evaluated at compile time and reduced down to as close as possible to a single function call.

template< typename T, int accNumurator >
float sinf(ApproxType<T, accNumurator> _x)
{
	static constexpr int accuracy = 8;
	typedef typename std::conditional< (accuracy >= accNumurator), FuncWrapper<ApproxSin_Order7>, FuncWrapper<std::sinf> >::type A;

	return A::Call(_x.Get());
}

You may have noticed in the above example that the 'std::conditional' doesn't actually contain function pointers, but function pointers wrapped in a class called FuncWrapper. This is because std::conditional only takes types, but we want a compile-time selection between functions. So we wrap each function pointer in a class which has a static function that calls the wrapped function; 'std::conditional' then gives us the correct type to call through. It is a little bit of a hack, but it is what has to be done for now until C++ gets good support for a static if.

template< float func(float) >
class FuncWrapper
{
public:

	static float Call(float _in)
	{
		return func(_in);
	}
};

If we put this all together we can see how the code is used. We haven't had to make any special calls to the function we replaced, as it is all handled by the overloading. The compiler then generates the correct call for each use based on the level of accuracy we allow for each of the ApproxTypes.

int main()
{
	float var = 0.5f;
	ApproxType<float, 6> approx(var);
	ApproxType<float, 9> approxhigh(var);
	
	float approxreslow  = sinf(approx);		// Calls the approximate function
	float approxreshigh = sinf(approxhigh); // Calls std::sinf from the Approximate Wrapper
	float accurateres	= sinf(var);		// Calls std::sinf

	return 0;
}

These functions can now be dropped into a codebase, and by specifying where you want the types to be approximate we gain access to all the possibilities of Approximate Computing with very minimal changes.
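
The same pattern extends to any other one-argument math function. As a sketch (the ApproxCos_Order6 polynomial here is a hypothetical stand-in, not generated to any particular accuracy):

// Hypothetical approximation: truncated Maclaurin series of cos(x),
// fine near zero and degrading further out.
constexpr float ApproxCos_Order6(const float _x) noexcept
{
	const float x2 = _x * _x;
	return 1.0f - (x2 / 2.0f) + (x2 * x2 / 24.0f) - (x2 * x2 * x2 / 720.0f);
}

template< typename T, int accNumurator >
float cosf(ApproxType<T, accNumurator> _x)
{
	static constexpr int accuracy = 8; // accuracy this approximation claims to meet
	typedef typename std::conditional< (accuracy >= accNumurator),
		FuncWrapper<ApproxCos_Order6>, FuncWrapper<std::cosf> >::type A;

	return A::Call(_x.Get());
}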


Thanks for reading. All the code in one place:

#include <cmath>
#include <type_traits>

template< typename T, int accNumurator >
class ApproxType
{
public:
	ApproxType(T _in)
	{
		m_data = _in;
	}

	T& Get()
	{
		return m_data;
	}

	T	m_data;
};

template< float func(float) >
class FuncWrapper
{
public:

	static float Call(float _in)
	{
		return func(_in);
	}
};

constexpr float ApproxSin_Order7(const float _x) noexcept
{
	return ((-0.00001633129816711214847969706187580f*(1.0f)) + (1.00037993800406788125201273942366242f*(_x)) 
			+ (-0.00211754585380569430516639606310036f*(_x* _x)) + (-0.16175605700248782414796266948542325f*(_x* _x * _x)) 
			+ (-0.00577124859440479066885476555626155f*(_x* _x * _x * _x)) + (0.01203154857279084034848981588083916f*(_x* _x * _x * _x * _x)) 
			+ (-0.00127402703149161102696984571025496f*(_x* _x * _x * _x * _x * _x)) + (-0.00000043655425207306325578100769137f*(_x* _x * _x * _x * _x * _x * _x)));
}

template< typename T, int accNumurator >
float sinf(ApproxType<T, accNumurator> _x)
{
	static constexpr int accuracy = 8;
	typedef typename std::conditional< (accuracy >= accNumurator), FuncWrapper<ApproxSin_Order7>, FuncWrapper<std::sinf> >::type A;

	return A::Call(_x.Get());
}

int main()
{
	float var = 0.5f;
	ApproxType<float, 6> approx(var);
	ApproxType<float, 9> approxhigh(var);
	
	float approxreslow  = sinf(approx);		// Calls the approximate function
	float approxreshigh = sinf(approxhigh); // Calls std::sinf from the Approximate Wrapper
	float accurateres	= sinf(var);		// Calls std::sinf

	return 0;
}

Approximate methods for handling error in multi-threaded solutions

In many modern applications we rely on a lot of multi-threaded algorithms, some of which are run across CPU cores and some of which are run as GPGPU tasks.

For many types of programs this usually involves taking some task and splitting it into separate parts and recombining or processing the results of each separate part into a singular answer.

[Figure: a task split into sub-tasks which all complete successfully and are combined into a single result]

When each of the separate threads executes correctly we are given the exact correct answer in, hopefully, less time than if the task hadn't been separated into sub-components.

But what happens if one of the sub-components of the task fails? In most applications this would be a catastrophic error and we would end the program with no result. For a program doing a lot of heavy computation, that could mean the loss of a whole lot of processed data.

To avoid that, it is possible to employ methods which detect a fault in a single sub-component and resubmit it for processing, exclude it from the results, or otherwise handle the error - either recovering or exiting the program in a way that isn't too lossy.

However, what if the task we were running was made of hundreds of thousands of components, where the result of executing each component could be predicted, modelled or in some way approximated from the input data and the results produced by the other, non-failing sub-components? In that situation we might be able to keep the application running by simply replacing the failed execution with an approximate answer, reporting the error to the program along with some statistical bound on its impact - such as up to x%, based on the number of sub-components and how many reported errors.

In that case you could run some work and have a work graph that looks like the one below.

[Figure: the same work graph with one failed sub-task replaced by an approximation built from its input and neighbouring results]

In this new model, which allows a failed/slow/broken task to be approximated from its input and its neighbours, we can add stability to programs that don't require 100% accuracy and precision. If a program is fault-tolerant in its algorithms and results, then we have greater freedom to handle errors or hardware problems, which are guaranteed to occur on a long enough time scale.

Programs which allow for this condition are interesting in other ways too. If the program can handle up to 10% error and still function, and we have a task split into 1000 sub-tasks where each sub-task is independent and side-effect free, then we can deliberately introduce error to increase performance - provided our approximation technique for failed sub-tasks is cheaper than the work it replaces. This gives us a result that at worst is within the threshold of error the program allows, and at best is close to the correct answer.
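
As a toy sketch of the idea (all names and the failure condition here are invented for illustration): a large sum is split into independent chunks, and any chunk that fails is stood in for by the mean of the chunks that succeeded.

#include <vector>
#include <numeric>
#include <optional>

// Hypothetical worker: returns the chunk's sum, or nothing if the sub-task failed.
std::optional<double> SumChunk(const std::vector<double>& chunk)
{
	return std::accumulate(chunk.begin(), chunk.end(), 0.0);
}

double SumWithApproximatedFailures(const std::vector<std::vector<double>>& chunks)
{
	std::vector<std::optional<double>> results;
	for (const auto& c : chunks)
		results.push_back(SumChunk(c)); // in a real system these would run in parallel

	// The mean of the successful chunks stands in for any failed ones.
	double okSum = 0.0;
	int    okCount = 0;
	for (const auto& r : results)
		if (r) { okSum += *r; okCount++; }
	const double approx = (okCount > 0) ? okSum / okCount : 0.0;

	double total = 0.0;
	for (const auto& r : results)
		total += r ? *r : approx; // substitute the approximation for failures
	return total;
}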

By artificially inducing error in the application we could reveal the potential to reduce the power required to compute the task, as well as improving the program's overall performance. We would want to induce this state when the trade-off between accuracy and performance is non-linear, i.e. a small drop in accuracy buys a much more valuable benefit in another criterion.

There are many opportunities to use this technique in mobile computing, sensors and distributed computing as well as more general simulation technology to allow us to build better performing and more stable applications.

C++11/14/17 Library Usage on Github

I recently collected a list of all the top-level classes and free functions added in each of the modern C++ revisions (11/14/17) so that I could scan the Modern C++ example repository to find which library classes and functions don't yet have implementations. (Turns out... a lot!)

However, I am quite interested in how much of these modern standards is actually being adopted in real-world applications. It was my experience when working in industry that the majority of C++ programmers, if not faced with wide usage of the modern concepts, did not independently do the research to keep up to date with the latest additions and methods. This leads to most code I see in the private sector looking like C++98 with the odd 'auto' and range-based 'for' loop thrown in for convenience.

Alternatively, you would see many classes and functions from Boost, which have now been ported over to the standard library - but I will come to this later.

To get some solid figures on what is actually being used, I looked to the only large public set of searchable repositories we have - Github.
I used the Github HTTP API to query for the most popular (starred) repositories that are listed as C++. I then cloned all of these repositories to my machine (286 projects altogether, approx 120GB+ of downloads...) and did some text analysis, looking project-wide for the relevant includes and testing whether the functions/classes were actually called or used. This is not the most accurate solution to the problem: it will produce some false positives from identically named functions, and some false negatives from complexly included headers - but for a quick pass it should give us the broad-stroke usage. Ideally we would build each project and see what is actually linked during compilation, or do some more knowledgeable static analysis. But until then... results!

C++11

[Charts: counts of C++11 library classes/functions found in use across the surveyed repositories (bar and pie charts)]

I was quite shocked by the lack of C++11 classes and functions being used. However, C++11 did introduce an awful lot of quite obscure parts and many variations - so this might not be as bad as it looks.

C++14

C++14 fared a lot better than C++11, which makes sense: C++14 is a revision which corrects a lot of the little things that C++11 missed, and generally adds the small useful pieces that only became obvious after C++11 was finalised (such as the std::rbegin/std::rend standalone functions).

C++17

I was honestly surprised to see as much usage of the C++17 standard as we did. Some of its functions saw even higher usage than some of the C++11 functions. Unsurprisingly, the parts being used are the ones whose names make them pretty simple and understandable - some of the more complex additions haven't shown up in any searches yet!

MORE WORK

As I mentioned above, some of the standard additions in C++11 are based on popular parts of the Boost library. I am going to search for Boost usage next and see if there is significant crossover, which would explain the delay in some projects making the switch - porting would be an awfully big risk for very little gain!

C++ noexcept and move constructors' effect on performance in STL containers

A little-known feature of some of the C++ standard library containers is that, in the latest C++ standards, they will use the C++11 move operation to more cheaply resize the container when possible.

A move operation is cheaper than a straight copy because it performs fewer operations, at the cost of trashing the source object: the memory is not copied and then reassigned, simply reassigned. (Full details here if you are unfamiliar.)

The actual mechanism which tells you whether the container is using the move constructor or the copy constructor is quite opaque to the user, and the condition under which it switches is not well documented.

class MyClass
{
    ...

    //This is a correctly formed move constructor
    MyClass ( MyClass&& _input) noexcept
    {
     ... do something ...
    }

    ...

};

An STL container can only use the move constructor in its resizing operation if that constructor does not break the container's strong exception-safety guarantee. In plainer language, it won't use the move constructor of an object if that move can throw an exception. This is because if an exception is thrown mid-move then the data being processed could be lost, whereas in a copy constructor the original is left unchanged.

So, how do we tell the STL that our move constructor won't throw any exceptions? Simple: we use the "noexcept" specifier!

Tagging our move constructor with "noexcept" tells the compiler that it will not throw any exceptions. This can be checked in C++ using the type trait "std::is_nothrow_move_constructible", which tells you whether the specifier is correctly set on your move constructor. If you haven't written a move constructor yourself, the compiler-generated one should pass this test - this is really only a concern if you have implemented your own. If your type passes the test, then when the STL container calls "std::move_if_noexcept" it will correctly move the object instead of falling back to a copy.

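The MoveProperties aggregate used below is not defined in the snippet; a minimal definition it assumes might be:

// Assumed helper: a plain bundle for the move-related type-trait results.
struct MoveProperties
{
	bool moveassignable;
	bool moveconstructible;
	bool moveTriviallyAssignable;
	bool moveTriviallyConstructible;
	bool moveNoThrowAssignable;
	bool moveNoThrowConstructible;
};
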
template<typename Y>
MoveProperties GetProps()
{
	MoveProperties mp;
	mp.moveassignable               = std::is_move_assignable<Y>::value;
	mp.moveconstructible            = std::is_move_constructible<Y>::value;
	mp.moveTriviallyAssignable      = std::is_trivially_move_assignable<Y>::value;
	mp.moveTriviallyConstructible   = std::is_trivially_move_constructible<Y>::value;
	mp.moveNoThrowAssignable        = std::is_nothrow_move_assignable<Y>::value;
	mp.moveNoThrowConstructible     = std::is_nothrow_move_constructible<Y>::value;
	return mp;
}

We can use these traits to build two classes, one with a noexcept move constructor and one without, then place many of them into a large STL container and time it to see what a difference this can make to the performance of the system.

So firstly, two classes:

#define ArraySize 100

unsigned int staticCounter = 0;
unsigned int copyCount = 0;
unsigned int moveCount = 0;

class NoThrowMoveConstructible
{
public:
	NoThrowMoveConstructible() 
	{ 
		m_memberVariable = new std::array<int, ArraySize>();
		m_memberVariable->fill(staticCounter++); 
	}
	~NoThrowMoveConstructible()
	{
		if (m_memberVariable != nullptr)
		{
			delete m_memberVariable;
		}
	}
	NoThrowMoveConstructible(NoThrowMoveConstructible&&	_in) noexcept 
	{
		moveCount++;
		this->m_memberVariable = _in.m_memberVariable;
		_in.m_memberVariable = nullptr;
	}
	NoThrowMoveConstructible(const NoThrowMoveConstructible& _in) noexcept
	{ 
		copyCount++;
		this->m_memberVariable = new std::array<int, ArraySize>(*(_in.m_memberVariable));
	}

	std::array<int, ArraySize>* m_memberVariable;
};

class CantBeNoThrowMoveConstructible
{
public:
	CantBeNoThrowMoveConstructible() 
	{ 
		m_memberVariable = new std::array<int, ArraySize>(); 
		m_memberVariable->fill(staticCounter++);
	}

	~CantBeNoThrowMoveConstructible()
	{
		if (m_memberVariable != nullptr)
		{
			delete m_memberVariable;
		}
	}

	CantBeNoThrowMoveConstructible(CantBeNoThrowMoveConstructible&& _in) 
	{ 
		moveCount++;
		this->m_memberVariable = _in.m_memberVariable;
		_in.m_memberVariable = nullptr;
	}

	CantBeNoThrowMoveConstructible(const CantBeNoThrowMoveConstructible& _in) 
	{ 
		copyCount++;
		this->m_memberVariable = new std::array<int, ArraySize>(*(_in.m_memberVariable));
	}

	std::array<int, ArraySize>* m_memberVariable;
};

These two classes are identical except for the "noexcept" applied to the copy and move constructors. Each is simply a wrapper around a dynamically allocated array of integers. We are using a "std::array" here so that, if we were to do thorough testing, it would be trivial to test copy/move on different-sized objects.

Using our "GetProps" function we can see that the classes have the correct properties: no-throw move construction is false on our "CantBeNoThrowMoveConstructible" class and true on our "NoThrowMoveConstructible" class. They are otherwise identical when it comes to move properties.

    NoThrowMoveConstructible:
	moveassignable              false
	moveconstructible           true
	moveTriviallyAssignable     false
	moveTriviallyConstructible  false
	moveNoThrowAssignable	    false
	moveNoThrowConstructible    true

    CantBeNoThrowMoveConstructible:
	moveassignable	            false	
	moveconstructible           true	
	moveTriviallyAssignable     false	
	moveTriviallyConstructible  false	
	moveNoThrowAssignable       false	
	moveNoThrowConstructible    false	

If we ran this analysis on any basic class where we haven't overridden any of the move operators or constructors, we would see all of these properties return true. The default implementations, which are automatically added to your class if you don't replace them (much like the default destructor), are created with the correct noexcept properties to allow moving the objects.

Now that we know the classes are behaving as we would expect we can run a simple timing function to judge the performance and using the counters verify that behavior at run-time.

template< typename Y >
std::vector<Y> OutputTimeForFillingVector(unsigned int vectorSize)
{
	copyCount = 0;
	moveCount = 0;
	std::vector<Y> vectorToFill;

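	// Timer is an assumed helper class wrapping std::chrono clocks.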
	Timer<std::chrono::milliseconds> t;
	t.Start();

	for (unsigned int i = 0; i < vectorSize; i++)
	{
		Y val;
		vectorToFill.emplace_back(val);
	}

	std::vector<Y> copyVector = vectorToFill;

	std::cout << "Move Operations: " << moveCount << ". Copy Operations: " << copyCount << "\n";
	std::cout << "Time taken for " << typeid(Y).name() << ": " << t.GetElapsedTime().count() << "ms.\n";

	return copyVector;
}

int main()
{
	MoveProperties NoThrowMoveable		= GetProps<NoThrowMoveConstructible>();
	MoveProperties NoneNoThrowMoveable = GetProps<CantBeNoThrowMoveConstructible>();
	
	unsigned int vecSize = 1000000;
	
	std::cout << "Can do move operations: \n";
	OutputTimeForFillingVector<NoThrowMoveConstructible>(vecSize);
	std::cout << "Can't do move operations: \n";
	OutputTimeForFillingVector<CantBeNoThrowMoveConstructible>(vecSize);

	system("pause");
	return 0;
}

This code creates a vector of size "vecSize" by repeatedly adding single elements to it. The time this takes, and the number of copies and moves performed on the underlying objects, is then output to the console for us to analyse.

Can do move operations:
Move Operations: 2099753. Copy Operations: 2000000
Time taken for class NoThrowMoveConstructible: 469ms.
Can't do move operations:
Move Operations: 0. Copy Operations: 4099753
Time taken for class CantBeNoThrowMoveConstructible: 926ms.

The class which cannot be correctly moved takes nearly twice as long as the one that can to be placed into a simple vector. Importantly, you can also see the effect of this in the Visual Studio Diagnostics panel's process memory watcher.

[Figure: Visual Studio Diagnostics process memory graph comparing the two classes over time]

We can see that the "std::move"-enabled class shows a smooth, steady increase in memory as each allocation is made and then moved into the vector. The copy-constructed class, by contrast, has to frequently copy its memory and deallocate the memory in the dead objects, leading to a bouncing rise in memory over a longer time due to the extra work being done.

The lesson in all of this: if you touch the copy or move constructors (or assignment operators), make sure to give the functions the correct exception specification to enable the optimal behaviour. If you never touch them at all, there is a good chance they will default to the correct behaviour - but it is always good to check!

Any size math vector using Modern C++

When working with graphics or games programming code you inevitably end up with a lot of classes for handling 2D/3D points in the world - usually a direct map to GLSL/HLSL's float1, float2, float3 and float4, or the int and double equivalents.

This can quickly get messy to maintain when you have tens of classes to update for any small change - not to mention tedious. One resolution is to take advantage of templates, std::array and some nice new features of the language to make an (almost) readable single class that can be instantiated with the correct type and number of elements. For maximum performance you would ideally keep the dozens of hand-written classes, since that makes performance analysis - and therefore optimisation - easier; but for more general, less close-to-the-metal applications, I present the class 'vecN'.

template<typename T, unsigned int elemCount>
struct vecN
{
public:
	vecN() {}

	template<class... Args>
	vecN(Args... args)
	{
		const int n = sizeof...(Args);
		static_assert(n == elemCount, "Invalid number of arguments for vector type");

		data = { { args... } };
	}

	std::array<T, elemCount> data;
};

This is the basic backbone of the class. It is a simple vector class that can be initialised like 'vecN<float,4> someFloat4;' or 'vecN<float,4> someFloat4(1.f, 2.f, 3.f, 4.f);'.

It takes the two template parameters ('float' and '4') and uses them to declare a std::array. The constructors offered are a blank constructor, which leaves the array with default uninitialised values, and a variadic template constructor. The variadic constructor takes the element values as input and initialises the std::array with them. It uses a static_assert to check at compile time that the correct number of elements is passed, which prevents us from passing too many or too few.

With the class just as it is, we are able to use the vector - but a vector should provide more functionality than a simple array!

	template<class... Args>
	void Set(Args... args)
	{
		const int n = sizeof...(Args);
		static_assert(n == elemCount, "Invalid number of arguments for vector type");

		data = { { args... } };
	}

We want to be able to set the values in the vector, so we can use the same code as the constructor to write a function to allow the values in the vector to be set after construction.

A good vector class should also provide basic mathematical functions to make the vector easy to use with simple types and other vectors.

	void Multiply(T _mulVal)
	{
		std::for_each(data.begin(), data.end(), [&_mulVal](T& elem) { elem*= _mulVal; });
	}

	void Multiply(vecN<T,elemCount> _mulVal)
	{
		int counter = 0;
		std::for_each(data.begin(), data.end(), [&counter, &_mulVal](T& elem) { elem *= _mulVal.data[counter++]; });
	}

	void Add(T _addVal)
	{
		std::for_each(data.begin(), data.end(), [&_addVal](T& elem) { elem += _addVal; });
	}

	void Add(vecN<T, elemCount> _addVal)
	{
		int counter = 0;
		std::for_each(data.begin(), data.end(), [&counter, &_addVal](T& elem) { elem += _addVal.data[counter++]; });
	}

	float Length()
	{
		std::array<T, elemCount> dataSqr = data;
		std::for_each(dataSqr.begin(), dataSqr.end(), [](T& elem) { elem = elem*elem; });
		T sum = std::accumulate(dataSqr.begin(), dataSqr.end(), (T)0);
		return std::sqrt(sum);
	}

	void Normalise()
	{
		T len = 1.0/Length();
		std::for_each(data.begin(), data.end(), [&len](T& elem) { elem *= len; });
	}

This code makes the class actually useful for some simple linear algebra. We are using 'std::for_each' from <algorithm> and 'std::accumulate' from <numeric> to work with our arbitrarily-sized array generically, with relative ease and confidence in the code that will be generated. C++17 allows us to apply different execution policies to these functions; that has not been enabled in my compiler just yet, so for now we are sticking with the default behaviour. This is an area that could be improved soon - particularly if we wanted to use a very large vector such as 'vecN<float, 2048>'!
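
Once a compiler with the C++17 parallel algorithms is available, the change would be as small as the sketch below (illustrative only; it assumes <execution> is included and the implementation supports the parallel overloads):

	// std::execution::par_unseq permits the library to parallelise and
	// vectorise the traversal; the lambda itself is unchanged.
	void Multiply(T _mulVal)
	{
		std::for_each(std::execution::par_unseq, data.begin(), data.end(),
			[&_mulVal](T& elem) { elem *= _mulVal; });
	}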

Next we want to think about common use cases for these classes - such as having a 'float3' and needing to pass a 'float4', which is quite common when writing to some textures or in shader pipelines.

	vecN<T, elemCount - 1> PopElem()
	{
		vecN<T, elemCount - 1> output;
		std::copy_n(data.begin(), elemCount-1, output.data.begin());
		return output;
	}

	vecN<T, elemCount + 1> PushElem(T _value = 0.0)
	{
		vecN<T, elemCount + 1> output;
		std::copy_n(data.begin(), elemCount, output.data.begin());
		output.data[elemCount] = _value;
		return output;
	}

	template<int _size>
	constexpr vecN<T, _size> GetResizedVector()
	{
		vecN<T, _size> output;
		std::copy_n(data.begin(), std::min((unsigned int)(_size), elemCount), output.data.begin());
		return output;
	}

Using the push and pop functions provided here we can return a copy of the vector with one element more or fewer. Or we can use 'GetResizedVector' to request an arbitrarily-sized copy when we need to make larger adjustments. Nothing too fancy is going on in these functions: we use 'std::copy_n' to copy a sub-range of the array when shrinking - in the past this might have been done with memcpy, but this is a lot safer.
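
For example, typical usage might look like this (illustrative; note that 'GetResizedVector' leaves any newly added elements default-initialised):

	vecN<float, 3> colour(0.2f, 0.5f, 0.9f);
	vecN<float, 4> rgba = colour.PushElem(1.0f);      // append alpha = 1
	vecN<float, 3> rgb  = rgba.PopElem();             // drop the last element again
	vecN<float, 8> wide = rgba.GetResizedVector<8>(); // grow; extra elements uninitialised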

Next we want to look at accessing data from the vector in a nice way.

	constexpr T& GetElement(int _index)
	{
		return data[_index];
	}

	template<int _index>
	constexpr T& GetElement()
	{
		return data[_index];
	}

These two little functions give the user two ways to access the data. The second is templated in the hope that the compiler will recognise the constant offset and avoid evaluating '[_index]' at run-time where possible - just like a normal 'someFloat3.x' would behave. They are also constexpr where possible, to allow aggressive compilers to resolve values wherever they can.

This isn't sufficient for all use cases, though - we also want to provide the standard x/y/z/w/xyz accessors.

	constexpr T& x()
	{
		static_assert(1 <= elemCount, "Invalid number of arguments for vector type");
		return data[0];
	}

	constexpr T& y()
	{
		static_assert(2 <= elemCount, "Invalid number of arguments for vector type");
		return data[1];
	}

	constexpr T& z()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		return data[2];
	}

	constexpr T& w()
	{
		static_assert(4 <= elemCount, "Invalid number of arguments for vector type");
		return data[3];
	}

	constexpr std::array<T, 3> xxx()
	{
		static_assert(1 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[0], data[0], data[0] };
		return xyz;
	}

... and many more ...

These functions, and many more not shown, provide all the element access methods a user will be used to from the 'float1'-'float4' types. Where possible we statically assert to prevent out-of-range access, so 'someFloat3.z()' will only compile for vectors with at least 3 elements.

But what if our user has a 6 element array and wants the first and last elements only...?

	template<class... Indexs>
	constexpr vecN<T, sizeof...(Indexs)> GetOrderedArrayOfIndices(Indexs... indxs)
	{
		vecN<T, sizeof...(Indexs)> output;
		output.data = { { data[indxs]... } };
		return output;
	}

In that case we have to use variadics again. Similarly to how we initialised the array in the constructor and the 'Set' function, we expand the variadic input, but this time with the pattern 'data[indxs]...', so that instead of getting:
{ { value1, value2, value3 and on and on } }
we get:
{ { data[value1], data[value2], data[value3] and on and on } }

This allows us to call the function like 'someVector4.GetOrderedArrayOfIndices(0, 3);' to get a two-element vector holding the first and last element values of 'someVector4'. With this method we can arbitrarily construct vectors from other vectors or perform reordering.

Finally, we want to be able to work with these types without excessive typing of hard to read code, so we can hide most of the templating in the types with a few simple typedefs...

typedef vecN<float, 4> float4;
typedef vecN<float, 3> float3;
typedef vecN<float, 2> float2;
typedef vecN<float, 1> float1;

So now the user can simply create a 'float4' and use it like any other class they are used to - but easily convert sizes and access elements however they want.
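
A short, hypothetical snippet of client code shows how this reads in practice:

	float4 position(1.f, 2.f, 3.f, 1.f);
	position.Normalise();

	float x = position.x();                         // named accessor
	float z = position.GetElement<2>();             // compile-time index
	std::array<float, 3> swizzled = position.zyx(); // reversed copy of the xyz elements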

Thanks for reading!

Full Code:

#include <array>
#include <algorithm>
#include <numeric>
#include <cmath>

template<typename T, unsigned int elemCount>
struct vecN
{
public:
	vecN() {}

	template<class... Args>
	vecN(Args... args)
	{
		Set(args...);
	}

	template<class... Args>
	void Set(Args... args)
	{
		const int n = sizeof...(Args);
		static_assert(n == elemCount, "Invalid number of arguments for vector type");

		data = { { args... } };
	}

	void Multiply(T _mulVal)
	{
		std::for_each(data.begin(), data.end(), [&_mulVal](T& elem) { elem*= _mulVal; });
	}

	void Multiply(vecN<T,elemCount> _mulVal)
	{
		int counter = 0;
		std::for_each(data.begin(), data.end(), [&counter, &_mulVal](T& elem) { elem *= _mulVal.data[counter++]; });
	}

	void Add(T _addVal)
	{
		std::for_each(data.begin(), data.end(), [&_addVal](T& elem) { elem += _addVal; });
	}

	void Add(vecN<T, elemCount> _addVal)
	{
		int counter = 0;
		std::for_each(data.begin(), data.end(), [&counter, &_addVal](T& elem) { elem += _addVal.data[counter++]; });
	}

	float Length()
	{
		std::array<T, elemCount> dataSqr = data;
		std::for_each(dataSqr.begin(), dataSqr.end(), [](T& elem) { elem = elem*elem; });
		T sum = std::accumulate(dataSqr.begin(), dataSqr.end(), (T)0);
		return std::sqrt(sum);
	}

	void Normalise()
	{
		T len = 1.0/Length();
		std::for_each(data.begin(), data.end(), [&len](T& elem) { elem *= len; });
	}

	vecN<T, elemCount - 1> PopElem()
	{
		vecN<T, elemCount - 1> output;
		std::copy_n(data.begin(), elemCount-1, output.data.begin());
		return output;
	}

	vecN<T, elemCount + 1> PushElem(T _value = 0.0)
	{
		vecN<T, elemCount + 1> output;
		std::copy_n(data.begin(), elemCount, output.data.begin());
		output.data[elemCount] = _value;
		return output;
	}

	template<int _size>
	constexpr vecN<T, _size> GetResizedVector()
	{
		vecN<T, _size> output;
		std::copy_n(data.begin(), std::min((unsigned int)(_size), elemCount), output.data.begin());
		return output;
	}

	constexpr T& GetElement(int _index)
	{
		return data[_index];
	}

	template<int _index>
	constexpr T& GetElement()
	{
		return data[_index];
	}

	template<class... Indexs>
	constexpr vecN<T, sizeof...(Indexs)> GetOrderedArrayOfIndices(Indexs... indxs)
	{
		vecN<T, sizeof...(Indexs)> output;
		output.data = { { data[indxs]... } };
		return output;
	}

	constexpr T& x()
	{
		static_assert(1 <= elemCount, "Invalid number of arguments for vector type");
		return data[0];
	}

	constexpr T& y()
	{
		static_assert(2 <= elemCount, "Invalid number of arguments for vector type");
		return data[1];
	}

	constexpr T& z()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		return data[2];
	}

	constexpr T& w()
	{
		static_assert(4 <= elemCount, "Invalid number of arguments for vector type");
		return data[3];
	}

	constexpr std::array<T, 3> xxx()
	{
		static_assert(1 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[0], data[0], data[0] };
		return xyz;
	}

	constexpr std::array<T, 3> yyy()
	{
		static_assert(2 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[1], data[1], data[1] };
		return xyz;
	}

	constexpr std::array<T, 3> zzz()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[2], data[2], data[2] };
		return xyz;
	}

	constexpr std::array<T, 3> xyz()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[0], data[1], data[2] };
		return xyz;
	}
	constexpr std::array<T, 3> xzy()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[0], data[2], data[1] };
		return xyz;
	}

	constexpr std::array<T, 3> yzx()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[1], data[2], data[0] };
		return xyz;
	}

	constexpr std::array<T, 3> yxz()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[1], data[0], data[2] };
		return xyz;
	}

	constexpr std::array<T, 3> zyx()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[2], data[1], data[0] };
		return xyz;
	}

	constexpr std::array<T, 3> zxy()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 3> xyz = { data[2], data[0], data[1] };
		return xyz;
	}

	constexpr std::array<T, 2> xx()
	{
		static_assert(1 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 2> xyz = { data[0], data[0] };
		return xyz;
	}

	constexpr std::array<T, 2> xy()
	{
		static_assert(2 <= elemCount, "Invalid number of arguments for vector type");

		std::array<T, 2> xyz = { data[0], data[1] };
		return xyz;
	}

	constexpr std::array<T, 2> xz()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 2> xyz = { data[0], data[2] };
		return xyz;
	}

	constexpr std::array<T, 2> yx()
	{
		static_assert(2 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 2> xyz = { data[1], data[0] };
		return xyz;
	}

	constexpr std::array<T, 2> yy()
	{
		static_assert(2 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 2> xyz = { data[1], data[1] };
		return xyz;
	}

	constexpr std::array<T, 2> yz()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 2> xyz = { data[1], data[2] };
		return xyz;
	}

	constexpr std::array<T, 2> zx()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 2> xyz = { data[2], data[0] };
		return xyz;
	}
	constexpr std::array<T, 2> zy()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 2> xyz = { data[2], data[1] };
		return xyz;
	}
	constexpr std::array<T, 2> zz()
	{
		static_assert(3 <= elemCount, "Invalid number of arguments for vector type");
		std::array<T, 2> xyz = { data[2], data[2] };
		return xyz;
	}

	std::array<T, elemCount> data;
};

typedef vecN<float, 4> float4;
typedef vecN<float, 3> float3;
typedef vecN<float, 2> float2;
typedef vecN<float, 1> float1;

Calculating Floating-Point Error in std::random_device

Floating-point representation in programming is inherently error-prone, as we are unable to perfectly represent real numbers without an infinite number of bits. As a result we have a standard (https://en.wikipedia.org/wiki/IEEE_754) with some quirky behaviours, one of which is that the error in representing real numbers gets larger as the number you are trying to represent moves away from 0. The scale of this error and how fast it grows depend on the number of bits available, and it is not obvious when this error occurs.

If we perform multiple arithmetic operations then the rounding error from those calculations can accumulate. To calculate what this error is, we need to understand the maximum error for the given number. As a small guide, I have created a graph (below) as an easy reference.

[Figure: maximum floating-point representation error growing with the magnitude of the represented value]

In a later article I will go into how this can affect even simple geometric or purely numerical calculations.
For now I am going to show some code for calculating this error, and then use it to test the accuracy - and understand the actual range of results - you will get from the standard library random number generator.

In this example we want to choose a range for the random number generator, look at the floating-point representation of that range to see how many possible outcomes we could have, and then test whether we actually get those outcomes and how frequently each one is hit - to see how often we get duplicates within the range.

First step - getting the number of float values available in our range (between low and high).

	float nextAfter = std::nextafterf(_low, _high);
	double minStep = (double)nextAfter - (double)_low;
	double numberOfSteps = ((double)_high - (double)_low) / minStep;
	numberOfSteps += 1.0;

In this example we are using std::nextafter to find the next possible floating-point representation after _low, casting to a larger representation (to reduce error in this calculation), and dividing the range by this step size. This will only give correct results for small ranges, as the step size changes as the numbers get bigger. (To calculate this perfectly for large ranges there are other, less concise approaches - one will be shown in a later post.)
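
For completeness, here is a sketch of one such approach, valid for positive, finite floats: IEEE-754 values of the same sign are ordered the same way as their bit patterns, so reinterpreting the bits as unsigned integers and subtracting counts the representable values exactly.

#include <cstdint>
#include <cstring>

// Counts the representable floats in [lo, hi], assuming 0 < lo <= hi and both finite.
std::uint32_t CountFloatsInRange(float lo, float hi)
{
	std::uint32_t loBits, hiBits;
	std::memcpy(&loBits, &lo, sizeof(float));
	std::memcpy(&hiBits, &hi, sizeof(float));
	return hiBits - loBits + 1;
}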

Next, we want to create our random device and initialise the range for it to work on:

	std::random_device	rd;
	std::mt19937		gen(rd());
	std::uniform_real_distribution<float> uniformRealDistribution(_low, _high);
	float randomResult = uniformRealDistribution(gen);

This code seeds a Mersenne Twister generator from std::random_device and creates a uniform distribution over the range between '_low' and '_high'. Pretty standard stuff.

Now we want to create a std::map which maps a floating-point result to a counter. This lets us record each distinct outcome from our random number generator and the number of times that outcome was produced.

	std::map<float, int> occuranceCounter;

The final step is putting this all together in a function which populates the std::map and outputs the results:

#include <map>
#include <random>
#include <iostream>

void TestRandomDistriubtion(float _low, float _high, int sampleSize)
{
	std::random_device	rd;
	std::mt19937		gen(rd());
	std::uniform_real_distribution<float> uniformRealDistribution(_low, _high);
	std::map<float, int> occuranceCounter;

	float nextAfter			= std::nextafterf(_low, _high);
	double minStep			= (double)nextAfter - (double)_low;
	double numberOfSteps	= ((double)_high - (double)_low) / minStep;

	std::cout << "Max Possible outcomes: " << numberOfSteps + 1 << ". ";

	for (int i = 0; i < sampleSize; i++)
	{
		float randomResult = uniformRealDistribution(gen);

		if (occuranceCounter.find(randomResult) == occuranceCounter.end())
			occuranceCounter[randomResult] = 1;
		else
			occuranceCounter[randomResult] += 1;
	}


	int size = occuranceCounter.size();
	int counter = 0;
	for (auto& kv : occuranceCounter)
	{
		if (kv.second != 1)
		{
			counter++;
		}
	}

	float percent = (float)counter / (float)size;
	std::cout << "repeat: " << percent*100.f << "%. Total numbers is " << size << ".\n";
}

This function will then output text like this for 10000 samples:

Max Possible outcomes: 1.04858e+06. repeat: 0.401647%. Total numbers is 9959.

And then if we write a loop we can look at the output of this for different magnitudes of input values:

Testing for values from 1 to 2
==================
Max Possible outcomes: 8388609. repeat: 0.080064051%. Total numbers is 9992.
==================
Testing for values from 10 to 11
==================
Max Possible outcomes: 1048577. repeat: 0.41168794%. Total numbers is 9959.
==================
Testing for values from 100 to 101
==================
Max Possible outcomes: 131073. repeat: 3.5566156%. Total numbers is 9644.
==================
Testing for values from 1000 to 1001
==================
Max Possible outcomes: 16385. repeat: 27.622238%. Total numbers is 7465.
==================
Testing for values from 10000 to 10001
==================
Max Possible outcomes: 1025. repeat: 100%. Total numbers is 1025.
==================
Testing for values from 100000 to 100001
==================
Max Possible outcomes: 129. repeat: 100%. Total numbers is 129.
==================
Testing for values from 1000000 to 1000001
==================
Max Possible outcomes: 17. repeat: 100%. Total numbers is 17.
==================
Testing for values from 10000000 to 10000001
==================
Max Possible outcomes: 2. repeat: 100%. Total numbers is 2.
==================

Looking at the results, we can see how the number of possible values matches the graph at the start of this post. The maximum number of outcomes from our random number generator shrinks steadily until, at 10 million, we can only get the left or right bound value. At such a large number that is quite understandable, and probably something you have run into when working with float numbers in similar situations. However, you will notice that even at modest values like 1000-1001 we are quantised to only 16385 possible results. For a lot of applications that is a huge margin of error compared to the accuracy you get between zero and one.

To resolve this we can use bigger data types, but that isn't always ideal. Quite a lot of papers recommend working at offsets and then rounding back up to reduce error, but this doesn't totally solve the problem. I will be doing more posts in this area as I finish my research on a related topic, where I will go over some of the coping strategies frequently used by applications which don't need 100% accuracy in their results.

Full source code for generating the above output:

#include <map>
#include <random>
#include <iostream>
#include <iomanip>

void TestRandomDistriubtion(float _low, float _high, int sampleSize)
{
	std::random_device	rd;
	std::mt19937		gen(rd());
	std::uniform_real_distribution<float> uniformRealDistribution(_low, _high);
	std::map<float, int> occuranceCounter;

	float nextAfter = std::nextafterf(_low, _high);
	double minStep = (double)nextAfter - (double)_low;
	double numberOfSteps = ((double)_high - (double)_low) / minStep;

	std::cout << "Max Possible outcomes: " << numberOfSteps + 1 << ". ";

	for (int i = 0; i < sampleSize; i++)
	{
		float randomResult = uniformRealDistribution(gen);

		if (occuranceCounter.find(randomResult) == occuranceCounter.end())
			occuranceCounter[randomResult] = 1;
		else
			occuranceCounter[randomResult] += 1;
	}


	int size = occuranceCounter.size();
	int counter = 0;
	for (auto& kv : occuranceCounter)
	{
		if (kv.second != 1)
		{
			counter++;
		}
	}

	float percent = (float)counter / (float)size;
	std::cout << "repeat: " << percent*100.f << "%. Total numbers is " << size << ".\n";
}

int main()
{
	float minTestValues[] = { 1.f, 10.f, 100.f, 1000.f, 10000.f, 100000.f, 1000000.f, 10000000.f };
	int arraySize = 8;
	int sampleCount = 10000;

	for (int i = 0; i < arraySize; i++)
	{
		std::cout << std::setprecision(10) << "Testing for values from " << minTestValues[i] << " to " << minTestValues[i] + 1.f << "\n==================\n" << std::setprecision(8);
		TestRandomDistriubtion(minTestValues[i], minTestValues[i] + 1.f, sampleCount);
		std::cout << "==================\n";
	}


	return 0;
}


C++ ConstExpr Compile-time Lookup Table Generation

In big projects it is not uncommon for there to be large generated data-sets based on simple parametric functions - especially in graphical applications.

This often means writing a separate program to generate the results and then having a large data file that is imported, or sometimes even copied into a normal code file. This can lead to lots of problems: the generating functions change, the data falls out of sync with the project, or a programmer accidentally changes a single value in the list.

To avoid this you can generate the tables in the application itself; however, this adds startup cost, which is not good for the end-user experience.

To avoid these issues, in modern C++ it is possible to generate the function result tables at compile time. Here I will show a simple demo of how to generate and access one of these tables for any function which can be evaluated at compile time.

Approach

The general approach to this problem is to have a constant table that is generated once at compile time, plus a number of functions for accessing this data at runtime in a controlled, error-free way.

For this implementation the array will be a 'static constexpr std::array<Type, Size>' that is built at compile time, using template functions to populate the array with the correct size and type.

To access and read this array we will want a template function which takes the desired input value for the function and performs the lookup with the correct parameters. By making the compile-time array static inside the lookup function we can wrap all of this nicely in one place.

We will also provide overloads of the lookup functions that let the user pass the array around however they like. This way, if they want to use the array in multiple functions, it won't instantiate a static instance of the array for each one.

Array Generation

This at first looks a bit complex, but is actually quite simple once it has been parsed out.
The function MakeLookUpTable constructs a sequence of indices from 0 to N-1 to pass to the helper (std::make_index_sequence<N>{}). This sequence is then expanded in a variadic fashion to construct the array in MakeLookUpTableHelper.
In MakeLookUpTableHelper the sequence 'CounterSequence' is used to construct the array, mapping each index 0,1,2,3,4,... to the input value of the function f(x) for that index.

So if we wanted 5 samples across the range 0-1 it would map the sequence 0,1,2,3,4 to 0, 0.2, 0.4, 0.6, 0.8 (dividing the range by N-1 instead would make the end point inclusive).

These values are then passed to our function, and the result of the function for each value is stored in the array.

//##########Table Generation
//Takes the given list of indices and computes the input value for each index across the range.
template<class Function, std::size_t... CounterSequence>
constexpr auto MakeLookUpTableHelper(Function f, const float _min, const float _stepSize, std::index_sequence<CounterSequence...>)
->std::array<typename std::result_of<Function(float)>::type, sizeof...(CounterSequence)>
{
	return { { f(_min + (CounterSequence*_stepSize))... } };
}

//Passes through the correct index list object
template<int N, class Function>
constexpr auto MakeLookUpTable(Function f, const float _min, const float _stepSize)
->std::array<typename std::result_of<Function(float)>::type, N>
{
	return MakeLookUpTableHelper(f, _min, _stepSize, std::make_index_sequence<N>{});
}

Lookup Functions

The lookup functions are relatively simple.

For the direct lookup, 'TableNoInterp(float _x)', we simply statically instantiate our array and then, using the values we have stored for the size and range of the array, extract the correct value back from the table.

In the case of the interpolating function, 'TableInterp', we are doing the same thing (usage note: calling both of these will create two identical static tables in memory...) but we take two adjacent values from the table and, using the fractional distance to the next entry, blend linearly between them. This allows us to approximate values between those stored in the table.

For the versions of the functions that take an 'arrayIn' we are providing a way for the user to generate a table themselves, swap in a run-time editable table, or any mix. This lets the programmer choose how they want to control the memory, and is just nice to have.

#define TARGET_RANGE_MIN 0.f
#define TARGET_RANGE_MAX 6.28f
#define TARGET_RANGE_SAMPLES 1000
#define TARGET_RANGE_STEP_SIZE ((TARGET_RANGE_MAX - TARGET_RANGE_MIN) / (float)TARGET_RANGE_SAMPLES)
#define TARGET_RANGE_GET_INDEX(x) (((x) + 0.000001f - TARGET_RANGE_MIN) / TARGET_RANGE_STEP_SIZE)
//##########Table Query
//Note: a static local variable is not allowed inside a constexpr function
//(before C++23), so the lookup functions themselves are plain functions;
//the table they hold is still built at compile time.
template <float(*approx)(float)>
float TableNoInterp(float _val)
{
	static constexpr auto lookupTable = MakeLookUpTable<TARGET_RANGE_SAMPLES>(approx, TARGET_RANGE_MIN, TARGET_RANGE_STEP_SIZE);

	return lookupTable[(int)TARGET_RANGE_GET_INDEX(_val)];
}

template <typename T>
float TableNoInterp(float _val, T& arrayIn)
{
	return arrayIn[(int)TARGET_RANGE_GET_INDEX(_val)];
}

template <float(*approx)(float)>
float TableInterp(float _val)
{
	static constexpr auto lookupTable = MakeLookUpTable<TARGET_RANGE_SAMPLES>(approx, TARGET_RANGE_MIN, TARGET_RANGE_STEP_SIZE);

	float pos = TARGET_RANGE_GET_INDEX(_val);
	int lowIndex = (int)pos;
	int highIndex = lowIndex + 1;	//Assumes _val is below the top of the range.
	float val0 = lookupTable[lowIndex];
	float val1 = lookupTable[highIndex];

	//Blend linearly between the two samples by the fractional part of the index.
	float scale = pos - floor(pos);
	float res = ((1.0f - scale) * val0) + (scale * val1);

	return res;
}

template <typename T>
float TableInterp(float _val, T& arrayIn)
{
	float pos = TARGET_RANGE_GET_INDEX(_val);
	int lowIndex = (int)pos;
	int highIndex = lowIndex + 1;	//Assumes _val is below the top of the range.
	float val0 = arrayIn[lowIndex];
	float val1 = arrayIn[highIndex];

	//Blend linearly between the two samples by the fractional part of the index.
	float scale = pos - floor(pos);
	float res = ((1.0f - scale) * val0) + (scale * val1);

	return res;
}

Usage

The usage of this code is then very simple:

//### Normal Code
//The function used to fill the table must be constexpr so it can be evaluated at compile time.
constexpr float TestFun(float _x)
{
	return (_x*_x) + _x;
}

int main()
{
	//Example of table generation.
	auto a = MakeLookUpTable<TARGET_RANGE_SAMPLES>(TestFun, TARGET_RANGE_MIN, TARGET_RANGE_STEP_SIZE);

	//Example of using the lookup functions
	float result			= TableInterp<TestFun>(0.5f);
	float resultNoInterp	= TableNoInterp<TestFun>(0.5f);

	resultNoInterp	= TableNoInterp<decltype(a)&>(0.5f, a);
	result			= TableInterp<decltype(a)&>(0.5f, a);

	std::cout << "Iterp Result = " << result << "\n";
	std::cout << "NoIterp Result = " << resultNoInterp << "\n";

	return 0;
}

Not so bad!
I hope this is useful to someone!

All the code together:

#include <array>
#include <utility>     //std::index_sequence
#include <type_traits> //std::result_of
#include <iomanip>
#include <iostream>
#include <cmath>

#define TARGET_RANGE_MIN 0.f
#define TARGET_RANGE_MAX 6.28f
#define TARGET_RANGE_SAMPLES 1000
#define TARGET_RANGE_STEP_SIZE ((TARGET_RANGE_MAX - TARGET_RANGE_MIN) / (float)TARGET_RANGE_SAMPLES)
#define TARGET_RANGE_GET_INDEX(x) (((x) + 0.000001f - TARGET_RANGE_MIN) / TARGET_RANGE_STEP_SIZE)

//##########Table Generation
//Takes a given list of indices and computes the input value for the index across the range.
template<class Function, std::size_t... CounterSequence>
constexpr auto MakeLookUpTableHelper(Function f, const float _min, const float _stepSize, std::index_sequence<CounterSequence...>)
->std::array<typename std::result_of<Function(float)>::type, sizeof...(CounterSequence)>
{
	return { { f(_min + (CounterSequence*_stepSize))... } };
}

//Passes through the correct index list object
template<int N, class Function>
constexpr auto MakeLookUpTable(Function f, const float _min, const float _stepSize)
->std::array<typename std::result_of<Function(float)>::type, N>
{
	return MakeLookUpTableHelper(f, _min, _stepSize, std::make_index_sequence<N>{});
}


//##########Table Query
//Note: a static local variable is not allowed inside a constexpr function
//(before C++23), so the lookup functions themselves are plain functions;
//the table they hold is still built at compile time.
template <float(*approx)(float)>
float TableNoInterp(float _val)
{
	static constexpr auto lookupTable = MakeLookUpTable<TARGET_RANGE_SAMPLES>(approx, TARGET_RANGE_MIN, TARGET_RANGE_STEP_SIZE);

	return lookupTable[(int)TARGET_RANGE_GET_INDEX(_val)];
}

template <typename T>
float TableNoInterp(float _val, T& arrayIn)
{
	return arrayIn[(int)TARGET_RANGE_GET_INDEX(_val)];
}

template <float(*approx)(float)>
float TableInterp(float _val)
{
	static constexpr auto lookupTable = MakeLookUpTable<TARGET_RANGE_SAMPLES>(approx, TARGET_RANGE_MIN, TARGET_RANGE_STEP_SIZE);

	float pos = TARGET_RANGE_GET_INDEX(_val);
	int lowIndex = (int)pos;
	int highIndex = lowIndex + 1;	//Assumes _val is below the top of the range.
	float val0 = lookupTable[lowIndex];
	float val1 = lookupTable[highIndex];

	//Blend linearly between the two samples by the fractional part of the index.
	float scale = pos - floor(pos);
	float res = ((1.0f - scale) * val0) + (scale * val1);

	return res;
}

template <typename T>
float TableInterp(float _val, T& arrayIn)
{
	float pos = TARGET_RANGE_GET_INDEX(_val);
	int lowIndex = (int)pos;
	int highIndex = lowIndex + 1;	//Assumes _val is below the top of the range.
	float val0 = arrayIn[lowIndex];
	float val1 = arrayIn[highIndex];

	//Blend linearly between the two samples by the fractional part of the index.
	float scale = pos - floor(pos);
	float res = ((1.0f - scale) * val0) + (scale * val1);

	return res;
}


//### Normal Code
//The function used to fill the table must be constexpr so it can be evaluated at compile time.
constexpr float TestFun(float _x)
{
	return (_x*_x) + _x;
}

int main()
{
	//Example of table generation.
	auto a = MakeLookUpTable<TARGET_RANGE_SAMPLES>(TestFun, TARGET_RANGE_MIN, TARGET_RANGE_STEP_SIZE);

	//Example of using the lookup functions
	float result			= TableInterp<TestFun>(0.5f);
	float resultNoInterp	= TableNoInterp<TestFun>(0.5f);

	resultNoInterp	= TableNoInterp<decltype(a)&>(0.5f, a);
	result			= TableInterp<decltype(a)&>(0.5f, a);

	std::cout << "Iterp Result = " << result << "\n";
	std::cout << "NoIterp Result = " << resultNoInterp << "\n";

	return 0;
}

C++11/14/17 Examples

If you have been keeping up with the latest in C++ news then you will have seen that C++17 has been finalised. The features of the update have been pretty consistent for a while now, and support is getting there in most common compilers.

Some of the more obviously useful parts of the revision have been covered quite nicely in samples and tutorials. For the less well-known features, though, there is a lack of samples and guides on how best to use them, especially for some of the more obscure parts of the library additions.

To try and alleviate this, I have begun to build a repository of samples covering every function and language feature of C++11/14/17. This can be found on GitHub.

It includes my own written samples for features where I couldn't find a good sample or use-case online, as well as samples taken from cppreference.com, AnthonyCalandra/modern-cpp-features and other places (documented in the code) where the licence was permissive.

The aim of the project is to produce a repository of samples which can be compiled and tested and kept up to date with the latest compiler support.

There is still a lot missing in my library coverage map (this can be found for each revision in the root directory as C++_11_lib.csv etc.) but it is well on the way to being a usable resource.

You can find the repo below; please feel free to contribute and fix any dumb mistakes I may have made:

https://github.com/NTimmons/ModernCPP_Examples

Recruitment Shouldn't be a Numbers Game

I think we can all agree that recruiter messages on LinkedIn often leave an awful lot to be desired. I am not sure if this is true across industries, but those of us in software are all too often plagued with the most impersonal macro messages, such as:

Hello {YOUR NAME}
We have a great opportunity!

{MUNDANE VAGUE JOB DESCRIPTION, POSSIBLY REFERRING TO YOU AS A NINJA, POSSIBLY NOT EVEN VAGUELY RELATED TO YOUR FIELD OF WORK}

If this isn’t suitable for you please pass it on!

These must be sent out to so many people, probably in an automated way, that the odds of any one being useful to me are slim. It is basically spam, and it is a little offensive, to me and to the company they represent, that they didn't even vaguely read my past experience to make sure the role was appropriate.

A major pet peeve is when they say "compensation based on experience". If you took the time to look at the profile you sent the message to, then you know, roughly, my experience and can give me a rough figure - and no, I will not speak for an hour on the phone before you give me the numbers.

When looking for a new job there are a few factors that are very important.

  • You have to be able to perform the job (Obviously)
  • The location should be suitable (You have to be able to physically do the job)
  • The compensation should be appropriate (Otherwise, you will never take the job)
  • The job should be appealing (Otherwise you won't stay)

So, if I am receiving multiple recruitment e-mails of the generic variety and they don't answer any of those questions, it is not in my best interest to spend a lot of time interrogating the recruiter to get the information I need.

If I were a recruiter, I would send out e-mails like this:

Hello {Candidate Name}

We are recruiting for a job in {Field of work} working on {General idea of the work}.

It is based in {Locations}. For this role we are looking to pay {Estimate pay} but can pay more for an experienced candidate.

{More detailed job description and any special requirements}

{Don’t ask them to do your job for them if they aren’t interested}

The best ways to contact me are: {Phone/Email + times}

Thanks
A totally polite and informative recruiter.

It would then give me all the information I would need to make an informed decision without having to mess around.

But that is just my rant for the day.

Bjarne Stroustrup: No Littering

Bjarne Stroustrup gave an excellent talk at the Computer Laboratory on Friday, which I was lucky enough to attend. He discussed his work on developing safe ways to use pointers, so that you don't run into memory leaks or bugs from incorrect usage, and gave a great argument against garbage collection.

He was particularly advocating his work on the C++ Core Guidelines, which have been getting a lot of attention recently, and the Microsoft GSL (Guideline Support Library), which he used in the examples exploring these ideas.

The combination of this library and the guidelines was used to demonstrate a more secure way to ensure correct pointer and memory behaviour, along with a Visual Studio tool that performs static analysis on pointer usage and gives feedback by matching any errors or possible mistakes it finds to the relevant advice in the Core Guidelines.

It all looked very sensible and not too invasive, unlike some memory usage patterns, so I was quite impressed and will probably have a bit of a play with the code myself this afternoon.

What I wasn't expecting during the talk was how much of an enthusiast Stroustrup is for simple and effective code. A lot of my past experience with people who write very high-level C++ is that it often becomes unreadable templates, using features not yet in the current spec, which only compile on a fork of compiler 'x'. That was absolutely not the case here, and it brightened my day.

"Within C++ is a smaller, simpler, safer language struggling to get out." -- Bjarne Stroustrup

I am quite excited to see where his work goes on this subject and glad to see it is all open (and secretly happy it integrates so well with Visual Studio).


For more details from the man himself, here is a slightly older version of the talk he gave:

[Video: embedded in the original post.]
Nick


EuroLLVM 2016 Work

I recently attended EuroLLVM 2016 to present some work I did as part of my compilers module, which seemed to go quite well.

As I am sending them a copy for the website I am also uploading it here as a mirror.

[Poster PDF: mirrored in the original post.]

Build time - Run time distribution in Clang

I have recently done some research into the effects of different optimisation passes on the build-time and run-time performance of C/C++ applications compiled with Clang.

[Figure: results from the raytracer test application, showing strong clustering caused by the inclusion of the inliner at different thresholds and the GVN pass.]

It produced some very interesting results. It showed that the '-O3' flag does not always use the most effective selection of optimisation passes for given code; in fact, it may be more optimal (albeit very time-consuming) to produce a custom optimisation pass list per compilation unit, as certain pass orderings produce good results for some patterns of code but bad results for others. Ideally, a series of 'good' optimisation lists would be generated and matched against the patterns of code they handle best.

[Figure: distribution of the experimental builds compared to the preset optimisation levels.] The bottom-right cluster is the important one: it shows that the same run-time performance as -O3 can be achieved while saving nearly 15% of build time.

The most interesting results showed the amount of compile time that could be saved for the same performance. This is because some optimisation passes are simply not applicable to the code they are being run on, causing longer build times for no gain. It is possible to configure a benchmarking tool to find and remove these passes in a relatively short period of time. As with the performance gains, this is highly dependent on the code being compiled. For a very large, complex application these kinds of gains would have to be targeted at specific files, or groups of files, being compiled.
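
As a rough sketch of what using a custom pass list looks like in practice (my own example, not from these experiments; the flags are the individual pass options the LLVM 'opt' tool accepted in that era):

# Emit unoptimised LLVM bitcode, run a hand-picked pass list over it, then compile the result.
clang -O0 -emit-llvm -c main.cpp -o main.bc
opt -inline -gvn main.bc -o main_opt.bc
clang main_opt.bc -o main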

A proposal will be submitted soon for this technique and results to be shown at EuroLLVM 2016.


Not all clock_t are equal

Here is a little warning to anyone who might fall into the same trap as me.
I was using the C++ standard clock() from <ctime> to measure the time it took for a program to execute on Windows. I should have read the MSDN page about it: Visual Studio does not follow the standard for its clock() implementation. It should measure the processor time the program has used, but it is actually a wall clock.

This was fine for my program, as I wanted a wall clock and I was getting sensible results. However, when I ran the same program on my Ubuntu machine the data I got back was less than useful.

I wanted to write a cross-platform bit of code, so I stuck to using standard objects. This hasn't worked out. The MSDN article suggests using GetProcessTimes to get the standard behaviour of clock(), but that is Windows-only, which defeats the point.

I only wanted a wall clock anyway, so I had to look for a different solution that works on both platforms.

The solution is std::chrono, which was added in C++11. It provides all the features I wanted for a wall clock and had somehow slipped past me; I didn't realise it was even part of the C++11 spec. Luckily, Microsoft has implemented this one correctly, and it can be used for getting the wall time on both platforms, even if it is a little verbose.

#include <chrono>
#include <iostream>

int main()
{
    std::chrono::time_point<std::chrono::system_clock>  StartTime, EndTime;
    StartTime = std::chrono::system_clock::now();

    //... the work being timed goes here ...

    EndTime = std::chrono::system_clock::now();
    std::chrono::duration<double> seconds = EndTime - StartTime;

    std::cout << "Time: " << seconds.count() << " seconds.\n";
}
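
One footnote I would add (not in the original snippet): if you only care about measuring an interval, std::chrono::steady_clock is the safer wall-style clock, as it is monotonic and won't jump if the system time is adjusted mid-measurement.

#include <chrono>
#include <iostream>

int main()
{
    auto startTime = std::chrono::steady_clock::now();
    //... work to be timed ...
    auto endTime = std::chrono::steady_clock::now();

    std::chrono::duration<double> seconds = endTime - startTime;
    std::cout << "Time: " << seconds.count() << " seconds.\n";
}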



Starting Post-Grad at Cambridge University

It has always been a real ambition of mine to study at Cambridge or Oxford. I am pleased to announce that I have just started my MPhil in Advanced Computer Science, with the hope of carrying my research on into a full PhD.

Downing College is outright beautiful and I couldn't think of a nicer place to study. It has an MCR (Middle Common Room), a well looked-after room for post-graduates with leather seats, a bar and a big TV. The people who run and look after it have been amazing, as has everyone else I have had the pleasure of speaking to at the College. The porters have been especially helpful in getting me moved in and all the forms sorted.

There is a lot to get used to here and going back to academia from the games industry is going to be tough. The term proper starts tomorrow and I cannot wait to get started.

Parsing A C++ Source File

So I have been looking into writing my own, very simple, compiler, assembler and linker. As I am, of course, a big fan of C++ I thought that would be a good place to start.

To everyone more experienced than me you probably know this was a terrible, terrible mistake.

The contextual nature of C++ means that to parse it I have ended up having to create a complicated "scope" tree. I then have to work out what type each scope is: walking down from the top of the tree, I assign, based on the source before each scope, whether it is a function, a namespace, a class, a struct or a default scope.

To determine this I also have to determine what a function is. This sounds trivial but it can be a bit deceptive. People seem to want to use regular expressions for it, and knowing the types beforehand makes this easier, but since we can declare types within functions it needs a few passes. It is complicated and a faff. Too much of a faff for Sunday morning relaxing code.

The long story short is that I made some good progress getting it there, and my operations tree for the actual code part is working well, but I am shelving that for now and will start by implementing my own very simple language...

Tesla Selling Features Already In The Model S

I was just looking at the Tesla Model S Design Studio, browsing the cost and features of the Tesla cars, and something caught my eye. Something that seems at odds with the impression I was getting from everything else. The rest of the site, and everything else I had heard about Tesla and Elon Musk, had given me the impression of a quite upfront and honest approach. Sure, the cars are expensive, but they are also very modern and it is quite clear where the money is going. I was even feeling alright about paying the higher cost, because it seemed like a good thing to support a company introducing genuinely modern features to a car, beyond slightly better speakers (although that was an option) and smaller windows. But this really annoyed me.

Let me show you.

[Screenshot of the Design Studio options, from the original post.]

Do you see the part that has struck me as pretty awful? It is right there at the bottom.

They are charging you £2,500 to enable a feature that must already be in the car. This reminds me of all the problems we have seen in the games industry with selling customers day-one DLC, where the customer has bought a game but can't access a feature which is already on the disk until they pay a little bit extra. It is very annoying.

I know, I know. I work in software; I realise this is just selling the software, so of course it can just be enabled when it is delivered. This isn't buying a small computer and installing a nice bit of software you wanted, though. This is a £50-£100k car, already designed and fitted with the exact tools and sensors for this exact piece of software. Somewhere along the line a decision has been made in the production process to intentionally create two versions of the software: one with all the features and one with fewer. Then they decided to charge 2-5% extra on the price of the car to unlock it all. They haven't done more work to enable these features for the car - you know they are key selling points of the vehicle. They have done more work to allow them to be disabled, and tacked on a price tag.

I can see how this exists, though. At the price range the car is being sold in, it is beyond the point where the extra £2,100 matters. It's a small difference to the customer already paying £50k+, but to a company selling a few thousand of these, a 2-5% mark-up is significant.

At the end of the day though, to me this is sleazy marketing. I just want to buy a fully functional car. If the cost of developing that software is really £2,100, then make the starting price £52,100 and that is fine. Just don't try and sell me something already in the car after the fact. It's not cool, and £2,100 per car is an awfully low price to undo the respectable image that has taken years to cultivate.


Applications can force a reboot in Windows 8

Ok, this is something I haven't seen before. I had to run the Google Software Removal Tool after an annoying bit of software kept changing my home page in Chrome.

At the end of its run it said a restart was required. Fair enough, no problems there. So I clicked OK, expecting it either to shut down the computer directly or for Windows to ask me. What happened was odd.

Windows displayed one of those banners, like the one you get when you shut down with applications still running and it asks if you want to force close the programs which aren't responding. Except this one just had the text "A tool is requesting a reboot", with the only option being "Close". I really wish I had the chance to take a picture of it. So I selected Close to dismiss the banner, expecting to be able to cancel the reboot (I was in the middle of something...) or continue, but didn't get the opportunity. Apparently that banner exists just to tell you it is happening. And because it is a full-screen banner, I was unable to do anything else, such as save my work or send the e-mail I was writing. It didn't even say which program had requested the reboot, which would have been actually useful information.

I can sort of see the useful aspect of having this come up when something is trying to restart my computer; that is pretty smart. I can't see the point of it if, when it happens, I have no control to stop it or save my work, and no information if it came from a source I didn't expect. Frankly, it was very annoying.

How big is a class with a static variable?

Given two classes, each with two integer member variables, but where one of the classes declares one of its integers static: what is the size of each of these classes?

Well, the naive answer would be to assume that as both classes have two integers, and an integer is (usually) four bytes, both classes would be 8 bytes.

This isn't true though, as a static member variable of a class is not actually stored in the class.

Here is an example of some code showing this:

#include <stdio.h>
#include <string.h>

class NoStatic
{
public:
	NoStatic(){}

	int val0;
	int val1;
};

class StaticClass
{
public:
	StaticClass(){}

	int val0;
	static int val1;
};

int StaticClass::val1 = 0;

int main()
{
	int sizeNoStatic = sizeof(NoStatic);	//8 bytes
	int sizeStatic = sizeof(StaticClass);	//4 bytes


	return 0;
}

This is an important thing to consider when writing generic code and a good argument against just copying memory from one data structure to another without really looking at the code.

What do you think would happen in this example? :

	StaticClass classTestStatic;
	classTestStatic.val0 = 10;
	classTestStatic.val1 = 15;

	NoStatic classTest;

	memcpy(&classTest, &classTestStatic, 8);

The non-static version of the class would not be given the value of the static variable from the static version of the class. (Worse: copying 8 bytes out of the 4-byte StaticClass reads past the end of the object, which is undefined behaviour.)

If we run the code and look at the memory addresses of the StaticClass instance, we see the following.
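
The debugger screenshot from the original post is missing here, but the observation is easy to reproduce with a few prints - a minimal sketch using the classes above:

#include <cstdio>

int main()
{
	StaticClass s;

	//The instance and val0 share an address; val1 lives elsewhere, in static storage.
	printf("instance: %p\n", (void*)&s);
	printf("val0:     %p\n", (void*)&s.val0);
	printf("val1:     %p\n", (void*)&StaticClass::val1);

	return 0;
}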

So, we can see that the variable 'val0' is stored at the same address as the class instance (0x00ddfa9c), meaning it is the first variable in the class. If 'val1' were a non-static member we would expect it 4 bytes after this, at 0x00ddfaa0; instead it is at 0x00b88130, which shows it is not stored in the class and proves that a static member is, like people say, nothing more than a glorified global variable.

So the real question for the reader is: Why is a class with all static variables one byte in size?

Thanks for reading!


Pooled Object Allocator

As part of a little test I have written a pooled object allocator. This is the first time I have written something like this and I quite enjoyed it. I dare say I have probably fallen foul of some of the little gotchas in developing one of these, but hopefully they will become apparent on my next pass, or if I use it a little. Overall, for a little exercise, I learned a bit and it wasn't a bad use of a spare hour or so.

THE CODE

#include <utility> //std::forward
#include <stack>   //std::stack
#include <cstdlib> //malloc/free
#include <cstring> //memset
#include <new>     //placement new

template<class inputType>
class SimplePool
{
public:

	SimplePool(unsigned int _capacity, int _overrideTypeSize = -1)
		:
		m_rawMemory(NULL),
		m_endOfMemory(NULL),
		m_allocatedCount(0u),
		m_typeSize(0),
		m_maxCount(0)
	{
		// This override type size might sound odd, but can be useful if you need to add padding but don't want to add it to the class.
		m_typeSize = _overrideTypeSize != -1 ? _overrideTypeSize : sizeof(inputType);
		
		// Assign the base memory pool we will allocate from.
		m_rawMemory = static_cast<inputType*>( malloc(_capacity * m_typeSize) );

		//Checking assignment was possible.
		if (m_rawMemory)
		{
			m_endOfMemory = reinterpret_cast<inputType*>(reinterpret_cast<char*>(m_rawMemory) + (_capacity * m_typeSize));
			m_maxCount = _capacity;

			//We could also use a vector and reserve all the memory ahead of time and keep track of where we are in it
			//but using a stack for simplicity's sake.
			//Stepping in bytes so that an overridden type size (extra padding) is respected.
			for (unsigned int i = 0; i < m_maxCount; i++)
			{
				m_locations.push(reinterpret_cast<inputType*>(reinterpret_cast<char*>(m_rawMemory) + (i * m_typeSize)));
			}
		}
	}

	~SimplePool()
	{
		// Freeing all the assigned memory.
		free(m_rawMemory);

		// Cleaning up. You never know this pool might be in a pool.
		m_rawMemory		= NULL;
		m_endOfMemory		= NULL;
		m_allocatedCount	= 0u;
		m_typeSize		= 0u;
		m_maxCount		= 0u;
	}

	bool IsInitialised()
	{
		//Check to see if the pool is active.
		return m_rawMemory != NULL;
	}

	bool Allocate(inputType** _inputPointer)
	{
		//Check we have any room.
		if (m_allocatedCount < m_maxCount)
		{
			//If we have any free slots.
			if (m_locations.size())
			{
				//Always taking from the top of the pile - hoping the memory is still hot.
				(*_inputPointer) = m_locations.top();
				m_locations.pop();
				
				//increment the in use count.
				++m_allocatedCount;
				return true;
			}
		}
		return false;
	}

	//Separate Allocation when allocating something that needs a constructor.
	template <typename ...Args>
	bool Allocate_Construct(inputType** _inputPointer, Args&&... args)
	{
		//Check we have any room.
		if (m_allocatedCount < m_maxCount)
		{
			//If we have any free slots.
			if (m_locations.size())
			{
				//Always taking from the top of the pile - hoping the memory is still hot.
				inputType* addressPtr = m_locations.top();

				*_inputPointer = new((addressPtr)) inputType(std::forward<Args>(args)...);
				m_locations.pop();

				//increment the in use count.
				++m_allocatedCount;
				return true;
			}
		}
		return false;
	}

	bool Deallocate(inputType* _objectToRemove, bool _clear = false)
	{
		//Check the memory we are deleting is in the pool (>= so the first slot can be freed too).
		if ((_objectToRemove >= m_rawMemory) && (_objectToRemove < m_endOfMemory))
		{
			if (_clear)
			{
				memset(_objectToRemove, 0, m_typeSize);
			}

			//Add deleted object back to the list.
			m_locations.push(_objectToRemove);
			--m_allocatedCount;

			return true;
		}
		return false;
	}

	bool Deallocate_Destruct(inputType* _objectToRemove, bool _clear = false)
	{
		//Check the memory we are deleting is in the pool (>= so the first slot can be freed too).
		if ((_objectToRemove >= m_rawMemory) && (_objectToRemove < m_endOfMemory))
		{
			//Call the destructor of the allocated object.
			_objectToRemove->~inputType();

			if (_clear)
			{
				memset(_objectToRemove, 0, m_typeSize);
			}

			//Add deleted object back to the list.
			m_locations.push(_objectToRemove);
			--m_allocatedCount;

			return true;
		}
		return false;
	}


	inputType*			m_rawMemory;
	inputType*			m_endOfMemory;
	unsigned int			m_allocatedCount;
	unsigned int			m_typeSize;
	unsigned int			m_maxCount;
	std::stack<inputType*>		m_locations;
};

int main()
{
	SimplePool<unsigned int> pool(1024u);

	//Test!
	//--------------------------------------------------------
	for (unsigned int i = 0; i < 1280; i++)
	{
		unsigned int* testOutput = 0;
		pool.Allocate_Construct(&testOutput, i);		
		pool.Deallocate_Destruct(testOutput, true);

		//pool.Allocate(&testOutput);		
		//pool.Deallocate(testOutput, true);
	}

	//Deliberately bogus pointers to exercise the range check - both calls should return false.
	unsigned int* testdealloc = (unsigned int*)45656756765;
	pool.Deallocate(testdealloc, true);

	testdealloc = 0;
	pool.Deallocate(testdealloc, true);

	return 0;
}