data-oriented and data-driven programming.

529 views

I’ve got a good idea of object-oriented, but explanations of the other two just confuse me.

In: Technology

2 Answers

Anonymous 0 Comments

Structures model data, classes model (often stateful) behavior. OOP is not about smart data, if you have a bunch of getters and setters (accessors and mutators) you don’t have any encapsulation – because changing a member changes it’s accessors and mutators, which is a change to the interface, which necessitates a recompilation of all source files dependent on that header, and a change to all the code that depends on its former interface, which may have a cascade of necessary changes. This isn’t OOP.

The OOP you learn in school isn’t OOP at all – those little single page programs you write are designed to illustrate one single concept at a time, they’re not meant to teach you anything about good or bad programming. And people since the 90s have extrapolated that OOP means putting all your data into classes and your instances into containers, all the while you have “smart data”.

Your OOP classes tend to model reality. If you have a car, it’s got a year, make, model, color, trim, wheel and tire sizes, etc… I ask you, does it actually make sense to put all of this unlike data together in one place?

What you are left with is a singular container of some odd size and odd alignment. If you’re not careful, you’ll have padding, both between members and between instances – if you have them in a contiguous container. You have islands of 1. And look at your cache lines, when you’re iterating over a range of cars, they’re populated with all this data that you don’t actually want, most of the time. Smart data is horribly cache inefficient. Let’s say you want to tally the car colors across all your instances; well, you have all this make, model, year, tire size, etc crap you don’t want and you’re not interested in!

Instead, how about if you thought about the DATA first and only. What would a car look like? Well, here’s a good start:

struct car {
std::vector<std::string> make, model, trim;
std::vector<int> year, rgb_color, tire_size, wheel_size;
};

And so what is a single instance of a car? “`i“`. A car instance is any one index across all these “parallel arrays”. So now if you want to iterate over all your cars and tally up the colors, you have all your color data in isolation. Your cache lines contain nothing but car colors. And you can write functional style code that doesn’t care WHAT a car “looks like”, you can abstract that concept away because the algorithm is pure and only cares about color, be it from a single car, a container of cars, or some other structure that has color data in it.

It does force you to really reconsider what you think you know about OOP. The data comes first. And so where does that leave classes? They end up looking a whole lot smaller. Often you’ll overload “`operator()“`, and they’ll only have a member or two that require no getter or setter. All that matters is the parameter passed in and the return value. They’re small, they’re cheap, and if you need to purge or restart, you’ll just make another instance. You’ll use classes for polymorphism. When you want a sorter of car color but you want to abstract the algorithm. All of a sudden it makes sense how compact a class is as some unit of encapsulation.

And consider your target processor is very likely a batch processor. Putting one algorithm and one dataset into the instruction and data caches, respectively, leads to performance. Your processor will prefetch the next cache line of pure data, and ideally you’ll have already sorted your data set so there’s little to no branching, or at least the data is in favor of branch prediction. Sorting your data before the heavy processing stage is another DOD principle – at the very least, the only data in the set you want is the data you actually need, so discriminate early. When you do this, your algorithms can be TERRIBLE, but perhaps unbeatably fast. I tell you, using DOD techniques, I was in high speed trading for ~7 years and in that time I wrote a linear search that was untouchably fast, WAY faster than binary search.

You are viewing 1 out of 2 answers, click here to view all answers.