ELI5: Explain data structures from computer science to a non-technical person.


How do I explain what a data structure is to someone who has never programmed before?


Ultimately, computer data is just patterns of high and low electrical charges. Each high/low charge is called “a bit”.

Even a single number can technically be called a “data structure”. It means we pick a fixed number of bits and agree that certain patterns mean certain numbers. With 8 bits, for example, the pattern 00001010 means 10.

A more complicated data structure can refer to multiple things. Say we want to represent a list of 10 numbers. If each number is 8 bits, we could just point at 80 bits and say “that’s 10 numbers”.

But what if we want different sizes of lists? Well, then we might say “The first 8 bits represent how many numbers there are”. So the list of 10 numbers now takes 88 bits: the first 8 bits will be in the pattern for “10”, then we know the next 80 bits represent 10 numbers.
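Here is a rough sketch of that "count first, then the numbers" layout in Python. The function names `encode` and `decode` are made up for illustration, and it assumes every number fits in 8 bits (0 to 255).

```python
def encode(numbers):
    """Pack numbers (each 0-255) as one count byte followed by one byte per number."""
    if len(numbers) > 255:
        raise ValueError("an 8-bit count can only describe up to 255 numbers")
    return bytes([len(numbers)]) + bytes(numbers)


def decode(data):
    """Read the count from the first byte, then take that many bytes as the numbers."""
    count = data[0]
    return list(data[1:1 + count])


packed = encode([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
print(len(packed) * 8)  # 88 bits: 8 for the count, 80 for the ten numbers
print(decode(packed))   # the original ten numbers come back out
```

The price of the flexible size is the extra 8 bits, plus a cap of 255 numbers per list, since that is the largest count 8 bits can hold.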

We can get even more complicated. Maybe we want a list of numbers, but the numbers aren’t all in a neat little line in memory. Well, we can store *memory addresses* in the list instead of the numbers themselves. That way we still have a list that represents a number of things, but now those things can be scattered throughout memory.
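To make the "list of addresses" idea concrete, here is a toy model. It pretends a Python list is memory and uses each slot's position as its address; real memory is more involved, and the layout and the name `read_list` are assumptions made just for this sketch.

```python
# A toy "memory": 16 slots, where a slot's index plays the role of its address.
memory = [0] * 16

# The numbers themselves are scattered around, not side by side.
memory[12] = 7
memory[5] = 42
memory[9] = 19

# The list lives in slots 0..3: a count, followed by the address of each number.
memory[0] = 3   # how many numbers the list has
memory[1] = 12  # address of the first number
memory[2] = 5   # address of the second number
memory[3] = 9   # address of the third number


def read_list(memory, start):
    """Follow each stored address to collect the numbers the list refers to."""
    count = memory[start]
    addresses = memory[start + 1:start + 1 + count]
    return [memory[address] for address in addresses]


print(read_list(memory, 0))  # [7, 42, 19]
```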

So a data structure is really just a way for us to organize memory and dictate rules about what the values represent. But it’s not very useful to describe them to people who aren’t familiar with programs, because we don’t tend to use complex data structures unless we’re solving relatively complex problems.

At its most basic, there are tradeoffs you can make when you’re storing data, and those tradeoffs impact how you can later process the data.

For example, imagine you need to store a bunch of names so that you can assign seats to them at an event. You could have everyone line up and give their name to a person who writes them down in order. This is slow, but it’s highly organized, so you won’t take down more names than there are seats and it preserves the order. Alternatively, you could have everyone write down their own name and throw them in a bucket and have one or more people pull them out to assign seats. This is comparatively disorganized but can be done really fast, by dividing the work among multiple people.

Data structures in computers are the set of ways you can organize storing, reading, and modifying data, along with the rules and trade-offs that go with each one.

Ok so you got a whole stack of cups. Different sizes, colors, some plastic, some glass, some skinny, some fat. All in one big stack.

You want one in particular. How do you describe it? Well say you want a cup that’s large, blue, plastic, and skinny.

How do you find it? You could go through each one and ask if it’s the cup you want. Or you could start by eliminating all the non-large, non-blue, non-plastic, non-skinny. Then you’d be left with what you can choose from.

Easy to do with 50 cups. Now imagine an insanely large number of cups, like 10 million, and you want just one.

How can you do that efficiently? Well, you need to organize them by their attributes. You’d make a table where each cup has a number (called a key), and that cup’s row has a column for each attribute type. An attribute can also have sub-attributes of its own, like shiny plastic vs. matte plastic, and that would be a smaller table nested inside the attribute.
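A small sketch of the cup table, assuming four made-up cups and four attribute columns. `find_by_scanning` is the "ask every cup" approach, and `build_index` groups the cups by attributes ahead of time so the lookup afterwards is a single step. Both function names are hypothetical.

```python
# Each cup gets a number (the key); the row lists its attributes.
cups = {
    1: {"size": "large", "color": "blue", "material": "plastic", "shape": "skinny"},
    2: {"size": "small", "color": "red", "material": "glass", "shape": "fat"},
    3: {"size": "large", "color": "blue", "material": "glass", "shape": "skinny"},
    4: {"size": "large", "color": "blue", "material": "plastic", "shape": "fat"},
}


def find_by_scanning(cups, wanted):
    """Check every cup one by one against the wanted attributes."""
    return [key for key, row in cups.items()
            if all(row[name] == value for name, value in wanted.items())]


def build_index(cups):
    """Group cup keys by their full combination of attributes."""
    index = {}
    for key, row in cups.items():
        combo = (row["size"], row["color"], row["material"], row["shape"])
        index.setdefault(combo, []).append(key)
    return index


wanted = {"size": "large", "color": "blue", "material": "plastic", "shape": "skinny"}
print(find_by_scanning(cups, wanted))  # [1], after looking at every cup

index = build_index(cups)
print(index.get(("large", "blue", "plastic", "skinny"), []))  # [1], in one lookup
```

Building the index takes one pass over all the cups, so it only pays off if you will be looking for cups more than once.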

That’s a data structure: logically defining your attributes.

Have they used a spreadsheet? A data structure is like a spreadsheet, where the columns in the spreadsheet are the properties in a data structure, and each row is an individual record.

If they don’t understand spreadsheets, I would probably say that a data structure is simply a way to combine different types of information under one name. An Employee structure, for example, might have a name, birthdate, and hourly wage. Each employee is represented by a single instance of an Employee structure.
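In Python, that Employee structure could look something like this. The field names and the two sample employees are invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    """One record: like one row in a spreadsheet, with one field per column."""
    name: str
    birthdate: date
    hourly_wage: float


# Each employee is a single instance of the Employee structure.
alice = Employee(name="Alice", birthdate=date(1990, 5, 17), hourly_wage=22.50)
bob = Employee(name="Bob", birthdate=date(1985, 11, 2), hourly_wage=19.75)

staff = [alice, bob]  # the "spreadsheet" is just a list of records
for employee in staff:
    print(employee.name, employee.birthdate.year, employee.hourly_wage)
```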

A data structure is a structure that holds data (numbers, letters) in a way that is efficient for the task you want to do. There are commonly known data structures and patterns that have been found to be very efficient and fast for certain tasks. But there is almost always a tradeoff involved. Take finding a specific number in a list: depending on how long the list can get, whether you only need to find things or also change the list (adding or removing), and how you can access the information, you decide which tradeoffs you can accept and thus which data structure will solve your problem best. These tradeoffs are typically described using Big O notation.
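As a rough illustration of the "finding a number in a list" tradeoff, this sketch counts how many items each approach has to look at. The step counters are only there for the demonstration, and the binary search assumes the list is already sorted.

```python
def linear_search(items, target):
    """Look at items one at a time. Works on any list; O(n) steps in the worst case."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            return True, steps
    return False, steps


def binary_search(sorted_items, target):
    """Halve the search range each step. Needs a sorted list; O(log n) steps."""
    steps = 0
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return True, steps
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return False, steps


numbers = list(range(1_000_000))        # already sorted
print(linear_search(numbers, 999_999))  # found, after checking every item
print(binary_search(numbers, 999_999))  # found, after about 20 checks
```

The tradeoff is that the fast search only works while the list stays sorted, so adding a number means putting it in the right place instead of just tacking it on the end.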