What is the logic behind programming languages 0-indexing?

As someone who primarily uses R, I don’t understand why Python indexes lists starting from 0. I’m very slow at simple mental calculations and doing something like subsetting an array in Python often takes me several extra seconds.

I think I read that it has something to do with memory, but that’s so much less of a consideration for people who only use high-level languages.

In: Technology

9 Answers

Anonymous 0 Comments

Historically an array/list was (and in languages like C, is) implemented by storing the memory address (“pointer”) of the first item in the array. Thus following the pointer takes you straight to the first item in the list. The memory location of any particular item is then calculated as “pointer to first item + size of each item * array index”. Hence the first item is number 0 because you don’t move the memory pointer to find it. The second item is “1 space to the right of the first item”, etc.

When people say that C is a very low-level language, this is the sort of thing they mean.
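As a rough illustration of that address formula (this is not how Python lists work internally), here is a sketch that uses the standard `ctypes` module to lay out five 32-bit integers the way C would. The names `items` and `read_item` are made up for this example.

```python
import ctypes

# Five 32-bit integers stored back to back, like a C array.
items = (ctypes.c_int32 * 5)(10, 20, 30, 40, 50)

base_address = ctypes.addressof(items)      # "pointer to first item"
item_size = ctypes.sizeof(ctypes.c_int32)   # size of each item: 4 bytes


def read_item(index: int) -> int:
    """Read one item by computing its memory address by hand."""
    address = base_address + item_size * index
    return ctypes.c_int32.from_address(address).value


# Index 0 adds nothing to the base address, so it lands on the first item.
print(read_item(0), items[0])   # 10 10
print(read_item(3), items[3])   # 40 40
```

Note that `read_item` does no bounds checking, which is also how plain C behaves.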

Changing this behaviour might involve having the pointer point just before the first item, but then either you’re wasting an item slot’s worth of memory or the pointer is technically invalid as presented. This would be inconsistent with other uses of variables pointing at memory. Alternatively you could subtract 1 from all array indexes before doing the memory location math but that’s a lot of overhead that isn’t necessary.

Most (but not all) languages keep this convention for consistency with other languages.

Anonymous 0 Comments

Because that’s how address registers in the hardware work. If a “pointer” in a language is the address of the start of an array, the first element is at that address. The hardware’s “load indexed” instruction takes an address register for the base and another register as the index. The address+0 location contains the first element in the array.

Anonymous 0 Comments

At a deeper level, you need to figure out where in memory each item is. When the application asks the OS for memory for a variable, it is given some address X. The first item in the list is at memory address X, the second at X + offset, the third at X + 2 * offset, the fourth at X + 3 * offset, and so on, where offset is the size of one item.

So when you want to get a specific item from the array, you need to tell it how many offsets to use. The first item is right at the start of the list, so it has 0 offsets.

Anonymous 0 Comments

The values in the array are stored in memory. Rather than explicitly store the address for each element in the array, the computer stores the address only for the beginning of the array. To access other elements of the array, you must specify an offset.

Naturally, to access the beginning of the array itself, the offset would be 0. So the index starts at 0.

Anonymous 0 Comments

The other responses are all correct, but if you want a shortened version, think about how a number is stored. It’s stored in bits: 1s and 0s. Four bits can hold 16 different values, from 0000 to 1111. If you start your index at 1 (0001), the value 0000 is never used, so a four-bit index can reach only 15 spots in the array instead of 16.
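To make the counting concrete, here is a small sketch. The function `addressable_slots` is a made-up helper for this example; it just counts how many index values fit in an unsigned field of a given width.

```python
def addressable_slots(bits: int, first_index: int) -> int:
    """Count the index values an unsigned field of `bits` bits can hold,
    if the smallest index in use is `first_index`."""
    largest_value = 2**bits - 1
    return largest_value - first_index + 1


print(addressable_slots(4, first_index=0))  # 16
print(addressable_slots(4, first_index=1))  # 15
```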

Anonymous 0 Comments

Think of it like directions to get someplace. Imagine you are standing at the start of a long hallway, and after every doorway there’s a clear divider. To reach a specific room, just count the dividers you pass on the way there.

1. 1st room is just as you walk in
2. 2nd room is after the 1st divider
3. 3rd room is after the 2nd divider
4. 4th room, 3rd divider…
5. 5th room, 4th divider…
6. Room 6 = Hallway[after 5 dividers] = Hallway[5]
7. Room 7 = Hallway[6]
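The hallway can be written as a Python list to see the same counting in action. This is only a sketch; `hallway` and `room_after` are names invented for the example.

```python
# Seven rooms in a row; the list index is the number of dividers you pass.
hallway = [f"room {n}" for n in range(1, 8)]


def room_after(dividers: int) -> str:
    """Return the room you reach after passing `dividers` dividers."""
    return hallway[dividers]


print(room_after(0))  # room 1
print(room_after(5))  # room 6
print(hallway[6])     # room 7
```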

For a more technical answer, see [Wiki: Zero Based Numbering](https://en.wikipedia.org/wiki/Zero-based_numbering), which gives some context on the optimizations and advantages this design provides.

Anonymous 0 Comments

Back before arrays, when you had a list of items in memory, you had to do the math to pull the right one out:

address of i = base address + i * size of item

The first item is at the base address, so you need i = 0 to access it. The [] syntax was essentially a shortcut for the address math.
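Here is one way to picture the [] syntax as a shortcut, sketched in Python. `RawIntArray` is a hypothetical class written for this example, not a real library type: it keeps its items in one flat block of bytes and does the address math inside `__getitem__`.

```python
import struct


class RawIntArray:
    """Hypothetical array of 32-bit ints stored in one flat block of bytes."""

    ITEM_SIZE = 4  # bytes per item

    def __init__(self, values):
        # Pack all values back to back as little-endian 32-bit ints.
        self.memory = bytearray(struct.pack(f"<{len(values)}i", *values))
        self.base = 0  # position of the first item inside self.memory

    def __getitem__(self, i):
        # address of i = base address + i * size of item
        address = self.base + i * self.ITEM_SIZE
        return struct.unpack_from("<i", self.memory, address)[0]


numbers = RawIntArray([7, 8, 9])
print(numbers[0])  # 7
print(numbers[2])  # 9
```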

This also has roots in mathematics, where the first term of a sequence or series is often subscripted with 0.

Anonymous 0 Comments

Let’s look at a simple case: an array of five 32-bit integers in C that we call a. 32 bits is 4 bytes, so 5 integers take 4*5 = 20 bytes. So the array is just 20 bytes of contiguous memory.

The program knows where the array is in memory because the address of the first byte is stored in the variable a. Let’s say it is at address 1000 (decimal).

So the first int starts at 1000, and the next ones at 1004, 1008, 1012, and finally 1016.

So int n is at memory address a+4*n; the a[n] that you write in code is converted to that internally.

So if the first element in the array is element 0, it all works fine: just multiply the index you are looking for by the size of the variable, add it to a, and you get the memory address.

If the first element is element 1, then the memory address would be a+4*(n-1), so you need an extra instruction to calculate the address of the variable, or a has to be reduced by 4.
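The numbers from this example can be checked with a few lines of Python. This is plain arithmetic, not real memory access, and the function names are invented for the sketch.

```python
BASE = 1000    # address stored in a
INT_SIZE = 4   # bytes per 32-bit int


def address_zero_based(n: int) -> int:
    return BASE + INT_SIZE * n


def address_one_based(n: int) -> int:
    # The extra subtraction is the "extra instruction".
    return BASE + INT_SIZE * (n - 1)


def address_one_based_shifted(n: int) -> int:
    # Alternative: reduce a by 4 once, then use the simple formula.
    shifted_base = BASE - INT_SIZE  # 996, which is not part of the array
    return shifted_base + INT_SIZE * n


print([address_zero_based(n) for n in range(5)])            # [1000, 1004, 1008, 1012, 1016]
print([address_one_based(n) for n in range(1, 6)])          # [1000, 1004, 1008, 1012, 1016]
print([address_one_based_shifted(n) for n in range(1, 6)])  # [1000, 1004, 1008, 1012, 1016]
```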

If a is reduced, you add other complexity, for example when you want to read the array as another variable format. When you read from a file, you read it as individual bytes (chars in C), so if you want to store the ints in a file, you do that with the same address a but treat the data as 20 one-byte chars. The same is true for data you transfer over a computer network. So adjusting the value of a would be complex, because the adjustment would have to change whenever you change how you look at the data. The result is that in a low-level language like C the reasonable option is to start all arrays at 0, because that is simpler and faster.
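Here is a sketch of looking at the same 20 bytes in two ways, using Python’s `struct` module and an in-memory `io.BytesIO` as a stand-in for a file. The little-endian layout (`"<"`) and all variable names are assumptions made for the example.

```python
import io
import struct

values = [1, 2, 3, 4, 5]
buffer = bytearray(struct.pack("<5i", *values))  # five 32-bit ints, 20 bytes


def read_int(n: int) -> int:
    """View the buffer as ints: item n starts 4*n bytes from the start."""
    return struct.unpack_from("<i", buffer, 4 * n)[0]


# View 1: five ints. View 2: twenty one-byte items.
# Both views start at the same place, position 0 of the buffer.
print(read_int(0), read_int(4))  # 1 5
print(len(buffer), buffer[0])    # 20 1

# Writing to a "file" just copies the 20 bytes starting at that same place.
stream = io.BytesIO()
stream.write(buffer)
restored = struct.unpack("<5i", stream.getvalue())
print(restored)                  # (1, 2, 3, 4, 5)
```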

If you look at assembly instructions, you can read an int at an offset from an address in a single instruction. So if you load data into a 32-bit register from a memory address with an offset, the a+4*n calculation is done in hardware as part of that instruction.
If you access an array of structs where each struct is 7 bytes, you need to do the address calculation yourself.

So from a hardware and low-level point of view, starting at 0 makes more sense.

You find indexes that start at 1 in languages where the variables are not as closely connected to memory and hardware as in C, so often in interpreted languages. An example is Lua, where what you think of as an array is a hash table.

The variable system in Python is not as close to memory addresses as in C, if I understand it correctly, so starting at 0 is more of a choice, and I suspect the language designers were just used to it and continued to use it. The Python interpreter was written in C, so having the same starting index in the interpreted language (Python) as in the language the interpreter is written in (C) would make development simpler.

Anonymous 0 Comments

In 0-based indexing the index represents the **offset** relative to the start of the array.

It works best if you think of the underlying memory. Say you have an array that starts at address 200, and use an element size of 10. The first element (i=0) is *at* the start, so at address 200. The next one is at 210. With 0-based indexing the *i*th element is at 200 + *i*·10. In 1-based indexing the first element (i=1 here) is still at the start (200), so to make the formula work you need to subtract one when calculating the address: 200 + (*i*-1)·10.

1-based indexing may be useful when strictly dealing with ordinals (1st, 2nd, etc). But when you have to do arithmetic with your indices, 0-based is often easier and less error-prone.
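One common case of index arithmetic is treating a flat list as a grid with rows and columns. The sketch below compares the two conventions; all four function names are invented for the example, and the 1-based versions are written out by hand since Python itself is 0-based.

```python
def flat_zero(row: int, col: int, width: int) -> int:
    """0-based grid position to 0-based flat position."""
    return row * width + col


def unflat_zero(flat: int, width: int) -> tuple[int, int]:
    return divmod(flat, width)


def flat_one(row: int, col: int, width: int) -> int:
    """1-based grid position to 1-based flat position."""
    return (row - 1) * width + col


def unflat_one(flat: int, width: int) -> tuple[int, int]:
    # Shift down to 0-based, do the math, then shift back up.
    return (flat - 1) // width + 1, (flat - 1) % width + 1


# Second row, first column of a grid that is 3 wide.
print(flat_zero(1, 0, 3), unflat_zero(3, 3))  # 3 (1, 0)
print(flat_one(2, 1, 3), unflat_one(4, 3))    # 4 (2, 1)
```

The 1-based versions work fine, but each one carries extra `- 1` and `+ 1` terms that are easy to get wrong.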