What is the logic behind programming languages using 0-indexing?

As someone who primarily uses R, I don’t understand why Python indexes lists starting from 0. I’m very slow at simple mental calculations and doing something like subsetting an array in Python often takes me several extra seconds.

I think I read that it has something to do with memory, but that’s so much less of a consideration for people who only use high-level languages.

In: Technology

9 Answers

Anonymous 0 Comments

Let’s look at a simple case: an array of five 32-bit integers in C that we call a. 32 bits is 4 bytes, so 5 integers take 4*5 = 20 bytes. So the array is just 20 bytes of contiguous memory.

The program knows where the array is in memory because the address of its first byte is stored in the variable a. Let’s say it is at address 1000 (decimal).

So the first int starts at 1000, and the next ones at 1004, 1008, 1012 and finally 1016.

So element n is at memory address a+4*n. The a[n] that you write in code is converted to that internally.

So if the first element in the array has index 0, it all works out: just multiply the index you are looking for by the size of one element, add that to a, and you get the memory address.
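To make the arithmetic concrete, here is a small sketch in Python that fakes the memory with a plain byte string. The names BASE, INT_SIZE, address_of and read_element are made up for illustration, and the values 10 to 50 are arbitrary. It is only a model of what a C compiler does for a[n], not real pointer access.

```python
import struct

BASE = 1000      # pretend address of the first byte, as in the example above
INT_SIZE = 4     # bytes in one 32-bit integer

# Five 32-bit integers packed back to back: 20 bytes in total.
values = [10, 20, 30, 40, 50]
memory = struct.pack("<5i", *values)

def address_of(n):
    """Address of element n when the first element has index 0."""
    return BASE + INT_SIZE * n

def read_element(n):
    """Read element n from the raw bytes using only its byte offset."""
    offset = address_of(n) - BASE
    return struct.unpack_from("<i", memory, offset)[0]

addresses = [address_of(n) for n in range(5)]
print(addresses)                          # the five starting addresses
print(read_element(0), read_element(4))   # first and last element
```

With index 0 for the first element, the offset is just the index times the element size, with nothing to correct.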

If the first element had index 1, the memory address would be a+4*(n-1), so you would need an extra instruction to calculate the address, or a would have to be reduced by 4 ahead of time.
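The two workarounds for a 1-based index can be sketched like this, again only as a Python model with made-up names (address_one_based, address_one_based_shifted, SHIFTED_BASE) and the same assumed base address of 1000:

```python
BASE = 1000      # assumed address of the first element
INT_SIZE = 4     # bytes in one 32-bit integer

def address_zero_based(n):
    # Index 0 is the first element: one multiply, one add.
    return BASE + INT_SIZE * n

def address_one_based(n):
    # Index 1 is the first element: an extra subtraction on every access.
    return BASE + INT_SIZE * (n - 1)

# Alternative: shift the base once so the per-access formula stays simple.
# The shifted base points 4 bytes *before* the real start of the data.
SHIFTED_BASE = BASE - INT_SIZE

def address_one_based_shifted(n):
    return SHIFTED_BASE + INT_SIZE * n

print(address_zero_based(0), address_one_based(1), address_one_based_shifted(1))
```

All three calls give the address of the first element. The shifted version avoids the extra subtraction, but the stored base no longer points at the actual data.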

If a is reduced, you add other complexity, for example when you want to read the array as a different data type. When you read from a file in C, you read individual bytes (chars), so to store the ints in a file you would use the same address a but treat the data as 20 one-byte chars. The same is true for data you transfer over a computer network. Shifting the value of a would get complicated, because the size of the shift would have to change whenever you change how you look at the data (4 bytes for ints, 1 byte for chars). So in a low-level language like C, the reasonable option is to start all arrays at 0, because it is simpler and faster.
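Looking at the same memory in two ways can be imitated in Python with memoryview. This is a sketch and not what C does under the hood. It assumes the platform's native int is 4 bytes, which is the usual case, and the names buffer, as_bytes and as_ints are made up.

```python
import struct

# Five native ints stored back to back (assumed to be 4 bytes each).
buffer = bytearray(struct.pack("5i", 1, 2, 3, 4, 5))

as_bytes = memoryview(buffer)    # the data seen as 20 single bytes
as_ints = as_bytes.cast("i")     # the very same bytes seen as 5 ints

print(len(as_bytes), len(as_ints))

# Both views start at the same place, so index 0 is the start in both.
as_ints[0] = 0                   # write through the int view...
print(bytes(as_bytes[0:4]))      # ...and the first four bytes change too
```

Because both views start at the same address and both count from 0, nothing has to be adjusted when you switch from ints to bytes.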

If you look at assembly instructions, you can read an int at an offset from an address in a single instruction. So if you load data into a 32-bit register from a memory address with an offset, the a+4*n calculation is done in hardware as part of that one instruction.
If you access an array of structs where each struct is 7 bytes, you need to do the address calculation yourself, because the hardware only scales the index by a few fixed sizes (1, 2, 4 or 8 on x86).
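For the 7-byte struct case, the manual calculation is the same formula with 7 in place of 4. Here is a sketch with an invented record layout (one 1-byte field, one 2-byte field and one 4-byte field, packed without padding). RECORD and read_record are illustrative names.

```python
import struct

# "<" turns off padding, so one record is 1 + 2 + 4 = 7 bytes.
RECORD = struct.Struct("<BHI")

records = [(1, 100, 1000), (2, 200, 2000), (3, 300, 3000)]
memory = b"".join(RECORD.pack(*r) for r in records)

def read_record(n):
    """Record n starts RECORD.size * n bytes after the start of the data."""
    return RECORD.unpack_from(memory, RECORD.size * n)

print(RECORD.size, len(memory))
print(read_record(0), read_record(2))
```

The formula is still size times index with the first record at index 0. The difference is that the multiply by 7 has to be done as its own step.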

So from a hardware and low-level point of view, starting at 0 makes more sense.

You find indexes that start at 1 in languages where variables are not as closely tied to memory and hardware as in C, which often means interpreted languages. An example is Lua, where what you think of as an array is really a table, a kind of hash table.

The variable system in Python is not as close to memory addresses as in C, if I understand it correctly. So starting at 0 is more of a choice, and I suspect the language designers were just used to it and kept using it. The Python interpreter was written in C, so having the same starting index in the language you are creating (Python) and in the language you write the interpreter in (C) would make development simpler.
