
why array [0]

Have you ever wondered why we have to use zero-based indexing in the arrays of programming languages?

Okay, so if you are a programmer already, cool. But for those who aren’t, it is basically a convention which says that the numbering of a sequence starts from zero and not one. Easy, right? It is used mostly in computer science.

So if you say that the first letter in a string (a word), say your name, is A, then in the computer’s language you would say it’s the zeroth in that string. We call it ‘index’ zero. Now, a string or a word is an “array” of letters. What’s the array thing? Read ahead.

An array is what you get when you store ordered data, a sequence, one item after another, i.e. in contiguous locations, in a computer or even in real life. Like in an attendance sheet, where you’ll find roll no. two’s name on the very next line after roll no. one’s.

From here on, it is better if you’ve typed something like ‘array_name[5]’ before, and even better if you have at least heard about “pointers”: the overwhelming variables used heavily in low-level programming languages like C and even C++.

Sweet intro. Let’s brush up some memory. If you have studied C/C++ or any other low-level language, you must have manipulated pointers already. Pointers are the not-so-famous variables that store the addresses of other, famous variables. For those who are thinking of a mouse pointer, I’m elaborating.

Data in any computer’s memory is stored in locations called memory-bytes, each with a unique address. These addresses are nothing but cell numbers like those you see on bank lockers: 15, 9999, etc. You keep some data in the locker and it can be found by the address of that locker, the number. In computers though, the addresses of memory lockers, or rather memory-cells, are usually written as hexadecimal numbers (0xd4a5fe). The cell number 9999 is "0x270f" in hexadecimal. Hex is just a compact way of writing the binary numbers the computer actually works with. Each memory-cell stores data, and the computer keeps track of it by the address of that cell.

Digest this. The memory cells are known as memory-bytes.

So now, if you have a computer, say a macBook-1899, with only four bytes of storage space, you’ll have memory addresses labeled ‘0x1 to 0x4’ (hexadecimal for 1 to 4). All that you can store in this computer is a, b, c and d, one byte each. No worries, your phone has billions of such bytes. If you download a picture of, like, 4 kilobytes (about 4000 bytes), it literally gets stored from, say, byte number 1001 to byte number 5000 in the phone’s memory.

I’m sure by now you know what an address means. That’s right: the address of your home, or the room number in a large HOSPITAL, or the “number of a memory-byte”. Addresses are in fact important, right? To reach you, people need to know your address. The same is the case with memory addresses. Now we will talk about arrays and get to the topic.

Imagine X wants to find a relative in that HOSPITAL. He tells the lady at the counter that the patient is lying, unwillingly, in room number, let's say, 56. She says, okay! But he is “stored” at 55. She gives an instruction to a computer to fetch the name at the location ‘patient_names[55]’ and it gives the name of the concerned patient, Y. Doctors say that Y’s health is improving. X is happy. Good. But what was all this?

This is zero-based indexing. The patient in room number one is stored at location, or index, zero in the array ‘patient_names’ or something. Y'all must be like, “why’s he explaining arrays so much? I already know that.” Well, nice. But if you have never programmed in a low-level language or haven't studied the array data structure, maybe this’ll surprise you. An array is, for all practical purposes, nothing but a pointer.

A pointer is the guy who knows that the picture you downloaded is stored at ‘1001’. He himself stores that address. Talking in programming terms, when you say int a=10, ‘a’ is an integer object aka a variable; it is general, and never concerned with the address where 10 is stored. If you ask for ‘a+1’, it’ll give you 11. Now, if you declare a pointer variable, int *p, and assign it the address of the variable ‘a’ using the address-of syntax, p = &a, it stores the address of that 10. Umm, is that the same kind of value as the byte number 1001? Indeed yes: if this time address ‘1001’ holds 10, then p stores 1001, which is usually printed in hexadecimal. Have a look at this code below.

Why should we worry about pointers at all? Because, as I told you, that’s how an array is implemented. An array is sequential data stored in contiguous memory bytes of a PC, right? Aren’t addresses of concern then? And now, the hero of the story: an array is nothing but a pointer to the "first data object of your sequence". That, is, all. Every time you’ve defined an array, typing in something similar to ‘int array[] = {1, 2, 3}’, you have got yourself a name that acts as a “pointer” to the first object in the array (strictly speaking, in C the name ‘array’ turns into that pointer in almost every expression where you use it). In this case, the first element is an int object, 1. Don’t believe it? Look for yourself. 👇


Anyway, the address you just saw was where 1 (being the first) was stored by the program: an actual memory-byte number out of the billions a computer has. If you pause here and think, this pretty much answers what we want. When a program executes ‘array[0]’, all that happens under the hood is that the program reads the data stored at the address “array + 0” (in C, array[i] is literally defined as *(array + i)). Therefore it returns the first element of the array. And hence, 0-based indexing. The example below explains it all.

Remember the 4-byte macBook-1899? Assume you have declared an array {a, b, c, d} in it, named alphabets. Notice that its only 4 addresses, 0x1 to 0x4, store ‘a’ to ‘d’ respectively. So the variable named ‘alphabets’, which now acts as a pointer, will definitely hold 0x1. Say you want ‘c’, which lies at 0x3. If you give the instruction alphabets[2], it’ll simply give you what is stored at 0x1 + 2 = 0x3: the third memory-byte in that computer, which stores ‘c’. See? To access the third, you gave 2. That is all one needs to understand about zero-based indexing.


One more thing, for slightly more experienced people. You must already know that integers, characters and floats have different sizes in bytes. And so, when you increment a pointer, it does not necessarily land on the immediate next byte. Compilers need to be smart. Remember, you always have to say 'int *pointer' or 'char *pointer', thereby telling the compiler the data type of the element that the pointer is going to point to. So when you increment an integer pointer, the compiler adds sizeof(int) to the address, typically taking it 4 bytes ahead, as integers generally require 4 bytes. Huh, almost done.


An "array pointer" can never point to a song, right, because a song's size changes depending on how good the website you used is, right. What?


Now, I know you might have a question. If that’s all it was, then why didn’t the language designers simply make ‘array[x]’ return what is stored at ‘array + x - 1’, to make it one-indexed? No trouble for the programmers. See that ‘-1’ there? It’ll make the array one-based. Because if we say 'array[1]', then it should return what is at 'base_address + 1 - 1' = 'base_address + 0', which is exactly what we want. In fact there are programming languages, such as Fortran and MATLAB, that use one-based indexing.


Then why don’t most popular programming languages use this convention? The usual reason given is efficiency and computing speed. A CPU has to run thousands of instructions to execute something which might seem small at the higher level, like sorting an array. The extra ‘-1’ would be executed so many times in a running program, such as an OS, slowing down the system, that it’s better to use zero-based indexing.


Lastly, ‘0-based indexing’ troubles only the programmers and not the end-users, like the counter lady at the HOSPITAL we talked about. What I’m saying is that X’s family would never weep over some misunderstanding because the attendant typed 56 instead of 55 into the computer and said that the patient in room no. 56 ran away three days ago, lol. Instead, she’ll have well-tested software which displays bed number 55’s data when given 55, and not 56's. Beware of zero-indexing.
