Friday, October 16, 2009

Why do structures need internal padding?

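It's for ``alignment''. Many processors can't access 2- and 4-byte quantities (e.g. short ints and long ints) efficiently, or in some cases at all, unless each one starts at an address that is a multiple of its size.
Suppose you have this structure (on a machine where a short int is 2 bytes and a long int is 4):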
struct {
    char a[3];
    short int b;
    long int c;
    char d[3];
};
Now, you might think that it ought to be possible to pack this structure into memory like this:
+-------+-------+-------+-------+
|           a           |   b   |
+-------+-------+-------+-------+
|   b   |           c           |
+-------+-------+-------+-------+
|   c   |           d           |
+-------+-------+-------+-------+
But it's much, much easier on the processor if the compiler arranges it like this:
+-------+-------+-------+
|           a           |
+-------+-------+-------+
|       b       |
+-------+-------+-------+-------+
|               c               |
+-------+-------+-------+-------+
|           d           |
+-------+-------+-------+
In the ``packed'' version, notice how it's at least a little bit hard for you and me to see how the b and c fields wrap around? In a nutshell, it's hard for the processor, too. Therefore, most compilers will ``pad'' the structure (as if with extra, invisible fields) like this:
+-------+-------+-------+-------+
|           a           | pad1  |
+-------+-------+-------+-------+
|       b       |     pad2      |
+-------+-------+-------+-------+
|               c               |
+-------+-------+-------+-------+
|           d           | pad3  |
+-------+-------+-------+-------+
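
If you'd like to see what your own compiler does with this structure, a small sketch along the following lines may help. The tag `example` and the function name `print_layout` are made up for illustration and are not part of the original structure. On a platform with a 2-byte short int and a 4-byte long int, each aligned to its size, the offsets would come out as 0, 4, 8 and 12 with a total size of 16, matching the padded picture above; where long int is 8 bytes you should expect different numbers.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* printf */

/* The structure from the text, with a tag (hypothetical name) added
   so that offsetof and sizeof can refer to it. */
struct example {
    char a[3];
    short int b;
    long int c;
    char d[3];
};

/* Print the byte offset of each member and the total size.
   A gap between the end of one member and the offset of the next
   is padding inserted by the compiler. */
void print_layout(void)
{
    printf("a at offset %zu\n", offsetof(struct example, a));
    printf("b at offset %zu\n", offsetof(struct example, b));
    printf("c at offset %zu\n", offsetof(struct example, c));
    printf("d at offset %zu\n", offsetof(struct example, d));
    printf("sizeof(struct example) = %zu\n", sizeof(struct example));
}
```

Calling print_layout() from main is enough to get the report; comparing the offsets against the member sizes shows where pad1, pad2 and pad3 ended up on your platform.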

Size of a struct can differ from the total size of its data members

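Compilers are allowed to "pad" the data members of a struct with extra, unused bytes for the sake of alignment. For example, if sizeof(int) and sizeof(float) are both 4 on a system but sizeof(double) is 8, and you make a struct consisting of an int, a float and a double, the members alone need 16 bytes. Declared in that order, sizeof for the struct will typically be 16, because the double already falls on an 8-byte boundary. Put the double between the int and the float, though, and sizeof may report 24 on a platform that aligns double to 8 bytes (again, it depends on the platform).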
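
A rough way to measure this yourself is sketched below. The struct tags and function names are invented for the example; the only assumption is a hosted C compiler. The helpers report how many bytes of padding the compiler added to each ordering of an int, a float and a double.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical tags: the same three members in two different orders. */
struct int_float_double { int i; float f; double d; };
struct int_double_float { int i; double d; float f; };

/* Bytes the members need on their own, ignoring any padding. */
size_t members_total(void)
{
    return sizeof(int) + sizeof(float) + sizeof(double);
}

/* Padding bytes added when the double comes last. */
size_t padding_double_last(void)
{
    return sizeof(struct int_float_double) - members_total();
}

/* Padding bytes added when the double sits in the middle. */
size_t padding_double_middle(void)
{
    return sizeof(struct int_double_float) - members_total();
}
```

Printing the two results with printf and %zu shows whether your platform pads either ordering, and by how much.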

Padding is entirely platform-dependent. The members of the structure will appear in memory in the same order in which they're declared. The compiler will often try to ensure that a struct member which is 8 bytes in size (like double) begins at an address evenly divisible by 8. If the members before it don't occupy a number of bytes divisible by 8, then some unused bytes (known as 'padding') may be inserted just before the 8-byte member. Padding may also be added after the last member, so that every element of an array of such structs stays aligned.
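
The rounding described above can be modelled with a couple of small helpers. This is only a sketch of the arithmetic, not a description of what any particular compiler does internally, and the names `align_up` and `padding_before` are hypothetical.

```c
#include <stddef.h>  /* size_t */

/* Round offset up to the next multiple of align (align must be nonzero). */
size_t align_up(size_t offset, size_t align)
{
    size_t remainder = offset % align;
    return remainder == 0 ? offset : offset + (align - remainder);
}

/* Number of padding bytes needed before a member that requires the
   given alignment, if the preceding members end at offset. */
size_t padding_before(size_t offset, size_t align)
{
    return align_up(offset, align) - offset;
}
```

For instance, after a single 4-byte int, padding_before(4, 8) gives 4, which is the gap a compiler aligning double to 8 bytes would be expected to leave before a double.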


Something interesting about padding: it is possible for two structs with different data members to have the same total size, and it is also possible for two structs with exactly the same data members, declared in a different order, to have different sizes.
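
Both cases can be tried out with a sketch like the one below. All the struct tags and function names are made up for illustration. Assuming a 4-byte int aligned to 4 bytes, the four sizes would be 8, 8, 12 and 8; other platforms may differ.

```c
#include <stddef.h>  /* size_t */

/* Different members, often the same size: the lone char is
   typically followed by padding so that the int stays aligned. */
struct char_int { char c; int i; };
struct int_int  { int i; int j; };

/* Same members in a different order, often different sizes. */
struct spread  { char a; int i; char b; };  /* chars on either side of the int */
struct grouped { int i; char a; char b; };  /* chars packed together at the end */

/* Returns 1 if the two structs with different members have equal size here. */
int different_members_same_size(void)
{
    return sizeof(struct char_int) == sizeof(struct int_int);
}

/* Bytes saved on this platform by grouping the two chars together. */
size_t bytes_saved_by_reordering(void)
{
    return sizeof(struct spread) - sizeof(struct grouped);
}
```

A common rule of thumb that follows from this is to declare members from largest alignment to smallest when the size of the struct matters.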