Wednesday, August 20, 2008

Boxing & Unboxing in C#

Boxing/Unboxing in .NET

Does int.ToString(), which converts an integer (a value type) to a string (a reference type), box the int? Conversely, does int.Parse(), which converts a string to an integer, unbox the string?
Ans: No.
To find out what happens during a call to int.ToString(), I decompiled mscorlib.dll, which contains the implementation of Int32. (It is in the .NET Framework directory, which on my machine is C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727.)
The implementation of ToString() is as follows:

public override string ToString()
{
    return Number.FormatInt32(this, null, NumberFormatInfo.CurrentInfo);
}

The Number.FormatInt32() method is declared as follows:
[MethodImpl(MethodImplOptions.InternalCall)]
public static extern string FormatInt32(int value, string format, NumberFormatInfo info);

According to MSDN, the extern modifier declares a method that is implemented externally. So where is FormatInt32() implemented? The answer lies in the MethodImpl attribute that decorates the declaration: according to MSDN, MethodImplOptions.InternalCall specifies that the method is implemented within the CLR itself. So I downloaded the SSCLI, a.k.a. Rotor, which is the source code of a working implementation of the CLR.
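MethodImpl is not stored like an ordinary custom attribute. The compiler turns it into implementation flags in the method's metadata, and that is the information the runtime checks to recognize an internal call. The small sketch below reads those flags back through reflection. FlagsDemo, NotInlined and FlagsOf are hypothetical names. It uses MethodImplOptions.NoInlining rather than InternalCall, because user code cannot supply an implementation for an internal call.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical demo type: shows that [MethodImpl] ends up as method
// implementation flags that reflection (and the runtime) can read.
public static class FlagsDemo
{
    // NoInlining is used here because declaring an InternalCall method in
    // user code is not something an ordinary assembly can implement.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int NotInlined() => 42;

    // Returns the MethodImplAttributes flags recorded for a public static method.
    public static MethodImplAttributes FlagsOf(string methodName) =>
        typeof(FlagsDemo).GetMethod(methodName).GetMethodImplementationFlags();
}
```

FlagsDemo.FlagsOf("NotInlined") should include MethodImplAttributes.NoInlining and not MethodImplAttributes.InternalCall. A method in mscorlib marked InternalCall would report the InternalCall flag in the same way.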

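In the Rotor source, I found that the CLR maps internal-call methods to their native implementations through a table of entries such as this one:

FCFuncElement("FormatInt32", COMNumber::FormatInt32)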
This entry tells the runtime that the managed FormatInt32 method is implemented by the native C++ function COMNumber::FormatInt32. The next question was where to find that function. The \clr\src\vm directory contains a file named comnumber.cpp, and that is where I found COMNumber::FormatInt32. It calls COMNumber::Int32ToDecChars, which is defined as follows:

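wchar_t* COMNumber::Int32ToDecChars(wchar_t* p, unsigned int value, int digits)
{
    LEAF_CONTRACT
    _ASSERTE(p != NULL);

    // Produce at least 'digits' characters, and continue until the value is exhausted.
    while (--digits >= 0 || value != 0) {
        *--p = value % 10 + '0';  // rightmost digit -> character, written backward
        value /= 10;              // drop the digit just written
    }
    return p;                     // points at the first (leftmost) character
}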

COMNumber::Int32ToDecChars extracts the digits of the integer from right to left. value % 10 gives the rightmost digit, adding '0' turns it into the matching character, and value /= 10 drops that digit. Each character is written backward into a caller-supplied buffer, and the function returns a pointer to the first character written. The loop runs until at least digits characters have been produced and the value has reached zero, which lets small numbers be zero-padded. COMNumber::FormatInt32 does more work around this (the sign, the format string, the culture data), which I won't discuss here, but the core of the conversion is Int32ToDecChars. So, to wrap up: int.ToString() converts the individual digits of an integer to their equivalent characters and returns them as a new string. The int itself is never wrapped in an object.
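The digit loop can be mirrored in managed code as a rough sketch. IntFormatSketch, ToDecChars and ToDecimalString are hypothetical names. The sign handling and the 11-character buffer are my own assumptions for the sketch, not what COMNumber::FormatInt32 actually does. The loop condition is the same as the one in Int32ToDecChars.

```csharp
using System;

// Hypothetical managed sketch of the Int32ToDecChars digit loop.
public static class IntFormatSketch
{
    // Writes the decimal digits of 'value' backward, ending just before index
    // 'end' of 'buffer', producing at least 'digits' characters.
    // Returns the index of the first character written.
    static int ToDecChars(char[] buffer, int end, uint value, int digits)
    {
        int p = end;
        while (--digits >= 0 || value != 0)
        {
            buffer[--p] = (char)('0' + value % 10); // rightmost digit first
            value /= 10;
        }
        return p;
    }

    public static string ToDecimalString(int value)
    {
        // Ten digits is enough for any int magnitude, plus one slot for '-'.
        char[] buffer = new char[11];
        // Widen to long before negating so that int.MinValue does not overflow.
        uint magnitude = value < 0 ? (uint)(-(long)value) : (uint)value;
        int start = ToDecChars(buffer, buffer.Length, magnitude, 1);
        if (value < 0) buffer[--start] = '-';
        // The only allocation is the new string; the int is never boxed.
        return new string(buffer, start, buffer.Length - start);
    }
}
```

For example, IntFormatSketch.ToDecimalString(145) returns "145" and ToDecimalString(-42) returns "-42". Passing 1 as the minimum digit count ensures that zero still produces "0".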
Next, I looked at int.Parse() with .NET Reflector and found that it does roughly the reverse of int.ToString(). The string is read character by character, and each character is converted to its digit value. The number built so far is multiplied by 10, which shifts the previously converted digits one decimal place to the left, and then the new digit is added.
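Here is a simplified sketch of that accumulate-by-ten idea. ParseSketch and ParseInt32 are hypothetical names. Unlike the real int.Parse, it ignores whitespace, NumberStyles and culture-specific signs. It accepts only an optional '+' or '-' followed by ASCII digits.

```csharp
using System;

// Hypothetical, simplified parser illustrating "shift left by one decimal
// place, then add the new digit". Not the actual int.Parse implementation.
public static class ParseSketch
{
    public static int ParseInt32(string s)
    {
        if (string.IsNullOrEmpty(s)) throw new FormatException("Empty input.");

        int i = 0;
        bool negative = false;
        if (s[0] == '-' || s[0] == '+')
        {
            negative = s[0] == '-';
            i = 1;
            if (s.Length == 1) throw new FormatException("Sign without digits.");
        }

        long total = 0; // wider than int so that overflow can be detected
        for (; i < s.Length; i++)
        {
            char c = s[i];
            if (c < '0' || c > '9') throw new FormatException("Invalid digit.");
            // Shift the digits read so far one place left, then add the new digit.
            total = total * 10 + (c - '0');
            // 2147483648 is still allowed here because "-2147483648" is valid.
            if (total > (long)int.MaxValue + 1) throw new OverflowException();
        }

        if (negative) total = -total;
        if (total > int.MaxValue) throw new OverflowException();
        return (int)total;
    }
}
```

ParseSketch.ParseInt32("145") returns 145. As with int.Parse, no object is created or unwrapped: the result is an ordinary int computed from the characters.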

Most C# textbooks illustrate boxing and unboxing with an example like this:
int i = 145;
object o = i;   // Boxing
int j = (int)o; // Unboxing

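int.ToString() and int.Parse() cannot be used in that way. ToString() builds a new string from the digits of an int, and Parse() builds a new int from the characters of a string. Neither wraps a value type in an object or extracts one from an object, so neither has anything to do with boxing or unboxing. In fact, because Int32 overrides ToString(), calling it on an int does not even box the int first.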
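To make the difference concrete, the following sketch does both things to the same int. BoxingContrast and Describe are hypothetical names. The boxed object still reports Int32 as its runtime type, while ToString() returns an unrelated String object.

```csharp
using System;

// Hypothetical demo contrasting boxing/unboxing with ToString()/Parse().
public static class BoxingContrast
{
    public static string Describe()
    {
        int i = 145;

        object boxed = i;             // boxing: the object still holds an Int32
        string text = i.ToString();   // conversion: a brand-new System.String
        int parsed = int.Parse(text); // conversion back: characters -> Int32
        int unboxed = (int)boxed;     // unboxing: copy the Int32 back out

        return $"{boxed.GetType().Name} {text.GetType().Name} {parsed} {unboxed}";
    }
}
```

BoxingContrast.Describe() returns "Int32 String 145 145". Both paths get back to 145, but only the first one involves an object whose type is Int32.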

So what actually happens during boxing and unboxing?
Here are excerpts from Shared Source CLI Essentials:
By default, when an instance of a value type is passed from one location to another as a method parameter, it is copied in its entirety. At times, however, developers will want or need to take the value type and use it in a manner consistent with reference types. In these situations, the value type can be “boxed”: a reference type instance will be created whose data is the value type, and a reference to that instance is passed instead. Naturally, the reverse is also possible, to take the boxed value type and dereference it back into a value type - this is called “unboxing”.
The box instruction is a typesafe operation that converts a value type instance to an instance of a reference type that inherits from System.Object. It does so by making a copy of the instance and embedding it in a newly allocated object. For every value type defined, the type system defines a corresponding reference type called the boxed type. The representation of a boxed value is a location where a value of the value type may be stored; in essence, a single-field reference type whose field is that of the value type. Note that this boxed type is never visible to anyone outside the CLI’s implementation-the boxed type is silently generated by the CLI itself, and is not accessible for programmer use. (It is purely an implementation detail that would have no real utility were it exposed.)
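The behaviour described in these excerpts can be observed from C#. The sketch below rests on my own assumptions: ICounter, Counter and BoxingDemo are hypothetical, and the mutable struct is there only to make the copies visible (mutable structs are usually best avoided in real code).

```csharp
using System;

// Hypothetical types used only to observe copying and boxing.
public interface ICounter
{
    void Increment();
    int Value { get; }
}

// A mutable value type, so that copies can be told apart.
public struct Counter : ICounter
{
    private int _value;
    public void Increment() => _value++;
    public int Value => _value;
}

public static class BoxingDemo
{
    // Receives a copy of the whole struct; the caller's instance is unaffected.
    static void IncrementCopy(Counter c) => c.Increment();

    // Receives a reference to a boxed Counter; the box itself is modified.
    static void IncrementBoxed(ICounter c) => c.Increment();

    public static (int original, int boxed, bool sameBox, bool wrongTypeUnboxFailed) Run()
    {
        Counter local = new Counter();
        IncrementCopy(local);   // local.Value is still 0

        ICounter box = local;   // boxing: copy 'local' into a new heap object
        IncrementBoxed(box);    // the box's Value becomes 1, local.Value stays 0

        int i = 145;
        object o1 = i;          // each boxing allocates a separate object
        object o2 = i;

        bool failed = false;
        try
        {
            _ = (long)o1;       // unboxing must name the exact value type (Int32)
        }
        catch (InvalidCastException)
        {
            failed = true;
        }

        return (local.Value, box.Value, ReferenceEquals(o1, o2), failed);
    }
}
```

BoxingDemo.Run() returns (0, 1, false, true). Passing the struct copies it. Boxing copies it again into a new object that is then handled by reference. Boxing the same int twice yields two distinct objects. Unboxing an Int32 as a long throws InvalidCastException, which reflects the type-safety the excerpt mentions.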
