Sunday, February 8, 2004

Tutorial 3 - Types and parameter passing primer

This primer explains three different concepts.

  • Value types and reference types
  • Passing by value and passing by reference
  • Garbage collection (brief overview)

Background

I recommend the following sites for more thorough explanations (especially Jon Skeet's two pages) or just another viewpoint.

Value Types and Reference Types

Every type in C# (and the .NET environment) is one of two kinds: a value type or a reference type. I'll borrow some ASCII art from Chris Brumme's blog.

              System.Object
                 /       \
                /         \
         most classes   System.ValueType
                            /       \
                           /         \
                  most value types   System.Enum
                                       \
                                        \
                                       all enums

Value Types

The section "most value types" in the above picture includes most primitives like int, float, and double. Structs (see the MSDN struct tutorial) are also value types. A struct is a "light-weight" class. It is declared with syntax very similar to that for declaring a class. But there are some differences in the way it is used since it is a value type. One struct that is commonly used, but sometimes overlooked as a value type, is DateTime. When a value type is created, a spot in memory is reserved for it and the contents are placed there.

MSDN Link: list of value types

The following commands declare and assign a variable x. In line 1, the variable x is declared as an int, which is a value type. This creates a spot in memory for the variable x, sized to hold an int. In line 2, the value 37 is placed in that spot. Thus we say x has a value of 37.

1          int x;
2          x = 37;
The same two lines can be expressed as:
    int x = 37;

Some value types use the new keyword to call a constructor for setting an initial value. The following command creates a spot in memory to hold a DateTime struct, and sets the value of that spot in memory to 2000-01-01.

    DateTime dt = new DateTime(2000, 1, 1);  // Jan 1, 2000
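
To see what it means for the contents to live in the variable's own spot in memory, here is a minimal sketch of copying a value type. The Point struct and the ValueCopyDemo class are made up for illustration; they are not part of this tutorial's code. Assigning one struct variable to another copies the contents, so the two variables can then change independently.

```csharp
using System;

// Hypothetical struct, for illustration only
public struct Point
{
    public int X;
    public int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public static class ValueCopyDemo
{
    // Copies a Point, changes the copy, and reports both X values
    public static string CopyThenChange()
    {
        Point p1 = new Point(1, 2);
        Point p2 = p1;   // copies the contents of p1 into p2's own spot in memory
        p2.X = 99;       // changes only the copy
        return "p1.X = " + p1.X + ", p2.X = " + p2.X;
    }

    public static void Main()
    {
        Console.WriteLine(CopyThenChange());  // prints "p1.X = 1, p2.X = 99"
    }
}
```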

Reference Types

The section "most classes" in the above picture includes the reference types. This includes all user defined classes (like our Airplane class from before). It also includes the string class.

Creating a reference type object generally requires the new keyword (string literals are a notable exception). The following command declares a variable airplane1, which holds a reference to an Airplane object on the heap. The Airplane object on the heap contains the data such as "Boeing" and "747". The variable airplane1 is similar to a C++ pointer in that all it really holds is an address (or reference).

    Airplane airplane1 = new Airplane ("Boeing", "747");

The following line of code does not actually create a new Airplane object. It simply creates a variable (airplane1) that could hold a reference to an Airplane object.

    Airplane airplane1; // no object created

Differences between value and reference types

There are a few key differences between value types and reference types.

  • Reference types are always stored on the heap. Value types can be stored on the stack (for local variables) or on the heap (generally for non-local variables, like static variables and member variables of a type that is already on the heap). See Memory in .NET - What goes where? by Jon Skeet for a more thorough explanation.
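  • The value of a reference type variable can be null. That is, a reference type variable doesn't have to point to a valid object. Value type variables, on the other hand, cannot have a null value. (C# 2.0 later added Nullable<T>, written int? for an int, which wraps a value type so that it can also represent null.)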
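
Both differences can be seen in a few lines. The sketch below uses a stripped-down, hypothetical Airplane class; the real Airplane class from the earlier tutorial is not shown here, so the Make and Model fields are my assumptions. Assigning one reference variable to another copies only the reference, so both variables end up pointing at the same object on the heap.

```csharp
using System;

// Hypothetical, stripped-down stand-in for the Airplane class
public class Airplane
{
    public string Make;
    public string Model;

    public Airplane(string make, string model)
    {
        Make = make;
        Model = model;
    }
}

public static class ReferenceDemo
{
    // Two variables, one object: a change made through one is visible through the other
    public static string ShareThenChange()
    {
        Airplane airplane1 = new Airplane("Boeing", "747");
        Airplane airplane2 = airplane1;  // copies the reference, not the object
        airplane2.Model = "777";
        return airplane1.Model;
    }

    public static void Main()
    {
        Console.WriteLine(ShareThenChange());   // prints "777"

        Airplane nothingYet = null;             // legal: a reference can point to no object
        Console.WriteLine(nothingYet == null);  // prints "True"

        // int x = null;  // would not compile: a plain int always holds a value
    }
}
```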

Analogy for value and reference types

I like to use the following analogy to understand value types and reference types. (I presented this earlier in Classes, objects and methods.)

Reference types are like balloons and value types are like balls. A variable is like your hand. When you hold a balloon, you use a string to hold onto it. This string is a reference to the balloon. This is similar to how a variable holds a reference to a reference type object. When you hold a ball, you hold the actual ball in your hand, no strings. This is similar to how a variable holds a value type object. It holds the actual object (or value), no references and no strings attached.

We will continue this analogy later.

Passing by Value and Passing by Reference

This is an important concept in C# and .NET. I highly recommend either of the two pages listed above on Parameter passing.

There are two ways of passing variables to a method:

By value
Passes the value that a variable holds. Think of it as passing a copy of a variable.
By reference
Passes a reference to a variable. Think of it as passing the actual variable itself, as opposed to simply whatever value it contains.

The default is to pass by value. The keywords ref or out are required before a parameter to pass by reference.

The above statements hold for all variables, whether value type or reference type. You either pass the variable's value or a reference to the variable. There is a lot of confusion because the terms value type and reference type sound so much like passing by value and passing by reference. To alleviate this confusion, we'll look at each of the following four combinations:

  1. Passing value types by value
  2. Passing value types by reference
  3. Passing reference types by value
  4. Passing reference types by reference

Passing value types by value

FunctionA (below) simply increments a number by 10. I've declared it static which means that it is not part of a specific object, but rather part of the class itself (even though the class isn't shown). This has no bearing, however, on the principles demonstrated below.

public static void FunctionA (int x)
{
    x += 10;
}

public static void Main () 
{
    int myInt = 37;
    Console.WriteLine ("Before calling function, myInt = " + myInt);
    FunctionA (myInt);
    Console.WriteLine ("After calling function, myInt = " + myInt);
    Console.ReadLine();
}

In this example, Main creates an int (myInt) and prints the value to the screen. It then passes myInt, by value, to FunctionA. This means that FunctionA gets a new copy of the value of myInt. When FunctionA increments x by 10, it does not change the value of myInt since it is changing its own copy. Thus, the result is as follows:

Before calling function, myInt = 37
After calling function, myInt = 37

Analogy

To continue the balloon / ball example: Calling a function is like handing a balloon or ball over to a friend. In this example, we are talking about value types, which are analogous to balls. Passing by value doesn't have a perfect analogy since it is like passing a copy. Thus, it would be like passing a copy of the ball that is in your hand over to a friend. You keep your original ball and your friend can't touch it.

Passing value types by reference

FunctionB (below) also increments a number by 10. However, by using the keyword ref when declaring the parameter int x, FunctionB indicates that the int must be passed by reference. The C# compiler requires that we also use the ref keyword when calling FunctionB so that it is obvious that we know what we are doing. We'll see why in a second.

public static void FunctionB (ref int x)
{
    x += 10;
}

public static void Main () 
{
    int myInt = 37;
    Console.WriteLine ("Before calling function, myInt = " + myInt);
    FunctionB (ref myInt);
    Console.WriteLine ("After calling function, myInt = " + myInt);
    Console.ReadLine();
}

In this example, Main creates an int (myInt) and prints the value to the screen. It then passes myInt, by reference, to FunctionB. This means that FunctionB gets a reference to myInt, not a new copy of myInt's value. When FunctionB increments x by 10, it also changes the value of myInt (since x more or less points to myInt). Thus, the result is as follows:

Before calling function, myInt = 37
After calling function, myInt = 47

Using ref allows a function to change the contents of a variable passed in, which can lead to tricky-to-debug systems and spaghetti-code. The C# compiler requires that ref be used on both the declaration of FunctionB and the call of FunctionB (within Main) so that it is immediately obvious that variables passed into FunctionB may not have the same value when FunctionB returns.

Analogy

To continue the balloon / ball example: In this example, we are still talking about value types, which are analogous to balls. Passing by reference is like passing your ball over to your friend. Then when your friend is finished (the function returns) you get it back, along with any changes he made to it.
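
The out keyword mentioned earlier also passes by reference. The difference from ref is that the caller does not have to assign the variable first, and the function must assign it before returning. The following is a minimal sketch; FunctionOut, its parameter names and the OutDemo class are made up for illustration.

```csharp
using System;

public static class OutDemo
{
    // The function must assign 'result' before it returns
    public static void FunctionOut(int x, out int result)
    {
        result = x + 10;
    }

    public static void Main()
    {
        int myInt;                   // not assigned yet; that is allowed with out
        FunctionOut(37, out myInt);  // out is required at the call, just like ref
        Console.WriteLine("After calling function, myInt = " + myInt);  // prints "After calling function, myInt = 47"
    }
}
```

In the ball analogy, this is roughly like holding out an empty hand and having your friend put a ball in it.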

Passing reference types by value

FunctionC (below) uses the reference type StringBuilder. I chose StringBuilder because it is a simple, easy to use class that meets the requirements of being a reference type. Any other class would behave the same here.

//Note, StringBuilder is part of System.Text namespace.
public static void FunctionC (StringBuilder x)
{
    x = new StringBuilder ("Hello Universe");
}

public static void Main () 
{
    StringBuilder sb = new StringBuilder ("Hello World");
    Console.WriteLine ("Before calling function, sb = " + sb);
    FunctionC (sb);
    Console.WriteLine ("After calling function, sb = " + sb);
    Console.ReadLine();
}

In this example, Main creates a new StringBuilder instance with an initial value of "Hello World" and assigns it to the variable sb. Thus, sb holds a reference to our new StringBuilder object. FunctionC is called and sb is passed by value. This means that the reference to the StringBuilder object is being passed by value. FunctionC gets the value of the reference (or in other words, it gets a new copy of this reference). At the start of FunctionC, printing x would print "Hello World". The variable x is assigned to a new StringBuilder object with a value of "Hello Universe". However, this does not affect sb. After calling the function, sb still prints "Hello World".

Results:

Before calling function, sb = Hello World
After calling function, sb = Hello World

Analogy

To continue the balloon / ball example: In this example, we are now talking about reference types, which are analogous to balloons. Remember that you only hold a string to a balloon, not the actual balloon. Passing by value is like getting a new string, connecting it to your balloon, and then giving this new string to your friend. If your friend cuts the string or connects it to a different balloon (as is done in the above example), it has no effect on your string.

Passing reference types by reference

FunctionD (below) also uses the reference type StringBuilder.

//Note, StringBuilder is part of System.Text namespace.
public static void FunctionD (ref StringBuilder x)
{
    x = new StringBuilder ("Hello Universe");
}

public static void Main () 
{
    StringBuilder sb = new StringBuilder ("Hello World");
    Console.WriteLine ("Before calling function, sb = " + sb);
    FunctionD (ref sb);
    Console.WriteLine ("After calling function, sb = " + sb);
    Console.ReadLine();
}

Again in this example, Main creates a new StringBuilder instance with an initial value of "Hello World" and assigns it to the variable sb. Thus, sb holds a reference to our new StringBuilder object. FunctionD is called and sb is passed by reference. This means that the reference to the StringBuilder object is being passed by reference. The variable x refers to the variable sb itself, rather than holding its own copy of the reference. When the variable x is assigned a new StringBuilder object with a value of "Hello Universe", sb is changed as well. After calling the function, sb prints "Hello Universe".

Results:

Before calling function, sb = Hello World
After calling function, sb = Hello Universe

Analogy

To continue the balloon / ball example: In this example, we are talking about reference types, which are analogous to balloons. Remember that you only hold a string to a balloon, not the actual balloon. Passing by reference is like handing the string in your hand over to your friend. When your friend is finished (i.e. at the end of the function) he hands it back to you. If your friend cuts the string or connects it to a different balloon (as is done in the above example), then you have a new balloon at the end of your string. What a nice surprise!
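
A common use for passing reference types by reference is swapping two variables. Here is a minimal sketch; the Swap function and the SwapDemo class are made up for illustration. Without ref on both parameters, the function would only swap its own copies of the references and the caller would see no change.

```csharp
using System;
using System.Text;

public static class SwapDemo
{
    // Exchanges the references held by the caller's two variables
    public static void Swap(ref StringBuilder a, ref StringBuilder b)
    {
        StringBuilder temp = a;
        a = b;
        b = temp;
    }

    public static void Main()
    {
        StringBuilder first = new StringBuilder("Hello World");
        StringBuilder second = new StringBuilder("Hello Universe");
        Swap(ref first, ref second);
        Console.WriteLine("first = " + first);    // prints "first = Hello Universe"
        Console.WriteLine("second = " + second);  // prints "second = Hello World"
    }
}
```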

Parameter Passing Summary

We've discussed passing by value and passing by reference for both value types and reference types. I should point out that when dealing with reference types, it doesn't matter how you pass the object: the called function will always be able to make changes to the object itself. In the following example, we are passing by value; however, sb and x both point to the same object, so x.Append() will affect sb.

//Note, StringBuilder is part of System.Text namespace.
public static void FunctionE (StringBuilder x)
{
    x.Append(" from me!");
}

public static void Main () 
{
    StringBuilder sb = new StringBuilder ("Hello World");
    Console.WriteLine ("Before calling function, sb = " + sb);
    FunctionE (sb);
    Console.WriteLine ("After calling function, sb = " + sb);
    Console.ReadLine();
}

Results:

Before calling function, sb = Hello World
After calling function, sb = Hello World from me!
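
One case that can look like an exception to the summary above is string. It is a reference type, but a string object cannot be changed once it is created, so there is no equivalent of Append() that alters it in place. The sketch below (FunctionF and the StringDemo class are made up for illustration) shows that += inside the function builds a new string and only re-points the function's own copy of the reference, just like FunctionC did.

```csharp
using System;

public static class StringDemo
{
    public static void FunctionF(string x)
    {
        // += builds a new string object and points x at it;
        // the caller's variable still refers to the original string
        x += " from me!";
    }

    // Runs the call and returns what the caller sees afterwards
    public static string CallAndReport()
    {
        string s = "Hello World";
        FunctionF(s);
        return s;
    }

    public static void Main()
    {
        // prints "After calling function, s = Hello World"
        Console.WriteLine("After calling function, s = " + CallAndReport());
    }
}
```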

Garbage Collection

To summarize, I'll use the words of Jeffrey Richter:

The garbage collector checks to see if there are any objects in the heap that are no longer being used by the application. If such objects exist, then the memory used by these objects can be reclaimed

Analogy

When you let go of a string, the attached balloon floats away. Even if two balloons are tied together, they will both float away if they are not tied down. The garbage collector is a small plane that goes around collecting these balloons. Note that the plane doesn't have any set schedule or order for collecting balloons; it just happens as needed. (OK, there are some rules about when it can happen, but not about when it will happen.)

When more memory is needed, the garbage collector finds all of the objects that are not "tied down" and collects them. Finalizers and the IDisposable interface are more advanced topics. We'll get into those later.
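
Here is a rough sketch of the "tied down" idea. It uses WeakReference, which lets you watch an object without holding on to it, and GC.Collect, which asks for a collection right away. The GcDemo class and its method names are made up for illustration, and forcing a collection like this is for demonstration only. The first result is reliable; the second depends on the runtime, so I make no promise about what it prints.

```csharp
using System;
using System.Text;

public static class GcDemo
{
    // While a variable still holds a reference, the object cannot be collected
    public static bool AliveWhileHeld()
    {
        StringBuilder balloon = new StringBuilder("Hello World");
        WeakReference watcher = new WeakReference(balloon);  // does not keep the object alive

        GC.Collect();                  // ask for a collection right now
        bool alive = watcher.IsAlive;

        GC.KeepAlive(balloon);         // makes sure 'balloon' counts as in use up to here
        return alive;
    }

    public static void Main()
    {
        Console.WriteLine("Alive while held: " + AliveWhileHeld());  // prints "Alive while held: True"

        // Nothing holds on to this StringBuilder except the watcher
        WeakReference released = new WeakReference(new StringBuilder("Goodbye"));
        GC.Collect();

        // Usually False by now, but the runtime decides when collection really happens
        Console.WriteLine("Alive after letting go: " + released.IsAlive);
    }
}
```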
