Passing by Reference or Value

Even if you are new to programming, you probably have an understanding of functions and their purpose. What may be less clear, however, is what is happening under the hood when you pass a value to a function. In some languages, such as Java, when you pass a primitive value (such as an integer or a char) the function receives a copy of that value. In this instance you are guaranteed that the original value will remain unchanged after the function call. In other languages, such as C++ and PHP, a copy of the value need not be made; you have the option of passing the value by reference. In this case any changes made to the value within the function will persist beyond the function call. Here is an example, using PHP:

//x is passed by value
function foo($x)
{
	$x++;
}
//x is passed by reference
function bar(&$x)
{
	$x++;
}
$a = 1;
foo($a); //a is still 1
bar($a); //a is 2

In PHP the & symbol denotes that the variable is a reference. In this case the syntax is fairly simple, as references can be handled the same as values. C++ references work much the same way, but with pointers, which C++ inherits from C, the corresponding value must be accessed manually by dereferencing the pointer. This being the case the pointer syntax is a bit more complicated (and often messy).

Understanding the difference between passing by value and passing by reference is only half the battle. Understanding when to use each is equally important. There are two cases (that I will mention) where passing by reference comes in handy. The first may seem obvious: when you want the value to be changed by the function call. Generally, if you expect the value to remain unchanged after the function call, or if you no longer have use for the variable, then you should pass it by value. Passing by reference is not a substitute for return values. In the previous example, in practice, the function bar should be written like this:

function bar($x)
{
	$x++;
	return $x;
}
$a = 1;
$a = bar($a); //a is 2

Passing by reference is more appropriate in cases where the return value of a function cannot be the altered value that was passed:

function bar(&$x)
{
	$x++;
	if($x > 5)
		return true;
	else
		return false;
}
$a = 1;
$greaterThanFive = bar($a);
//a is 2

The second case where passing by reference is useful is when passing by value will have a negative impact on your application’s performance. In a typical case such as passing an integer to a function, the performance impact of making a copy of that value is negligible. What if, however, you need to pass an array of thousands of items to a function? If you needed to do this several times there could be a noticeable difference in performance between passing by value and passing by reference. A reference to an object is always the same size (a 32-bit or 64-bit memory address, for example). An object, however, may be very large, and making copies of it needlessly may not be the best idea. (PHP softens this somewhat: arrays are copied lazily, so the copy is only made if the function actually modifies the array.)
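As a minimal sketch of this second case, the function below rescales a large array in place. The names normalizeScores and $scores are made up for illustration, and the example assumes a non-empty list of positive numbers:

```php
<?php
// Hypothetical helper: scales every score so that the largest becomes 1.
// Taking the array by reference lets the function work on the caller's
// array directly instead of on a copy.
function normalizeScores(array &$scores): void
{
    if (count($scores) === 0) {
        return; // max() cannot be called on an empty array
    }
    $max = max($scores);
    if ($max == 0) {
        return; // nothing to scale; avoids division by zero
    }
    // Iterating by reference updates each element where it lives.
    foreach ($scores as &$score) {
        $score /= $max;
    }
    unset($score); // break the reference left over from the loop
}

$scores = range(1, 10000);
normalizeScores($scores);
echo $scores[9999]; // the largest score, now scaled to 1
```

Because the function returns nothing, the reference is the only way the caller sees the result, which is exactly the situation where passing by reference earns its keep.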

Scripting Vs. Programming

Occasionally I’ll see someone make a point of distinguishing coding in a particular language as scripting as opposed to programming. Oftentimes the distinction is arbitrary. I’ve seen justifications for this distinction ranging from scripting languages not being as strict as programming languages, to scripting languages not being Turing complete. Web languages in particular (HTML, JavaScript, PHP, etc.) seem to have the stigma of being scripting languages. To this day, however, I have not seen a non-trivial definition of the difference (or perhaps just not one that satisfies me).

This does not mean that I don’t think there is a difference; in my own mind I tend to draw a distinction. I do this, however, based on the practice of coding itself rather than the language. As an aside, before I go into detail, I’d like to mention that I’m not trying to pass my opinion off as a definition or absolute truth, just as my opinion. When I think of scripting, I think of HTML in particular. This isn’t because it isn’t Turing complete, or because it isn’t compiled, or because it is a “web” language. I make this distinction based on the tolerance for error in the practice of coding HTML. Go to any site and validate its source. Chances are you will come up with multiple errors. This would not be tolerated in a language like C++. I’m not trying to say that HTML is inferior; in fact it really wouldn’t be fair to compare it to the “traditional” programming languages.

Some languages are often referred to as scripting languages, but I feel they are more like programming languages. PHP (and its web programming counterparts), for example, is often considered a scripting language. If you take into consideration my earlier assessment of HTML, however, you will see why I do not consider it as such. If I forget to close a tag or if I use improper syntax in an HTML document there is a good chance that it will display just fine in a browser. In PHP, on the other hand, if I forget a closing bracket or use incorrect syntax, my script will fail. Even if it is able to run, I will see unexpected behavior, and there is no attempt by the interpreter (nor should there be) to correct these mistakes.

I don’t think it is the case that you can divide all of the languages in use today into scripting languages and programming languages. There is some gray area, and many languages have elements that are script-like even though it would be difficult to consider them entirely scripting languages. JavaScript and XQuery, in my mind, are examples of this. As I mentioned earlier, these are just my perceptions. I think that the difference between scripting and programming is completely arbitrary. I think one of the main reasons that such a distinction exists is simply so programmers can point out that writing HTML or PHP isn’t “real programming”. The difference really isn’t that important, which is why I don’t think it is necessary to create a formal definition.

Client Side Vs. Server Side Code

In my experience, one of the most common pitfalls for beginning programmers is not understanding the relationships between objects in their environment. This is especially the case in web development, where there is almost always a blend of multiple client side and server side scripts. Failure to understand the way browsers and servers communicate, or the relationships between (X)HTML (or JavaScript or CSS, etc.) and PHP (insert alternative language here), will certainly lead to a poor or incorrect implementation. If you are an experienced programmer you probably won’t gain much from reading this, but if you are a beginner, hopefully I can provide some insight that will save you a lot of trouble.

The difference between client side and server side code is fairly simple. Client side code is processed by the client (the browser, to be more specific) while server side code is processed by the server. HTML, for example, is parsed by the browser; the browser is responsible for taking that code and turning it into what you see in your window. For the purposes of parsing web pages, there is a short list of the types of code the browser can deal with. A typical web page, as far as the client is concerned, consists of some flavor of HTML, often supplemented by CSS or JavaScript (an exhaustive list of the types of client side code is beyond the scope of this entry).

Server side code, on the other hand, is never seen by the browser. The browser is not and should never need to be aware of server side scripts such as PHP. While a web page consists of client side code, this code is often either partially or entirely generated by a server side script. For example:

$title = 'Client Side Vs. Server Side Code';
if($title == '')
	echo '<title>Tinsology</title>';
else
	echo "<title>$title</title>";

When you navigate to a page containing the code above the browser will see “<title>Client Side Vs. Server Side Code</title>”. That’s it. The browser does not see any of the PHP code that generated the title. When you request a page containing PHP code from the server, the server first processes that page and then sends the resulting output to the client.

Server side code is browser independent (unless explicitly coded otherwise). This means that if the page you create looks different in Internet Explorer than it does in Opera it has nothing to do with your PHP code, but rather the resulting client side code.
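To make the division concrete, here is a small sketch that wraps the title logic in a function and shows the only thing the browser ever receives. The function name renderTitle is hypothetical, and the call to htmlspecialchars is my own addition rather than part of the earlier snippet:

```php
<?php
// Hypothetical helper that builds the <title> tag on the server.
function renderTitle(string $title): string
{
    if ($title === '') {
        return '<title>Tinsology</title>';
    }
    // htmlspecialchars() escapes characters such as < and & so the title
    // cannot break the surrounding markup.
    return '<title>' . htmlspecialchars($title) . '</title>';
}

// The echoed string is all the client sees; the PHP above never leaves the server.
echo renderTitle('Client Side Vs. Server Side Code');
```

Whatever browser requests the page, the server produces the same string, so any rendering difference has to come from how the browser handles the resulting markup.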

Bubble Sort is Never the Answer

It is not too often in the real world that you have to implement your own sort. Generally, whatever language you are using has a library with this functionality built in. If the occasion does arise, however, it is important to understand which algorithms are applicable in which situations. As with most choices, there is no absolute correct answer; there are many trade-offs to consider. When choosing an algorithm there are three things you should consider: performance, overhead, and ease of implementation.

You should give equal consideration to each of these factors; disregarding any one of them can lead to poor choices. It is common, for instance, for people to ignore the ease of implementation and focus on the performance of the algorithm. The problem with this is that not every operation is critical. No one is going to die if the songs on their playlist do not get sorted quickly enough. Programmer time is more expensive than run time, as a professor of mine often said. In addition to this, some high performance algorithms can be slower than simpler algorithms due to overhead. If you are sorting 100 items, you can probably insertion sort them just as fast or faster than you can heap sort them. The same would not be true with one million items; heap sort would be faster.

Once you consider all of the factors, you should find that no one algorithm is ideal in every case. There are some algorithms, however, that are not ideal in any case. Unfortunately one of these algorithms is among the most popular: bubble sort. Bubble sort is a very simple algorithm to implement and it has little overhead. The problem lies in its performance. You might think that this conflicts with my earlier point that even simple, low performance algorithms can be faster than others in the right situation. You also might think that bubble sort, being easy to implement, makes up for its performance shortcomings. This would be true, if it were not the case that there are algorithms that are equally simple to implement, require just as little overhead, and perform better in practice.

Insertion sort is one such algorithm. Like bubble sort it is an in-place sort, and is just as easy to implement. Both algorithms have the same worst-case time complexity, O(n^2), but in practice insertion sort performs better in most cases. This being the case you may wonder why bubble sort is even around. Certainly, if it is obsolete, pages regarding its implementation should be torn out of books and mentioning it should be punishable by a swift slap with a keyboard. Maybe not. When I learned bubble sort it was as an example of how not to sort. In my non-expert opinion, it is equally important to understand how NOT to do things as it is important to understand how TO do them. My point? Learn bubble sort, but never use it.
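To back up the claim that insertion sort is just as easy to write, here is a minimal sketch in PHP. The function name insertionSort is my own; it assumes the values can be compared with > and it returns a sorted copy rather than modifying the caller’s array:

```php
<?php
// Sorts a list into ascending order using insertion sort and returns it.
function insertionSort(array $items): array
{
    $items = array_values($items); // make sure indexes run 0..n-1
    $n = count($items);
    for ($i = 1; $i < $n; $i++) {
        $current = $items[$i];
        $j = $i - 1;
        // Shift every larger element one slot to the right...
        while ($j >= 0 && $items[$j] > $current) {
            $items[$j + 1] = $items[$j];
            $j--;
        }
        // ...then drop the current element into the gap that opened up.
        $items[$j + 1] = $current;
    }
    return $items;
}

print_r(insertionSort([5, 2, 9, 1, 3]));
```

On input that is already nearly sorted the inner loop barely runs, which is one reason insertion sort tends to beat bubble sort in practice.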

Know What to Expect from your Programming Language

I often see people asking how to do things with a given programming language that it was not intended to do. Recently I read a post from someone who wanted to know how to take a Java program and compile it to a .exe. For anyone who is not aware, Java programs are not compiled in the same way a C++ program is compiled. The Java source code is first compiled to bytecode. That bytecode is then interpreted (and, on modern virtual machines, compiled just in time) by the Java virtual machine. The writer was intending to get a performance boost by having the code compiled rather than interpreted.

While it is true that a compiled language can be faster than an interpreted language, it is not the case that every compiler can outperform every interpreter. This is especially true if you are compiling code that was intended to be interpreted. There are Java compilers out there, but the Java interpreter is much more mature than any of them. In addition to the non-existent performance increase, by compiling this code you eliminate one of Java’s key features: portability. If you bypass the JVM then you will have to worry about which systems your code will run on and which it will not.

Ultimately, if you need code that is very fast the answer is not to take code in one language and tweak it into something it was never meant to be. This echoes another problem: language dependence. Too often I see people who learn everything there is to know about one programming language, and never bother to learn another. Being a programmer does not mean being a C++ programmer, or a Java programmer, or a PHP programmer; it means understanding the concepts of programming and having that understanding transcend multiple languages. This will remain true until someone comes up with a catch-all language that is ideal in every case. Until then, if you need really fast code think about C++; if you want safe, portable code think about Java. You should also be aware that these trade-offs are not absolute. Not every Java program is slower than an equivalent C++ program, and a poorly written C++ program can certainly be slower than a well written Java program.

The example I mentioned above is only one of many. I’ve seen people who want to write desktop applications in PHP, write .NET apps that work without the .NET framework, and use JavaScript in an offline application. Though someone might have hacked something together that facilitates this, be aware that in most cases these implementations are not ideal. If you need to do something that the language you are using was not intended to do then that is a sign you need to branch out and become a programmer.

The Programmer’s Cardinal Sin

If you are a programmer you should do yourself, and anyone else working with your code, a favor: stop using copy and paste. If there is a case where you need to use the exact same, or very similar, code in multiple places, that is a sign that you should be using a function, object, or other structure. I say this not for the sake of ‘proper coding practices’, but to save you and anyone else dealing with your code a massive headache.

I admit that there have been cases where I have copied code (I was young! I didn’t know what I was doing). From experience, I can tell you that I have almost always regretted it. For one, it is very annoying to have to make changes in multiple parts of your program. For another, when you copy code, you also copy any errors in that code. You might be inclined to think that you’ve been careful or that you are certain this code doesn’t contain any errors, but in retrospect, I think at least half of the time I’ve done this, even in a fairly simple program, I’ve had to go back and fix bugs in multiple places.

This headache is even worse for someone who is working with your code. When making changes or updates that person may think that they’ve done what they intended when in fact they’ve only addressed half or less of the problem. The same goes for errors: it can be extremely frustrating to think that you’ve fixed a problem and then have it present itself again because you are unaware that the original coder used copy and paste.
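Here is a small sketch of the alternative. The scenario (formatting prices in two places) and the name formatPrice are hypothetical, chosen only to show the shape of the fix:

```php
<?php
// Before (hypothetical): the same formatting logic pasted in two places.
//   echo '$' . number_format($subtotal, 2);
//   echo '$' . number_format($total, 2);

// After: one function, so a fix or a change is made exactly once.
function formatPrice(float $amount): string
{
    return '$' . number_format($amount, 2);
}

$subtotal = 19.5;
$total = 21.06;
echo formatPrice($subtotal), "\n";
echo formatPrice($total), "\n";
```

If the format ever needs to change, say to a different currency symbol, there is one line to edit and no second copy to forget about.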

If you have the means, I advise you to attach USB electrodes (you can buy them at Target) to yourself and have yourself zapped every time you press Ctrl+C.

Over Optimizing

The word optimal has become a buzzword that gets tossed around too easily. Something a professor of mine pointed out a few weeks ago: by definition, optimal is something that isn’t realistic to achieve in the context of programming or software engineering. If something is optimal, there is no room for improvement, which is never the case. Some people may be irritated by the misuse of this word, but I think what is worse is the practice of optimizing.

There is a fine line between making meaningful improvements to the performance of your program and wasting coding time. If you have some data that needs to be searched through, a good use of your time would be deciding on a data structure to store your data, and an algorithm to search through it. A bad use of your time would be to go through all of your loops and change i++ to ++i. For those of you who are not aware of the difference, ++i can be slightly faster than i++ because it does not need to keep a copy of the old value. The difference, however, is so tiny (and so often removed entirely by the compiler) that you should never spend any amount of time changing one to the other. The same thing goes for print and echo. In PHP echo is slightly faster than print, but in a real world application it would be difficult to measure the difference.

I like to call these practices trivial over-optimizations. There is a non-trivial kind of over-optimization that I think is an even bigger time waster. Earlier I mentioned that a good use of your time would be deciding which data structure to use to store data that you plan to search through. This being the case you might want to spend some time comparing the performance differences between a binary search tree and a priority queue, correct? Maybe. If this program is being written to control someone’s pacemaker then the answer is certainly yes. If you plan to sort someone’s playlist with this program then the answer is probably no.

There is a third kind of optimization that is not an optimization at all. If you are searching through ten items, a linear search can be faster than a binary search. This is because of the overhead of sorting the items or constructing a binary search tree first. Optimization is a practice that should belong to programs that are dealing with a large amount of data, or are time-critical. If you are concerned with the performance of your program, the solution is not to optimize the code you have; the solution is to learn to write code in such a way that it doesn’t need to be optimized. This is something that comes with experience and attention to detail.
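For a handful of items, the whole search can be as simple as the sketch below. The function name linearSearch and the playlist contents are made up for illustration, and the code assumes a plain zero-indexed list of strings:

```php
<?php
// Returns the position of $target in a plain list, or -1 if it is absent.
function linearSearch(array $items, string $target): int
{
    $count = count($items);
    for ($i = 0; $i < $count; $i++) {
        if ($items[$i] === $target) {
            return $i;
        }
    }
    return -1;
}

$playlist = ['Intro', 'Verse', 'Chorus', 'Bridge', 'Solo',
             'Interlude', 'Reprise', 'Breakdown', 'Outro', 'Encore'];
echo linearSearch($playlist, 'Reprise'); // index 6
```

There is nothing to sort and nothing to build beforehand, which is the whole point for ten items. PHP’s built-in array_search does the same job if you would rather not write the loop yourself.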

Inheriting Code

If you’ve been a programmer for any amount of time you’ve more than likely had the honor of inheriting someone else’s code. This might be in a corporate scenario, or you might just be modifying an open source program. Either way, your experiences, though varied, are probably marked by few comments, poor syntax, and obscure methods. This might be something you can tolerate, but personally, I am a code perfectionist. I find it difficult to code in new functionality or modify existing functionality without reworking the code to suit my tastes. This can easily go from moving a few braces around to doing a full overhaul.

Unfortunately I can’t offer much advice to those who are stuck in a situation where they are digging through terrible code; honestly all you can do is be patient and buy a stress ball. I can, however, offer some tips on good coding practice that might prevent you from ruining someone’s day.

Tabs and Braces and Spaces

Also known as whitespace. There are few things more frustrating than staring at code that isn’t organized. The number one thing that deters me from helping someone with a programming question is looking at their code and seeing things like this:

$x = 0;
while($x < 10)
{
if($x==3){echo 'This is terrible code';}
else
{
echo 'hello';
}
$x++;
}

To begin with, the only time braces should not be on a line of their own is when they are preceded or followed by a conditional statement:

//OK... this method is my personal preference
if($x == 1)
{
	echo 'hello';
}
//OK.. probably the most common
if($x == 1) {
	echo 'hello';
	/*
	.
	.
	.
	*/
}
else {
	echo 'goodbye';
	/*
	.
	.
	.
	*/
}
//OK.. I'm not a fan but it's acceptable
if($x == 1) {
	echo 'hello';
	/*
	.
	.
	.
	*/
} else {
	echo 'goodbye';
	/*
	.
	.
	.
	*/
}
//NOT OK!
if($x == 1) { echo 'hello'; /* . . . */ }

Moving on, code nested within braces or within the body of a conditional or other control flow statement should be indented:

//OK
for($i = 0; $i < 10; $i++)
{
	echo "hello \n";
	echo $i;
}
/**
* OK,
* Some people don't like this method, but as long
* as it is only a single line it's fine with me. Make
* sure you leave an empty space following the
* final line.
**/
if($x == 1)
	echo '$x = 1';
else
	echo '$x != 1';
//NOT OK
while($x < 10)
{
echo $x;
$x++;
}

Now we come to spaces. Operators and their operands should always be separated by a space. UNIX shell scripters can make an exception to this of course, but otherwise you should space things out:

//All of these are OK
$x = 10;
$y = 5;
$z = $x + $y;
$z *= 2;
//These are not OK
$x=10;
$y=5;
$z=$x+$y++;
$z*=($z+$x)--; //I can't even tell you what this is equal to

Duplicate Functionality

The only thing worse than editing your terrible code is doing it twice. If you find yourself using copy and paste, or otherwise coding the same functionality multiple times, for the sake of anyone reading your code please stop. Not only is this going to translate into a bad experience when it comes time to update your code, but it’s also likely that whatever mistakes you made the first time you wrote the code were duplicated.

Yesterday I installed a WordPress plugin. The plugin performed well, and for the most part I was happy with it. Of course, however, there were modifications I needed to make (including, as it turns out, rewriting the code to make it conform to the XHTML standard). Once I opened the file to make the changes I was horrified to discover that there were literally zero functions or objects. If something needed to be done twice, the code was copied. Unfortunately the writer of the plugin forgot to close a div, which is an honest mistake. What isn’t an honest mistake, and should be punishable by death by CRT monitor thrown at you, is taking that code, and copying it three times (error included) to perform the exact same task.

Manageability

That same file I just mentioned contained over 5000 lines of code. I don’t care how large a project it is, no single file should ever exceed 1000 lines. Of course, in this case, the programmer decided not to use any functions so there was no logical way to break up the file. If you, on the other hand, are not out of your mind, any significant amount of code you create will contain many functions. Most likely those functions can be categorized and placed in separate files. If you want to get fancy you can even use some objects. The important thing is that the code you write isn’t in blob form.

The same concept applies to individual lines of code. Once again I refer to the plugin I had the displeasure of modifying: the longest line in that file was 549 characters long. If I can’t see an entire line of code without scrolling on a widescreen monitor then there is a problem. Really, even using a fullscreen monitor you should never have to use the horizontal scroll bar. There is nothing wrong with using the return key in the middle of a line of code.

//NOT OK
$animalCount = array('cat' => 5, 'dog' => 2, 'bird' => 7, 'mouse' => 3, 'badger' => 0, /* we don't need no stinking badgers */ 'chicken' => 4);
//OK
$animalCount = array(
	'cat' => 5,
	'dog' => 2,
	'bird' => 7,
	'mouse' => 3,
	'badger' => 0, //we don't need no stinking badgers
	'chicken' => 4
);

Comments

The final portion of my rant regards commenting your code. This is something we’ve all been guilty of at some point. I don’t care how well written your code is, most likely I can’t tell what it does at first glance. If you define a function, write a few words about what that function does. If you do something that seems out of the ordinary, at the very least write down that you did that intentionally.

I wrote some code a few days ago and it contained a switch statement. I intentionally did not use a break in one of the cases because the following case needed to be performed as well. To someone glancing at the code, however, this would seem like a bug; they would add a break statement and introduce a new bug when they thought they were fixing one.
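The code I wrote isn’t reproduced here, but a hypothetical sketch of the same situation might look like this. The account types and the permissionsFor function are invented for the example:

```php
<?php
// Hypothetical example: a 'premium' account gets everything a 'basic'
// account gets, plus one extra permission.
function permissionsFor(string $accountType): array
{
    $permissions = [];
    switch ($accountType) {
        case 'premium':
            $permissions[] = 'export';
            // Intentional fall-through: premium users also get the basic permissions.
        case 'basic':
            $permissions[] = 'read';
            break;
        default:
            // Unknown account types get no permissions.
            break;
    }
    return $permissions;
}

print_r(permissionsFor('premium'));
```

The one-line comment costs nothing, and it tells the next reader that the missing break is deliberate.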

Use comments!

Strings Are Arrays

I’ve seen a lot of people asking questions about how to find a certain character or sequence inside a string. It is common for people to turn to the library in order to find a function that does this for them, but it is likely the case that the answer is right in front of you. If you need to search through a string for whatever reason you can index it as an array. This is the case in a lot of languages (PHP and C++ directly, Java through charAt).

In a loosely typed language like PHP people often forget the distinction between primitive types and other constructs. Primitives are often things like int, float, double, long, and char. Strings are generally not a primitive type but an array of chars. Most languages tend to hide this fact from you; because strings are so common, it is often more convenient for the programmer to deal with them as if they were not an array of characters. For example, Java does not support operator overloading; you can only use arithmetic operators (+, -, *) on primitive types, with the exception of strings, which can be appended using the + operator.
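As a small illustration of indexing in PHP, the sketch below reads individual characters and walks a string one position at a time. The helper indexOfChar is a made-up name, and note that PHP indexes strings by byte, so this assumes single-byte characters:

```php
<?php
$word = 'badger';
echo $word[0], "\n";                  // first character
echo $word[strlen($word) - 1], "\n";  // last character

// Hypothetical helper: returns the position of the first occurrence of a
// character by walking the string one index at a time, or -1 if absent.
function indexOfChar(string $text, string $char): int
{
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        if ($text[$i] === $char) {
            return $i;
        }
    }
    return -1;
}

echo indexOfChar($word, 'g'), "\n";
```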

So how can you use this fact to your advantage? There are many cases where indexing strings rather than passing them to some function can serve your purpose more efficiently. Here are some examples in PHP:

Ex.
You have several lines of text and you want to count the instances of a particular word. One way to do this would be to use explode to turn the text into an array of words and then count the instances of that word:

$text = 'cat dog chicken cat bird mouse cat lizard';
$words = explode(' ' , $text);
$count = 0;
foreach($words as $word)
{
     if($word == 'cat')
          $count++;
}
echo $count; //should output 3

One function call and one loop, pretty simple right? Maybe not. It is important to remember that function calls may do a lot more than they appear to. In order to split the string into an array of words, explode must search through the string for a space. If all you need to do is count the number of instances of a word, there is no need to waste time constructing an array of words and then looping through that array. Here is a more efficient way:

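$curr = '';
$count = 0;
$searchStr = 'cat';
$text = 'cat dog chicken cat bird mouse cat lizard';
$len = strlen($text);
for($i = 0; $i < $len; $i++)
{
     if($text[$i] == ' ')
     {
          //a space marks the end of a word, so compare it and start over
          if($curr == $searchStr)
               $count++;
          $curr = '';
     }
     else
          $curr .= $text[$i];
}
//the last word has no trailing space, so check it separately
if($curr == $searchStr)
     $count++;
echo $count; //should output 3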

The above example contains several more lines of code, but it is important to remember that the number of lines in a program should not be used to measure the performance of a program.
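If you would rather measure than guess, a rough timing sketch along these lines can compare the two approaches. The function names countWithExplode and countByIndexing and the test text are my own, and the timings will vary with your PHP version and input (built-in functions such as explode run as native code, so the result may not be what you expect):

```php
<?php
// Counts whole-word matches by splitting the text into an array first.
function countWithExplode(string $text, string $word): int
{
    $count = 0;
    foreach (explode(' ', $text) as $candidate) {
        if ($candidate === $word) {
            $count++;
        }
    }
    return $count;
}

// Counts whole-word matches by indexing the string one character at a time.
function countByIndexing(string $text, string $word): int
{
    $count = 0;
    $curr = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        if ($text[$i] === ' ') {
            if ($curr === $word) {
                $count++;
            }
            $curr = '';
        } else {
            $curr .= $text[$i];
        }
    }
    // The final word has no trailing space, so check it after the loop.
    if ($curr === $word) {
        $count++;
    }
    return $count;
}

// Build a larger input so the difference is measurable at all.
$text = str_repeat('cat dog chicken cat bird mouse cat lizard ', 10000);

foreach (['countWithExplode', 'countByIndexing'] as $fn) {
    $start = microtime(true);
    $count = $fn($text, 'cat');
    $elapsed = microtime(true) - $start;
    printf("%s: %d matches in %.4f seconds\n", $fn, $count, $elapsed);
}
```

Both functions should report the same number of matches; only the elapsed time tells you which one is actually cheaper on your machine.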