Archive for May 2009

The Programmer’s Cardinal Sin

If you are a programmer you should do yourself, and anyone else working with your code, a favor: stop using copy and paste. If there is a case where you need to use the exact same, or very similar, code in multiple places, that is a sign that you should be using a function, object, or other structure. I say this not for the sake of ‘proper coding practices’, but to save you and anyone else dealing with your code a massive headache.
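To make that concrete, here is a minimal sketch in JavaScript. The names priceWithTax, cartTotal, and receiptLine are made up for illustration; the point is only that the tax calculation lives in one function instead of being pasted into both callers.

```javascript
// Hypothetical example: a tax calculation that might otherwise be pasted
// into two places. Keeping it in one function means a fix is made once.
function priceWithTax(price, taxRate) {
	// round to two decimal places
	return Math.round(price * (1 + taxRate) * 100) / 100;
}

// First caller: totals a list of prices.
function cartTotal(prices, taxRate) {
	let total = 0;
	for (let i = 0; i < prices.length; i++) {
		total += priceWithTax(prices[i], taxRate);
	}
	return Math.round(total * 100) / 100;
}

// Second caller: formats a single receipt line.
function receiptLine(name, price, taxRate) {
	return name + ': ' + priceWithTax(price, taxRate).toFixed(2);
}
```

If the rounding rule turns out to be wrong, only priceWithTax has to change, and both callers pick up the fix.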

I admit that there have been cases where I have copied code (I was young! I didn’t know what I was doing). From experience, I can tell you that I have almost always regretted it. For one, it is very annoying to have to make changes in multiple parts of your program. For another, when you copy code, you also copy any errors in that code. You might be inclined to think that you’ve been careful, or be certain that the code doesn’t contain any errors, but in retrospect, I think that at least half of the times I’ve done this, even in a fairly simple program, I’ve had to go back and fix bugs in multiple places.

This headache is even worse for someone who is working with your code. When making changes or updates that person may think that they’ve done what they intended when in fact they’ve only addressed half or less of the problem. The same goes for errors: it can be extremely frustrating to think that you’ve fixed a problem and then have it present itself again because you are unaware that the original coder used copy and paste.

If you have the means, I advise you to attach USB electrodes (you can buy them at Target) to yourself and have yourself zapped every time you press Ctrl+C.

PR 0 to 4… By Accident

After hearing some noise about Google having updated page rank, I got curious and decided to check the PR of this site. I launched this site shortly before the last update, when it was assigned the rank of zero. This time around, to my surprise, I found that the index, as well as one other page, was ranked 4. This was surprising because I have not made any deliberate effort to increase the rank of this site. I got bored of submitting to directories and stopped after about 20. I haven’t purchased links, spammed blogs or forums, or consulted with an ‘SEO Expert’.

So how, then, did I accomplish this feat? Did I discover a secret SEO technique? Did I go down the street to my local Google building and hold them up at gunpoint? If I had a secret I’d tell you. The reality is that I have been focusing almost exclusively on my content. At times, actually, it seems to me that that IS the secret. Whenever I see people asking how to increase their page rank, I see responses like “get more backlinks” and “post your links on forums”. I never see someone respond “improve the quality of your content”.

I don’t claim to all of a sudden be an expert on the topic, nor do I intend to brag about this ‘accomplishment’. My point is that spending time dropping links, submitting to directories, and sending fruit baskets to Google should all be secondary to improving your content. It is important to understand that high page rank is a symptom of high traffic. If I, for whatever reason, wanted to catch the flu, would I focus on giving myself a headache and inducing vomiting? No. I would go to the hospital, get down on my hands and knees, and start licking some floors. The same thing is true of page rank and traffic (not the licking floors part). Artificially increasing PR by spamming links will not lead to increased traffic.

Opera: A developer’s browser

My previous entry will suggest to the reader (and by suggest I mean scream aloud) that I am not fond of IE. In addition to this, I mention that I use Opera as my primary browser. Typically, for Windows users, if you are not using IE you are using Firefox. Opera is a less popular browser, but I think it is much more useful for developers.

Out of the box Opera has several features that have made my life easier many times. One of them is fairly simple: if you want to validate a page, all you have to do is right click and click validate.

Another feature allows you to dynamically update the source code of a site. All of the popular browsers allow you to view a page’s source, but Opera allows you to view that source, edit it, and have the changes applied to the currently open page. This is very useful for making adjustments to a layout or, on occasion, making a broken page work. For example, I was trying to register on a site using a captcha. Normally this isn’t a problem, but in this case the text field only allowed 4 characters, while the image contained 5. All I had to do to work around this was view the source and change the attribute that limits the field’s length (maxlength) to 5. This is the only case I can think of off the top of my head, but there are others.

This is just the tip of the iceberg. Opera’s developer console lets you view the DOM structure of a site and view cookies. If a site is hard coded to only work in IE or Firefox, Opera can identify itself as either of those. If you want to open a page in another browser, all you need to do is right click, select open with, and choose any installed browser. The list keeps going.

If you haven’t given Opera a try you should definitely do so.

Internet Explorer No Longer Essential

I haven’t used IE as my primary browser for a long time. I switched from IE to Opera several years ago. Until recently, however, I’ve always had Internet Explorer installed. This is because there always seemed to be some website that either insisted on users using IE, or was coded in such a way that it only worked in IE. It seems counter-intuitive to me that the browser that is the least standards compliant should be the one that everyone codes their websites to work with, even at the expense of all of the other browsers, but that was the reality of the industry for a long time. Only now do I believe that that era is almost over; it is now possible to uninstall IE permanently.

I actually came to this conclusion by accident. A few months ago, my cable provider forced me to reactivate my Internet connection. This was altogether a very clumsy and inconvenient process. It consisted of me entering some personal information along with an account number and then waiting for their system to recognize my modem. Not only did this take an incredibly large amount of time, but I also had to complete the entire process in IE6. When I say IE6 I really mean IE6… their software downgraded IE7 without asking. Around an hour and a half later, when the process was complete, I found that this extremely (really I can’t stress this enough… Cox Communications I hope you read this) poorly written software had crippled IE. It crashed the moment I opened it.

Not wanting to deal with the additional stress of fixing this problem, having just spent a considerable amount of time dealing with my cable company, I happily went about my business, using Opera as my primary browser as always. A few days ago I clicked the IE icon by mistake and was reminded that it was still crippled. After a few months I had not yet needed to open IE even once. This came as a surprise to me as in the past I had occasionally needed to use IE to do things like accessing my bank account and watching certain videos.

So what is to blame for this development? Part of it, I think, is that developers have been (finally) focusing more on cross-browser support. Another part is the browsers themselves; Opera has a feature to identify itself as other browsers, which is nice for circumventing sites that are hardcoded to only allow Firefox or IE. Regardless of what is responsible, I think this is a step in the right direction.

Strings and Output in PHP

I’ve seen a lot of questions and false assumptions regarding strings and output. Here is a short review of some common questions:

print vs. echo

I often see people suggesting that others should use echo as opposed to print for performance reasons. While it is true that echo is faster than print, the difference is insignificant. The reason echo is faster is because print behaves like a function (even though it’s a language construct) and sets a return value. This being the case, there are some important differences between the two, and there are a few cases where you have to use print in place of echo.

As I mentioned, print behaves like a function. You can use it just as any other function call and it has a return value (always 1). Like print, echo is also a language construct, but does not behave like a function. Here are some cases where print must be used in place of echo:

$b ? print "true" : print "false";

//with echo
function isOne($x)
{
	if($x == 1)
	{
		echo $x . ' = 1';
		return true;
	}
	else
	{
		echo $x . ' != 1';
		return false;
	}
}

//with print
function isOne($x)
{
	if($x == 1)
		//print always returns 1, which is truthy
		return print($x . ' = 1');
	else
		//!1 evaluates to false
		return !print($x . ' != 1');
}

In the print implementation of isOne the speed difference between echo and print is negated.

Concatenated strings Vs. Multiple calls to echo

Is it faster to concatenate all of your output into a single string and then output it or is it faster to output your strings as you go? The answer is the former; concatenating strings is faster than doing output. Does this mean that you should cram all of your output into one string and output it at the end of your script? Not necessarily. Like the performance difference between print and echo, the performance difference here is almost always negligible. Use whichever method makes sense at the time. You should prefer to concatenate your output, but sometimes ‘output as you go’ makes for cleaner code, which is more important in this case.

Double vs. Single quotes

PHP provides two different ways to encase strings: double quotes and single quotes. There is an important difference between the two. If your string is encased in double quotes, you can place variables within it like so:

$x = 5;

echo "x is equal to $x";

What this means is that PHP must parse the string, looking for any variables. Single-quoted strings, on the other hand, are never scanned for variables (only \' and \\ are treated specially), meaning that if your string contains no variables it is preferable to use single quotes. Similar to the previous examples, the performance difference is minor; however, using double quotes for no apparent reason is just wasting runtime. Here are some good reasons to use double quotes:

$x = 5;
//your string contains a variable
echo "x = $x";
//your string contains single quotes
echo "Matt said, 'peanut butter is awesome'";

In the second case, it would be faster to escape your single quotes using \ but the performance difference is minor and using double quotes looks cleaner.

Output without echo (or print)

Print and echo are not the only ways to do output. At any point in your code you can close your PHP tags and start doing output as if it were normal HTML:

<?php
for($i = 0; $i < 10; $i++)
{
	?>
	<p>Hello</p>
	<?php
}
?>

This will output Hello, nested in paragraph tags, ten times. This method is useful when you have a large amount of output that you don’t want to have to worry about nesting in quotes. Personally, when using this method, I prefer to use PHP’s alternate control flow syntax:

<?php if($x == 10) : ?>
	<div>X is equal to 10</div>
<?php else : ?>
	<div>X is not equal to 10</div>
<?php endif; ?>

You can use this syntax with all of PHP’s control flow statements (i.e. while, if, for, foreach, switch, etc.).

Output Buffering

What should you do if you have a large string that needs to be manipulated? Normally, if you had a very large string, such as a block of HTML, you could use the method I just described. Doing it like that, however, would output the string immediately, before you have a chance to change it. Using quotes is clumsy and unnecessary. The answer is to use an output buffer:

<?php
ob_start(); //start the output buffer
?>
<div>My Website</div>
<p>This is my website where I write about stuff and junk. Enjoy!</p>
<div class="footer">copyright 2009 tinsology.net</div>
<?php
$output = ob_get_clean();
echo str_replace('2009', '2010', $output);
?>

In the above example, ob_start() starts the output buffer. From this point on, any output, including output from print and echo, will be stored in the buffer. ob_get_clean() returns the output as a string and deletes the buffer content. PHP’s output control functions contain some very useful functionality, and I highly recommend that you read the documentation carefully: PHP output control functions

Understanding the 64-bit Movement

A few years ago AMD released the Athlon 64 processor. Though this processor was not the first 64-bit processor, it was the first aimed toward general consumers. Chances are, if your computer was manufactured in the last few years, you have a 64-bit compatible processor. You may be wondering what the difference is and what this means to you. If you’ve looked into the issue in the past, you’ve probably determined that the most significant difference is that in order to have more than 4GB of RAM, you need a 64-bit processor. The reasons for this and its implications may not be so clear, so I’ll do my best to clear things up.

32-bit Vs. 64-bit

Even if you are aware of what it means to have a 64-bit processor, you may not understand what makes a 64-bit processor 64-bit. To fully understand this you need to understand the inner workings of a processor. A comprehensive explanation of this is beyond the scope of this article, and most likely beyond my ability to explain. I can, however, point out the basics.

As you are probably aware, computers have multiple levels of memory. Most people are aware of the two largest (in terms of storage) types of memory: hard disk and RAM. You may also be aware of a third type: cache. There is, however, a fourth level: registers. The size and access time of these different types of memory decrease in the order I mentioned them. A hard disk can typically hold many gigabytes of data, RAM can hold a few gigabytes, cache can hold a few megabytes, and registers can hold only a few bytes. In the context of the difference between 64-bit and 32-bit systems, the level of memory we are most interested in is the registers.

Your processor has several registers. These registers hold only a small amount of data. This is where the difference between a 64-bit processor and a 32-bit processor comes into play. On a 32-bit machine, a register can hold 4 bytes of data. A byte consists of 8 bits, so 4 bytes is equal to 32 bits. As you might have guessed already, a register on a 64-bit processor can hold 8 bytes of data, which translates to 64 bits. This is the only necessary physical difference between a 32-bit and 64-bit system.

Memory

So what are the implications of this? As I mentioned earlier, the maximum amount of RAM you can have on a 32-bit system is 4 gigabytes. To understand why this is you need to further understand how data is stored, and how the processor accesses that data.

Your processor cannot read data directly from the hard disk, RAM, or cache. All of the data must flow down into the registers before the processor can do anything with it. This puts certain limitations on your data. All data in your machine is represented in binary; a bit is one binary digit. So if a register can hold 32 bits of data, it can hold a binary number 32 digits long. 32 digits might seem like an incomprehensibly large number, and if it were in decimal it would be, but in binary the largest number that can be represented with 32 digits is around 4 billion (2 to the 32nd power minus 1).

As I mentioned before, the processor does not work directly on data in RAM. In order for your processor to read this data it must first bring it into a register. So how does the machine know how to find this data? It has an address. Every byte in your RAM has a numeric address. In order for the processor to fetch data from this address, the address must be stored in a register. As I mentioned before, the largest value that can be stored in a 32-bit register is around 4 billion. Since every byte in RAM has an address, there is an upper limit of around 4 billion on the number of bytes that can be addressed. There are roughly a billion bytes in a gigabyte, so in a 32-bit system the maximum amount of memory you can have is 4 gigabytes.

By now you probably understand why you need a 64-bit machine to have more than 4 gigabytes of memory. In a 64-bit machine you can have 2 to the 64th power memory addresses (as opposed to 2 to the 32nd, which equals 4 gigabytes). If you’re good at math you’ve probably figured out that this is equal to 2 to the 32nd, squared, or roughly 4 billion squared. That works out to about 18 billion billion bytes, which in computer terms is 16 exabytes. To understand how much data that is, consider your own computer. If you have 4 gigabytes of memory in your computer, you would need 4 billion of them to store that much data. Currently there aren’t even 4 billion personal computers in existence, and not all that do exist have 4 gigabytes of memory.
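The arithmetic above can be checked with a few lines of JavaScript. This is only a sketch: the constant names are made up for illustration, and it uses BigInt because ordinary JavaScript numbers cannot represent 2 to the 64th exactly.

```javascript
// Sizes in bytes, using binary units (1 gigabyte = 2^30, 1 exabyte = 2^60).
const GIGABYTE = 2n ** 30n;
const EXABYTE = 2n ** 60n;

// Number of distinct addresses a register of each width can hold.
const addresses32 = 2n ** 32n;
const addresses64 = 2n ** 64n;

// One byte per address, so the address count is also the byte limit.
const gigabytesIn32 = addresses32 / GIGABYTE;            // 4
const exabytesIn64 = addresses64 / EXABYTE;              // 16

// How many 4 gigabyte machines it takes to hold 2^64 bytes.
const machinesNeeded = addresses64 / (4n * GIGABYTE);    // 4294967296

console.log(String(addresses32));     // 4294967296
console.log(String(addresses64));     // 18446744073709551616
console.log(String(gigabytesIn32));   // 4
console.log(String(exabytesIn64));    // 16
console.log(String(machinesNeeded));  // 4294967296
```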

Other Implications

Memory addresses obviously aren’t the only things stored in registers. If you are a programmer, most likely you know that the largest unsigned integer is around 4 billion. This is because an int is typically stored in 32 bits of memory. This fact might not change as a result of a move to 64-bit systems; there might not be a practical reason for increasing the size of an int. Other data types, such as a double or a 64-bit integer, need 64 bits. Moving to a 64-bit system means an entire value of that size can be placed in a single general-purpose register.

Not everything about moving to 64-bit is good, however. In order to enjoy the benefits of a 64-bit system, you need a 64-bit operating system. The programs you are running also need to be 64-bit compatible. If you are a programmer you might be relieved to hear that in most cases porting a program to 64-bit is a matter of compiling your existing code with a 64-bit compiler. It’s not always quite as simple as that, however, and identical code compiled on different compilers may experience different issues.

In addition to this, doubling the size of memory addresses means that twice the data is now moving down the pipe to and from registers. This factor also involves the operating system. Most 64-bit processors are 32-bit backward compatible, meaning that you can install a 32-bit operating system on one and not notice the difference. You will notice the difference, however, by installing a 64-bit operating system on a low end machine. I experienced this first hand by installing XP 64-bit on an AMD Turion machine from 2006. Though the Turion machine was 64-bit, it didn’t handle the impact of a 64-bit operating system on memory bandwidth well.

Should I migrate to 64-bit?

Virtually any personal computer you buy today supports 64-bit. Does that mean you should go out and buy a 64-bit operating system? Not necessarily. There is no real benefit to using a 64-bit operating system unless you need a machine with more memory. Eventually the standard amount of memory on a mid-grade machine will surpass 4 gigabytes and in fact that day is coming soon. Unless you desperately need to upgrade your current machine, you may want to wait until you purchase your next machine to consider going 64-bit. With Windows 7 around the corner, the time to buy a new system might be coming close. If your current system or the one you are considering purchasing is a low-end machine you may want to stick with 32-bit. Just as the transition from 16-bit to 32-bit was, the transition from 32 to 64 is inevitable. If you are a person who likes to buy one machine and upgrade it over the years then 64-bit is the way to go. If you periodically purchase a new machine there is no real advantage to 64-bit unless you want more memory.

Over Optimizing

The word optimal has become a buzz word that gets tossed around too easily. Something a professor of mine pointed out a few weeks ago: By definition, optimal is something that isn’t realistic to achieve in the context of programming or software engineering. If something is optimal, there is no room for improvement, which is never the case. Some people may be irritated by the misuse of this word, but I think what is worse is the practice of optimizing.

There is a fine line between making meaningful improvements to the performance of your program and wasting coding time. If you have some data that needs to be searched through, a good use of your time would be deciding on a data structure to store your data, and an algorithm to search through it. A bad use of your time would be to go through all of your loops and change i++ to ++i. For those of you who are not aware of the difference, ++i can be slightly faster than i++ because i++ has to hand back the old value. The difference, however, is so tiny that you should never spend any amount of time changing one to the other. The same thing goes for print and echo. In PHP echo is slightly faster than print, but in a real world application it would be difficult to measure the difference.
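For anyone who has not met the two operators, here is a short JavaScript sketch of what each one hands back (the variable names are only for illustration). As a loop counter, where the returned value is thrown away, the two behave identically.

```javascript
// i++ returns the value from before the increment
let i = 5;
const post = i++;   // post is 5, i is now 6

// ++j returns the value from after the increment
let j = 5;
const pre = ++j;    // pre is 6, j is now 6

// In a loop the returned value is ignored, so both forms count the same way
let countPost = 0;
for (let a = 0; a < 10; a++) countPost++;
let countPre = 0;
for (let b = 0; b < 10; ++b) ++countPre;
```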

I like to call these practices trivial over-optimizations. There is a non-trivial kind of over-optimization that I think is an even bigger time waster. Earlier I mentioned that a good use of your time would be deciding which data structure to use to store data that you plan to search through. This being the case you might want to spend some time comparing the performance differences between a binary search tree and a priority queue, correct? Maybe. If this program is being written to control someone’s pacemaker then the answer is certainly yes. If you plan to sort someone’s playlist with this program then the answer is probably no.

There is a third kind of optimization that is not an optimization at all: If you are searching through ten items, a linear search is usually faster overall than a binary search. This is because of the overhead of sorting the data or constructing the binary search tree first. Optimization is a practice that should belong to programs that are dealing with a large amount of data, or are time-critical. If you are concerned with the performance of your program the solution is not to optimize the code you have; the solution is to learn to write code in such a way that it doesn’t need to be optimized. This is something that comes with experience and attention to detail.
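As a rough sketch of the ten-item case in JavaScript (the function names and the sample data are made up for illustration), notice that the linear search works on the list as it is, while the binary search needs a sorted copy before it can start. No timings are claimed here; the point is the extra setup step.

```javascript
// Linear search: works on data in any order, no setup required.
function linearSearch(items, target) {
	for (let i = 0; i < items.length; i++) {
		if (items[i] === target) return i;
	}
	return -1;
}

// Binary search: only correct on data that is already sorted.
function binarySearch(sorted, target) {
	let lo = 0;
	let hi = sorted.length - 1;
	while (lo <= hi) {
		const mid = (lo + hi) >> 1;
		if (sorted[mid] === target) return mid;
		if (sorted[mid] < target) lo = mid + 1;
		else hi = mid - 1;
	}
	return -1;
}

// Ten hypothetical track ids in arbitrary order.
const tracks = [42, 7, 19, 3, 88, 61, 25, 14, 50, 9];

const linearIndex = linearSearch(tracks, 61);        // 5

// The setup cost: sort a copy before the first binary search.
const sortedTracks = tracks.slice().sort((a, b) => a - b);
const binaryIndex = binarySearch(sortedTracks, 61);  // 8
```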

Kontera for a Month

Roughly a month ago I started serving Kontera in-text ads. Based on a month of use, here are my impressions of the platform:

Ad Relevance

I found that the ads were relevant only to the term selected for the link, meaning that the ad served was based on the word or phrase in the link. This being the case, oftentimes the ads were irrelevant to the overall content. For example, if I wrote a post about login systems, or otherwise mentioned login systems, the word system could be selected as a link to an ad about air conditioning systems.

Revenue Generated

Before I go into detail, let me mention that I do not generate a lot of ad revenue, nor do I expect to. If I break even on the domain costs and hosting fees I’m happy.

I did generate some revenue, and I imagine on a site with more traffic that revenue could easily be significant. What money I did make, however, was far less than what I make on my AdSense ads. I also found that in spite of the fact that the Kontera ads are in-text and the Google ads are outside of the main text, the AdSense ads received more clicks. If I had to guess as to why this is, I would say it is because visitors to this site are well aware of what those ads are and for the most part aren’t interested. I imagine the relevance of the ads that I discussed above also plays a factor in this.

Intrusiveness and Conclusion

I personally don’t find in-text ads to be overly intrusive. After some thought, however, I decided that having users mistakenly click what they think to be relevant links and instead be served an ad is not worth the amount of revenue generated. I found that the ‘traditional’ banner ads were more effective and relevant.

Opera Tab Override User JS

A few days ago I released a WordPress plugin that overrides the default tab keypress action. This means when you press tab it inserts a tab character rather than moving focus from one field to another. I made a few modifications to this script in order to turn it into an Opera UserJS application. To use Tab Override for Opera, copy the following code and place it in a file called taboverride.js in your UserJS folder:

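function operaTab(e)
{
	//9 is the key code for the tab key
	if (e.keyCode == 9)
	{
		var obj = e.target;

		//document.selection is the legacy IE-style selection API
		obj.selection = document.selection.createRange();
		obj.selection.text = String.fromCharCode(9);

		//keep the browser from moving focus to the next field
		e.stopPropagation();
		e.preventDefault();
	}
	return false;
}

function tablistener(){
	var textareaArr = document.getElementsByTagName('textarea');
	//use an indexed loop; for...in would also visit non-element properties such as length
	for( var i = 0; i < textareaArr.length; i++ )
	{
		textareaArr[i].addEventListener('keypress', operaTab, true);
	}
}
//addEventListener leaves any onload handler the page itself defines in place
window.addEventListener('load', tablistener, false);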
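The script above inserts the tab through document.selection. As an alternative sketch, and not what the plugin itself does, the same insertion can be worked out from a textarea's selectionStart and selectionEnd values. The helper name insertTabAt is hypothetical; it is a plain function so it can be tried without a browser.

```javascript
// Given the field's current text and the selection bounds, return the new
// text with a tab in place of the selection, plus where the caret should go.
function insertTabAt(value, start, end) {
	const tab = String.fromCharCode(9);
	return {
		value: value.substring(0, start) + tab + value.substring(end),
		caret: start + 1
	};
}

// Example: the selection covers "bc" in "abcd".
const result = insertTabAt('abcd', 1, 3);
// result.value is "a" + tab + "d", result.caret is 2
```

In a keypress handler you would pass in obj.value, obj.selectionStart, and obj.selectionEnd, then write result.value back to the field and set both selection properties to result.caret.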

Using UserJS

If you don’t have UserJS enabled or are unsure if it is enabled, here’s how you do it:

  • In Opera, on the toolbar click Tools -> Preferences...
  • Click the Advanced tab and select the Content submenu.
  • Make sure JavaScript is enabled and click the JavaScript Options... button.
  • Under User JavaScript Files should be the path to your userjs file. The default is C:\Program Files\Opera\userjs.
  • If the path is not set, set it to the default. All you need to do to activate a User JS file is add it to the specified folder.

Some Issues with the script

The script will only work on textarea fields. This is intentional, as in other fields the default tab action is preferable. If you want to override all tabs, modify the line var textareaArr = document.getElementsByTagName('textarea'); to read var textareaArr = document.getElementsByTagName('form');

The script won’t work on some platforms due to interfering scripts. I’ve experienced this on some but not all vBulletin forums. I believe it is possible to override these scripts, but I have not yet figured out how.

Ads and Us: An Uncomfortable Relationship

Yesterday I read some comments by someone who was upset that Widgetbox modified their policy to prevent the inclusion of ads in the widgets submitted to their directory. The author claimed that this change in policy was uncalled for. All of the ads he inserted into his widgets were easily removable. People were getting these widgets for free so it should be his right to insert his own ads into them, right?

If you are a developer or publisher you might be inclined to agree with the author. If you are more often a user of such applications you might disagree. Though the author justifies himself by claiming the ads are easily removable, some among us go to great lengths to keep ads or links within their applications from being removed. The problem, in this case, with portable web applications is that the source is always available. The solution to this, too often, is to obscure the code in order to prevent users from modifying it. What about users modifying it in good faith though? I personally respect a developer’s right to include a link in their free software. This does not mean that I don’t sometimes need to make changes to the source code.

A larger problem than this is that this behavior contributes toward our conflicted viewpoint regarding ads. We both love and hate ads. If ads were a person they would be the ex-girlfriend you’re still attracted to but know you don’t get along with. As publishers we put ads in our content, but as users we willfully ignore them. How many people have sites containing ads, but also have the Firefox plugin that prevents ads from being displayed? At the same time we want people to click the ads on our sites, but are unwilling to even look at the ads on others’ sites.

This conflict is most severe for tech oriented sites. If you have a tech site, your users (most likely tech people) are ad blind. Most likely a person can read this entire post without knowing what the ads even say. Ad placement plays a factor in this of course, but even when the ad is right in my face, I don’t give it a second thought. In fact, I think that intrusive ad placement, as well as flashing ads, animated ads, or worst of all, ads with sound, contributes to a user’s unwillingness to even notice your ads.

The tech oriented portion of the Internet is a special case of course. I went to a conference a couple years ago and heard a Microsoft representative claim that the ads on their mail service were not meant for us. Us refers to us at the conference: tech people. As a niche, we don’t generate ad revenue. When is the last time you went on someone else’s site and clicked an ad? When did you even read them? Without looking, do you know what any of the ads on this site say?

So what is the solution to this problem? Is it even a problem? If you’re a publisher in the tech niche you might be concerned that your users are experiencing ad blindness. I’ve already pointed out that more intrusive or attention-getting ads are not the solution. The only thing you can do is give ads a chance. I encourage you of course to ignore intrusive ads as they are part of the problem. What about discreet ads placed in good faith, by tech people hoping to pay the bills or earn some pocket cash off an ad supported app? I’m not telling you to click ads. If you are a publisher or developer who uses ads in their work, I’m telling you to read the ads. See what they’re about.

This brings us back to a question I asked earlier. Does a developer have the right to insert an ad into an application that is meant to be used on someone else’s site? I think the answer depends on the method and the goals. If your ad subtracts from the content of your app or its user’s site then the answer is no. If your purpose is to generate ad revenue then the answer is no. We need to get over the mentality that ad revenue will make us rich. Even if you make a significant amount of money from ads, you should realize that credit does not belong to the ads; it belongs to the content.