Archive for June 2009

Creating a Secure Login System the Right Way

Making a custom login system is a common task for beginning PHP developers. Jumping right into it, however, may not be the best approach. There are several important aspects to building a login system that not only works, but is also safe.
Updated on December 15th 2009: Added Session Control Section

Getting Started

To begin with, we’ll create our login form. This doesn’t need to be anything fancy, just a couple of input fields and a submit button:

<form name="login" action="login.php" method="post">
	Username: <input type="text" name="username" />
	Password: <input type="password" name="password" />
	<input type="submit" value="Login" />
</form>

The above example is stripped down; there is no formatting or styling, so it most likely won’t look too great if you copy and paste the code. Making your form pretty is beyond the scope of this article. In the form tag, notice that it has three attributes: name, action, and method. Name identifies the form and is not very important in the context of this article. Action identifies the script that will be processing the login; oftentimes your form and the processing code are in a single file, but this does not have to be the case. Method typically takes one of two values: post or get. If you submit a form using get, the data is URL encoded, appended to the URL, and visible in the address bar. If the method is post, the data is sent in the body of the request instead and will not appear in the address bar. As you may have guessed there is actually a lot more to it than that, but the difference between post and get is a topic for another article. We want to use post.

Notice that our input fields are given name attributes. This is important, as we will need to identify and access these values by name. PHP identifies form data using the name attribute, not the id. Now that our form is complete we can move on to processing the data.

Storing our Data

Actually we can’t process the data just yet. Before that we need to worry about how our data will be represented in the database. There are three essential values we must store in our database: the username, password, and a salt. We will get into what a salt is later. In addition to this you may choose to store other things about your user. It is also common to give the user a numeric user id. This is not absolutely necessary but it is common for tables to have a numeric, sequential primary key. For our purposes assume our table structure is this:

CREATE TABLE users (
	id INT NOT NULL AUTO_INCREMENT,
	username VARCHAR(30) NOT NULL UNIQUE,
	password VARCHAR(40) NOT NULL,
	salt VARCHAR(3) NOT NULL,
	PRIMARY KEY(id)
);

In the above SQL code we create a table called users having columns id, username, password, and salt. Even though we could use usernames to uniquely identify users we will use the id for this instead. One reason for this is that integer comparisons are cheaper than string comparisons, so searching through a large number of users will require fewer resources. Convention is another reason.

There are a few keywords in the above code that I’ll define for you. Not null means that each tuple (a tuple is a row in our table; each user will have a row) must have a value for this column. In this case all of our columns are ‘not null’ so every user must have each of these values. Auto increment applies to numeric primary keys. It allows us to give each user a sequential id without worrying about collisions or what the most recent user’s id was; the database will assign each user a correct id automatically. Unique, as you may guess, means that the value must be unique. In our case, no two users may have the same username. Finally, primary key tells the database which field will uniquely identify each row. No two users can have the same id. It also creates an index on the specified column, meaning that lookups will be faster (at the expense of memory).

Each column also has a data type. The id is an integer while the remaining fields are varchars. Varchars are just variable-length sequences of characters, much like strings. Varchar(30) means a sequence of up to 30 characters. For the username I chose 30 as the length for no particular reason; it allows for a reasonable number of characters without letting users write a paragraph. The lengths for password and salt are important and I’ll get into that later. Now that we have created our table we are ready to process our data.

Populating our Table

Actually we can’t process the data just yet. You can’t login if your users table is empty. Just like on any site, you have to register before you can login. To accomplish this we will create a simple registration form.

<form name="register" action="register.php" method="post">
	Username: <input type="text" name="username" maxlength="30" />
	Password: <input type="password" name="pass1" />
	Password Again: <input type="password" name="pass2" />
	<input type="submit" value="Register" />
</form>

The above code is similar to our login form code with one notable difference. In the username input field I specify the maxlength attribute as 30. This means the field can only contain 30 characters and corresponds to the username length we specified in our SQL code. Notice I don’t enforce the length of passwords even though they are defined as 40 characters in the SQL code; you will see why it is not necessary to do so later. Now that we have our registration form we can process our data.

Sign Me Up

Our registration data that is, we still can’t process our login data. At this point we actually get to write some PHP code (are you as excited as I am?).

register.php (part 1):

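<?php
//retrieve our data from POST
$username = $_POST['username'];
$pass1 = $_POST['pass1'];
$pass2 = $_POST['pass2'];

//both password fields must match
if($pass1 != $pass2)
{
	header('Location: register_form.php');
	die(); //header() alone does not stop the script
}

//enforce the same limit as the username column
if(strlen($username) > 30)
{
	header('Location: register_form.php');
	die();
}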

In the above code we retrieve our data from $_POST, which is an associative array where all of the post data is stored. We also check if pass1 and pass2 are equal, which is an example of validating user input. If they are not equal we use the header function to redirect back to our registration form (assume the registration form is located in a file called register_form.php) and then stop the script, since header() by itself does not end execution. Ideally we would want to display an error message, but for the sake of example we will keep it simple. We also check if the username exceeds 30 characters. Even though we set the form to allow no more than 30 characters it is important that we check as well; it is possible (and very simple) to bypass the limit imposed in the HTML code. In addition to this you should check for any other constraints you have placed on your data (valid characters, minimum length, etc).
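The checks described above can be gathered into one function that returns a list of problems instead of redirecting straight away. This is only a sketch: the function name validateRegistration, the allowed-character rule, and the minimum password length of 8 are assumptions made for illustration, not rules from this article.

```php
<?php
//Hypothetical helper: returns an array of error messages (empty if valid)
function validateRegistration($username, $pass1, $pass2)
{
	$errors = array();

	//matches the VARCHAR(30) username column
	if($username === '' || strlen($username) > 30)
		$errors[] = 'Username must be between 1 and 30 characters.';

	//assumed rule: letters, digits and underscores only
	if(!preg_match('/^[A-Za-z0-9_]*$/', $username))
		$errors[] = 'Username may only contain letters, numbers and underscores.';

	//assumed rule: at least 8 characters
	if(strlen($pass1) < 8)
		$errors[] = 'Password must be at least 8 characters.';

	if($pass1 !== $pass2)
		$errors[] = 'Passwords do not match.';

	return $errors;
}

var_dump(count(validateRegistration('tinsley', 'trueblood', 'trueblood'))); // int(0)
print_r(validateRegistration('bob', 'password1', 'password2'));
```

Returning the messages rather than redirecting makes it easier to show them on the registration form later, for example by storing them in a session.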

Hashing

As I mentioned earlier, the length of the password varchar in the database is significant. This is because we are not actually going to (and never should) store the password in the database. We are going to store an sha1 hash which is a string always containing 40 characters. In simple terms a hash is an algorithm that maps inputs to outputs in a deterministic way. Meaning that given an input the algorithm will always produce the same output. Sha1 is an algorithm that outputs a 40 digit hexadecimal value. As in most cases, given the output of the sha1 algorithm it is not easy (though possible) to determine the input. In PHP we can get the sha1 hash of our password like this:

register.php (part 2):

$hash = sha1($pass1);
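To make the properties described above concrete, here is a small sketch showing that sha1 is deterministic and always produces 40 characters, no matter how long the input is. The sample passwords are made up for the example.

```php
<?php
//sample password, used only for illustration
$password = 'correct horse';

$first  = sha1($password);
$second = sha1($password);

//same input, same output
var_dump($first === $second);   // bool(true)

//the output length does not depend on the input length
var_dump(strlen($first));       // int(40)
var_dump(strlen(sha1('a')));    // int(40)

//a one character change gives a different hash
var_dump(sha1('correct horsf') === $first); // bool(false)
```

This is why the password column can be a fixed 40 characters even though the registration form places no limit on password length.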

Pass the Salt

As I mentioned before, there is no simple way to determine the input of the sha1 algorithm from the output. It is, however, possible through brute force or more complicated means. One way to improve the security of your users’ passwords is to use a salt. A salt is just a random string of characters that is prepended to the hash, which is then hashed again.

register.php (part 3):

//creates a 3 character sequence
function createSalt()
{
	$string = md5(uniqid(rand(), true));
	return substr($string, 0, 3);
}

$salt = createSalt();

$hash = sha1($salt . $hash);

Some people will tell you that a salt is not necessary, but it certainly doesn’t hurt to use one.
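A quick way to see what the salt buys you is to hash the same password with two different salts. The sketch below does that with made-up salts, using a hypothetical helper named saltedHash that mirrors the scheme above. It then shows an alternative that is not the method used in this article: the password_hash and password_verify functions built into PHP 5.5 and later, which generate and store a salt for you.

```php
<?php
//Hypothetical helper mirroring the article's scheme: sha1(salt . sha1(password))
function saltedHash($password, $salt)
{
	return sha1($salt . sha1($password));
}

//two users with the same password but different (sample) salts
$hashA = saltedHash('letmein', 'a1f');
$hashB = saltedHash('letmein', '9c2');
var_dump($hashA === $hashB); // bool(false)

//alternative approach, not the article's method (requires PHP 5.5+)
$stored = password_hash('letmein', PASSWORD_DEFAULT);
var_dump(password_verify('letmein', $stored)); // bool(true)
var_dump(password_verify('wrong', $stored));   // bool(false)
```

If you go the password_hash route, note that its output is longer than 40 characters, so the password column in the table above would need to be wider, and the separate salt column would no longer be needed.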

Now comes the database portion of our code. For this I will assume we are using a MySQL database. I will also assume the database host, name, user, and password. I won’t anticipate any database connection errors in the code, but you should.

register.php (part 4):

$dbhost = 'localhost';
$dbname = 'tinsology';
$dbuser = 'tinsley';
$dbpass = 'trueblood'; //awesome tv show

$conn = mysql_connect($dbhost, $dbuser, $dbpass);
mysql_select_db($dbname, $conn);

//sanitize username
$username = mysql_real_escape_string($username);

$query = "INSERT INTO users ( username, password, salt )
		VALUES ( '$username' , '$hash' , '$salt' );";
mysql_query($query);

mysql_close();

header('Location: login_form.php');

In the above code we establish a connection to our MySQL database and add our new user to the users table. Then we redirect to our login form. Notice that we call the function mysql_real_escape_string. This function helps to prevent SQL injections by escaping the input. Using an abstraction layer like PDO, with prepared statements, will also help to prevent this. Be aware that the mysql_* functions were removed in PHP 7, so on current versions you will need mysqli or PDO instead. Now that we’ve processed our registration data we can write the code to process our login data.
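For comparison, here is roughly what the insert and the later lookup could look like with PDO and prepared statements. Treat it as a sketch under a few assumptions: it uses an in-memory SQLite database (which needs the pdo_sqlite extension) so that it runs without a MySQL server, the table definition is adjusted to SQLite syntax, and the username and password are sample values. With MySQL the DSN would be something like 'mysql:host=localhost;dbname=tinsology'.

```php
<?php
//assumption: in-memory SQLite stands in for the MySQL database
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE users (
	id INTEGER PRIMARY KEY AUTOINCREMENT,
	username VARCHAR(30) NOT NULL UNIQUE,
	password VARCHAR(40) NOT NULL,
	salt VARCHAR(3) NOT NULL
)');

//sample values standing in for the variables in register.php
$username = "o'brien"; //the quote would break a naively built query
$salt = 'abc';
$hash = sha1($salt . sha1('secret'));

//placeholders keep the data separate from the SQL text
$stmt = $db->prepare('INSERT INTO users (username, password, salt)
		VALUES (:username, :password, :salt)');
$stmt->execute(array(
	':username' => $username,
	':password' => $hash,
	':salt'     => $salt,
));

//the same technique works for the login lookup
$stmt = $db->prepare('SELECT password, salt FROM users WHERE username = :username');
$stmt->execute(array(':username' => $username));
$userData = $stmt->fetch(PDO::FETCH_ASSOC);

var_dump($userData['salt']); // string(3) "abc"
```

Because the values are bound to placeholders, there is no need to call an escaping function on the username before running the query.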

Logging in

Seriously this time. Our login processor will pull the login data from post and compare it to the database values.

$username = $_POST['username'];
$password = $_POST['password'];

//connect to the database here

$username = mysql_real_escape_string($username);

$query = "SELECT password, salt
		FROM users
		WHERE username = '$username';";
$result = mysql_query($query);

if(mysql_num_rows($result) < 1) //no such user exists
{
	header('Location: login_form.php');
	die(); //stop here, otherwise the script keeps running
}

$userData = mysql_fetch_array($result, MYSQL_ASSOC);
$hash = sha1( $userData['salt'] . sha1($password) );

if($hash != $userData['password']) //incorrect password
{
	header('Location: login_form.php');
	die();
}

//login successful

As I mentioned before, this is a stripped down example. Ideally you would not have the code you use to connect to your database in each file, but rather in a single file or function that you could include. In addition to this you should maintain a session throughout the process in order to store and report error messages, as well as other useful data. The most important thing to remember when creating a login system is that you should never trust your users. Validate all user input, protect against SQL injections, and never store raw passwords in the database.
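The password comparison at the heart of the login script can be pulled out into a small function, which makes it easy to try without a database. In this sketch the function name checkPassword and the sample row are hypothetical. It also uses hash_equals (available from PHP 5.6) in place of !=, which is a substitution on my part: it compares the two strings in constant time.

```php
<?php
//Hypothetical helper: rebuilds the salted hash and compares it to the stored one
function checkPassword($password, $salt, $storedHash)
{
	$hash = sha1($salt . sha1($password));

	//hash_equals compares in constant time (PHP 5.6+)
	return hash_equals($storedHash, $hash);
}

//sample row, shaped like the result of the SELECT in the login script
$userData = array(
	'salt'     => 'abc',
	'password' => sha1('abc' . sha1('trueblood')),
);

var_dump(checkPassword('trueblood', $userData['salt'], $userData['password'])); // bool(true)
var_dump(checkPassword('sookie', $userData['salt'], $userData['password']));    // bool(false)
```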

P.S. Session Control
Added December 15th 2009

All of the above code illustrates how to register a user and allow them to log in. There is, however, one fundamental piece that is missing: session control. In order for a login system to be useful, it must provide some means to distinguish between a logged in user and a non-logged in user across the entire site. Sessions are the means by which we do this. What we want to do, after a user has successfully logged in, is indicate, using a session variable, that the user has done so.

Accessing Session Data

Activating and managing sessions in PHP is very straightforward; you only need one function to both create and recall a particular session: session_start(). Here is a generic example of how to use session_start to store session data:

page1.php

session_start();

$_SESSION['foo'] = 'bar';

page2.php

session_start();

echo $_SESSION['foo']; //will output bar

In the above example you can see that in page1 we start a session and assign the session variable “foo” the value “bar”. If the user visits page2 sometime thereafter, the session will be recalled and the value of foo (bar) will be echoed. Notice that session_start creates a new session if one does not exist or recalls that session if it already exists.

There are a few subtleties relating to session_start that are important to remember. You must call session_start before any headers are sent to the browser. Headers are sent as soon as your script produces output, so your script cannot output anything (not even whitespace before the opening PHP tag) before calling session_start. The simplest way to avoid this problem is to call session_start before anything else. One common case where this problem can occur is when some code that normally wouldn’t have any output generates an error or warning prior to calling session_start. If display_errors is enabled, that message is output to the browser like any other output.

Generally (though not necessarily), a session lives as long as the browser remains open.

Using Sessions in Our Login System

There are three basic functions we want to incorporate into our login system: validating a user (i.e. indicating that user has logged on), checking if a user is logged on, and logging a user out.

Validating a User

function validateUser($userid = null)
{
	session_regenerate_id(); //this is a security measure
	$_SESSION['valid'] = 1;
	$_SESSION['userid'] = $userid; //id of the user who just logged in
}

This function simply sets the session variable ‘valid’ to 1. You may also want to use this function to store certain variables. It is a good idea to store frequently accessed data about a particular user (such as a user id, username, NOT a password or sensitive data). The $_SESSION['userid'] = $userid; line is an example of how to store user info. Additional information about session security will be provided in the following section.

Checking if a User is Logged On

function isLoggedIn()
{
	//isset avoids a warning when 'valid' has never been set
	if(isset($_SESSION['valid']) && $_SESSION['valid'])
		return true;

	return false;
}

This function simply checks if the session variable ‘valid’ is set to 1.

Logging Out

function logout()
{
	$_SESSION = array(); //destroy all of the session variables
	session_destroy();
}

When it is time to log a user out, we destroy the session. All of these functions assume session_start has already been called.

Now it is time to incorporate this into our login script:

session_start(); //must call session_start before using any $_SESSION variables

$username = $_POST['username'];
$password = $_POST['password'];

//connect to the database here

$username = mysql_real_escape_string($username);

$query = "SELECT id, password, salt
		FROM users
		WHERE username = '$username';";
$result = mysql_query($query);

if(mysql_num_rows($result) < 1) //no such user exists
{
	header('Location: login_form.php');
	die();
}

$userData = mysql_fetch_array($result, MYSQL_ASSOC);
$hash = sha1( $userData['salt'] . sha1($password) );

if($hash != $userData['password']) //incorrect password
{
	header('Location: login_form.php');
	die();
}
else
{
	validateUser($userData['id']); //sets the session data for this user
}

//redirect to another page or display "login success" message

Now we can use the isLoggedIn function to determine if a user is logged in and act accordingly:

membersonly.php

session_start();

//if the user has not logged in
if(!isLoggedIn())
{
	header('Location: login_form.php');
	die();
}

//page content follows

A Note About Session Security

By default, sessions are cookie based. This means that a particular session is associated with a user by use of a cookie. This being the case, there are certain vulnerabilities that arise. There are a variety of methods by which a session can be hijacked (XSS for example; a javascript injection can cause a user to give up their session id). Unfortunately you cannot expect to eliminate the possibility that a user has hijacked someone else’s session. Further research into the subject may yield a few tricks, but ultimately the best practice is to be cautious about certain tasks. If a user wants to change their password (after logging in), require the old password and whenever possible avoid displaying sensitive data.

Earlier I mentioned that you shouldn’t use session variables to store sensitive data. It is important to remember that session data lives on the server; this means that a user cannot directly view or modify session data. On a shared server, however, other users of that server may be able to access this information.

Just a reminder, all of the code examples I provided are exactly that: examples. They are meant to serve as a starting point, and hopefully shed some light on a few key concepts, but they are not a real world implementation. I don’t recommend, for example, using header() to bounce your users around to different pages; there are better methods that reduce the number of page loads and give the user a more fluid experience. Using this method, however, simplifies things and makes the examples easier to follow.

Scripting Vs. Programming

Occasionally I’ll see someone make a point of distinguishing coding in a particular language as scripting as opposed to programming. Oftentimes the distinction is arbitrary. I’ve seen justifications for this distinction ranging from scripting languages not being as strict as programming languages, to scripting languages not being Turing complete. Web languages in particular (HTML, JavaScript, PHP, etc) seem to have the stigma of being scripting languages. To this day, however, I have not seen a non-trivial definition of the difference (or perhaps just one that satisfies me).

This does not mean that I don’t think there is a difference; in my own mind I tend to draw a distinction. I do this, however, based on the practice of coding itself rather than the language. As an aside, before I go into detail, I’d like to mention that I’m not trying to pass my opinion off as a definition or absolute truth, just as my opinion. When I think of scripting, I think of HTML in particular. This isn’t because it isn’t Turing complete, or because it isn’t compiled, or because it is a “web” language. I make this distinction based on the tolerance for error in the practice of coding HTML. Go to any site and validate its source. Chances are you will come up with multiple errors. This would not be tolerated in a language like C++. I’m not trying to say that HTML is inferior; in fact it really wouldn’t be fair to compare it to the “traditional” programming languages.

Some languages are often referred to as scripting languages, but I feel they are more like programming languages. PHP (and its web programming counterparts), for example, is often considered a scripting language. If you take into consideration my earlier assessment of HTML, however, you will see why I do not consider it as such. If I forget to close a tag or if I use improper syntax in an HTML script there is a good chance that it will display just fine in a browser. In PHP on the other hand, if I forget a closing bracket or use incorrect syntax, my script will fail. Even if it is able to run, I will see unexpected behavior, and there is no attempt (nor should there be) by the interpreter to correct these mistakes.

I don’t think it is the case that you can divide all of the languages in use today into scripting languages or programming languages. There is some gray area, and many languages have elements that are script like even though it would be difficult to consider them entirely a scripting language. JavaScript and XQuery, in my mind, are examples of this. As I mentioned earlier these are just my perceptions. I think that the difference between scripting and programming is completely arbitrary. I think one of the main reasons that such a distinction exists is simply so programmers can point out that writing HTML or PHP isn’t “real programming”. The difference really isn’t that important, which is why I don’t think it is necessary to create a formal definition.

Client Side Vs. Server Side Code

In my experience, one of the most common pitfalls for beginning programmers is not understanding the relationships between objects in their environment. This is especially the case in web development, where there is in almost every case a blend of multiple client side and server side scripts. Failure to understand the way browsers and servers communicate or the relationships between (X)HTML (or JavaScript or CSS etc) and PHP (insert alternative language here) will certainly lead to a poor or incorrect implementation. If you are an experienced programmer you probably won’t gain much from reading this, but if you are a beginner, hopefully I can provide some insight that will save you a lot of trouble.

The difference between client side and server side code is fairly simple. Client side code is processed by the client (the browser to be more specific) while server side code is processed by the server. HTML for example is parsed by the browser; the browser is responsible for taking that code and turning it into what you see in your window. For the purposes of parsing web pages, there is a short list of the types of code the browser can deal with. A typical web page, as far as the client is concerned, consists of some flavor of HTML often supplemented by CSS, or JavaScript (an exhaustive list of the types of client side code is beyond the scope of this entry).

Server side code, on the other hand, is never seen by the browser. The browser is not and should never need to be aware of server side scripts such as PHP. While a web page consists of client side code, this code is often either partially or entirely generated by a server side script. For example:

$title = 'Client Side Vs. Server Side Code';
if($title == '')
	echo '<title>Tinsology</title>';
else
	echo "<title>$title</title>";

When you navigate to a page containing the code above the browser will see “<title>Client Side Vs. Server Side Code</title>”. That’s it. The browser does not see any of the PHP code that generated the title. When you request a page containing PHP code from the server, the server first processes that page and then sends the resulting output to the client.
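One way to see this for yourself is to capture the script’s output with output buffering and look at what would have been sent. The sketch below wraps the title logic in a hypothetical function named renderTitle purely so it can be called; everything echoed between ob_start and ob_get_clean is what the client would receive.

```php
<?php
//Hypothetical wrapper around the title logic shown above
function renderTitle($title)
{
	if($title == '')
		echo '<title>Tinsology</title>';
	else
		echo "<title>$title</title>";
}

//capture what would have been sent to the browser
ob_start();
renderTitle('Client Side Vs. Server Side Code');
$sentToBrowser = ob_get_clean();

echo $sentToBrowser, "\n";
//<title>Client Side Vs. Server Side Code</title>

//none of the PHP source appears in the output
var_dump(strpos($sentToBrowser, 'echo') === false); // bool(true)
```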

Server side code is browser independent (unless explicitly coded otherwise). This means that if the page you create looks different in Internet Explorer than it does in Opera it has nothing to do with your PHP code, but rather the resulting client side code.

That Time of the Quarter

F.I.N.A.L.S.
F%*k I never actually learned s&^t.

Back in one week.

Bubble Sort is Never the Answer

It is not too often in the real world that you have to implement your own sort. Generally, whatever language you are using has a library with this functionality built in. If the occasion does arise, however, it is important to understand which algorithms are applicable in which situations. As with most choices, there is no absolute correct answer; there are many trade offs to consider. When choosing an algorithm there are three things you should consider: performance, overhead, and ease of implementation.

You should give equal consideration to each of these factors; disregarding any one of them can lead to poor choices. It is common, for instance, for people to ignore the ease of implementation and focus on the performance of the algorithm. The problem with this is that not every operation is critical. No one is going to die if the songs on their playlist do not get sorted quickly enough. Programmer time is more expensive than run time, as a professor of mine often said. In addition to this, some high performance algorithms can be slower than simpler algorithms due to overhead. If you are sorting 100 items, you can probably insertion sort them just as fast or faster than you can heap sort them. The same would not be true with one million items; heap sort would be faster.

Once you consider all of the factors, you should find that no one algorithm is ideal in every case. There are some algorithms, however, that are not ideal in any case. Unfortunately one of these algorithms is among the most popular: bubble sort. Bubble sort is a very simple algorithm to implement and it has little overhead. The problem lies in its performance. You might think that this conflicts with my earlier point that even simple, low performance algorithms can be faster than others in the right situation. You also might think that bubble sort, being easy to implement, makes up for its performance shortcomings. This would be true, if it were not the case that there are algorithms that are equally simple to implement, require just as little overhead, and perform better in practice.

Insertion sort is one such algorithm. Like bubble sort it is an in place sort, and is just as easy to implement. Both algorithms have the same worst-case time complexity, O(n^2), but in practice insertion sort performs better in most cases. This being the case you may wonder why bubble sort is even around. Certainly if it is obsolete, pages regarding its implementation should be torn out of books and mentioning it should be punishable by a swift slap with a keyboard. Maybe not. When I learned bubble sort it was as an example of how not to sort. In my non-expert opinion, it is equally important to understand how NOT to do things as it is important to understand how TO do them. My point? Learn bubble sort, but never use it.
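To put the two side by side, here is a sketch of both algorithms in PHP, each counting the comparisons it makes. The function names and the sample input are mine, and one small, nearly sorted array proves nothing in general; it is only meant to show that the implementations are about equally short while the amount of work differs.

```php
<?php
//bubble sort with the usual early exit; returns array(sorted, comparisons)
function bubbleSort(array $items)
{
	$comparisons = 0;
	for($end = count($items) - 1; $end > 0; $end--)
	{
		$swapped = false;
		for($i = 0; $i < $end; $i++)
		{
			$comparisons++;
			if($items[$i] > $items[$i + 1])
			{
				$tmp = $items[$i];
				$items[$i] = $items[$i + 1];
				$items[$i + 1] = $tmp;
				$swapped = true;
			}
		}
		if(!$swapped)
			break; //a full pass without swaps means we are done
	}
	return array($items, $comparisons);
}

//insertion sort; returns array(sorted, comparisons)
function insertionSort(array $items)
{
	$comparisons = 0;
	for($i = 1; $i < count($items); $i++)
	{
		$current = $items[$i];
		$j = $i - 1;
		while($j >= 0)
		{
			$comparisons++;
			if($items[$j] <= $current)
				break;
			$items[$j + 1] = $items[$j]; //shift the larger item right
			$j--;
		}
		$items[$j + 1] = $current;
	}
	return array($items, $comparisons);
}

$data = array(2, 1, 3, 4, 5, 6, 7, 8); //sample input, nearly sorted

list($bubbleSorted, $bubbleCount) = bubbleSort($data);
list($insertSorted, $insertCount) = insertionSort($data);

echo "bubble sort comparisons: $bubbleCount\n";    // 13
echo "insertion sort comparisons: $insertCount\n"; // 7
```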

CSS Drop Cap Effect

If you’ve ever read a magazine you’ve probably noticed that often the first character on a page stands out. Usually it’s larger, a different color, or stylized in some way. This effect is called an initial or a drop cap. Using CSS it is fairly simple to achieve this effect. CSS supports the pseudo element “first-letter” which allows you to modify the appearance of the first letter of a paragraph:

p:first-letter {
	/* style here */
}

Notice that the code above will affect the first letter of every paragraph. For the purpose of creating a drop cap effect, we only want the first paragraph to be affected, so we must also use the first-child pseudo class:

p:first-child:first-letter {
	/* style here */
}

Using this method you can now implement a drop cap effect. To do this you first need to decide between two common styles. In some cases the initial falls below the first line, in others the base of the initial is consistent with the base of the rest of the line. With respect to CSS, the difference between the two will be a float.

p:first-child:first-letter {
	/* The float causes the top of the initial to
	be consistent with the rest of the line, while
	the base is allowed to extend below. Removing
	this line will cause the base to line up with
	the rest of the line */
	float: left;
	/* style here */
}

The properties you can apply to your first-letter element include the following:

  • font properties
  • color properties
  • background properties
  • word-spacing
  • letter-spacing
  • text-decoration
  • vertical-align
  • text-transform
  • line-height
  • clear

If you decide to use the float, I recommend using the after pseudo element to clear the float following the paragraph. Failing to do so may cause the drop cap to interfere with the following paragraph if the first paragraph is short.

p:first-child:first-letter {
	float: left;
	/* style here */
}

p:first-child:after {
	content: "";
	display: block;
	height: 0;
	clear: both;
	visibility: hidden;
}

If you are using this in your WordPress theme, you should specify the class of the div your post is wrapped with. In my case it is the “post” class:

.post p:first-child:first-letter {
	float: left;
	/* style here */
}

.post p:first-child:after {
	content: "";
	display: block;
	height: 0;
	clear: both;
	visibility: hidden;
}

In addition to this you may also want to use the first-line pseudo element to modify the style of the first line, which is also common in publications. Here is the current drop cap code I’m using:

.post p:first-child:first-letter {
	float:left;
	background-color: #eeeeee;
	line-height:30px;
	padding: 5px;
	color: #237ab2;
	font-weight: bold;
	font-size:40px;
}

.post p:first-child:first-line{
	font-variant: small-caps;
}

.post p:first-child:after {
	content: "";
	display: block;
	height: 0;
	clear: both;
	visibility: hidden;
}

Know What to Expect from your Programming Language

I often see people asking how to do things with a given programming language that it was not intended to do. Recently I read a post from someone who wanted to know how to take a Java program and compile it to a .exe. For anyone who is not aware, Java programs are not compiled in the same way a C++ program is compiled. The Java source code is first compiled to bytecode. That bytecode is then interpreted by the Java virtual machine. The writer was intending to get a performance boost by having the code compiled rather than interpreted.

While it is true that a compiled language can be faster than an interpreted language, it is not the case that every compiler can outperform every interpreter. This is especially true if you are compiling code that was intended to be interpreted. There are Java compilers out there, but the Java interpreter is much more mature than any of them. In addition to the non-existent performance increase, by compiling this code you eliminate one of Java’s key features: portability. If you bypass the JVM then you will have to worry about which systems your code will run on and which it will not.

Ultimately, if you need code that is very fast the answer is not to take code in one language and tweak it into something it was never meant to be. This echoes another problem: language dependence. Too often I see people who learn everything there is to know about one programming language, and never bother to learn another. Being a programmer does not mean being a C++ programmer, or a Java programmer, or a PHP programmer; it means understanding the concepts of programming and having that understanding transcend multiple languages. This will remain true until someone comes up with a catch-all language that is ideal in every case. Until then, if you need really fast code think about C++; if you want safe, portable code think about Java. You should also be aware that these trade-offs are not absolute. Not every Java program is slower than an equivalent C++ program, and a poorly written C++ program can certainly be slower than a well written Java program.

The example I mentioned above is only one of many. I’ve seen people that want to write desktop applications in PHP, write .NET apps that work without the .NET framework, and use javascript in an offline application. Though someone might have hacked something together that facilitates this, be aware that in most cases these implementations are not ideal. If you need to do something that the language you are using was not intended to do then that is a sign you need to branch out and become a programmer.

PHP: Complex Variables in Strings

If you are at all familiar with PHP you are probably aware that you can put variables inside double quotes. For example:

$x = 5;
echo "x is equal to $x";

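The above code will output “x is equal to 5”. This simple syntax works fine with plain variables, and PHP will even parse a single level of property or array access, but it fails with anything more involved, such as chained properties or quoted array keys:

//will not work as intended: only $myObject->x is parsed,
//and "->y" is printed as literal text
echo "x is equal to $myObject->x->y";

//also won't work: a quoted key is a parse error here
echo "x is equal to $myArray['x']";

To avoid this problem, PHP allows you to use curly braces to separate variables that need to be parsed:

//will work
echo "x is equal to {$myObject->x->y}";
echo "x is equal to {$myArray['x']}";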

This is also useful in situations where you want to output a variable in the middle of a word:

//will not work
$birthday = 16;
echo "My birthday is on the $birthdayth";

//will work
echo "My birthday is on the {$birthday}th";

Note that { cannot be escaped in a string. If { is followed by $ PHP will assume that you want to parse a variable. To get the literal {$ you must escape it like this:

echo "Curly brace followed by dollar sign: {\$";

//will output
//Curly brace followed by dollar sign: {$
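Putting the curly brace syntax together, here is a short runnable sketch. The array and object are sample data I made up for the example; the expected output of each line is shown in the trailing comment.

```php
<?php
//sample data for illustration
$myArray = array('x' => 5, 'nested' => array('y' => 6));

$myObject = new stdClass();
$myObject->x = 7;
$myObject->child = new stdClass();
$myObject->child->y = 8;

$birthday = 16;

echo "array key: {$myArray['x']}\n";               // array key: 5
echo "nested array: {$myArray['nested']['y']}\n";  // nested array: 6
echo "property: {$myObject->x}\n";                 // property: 7
echo "chained property: {$myObject->child->y}\n";  // chained property: 8
echo "My birthday is on the {$birthday}th\n";      // My birthday is on the 16th
echo "literal: {\$\n";                             // literal: {$
```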