I’ve just added a section about session management to Creating a Secure Login System the Right Way.
Check it out
Anyone who has ever had an email account is probably very familiar with the “Do not reply” email. It can take the form of a notification from your bank, a reminder to pay a bill, a newsletter, or just plain spam. Generally they read something like this:
Please do not reply to this email. This mailbox is not monitored and you will not
receive a response.
Usually this is followed by some instructions to follow if you have a question or concern. I imagine at some point in the history of automated emailers someone decided that they did not want to receive any reply from their users via email, or at least from email directed to the address in question, and everyone blindly proceeded to follow this example. That is what contact forms and support email addresses are for, correct? To a certain degree this is true. Having to monitor another email account (or more) for support questions would only exacerbate things in the nightmare that is customer support. So the solution is to toss out any incoming email to these accounts and direct users to ask their questions via a support form or an alternate email address.
In my mind this is a cop-out. Consider what you lose by doing this. In most cases if I suddenly have a question after receiving an email from my bank, the question is about the message I just received. If I were able to just hit the reply button, the original message would be right there at the bottom of my message. If I have to use some other means to ask my question, I might forget to include it or decide that it is of no importance.
In addition to this there is the hassle of the alternative process. I have to go to my bank’s website, find the support form, fill out all the necessary information which in my case means first and last name, date of birth, account number, social, and email address (twice). Believe it or not this is the case even if I’m logged into my account. This is a minor inconvenience, but the process introduces the chance for user error. If I put the wrong account number or some other typo occurs my question might never get answered or I might be asked to do it all over again. If I were able to just hit reply all of this information could be derived (automatically) from my email address.
The problems don’t stop there. Often it is the case that support forms ask you to choose the department or type that your question falls under (i.e. is it a billing question, a technical question, etc.). Sometimes the lines between question categories can easily be blurred and submitting a question to the wrong department could potentially increase the response time. It takes x time for person A to get to your question, if A is not the right person to answer the question then it is redirected to person B, who takes y time to get to the message. Your wait time is now x + y. If you were able to just reply to the email the type of question would be determined automatically and directed to person B. Your wait time is now just y.
Obviously the ubiquitous “do not reply” emails are not the result of someone failing to realize that replying may be convenient for the user. The problem is that it complicates things for the support team. Or at least it appears this way at first glance.
Wouldn’t this mean monitoring additional email accounts? There is no reason why any person working in support should have to monitor an email account; there are better ways. A properly implemented support system would make the source of the message, whether it be an email or a form, transparent.
It is important that users be directed to the website so that they can find answers in our FAQ or knowledge base before inundating the support system. The solution here is to replace your “do not reply” messages with links to places where the user can find answers. Additionally this gives you the opportunity to serve specialized questions and answers based on the content of the message. Ultimately, users asking obvious questions or questions that have been answered many times in the past is unavoidable.
What about emails containing sensitive information? A user may inadvertently expose their private information to the support team by hitting reply. The solution here is to be careful about what information you put in an email. I am not suggesting that Do not reply emails should be done away with entirely. If it is absolutely necessary to put sensitive information into an email (password resetting for example) then it is appropriate to use a Do not reply address.
I’d like to conclude this entry by imagining a support system with Do not reply (almost) eliminated. Imagine the support system for a web hosting company. There are three support departments: Billing, accounts, and technical, each of which has an email address. There is also a Do not reply address used exclusively for password resetting. There is a support form which asks the user to select the appropriate department. Finally, there is a knowledge base containing common and previously answered questions. The knowledge base also has a feature that allows it, given a question or some other block of text, to attempt to find similar questions that have already been answered.
When a member of the technical support team accesses the support system, they see only questions for their department, both from the support form and from the technical department email account. The same is true for the other departments.
If one of the departments sends out a notification, whether it be automatic (bill is due, a change to your account, etc.) or manual (planned downtime, a new feature, or some other announcement), before the message is sent, its content is passed to the knowledge base which determines common questions and answers relating to the content of the message. It selects a few and appends them to the message. The message is then sent. If the user has a question regarding the message they have the option of hitting reply and having their message (original message and all) sent to the correct department.
If you use Opera you’re probably aware that it supports shortcuts in the address bar that allow you to run a search on various search engines and websites. For example, if I type g tinsology in the address bar, I’ll get the Google search results for the keyword tinsology. You can do similar things with Yahoo, Amazon, Ask and other sites that come preconfigured in Opera.
Personally, I find myself frequently using this shortcut to Google PHP documentation. For example if I’m looking up documentation for the implode function, I’ll type g PHP implode. More often than not the first result is what I’m looking for and it is just a matter of waiting for the search results to load, clicking the first result, and waiting for the php.net page to load.
Ideally, however, I would want to be able to go directly from typing my search in the address bar to the php.net results page. It just so happens that Opera allows you to do this by adding a custom search engine. What we want to do is to be able to type p [my search] in the address bar. To begin we need to open the Search Preferences pane:
Go to Tools -> Preferences... (or press Ctrl+F12), select the Search tab, and click the Add... button.
In the add window there are three fields we are interested in: Name, Keyword, and Address (if you don’t see the address field click the details button). Name is just the name of this search shortcut; I named it PHP, but it doesn’t really matter what you name it. Keyword is the keyword you type in the address bar before your search query. For a Google search it is g. I chose p, but once again you can choose anything you’d like. Also, the keyword doesn’t have to be a single letter; for instance you could use php. The address field tells Opera what to do with your search query. Without explaining too much I’ll just say that the value we want to use is: http://us2.php.net/manual-lookup.php?pattern=%s. The %s token will be replaced by our search query. For instance typing p implode will cause Opera to open http://us2.php.net/manual-lookup.php?pattern=implode
That’s it; leave all of the remaining fields blank. You can now use Opera’s address bar to instantly search the PHP documentation. You can use similar methods for running searches on other sites; the hardest part is finding the correct search URL (it’s even harder if the search query cannot be URL encoded, and that’s when the Use Post option comes in handy).
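To make the %s substitution concrete, here is a minimal PHP sketch of what the browser presumably does with the address template. The function name buildSearchUrl is made up for illustration, and the use of urlencode is an assumption about how the query gets encoded.

```php
<?php
// Hypothetical helper: fills a search template the way the %s token is described above.
function buildSearchUrl($template, $query)
{
    // urlencode() makes the query safe to place in a URL (a space becomes +, etc.).
    return str_replace('%s', urlencode($query), $template);
}

$template = 'http://us2.php.net/manual-lookup.php?pattern=%s';
echo buildSearchUrl($template, 'implode');
// outputs http://us2.php.net/manual-lookup.php?pattern=implode
```

Any site whose search results live at a predictable URL can be handled the same way; only the template changes.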
I suddenly recall something interesting a professor of mine pointed out a couple years ago while on a tangent during lecture. It has to do with the nature of infinity and how accepting something perfectly reasonable as true leads to less intuitive, but equally true conclusions.
The following expression is true and most people would not argue otherwise:
1/3 = .3333333333 . . .
Assume, of course, that there is an infinite number of 3s trailing the decimal point. The following expression is also true, and even fewer people would argue otherwise:
1/3 + 1/3 + 1/3 = 1
This may seem obvious, but what may be less obvious is what follows logically from the two expressions above:
  .333333 . . .
  .333333 . . .
+ .333333 . . .
_______________
  .999999 . . . = 1
You may be reluctant to accept that the third expression is true, but if you accept the first two expressions, there is no avoiding it. Most people with a background in computer science or mathematics probably won’t be blown away by this, but it is a fun, nerdy thing to point out to your friends in the humanities department.
If you’ve ever used PHP’s library functions you’ve most likely noticed that several functions such as array() can take an indeterminate number of arguments. Normally when defining a function you specify each argument in the function declaration. Obviously it would be impossible to define an infinite number of arguments in such a way. PHP does, however, allow you to accomplish this through the function func_get_args().
func_get_args() returns an array consisting of all of the arguments passed to a function. Using this method you can bypass the conventional method of defining parameters in the function definition altogether. Here is an example:
function add()
{
    $total = 0;
    $args = func_get_args();
    foreach($args as $arg)
    {
        if(is_numeric($arg))
            $total += $arg;
    }
    return $total;
}
echo add(1, 2, 3); //outputs 6
If for whatever reason you need to know the total number of arguments passed to a function, PHP provides the func_num_args() function.
When retrieving arguments in this manner it is important to remember that func_get_args only returns an array of arguments passed by the user. It does not account for default values.
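The point about default values is easy to check with a small sketch. The function names describe and addAll are made up for illustration; the second one shows the variadic (...) syntax available since PHP 5.6, which is an alternative to func_get_args() rather than something the example above uses.

```php
<?php
// Hypothetical function: $second has a default value on purpose.
function describe($first, $second = 'default')
{
    // Only arguments actually passed by the caller are counted and returned.
    return func_num_args() . ':' . implode(',', func_get_args());
}

echo describe('a'), "\n";      // outputs 1:a (the default for $second is not included)
echo describe('a', 'b'), "\n"; // outputs 2:a,b

// Since PHP 5.6 the same idea can be written with the ... (variadic) syntax.
function addAll(...$numbers)
{
    // Keep only numeric values, then sum them.
    return array_sum(array_filter($numbers, 'is_numeric'));
}

echo addAll(1, 2, 3), "\n";    // outputs 6
```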
For the most up-to-date information regarding this script go to the PHP Lorem Ipsum page in the scripts section.
The other day I needed to populate a database with some placeholder content. Doing this manually was out of the question so I decided I’d find a text generator, specifically a Lorem Ipsum generator. For anyone unaware, Lorem Ipsum is nonsense placeholder text used in publishing and design. It allows the developer to see their work completely populated with text, without having to actually create the text. Obviously, for this purpose, any kind of text generator would work to some extent, but traditionally Lorem Ipsum is used.
To get to the point, I successfully located several web-based generators, but no stand-alone PHP class or function. To be honest, I didn’t look too hard and someone a little more determined not to write any code most likely would have found it, but I decided to create my own PHP Lorem Ipsum generator. Here is a rundown of some of the features in the current version:
PHP Lorem Ipsum Generator
Version: 1.0
License: BSD
Download:
Link moved Here.
Features
Feel free to request additional features.
Usage
The only public method in the class is getContent.
Description
string getContent( int $wordCount [, string $format = html] [, boolean $loremipsum = true] )
Returns the desired amount of content as a string.
Parameters
wordCount
The number of words to be returned.
format
The output mode, one of ‘html’, ‘txt’, or ‘plain’. HTML by default.
loremipsum
Whether or not the content should begin with ‘Lorem ipsum’. True by default.
Example
require('LoremIpsum.class.php');
$generator = new LoremIpsumGenerator;
//100 words in html format
$generator->getContent(100);
//100 words without any formatting
$generator->getContent(100, 'plain');
//100 words with 'text' formatting
$generator->getContent(100, 'txt');
//100 words with html format, not beginning with lorem ipsum
$generator->getContent(100, NULL, false);
//or
$generator->getContent(100, 'html', false);
Additional Notes
Both the HTML and Text output modes use paragraph formatting. The mean word count of each paragraph is predetermined and can be set in the constructor; currently the default is 100. Note that this is the mean word count; the actual word count for each paragraph will vary in the same way the length of each sentence will vary.
Changelog
Version 1.1 (Planned)
Version 1.0:
Even if you are new to programming, you probably have an understanding of functions and their purpose. What may be less clear, however, is what is happening underneath the hood when you pass a value to a function. In some languages, such as Java, when you pass a primitive value (such as an integer or a char) the function receives a copy of that value. In this instance you are guaranteed that the original version of the value will remain unchanged after the function call. In other languages, such as C/C++ and PHP, a copy of the value need not be made; you have the option of passing the value by reference. In this case any changes made to the value within the function will persist beyond the function call. Here is an example, using PHP:
//x is passed by value
function foo($x)
{
    $x++;
}
//x is passed by reference
function bar(& $x)
{
    $x++;
}
$a = 1;
foo($a); //a is still 1
bar($a); //a is 2
In PHP the & symbol denotes that the variable is a reference. In this case the syntax is fairly simple, as references can be handled the same as values. In C and C++, however, the same effect is often achieved with pointers, where the corresponding value must be accessed manually by dereferencing. This being the case, the pointer syntax is a bit more complicated (and often messy).
Understanding the difference between passing by value and passing by reference is only half the battle. Understanding when to use each is equally important. There are two cases (that I will mention) in which passing by reference comes in handy. The first may seem obvious: when you want the value to be changed by the function call. Generally, if you expect the value to remain unchanged after the function call, or if you no longer have use for the variable, then you should pass it by value. Passing by reference is not a substitute for return values. In the previous example, in practice, the function bar should be written like this:
function bar($x)
{
    $x++;
    return $x;
}
$a = 1;
$a = bar($a); //a is 2
Passing by reference is more appropriate in cases where the return value of a function cannot be the altered value that was passed:
function bar(& $x)
{
    $x++;
    return $x > 5;
}
$a = 1;
$greaterThanFive = bar($a);
//a is 2
The second case where passing by reference is useful is when passing by value will have a negative impact on your application’s performance. In a typical case such as passing an integer to a function, the performance impact of making a copy of that value is negligible. What if, however, you need to pass an array of thousands of items to a function? If you needed to do this several times there could be a noticeable difference in performance between passing by value and passing by reference. A reference to an object is always the same size (a 32-bit memory address for example). An object, however, may be very large, and making copies of it needlessly may not be the best idea. (PHP softens this somewhat by copying an array only when the function actually modifies it, but the principle holds in general.)
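As a minimal sketch of this second case, the function below changes a caller’s array in place instead of building and returning a modified copy. The name doubleAll and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: doubles every element of the caller's array in place.
function doubleAll(array &$values)
{
    foreach ($values as $key => $value) {
        // Writing through $values changes the caller's array, not a copy.
        $values[$key] = $value * 2;
    }
}

$numbers = range(1, 5);
doubleAll($numbers);
echo implode(',', $numbers); // outputs 2,4,6,8,10
```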
Throughout the life of a database there may come times when it needs to be updated to incorporate changes or new features. This may involve adding new attributes to existing entities, that is, adding new columns to tables. The problem with this is that in a populated database, modifying the database schema can be very expensive with regard to performance. This is not something you want to do frequently on a live site. One method which not only makes your database more resilient to future change, but also improves modularity, is the use of metadata.
You don’t have to look very hard to find multiple definitions of the term metadata, and as far as I am aware there is no universally accepted definition. The literal definition is “data about data” which is a bit vague. For our purposes, in the context of a database, we’ll think of metadata as data that supplements or further describes the data of another table. For example, if we have a table ‘users’ which has columns id, username, and password, metadata for that table might include first name and last name, though it would more likely contain data that does not pertain to every user all of the time.
Let’s say that your site has a registration script that sends out an email to confirm that a user is human or that the email address is valid. Typically this involves sending a link with some sort of verification key encoded in the URL to the user. In order to verify the user you will need to store this key somehow. Having a verification key field in the users table is one option, but considering that the key can be discarded and never looked at again after the user is verified, this might not be the best solution. A better approach may be to store this in the metadata table for that user.
So how is our metadata structured and how does it interact with the data it describes? A metadata table has three essential fields: an id that refers to the data it is supplementing, a meta key, and a meta value. It may also incorporate its own meta id as a primary key, though it is uncommon to look up this data by its id. We’ll assume our metadata table corresponds to the ‘users’ table I mentioned earlier and has the following structure:
CREATE TABLE user_meta (
    meta_id INT NOT NULL AUTO_INCREMENT,
    user_id INT NOT NULL,
    meta_key TEXT,
    meta_value TEXT,
    PRIMARY KEY(meta_id)
);
Earlier I mentioned verification keys as an example of metadata. If we chose to store this information in the user_meta table we would insert it like so:
INSERT INTO user_meta (user_id, meta_key, meta_value) VALUES (1, 'verification_key', 'abc123');
The code assumes that the user has the id 1 and that user’s verification key is abc123. In a real world scenario the user id would be determined and the verification key generated prior to executing this query. As I pointed out earlier, it is rare to look up meta data based on the meta id. More often a lookup will look something like this:
SELECT meta_value FROM user_meta WHERE user_id = '1' AND meta_key = 'verification_key';
The purpose of having a separate primary key is to allow for the same user to have multiple values for the same key, which is appropriate in some instances; having the user id and meta_key as the primary key would prevent this.
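Here is a rough sketch of how the insert and lookup might be wrapped in PHP, including a user with two values under one key. To stay self-contained it uses PDO with an in-memory SQLite database instead of MySQL, so the column definitions differ slightly from the table above; the helper names addMeta and getMeta and the key favorite_color are made up for illustration.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real MySQL server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE user_meta (
    meta_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    meta_key TEXT,
    meta_value TEXT
)');

// Hypothetical helper: stores one metadata value for a user.
function addMeta(PDO $db, $userId, $key, $value)
{
    $stmt = $db->prepare('INSERT INTO user_meta (user_id, meta_key, meta_value) VALUES (?, ?, ?)');
    $stmt->execute(array($userId, $key, $value));
}

// Hypothetical helper: returns every value stored for a user under one key.
function getMeta(PDO $db, $userId, $key)
{
    $stmt = $db->prepare('SELECT meta_value FROM user_meta WHERE user_id = ? AND meta_key = ? ORDER BY meta_id');
    $stmt->execute(array($userId, $key));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// The same user can hold several values under one key.
addMeta($db, 1, 'favorite_color', 'red');
addMeta($db, 1, 'favorite_color', 'blue');
echo implode(',', getMeta($db, 1, 'favorite_color')); // outputs red,blue
```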
You can probably see how this allows you to make changes to your database without making expensive alterations to existing tables. If you have a new value that you want to be associated with users, all you need to do is add that value for each user to the metadata. How, though, do we determine from the outset which data belongs in the users table and which data can be tossed into the metadata? Before we can answer this it is important to understand the trade-offs between adding a value as metadata and adding a dedicated column to another table. Assume from this point that the tables are not yet populated.
You can expect over the lifetime of your database that each user will have multiple rows associated with him/her in the user_meta table. This being the case, the user_meta table will have many more rows than the users table. In addition to this, a lookup in the metadata table is typically done using a user id and a meta key, neither of which is a primary key. In the users table, however, we will be looking up data based on the user id frequently. A lookup based on a primary key is significantly faster than a lookup on non-indexed columns. Creating an index on the user_id and meta_key columns in the user_meta table would help to bridge the speed gap, but it would come at the expense of memory.
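A composite index of the kind mentioned above could be created as in the sketch below. It again assumes PDO with an in-memory SQLite database so that it runs on its own, and the index name idx_user_meta_lookup is made up. On MySQL, a TEXT column such as meta_key needs a prefix length when it is indexed, for example meta_key(32).

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL; the index name is made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE user_meta (
    meta_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    meta_key TEXT,
    meta_value TEXT
)');

// One composite index covers the usual lookup: WHERE user_id = ? AND meta_key = ?
$pdo->exec('CREATE INDEX idx_user_meta_lookup ON user_meta (user_id, meta_key)');

// List the indexes SQLite now knows about for this table.
$names = $pdo->query("SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'user_meta'")
             ->fetchAll(PDO::FETCH_COLUMN);
echo implode(',', $names);
```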
When you are designing your database you should consider how frequently you will need to access values. A username for instance might be looked up frequently, while a verification key may only be looked up once. Your goal should be to minimize the number of columns in your users table while at the same time reducing the number of items in the metadata that will be accessed frequently. You should also consider how sparsely populated a column will be. Every user, for example, has a password. Going back to the verification key example once again, only users whose status is still pending will have a verification key. After they are verified the key can be discarded.
The use of metadata can improve the modularity and performance of your database, but only if used correctly.
Making a custom login system is a common task for beginning PHP developers. Jumping right into it, however, may not be the best approach. There are several important aspects to building a login system that not only make it work, but make it safe.
Updated on December 15th 2009: Added Session Control Section
Updated on December 15th 2010: Switched to sha256
Getting Started
To begin with, we’ll create our login form. This doesn’t need to be anything fancy, just a couple of input fields and a submit button:
<form name="login" action="login.php" method="post">
    Username: <input type="text" name="username" />
    Password: <input type="password" name="password" />
    <input type="submit" value="Login" />
</form>
The above example is stripped down; there is no formatting or styling, so it most likely won’t look too great if you copy and paste the code. Making your form pretty is beyond the scope of this article. In the form tag, notice that it has three attributes: name, action, and method. Name identifies the form and is not very important in the context of this article. Action identifies the script that will be processing the login; oftentimes your form and the processing code are in a single file, but this does not have to be the case. Method typically takes one of two values: post or get. If you submit a form using get, the data is URL encoded and will be visible in the address bar. If the method is post, the data is sent in the body of the request and will not appear in the address bar. As you may have guessed there is actually a lot more to it than that, but the difference between post and get is a topic for another article. We want to use post.
Notice that our input fields are given name attributes. This is important as we will need to identify and access these values by this name. PHP identifies form data using the name attribute, not the id. Now that our form is complete we can move on to processing the data.
Storing our Data
Actually we can’t process the data just yet. Before that we need to worry about how our data will be represented in the database. There are three essential values we must store in our database: the username, password, and a salt. We will get into what a salt is later. In addition to this you may choose to store other things about your user. It is also common to give the user a numeric user id. This is not absolutely necessary but it is common for tables to have a numeric, sequential primary key. For our purposes assume our table structure is this:
CREATE TABLE users (
    id INT NOT NULL AUTO_INCREMENT,
    username VARCHAR(30) NOT NULL UNIQUE,
    password VARCHAR(64) NOT NULL,
    salt VARCHAR(3) NOT NULL,
    PRIMARY KEY(id)
);
In the above SQL code we create a table called users having columns id, username, password, and salt. Even though we could use usernames to uniquely identify users we will use the id for this instead. One reason for this is that integer comparisons are cheaper than string comparisons, so searching through a large number of users will require fewer resources. Convention is another reason.
There are a few keywords in the above code that I’ll define for you. Not null means that each tuple (a tuple is a row in our table, each user will have a row) must have a value for this column. In this case all of our columns are ‘not null’ so every user must have each of these values. Auto increment applies to numeric primary keys. It allows us to give each user a sequential id without worrying about collisions or what the most recent user’s id was; the database will assign each user a correct id automatically. Unique, as you may guess, means that the value must be unique. In our case, no two users may have the same username. Finally, primary key tells the database which field will uniquely identify each row. No two users can have the same id. It also creates an index on the specified column, meaning that lookups will be faster (at the expense of memory).
Each column also has a data type. The id is an integer while the remaining fields are varchars. Varchars are just arrays of characters, as are strings. Varchar(30) means a sequence of up to 30 characters. For the username I chose 30 as the length for no particular reason; it allows for a reasonable number of characters without letting users write a paragraph. The lengths for password and salt are important and I’ll get into that later. Now that we have created our table we are ready to process our data.
Populating our Table
Actually we can’t process the data just yet. You can’t login if your users table is empty. Just like on any site, you have to register before you can login. To accomplish this we will create a simple registration form.
<form name="register" action="register.php" method="post">
    Username: <input type="text" name="username" maxlength="30" />
    Password: <input type="password" name="pass1" />
    Password Again: <input type="password" name="pass2" />
    <input type="submit" value="Register" />
</form>
The above code is similar to our login form code with one notable difference. In the username input field I specify the maxlength attribute as 30. This means the field can only contain 30 characters and corresponds to the username length we specified in our SQL code. Notice I don’t enforce the length of passwords even though they are defined as 64 characters in the SQL code; you will see why it is not necessary to do so later. Now that we have our registration form we can process our data.
Sign Me Up
Our registration data, that is; we still can’t process our login data. At this point we actually get to write some PHP code (are you as excited as I am?).
register.php (part 1):
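<?php
//retrieve our data from POST
$username = $_POST['username'];
$pass1 = $_POST['pass1'];
$pass2 = $_POST['pass2'];
//passwords must match
if($pass1 != $pass2)
{
    header('Location: register_form.php');
    exit; //stop here so the rest of the script does not run
}
//username must fit in the database column
if(strlen($username) > 30)
{
    header('Location: register_form.php');
    exit;
}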
In the above code we retrieve our data from $_POST, which is an associative array where all of the post data is stored. We also check if pass1 and pass2 are equal, which is an example of validating user input. If they are not equal we use the header function to redirect back to our registration form (assume the registration form is located in a file called register_form.php). Ideally we would want to display an error message, but for the sake of example we will keep it simple. We also check if the username exceeds 30 characters. Even though we set the form to allow no more than 30 characters it is important that we check as well; it is possible (and very simple) to bypass the limit imposed in the HTML code. In addition to this you should check for any other constraints you have placed on your data (valid characters, minimum length, etc).
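The extra checks mentioned above could be collected in one function, as in the sketch below. The function name validateRegistration and the specific rules (3 to 30 letters, digits or underscores for the username, at least 8 characters for the password) are assumptions chosen for illustration, not requirements of this article.

```php
<?php
// Hypothetical validator; the rules themselves are assumptions for illustration.
function validateRegistration($username, $pass1, $pass2)
{
    $errors = array();
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    if (!preg_match('/\A[A-Za-z0-9_]{3,30}\z/', $username)) {
        $errors[] = 'Username must be 3-30 letters, digits or underscores.';
    }
    if (strlen($pass1) < 8) {
        $errors[] = 'Password must be at least 8 characters.';
    }
    if ($pass1 !== $pass2) {
        $errors[] = 'Passwords do not match.';
    }
    return $errors;
}

$errors = validateRegistration('tinsley', 'correct horse', 'correct horse');
echo count($errors); // outputs 0
```

Returning a list of messages rather than redirecting immediately makes it easier to show the user everything that needs fixing at once.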
Hashing
As I mentioned earlier, the length of the password varchar in the database is significant. This is because we are not actually going to (and never should) store the password in the database. We are going to store an sha256 hash, which is a string always containing 64 characters. In simple terms a hash function is an algorithm that maps inputs to outputs in a deterministic way, meaning that given an input the algorithm will always produce the same output. Sha256 is an algorithm that outputs a 64 digit hexadecimal value. As with most hash functions, given the output of the sha256 algorithm it is not easy to determine the input. In PHP we can get the sha256 hash of our password like this:
register.php (part 2)
$hash = hash('sha256', $pass1);
Pass the Salt
As I mentioned before, there is no simple way to determine the input of the sha256 algorithm from the output. It is possible, however, through brute force, provided the input string (i.e. the password) is short enough. One way to improve the security of your users’ passwords is to use a salt. A salt is just a random string of characters that is combined with the hash (here it is prepended), and the result is then hashed again.
register.php (part 3)
//creates a 3 character sequence
function createSalt()
{
    $string = md5(uniqid(rand(), true));
    return substr($string, 0, 3);
}
$salt = createSalt();
$hash = hash('sha256', $salt . $hash);
Some people will tell you that a salt is not necessary, but it certainly doesn’t hurt to use one.
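For comparison, PHP 5.5 and later ship with password_hash() and password_verify(), which generate and store the salt for you. This is a different technique from the sha256 scheme used in this article, shown here only as a sketch; the sample password is made up, and the password column would need to be wider than 64 characters (the PHP manual suggests 255).

```php
<?php
// password_hash() generates a random salt itself and embeds it in the result,
// so no separate salt column is needed with this approach.
$stored = password_hash('my secret password', PASSWORD_DEFAULT);

// At login time, password_verify() re-derives the hash using the embedded salt.
var_dump(password_verify('my secret password', $stored)); // bool(true)
var_dump(password_verify('wrong guess', $stored));        // bool(false)
```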
Now comes the database portion of our code; for this I will assume we are using a MySQL database. I will also assume the database host, name, user, and password. I won’t anticipate any database connection errors in the code, but you should.
register.php (part 4):
$dbhost = 'localhost';
$dbname = 'tinsology';
$dbuser = 'tinsley';
$dbpass = 'myrealpassword'; //not really
$conn = mysql_connect($dbhost, $dbuser, $dbpass);
mysql_select_db($dbname, $conn);
//sanitize username
$username = mysql_real_escape_string($username);
$query = "INSERT INTO users ( username, password, salt )
VALUES ( '$username' , '$hash' , '$salt' );";
mysql_query($query);
mysql_close();
header('Location: login_form.php');
In the above code we establish a connection to our MySQL database and add our new user to the users table. Then we redirect to our login form. Notice that we call the function mysql_real_escape_string. This function helps to prevent SQL injections by escaping the input. Using an abstraction layer like PDO will also help to prevent this. Now that we’ve processed our registration data we can write the code to process our login data.
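Since the mysql_* functions were removed in PHP 7, here is a rough sketch of the same insert using PDO with a prepared statement. To keep it runnable on its own it uses an in-memory SQLite database, which is an assumption; for MySQL the DSN would look something like 'mysql:host=localhost;dbname=tinsology' with a user and password. The sample username, salt and password are placeholders.

```php
<?php
// Assumption: in-memory SQLite keeps the sketch self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username VARCHAR(30) NOT NULL UNIQUE,
    password VARCHAR(64) NOT NULL,
    salt VARCHAR(3) NOT NULL
)');

// Placeholder values standing in for the ones computed earlier in register.php.
$username = "o'brien";
$salt = 'abc';
$hash = hash('sha256', $salt . hash('sha256', 'secret'));

// The placeholders keep the data separate from the SQL, so no manual escaping is needed.
$stmt = $pdo->prepare('INSERT INTO users (username, password, salt) VALUES (?, ?, ?)');
$stmt->execute(array($username, $hash, $salt));

echo $pdo->query('SELECT COUNT(*) FROM users')->fetchColumn(); // outputs 1
```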
Logging in
Seriously this time. Our login processor will pull the login data from post and compare it to the database values.
$username = $_POST['username'];
$password = $_POST['password'];
//connect to the database here
$username = mysql_real_escape_string($username);
$query = "SELECT password, salt
        FROM users
        WHERE username = '$username';";
$result = mysql_query($query);
if(mysql_num_rows($result) < 1) //no such user exists
{
    header('Location: login_form.php');
    exit; //stop processing after the redirect
}
$userData = mysql_fetch_array($result, MYSQL_ASSOC);
$hash = hash('sha256', $userData['salt'] . hash('sha256', $password) );
if($hash != $userData['password']) //incorrect password
{
    header('Location: login_form.php');
    exit;
}
//login successful
As I mentioned before, this is a stripped down example. Ideally you would not have the code you use to connect to your database in each file, but rather in a single file or function that you could include. In addition to this you should maintain a session throughout the process in order to store and report error messages, as well as other useful data. The most important thing to remember when creating a login system is that you should never trust your users. Validate all user input, protect against SQL injections, and never store raw passwords in the database.
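The password check can also be pulled out into a small function, which makes it easy to try without a database. In the sketch below the name passwordMatches and the simulated database row are made up; the hashing scheme is the one used in this article, and hash_equals() (available since PHP 5.6) is swapped in for != so that the comparison takes constant time.

```php
<?php
// Hypothetical helper mirroring the scheme above: sha256(salt . sha256(password)).
function passwordMatches($password, $salt, $storedHash)
{
    $candidate = hash('sha256', $salt . hash('sha256', $password));
    // hash_equals() compares in constant time, unlike != or !==.
    return hash_equals($storedHash, $candidate);
}

// Simulated database row for a user whose password is "secret".
$userData = array(
    'salt' => 'abc',
    'password' => hash('sha256', 'abc' . hash('sha256', 'secret')),
);

var_dump(passwordMatches('secret', $userData['salt'], $userData['password'])); // bool(true)
var_dump(passwordMatches('guess', $userData['salt'], $userData['password']));  // bool(false)
```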
P.S. Session Control
Added December 15th 2009
All of the above code illustrates how to register a user, and allow them to login. There is, however, one fundamental piece that is missing: session control. In order for a login system to be useful, it must provide some means to distinguish between a logged in user and a non-logged in user across the entire site. Sessions are the means by which we do this. What we want to do, after a user has successfully logged in, is record that fact in a session variable.
Accessing Session Data
Activating and managing sessions in PHP is very straightforward: you only need one function to both create and recall a particular session, session_start(). Here is a generic example of how to use session_start to store session data:
page1.php
session_start();
$_SESSION['foo'] = 'bar';
page2.php
session_start();
echo $_SESSION['foo']; //will output bar
In the above example you can see that in page1 we start a session and assign the session variable “foo” the value “bar”. If the user visits page2 sometime thereafter, the session will be recalled and the value of foo (bar) will be echoed. Notice that session_start creates a new session if one does not exist, or recalls the existing session if it does.
There are a few subtleties relating to session_start that are important to remember. You must call session_start before any headers are sent to the browser. This means that your script cannot have any output or calls to header() (or any other function that sends headers) before calling session_start. The simplest way to avoid this problem is to call session_start before anything else. One common case where this problem can occur is when some code that normally wouldn’t have any output generates an error or warning prior to calling session_start. The default error handler will automatically output any error messages to the browser.
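If you want to guard against the "headers already sent" problem explicitly, something along these lines could work. This is only a sketch: startSessionSafely is a hypothetical helper name, and it relies on session_status(), which needs PHP 5.4 or later.

```php
<?php
// Hypothetical helper: start a session only when it is still possible to do so.
function startSessionSafely()
{
    if (session_status() === PHP_SESSION_ACTIVE) {
        return true; // a session is already running, nothing to do
    }
    if (headers_sent($file, $line)) {
        // Output has already started, so the session cookie can no longer be sent.
        error_log("Cannot start session: output began in $file on line $line");
        return false;
    }
    return session_start();
}

$sessionReady = startSessionSafely();
```

The file and line reported by headers_sent are usually the quickest way to track down the stray whitespace or warning that triggered the output.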
Generally (though not necessarily), a session lives as long as the browser remains open.
Using Sessions in Our Login System
There are three basic functions we want to incorporate into our login system: validating a user (i.e. indicating that the user has logged on), checking if a user is logged on, and logging a user out.
Validating a User
function validateUser($userid = null)
{
session_regenerate_id(); //this is a security measure
$_SESSION['valid'] = 1;
$_SESSION['userid'] = $userid; //id of the logged in user, supplied by the caller
}
This function marks the session as logged in by setting the session variable ‘valid’ to 1. You may also want to use this function to store certain variables. It is a good idea to store frequently accessed data about a particular user (such as a user id or username, but NOT a password or other sensitive data). The $_SESSION['userid'] = $userid; line is an example of how to store user info; $userid is an argument the caller supplies, since a variable defined outside the function would not be visible inside it. Additional information about session security will be provided in the following section.
Checking if a User is Logged On
function isLoggedIn()
{
if(isset($_SESSION['valid']) && $_SESSION['valid'])
return true;
return false;
}
This function simply checks that the session variable ‘valid’ is set and holds a true value, such as the 1 stored by validateUser.
Logging Out
function logout()
{
$_SESSION = array(); //destroy all of the session variables
session_destroy();
}
When it is time to log a user out, we destroy the session. All of these functions assume session_start has already been called.
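The logout function above clears the data on the server, but the session cookie stays in the browser. A slightly more thorough version, modelled on the pattern in the PHP manual's session_destroy example, might look like the sketch below. The name logoutCompletely is hypothetical, and the guards are there so the function does not raise warnings when no session is active or output has already started.

```php
<?php
// Hypothetical, more thorough logout: clear the data, expire the cookie,
// then destroy the session on the server.
function logoutCompletely()
{
    $_SESSION = array(); // destroy all of the session variables

    // Expire the session cookie if sessions use cookies and headers can still be sent.
    if (ini_get('session.use_cookies') && !headers_sent()) {
        $params = session_get_cookie_params();
        setcookie(
            session_name(),
            '',
            time() - 42000, // a time in the past tells the browser to drop the cookie
            $params['path'],
            $params['domain'],
            $params['secure'],
            $params['httponly']
        );
    }

    // session_destroy() only makes sense while a session is active.
    if (session_status() === PHP_SESSION_ACTIVE) {
        session_destroy();
    }
}
```

As with the other functions, this assumes session_start has already been called on the page that logs the user out.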
Now it is time to incorporate this into our login script:
session_start(); //must call session_start before using any $_SESSION variables
$username = $_POST['username'];
$password = $_POST['password'];
//connect to the database here
$username = mysql_real_escape_string($username);
$query = "SELECT password, salt
FROM users
WHERE username = '$username';";
$result = mysql_query($query);
if(mysql_num_rows($result) < 1) //no such user exists
{
header('Location: login_form.php');
die();
}
$userData = mysql_fetch_array($result, MYSQL_ASSOC);
$hash = hash('sha256', $userData['salt'] . hash('sha256', $password) );
if($hash !== $userData['password']) //incorrect password (strict comparison avoids type juggling)
{
header('Location: login_form.php');
die();
}
else
{
validateUser(); //sets the session data for this user
}
//redirect to another page or display "login success" message
Now we can use the isLoggedIn function to determine if a user is logged in and act accordingly:
membersonly.php
session_start();
//if the user has not logged in
if(!isLoggedIn())
{
header('Location: login.php');
die();
}
//page content follows
A Note About Session Security
By default, sessions are cookie based. This means that a particular session is associated with a user by use of a cookie. This being the case, there are certain vulnerabilities that arise. There are a variety of methods by which a session can be hijacked (XSS for example; a JavaScript injection can cause a user to give up their session id). Unfortunately you cannot expect to eliminate the possibility that a user has hijacked someone else’s session. Further research into the subject may yield a few tricks, but ultimately the best practice is to be cautious about certain tasks. If a user wants to change their password (after logging in), require the old password, and whenever possible avoid displaying sensitive data.
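One of those tricks is to tighten the session cookie itself. The sketch below is an assumption-laden example rather than a complete defence: the function names are hypothetical, the array form of session_set_cookie_params requires PHP 7.3 or later, and the 'secure' flag assumes the site is served over HTTPS.

```php
<?php
// Hypothetical helper: cookie settings that make a stolen-cookie attack harder.
function hardenedCookieParams()
{
    return array(
        'lifetime' => 0,     // cookie lasts until the browser is closed
        'path'     => '/',
        'secure'   => true,  // only send the cookie over HTTPS (assumes an HTTPS site)
        'httponly' => true,  // hide the cookie from JavaScript (document.cookie)
        'samesite' => 'Lax', // do not send the cookie on most cross-site requests
    );
}

// Hypothetical helper: apply the settings, then start the session.
// Must be called before any output, like session_start itself.
function startHardenedSession()
{
    session_set_cookie_params(hardenedCookieParams()); // array form needs PHP 7.3+
    return session_start();
}
```

The httponly flag does not stop XSS, but it does keep an injected script from simply reading the session id out of the cookie.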
Earlier I mentioned that you shouldn’t use session variables to store sensitive data. It is important to remember that session data lives on the server, which means that a user cannot directly view or modify it. On a shared server, however, other users of that server may be able to access this information.
A Note About Hashing Algorithms
Added December 15th 2010
Lately I’ve been seeing a lot of articles talking about password cracking. I wrote about one here. Recently I responded to a question regarding a similar article (link). It is important to know that when someone says that they can crack a hashed password (which is to say they can determine the input of the hash function given only the output) they might mean one of a few things. They might mean that they can perform a preimage attack, which means that given any arbitrary hash they can find a corresponding input string in a certain amount of time. They might also mean that they can perform a brute force attack, which means that given a hash they can find a corresponding input string by generating a hash for every possible string up to a certain length.
The key difference here is that one attack depends on a vulnerability in the hash algorithm while the other depends on a vulnerability in the password itself. My advice is this:
Something Important to Remember
Just a reminder, all of the code examples I provided are exactly that: examples. They are meant to serve as a starting point, and hopefully shed some light on a few key concepts, but they are not a real world implementation. I don’t recommend, for example, using header() to bounce your users around to different pages; there are better methods that reduce the number of page loads and give the user a more fluid experience. I use this method here because it simplifies things and makes the examples easier to follow.
Occasionally I’ll see someone make a point of distinguishing coding in a particular language as scripting as opposed to programming. Oftentimes the distinction is arbitrary. I’ve seen justifications for this distinction ranging from scripting languages not being as strict as programming languages, to scripting languages not being Turing complete. Web languages in particular (HTML, JavaScript, PHP, etc.) seem to have the stigma of being scripting languages. To this day, however, I have not seen a non-trivial definition of the difference (or perhaps just one that satisfies me).
This does not mean that I don’t think there is a difference; in my own mind I tend to draw a distinction. I do this, however, based on the practice of coding itself rather than the language. As an aside, before I go into detail, I’d like to mention that I’m not trying to pass my opinion off as a definition or absolute truth, just as my opinion. When I think of scripting, I think of HTML in particular. This isn’t because it isn’t Turing complete, or because it isn’t compiled, or because it is a “web” language. I make this distinction based on the tolerance for error in the practice of coding HTML. Go to any site and validate its source. Chances are you will come up with multiple errors. This would not be tolerated in a language like C++. I’m not trying to say that HTML is inferior; in fact it really wouldn’t be fair to compare it to the “traditional” programming languages.
Some languages are often referred to as scripting languages but, I feel, are more like programming languages. PHP (and its web programming counterparts), for example, is often considered a scripting language. If you take into consideration my earlier assessment of HTML, however, you will see why I do not consider it as such. If I forget to close a tag or if I use improper syntax in an HTML script there is a good chance that it will display just fine in a browser. In PHP on the other hand, if I forget a closing bracket or use incorrect syntax, my script will fail. Even if it is able to run, I will see unexpected behavior, and the interpreter makes no attempt (and should not) to correct these mistakes.
I don’t think it is the case that you can divide all of the languages in use today into scripting languages or programming languages. There is some gray area, and many languages have elements that are script-like even though it would be difficult to consider them entirely a scripting language. JavaScript and XQuery, in my mind, are examples of this. As I mentioned earlier these are just my perceptions. I think that the difference between scripting and programming is completely arbitrary. I think one of the main reasons that such a distinction exists is simply so programmers can point out that writing HTML or PHP isn’t “real programming”. The difference really isn’t that important, which is why I don’t think it is necessary to create a formal definition.