Archive for the ‘Databases’ Category.

Metadata

Throughout the life of a database there may come times when it needs to be updated to incorporate changes or new features. This may involve adding new attributes to existing entities; adding new columns to tables. The problem with this is that in a populated database, modifying the database schema can be very expensive with regard to performance. This is not something you want to do frequently on a live site. One method which not only makes your database more resilient to future change, but also improves modularity is the use of metadata.

You don’t have to look very hard to find multiple definitions of the term metadata, and as far as I am aware there is no universally accepted definition. The literal definition is “data about data” which is a bit vague. For our purposes, in the context of a database, we’ll think of metadata as data that supplements or further describes the data of another table. For example, if we have a table ‘users’ which has columns id, username, and password, metadata for that table might include first name and last name, though it would more likely contain data that does not pertain to every user all of the time.

Lets say that your site has a registration script that sends out an email to confirm that a user is human or that the email address is valid. Typically this involves sending a link with some sort of verification key encoded in the URL to the user. In order to verify the user you will need to store this key somehow. Having a verification key field in the users table is one option, but considering that the key can be discarded and never looked at again after the user is verified, this might not be the best solution. A better approach may be to store this in the metadata table for that user.

So how is our metadata structured and how does it interact with the data it describes? A metadata table has three essential fields: an id that refers to the data it is supplementing, a meta key, and a meta value. It may also incorporate its own meta id as a primary key, though it is uncommon to look up this data by its id. We’ll assume our metadata table corresponds to the ‘users’ table I mentioned earlier and as the following structure:

CREATE TABLE user_meta (
	meta_id INT NOT NULL AUTO_INCREMENT,
	user_id INT NOT NULL,
	meta_key TEXT,
	meta_value TEXT,
	PRIMARY KEY(meta_id)
);

Earlier as an example I mentioned verification keys as an example of metadata. If we chose to store this information in the user_meta table we would insert it like so:

INSERT INTO user_meta (user_id , meta_key , meta_value)
	VALUES(1 , verification_key , abc123);

The code assumes that the user has the id 1 and that user’s verification key is abc123. In a real world scenario the user id would be determined and the verification key generated prior to executing this query. As I pointed out earlier, it is rare to look up meta data based on the meta id. More often a lookup will look something like this:

SELECT meta_value FROM user_meta
	WHERE user_id = '1'
	AND meta_key = 'verification_key';

The purpose of having a separate primary key is to allow for the same user to have multiple values for the same key, which is appropriate in some instances; having the user id and meta_key as the primary key would prevent this.

You can probably see how this allows for you to make changes to your database without making expensive alterations to existing tables. If you have a new value that you want to be associated with users all you need to do is add that value for each user to the metadata. How though do we determine from the point of inception which data belongs in the users table and which data can be tossed into the metadata. Before we can answer this it is important to understand the trade offs between adding a value as metadata and adding a dedicated column to another table. Assume from this point that the tables are not yet populated.

You can expect over the lifetime of your database that each user will have multiple rows associated with him/her in the user_meta table. This being the case, the user_meta table will have many more rows than the users table. In addition to this, a lookup in the metadata table is typically done using a user id and a meta key, neither of which is a primary key. In the users table, however, we will be looking up data based on the user id frequently. A lookup based on a primary key is significantly faster than a lookup on non-indexed columns. Creating an index on the user_id and meta_key columns in the user_meta table would help to bridge the speed gap, but it would come at the expense of memory.

When you are designing your database you should consider how frequently you will need to access values. A username for instance might be looked up frequently, while a verification key may only be looked up once. Your goal should be to minimize the number of columns in your users table while at the same time reducing the number of items in the metadata that will be accessed frequently. You should also consider how sparsely populated a column will be. Every user, for example has a password. Going back the verification key example once again, only users who’s status is still pending will have a verification key. After they are verified the key can be discarded.

The use of metadata can improve the modularity and performance of your database, but only if used correctly.

Creating a Secure Login System the Right Way

Making a custom login system is a common task for beginning PHP developers. Jumping right into it, however, may not be the best approach. There are several important aspects do building a login system that not only makes it work, but makes it safe.
Updated on December 15th 2009: Added Session Control Section

Getting Started

To begin with, we’ll create our login form. This doesn’t need to be anything fancy, just a couple of input fields and a submit button:

<form name="login" action="login.php" method="post">
	Username: <input type="text" name="username" />
	Password: <input type="password" name="password" />
	<input type="submit" value="Login" />
</form>

The above example is stripped down; there is no formatting or styles so it will most likely won’t look to great if you copy and paste the code. Making your form pretty is beyond the scope of this article. In the form tag, notice that it has three attributes: name, action, and method. Name identifies the form and is not very important in the context of this article. Action identifies the script that will be processing the login, often times your form and the processing code are in a single file, but this does not have to be the case. Method typically takes one of two values: post or get. If you submit a form using GET the data is URL encoded and will be visible in the address bar. If the method is post the data will not be URL encoded. As you may have guessed there is actually a lot more to it than that but the difference between post and get is a topic for another article. We want to use post.

Notice that our input fields are given name attributes, this is important as we will need to identify and access these values by this name. PHP identifies form data using the name attribute, not the id. Now that our form is complete we can move on to processing the data.

Storing our Data

Actually we can’t process the data just yet. Before that we need to worry about how our data will be represented in the database. There are three essential values we must store in our database: the username, password, and a salt. We will get into what a salt is later. In addition to this you may choose to store other things about your user. It is also common to give the user a numeric user id. This is not absolutely necessary but it is common for tables to have a numeric, sequential primary key. For our purposes assume our table structure is this:

CREATE TABLE users (
	id INT NOT NULL AUTO_INCREMENT,
	username VARCHAR(30) NOT NULL UNIQUE,
	password VARCHAR(40) NOT NULL,
	salt VARCHAR(3) NOT NULL,
	PRIMARY KEY(id)
);

In the above SQL code we create a table called users having columns id, username, password, and salt. Even though we could use usernames to uniquely identify users we will use the id for this instead. One reason for this is that integer comparisons are cheaper than string comparisons, so searching through a large number of users will require fewer resources. Convention is another reason.

There are a few keywords in the above code that I’ll define for you. Not null means that each tuple (a tuple is a row in our table, each user will have a row) must have a value for this column. In this case all of our columns are ‘not null’ so every user must have each of these values. Auto increment applies to numeric primary keys. It allows us to give each user a sequential id without worrying about collisions or what the most recent users id was; the database will assign each user a correct id automatically. Unique, as you may guess, means that the value must be unique. In our case, no two users may have the same username. Finally, primary key tells the database which field will uniquely identify each row. No two users can have the same id. It also creates and index on the specified column, meaning that lookups will be faster (at the expense of memory).

Each column also has a data type. The id is and integer while the remain fields are varchars. Varchars are just arrays of characters, as are strings. Varchar(30) means a sequence of 30 characters. For the username I chose 30 as the length for no particular reason; it allows for a reasonable amount of characters without letting users write a paragraph. The lengths for password and salt are important and I’ll get into that later. Now that we have created our table we are ready to process our data.

Populating our Table

Actually we can’t process the data just yet. You can’t login if your users table is empty. Just like on any site, you have to register before you can login. To accomplish this we will create a simple registration form.

<form name="register" action="register.php" method="post">
	Username: <input type="text" name="username" maxlength="30" />
	Password: <input type="password" name="pass1" />
	Password Again: <input type="password" name="pass2" />
	<input type="submit" value="Register" />
</form>

The above code is similar to our login form code with one notable difference. In the username input field I specify the maxlength attribute as 30. This means the field can only contain 30 characters and corresponds to the username length we specified in our SQL code. Notice I don’t enforce the length of passwords even though they are defined as 40 characters in the SQL code, you will see why it is not necessary to do so later. Now that we have our registration form we can process our data.

Sign Me Up

Our registration data that is, we still can’t process our login data. At this point we actually get to write some PHP code (are you as excited as I am?).

register.php (part 1):

<?php
//retrieve our data from POST
$username = $_POST['username'];
$pass1 = $_POST['pass1'];
$pass2 = $_POST['pass2'];

if($pass1 != $pass2)
	header('Location: register_form.php');

if(strlen($username) > 30)
	header('Location: register_form.php');

In the above code we retrieve our data from $_POST, which is an associative array where all of the post data is stored. We also check if pass1 and pass2 are equal which is an example of validating user input. If they are not equal we use the header function to redirect back to our registration form (assume the registration form is located in a file called register_form.php). Ideally we would want to display an error message, but for the sake of example we will keep it simple. We also check if the username exceeds 30 characters. Even though we set the form to allow no more than 30 characters it is important that we check as well; it is possible (and very simple) to bypass the limit imposed in the html code. In addition to this you should check for any other constrains you have placed on your data (valid characters, minimum length, etc).

Hashing

As I mentioned earlier, the length of the password varchar in the database is significant. This is because we are not actually going to (and never should) store the password in the database. We are going to store an sha1 hash which is a string always containing 40 characters. In simple terms a hash is an algorithm that maps inputs to outputs in a deterministic way. Meaning that given an input the algorithm will always produce the same output. Sha1 is an algorithm that outputs a 40 digit hexadecimal value. As in most cases, given the output of the sha1 algorithm it is not easy (though possible) to determine the input. In PHP we can get the sha1 hash of our password like this:

regsiter.php (part 2)

$hash = sha1($pass1);

Pass the Salt

As I mentioned before, there is no simple way to determine the input of the sha1 algorithm from the output. It is, however, possible through brute force or more complicated means. One way to improve the security of your users’ passwords is to use a salt. A salt is just a random string of characters that is appended to the hash, which is then hashed again.

regsiter.php (part 3)

//creates a 3 character sequence
function createSalt()
{
	$string = md5(uniqid(rand(), true));
	return substr($string, 0, 3);
}

$salt = createSalt();

$hash = sha1($salt . $hash);

Some people will tell you that a salt is not necessary, but it certainly doesn’t hurt to use one.

Now comes the database portion of our code, for this I will assume we are using a mysql database. I will also assume the database host, name, user, and password. I won’t anticipate any database connection errors in the code, but you should.

register.php (part 4):

$dbhost = 'localhost';
$dbname = 'tinsology';
$dbuser = 'tinsley';
$dbpass = 'trueblood'; //awesome tv show

$conn = mysql_connect($dbhost, $dbuser, $dbpass);
mysql_select_db($dbname, $conn);

//sanitize username
$username = mysql_real_escape_string($username);

$query = "INSERT INTO users ( username, password, salt )
		VALUES ( '$username' , '$hash' , '$salt' );";
mysql_query($query);

mysql_close();

header('Location: login_form.php');

In the above code we establish a connection to our MySQL database and add our new user to the users table. Then we redirect to our login form. Notice that we call the funciton mysql_real_escape string. This function helps to prevent SQL injections by escaping the input. Using an abstraction layer like PDO will also help to prevent this. Now that we’ve processed our registration data we can write the code to process our login data.

Logging in

Seriously this time. Our login processor will pull the login data from post and compare it to the database values.

$username = $_POST['username'];
$password = $_POST['password'];

//connect to the database here

$username = mysql_real_escape_string($username);

$query = "SELECT password, salt
		FROM users
		WHERE username = '$username';";
$result = mysql_query($query);

if(mysql_num_rows($result) < 1) //no such user exists
{
	header('Location: login_form.php');
}

$userData = mysql_fetch_array($result, MYSQL_ASSOC);
$hash = sha1( $userData['salt'] . sha1($password) );

if($hash != $userData['password']) //incorrect password
{
	header('Location: login_form.php');
}

//login successful

As I mentioned before, this is a stripped down example. Ideally you would not have the code you use to connect to your database in each file, but rather in a single file or function that you could include. In addition to this you should maintain a session throughout the process in order to store and report error message, as well as other useful data. The most important thing to remember when creating a login system is that you should never trust your users. Validate all user input, protect against SQL injections, and never store raw passwords in the database.

P.S. Session Control
Added December 15th 2009

All of the above code illustrates how to register a user, and allow them to login. There is, however, one fundamental piece that is missing: session control. In order for a login system to be useful, it must provide some means to distinguish between a logged in user and a non-logged in user across the entire site. Sessions are the means by which we do this. What we want to do is, after a user has successfully logged in is indicate, using a session variable, that the user has done so.

Accessing Session Data

Activating and managing sessions in PHP is very straightforward, you only need one function to both create and recall a particular session: session_start(). Here is a generic example of how to use session_start to store session data:

page1.php

session_start();

$_SESSION['foo'] = 'bar';

page2.php

session_start();

echo $_SESSION['foo']; //will output bar

In the above example you can see that in page1 we start a session and assign the session variable “foo” the value “bar”. If the user visits page2 sometime there after the session will be recalled and the value of foo (bar) will be echo’d. Notice that session_start creates a new session if one does not exist or recalls that session if it already exists.

There are a few subtleties relating to session_start that are important to remember. You must call session_start before any headers are sent to the browser. This means that your script cannot have any output or calls to header() (or any other function that sends headers) before calling session_start. The simplest way to avoid this problem is to call session_start before anything else. One common case where this problem can occur is when some code that normally wouldn’t have any output generates an error or warning prior to calling session_start. The default error handler will automatically output any error messages to the browser.

Generally (though not necessarily), a session lives as long as the browser remains open.

Using Sessions in Our Login System

There are three basic functions we want to incorporate into our login system: validating a user (i.e. indicating that user has logged on), checking if a user is logged on, and logging a user out.

Validating a User

function validateUser()
{
	session_regenerate_id (); //this is a security measure
	$_SESSION['valid'] = 1;
	$_SESSION['userid'] = $userid;
}

This function simply sets the session variable ‘valid’ to 1. You may also want to use this function to store certain variables. It is a good idea to store frequently accessed data about a particular user (such as a user id, username, NOT a password or sensitive data). The $_SESSION['userid'] = $userid; line is an example of how to store user info. Additional information about session security will be provided in the following section.

Checking if a User is Logged On

function isLoggedIn()
{
	if($_SESSION['valid'])
		return true;

	return false;
}

This function simply checks if the session variable ‘valid’ is set to 1.

Logging Out

function logout()
{
	$_SESSION = array(); //destroy all of the session variables
	session_destroy();
}

When it is time to log a user out, we destroy the session. All of these functions assume session_start has already been called.

Now it is time to incorporate this into our login script:

session_start(); //must call session_start before using any $_SESSION variables

$username = $_POST['username'];
$password = $_POST['password'];

//connect to the database here

$username = mysql_real_escape_string($username);

$query = "SELECT password, salt
		FROM users
		WHERE username = '$username';";
$result = mysql_query($query);

if(mysql_num_rows($result) < 1) //no such user exists
{
	header('Location: login_form.php');
	die();
}

$userData = mysql_fetch_array($result, MYSQL_ASSOC);
$hash = sha1( $userData['salt'] . sha1($password) );

if($hash != $userData['password']) //incorrect password
{
	header('Location: login_form.php');
	die();
}
else
{
	validateUser(); //sets the session data for this user
}

//redirect to another page or display "login success" message

Now we can use the isLoggedIn function to determine if a user is logged in and act accordingly:

membersonly.php

session_start();

//if the user has not logged in
if(!isLoggedIn())
{
	header('Location: login.php');
	die();
}

//page content follows

A Note About Session Security

By default, sessions are cookie based. This means that a particular session is associated with a user by use of a cookie. This being the case, there are certain vulnerabilities that arise. There are a variety of methods by which a session can be hijacked (XSS for example; a javascript injection can cause a user to give up their session id). Unfortunately you cannot expect to eliminate the possibility that a user has hijacked someone else’s session. Further research into the subject may yield a few tricks, but ultimately the best practice is to be cautious about certain tasks. If a user wants to change their password (after logging in), require the old password and whenever possible avoid displaying sensitive data.

Earlier I mentioned that you shouldn’t use session variables to store sensitive data. It is important to remember that session data lives on the server, this means that a user cannot directly view or modify session data. On a shared server, however, other users of that server may be able to access this information.

Just a reminder, all of the code examples I provided are exactly that: examples. They are meant to serve as a starting point, and hopefully shed some light on a few key concepts, but they are not is a real world implementation. I don’t recommend, for example, using header() to bounce your users around to different pages; there are better methods that reduce the number of page loads and give the user a more fluid experience. Using this method, however, simplifies things and makes the examples easier to follow.

Validating User Input

Whenever you write an application that takes user input, you must assume users fall into two categories. Users who are incompetent, meaning that they are likely to provide incorrect input, and users that are attempting to exploit the system, meaning that they are trying to access, destroy, or manipulate information that they should not be able to. Obviously there is a third category: Users who neither malicious, nor incompetent and are using the system in good faith. In the context of making a secure and robust application, however, we do not care about this third group of users.

Robust Applications

An application is robust if it is not prone to crashing or misbehaving regardless of the input it is given. If an application is given improper input it should respond by informing the user of their mistake. This means that the programmer must determine the nature of a user’s input, before using it. If your program is expecting a number as input, it should not proceed if that input is a string. Furthermore, if only a certain range of numbers (ie 1 through 10) are valid, then the program should not proceed if given a number outside of that range.

Regular expressions can further aid in validating data. If you are expecting for a user to submit an email address, simply verifying that the input is a string is not sufficient. You want to ensure that the input meets certain criteria: It should consist of at least 1 character followed by the @ symbol followed by a domain name, the . symbol and finally, a TLD. A regular expression to accomplish this would be:

/^([a-zA-Z0-9])+([a-zA-Z0-9\.\\+=_-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/

Certainly there are more comprehensive ways to validate an email address, but it is important not to get carried away when validating data. The main purpose is not only to ensure that incorrect input is dealt with, but also that correct input is passed on without incident.

Secure Applications

In most cases incorrect input is harmless. It might cause the application to behave poorly or even crash, but generally restarting the programming or submitting a form over again will fix the problem. Some input, however, is intended to exploit a vulnerability in the system. An SQL injection is a common example of this type of input. Ensuring that input conforms to any applicable constraints is a first wave of defence against this type of attack. In addition to this, however, it is important to either escape or exclude certain characters from user input. Quotes for example should be be properly escaped.

Examples

So far I’ve talked about robust and secure applications in general. Now I will give examples of how to secure your application in PHP. Before I give code examples I would like to outline a few practices that will prevent SQL injections, as well as good faith mistakes:

  • Email addresses should be validated with a regular expression.
  • Number values, such as dates should be validated as integers.
  • User names should be constrained to a subset of characters (ie A-Z, a-z, 0-9 and _) and validated with a regular expression.
  • Passwords should be encrypted (i.e. sha1) before submitting to the database
  • All data should have quotes escaped

Here are a few PHP functions to validate user input:

<?php

/**
Ensures that the input is number, and if specified, lies
between the values min and max. For example, if you want to
validate that an input is a valid day of the month call
validateNumeric($input , $min = 0 , $max = 31);
**/
function validateNumeric($value , $min = 'none' , $max = 'none')
{
	if(!is_numeric($value))
		return false;

	if(is_numeric($min) && $min > $value)
		return false;

	if(is_numeric($max) && $max < $value)
		return false;

	return true;
}

/**
Ensures that the given email address is correctly
formatted
**/
function validateEmailAddress($address)
{
	if(!preg_match( "/^([a-zA-Z0-9])+([a-zA-Z0-9\.\\+=_-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/", $address))
	{
		return false;
	}
	return true;
}

/**
Ensures that the given username contains only
letters and numbers and is longer than the
give minimum length.
**/
function validateUsername($user , $minLength)
{
	if(eregi('[^A-Za-z0-9]', $u) > 0 || strlen($u) < $minLength)
	{
		return false;
	}

	return true;
}

/**
Escapes the given string. It is best to use whatever
real_escape_string method that PHP supplies for your
particular database. I use MySQL here as a default.
If no real_escape_string method exists, the addslashes
function is used.
**/
function escapeString($value)
{
	if(function_exists('mysql_real_escape_string'))
		return @mysql_real_escape_string($value);
	else
		return addslashes($value);
}

/**
Encrypts the given password using sha1 (twice).
Also supports the use of a salt, which is recommended.
**/
function encryptPassword($password , $salt = '')
{
	$hash = sha1( $salt . sha1($password) );
	return $hash;
}

?>

PDO… Use It

The majority of PHP code I see, whether it be posted by a beginner on a forum, or built into a large application, seems to suggest that the current standard when accessing a database is to make use of whatever database specific commands PHP provides for your particular database (ie mysql_query, mssql_query, etc). Often times, calls to these function are coupled with various attempts to prevent sql injections (such as calls to mysql_real_escape_string). Altogether, however, to me it seems clumsy and insecure.

PHP5 provies a built in (partial) database abstraction layer that can not only simplify the process of querying your database not only protecting against sql injections, but optimize your queries and make your application more portable: PDO (PHP Data Objects). A typical database transaction with PHP might look something like this:

mysql_connect('localhost', 'user', 'password');
mysql_select_db('myDB');

$data = mysql_real_escape_string($_POST['data']);
$query = 'SELECT column FROM table WHERE data = \'' . $data . '\'';

$result = mysql_query($query);
while($row = mysql_fetch_array($result, FETCH_NUM))
{
     echo $row[0];
}

Now with PDO

$dsn = 'mysql:dbname=myDB;host=127.0.0.1';
try {
     $db = new PDO($dsn , 'user' , 'password');
}
catch(PDOException $e) {
     echo $e->getMessage();
}

$query = 'SELECT column FROM table WHERE data = ?';
$statement = $db->prepare($query);
$statement->bindParam(1 , $_POST['data']);
$statement->execute();

$rows = $statement->fetchAll(PDO::FETCH_NUM);

foreach($rows as $row)
{
     echo $row[0];
}

Outwardly, the second example might not seem like an improvement on the first. What is important, however, is what is going on underneath. From PHP.net:

Calling PDO::prepare() and PDOStatement::execute() for statements that will be issued multiple times with different parameter values optimizes the performance of your application by allowing the driver to negotiate client and/or server side caching of the query plan and meta information, and helps to prevent SQL injection attacks by eliminating the need to manually quote the parameters.

In addition to this, it simplifies the process of making your application compatible with another database. If you were to use database specific function throughout your application, you would need to change each to allow it to work with another database.

PDO, however, is not a complete database abstraction layer. While it does aid you in making your application portable, it will not emulate features of a particular database. If your queries are incompatible with your database PDO will not help you. Rarely, however, will it be the case that your queries are not compatible with most sql databases.

There are actually a few cases where where PDO will not be able to execute your queries (example: http://dev.mysql.com/doc/refman/5.0/en/c-api-prepared-statement-problems.html) but these cases are few and far between. Really it is rarely the case that you should NOT be using PDO.

Using PDO does not guarantee that user data is safe. You should always validate and sanitize user input, but PDO provides a decent extra line of defense.

http://us.php.net/pdo

Database Analysis Through Simulation

Making adjustments to a database schema after it has gone into use is a daunting task. Whether it be because of efficiency issues or the incorporation of a new feature, this is a situation you should avoid at all costs. Often times, however, mistakes and inefficiencies are difficult to spot at implementation time. Only when your database has become populated, often by users who are counting on your applications to be reliable, do these things come to light. So what can you do? One solution is to run a simulation. By this I mean systematically project how your database will look in the future when it has come into use.

Creating a simulation is not a difficult task, provided you have experience in virtually any programming language. All you need to do is write a program that simulates the growth of your site over time. The output would be a sequence of SQL commands (mostly inserts, maybe updates).

How it Works

For the purposes of this example lets assume that your website is some sort of forum. We’ll simplify things by limiting the actions that can be performed. Lets say that a user will be able to:

  • Register
  • Post a new thread
  • Comment on an existing thread
  • Send messages to another user

Start with a small initial population of users, as would be the case after your site was first launched. Now consider the upper limit of your simulation; how many users will your site have when the simulation completes. The simulation will run until your current population size exceeds your upper limit.

Since we want to take a systematic approach to this simulation, it should occur over intervals. An interval is an arbitrary period of time, over which, your population increases by a certain amount, we’ll call this the growth rate. The bulk of our simulation will occur in these intervals. During an interval, each user has a chance of performing one of the actions associated with our site. You must determine the chance of each action occuring in a realistic manner, which reference to your growth rate. Say you expect your site to grow by 5% a week. What is the probability that each user will post a new thread in that time period?

double population = 10.000;
double maxPopulation = 1000;
double growthRate = 1.05;

while(population <= maxPopulation)
{
     //simulation interval

    population *= growthRate;
}

In the example above, we start with a population size of 10 and we increase that population by 5 percent until it exceeds the max population of 1000. In each interval we must cycle over each user in the population, and determine if that user performs one of our actions, based on the probabilites we determined. We must also remember to generate new users based on how much our population increased. Now that we have a general frame for our simulation, we must generate our initial population and start generating data in our intervals.

List commandList = empty list;

List users = empty list;
List threads = empty list; //each thread is assumed to contain a list of comments
List messages = empty list;

double population = 10.000;
double maxPopulation = 1000;
double growthRate = 1.05;

double newThreadChance = .05; //for each user, there is a 2 percent
                                            //that they will post a new thread
double newCommentChance = .30;
double newMessageChance = .05;

//generates random numbers
Random r = new Random();

//generate our initial population
for(int i = 0; i < population; i++)
{
     User u = new User("username", "email", "other info");
     commandList.add(u.toSQL());
     users.add(u);
}

//begin intervals
while(population <= maxPopulation)
{
    //for each user
    for(User u : users)
    {
          if(newThreadChance <= r.nextDouble())
          {
               Thread t = new Thread(u, "title", "content");
               commandList.add(t.toSQL());
               threads.add(t);
          }

          if(newCommentChance <= r.nextDouble())
          {
               //select a random thread
               int index = r.nextInt(threads.size());
               Thread t = threads.get(index);

               Comment c = new Comment(t, u, "content");
               t.addComment(c);
               threads.set(index, t);

               commandList.add(c.toSQL());
          }

          if(newMessageChance <= r.nextDouble())
          {
               //select a random user
               int index = r.nextInt(users.size());
               User recipient = users.get(index);

               Message m = new Message(u, recipient, "subject", "content");
               messages.add(m);
               commandList.add(m.toSQL());
          }
     }

     //increase our population
     for(int i = population; i < population * growthRate; i++)
     {
           User u = new User("username", "email", "other info");
           users.add(u);

           commandList.add(u.toSQL());
      }

     population *= growthRate;
}

The above example is a completed simulation. When it is complete the list commandList will contain a complete list of all of the SQL insert commands, in order, for us to population our database with.

There are some parts of the simulation that I left for the reader to complete on their own. The details of implementing the user, message, thread, and comment objects have been left out. Notice that each of these entities contains a toSQL method. This will simplify the process of converting your objects to SQL. Also, you will have to dump the commandList to a text file so it can be run on your database. This is just one example of how to carry out a simulation. Obviously if you choose to use a non-object oriented approach your implementation will look different.

Once your database is populated you can then navigate your site as if it is teaming with users. This will not only allow you to rate the performance of your site, but allow you to see what it will look like once it has gone into use.