Validating User Input

Whenever you write an application that takes user input, you must assume users fall into two categories: users who are incompetent, meaning that they are likely to provide incorrect input, and users who are attempting to exploit the system, meaning that they are trying to access, destroy, or manipulate information that they should not be able to. Obviously there is a third category: users who are neither malicious nor incompetent and are using the system in good faith. In the context of making a secure and robust application, however, we do not care about this third group of users.

Robust Applications

An application is robust if it is not prone to crashing or misbehaving regardless of the input it is given. If an application is given improper input it should respond by informing the user of their mistake. This means that the programmer must determine the nature of a user’s input before using it. If your program is expecting a number as input, it should not proceed if that input is a string. Furthermore, if only a certain range of numbers (e.g. 1 through 10) is valid, then the program should not proceed if given a number outside of that range.

Regular expressions can further aid in validating data. If you are expecting a user to submit an email address, simply verifying that the input is a string is not sufficient. You want to ensure that the input meets certain criteria: it should consist of at least one character followed by the @ symbol, followed by a domain name, the . symbol and, finally, a TLD. A regular expression to accomplish this would be:

/^[a-zA-Z0-9][a-zA-Z0-9.+=_-]*@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$/

Certainly there are more comprehensive ways to validate an email address, but it is important not to get carried away when validating data. The main purpose is not only to ensure that incorrect input is dealt with, but also that correct input is passed on without incident.

Secure Applications

In most cases incorrect input is harmless. It might cause the application to behave poorly or even crash, but generally restarting the program or submitting a form over again will fix the problem. Some input, however, is intended to exploit a vulnerability in the system. An SQL injection is a common example of this type of input. Ensuring that input conforms to any applicable constraints is a first wave of defence against this type of attack. In addition to this, however, it is important to either escape or exclude certain characters from user input. Quotes, for example, should be properly escaped.
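To make the quoting problem concrete, here is a minimal sketch in Java (chosen only so that it runs standalone). The class, the users table and the name column are made up for illustration. It shows how an unescaped quote changes the meaning of a query, and how doubling single quotes, the standard SQL rule for string literals, keeps the input inside the literal. Some databases also treat the backslash specially, so in real code the driver’s own escaping or, better, prepared statements (java.sql.PreparedStatement in Java) are the safer route.

```java
// Hypothetical demo class; the table and column names are assumptions.
class InjectionDemo {
    // Unsafe: splices raw user input straight into the SQL text.
    static String naiveQuery(String userName) {
        return "SELECT * FROM users WHERE name = '" + userName + "'";
    }

    // Escapes single quotes by doubling them, as standard SQL string literals expect.
    static String escapeQuotes(String input) {
        return input.replace("'", "''");
    }

    // Same query, but the input can no longer terminate the string literal early.
    static String escapedQuery(String userName) {
        return naiveQuery(escapeQuotes(userName));
    }

    public static void main(String[] args) {
        String attack = "x' OR '1'='1";
        System.out.println(naiveQuery(attack));   // the WHERE clause now matches every row
        System.out.println(escapedQuery(attack)); // the whole input is treated as one name
    }
}
```

With the naive version the attacker’s quote closes the string and the rest of the input is read as SQL; with the escaped version the entire input stays a single (odd-looking) user name.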

Examples

So far I’ve talked about robust and secure applications in general. Now I will give examples of how to secure your application in PHP. Before I give code examples I would like to outline a few practices that will prevent SQL injections, as well as good faith mistakes:

  • Email addresses should be validated with a regular expression.
  • Number values, such as dates, should be validated as integers.
  • User names should be constrained to a subset of characters (e.g. A-Z, a-z, 0-9 and _) and validated with a regular expression.
  • Passwords should be hashed (e.g. sha1, or preferably a dedicated password hashing function such as PHP’s password_hash) before submitting to the database.
  • All data should have quotes escaped.

Here are a few PHP functions to validate user input:

<?php
/**
Ensures that the input is a number and, if specified, lies
between the values min and max (inclusive). For example, if
you want to validate that an input is a valid day of the
month, call validateNumeric($input, 1, 31);
**/
function validateNumeric($value , $min = 'none' , $max = 'none')
{
	if(!is_numeric($value))
		return false;
	if(is_numeric($min) && $min > $value)
		return false;
	if(is_numeric($max) && $max < $value)
		return false;
	return true;
}
/**
Ensures that the given email address is correctly
formatted
**/
function validateEmailAddress($address)
{
	//requires a local part, an @, and a domain containing at least one dot
	if(!preg_match('/^[a-zA-Z0-9][a-zA-Z0-9.+=_-]*@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$/', $address))
	{
		return false;
	}
	return true;
}
/**
Ensures that the given username contains only
letters and numbers and is at least the
given minimum length.
**/
function validateUsername($user , $minLength)
{
	if(preg_match('/[^A-Za-z0-9]/i', $user) > 0 || strlen($user) < $minLength)
	{
		return false;
	}
	return true;
}
/**
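Escapes the given string. It is best to use whatever
real_escape_string method that PHP supplies for your
particular database. I use MySQL here as a default.
If no real_escape_string method exists, the addslashes
function is used.
Note: the mysql_* functions were removed in PHP 7. There,
mysqli_real_escape_string or prepared statements (mysqli,
PDO) take their place; addslashes alone is not a safe
substitute.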
**/
function escapeString($value)
{
	if(function_exists('mysql_real_escape_string'))
		return @mysql_real_escape_string($value);
	else
		return addslashes($value);
}
/**
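Hashes the given password using sha1 (twice). This is a
one-way hash, not encryption.
Also supports the use of a salt, which is recommended.
sha1 is fast to brute force; on PHP 5.5 and later the
built-in password_hash() and password_verify() are a
stronger choice.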
**/
function encryptPassword($password , $salt = '')
{
	$hash = sha1( $salt . sha1($password) );
	return $hash;
}
?>

Getting started with C++

I am by no means a C++ expert, but I did, for a summer, teach an introduction to programming in C++ (I would have preferred teaching in Java, but the job called for C++). This being the case, I’m aware of many of the issues that people who are just picking up the language have. I’ll address a few of the common questions and issues here.

Consolidating Error Pages with .htaccess

Before I get into the topic of how to consolidate all of your error pages, let me first explain how to use .htaccess to create custom error pages. If you already know how to do this feel free to skip to the next section.

Creating Custom Error Pages

.htaccess, among other things, allows you to specify custom error pages for your site. Say a user requests a file that does not exist. Typically that person will get an error page that looks somewhat like this:

—————————————————-
Not Found

The requested URL /somepage.html was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
—————————————————-

Not only is this message not helpful, but it is unappealing. Most likely you worked tirelessly making your site look presentable, so it would be a shame for a user attempting to access a page on your site to be given an ugly error message.

.htaccess allows you to specify the page you would like to use as an error page for a particular error (404, 301, 500 etc.). To do this, if it does not already exist, create a file called .htaccess in the root ( / ) directory of your site, or whatever directory you want to use custom error pages for. Note that these custom error pages will be used not only for the directory that the .htaccess file is located in, but all of the ones below it, unless you specifically override it.

Once you have created your .htaccess file, for each error you would like a custom page for add the following line:
ErrorDocument <error code> <path to error page>
So for example, if you would like to use a custom error page for a 404 (file not found) do the following:
ErrorDocument 404 /404.html
Where 404.html is your custom error page. Note that not every code is an error code. If you tried to set up an error page for code 200 you would end up creating an infinite loop.

Consolidating Your Error Pages

So you’ve set up your custom error pages. Most likely you haven’t taken the time to create a custom page for every error, and it’s most likely the case that you don’t need to. I can honestly say I’ve never gone to a site and had it come up with a 414 Request URI Too Long error. Still, you might have taken the time to create several pages for the more common errors. You may even have a whole directory dedicated to error pages. Instead, you may want to consider using a single error page for all errors. If you’re experienced with .htaccess, you may already know how to accomplish this; if not, I’ll show you.

In your .htaccess file you’ll still need to add a line for each error code you want to use a custom page for. Make sure to use the same scheme when naming all of your error pages; for example, fourohfour.html and 5hundred.php would be a bad choice. For this example I’ll use error<code>.php (e.g. error404.php). These pages don’t actually need to (and should not) exist. Now we just need to create a RewriteRule for our error pages:
RewriteEngine On
ErrorDocument 404 /error404.php
ErrorDocument 500 /error500.php
RewriteRule ^error([0-9]+) error.php?code=$1 [NC]


You could eliminate the need for the rewrite rule by redirecting all of your errors to the same page like so:
RewriteEngine On
ErrorDocument 404 /error.php?code=404
ErrorDocument 500 /error.php?code=500


This, however, reveals the underlying system. This might not be a problem for most people, but some people prefer the look of URLs that don’t contain parameters (?code=xxx). Also, if you decide you want to track what errors your users are getting (like what pages they are linking to that don’t exist) you can store this information in a database, in which case you wouldn’t want users to be aware of the underlying system.

Now whenever a user gets a 404 or a 500 error they will be redirected to error.php and the error code will be passed to that script. In error.php you can now set up custom messages for each error code. Here is a simple script as an example:

$code = (isset($_GET['code']) && is_string($_GET['code'])) ? $_GET['code'] : ''; //the error code, empty if missing
$code .= ''; //avoid integer indexing of the array
$errors = array( '300' => 'Multiple Choices',
                 '301' => 'Moved Permanently',
                 '302' => 'Moved Temporarily',
                 '303' => 'See Other',
                 '304' => 'Not Modified',
                 '305' => 'Use Proxy',
                 '400' => 'Bad Request',
                 '401' => 'Authorization Required',
                 '402' => 'Payment Required',
                 '403' => 'Forbidden',
                 '404' => 'Not Found',
                 '405' => 'Method Not Allowed',
                 '406' => 'Not Acceptable',
                 '407' => 'Proxy Authentication Required',
                 '408' => 'Request Timed Out',
                 '409' => 'Conflicting Request',
                 '410' => 'Gone',
                 '411' => 'Content Length Required',
                 '412' => 'Precondition Failed',
                 '413' => 'Request Entity Too Large',
                 '414' => 'Request URI Too Long',
                 '415' => 'Unsupported Media Type',
                 '500' => 'Internal Server Error',
                 '501' => 'Not Implemented',
                 '502' => 'Bad Gateway',
                 '503' => 'Service Unavailable',
                 '504' => 'Gateway Timeout',
                 '505' => 'HTTP Version Not Supported'
                      );
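//only echo codes we know about, so unexpected input is never printed back to the user
echo isset($errors[$code]) ? $code . ' ' . $errors[$code] : 'Unknown error';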

Clearly the error page generated by the script above would be no better than the default ones, but it is just an example. You could easily embed it into your site in order to maintain consistency for your users.

Expanding My Horizons

The more I learn the more I realize there is so much more out there that I have yet to experience. With regard to programming, I’ve gotten to a point where I am no longer limited to any one language. I do feel, however, that I have yet to experience the huge variety of languages out there. When it comes to software engineering, object oriented programming is pervasive. This approach has, so far, dominated my experiences as a programmer.

Yet there are many more approaches. Here is a fairly comprehensive list of the various paradigms out there:

Each one of these paradigms has its own list of languages associated with it. Some of them you’ve heard of, some of them are more obscure. Some languages transcend multiple paradigms (C++, PHP, Oz) while others are pure forms of their associated paradigm (Java).

My Goal

I’m primarily a Java and PHP programmer. This means that I’ve really only experienced two paradigms: Object-Oriented and Imperative (I’ve dabbled in functional using LISP, but not enough). My goal is to either learn a language, or a new approach with a language that I already know, that falls under every one of the categories above.

To get started, I think I’m going to take a shot at parallel programming, either in Join Java or Oz. This is most likely going to be an ongoing project for quite a while as school is devouring all of my time (honestly I shouldn’t even be writing this right now).

Database Analysis Through Simulation

Making adjustments to a database schema after it has gone into use is a daunting task. Whether it be because of efficiency issues or the incorporation of a new feature, this is a situation you should avoid at all costs. Often times, however, mistakes and inefficiencies are difficult to spot at implementation time. Only when your database has become populated, often by users who are counting on your applications to be reliable, do these things come to light. So what can you do? One solution is to run a simulation. By this I mean systematically project how your database will look in the future when it has come into use.

Creating a simulation is not a difficult task, provided you have experience in virtually any programming language. All you need to do is write a program that simulates the growth of your site over time. The output would be a sequence of SQL commands (mostly inserts, maybe updates).

How it Works

For the purposes of this example let’s assume that your website is some sort of forum. We’ll simplify things by limiting the actions that can be performed. Let’s say that a user will be able to:

  • Register
  • Post a new thread
  • Comment on an existing thread
  • Send messages to another user

Start with a small initial population of users, as would be the case after your site was first launched. Now consider the upper limit of your simulation: how many users will your site have when the simulation completes? The simulation will run until your current population size exceeds your upper limit.

Since we want to take a systematic approach to this simulation, it should occur over intervals. An interval is an arbitrary period of time over which your population increases by a certain amount; we’ll call this the growth rate. The bulk of our simulation will occur in these intervals. During an interval, each user has a chance of performing one of the actions associated with our site. You must determine the chance of each action occurring in a realistic manner, with reference to your growth rate. Say you expect your site to grow by 5% a week. What is the probability that each user will post a new thread in that time period?

double population = 10.000;
double maxPopulation = 1000;
double growthRate = 1.05;
while(population <= maxPopulation)
{
     //simulation interval
    population *= growthRate;
}

In the example above, we start with a population size of 10 and we increase that population by 5 percent until it exceeds the max population of 1000. In each interval we must cycle over each user in the population, and determine if that user performs one of our actions, based on the probabilities we determined. We must also remember to generate new users based on how much our population increased. Now that we have a general frame for our simulation, we must generate our initial population and start generating data in our intervals.
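It helps to know how many intervals such a loop will run before you fill it with work. The sketch below is an illustration only: the class and method names are made up, and the growth rate is assumed to be greater than 1. It counts the intervals by running the same loop, and checks the count against the closed form: the smallest n for which population × growthRate^n exceeds the maximum.

```java
// Hypothetical helper; assumes growthRate > 1, otherwise the loop never ends.
class GrowthIntervals {
    // Runs the same loop as the simulation frame and counts its iterations.
    static int countIntervals(double population, double maxPopulation, double growthRate) {
        int intervals = 0;
        while (population <= maxPopulation) {
            population *= growthRate;
            intervals++;
        }
        return intervals;
    }

    // Closed form: smallest n with population * growthRate^n > maxPopulation.
    // Floating point rounding may differ by one when the ratio is an exact power.
    static int closedForm(double population, double maxPopulation, double growthRate) {
        double n = Math.log(maxPopulation / population) / Math.log(growthRate);
        return (int) Math.floor(n) + 1;
    }

    public static void main(String[] args) {
        System.out.println(countIntervals(10, 1000, 1.05));
        System.out.println(closedForm(10, 1000, 1.05));
    }
}
```

With the values used here (10 users, a cap of 1000, 5% growth) both methods give 95 intervals, so at one interval per week the simulation covers a little under two years.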

//requires java.util.ArrayList, java.util.List and java.util.Random
//Thread here is our own forum thread class, not java.lang.Thread
List<String> commandList = new ArrayList<>();
List<User> users = new ArrayList<>();
List<Thread> threads = new ArrayList<>(); //each thread is assumed to contain a list of comments
List<Message> messages = new ArrayList<>();
double population = 10;
double maxPopulation = 1000;
double growthRate = 1.05;
double newThreadChance = .05; //for each user, there is a 5 percent chance
                              //that they will post a new thread
double newCommentChance = .30;
double newMessageChance = .05;
//generates random numbers
Random r = new Random();
//generate our initial population
for(int i = 0; i < population; i++)
{
     User u = new User("username", "email", "other info");
     commandList.add(u.toSQL());
     users.add(u);
}
//begin intervals
while(population <= maxPopulation)
{
    //for each user
    for(User u : users)
    {
          //nextDouble() is uniform in [0, 1), so this is true with probability newThreadChance
          if(r.nextDouble() < newThreadChance)
          {
               Thread t = new Thread(u, "title", "content");
               commandList.add(t.toSQL());
               threads.add(t);
          }
          //a comment needs an existing thread to attach to
          if(!threads.isEmpty() && r.nextDouble() < newCommentChance)
          {
               //select a random thread
               int index = r.nextInt(threads.size());
               Thread t = threads.get(index);
               Comment c = new Comment(t, u, "content");
               t.addComment(c);
               commandList.add(c.toSQL());
          }
          if(r.nextDouble() < newMessageChance)
          {
               //select a random user
               int index = r.nextInt(users.size());
               User recipient = users.get(index);
               Message m = new Message(u, recipient, "subject", "content");
               messages.add(m);
               commandList.add(m.toSQL());
          }
     }
     //increase our population, then add users until the list catches up
     population *= growthRate;
     while(users.size() < population)
     {
           User u = new User("username", "email", "other info");
           users.add(u);
           commandList.add(u.toSQL());
     }
}

The above example is a completed simulation. When it is complete the list commandList will contain a complete list of all of the SQL insert commands, in order, for us to populate our database with.

There are some parts of the simulation that I left for the reader to complete on their own. The details of implementing the user, message, thread, and comment objects have been left out. Notice that each of these entities contains a toSQL method. This will simplify the process of converting your objects to SQL. Also, you will have to dump the commandList to a text file so it can be run on your database. This is just one example of how to carry out a simulation. Obviously if you choose to use a non-object oriented approach your implementation will look different.
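As a starting point for those missing pieces, here is one way a toSQL method and the dump to a text file might look. Everything here is an assumption made for illustration: the class name SimUser, its two fields, the users table and its columns. The other entities would follow the same pattern. Values are quoted by doubling single quotes so that generated data cannot break the statement.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity for the simulation; table and column names are assumptions.
class SimUser {
    private final String username;
    private final String email;

    SimUser(String username, String email) {
        this.username = username;
        this.email = email;
    }

    // Wraps a value in single quotes, doubling any quotes inside it.
    private static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Renders this user as a single INSERT statement.
    String toSQL() {
        return "INSERT INTO users (username, email) VALUES ("
                + quote(username) + ", " + quote(email) + ");";
    }

    // Writes one command per line so the file can be run against the database.
    static void dump(List<String> commandList, Path target) throws IOException {
        Files.write(target, commandList);
    }

    public static void main(String[] args) throws IOException {
        List<String> commandList = new ArrayList<>();
        commandList.add(new SimUser("o'brien", "ob@example.com").toSQL());
        dump(commandList, Paths.get("simulation.sql"));
    }
}
```

Keeping the SQL generation inside each entity means the simulation loop never has to know the schema; if a column changes, only the matching toSQL method needs to be touched.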

Once your database is populated you can then navigate your site as if it is teeming with users. This will not only allow you to rate the performance of your site, but allow you to see what it will look like once it has gone into use.

Genetic Algorithm Problems

For anyone who is not familiar with Genetic Algorithms I’ll begin with a short summary of what they are and how they are used.

The Algorithm

A genetic algorithm (GA) can be thought of as a search. Given some initial state, the algorithm is searching for an optimal state. It does this in a way that mimics nature (hence the name). Say you have a population of a certain species. The first generation of these creatures may not be optimally suited for their environment. Over time the individuals who are less suited die off while those that are well suited reproduce and dominate the others. In addition to reproduction between well suited individuals (cross-overs in the context of a GA) the offspring of those individuals experience mutation, meaning that the child of individuals A and B is not purely a cross between the two, but has its own unique traits. Generally mutation occurs at a low probability.

In the context of programming, a GA can be expressed as a function that takes as input a population and a fitness function. The population is a collection of individuals and the fitness function is a means of determining how fit an individual is. At generation zero the population is usually randomly generated. In order to get from generation zero to generation 1, the algorithm uses the fitness function to determine which individuals to include in the cross-over (reproduce), leaving out the rest. The children of those individuals are then passed to a mutate function, that alters them in some way, usually at a very low probability. Here’s some pseudo code that might help to understand how this might be implemented:

GeneticAlgorithm(Population pop, FitnessFunction fn) returns optimal state
{
     while(solution not found)
     {
          Population nextGeneration = empty list;
          for(int i = 0; i < size(pop); i++)
          {
               Individual x = randomSelection(pop, fn);
               Individual y = randomSelection(pop, fn);
               Individual child = reproduce(x, y);
               if(small probability)
                    child = mutate(child);
               nextGeneration.add(child);
          }
          pop = nextGeneration; //the children become the current generation
     }
     return the most fit individual;
}

How They Are Used

There are a variety of problems that can be solved with genetic algorithms. GAs are particularly well suited to optimization problems. K-SAT problems, for example, can be solved with a genetic algorithm (though other means exist).

For anyone not familiar with K-SAT problems I’ll give a short explanation. SAT (or satisfiability) problems attempt to assign values to the variables of a boolean formula in such a way that it evaluates to true. So if my SAT problem consists of two variables, A and B, and one clause, A OR B, then one solution would be A = true, B = true. Clauses are the components of the boolean formula; in the example I gave, the formula consists of only one clause. A larger SAT problem may consist of hundreds of variables and thousands of clauses and cannot be solved on paper in a reasonable amount of time. Here is an example of a larger SAT problem in Conjunctive Normal Form (CNF):

(A OR B OR C) AND (A OR !B OR !C) AND (!A OR B OR !C)

This formula consists of three variables (A, B, C) and three clauses. A solution to this problem would be A = true, B = true, C = false. Notice that there are many different assignments of these variables that satisfy the formula. If there were more clauses this might not be the case.
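For a formula this small you can simply try every assignment. The sketch below (class and method names are mine, for illustration) hard-codes the three clauses above and counts how many of the 2^3 = 8 assignments satisfy them. Each clause rules out exactly one assignment, which is why so many remain.

```java
// Hypothetical helper that brute-forces the three-clause example formula.
class SmallSat {
    // (A OR B OR C) AND (A OR !B OR !C) AND (!A OR B OR !C)
    static boolean satisfies(boolean a, boolean b, boolean c) {
        return (a || b || c) && (a || !b || !c) && (!a || b || !c);
    }

    // Tries all 8 assignments and counts the satisfying ones.
    static int countSolutions() {
        int count = 0;
        for (int bits = 0; bits < 8; bits++) {
            boolean a = (bits & 4) != 0;
            boolean b = (bits & 2) != 0;
            boolean c = (bits & 1) != 0;
            if (satisfies(a, b, c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSolutions() + " of 8 assignments satisfy the formula");
    }
}
```

Five of the eight assignments satisfy the formula. Brute force doubles in cost with every added variable, which is why larger problems call for a search strategy such as a genetic algorithm.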

To solve a SAT problem with a genetic algorithm you start off with a population of randomly generated “solutions”, each solution consisting of a random assignment of true or false to each variable. This population is generation zero. In this context the fitness function is defined as the number of satisfied (or unsatisfied) clauses in the boolean formula. Using the fitness function, for each individual in generation zero, a fitness value is determined. It might be the case that one of these individuals satisfies the formula, in which case you’re done. Otherwise, in order to get from generation zero to generation one, we must choose a portion of the population to “reproduce”, for example, those having a fitness above the average.

Once we’ve made our selection we perform the cross over by producing a new individual with a portion of its assignments coming from each parent (the size of the portion may be determined randomly). For example, for individuals X and Y and X(A,B,C) = {True , False, False} and Y(A,B,C) = {False, True, True} a possible child would be Child(A,B,C) = {False, False, False}.

After we’ve generated a new population we then randomly mutate each individual at a very low probability. At probabilities above 5% in many cases a solution will not be found in a reasonable amount of time. A mutation takes an assignment and flips it. So for the individual X(A,B,C) = {True, False, False} if a mutation event occurs on the variable B, it will become X(A,B,C) = {True, True, False}. Without this mutation the algorithm does not approach a solution.

At Generation zero for a large problem, there is very little chance of a solution existing. After each passing generation, however, the average fitness increases and it becomes likely that an individual satisfies the formula.
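Putting the pieces together, the following is a minimal Java sketch of the whole procedure on the three-clause formula from earlier. It is one possible arrangement, not the only one, and several details are my own choices: clauses are encoded as signed integers, selection is a two-way tournament (a simple stand-in for “fitness above the average”), crossover uses a single random cut point, and all class, method and parameter names are hypothetical.

```java
import java.util.Random;

// Hypothetical GA for the example formula; encoding and selection scheme are assumptions.
class SatGa {
    // +n means variable n, -n means NOT variable n (variables are numbered from 1).
    static final int[][] CLAUSES = { {1, 2, 3}, {1, -2, -3}, {-1, 2, -3} };
    static final int VARIABLES = 3;

    // Fitness = number of satisfied clauses.
    static int fitness(boolean[] assignment) {
        int satisfied = 0;
        for (int[] clause : CLAUSES) {
            for (int literal : clause) {
                boolean value = assignment[Math.abs(literal) - 1];
                if ((literal > 0) == value) { satisfied++; break; }
            }
        }
        return satisfied;
    }

    // Tournament selection: pick two individuals at random, keep the fitter one.
    static boolean[] select(boolean[][] pop, Random r) {
        boolean[] x = pop[r.nextInt(pop.length)];
        boolean[] y = pop[r.nextInt(pop.length)];
        return fitness(x) >= fitness(y) ? x : y;
    }

    // Single-point crossover, then each variable is flipped with probability mutationRate.
    static boolean[] reproduce(boolean[] x, boolean[] y, double mutationRate, Random r) {
        int cut = r.nextInt(VARIABLES + 1);
        boolean[] child = new boolean[VARIABLES];
        for (int i = 0; i < VARIABLES; i++) {
            child[i] = i < cut ? x[i] : y[i];
            if (r.nextDouble() < mutationRate) child[i] = !child[i];
        }
        return child;
    }

    // Returns a satisfying assignment, or the fittest individual seen if none is found in time.
    static boolean[] solve(int populationSize, int maxGenerations, double mutationRate, long seed) {
        Random r = new Random(seed);
        boolean[][] pop = new boolean[populationSize][VARIABLES];
        for (boolean[] individual : pop)
            for (int i = 0; i < VARIABLES; i++) individual[i] = r.nextBoolean();

        boolean[] best = pop[0];
        for (int gen = 0; gen <= maxGenerations; gen++) {
            for (boolean[] individual : pop)
                if (fitness(individual) > fitness(best)) best = individual;
            if (fitness(best) == CLAUSES.length) return best;

            boolean[][] next = new boolean[populationSize][];
            for (int i = 0; i < populationSize; i++)
                next[i] = reproduce(select(pop, r), select(pop, r), mutationRate, r);
            pop = next;
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] result = solve(20, 100, 0.02, 42L);
        System.out.println("satisfied clauses: " + fitness(result) + " of " + CLAUSES.length);
    }
}
```

On a formula this small a satisfying individual almost always turns up in generation zero, so the example mainly shows how the parts fit together. The interesting behaviour only appears once the CLAUSES array holds hundreds of variables and thousands of clauses.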

Problems with Genetic Algorithms

After each generation the individuals of a population begin to approach the solution. In the context of a SAT problem this means they satisfy more and more clauses. There is, however, no guarantee that they will ever satisfy all of them. This is because individuals that have a fitness near the maximum, may actually be very different from the solution. For example, say a SAT problem has the solution 000011000 where each character in the bitstring represents a variable and the 0s represent false, and the 1s represent true. The string 111100111 might satisfy 90% of the clauses. If this is the case, the children produced by this individual will look similar to it and the likelihood of it being mutated into the solution is essentially zero. The following graph illustrates this problem:

[Figure: Local Max Problem]

From the graph you can see that there are two peaks, one reaching 100, the other 75. The higher one represents the solution to the problem, while the other is called a local maximum. A genetic algorithm may reach the peak of a local maximum and become stuck because all similar solutions have a lower fitness, while the actual solution is dissimilar to the current state.

Possible Solutions

A possible way to fix this problem would be to reset the search: generate a new set of random solutions, as the algorithm did at generation zero, and proceed from there. This is called a random-reset. Hopefully after the reset the search will approach the solution rather than a local max. Another similar solution would be to mutate each individual in the current population at a much higher rate, possibly 100%. This would produce a population that is very different from the one that existed at the local maximum.

These solutions would fix the problem in a case where there were only a few local maximums, but for some problems it might be the case that there are numerous local maximums. For these problems, genetic algorithms with random-reset might find solutions that have very high fitness, but never the solution. Currently, I’ve found that the 100% mutation solution performs better than random-reset when it comes to avoiding local maximums in the context of a K-SAT problem, but for very large problems I consistently end up with solutions that satisfy 99% of the clauses and go on indefinitely without approaching the solution.

Don’t Fear the Re(cursion)aper

For some reason whenever I see someone post code that cycles through an array or does something repeatedly, that code takes the form of some kind of loop. I admit that I’m guilty of this as well, but when I think about why, I cannot come up with a reason. I’m comfortable with the alternative (recursion) and in many cases I have to think harder about a problem to formulate the solution in a loop rather than recursion.

For those of you who have not been schooled in the arts of recursion, in simple terms it involves a function calling itself, or recursing, until it finds a solution. A basic recursion has two portions: the base case and the recursive case. The base case establishes a condition for returning; without it the recursion would be infinite. The recursive case is the case where the function needs to be called an additional time. To demonstrate this let’s say we have an array of ten integers numbered 1 through 10 in order. I’ll write two functions for finding a value within that array, one using a loop and the other using recursion. Assume our array is called myArray and is already initialized.

<?php
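//loop version
function find($value)
{
     global $myArray; //PHP functions cannot see global variables unless told to
     foreach($myArray as $x)
     {
          if($x == $value)
               return true;
     }
     return false;
}
?>
<?php
//recursive version
function find($value, $position = 0)
{
     global $myArray;
     //this is an error case: we have run off the end of the array
     if($position >= count($myArray))
          return false;
     //this is our base case
     if($myArray[$position] == $value)
          return true;
     //this is our recursive case
     return find($value, $position + 1);
}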
?>

A few things to take note of: First, our recursive version of the find function has a type of case I didn’t mention, an error case. Not every recursion will have an error case, but it must exist in case the value we are searching for does not exist in the array. Otherwise we would end up indexing outside of our array. A second thing to note is that the recursive version of the find function has a second parameter to keep track of the position in the array. The foreach loop in the loop version does not appear to have a value to keep track of our position but it still exists. Foreach is a shortcut syntax, the position is still being tracked but the programmer does not have to be aware of it (which is good in some cases, and bad in others).

Typically in recursive functions like the one above you want to be able to call it without having to give additional inputs for the recursion. This is possible in the case above because PHP supports default parameter values. Above I default the variable position to 0 so even if that parameter isn’t given in the call the search will start at 0. In other languages this may not be possible, but there are alternatives. In Java, for instance, there are no default values so we must rely on another feature: method overloading. In Java, methods are identified by their signature (the name plus the parameter types), not by their name alone. I can define a method foo that takes as input an integer and another method foo that takes as input a string. I’ll use this feature to implement the same recursive function in Java.

public class RecursionTest {
     //ten integers, 1 through 10, in order
     private static int[] myArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
     public static void main(String[] args)
     {
          System.out.println(find(5));
     }
     //static so that main can call it without creating an object
     private static boolean find(int value)
     {
          return find(value, 0);
     }
     private static boolean find(int value, int position)
     {
          //error case: we have run off the end of the array
          if(position >= myArray.length)
               return false;
          //base case
          if(myArray[position] == value)
               return true;
          //recursive case
          return find(value, position + 1);
     }
}

You may be asking “What’s the point of all this recursion stuff when the loop accomplishes the same thing?”. I have two answers. The first is performance and the second is modeling your problem. When you use a loop to solve a problem, any objects you create to hold intermediate state are allocated in the heap (the loop’s own local variables live in the function’s single stack frame). Alternatively, with recursion, your function calls and their inputs are all placed on the stack.

Typically stack allocations are cleaner than heap allocations. The stack basically keeps track of function calls and their inputs. When a function is called it and its inputs are placed on the stack. When that function returns it and its inputs are removed from the stack. In this case all of the memory allocation occurs in sequence, the system doesn’t have to worry about searching for free space on the stack and the programmer doesn’t have to worry about making sure that information is deleted from the stack (in a non-garbage collected language I mean).

In the heap on the other hand, the runtime system must find memory large enough for your data to fit in, which could lead to fragmentation. When the data is no longer needed the memory must be marked as free.

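Does this mean that loops are inferior to recursion? No. There are cases where a loop would perform better than a recursion. Every recursive call adds a frame to the stack, and neither PHP nor Java eliminates tail calls, so a sufficiently deep recursion will exhaust the stack where the equivalent loop would not.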

Now, on to my second point: modeling your problem. Once you get comfortable with recursion you will find that there are some things that seem much simpler to code without loops. Recursion also forces you to create a function or method for a certain task rather than just throwing a loop into your code, which can lead to massive blobs of unorganized code. If you have trouble with organizing your code try recursion for a while and I guarantee you’ll produce code that is easier to deal with.
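To illustrate the modeling point with something less contrived than an array search, here is a small Java sketch over a binary tree, a structure that is itself defined recursively. The class IntTree and its methods are made up for this example. Each method is a base case plus a recursive case; a loop-based version would need an explicit stack or queue to remember which nodes are still to be visited.

```java
// Hypothetical binary tree of integers; a null reference stands for an empty tree.
class IntTree {
    final int value;
    final IntTree left;
    final IntTree right;

    IntTree(int value, IntTree left, IntTree right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Sum of all values in the tree.
    static int sum(IntTree node) {
        if (node == null) return 0;                            // base case: empty tree
        return node.value + sum(node.left) + sum(node.right);  // recursive case
    }

    // Number of nodes on the longest path from the root down to a leaf.
    static int depth(IntTree node) {
        if (node == null) return 0;                            // base case: empty tree
        return 1 + Math.max(depth(node.left), depth(node.right));
    }

    public static void main(String[] args) {
        IntTree tree = new IntTree(1,
                new IntTree(2, new IntTree(4, null, null), null),
                new IntTree(3, null, null));
        System.out.println("sum = " + sum(tree) + ", depth = " + depth(tree));
    }
}
```

The code reads almost exactly like the definition of the problem: the sum of a tree is its own value plus the sums of its two subtrees.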

For the time being I’m going to set out to solve more problems with recursion rather than loops, just for the sake of being an outcast.

P.S. The title is a Blue Öyster Cult reference and I don’t know why.

Intercepting Email with PHP

Here I will show you how to intercept an incoming email through piping. I have also written a guide on How to Access Email in a Mailbox through IMAP.

One of the most frequent questions I’m asked, or I see asked on forums is “how do I send out an email using PHP?” The answer to that is fairly simple and well documented. It more or less involves the use of a single function mail().

Something a bit more complicated and, I think, more interesting is how you can intercept an incoming email with a pretty small chunk of PHP code. The answer isn’t hard to find, but it is a lot less trivial.

The Scenario (optional if you don’t feel like reading)

Say we are building a contact form for a web application. This pretty basic contact form takes whatever input the user gives; to keep things simple we’ll just say it’s the user’s email address and a message. This message is then sent to a database where it can be read from an administrative control panel of some sort. This is pretty simple and for the most part effective. Personally, I have a problem with this type of setup. If users are not logged in when using the form, there is no guaranteed way to respond. The email address they provide may or may not be valid, or they could just be spamming some nonsense because they are anonymous on the Internet and no one can stop them.

A simple solution: force users to log in. But wait! What if users are having trouble creating an account or logging in, and this is the reason why they need to contact you? OK, to solve this problem we’ll just set up an email address that users can contact us with in addition to the form. This solves the problem of reliable contact information and minimizes the spam problem (the email address can still be spammed, but a lot of it can be filtered). So once this is all set up, we’ll have a contact form that sends info that can be read by all of our admins in our admin panel… and the occasional email to some email account that all of our admins will need the password to and will have to check regularly.

This may be acceptable to some… but not to us! We know of a better way to consolidate all of our communications with our users while keeping our desired setup. OK, maybe you don’t, or else why would you be reading this? Here’s how we do it:

The Code (this is where you should start reading if you like to get to the point)

Some of you may be thinking that PHP is so magical that there is a simple function that is going to go out and fetch the email we want from wherever it is on the server. Well, you’re wrong (not about the magical part). What we need to do is set up a script that can capture the email when the server receives it. This is actually pretty simple: all we need to do is read the email data from standard input (php://stdin).

$file = fopen('php://stdin', 'r');

Once we establish a file pointer to our input stream the rest is the same as reading in any other file. Here’s the complete code:

<?php
$data = '';
$file = fopen('php://stdin', 'r');
while(!feof($file))
{
     $data .= fgets($file, 4096);
}
fclose($file);
?>
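
If you would rather skip the loop, PHP can slurp a whole stream in one call with stream_get_contents(). The sketch below is just an alternative to the loop above, not a replacement for it; readAll is a helper name I made up, and the in-memory stream stands in for the piped email so the snippet can be run on its own:

```php
<?php
// Hypothetical helper: read everything remaining on an already-open stream.
function readAll($stream): string
{
    $contents = stream_get_contents($stream);
    return $contents === false ? '' : $contents; // treat a failed read as empty input
}

// In the piped script this would simply be: $data = readAll(STDIN);
// Here a temporary in-memory stream plays the part of the incoming email.
$fake = fopen('php://memory', 'r+');
fwrite($fake, "Subject: Hi\r\n\r\nHello there\r\n");
rewind($fake);

$data = readAll($fake);
fclose($fake);

echo $data;
```

Taking the stream as a parameter is a design choice for testing: the same function works on STDIN in production and on a fake stream while you are developing.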

$data now stores the raw content of the email, headers and body. At this point we just need to decide what to do with the email; I’ll leave that for you to decide. So other than that our script is complete, correct? False. We have not yet addressed one crucial step: the server needs to be told how to execute this script. You may be wondering why we would have to tell the server what to do with a .php file… it’s .php! The main difference between a user requesting a .php file via the internet and what we are doing here is what is handling the request. When a user goes to index.php on your website, Apache knows what to do with the file because that is how it has been configured. The mail server, on the other hand, simply runs the file as a program and will not necessarily assume that it should be parsed by PHP. So how do we tell it to, you say? Hashbang. A hashbang tells the system which interpreter to run a file with; in our case we will use the following hashbang: #!/usr/bin/php -q. (The -q flag keeps the CGI build of PHP from printing HTTP headers.) Note that this may or may not work for you depending on the path to PHP on your server. The one I’m using is the default, so if you’re not sure use that one, otherwise contact your hosting provider. OK, so here is our completed, hashbanged code:

#!/usr/bin/php -q
<?php
$data = '';
$file = fopen('php://stdin', 'r');
while(!feof($file))
{
     $data .= fgets($file, 4096);
}
fclose($file);
?>

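Note that the hashbang is outside of the PHP tags and is the very first line of the file. This is absolutely necessary for the script to work. Save this code, make sure the file is executable (for example with chmod 755), and take note of the path to it.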

The Setup

Now that we’ve written our script we need to tell the server what to do with incoming email. This setup is going to vary depending on your server configuration. I’ll tell you how to do it on a Linux server with sendmail and cPanel, but for anything else you’re on your own (or you can post back explaining how and I’ll add it to the tutorial). What we need to do is set up a forwarder. If you have cPanel this is fairly straightforward. In your mail section find the forwarders option and click it. From there click add new forwarder. At this point you’ll have a few different options: you can forward to another address, discard and send an error message, do nothing, or pipe to a program. We want to pipe to a program. Just type in the address that you want piped, select the pipe option, and then set the path to that of the file you just saved. If you have access to your home directory, you can forgo cPanel and use a .forward file instead. All you have to do is create a file called .forward and add the following line:

email@address.com,"|/path/to/script.php"

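Of course, replace /path/to/script.php with the actual path to the script. Each comma-separated entry in a .forward file is a destination for the account’s incoming mail, so replace the email address with one that should also receive a copy of each message. You can also omit the email address portion to have the mail go only to the script. I believe Exim handles a .forward file the same way; qmail normally uses .qmail files instead, so check its documentation.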

Well that’s it for the most part. Feel free to post back with any problems you have or additions you’d like to make. I’ll write about taking a raw email and converting it to a more friendly format, in this case an associative array, in the near future.
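
In the meantime, here is a very rough sketch of the idea so you have something to start from. It is not the full conversion I have in mind: splitRawEmail is a made-up name, the sample message is invented, and the parsing assumes simple one-line headers (no folded headers, and a repeated header keeps only its last value):

```php
<?php
// A rough sketch, not a complete email parser: it ignores folded (multi-line)
// headers and keeps only the last value of any repeated header.
function splitRawEmail(string $raw): array
{
    // Normalise line endings so the blank-line search is simple.
    $raw = str_replace("\r\n", "\n", $raw);

    // Headers and body are separated by the first blank line.
    $parts = explode("\n\n", $raw, 2);
    $headerBlock = $parts[0];
    $body = $parts[1] ?? '';

    $headers = [];
    foreach (explode("\n", $headerBlock) as $line) {
        // Split on the first colon only, since values may contain colons.
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    return ['headers' => $headers, 'body' => $body];
}

// Invented sample message standing in for the $data read from php://stdin.
$sample = "From: user@example.com\r\nSubject: Login trouble\r\n\r\nI can't log in.\r\n";
$email = splitRawEmail($sample);

echo $email['headers']['subject'], "\n"; // Login trouble
echo $email['body'];                     // I can't log in.
```

Header names are lowercased so you can look them up without worrying about how the sending mail client capitalised them.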