Archive for July 2009

Infinity

I suddenly recalled something interesting a professor of mine pointed out a couple of years ago while on a tangent during a lecture. It has to do with the nature of infinity and how accepting something perfectly reasonable as true leads to less intuitive, but equally true, conclusions.

The following expression is true and most people would not argue otherwise:

1/3 = .3333333333 . . .

Assume, of course, that there is an infinite number of 3s trailing the decimal point. The following expression is also true, and even fewer people would argue otherwise:


1/3 + 1/3 + 1/3 = 1

This may seem obvious, but what may be less obvious is what follows logically from the two expressions above:

  .333333 . . .
  .333333 . . .
+ .333333 . . .
_______________________
  .999999 . . .  =  1

You may be reluctant to accept that the third expression is true, but if you accept the first two expressions, there is no avoiding it. Most people with a background in computer science or mathematics probably won’t be blown away by this, but it is a fun, nerdy thing to point out to your friends in the humanities department (or anyone with a B.A.).

Unlimited PHP Function Parameters

If you’ve ever used PHP’s library functions, you’ve most likely noticed that several functions (and function-like constructs such as array()) can take an indeterminate number of arguments. Normally when defining a function you specify each argument in the function declaration. Obviously it would be impossible to define an unlimited number of arguments in such a way. PHP does, however, allow you to accomplish this through the function func_get_args().

func_get_args() returns an array consisting of all of the arguments passed to a function. Using this method you can bypass the conventional method of defining parameters in the function definition altogether. Here is an example:

function add()
{
	$total = 0;
	$args = func_get_args();

	foreach($args as $arg)
	{
		if(is_numeric($arg))
			$total += $arg;
	}
	return $total;
}

echo add(1, 2, 3); //prints 6

If for whatever reason you need to know the total number of arguments passed to a function, PHP provides the func_num_args() function.

When retrieving arguments in this manner it is important to remember that func_get_args() returns only the arguments actually passed by the caller. It does not include default values for parameters that were omitted from the call.
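As a minimal sketch of the two points above, the following shows func_num_args() alongside func_get_args() and what happens when a parameter with a default value is omitted. The function name describeCall and its $greeting parameter are hypothetical, chosen only for this illustration.

```php
<?php
// Hypothetical function: one declared parameter with a default value.
function describeCall($greeting = 'hello')
{
    return array(
        'count'    => func_num_args(), // number of arguments actually passed
        'args'     => func_get_args(), // only the arguments actually passed
        'greeting' => $greeting,       // the default is applied here
    );
}

// No arguments passed: the default fills $greeting,
// but it does not show up in func_get_args().
$none = describeCall();

// Two arguments passed: both appear, including the undeclared extra one.
$two = describeCall('hi', 'extra');

print_r($none);
print_r($two);
```

Since PHP 5.6 the same effect can also be had by declaring a variadic parameter, for example function add(...$numbers), which collects the remaining arguments into an array.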

PHP Lorem Ipsum Generator

For the most up-to-date information regarding this script go to the PHP Lorem Ipsum page in the scripts section.

The other day I needed to populate a database with some placeholder content. Doing this manually was out of the question, so I decided I’d find a text generator, specifically a Lorem Ipsum generator. For anyone unaware, Lorem Ipsum is nonsense placeholder text used in publishing and design. It allows the developer to see their work completely populated with text, without having to actually create the text. Obviously, for this purpose, any kind of text generator would work to some extent, but traditionally Lorem Ipsum is used.

To get to the point, I located several web-based generators, but no stand-alone PHP class or function. To be honest, I didn’t look too hard, and someone a little more determined not to write any code most likely would have found one, but I decided to create my own PHP Lorem Ipsum generator. Here is a rundown of some of the features in the current version:

PHP Lorem Ipsum Generator
Version: 1.0
License: BSD

Download:
Link moved Here.

Features

  • Generates content in three modes: Plain, HTML (content blocks nested in <p> tags), and Text (plain text in paragraph form)
  • Sentences are punctuated and vary in length based on statistics collected here: http://hearle.nahoo.net/Academic/Maths/Sentence.html. Sentence lengths follow a Gaussian distribution.
  • HTML output is ‘clean-code’ formatted with tabs and new lines rather than just blobs of code
  • More output formats to come…

Feel free to request additional features.

Usage

The only public method in the class is getContent.

Description
string getContent( int $wordCount [, string $format = 'html' [, boolean $loremipsum = true ]] )

Returns the desired amount of content as a string.

Parameters

wordCount
The number of words to be returned.

format
The output mode, one of ‘html’, ‘txt’, or ‘plain’. HTML by default.

  • HTML: The content is divided into paragraphs, using the paragraph ( <p></p> ) tag.
  • Text: The content is divided into paragraphs with the leading line of each paragraph tabbed
  • Plain: The content is returned unformatted

loremipsum
Whether or not the content should begin with ‘Lorem ipsum’. True by default.

Example

require('LoremIpsum.class.php');

$generator = new LoremIpsumGenerator;

//100 words in html format
$generator->getContent(100);

//100 words without any formatting
$generator->getContent(100, 'plain');

//100 words with 'text' formatting
$generator->getContent(100, 'txt');

//100 words with html format, not beginning with lorem ipsum
$generator->getContent(100, NULL, false);

//or
$generator->getContent(100, 'html', false);

Additional Notes
Both the HTML and Text output modes use paragraph formatting. The mean word count of each paragraph is predetermined and can be set in the constructor; currently the default is 100. Note that this is the mean word count. The actual word count for each paragraph will vary in the same way the length of each sentence will vary.
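To illustrate what lengths that vary around a mean on a Gaussian distribution could look like in code, here is a rough sketch using the Box-Muller transform. It is not the class's actual implementation. The function name gaussianLength is hypothetical, and the mean of 24 and standard deviation of 5 are placeholder values chosen for the example, not figures taken from the statistics page linked above.

```php
<?php
// Hypothetical helper: draw a whole-number length from a normal
// distribution with the given mean and standard deviation.
function gaussianLength($mean, $stdDev)
{
    // Two uniform samples in (0, 1]; the +1 keeps log() away from zero.
    $u1 = (mt_rand() + 1) / (mt_getrandmax() + 1);
    $u2 = (mt_rand() + 1) / (mt_getrandmax() + 1);

    // Box-Muller transform: turns two uniform samples into one
    // standard normal sample (mean 0, standard deviation 1).
    $z = sqrt(-2 * log($u1)) * cos(2 * M_PI * $u2);

    // Scale and shift, then clamp so a length is never below one word.
    return max(1, (int) round($mean + $z * $stdDev));
}

// Placeholder parameters (assumptions, see above).
$lengths = array();
for ($i = 0; $i < 1000; $i++) {
    $lengths[] = gaussianLength(24, 5);
}

echo 'average length: ' . (array_sum($lengths) / count($lengths)) . "\n";
```

The same helper could be called with a mean of 100 to pick paragraph lengths, which is one way the sentence and paragraph variation described above might share a mechanism.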

Changelog
Version 1.1 (Planned)

  • Additional output modes. List mode and possibly more

Version 1.0:

  • Initial Release

Passing by Reference or Value

Even if you are new to programming, you probably have an understanding of functions and their purpose. What may be less clear, however, is what is happening under the hood when you pass a value to a function. In some languages, such as Java, when you pass a value of a primitive type (such as an int or a char), the function receives a copy of that value. In this instance you are guaranteed that the original value will remain unchanged after the function call. In other languages, such as C++ and PHP, a copy of the value need not be made; you have the option of passing the value by reference. In this case any changes made to the value within the function will persist beyond the function call. Here is an example, using PHP:

//x is passed by value
function foo($x)
{
	$x++;
}

//x is passed by reference
function bar(& $x)
{
	$x++;
}

$a = 1;
foo($a); //a is still 1

bar($a); //a is 2

In PHP the & symbol in the parameter list denotes that the variable is passed by reference. The syntax is fairly simple, as a reference can be handled exactly like a value inside the function. C++ references behave the same way. With pointers in C and C++, however, the corresponding value must be accessed manually by dereferencing the pointer, so the syntax is a bit more complicated (and often messy).

Understanding the difference between passing by value and passing by reference is only half the battle. Understanding when to use each is equally important. There are two cases (that I will mention) in which passing by reference comes in handy. The first may seem obvious: when you want the value to be changed by the function call. Generally, if you expect the value to remain unchanged after the function call, or if you no longer have use for the variable, then you should pass it by value. Passing by reference is not a substitute for return values. In the previous example, in practice, the function bar should be written like this:

function bar($x)
{
	$x++;
	return $x;
}

$a = 1;
$a = bar($a); //a is 2

Passing by reference is more appropriate in cases where the return value of a function cannot be the altered value that was passed:

function bar(& $x)
{
	$x++;
	if($x > 5)
		return true;
	else
		return false;
}

$a = 1;
$greaterThanFive = bar($a);
//a is 2

The second case where passing by reference is useful is when passing by value will have a negative impact on your application’s performance. In a typical case such as passing an integer to a function, the performance impact of making a copy of that value is negligible. What if, however, you need to pass an array of thousands of items to a function? If you needed to do this several times there could be a noticeable difference in performance between passing by value and passing by reference. A reference to an object is always the same size (a 32-bit memory address, for example). An object, however, may be very large, and making copies of it needlessly may not be the best idea. Keep in mind that PHP itself copies arrays lazily (copy-on-write), so an array passed by value is only duplicated if the function modifies it; the cost is most pronounced in languages such as C++ that copy eagerly.
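A small sketch of the large-array case, using two hypothetical helper functions. It shows only the observable behavior: the by-value version works on its own copy, while the by-reference version works on the caller's array. It does not measure performance, and in PHP the copy in the first function happens only because the function modifies the array.

```php
<?php
// Hypothetical helpers, named only for this illustration.
function appendByValue(array $items)
{
    $items[] = 'extra'; // modifies the function's own copy
    return count($items);
}

function appendByReference(array &$items)
{
    $items[] = 'extra'; // modifies the caller's array directly
    return count($items);
}

$data = range(1, 5000);

$insideCount = appendByValue($data); // 5001 inside the function
$afterValue  = count($data);         // still 5000 for the caller

appendByReference($data);
$afterReference = count($data);      // now 5001 for the caller

echo "$insideCount $afterValue $afterReference\n";
```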

Metadata

Throughout the life of a database there may come times when it needs to be updated to incorporate changes or new features. This may involve adding new attributes to existing entities, that is, adding new columns to tables. The problem is that in a populated database, modifying the schema can be very expensive with regard to performance. This is not something you want to do frequently on a live site. One method that not only makes your database more resilient to future change but also improves modularity is the use of metadata.

You don’t have to look very hard to find multiple definitions of the term metadata, and as far as I am aware there is no universally accepted definition. The literal definition is “data about data” which is a bit vague. For our purposes, in the context of a database, we’ll think of metadata as data that supplements or further describes the data of another table. For example, if we have a table ‘users’ which has columns id, username, and password, metadata for that table might include first name and last name, though it would more likely contain data that does not pertain to every user all of the time.

Let’s say that your site has a registration script that sends out an email to confirm that a user is human or that the email address is valid. Typically this involves sending the user a link with some sort of verification key encoded in the URL. In order to verify the user you will need to store this key somehow. Having a verification key field in the users table is one option, but considering that the key can be discarded and never looked at again after the user is verified, this might not be the best solution. A better approach may be to store this in the metadata table for that user.

So how is our metadata structured and how does it interact with the data it describes? A metadata table has three essential fields: an id that refers to the data it is supplementing, a meta key, and a meta value. It may also incorporate its own meta id as a primary key, though it is uncommon to look up this data by its id. We’ll assume our metadata table corresponds to the ‘users’ table I mentioned earlier and has the following structure:

CREATE TABLE user_meta (
	meta_id INT NOT NULL AUTO_INCREMENT,
	user_id INT NOT NULL,
	meta_key TEXT,
	meta_value TEXT,
	PRIMARY KEY(meta_id)
);

Earlier I mentioned verification keys as an example of metadata. If we chose to store this information in the user_meta table we would insert it like so:

INSERT INTO user_meta (user_id, meta_key, meta_value)
	VALUES (1, 'verification_key', 'abc123');

The code assumes that the user has the id 1 and that the user’s verification key is abc123. In a real-world scenario the user id would be determined and the verification key generated prior to executing this query. As I pointed out earlier, it is rare to look up metadata based on the meta id. More often a lookup will look something like this:

SELECT meta_value FROM user_meta
	WHERE user_id = 1
	AND meta_key = 'verification_key';

The purpose of having a separate primary key is to allow for the same user to have multiple values for the same key, which is appropriate in some instances; having the user id and meta_key as the primary key would prevent this.
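As a rough sketch of how the insert and lookup might be driven from PHP, the following uses PDO with prepared statements. Several things here are assumptions made so the example is self-contained: an in-memory SQLite database stands in for the real server (so the table uses SQLite's spelling of the auto-increment column), the helper names addUserMeta and getUserMeta are hypothetical, and the key generation with random_bytes() (PHP 7 or later) is just one possible approach.

```php
<?php
// Assumption: in-memory SQLite stands in for the real database server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// SQLite version of the user_meta table described above.
$db->exec('CREATE TABLE user_meta (
    meta_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    meta_key TEXT,
    meta_value TEXT
)');

// Hypothetical helper: store one key/value pair for a user.
function addUserMeta(PDO $db, $userId, $key, $value)
{
    $stmt = $db->prepare(
        'INSERT INTO user_meta (user_id, meta_key, meta_value) VALUES (?, ?, ?)'
    );
    $stmt->execute(array($userId, $key, $value));
}

// Hypothetical helper: fetch every value stored under a key for a user.
function getUserMeta(PDO $db, $userId, $key)
{
    $stmt = $db->prepare(
        'SELECT meta_value FROM user_meta
         WHERE user_id = ? AND meta_key = ? ORDER BY meta_id'
    );
    $stmt->execute(array($userId, $key));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// The key is generated before the query runs (one possible approach).
$verificationKey = bin2hex(random_bytes(16));
addUserMeta($db, 1, 'verification_key', $verificationKey);

// The separate meta_id primary key permits repeated keys for one user.
addUserMeta($db, 1, 'nickname', 'first');
addUserMeta($db, 1, 'nickname', 'second');

$keys      = getUserMeta($db, 1, 'verification_key');
$nicknames = getUserMeta($db, 1, 'nickname');
```

Binding the values as parameters also avoids the quoting mistakes that are easy to make when string values are written directly into the SQL.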

You can probably see how this allows you to make changes to your database without making expensive alterations to existing tables. If you have a new value that you want to be associated with users, all you need to do is add that value for each user to the metadata. How, though, do we determine from the outset which data belongs in the users table and which data can be tossed into the metadata? Before we can answer this it is important to understand the trade-offs between adding a value as metadata and adding a dedicated column to another table. Assume from this point that the tables are not yet populated.

You can expect over the lifetime of your database that each user will have multiple rows associated with him/her in the user_meta table. This being the case, the user_meta table will have many more rows than the users table. In addition to this, a lookup in the metadata table is typically done using a user id and a meta key, neither of which is a primary key. In the users table, however, we will be looking up data based on the user id frequently. A lookup based on a primary key is significantly faster than a lookup on non-indexed columns. Creating an index on the user_id and meta_key columns in the user_meta table would help to bridge the speed gap, but it would come at the expense of memory.
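The index mentioned above could be sketched as follows. As before, an in-memory SQLite database accessed through PDO is an assumption made to keep the example runnable on its own, and the index name idx_user_meta_lookup is hypothetical. On MySQL, indexing a TEXT column such as meta_key requires a prefix length, so the statement would need adjusting there.

```php
<?php
// Assumption: in-memory SQLite stands in for the real database server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE user_meta (
    meta_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    meta_key TEXT,
    meta_value TEXT
)');

// Composite index covering the two columns used in the typical lookup.
$db->exec('CREATE INDEX idx_user_meta_lookup ON user_meta (user_id, meta_key)');

// Ask SQLite how it would execute the typical lookup.
$plan = $db->query(
    "EXPLAIN QUERY PLAN
     SELECT meta_value FROM user_meta
     WHERE user_id = 1 AND meta_key = 'verification_key'"
)->fetchAll(PDO::FETCH_ASSOC);

// Each row's 'detail' column describes one step of the plan.
foreach ($plan as $step) {
    echo $step['detail'] . "\n";
}
```

The trade-off is the one described above: the index has to be stored and kept up to date, so it costs space and adds a little work to every insert.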

When you are designing your database you should consider how frequently you will need to access values. A username, for instance, might be looked up frequently, while a verification key may only be looked up once. Your goal should be to minimize the number of columns in your users table while at the same time reducing the number of items in the metadata that will be accessed frequently. You should also consider how sparsely populated a column will be. Every user, for example, has a password. Going back to the verification key example once again, only users whose status is still pending will have a verification key. After they are verified the key can be discarded.

The use of metadata can improve the modularity and performance of your database, but only if used correctly.