Posts tagged ‘Database’

Database Analysis Through Simulation

Making adjustments to a database schema after it has gone into use is a daunting task. Whether it be because of efficiency issues or the incorporation of a new feature, this is a situation you should avoid at all costs. Often times, however, mistakes and inefficiencies are difficult to spot at implementation time. Only when your database has become populated, often by users who are counting on your applications to be reliable, do these things come to light. So what can you do? One solution is to run a simulation. By this I mean systematically project how your database will look in the future when it has come into use.

Creating a simulation is not a difficult task, provided you have experience in virtually any programming language. All you need to do is write a program that simulates the growth of your site over time. The output would be a sequence of SQL commands (mostly inserts, maybe updates).

How it Works

For the purposes of this example lets assume that your website is some sort of forum. We’ll simplify things by limiting the actions that can be performed. Lets say that a user will be able to:

  • Register
  • Post a new thread
  • Comment on an existing thread
  • Send messages to another user

Start with a small initial population of users, as would be the case after your site was first launched. Now consider the upper limit of your simulation; how many users will your site have when the simulation completes. The simulation will run until your current population size exceeds your upper limit.

Since we want to take a systematic approach to this simulation, it should occur over intervals. An interval is an arbitrary period of time, over which, your population increases by a certain amount, we’ll call this the growth rate. The bulk of our simulation will occur in these intervals. During an interval, each user has a chance of performing one of the actions associated with our site. You must determine the chance of each action occuring in a realistic manner, which reference to your growth rate. Say you expect your site to grow by 5% a week. What is the probability that each user will post a new thread in that time period?

double population = 10.000;
double maxPopulation = 1000;
double growthRate = 1.05;

while(population <= maxPopulation)
{
     //simulation interval

    population *= growthRate;
}

In the example above, we start with a population size of 10 and we increase that population by 5 percent until it exceeds the max population of 1000. In each interval we must cycle over each user in the population, and determine if that user performs one of our actions, based on the probabilites we determined. We must also remember to generate new users based on how much our population increased. Now that we have a general frame for our simulation, we must generate our initial population and start generating data in our intervals.

List commandList = empty list;

List users = empty list;
List threads = empty list; //each thread is assumed to contain a list of comments
List messages = empty list;

double population = 10.000;
double maxPopulation = 1000;
double growthRate = 1.05;

double newThreadChance = .05; //for each user, there is a 2 percent
                                            //that they will post a new thread
double newCommentChance = .30;
double newMessageChance = .05;

//generates random numbers
Random r = new Random();

//generate our initial population
for(int i = 0; i < population; i++)
{
     User u = new User("username", "email", "other info");
     commandList.add(u.toSQL());
     users.add(u);
}

//begin intervals
while(population <= maxPopulation)
{
    //for each user
    for(User u : users)
    {
          if(newThreadChance <= r.nextDouble())
          {
               Thread t = new Thread(u, "title", "content");
               commandList.add(t.toSQL());
               threads.add(t);
          }

          if(newCommentChance <= r.nextDouble())
          {
               //select a random thread
               int index = r.nextInt(threads.size());
               Thread t = threads.get(index);

               Comment c = new Comment(t, u, "content");
               t.addComment(c);
               threads.set(index, t);

               commandList.add(c.toSQL());
          }

          if(newMessageChance <= r.nextDouble())
          {
               //select a random user
               int index = r.nextInt(users.size());
               User recipient = users.get(index);

               Message m = new Message(u, recipient, "subject", "content");
               messages.add(m);
               commandList.add(m.toSQL());
          }
     }

     //increase our population
     for(int i = population; i < population * growthRate; i++)
     {
           User u = new User("username", "email", "other info");
           users.add(u);

           commandList.add(u.toSQL());
      }

     population *= growthRate;
}

The above example is a completed simulation. When it is complete the list commandList will contain a complete list of all of the SQL insert commands, in order, for us to population our database with.

There are some parts of the simulation that I left for the reader to complete on their own. The details of implementing the user, message, thread, and comment objects have been left out. Notice that each of these entities contains a toSQL method. This will simplify the process of converting your objects to SQL. Also, you will have to dump the commandList to a text file so it can be run on your database. This is just one example of how to carry out a simulation. Obviously if you choose to use a non-object oriented approach your implementation will look different.

Once your database is populated you can then navigate your site as if it is teaming with users. This will not only allow you to rate the performance of your site, but allow you to see what it will look like once it has gone into use.