Metadata
Throughout the life of a database there may come times when it needs to be updated to incorporate changes or new features. This may involve adding new attributes to existing entities; adding new columns to tables. The problem with this is that in a populated database, modifying the database schema can be very expensive with regard to performance. This is not something you want to do frequently on a live site. One method which not only makes your database more resilient to future change, but also improves modularity is the use of metadata.
You don’t have to look very hard to find multiple definitions of the term metadata, and as far as I am aware there is no universally accepted definition. The literal definition is “data about data” which is a bit vague. For our purposes, in the context of a database, we’ll think of metadata as data that supplements or further describes the data of another table. For example, if we have a table ‘users’ which has columns id, username, and password, metadata for that table might include first name and last name, though it would more likely contain data that does not pertain to every user all of the time.
Lets say that your site has a registration script that sends out an email to confirm that a user is human or that the email address is valid. Typically this involves sending a link with some sort of verification key encoded in the URL to the user. In order to verify the user you will need to store this key somehow. Having a verification key field in the users table is one option, but considering that the key can be discarded and never looked at again after the user is verified, this might not be the best solution. A better approach may be to store this in the metadata table for that user.
So how is our metadata structured and how does it interact with the data it describes? A metadata table has three essential fields: an id that refers to the data it is supplementing, a meta key, and a meta value. It may also incorporate its own meta id as a primary key, though it is uncommon to look up this data by its id. We’ll assume our metadata table corresponds to the ‘users’ table I mentioned earlier and as the following structure:
CREATE TABLE user_meta ( meta_id INT NOT NULL AUTO_INCREMENT, user_id INT NOT NULL, meta_key TEXT, meta_value TEXT, PRIMARY KEY(meta_id) );
Earlier as an example I mentioned verification keys as an example of metadata. If we chose to store this information in the user_meta table we would insert it like so:
INSERT INTO user_meta (user_id , meta_key , meta_value) VALUES(1 , verification_key , abc123);
The code assumes that the user has the id 1 and that user’s verification key is abc123. In a real world scenario the user id would be determined and the verification key generated prior to executing this query. As I pointed out earlier, it is rare to look up meta data based on the meta id. More often a lookup will look something like this:
SELECT meta_value FROM user_meta WHERE user_id = '1' AND meta_key = 'verification_key';
The purpose of having a separate primary key is to allow for the same user to have multiple values for the same key, which is appropriate in some instances; having the user id and meta_key as the primary key would prevent this.
You can probably see how this allows for you to make changes to your database without making expensive alterations to existing tables. If you have a new value that you want to be associated with users all you need to do is add that value for each user to the metadata. How though do we determine from the point of inception which data belongs in the users table and which data can be tossed into the metadata. Before we can answer this it is important to understand the trade offs between adding a value as metadata and adding a dedicated column to another table. Assume from this point that the tables are not yet populated.
You can expect over the lifetime of your database that each user will have multiple rows associated with him/her in the user_meta table. This being the case, the user_meta table will have many more rows than the users table. In addition to this, a lookup in the metadata table is typically done using a user id and a meta key, neither of which is a primary key. In the users table, however, we will be looking up data based on the user id frequently. A lookup based on a primary key is significantly faster than a lookup on non-indexed columns. Creating an index on the user_id and meta_key columns in the user_meta table would help to bridge the speed gap, but it would come at the expense of memory.
When you are designing your database you should consider how frequently you will need to access values. A username for instance might be looked up frequently, while a verification key may only be looked up once. Your goal should be to minimize the number of columns in your users table while at the same time reducing the number of items in the metadata that will be accessed frequently. You should also consider how sparsely populated a column will be. Every user, for example has a password. Going back the verification key example once again, only users who’s status is still pending will have a verification key. After they are verified the key can be discarded.
The use of metadata can improve the modularity and performance of your database, but only if used correctly.
Related posts: