We hear the word database a lot these days, it is thrown around the office, shops, the news and I have even heard it slurred in a pub a few times. From your colleague who struggles to setup a printer on a laptop to your conspiracy guru mate who insists our details are all stored in one, we all know what it is, it is where we store data. "Simples" says the meerkat. Well yes and no, a database does indeed store data but so does a shopping list. However, a shopping list does not a database make. The simple definition for a database is: A collection of data, organised for ease of access.
Using the example of a food shopping, let us pretend that you own a local shop and have kept a good record of items sold for the last year in the form of till readouts. How would you go about finding out how many times you sold an apple? You would have to look through your copy of the receipts, one by one. No doubt this would be both tedious, impractical and possibly soul destroying.
So, you get a clever program that scans all the readouts into a spreadsheet and now it is a simple matter of filtering the records to find all the entries where the product name equals “apple”. Easy, right? Not quite, product names can change, so a filter on the product name will not capture both the new and old names unless you remember to add it to the filter criteria. This can result in over or under ordering stock, confusion in accounting etc. Now let us extrapolate this out a bit, what other things can change for a particular product? Manufacturer, weight, price, ingredients, labelling, logo etc. Now it is unlikely that as a small local shop will need to search for or record such information but you can imagine how complex and unwieldy a spreadsheet can become if you did record such information. Imagine a spreadsheet with thousands of columns where only certain columns are used for certain types of products containing tens if not hundreds of thousands of records.
This is where the database comes in, or what is referred to when people use the term database. What is actually meant here is a set of tables where the records in the table has a relationship with records in one or more other tables (if it helps think of a table as a spreadsheet). These table comprise what is called a relational database.
So your local shop has now done quite well and you have added home deliveries to your business. You have a customer, Mr Smith, who has a lot of cats (we all know someone like this) and every month an order comes in from Mr Smith for a heavy load of cat food.
Now imagine that you take another order over the phone from Mr Smith but forget to ask for the address and for some reason he is not answering your calls. Looking at your spreadsheet you might assume that he has moved because the last address is different from the previous ones, but can you be sure that this change is permanent? Perhaps he had it delivered to a friend’s house. Is it a work address? Is the last address a mistake by you or your staff? The point here is, you do not know.
Now take a look the example below:
Unless you are the only person working in your store there is a very high likelihood that there will be inconsistencies in the entry of the data. There is no way you can be certain that Mr J. Smith is the same as Mr. John Smith, it is the same item the name is similar but the address is different. Also look at the Date field. The format is different for every record, how would you search for all items sold in a particular month?
The point I am trying to make here is that relational databases, if configured and designed properly can eliminate these issues. And it does this with the concept of an Entity. An Entity is an object, person or thing. So in our example of a shop a particular customer would be an entity, as would a particular product. In a relational database all customers would be stored in a table and all products would be stored in another table. The name of a table usually reflects the entity it stores, customer details are stored in a table called “Customer” etc. Although this is not always the case, I have seen naming conventions that rely on a series of numbers and letters (for security or whatever reason).
Each table will have attributes/fields/columns (these can all be used, no need to get all purist here) associated with the entity to be stored in said table. So for the Customer table you may see columns such as:
FirstName, LastName, Gender, Age, DOB *Note* it is quite normal to not have spaces in column names.
And then each of these fields can have rules governing what can be stored in that field. For example Age will only accept a whole number (also called an integer). But the most important aspect of data entities is that each record is UNIQUE! I guarantee you there is always more than one John Smith (or equivalent) in any large town or city. This is where the dreaded “Customer Number” or “Account Number” those people on the phone ALWAYS ask for when you ring them up for help or to get something done comes from.
Next post (if there is any interest in this stuff at all): Entity attributes/fields/columns in greater detail.