Real World use of Database NULLs -- Part 2
Article: Real World use of Database NULLs -- Part 2
Database: MS SQL Server, MS access and MySql. Could also apply to Progress and Oracle.
Author: Robert Cunningham
Review: In part 1, the concept of a NULL is bad was discussed. There were a few examples where a having the possibility of a NULL would cause programmers and report developers to handle data differently. Simple concatenation of FirstName and LastName (SQL: select FirstName + ' ' + LastName) would cause undesirable outcomes if NULLs are allowed. Also discussed was updating existing databases to eliminate NULLs without changing existing programs or reports. In Part 2, we will look at not using NULLs in database design, performance and changing existing databases. The concept does go against common practices and I am sure database purist will disagree with the article. Again the concept is: Real world use don't use NULLs
Database Design In part 1, there was a very simple table called Name. There was a column called MiddleName that allowed NULLs. There is a way to get rid of the Name table and start over and eliminate the option for NULLs. But that method is complicated. But here is pseudo code to redesign the table. 1) remove any foreign key reference 2) rename table Name to Name_old 3) create new table Name. With MiddleName NOT NULL 4) If there was an auto number ID (NameID is not), turn off restrictions to allow keeping the old ID 5) insert all of the Name_old records into the newly created Name table 6) turn on auto number if needed 7 delete Name_old. As you read the above steps, you might think "why can't I just alter the table and column?" The reason is because you cannot turn a column into a NOT NULL. As you can see, changing an existing table can be complicated. Leave the table alone and just use these two commands:
| update Name Set MiddleName ='' where MiddleName is null alter table Name add constraint dfMiddleName default '' for MiddleName |
There are a few things to keep in mind why the above will work:
- The data type of MiddleName is varchar. With a varchar, even a ' ' (tic space tic) will be truncated and made into an empty string.
- If the data type of MiddleName were char, then there would be some other issues. A char data type stores spaces. If MiddleName was defined as char(25), the length would always be 25. Good database design would define variable length data as varchar.
- Having a default defined will force all further inserts or updates to not have a NULL.
- Having a default on a NOT NULL column can also help as a insert or update statement can exclude the column.
Fixing Data Relationship Using the Name table example from Part 1:
| create table Name ( NameID integer NOT NULL, LastName varchar(25) NOT NULL, FirstName varchar(25) NOT NULL, MiddleName varchar(25) NULL); |
We will add a new column called NameTypeID. And also define a reference or relationship to a new table called NameType. Create The New Table:
| Create table NameType ( NameTypeID integer NOT NULL, Description varchar(50) NOT NULL); alter table NameType add constraint PK_NameType primary key (NameTypeID); |
Insert Records Into NameType:
| Insert into NameType values (1, 'given name') Insert into NameType values (2, 'married name') |
Create New Column for Name Table:
| Alter table Name Add NameTypeID integer NOT NULL default 0 |
Please note that you cannot add a new column with the NOT NULL option without having a default defined. Define Relationship: If you added the relationship without adding a 0 record, you will get an error. Remember there are three records in the Name Table. Adding the NameTypeID column to the Name Table will set each existing record to zero. This code will insert a zero 0 record and then define the relationship between the NameType and Name Tables:
| Insert into NameType values (0, 'unknown') alter table Name add constraint fkNameType_Name foreign key(NameTypeID) references NameType (NameTypeID) |
Seeing the Data The Join is now simple because NULLs are not used:
| select Name.FirstName, Name.LastName, NameType.Description from Name inner join NameType on NameType.NameTypeID = Name.NameTypeID |
The query will return all names, plus the Description of the NameType. The description will all be "unknown" as none of the records have a name type defined. Here is what the query would look like if NULLs were used:
| select Name.FirstName, Name.LastName, Description=coalesce(NameType.Description,'') from Name right join NameType ON NameType.NameTypeID = Name.NameTypeID |
How to handle Zero To Many record NULLs In the above example with NameTypeID, we could handle the relationship as it is -- a "lookup" relationship. The relationship is still a One to Many, but for each Name record had a corresponding NameTypeID because the 0 record was added. Now let's look at another real world condition. Zero to Many record relationship. The difference is that the relationship doesn't not have to exist. Keeping with the simple Person type of data, let's add a new table called Address. A person can have zero or more address.
| create table Address ( AddressID int not null primary key AUTO_INCREMENT, NameID int not null, Street varchar(50) null, City varchar(50) null, State varchar(50) null, PostalCode varchar(50) null); |
Add one Address record for Fred Anderson:
| Insert into Address (NameID, Street, City, State, PostalCode) Values( 101, '939 Lincoln Street', 'Mercerville', 'NJ','8619') |
Knowing that the relationship between Name and Address is a Zero to Many, the keywords LEFT JOIN must be used because we want to return all Names and Addresses (if we have them).
| select Name.LastName, Address.Street, Address.City, Address.State, Address.PostalCode from Name left join Address on Address.NameID = Name.NameID; |
Executing the select statement will return all names and a single address for Fred Anderson. The Addresses do not exist for Sally and Matt so they are NULL.
| LastName | Street | City | State | PostalCode |
| Anderson | 939 Lincoln Street | Mercerville | NJ | 8619 |
| Johnson | NULL | NULL | NULL | NULL |
| Smith | NULL | NULL | NULL | NULL |
Now to eliminate NULLs, we must check for existence of NULLs. The following code uses a ANSI standard keyword coalesce. This code will make any NULL an empty string.
| select Name.LastName coalesce (Address.Street,'') as Street, coalesce (Address.City,'') as Address, coalesce (Address.State,'') as State, coalesce (Address.PostalCode,'') as PostalCode from Name left join Address on Address.NameID = Name.NameID |
You might be thinking, that is a lot of work to get rid of NULLs. In the real word, NULLs still exist. There are two arguments about where (location) you should handle the NULLs. 1) The first argument states that all presentation of data should be handled by the application. The application should handle NULLs, data formatting and etc. 2) The second argument is that the database should handle NULLs and formatting. As it is just data. 3) I would add a third argument. NULLs should be handled on the database side and formatting of data on the application side. Taking argument 3, how do you eliminate having to code or change code to get rid of NULLs? How do you consistently eliminate nulls without having to put code in many places (i.e. reporting, exports, user interface)? One solution without coding is to add the missing rows to the Address Table. Every Person has some type of address. So, they should have at least one record. We are talking about changing the relationship to be a one to many (not zero to many). This code will add all missing Address records. Keep in mind that more disk space will be used having a one to many relationship.
| insert into address(NameID, Street, City, State, PostalCode) select NameID, Street='',City='',State='',PostalCode='' from Name where not exists (select 1 from Address where Address.NameID = Name.NameID) |
After running the above code, the left join is no longer needed. Turn it into inner join. The performance would also be improved.
Conclusion
[End of Part 2]
- Existing queries and reports do not need to change
- There are ways to take existing databases and tables and eliminate the need for NULLs
- Adding Defaults to columns can eliminate NULLs
- Do keep in mind, that not all relationships can be change into a one to many.
- In the real word, NULLs still exist.