SQL query practics focusing on Null, Count, Sum, and Avg

Annie Wang
6 min readOct 7, 2020
Icon made by Freepik from www.flaticon.com

SQL is the language for all database queries. It is easy to start due to its structural characteristics, but sometimes, to get correct results SQL queries, especially for the beginner of SQL user, practice is needed to understand the possible situations.

In this article, Null, Count, Sum, and Avg will be the focuses. The code is based on MySQL. Let’s start.

  • NULL

NULL is different from zero or a space. For example, for business, zero might mean that sales have attempted, but no success in selling the product, but null means there is no attempt. That’s a meaningful difference. Zero or space are values, the syntax is value=0 or ´´, but for null, the syntax is: IS NULL or IS NOT NULL.

Using the below codes to create a table:

CREATE TABLE Person (PersonId int, FirstName varchar(255), LastName varchar(255));
INSERT INTO Person (PersonId, LastName, FirstName) values ('1', 'Wang', 'Allen');
INSERT INTO Person (PersonId, LastName, FirstName) values ('2', '', 'Jan');
INSERT INTO Person (PersonId, LastName, FirstName) values ('3', 'NULL', 'Lisa');
INSERT INTO Person (PersonId, LastName, FirstName) values ('4', NULL, 'Mark');

The table looks like this:

Record 1: with full information, record 2: last name is empty, record 3: last name is text with NULL, record 4: last name is null.

How to get record 1 is direct, and let’s see how to get the different records for 2 to 4:

SELECT * FROMlt_175.person
WHERE LastName = '';

The record is

SELECT * FROMlt_175.person
WHERE LastName = ‘NULL’;
  • The record is
SELECT * FROMlt_175.person
WHERE LastName IS NULL;
  • COUNT

Use the below code to create a table and insert data:

CREATE TABLE count_demos (
id INT AUTO_INCREMENT PRIMARY KEY,
val INT
);
INSERT INTO count_demos(val)
VALUES(1),(1),(2),(2),(NULL),(3),(4),(NULL),(5);

Count is an aggregate function. It can be used to find the null or non-null data, not only numerical but also text:

The COUNT(*) returns the number of rows including duplicate, non-NULL and NULL rows.

The COUNT(expression) returns the number of rows that do not contain NULL values as the result of the expression.

The COUNT(DISTINCT expression) returns the number of distinct rows that do not contain NULL values as the result of the expression.

The return type of the COUNT() function is BIGINT. The COUNT() function returns 0 if there is no matching row found.

The demo database looks like this:

COUNT(*)

SELECT COUNT(*) FROM count_demos;

The result looks like this:

Using star(*) will count all of the records including null.

SELECT COUNT(*) FROM count_demos WHERE val = 2;

The result looks like:

COUNT(expression):

`SELECT COUNT(val) FROM count_demos;

The result looks like this:

In this case, we can see that given a specific column, the null will not be counted.

SELECT COUNT(*) FROM count_demos WHERE val IS NULL;

The result looks like this:

COUNT(DISTINCT expression)

SELECT COUNT(DISTINCT val) FROM count_demos;

The result looks like this:

With distinct, null will not be counted at the result.

  • SUM and mathmatical operator +:

the mathematical operator did not use aggregate, but SUM does. SUM can’t be used as a count with a star: it needs to specify the column.

Let’s create a database as below:

CREATE TABLE count_demos_2 (
id INT AUTO_INCREMENT PRIMARY KEY,
val_1 INT,
val_2 INT
);
INSERT INTO count_demos_2(val_1,val_2)
VALUES(1,1),(1,1),(2,1),(2,2),(2,NULL),(3,3),(4,4),(NULL,NULL),(5,5);

The table is:

SELECT val_1 + valf_2 AS total From count_demos_2;

The result looks like:

SELECT SUM(VAL_1) AS sum1 FROM count_demos_2:

The result looks like this:

SELECT SUM(val_2) as sum2 FROM count_demos_2;

The result looks like this:

SELECT SUM(val_1+val_2) as sum3 FROM count_demos_2;

The result looks like this:

Now you might have found the difference between sum(val_1+val_2) =35 and sum(val_1)+sum(val_2)=37, this difference comes when using operator for null: it can’t be recognized as null automatically, and how to overcome this problem:

using COALESCE to convert null to zero can help with it

SELECT SUM( COALESCE(val_1,0) + COALESCE(val_2,0) ) FROM count_demos_2;

The result looks like this:

  • Avg

Avg() function itself doesn’t include the null value.

Based on the table above from Sum, let’s check the average value:

SELECT avg(val_1) FROM coun_demos_2;
SELECT avg(val_2) FROM count_demos_2;
SELCT avg(val_1+val_2) FROM count_demos_2;

Normally, by definition, AVG(val_1) = SUM(val_1)/COUNT(*) but, because of the existence of null, if the question is to be in line with the definition AVG() should be changed as AVG(val_1)=SUM(val_1)/COUNT(val_1)

Let’s check the result if the question is to get the average of all id, including the null value:

SELECT SUM(val_1)/count(*) FROM count_demos_2;

The result looks like this:

If the question is to get the average without the id if the value is null:

SELECT sum(val_1)/count(val_1) FROM count_demos_2;

The result looks like this:

Same for val_2:

SELECT sum(val_2)/count(val_2) FROM count_demos_2;

The result looks like this:

But how about Avg(val_1+val_2)? As we already know, if using avg(val_1+val_2), it equals 5, which doesn’t include null+null order 2+null, the count is 7 which also excludes the null values.

SELECT sum(val_1+val_2)/count(val_2) FROM count_demos_2;

The result looks like this:

If the question is to get all of the values including null (transferred into zero) for all of the values including null:

SELECT SUM( COALESCE(val_1,0) + COALESCE(val_2,0) )/COUNT(*) from count_demos_2;

The result looks like this:

As you can see, including or excluding null value can change the value very differently, sometimes it is very hard to recognize it if we don’t understand the basic definition behind it.

Hope it helps directly and simply.

Unlisted

--

--