JOINs, and some VIEWs, in MySQL

warning

This post is more than 5 years old. While math doesn't age, code and operating systems do. Please use the code/ideas with caution and expect some issues due to the age of the content. I am keeping these posts up for archival purposes because I still find them useful for reference, even when they are out of date!

This is the third in a series of posts on MySQL, starting with Installing MySQL on Ubuntu 14.04 and Employees database for MySQL, setup and simple queries . The second post also covered simple queries using SELECT, LIMIT, ORDER BY, etc. If you are starting out, you should start with these posts. In this post I will go over multi-table queries, using JOINs (with some help from VIEWs). As you might expect, I'll use the employees sample database , setup in the previous post. If that sounds like fun, or at least useful, follow along.

Let's get started... first I use our user account to logon:

$ mysql -u username -p

then, I switch to the employees database -- if you don't have these setup look at my previous posts on these topics (links above):

mysql> USE employees;

Let's take a look at the tables again to remember what's in the database:

mysql> SHOW TABLES;
+---------------------+
| Tables_in_employees |
+---------------------+
| departments |
| dept_emp |
| dept_manager |
| employees |
| salaries |
| titles |
+---------------------+
6 rows in set (0.00 sec)

I'll focus on employee salaries, so let's look at these tables, as well as the number of entries:

mysql> DESCRIBE employees;
+------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| birth_date | date | NO | | NULL | |
| first_name | varchar(14) | NO | | NULL | |
| last_name | varchar(16) | NO | | NULL | |
| gender | enum('M','F') | NO | | NULL | |
| hire_date | date | NO | | NULL | |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
mysql> SELECT COUNT(emp_no) AS NumEmployees FROM employees;
+--------------+
| NumEmployees |
+--------------+
| 300024 |
+--------------+
1 row in set (0.07 sec)
mysql> DESCRIBE salaries;
+-----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no | int(11) | NO | PRI | NULL | |
| salary | int(11) | NO | | NULL | |
| from_date | date | NO | PRI | NULL | |
| to_date | date | NO | | NULL | |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.01 sec)
mysql> SELECT COUNT(emp_no) AS NumSalaries FROM salaries;
+-------------+
| NumSalaries |
+-------------+
| 2844047 |
+-------------+
1 row in set (0.66 sec)

Hmm, many more entries in the salaries table. This table must include a complete history with all salary levels. We can use the DISTINCT command to get a count of the number of unique employees in the salaries table:

mysql> SELECT COUNT(DISTINCT emp_no) AS NumSalaries FROM salaries;
+-------------+
| NumSalaries |
+-------------+
| 300024 |
+-------------+
1 row in set (0.67 sec)

Good, that matches with the number of employees, as expected. However, when I do later queries I don't want to deal with multiple salaries for each employee, instead let's use the maximum value available. To do this I will create a VIEW -- a virtual table made from a query that we can use for later queries.

A detour on VIEWs

A VIEW can be created with a query as follows:

mysql> CREATE VIEW salaries_max AS
-> SELECT emp_no, MAX(salary) AS salary
-> FROM salaries
-> GROUP BY emp_no;
Now, if I look at the tables in the database I see the new VIEW, salaries_max, created above:
mysql> SHOW TABLES;
+---------------------+
| Tables_in_employees |
+---------------------+
| departments |
| dept_emp |
| dept_manager |
| employees |
| salaries |
| salaries_max |
| titles |
+---------------------+
7 rows in set (0.00 sec)

Back to JOINS

Let's try to get the names and genders of the top-10 paid employees. To do this I need to pull from two tables. This requires a JOIN. In this case, the query is:

mysql> SELECT msa.salary, em.emp_no, em.first_name, em.last_name, em.gender
-> FROM salaries_max AS msa
-> JOIN employees AS em
-> ON msa.emp_no = em.emp_no
-> ORDER BY msa.salary DESC
-> LIMIT 10;
+--------+--------+------------+-----------+--------+
| salary | emp_no | first_name | last_name | gender |
+--------+--------+------------+-----------+--------+
| 158220 | 43624 | Tokuyasu | Pesch | M |
| 156286 | 254466 | Honesty | Mukaidono | M |
| 155709 | 47978 | Xiahua | Whitcomb | M |
| 155513 | 253939 | Sanjai | Luders | M |
| 155377 | 109334 | Tsutomu | Alameldin | M |
| 154459 | 80823 | Willard | Baca | M |
| 154376 | 493158 | Lidong | Meriste | M |
| 153715 | 205000 | Charmane | Griswold | M |
| 152710 | 266526 | Weijing | Chenoweth | F |
| 152687 | 237542 | Weicheng | Hatcliff | F |
+--------+--------+------------+-----------+--------+
10 rows in set (0.92 sec)

Let's point out the essential features of this query:

  • In the SELECT portion of the query I must specify the source tables, here I use msa and em-- these are aliases defined later in the query.
  • The FROM portion of the query uses AS to alias (give short names) to the source tables and specify the JOIN -- in this case I use an INNER JOIN. I'll go over the different type below.
  • The ON statement specifies the field(s) used to JOIN the tables. In this case I use the unique employee number emp_no.
  • The ORDER BY and LIMIT serve the same purpose as we've seen with single-table queries in previous posts-- to sort and limit the number of records returned.

So, that's the basics of a JOIN. Note: If I had not constructed our new VIEW, I would get the following (notice that I use the salaries table instead of salaries_max view):

mysql> SELECT sa.salary, em.emp_no, em.first_name, em.last_name, em.gender
-> FROM salaries AS sa
-> JOIN employees AS em
-> ON sa.emp_no = em.emp_no
-> ORDER BY sa.salary DESC
-> LIMIT 10;
+--------+--------+------------+-----------+--------+
| salary | emp_no | first_name | last_name | gender |
+--------+--------+------------+-----------+--------+
| 158220 | 43624 | Tokuyasu | Pesch | M |
| 157821 | 43624 | Tokuyasu | Pesch | M |
| 156286 | 254466 | Honesty | Mukaidono | M |
| 155709 | 47978 | Xiahua | Whitcomb | M |
| 155513 | 253939 | Sanjai | Luders | M |
| 155377 | 109334 | Tsutomu | Alameldin | M |
| 155190 | 109334 | Tsutomu | Alameldin | M |
| 154888 | 109334 | Tsutomu | Alameldin | M |
| 154885 | 109334 | Tsutomu | Alameldin | M |
| 154459 | 80823 | Willard | Baca | M |
+--------+--------+------------+-----------+--------+
10 rows in set (2.05 sec)

which has the undesirable multiple entries for some employees.

JOINs: INNER, LEFT and RIGHT

Now that we have some sense of the spirit of the JOIN from the above example let's consider the types of JOINs available in MySQL. A useful set of reference pages are available at w3schools:

Take a look at those for more examples. Also, as I mentioned in previous posts, I found the new book Jump Start MySQL very clear and helpful.

First I will create two VIEWs, based on the employees database, that are small and will help show the difference between the types of JOINs:

mysql> CREATE VIEW small_salaries AS
-> SELECT *
-> FROM salaries_max
-> WHERE emp_no IN (254466, 47978, 253939);

and

mysql> CREATE VIEW small_employees AS
-> SELECT emp_no, first_name, gender
-> FROM employees
-> WHERE emp_no IN (254466, 47978, 237542);

After creating these VIEWs we can see what's in them with a simple SELECT

mysql> SELECT * FROM small_salaries;
+--------+--------+
| emp_no | salary |
+--------+--------+
| 47978 | 155709 |
| 253939 | 155513 |
| 254466 | 156286 |
+--------+--------+
3 rows in set (0.86 sec)
mysql> SELECT * FROM small_employees;
+--------+------------+--------+
| emp_no | first_name | gender |
+--------+------------+--------+
| 47978 | Xiahua | M |
| 237542 | Weicheng | F |
| 254466 | Honesty | M |
+--------+------------+--------+
3 rows in set (0.00 sec)

The import differences between these VIEWs are

  • small_employees does not have emp_no 253939
  • small_salaries does not have emp_no 237542.
This difference is by design and will allow us to see the result of different types of JOINs.

To help in visually understanding the JOIN I will use Venn diagrams that show the emp_no's in the two VIEWs:

Venn diagram for the two tables.

From this figure we can see that:

  • emp_no's 254466, 47978 and 253939 are all in small_salaries-- this is indicated by the fact that all these numbers are inside the blue circle (right).
  • Similarly, we can see that emp_no's 254466, 47978 and 237542 are in small_employees-- indicated by the fact that these number are inside the green circle (left).
  • Because emp_no's 254466 and 47978 are inside both circles, we know that these numbers appear in both VIEWs.
  • However, 237542 and 253939 appear in only one VIEW and this is visually reflected by the fact that the numbers appear in only one of the circles, green or blue, not both.
In the examples below, I'll highlight regions of the Venn diagram to indicate the emp_no's returned by each of the JOIN types.

JOIN or INNER JOIN

The JOIN is an INNER JOIN and returns rows that are in both TABLEs or VIEWs. So, using this command:

mysql> SELECT sem.*, ssa.*
-> FROM small_employees AS sem
-> JOIN small_salaries AS ssa
-> ON sem.emp_no = ssa.emp_no;

or, this one:

mysql> SELECT sem.*, ssa.*
-> FROM small_employees AS sem
-> INNER JOIN small_salaries AS ssa
-> ON sem.emp_no = ssa.emp_no;

will produce:

+--------+------------+--------+--------+--------+
| emp_no | first_name | gender | emp_no | salary |
+--------+------------+--------+--------+--------+
| 47978 | Xiahua | M | 47978 | 155709 |
| 254466 | Honesty | M | 254466 | 156286 |
+--------+------------+--------+--------+--------+
2 rows in set (1.04 sec)

In this example, only 47978 and 254466 are in both VIEWs so we get those rows. Notice that the first three columns are information from small_employees and the last two columns are from small_salaries. This ordering comes from the use of SELECT sem.*, ssa.* in both queries. I note this here because this ordering is important for understanding the other JOINs.

Finally, we visualize the INNER JOIN using a Venn diagram, as introduced above. Here the overlap of the circles reflects the INNER JOIN and shows emp_no's that are returned:

Venn diagram for the two tables with a JOIN or INNER JOIN.

LEFT OUTER JOIN

Next, we try the LEFT OUTER JOIN (or LEFT JOIN). Both:

mysql> SELECT sem.*, ssa.*
-> FROM small_employees AS sem
-> LEFT OUTER JOIN small_salaries AS ssa
-> ON sem.emp_no = ssa.emp_no;

and

mysql> SELECT sem.*, ssa.*
-> FROM small_employees AS sem
-> LEFT JOIN small_salaries AS ssa
-> ON sem.emp_no = ssa.emp_no;

produce

+--------+------------+--------+--------+--------+
| emp_no | first_name | gender | emp_no | salary |
+--------+------------+--------+--------+--------+
| 47978 | Xiahua | M | 47978 | 155709 |
| 237542 | Weicheng | F | NULL | NULL |
| 254466 | Honesty | M | 254466 | 156286 |
+--------+------------+--------+--------+--------+
3 rows in set (1.05 sec)

In the LEFT JOIN all rows in the first-- this makes it the left-- TABLE or VIEW are returned. In this example, small_employees is the first (left) VIEW, as defined in the FROM section of the query. So, all of its rows are returned. However, the emp_no 237542 does not appear in the second (right) VIEW small_salaries so it has no data. The NULL's appear in the fourth and fifth columns, corresponding to the small_salaries VIEW -- this is why the column ordering is important to remember.

Using the Venn diagram, we can show the results of the :code:`LEFT JOIN` as:

Venn diagram for the two tables with a LEFT JOIN or LEFT OUTER JOIN.

where the highlighted emp_no's are returned.

RIGHT OUTER JOIN

Finally, we try the RIGHT OUTER JOIN (or RIGHT JOIN). Again, both:

mysql> SELECT sem.*, ssa.*
-> FROM small_employees AS sem
-> RIGHT OUTER JOIN small_salaries AS ssa
-> ON sem.emp_no = ssa.emp_no;

and

mysql> SELECT sem.*, ssa.*
-> FROM small_employees AS sem
-> RIGHT JOIN small_salaries AS ssa
-> ON sem.emp_no = ssa.emp_no;

produce

+--------+------------+--------+--------+--------+
| emp_no | first_name | gender | emp_no | salary |
+--------+------------+--------+--------+--------+
| 47978 | Xiahua | M | 47978 | 155709 |
| NULL | NULL | NULL | 253939 | 155513 |
| 254466 | Honesty | M | 254466 | 156286 |
+--------+------------+--------+--------+--------+
3 rows in set (0.87 sec)

In this case the RIGHT JOIN returns all rows in the right (second) TABLE or VIEW. For our example the second (right) VIEW is small_salaries so the NULLs appear in the first three columns, corresponding to the small_employees VIEW. The Venn diagram nicely visualizes the RIGHT JOIN as:

Venn diagram for the two tables with a RIGHT JOIN or RIGHT OUTER JOIN.

Summing Up

So, that's it. Hopefully JOINs of all types make more sense and you found the Venn diagrams a useful tool for visualizing the results of the different JOIN types. As always, corrections, comments and questions are welcome below.