SQL Beginner's Tutorial: JOIN Me and Together We Will Rule the Universe

in sql-forbeginners •  7 years ago 

JOIN_Me_Meme.jpg

Hi all of my Jedi Knights and Sith Lords,

Today you will JOIN me in my adventure to teach you more SQL (with Attitude) to continue on towards our final goal of creating a game. Yes, again, you heard that right!! We are making a game! It's going to go very slowly, b/c (let's face it) I also have a day job and a life outside of Steemit. Maybe not much of a life outside of Steemit, but it's there and my wife appreciates that I pay attention to her.


[Us]

So, today's Lesson: Using the JOIN statement.


Using the JOIN Statement in Your Queries

Of the many statements and keywords that you will use in querying data, the JOIN is the one that will be the most confusing in actual use but the most effective in filtering unwanted data out of the results. The JOIN keyword essentially tells us to take a pair of tables and join them together. It's like if you had a pair of Excel-based spreadsheets and needed to include both worksheets as a single result-set. In Excel you'd have to find some common linkage or know what corresponds to what. Then you'd have to actually do some Excel magic (which I'm not going into any further as this is not an Excel Tutorial).

In SQL, the JOIN will effectively do this for you as long as you know what the Foreign Keys are between the tables. When I say Foreign Keys, I mean those data columns that will be the same in both tables. As a quick example, let's say I have school children and backpacks that I've set up in a database (don't ask why... maybe the teacher wants to store this information for some organizational reason). The child has been given a unique ID that they are set up under in a table called "SchoolChildren"; and the backpacks are given a name-tag that is attached to it... in the database, the "BackPack" table has a column called "backpack_id" which directly corresponds to the child's ID. The child's ID would be a Primary Key and backpack's "backpack_id" would be a Foreign Key. Let's say now that child "10101" has 5 backpacks (1 for homework, 1 for Show-and-Tell, 1 for Snacks, 1 for extra clothes, and 1 for toys... it could happen). How do you tie back all 5 backpacks? by giving each one the same "10101" value for its "backpack_id".


BackPack

Certain SQL-based systems can query to pull in all of the data regardless of commonalities; however, Microsoft SQL Server Management Studio (which is the interface I use) does not allow an OUTER JOIN which does this. What we can do, though, is pull in all of the data from 1 table regardless of commonality with the other (or vice versa). To do so, we would use the LEFT OUTER JOIN and RIGHT OUTER JOIN to specify which table to use as the base table.

As an example, I have mocked up those 2 tables and added some data to each:


[The 2 Tables separately]

...and now, if I perform a LEFT OUTER JOIN and a RIGHT OUTER JOIN against the 2 tables I can pull back all data from 1 table and any corresponding details from the other.


[LEFT OUTER JOIN (image, top results) & RIGHT OUTER JOIN (image, bottom results)]

The LEFT OUTER JOIN takes all data from the left side of the JOIN "conversation" and any pertinent details from the right side of it. To use the above examples, the one using the LEFT OUTER JOIN shows all data from the "SchoolChildren" table and only the common details in the "BackPack" table.

Conversely, the RIGHT OUTER JOIN takes all data from the right side of the JOIN "conversation" and any pertinent details from the left side of it. Again using the above examples, the one using the RIGHT OUTER JOIN shows all data from the "BackPack" table and only the common details in the "SchoolChildren" table.

Lastly, there is another common JOIN statement that only shows data if it's common to both tables... the INNER JOIN statement (as can be seen below). This statement tells the SQL interface to query and return results only if the details occur in both of the tables.


[The INNER JOIN Query]


Extending the JOIN Statements

It is accurate to say, then, that you can join together multiple tables in this way as long as there are common links between the details. I have personally had to deal with joining as many as 20 tables together (loose estimate) to pull in certain details that pertain to all items. This data is then used within a tool called Crystal Reports that allows the report to query data to populate information on the report. It is fundamental to my normal job.

Merging data in this way becomes more and more complicated by which JOIN you used as well as which field filters (the ON clause that you can see in the image examples) that you use. If you use the incorrect filters, the wrong data will populate or you will find yourself missing the data completely. I have done this many times over my years of querying data. This is why I like to have a second set of eyes on my queries at times... as the thought process goes, someone else might see something that you do not.


Venn Diagram of SQL JOINs

The ON Clause

Final thoughts for the day will explain the ON clause of a SQL query. The ON clause is almost like the WHERE clause that works to filter certain information based on the tables that are being joined together to merge into a single result set. This is where you can make your queries slightly less bulky.

Where you can use the WHERE clause after-the-fact to limit results, the ON clause tends to work before-the-fact and only pulls data that links to the tables in question. Additionally, the biggest positive is that you can link many different filtered rows to not show up with a single ON clause segment that might take many WHERE clause segments.

In effect, using the ON clause instead of the WHERE clause can speed up the data returns as it will start to filter out data at the get-go instead of waiting until all data is returned. To explain this a bit further, think of it in this processing way:

(assuming there are 4 tables joined with 20,000 rows in each, trying to filter down to 42 rows of corresponding data)

WHERE Clause:

  1. Select data.
  2. Pull all (80,000) rows.
  3. Filter against WHERE clause.
  4. Remove 39,958 rows of data.

ON Clause:

  1. Select data from first and second table.
  2. Remove rows based on the ON clause.
  3. Take remaining rows (let's say 1,255) and join to 3rd table.
  4. Join and remove rows based on the ON clause.
  5. Take the remaining rows (lets say 433) and join to 4th table.
  6. Join and remove rows based on the ON clause.

While the ON clause, visually, has more steps that are apparent (and I'm really condensing the process for brevity's sake), it is quite conceivable that using a WHERE clause to join 4 tables each with 20,000 rows might take a few minutes at a time, then another minute or more to remove the redundant data. The ON clause, conversely, would have 40,000 rows the first time (might take a couple minutes to return) but then the next processes take less than a minute total because it's not searching through the sheer volume of rows in the first 2 tables to return data.

It is very conceivable and logical to think that the WHERE clause could take double or triple the time. I don't have any real-world results and examples to show this in action, but having worked a while in the SQL world, I have had to determine the best way to pull data... otherwise built-in "Time Out" errors could cause the longer queries to fail. In the IT world and Data Analytics world, speed is almost as important as accuracy.


Hitting the Target


With that I'll leave you to review this information and try to really absorb the meat of what I'm saying.

See you next time.


Previous Posts:

sql-beginner-s-tutorial-writing-your-first-queries
sql-beginner-s-tutorial-writing-your-first-queries
sql-beginner-s-tutorial-select-data-to-use-in-the-game
sql-beginner-s-tutorial-manipulate-data-for-use-in-the-game

Authors get paid when people like you upvote their post.
If you enjoyed what you read here, create your account today and start earning FREE STEEM!
Sort Order:  

Too technical for me.

That’s a good response. I like hearing true statements.

Did you read the previous lessons first? Because I would love for you to go back and read the other posts and then return and re-read this one to see if that helps.

If you can do that and give me an assessment from your perspective it would shape the future of my lessons.

Thanks for the honesty.

That was my mistake, i do have some database concepts so i did not start with the previous ones. i can handle things with the interface which is why the queries look complex without the GUI. after i posted the comment i went back and re-read another time i do get it. will take some practice to do some scenarios but with your examples they do make sense. great effort. I never did use the On clause directly before.

In MS SQL (if writing out your own SQL queries) you need an ON clause any time you use the JOIN statement.

Upvoted on behalf of the dropahead Curation Team!

Thanks for following the rules.

DISCLAIMER: dropahead Curation Team does not necessarily share opinions expressed in this article, but find author's effort and/or contribution deserves better reward and visibility.

Help us giving you bigger upvotes by:

Upvote this comment!
Upvote the latest dropahead Daily Report!
Join the dropahead Curation Trail
to maximize your curation rewards!
Vote dropahead Witness with SteemConnect
Proxy vote dropahead Witness
with SteemConnect
Delegate/donate STEEM POWER to @dropahead
Do the above and we'll have more STEEM POWER to give YOU bigger rewards next time!

News from dropahead: Bye bye 25+ and 50+! Welcome 20+ 40+ and 60+!

Quality review from the dropahead Curation Team

According to our quality standards(1), your publication has reached an score of 98%.

Congratulations for your excellent work!


(1) dropahead Witness' quality standards:

- Graphic relation to the text (Choice of images according to the text)
- Organization and coherence
- Personal sense of the text (Logic, complexity, understanding, what makes it interesting for the reader)

@dbzfan4awhile, pls how can u fetch the name of students that has the same username and password from a database.

Are you wanting to know the number of unique rows that contain the same username and passwords?

@dbzfan4awhile, thanks for ur quick response!! I believe each students record in a database must have a row. But I want to know how to fetch the record of students that has the same username and password.

In reference to my SchoolChildren table this wouldn't happen because there's no actual username field, there's an id field which is a unique integer and first- and last-name fields.

If, by chance, you had multiple kids in the same class with the same first and last name and you were trying to find out how many "multiples" you have, you could do the following:

SELECT DISTINCT COUNT() AS child_duplicate_count, first_name, last_name
FROM schoolchildren
GROUP BY first_name, last_name
HAVING COUNT(
) > 1

You will notice that there are components of this query that I have not went over yet in my lessons, so this is like you're reading ahead in your textbook.

Thanks to this question, I'll be trying to put that in my next tutorial.

Wao, this is great!! I really admire you. If I have any questions will let you know. Keep it up @dbzfan4awhile

Sounds good.

interesting news @dbzfan4awhile

Thanks I hope you find value in these lessons!

Very useful post..tnx for resteem

Glad you enjoyed... stay tuned for more lessons!

SELECT *
FROM girls
WHERE boobs IS NOT NULL
AND age >=18
ORDER BY age

Just joking.

You obviously know some SQL because I haven’t covered IS NOT NULL yet. What do you think of the lessons thus far?

Well.
A bit overcomplicated for beginners. i think.

Even the early lessons? Not sure I could make it any less novice than I did... I'm sure it could be done, but not sure I can make it any easier to understand.

Or are you commenting on this Post or the comment you left?

From my experience the best way to explain JOIN is to show SELECT * WHERE field IN (SELECT) and then rewrite the query as join.

What you should never do is to touch VLOOKUP excel function as it works substantially differently.

Few simple examples with tables 2-5 rows large filled with numbers helps a lot also.

Cheers, and please continue.

I hadn't thought of equating to the IN statement. That's a good idea.

Moreover, it's a bit strange that you teach ppl outer joins without them knowing what NULL is.

I did skip that and I'll be trying to tie that in on the next lesson... I know it's a bit of backtracking, but I didn't think of it until it came up in these comments.

Learn some new from your post. Love to read it.

Truly glad you are finding value in it. Any suggestions to make it even better?

Beautifull post dbzfan4awhile ...you are great man...

I’m content to be just above average lol.

Thank you very much !!! Thank you again and again for sharing this post because you did not just share this post. This post has been written in a beautiful manner by writing everything well.

At my core I am a writer as you could see from some of my earlier posts. I like to think it makes understanding easier, less confusing, and the learning curve far gentler. Besides, I’m hoping it doesn’t bore my readers to inaction.

Thank you very much

You are quite welcome. Do you have any ideas on how you would use this information in your personal life?

useful İnformation. Thank you )

I love to hear that my work is useful. Have you put the lessons to use? and if so, what have you used it for?

You perfect ))

Thanks. There's not much more I can say to someone calling me perfect.

Great Post Dear @dbzfan4awhile

Thanks. I appreciate all of my readers/students. Is there anything in particular that you would suggest to add to these lessons?