Introduction to SQL

  1. Introduction Introduction Hi, this is Jon Flanders with Pluralsight. In this course, I'm going to give you an introduction to SQL. So, Structured Query Language or SQL as it's better known as, or SQL, is a special-purpose programming language. That differentiates it from other languages like C, C++, JavaScript, or Java, which are all general-purpose programming languages. This means that SQL has a very particular purpose. And that purpose is to manipulate sets of data. Now typically, we manipulate those sets of data from what we call a relational database. Typically because there are other kinds of databases or other kinds of data sources that we can use SQL against. And even if we can't use SQL directly against these other data sources, most query languages today have some relationship to SQL. And generally speaking, once you've learned SQL, it's pretty easy to pick up other query languages. Now, SQL has a number of standards. It's both an ANSI and an ISO standard. And this is only really important because that means that each relational database vendor has to implement at least the standard so that you can know that if you learn the SQL standard, you can apply that to many different databases. Most databases have some additional features that aren't part of the standard and when you want to learn a particular database product, you'll have to pick up on those. But the basic standards-based SQL will always be the same. And so in this course, I'm going to stick to just teaching you the standards-based SQL. Now, I will pick a particular database product, it's a free one called MySQL, which I'll talk about how to set up and use in a later module, but I'm going to stick to just the standards. Now, we have a few terms that we should define before we continue on. One of those terms is a database. So, what's a database? A database is really just a name that we give a container that helps us to organize data in some logical way. Now, by organizing that data in a logical way, we basically have a much more efficient way of storing the data and retrieving the data. So, oftentimes in an enterprise, data will be stored in a spreadsheet, and there is some commonality between spreadsheets and databases. Spreadsheets contain rows and columns. Database tables, as we'll get into, also store data in rows and column. But we wouldn't create a database just for the data in one spreadsheet. Think of a database as containing the data that might be contained in say 500 spreadsheets. A database, you contain data that relates to a whole part of your business. So, you might have a finance database, an HR database, a commerce database. Whatever the logical split is for your particular business, you're going to have a database for each one of those things, and there's lots of other considerations that go into how to design and spread out data across the database, which I'm not going to get into in this particular course. The idea is we want to put data in a database in such a way that makes it the one source of truth and makes it easy to query data, to add new data, to delete old, unused data. That is the basic idea behind a database.

  2. The Relational Model Now, this idea of relational databases or the relational model, is really just a way to describe how we're going to store the data in the database and how those individual pieces of data are going to contain relationships to each other. So, the relational model has been around for a really long time. It's based on math. It's actually based on something called tuple relational calculus. And so, now you know that, tuple relational calculus. That's as much as I'm going to teach you about tuple relational calculus in this course, so please don't stop listening to the course. We're not going to get deep into the math here. But it is important to remember that there is a rationale, there's a reason, a logical reason for the way that relational databases are set up. So, in a relational database, we store data inside of something called a table. And a table is probably the closest construct to, let's say a spreadsheet that we have in a database. A database has tables. Each table has a name. And each column in that table also has a name. Now, each column is going to have restrictions, restrictions in terms of the size of the data, the type of the data, the format of the data. Columns can also be required or not required. So in this particular example, I'm showing you the email address column is clearly not required because we have some null data there. So, every row will contain at least all of the data for all of the columns that are required. Now, we want to get this data out at some point. So, the way we're going to get this data out is by asking questions of the database. So, a typical question might be something like, "What are all the contacts in my database "that have a last name that starts with the letter F? And what we're going to do is we're going to take those questions and turn them into SQL that we're going to be able to execute against the database. When we do that, this is known as querying. So, querying the database is asking questions of the database and hoping that the database will give us back an answer. Now, the database doesn't always give us back an answer. Sometimes that's because the data just doesn't exist. Sometimes that's because maybe the database wasn't set up in a way that allows us to ask that question. So, let's think about this example of Contacts and the example that I showed you where I've got a First Name column, a Last name column, and Email address column. And what if I want to ask the question, "What are all the email addresses that Jon has?" Well, obviously, I can only get back one. So, what if I have a contact that has multiple email addresses? What do I do then? Well, essentially, I'm stuck. There's nothing for me to do, which means that this particular database design is bad. It's not the kind of database design that you would want. Now, sort of instinctively, you might say, "Well, let's just add another column. "Let's just add an Email1 and an Email2 column." Well, that doesn't really help either because what if somebody has three email addresses or four email addresses? We don't want those kinds of limitations in our database design because that's going to limit the kinds of questions that we're going to be able to ask later. In fact, if you see a database design where column names have a number in it, please sit down and question that database design. That is generally speaking a red flag that something is wrong with that database design. Not always, but most of the time. So, the way of dealing with this problem is to do what we call database normalization. So, database normalization is a process that allows us to design a database in such a way that we can ask better questions later on. So, in my simple example, what we would do is we would normalize that data by taking the email data out of the contact table and creating a separate table that we could call email. It doesn't matter what the names of the database's tables are. What matters is the design. And so, this really illustrates this idea of relational because what we have is we have a relation between the person table or the contact table and the email table. And we're using keys, which are our way of matching up the data between these two tables to create a relationship. So now, the contact table has a relationship to the email table, and the email table has a relationship back to the person table. And this is going to allow us to ask better questions, like, "What are all of Jon's email addresses? "How many email addresses does each of my contacts have?" Now, there is a much deeper art and/or science, depending on your point of view, for normalization, which again is not part of this course. The main idea that you want to keep in mind is let's make sure that the database design that we're making allows me to ask the questions that I want to ask today and will work well enough to be able to answer the questions that I want to know tomorrow or in a week or in a month or a year. Now, that's not always possible. Again, as I said, database design or normalization is sort of more of an art than a science, but you want to keep these kinds of things in mind. And if you just keep in mind what are the questions that I want to ask, you're going to probably do pretty well. So, the point is database design will control what questions you can ask later. SQL is going to be the language that we're going to use to ask those questions.

  3. Summary So in summary, SQL is a powerful declarative language. You can grasp this language by understanding just a few basic concepts, and those are what I'm going to cover in the rest of this course.

  4. Understanding Basic SQL Syntax Introduction Hi, this is Jon Flanders with Pluralsight. In this module I'm going to talk to you about the basic SQL syntax. So, the basic syntax for executing commands against a database is known as a SQL Statement. And a SQL Statement is again, just an expression that tells the database what you want it to do. It's going to take that expression, it's going to parse it into its component parts and then it's going to execute that expression assuming of course that your expression is a valid SQL expression. So, let's see a SQL expression being built. First, here's select, first_name from person and what's really important is to have a semicolon at the end your command. Every valid SQL expression has a semicolon at the end. Now, let's look at these different pieces of this expression and sort of parse it the way that a database would. So, select. Select is what's known as a keyword. In fact, it's a particular kind of a keyword called a command. In this course and I think as a good practice I'm going to have all the SQL keywords in uppercase and all of the identifiers like first_name in lowercase. From is also a keyword and person is also an identifier, so select and from are the keywords, first_name and person are the identifiers. Select and from are part of the SQL specification. First_name and person refer to things inside of my database. Now, we can break this down in other ways as well. Select first_name is what we refer to as a select clause. So, every SQL statement can be broken down into individual tokens like keywords and identifiers and also individual clauses and so, we can look at in either way. In this case, the select clause tells the database what it is that we want. The from clause tells the database from where do we want the data that is expressed in the select clause. So, the select clause and the from clause go together to say give me the column first_name from the table named person. All of our SQL statements can be broken down in this way, even the more complex ones. In the more complex ones we also may have multiple clauses. We're not limited to necessarily two to clauses and two keywords and two identifiers. Again, this is just an example.

  5. Basic SQL Commands - SELECT So, I'm going to cover four basic SQL commands in this course. The first command is the select command. The select command is a command that allows you to get data. It allows you to retrieve data, one or more rows from one or more tables. If we have a select statement like this, where we say select first_name, last_name from contacts, we know that select is the command of the keyword, from is a keyword, first_name, last_name and contacts are identifiers. So, we take the statement to say please give us the first_name and last_name column from the table named contacts and this is what the result set would look like.

  6. Basic SQL Commands - INSERT Now, let's talk about the insert command. The insert command adds one or more rows into a table. Insert only works against a single table unlike select which can work against multiple tables. So, let's assume that we have the table as we described it on the last slide where we've got the contacts tables where there's an ID column, a first_name and a last_name, an insert statement might look something like this. Insert into is the actual command. We do refer to it as insert as shorthand. Then we specify the table name and the columns that we want to update. The columns go inside of the parentheses and then we specify a values clause. In the values clause we specify the values that we want to be put into the table. In this case, we're saying Fritz and Onion, first_name, last_name and so, this would be the table after we executed this command. Now, you may notice that I didn't specify the ID column but the ID column updated automatically. This is what's known as an auto incrementing column and it's very common in database design. We don't want our identifiers for each row to be something that is associated data with the person. Even in this case where we have people, maybe we even know their social security number which should be a unique number for each person, we don't want that to be the primary key, we want some basically generated or unrelated auto generated value for the primary key and that doesn't have to be specified when you do an insert statement.

  7. Basic SQL Commands - UPDATE Update is our next statement. Update modifies one or more rows in a table. So, let's imagine that I have this table where there's a row with the first_name of Jon and the last_name of Flanders. If I run this update statement, update contacts, so I'm telling the database I want to update the contacts table, I want to set the value of the last_name column to be equal to Ahern. Then I have a where clause. The where clause tells the update statement what is the restriction on this update. If I didn't specify any where clause with this update statement, that means that every single row in my contacts table would have its last_name value set to Ahern. In this case, that could be a bad thing. It is not always the case, however, that a update statement without a where clause is a bad thing. Sometimes it's a very useful thing. If you need to update all of the data in one or more columns inside of a table. But it's not very usual, it's much more usual to have a where clause where you're restricting that data. Once I execute that update statement, I'll have a row in that table where the first_name is Jon, the last_name is Ahern and notice the ID stayed the same, we're updating that particular row, we're not creating a whole new row.

  8. Basic SQL Commands - DELETE And the last of the basic SQL commands is delete. So, delete is there to remove one or more rows from one table. So, let's imagine here is my table contacts again. I've got my first_name and last_name and the auto incrementing ID. What I can do is say delete from contacts and so again, the delete from is the full real delete command, contacts is the table that I want to delete rows from and where ID equals two is my where clause or my restriction. Again, just like update, if I say delete from contacts, and don't specify any where clause, that's going to delete all the rows in my table and generally speaking, that's not what you want. So, having a where clause in this case again, is almost always going to be the case and so, that's going to restrict the delete to just those rows that match the expression in the where clause and in this case that's going to be where the ID is two and that's going to get rid of the row with Fritz in it. I'm going to go into these commands in more detail in later modules, so again, this is just a short introduction to the idea of what's in a SQL statement and what are the basic SQL statements and commands that we're going to cover in this course.

  9. Course Housekeeping Now, in terms of this course, I'm going to be using the database called MySQL. I'm using it because it's free, it's open source and it supports ANSI SQL, so ANSI SQL is what I'm covering in this course. If in the end you actually end up working on an Oracle database or a SQL Server database, you may see some different SQL, some vendor-specific SQL extensions. For this course again I'm sticking with the very basics that should work in any database. I'm going to put SQL keywords in uppercase. That will help to keep the SQL statement easy to read, so I can see which parts are SQL and which parts are going to be provided by me, that is which parts are going to be identifiers. Table names will be singular, so I'm going to have a person table, I'm going to have a phone number table, not a phone numbers table or a persons table. Now, this is a database design decision. People have different viewpoints on this particular point of design. My college database instructor said the name of the table should be about what each row is about, so each row in a person table is about a person. If, on the other hand, each row in a particular table was really referring to multiple things, multiple entities which is not very usual but does happen sometimes, then in this particular kind of database design, having a pluralized table name would be okay but just to be clear, I'm going to use singular table names throughout the rest of this course. I'm never going to repeat column names inside of a database. So, most column names will be prefixed by the name of the table like person_first_name or person_last_name and that is just another piece of database design that I follow. Hopefully it will make the expressions and the statements clearer to read and you may or may not follow that. If you go forward and create a bunch of databases, you may find databases where that particular restriction that I am implementing isn't going to be found. That's okay. People who design databases have different viewpoints on database design. I'm just specifying up front, hey, this is what I'm going to do and so, you should see that be consistent throughout the rest of the course.

  10. Summary So, understanding the basics of SQL, the point of it is to be able to ask questions. We have data and we have questions that we would like to know about that data and so, using SQL, using SQL properly will help us to be able to answer those questions.

  11. Querying Data with the SELECT Statement Introduction Hi, this is Jon Flanders of Pluralsight. In this module, I'm going to talk to you about querying data using the SELECT statement. I think I've said this a number of times already in the course if you've listened to the earlier modules, but I'm going to say it again. A SELECT statement is really a question. It is the translation of a question you have about your data into correct, valid SQL, into an SQL statement. So what are some of the types of questions that we might ask? We might ask, who are all my contacts? Show me a list of all of my contacts. More likely, we're going to want to do some sort of restriction. A restriction could be, who are all my contacts with a first name of Jon? That would allow you to reduce the set of overall contacts. Rather than having to look through and finding the name of Jon, you can just find who are all those contacts. These are the things that we're used to doing, for example, in your contacts database on your phone, right? You can scroll through all of your contacts or you can put in a search term at the top. You can do this against an SQL database as well. In fact, it's more than likely that most contact databases on your phone or your desktop are built on top of a database and are executing some sort of SQL. There are other types of SQL statements as well, like, how many contacts do I have? That's another important, potentially useful piece of data. It doesn't have to do with the actual rows of data, the values in each row for each column, but it has to do with what kind of overall sort of roll-ups or aggregations can I find out about my data. Those are also the kinds of questions that you might ask. So let's start with a simple query. The simplest query is going to have the SELECT keyword. And then a SELECT list. Now, in this case, the SELECT list is the string Hello and the string World. Now, this may look like a very odd kind of query. We're not actually getting any data from a database in this case. What we're doing is we're using the database query to return to us a row with two columns. The value in column one will be Hello and the value in column two will be World. Now, again, the SELECT list is part of the SELECT clause. So the SELECT clause is what we need to have a valid SELECT statement, and that is made up of the SELECT keyword and a SELECT list. Now, as I said, you may be confused by, why would I ever want to return a string. It is actually very useful to be able to return constant values in some cases. If, for example, there are values that you need in an output, let's say for a spreadsheet or for another program, that those values are not actually in the data that you are querying. you can sort interject those items. That's sort of a more advanced kind of thing. But it is something that is useful and it's good to know that the most basic SELECT statement you can execute doesn't actually involve going against any data at all.

  12. Demo - SELECT Okay, I'm in SQL Workbench and I have created my database, the contacts database. And I'm in an open new SQL tab for executing queries. So I'm going to start typing in some SQL. So I'm going to say SELECT, Hello, comma, World. And I'm going to hit this button to execute that. Here, right below the SQL, is the Results Grid. In the Results Grid, you can see that my result is Hello and World. Okay. So this is the super, most basic SQL query ever. Now, one thing I want to show you here is that if you want to execute just part of a file that you have open inside of MySQL Workbench, you can select, highlight those characters that you want to execute and then press the Execute button. Now, notice that I get an error message. It says, "You have an error in your SQL syntax. "Check the manual that corresponds to your MySQL Server "to make sure that you're doing everything correct." That's what it's basically saying. And so SELECT by itself is not a valid command, right? We need to select something, we need a SELECT list.

  13. The SELECT List So the SELECT list, in contrast to the last slide, is most of the time, vast majority of time, going to contain names of columns from the table or tables that you would like to get data from. Now, once you have a list of columns, then you're required to have a FROM clause. In the last slide, we didn't need a FROM clause because there were no column names. Once you have column names, the table name is required in the FROM clause. Now, in a SELECT list, after every column of data, you're going to need a comma. So you separate each column that you'd like with a comma, except, no comma after the last column. When you go from the SELECT list to the FROM clause, there's just a space, right? But, as always, we have the semicolon at the end of every valid SQL statement. Now let's look at a SELECT statement visually. We're going to have the SELECT keyword, column name followed by a comma, column name, and since this is going to be the last column name, no comma, FROM. Because now we have column names, we're required to have a FROM clause that specifies the name of the table that the columns come from. So if we translate this into a real statement, we're going to have a SELECT statement, a column name, a column name, FROM, the keyword, and then the table name, like person. And then, of course, the semicolon at the end. So, again, column names, table names, these are identifiers. SELECT and FROM are keywords. Putting these two together will allow you, as in this case, to have a valid SQL statement. In this case, one that will return every row in the person table. And it will return just the first name and the last name columns.

  14. SELECT List Wildcard (*) Now, sometimes when you look at a table and you're going to write a SELECT statement, you might think, "Wow, there's a lot of column names "that I'm going to have to write down." And there's is a way to not have to write down every column that you want to get. That's by using the asterisk or the star. This is the wildcard SELECT list. If I say SELECT star FROM table name, that's going to give me all of the columns. And that might seem like a really good thing. This is not a good thing. Using star or the asterisk in a SELECT statement is considered a very bad practice. The reason it's a bad practice is if the application that's getting back your data has to look at the different columns, what happens if the database updates those columns or adds a column? Well, now the database application could be caught in a weird situation. It's much better to be explicit about the names of the columns in your SELECT list. It will benefit you greatly even if you maybe don't understand now completely why, if you're new to SQL. But just to warn you, if you submit SQL statements with star in it to any sort of code review or maybe you have to show it to the database administrator before the database administrator will let you run that query, nobody's going to like this. Everybody is going to say, "Hey, could you use column names instead?" Okay?

  15. The FROM Clause So the FROM clause is going to at least define the table that you want to query. It is possible to query multiple tables, as I've said. We're going to cover that later in the course however. For right now, we're going to stick with a single table name in the FROM clause. Now, a good practice is instead of saying, "First_name, last_name, FROM, person, semicolon," it's a good idea to qualify every column name in your SELECT list with a table name. So if we were going to do that in this case, first_name would turn into a person.first_name. Person. is the qualification for the column first_name where person is the table and then dot indicates the relationship. The column is a child of person. Same thing for last_name. And now we can say FROM person. Add the semicolon, okay? So this is a good practice, right? It's a good practice to always table qualify the names of your columns. Some databases will run this query faster when you have qualified the name of the column. But, certainly, for one thing, I think it makes the SQL statement much easier to read. The SELECT list is much clearer. And, again, in this simple case, it may not seem like that's such a big deal, but as we get into more complex queries, as we get into multi-table queries, always qualifying the column names helps you the be able to disambiguate which columns are from which table. So, again, good practice. You should do this. Now, you may be already thinking to yourself or have thought to yourself, "Wow, that's going to be a lot of typing." Well, there's a couple of reasons why it doesn't end up ever being a lot of typing. One reason is that most tools, like SQL Workbench for MySQL, or the tools that help to connect to Oracle, or the tools that help to connect to SQL Server, almost always have some sort of statement completion that allows you to only type in like the first two characters of a table name and then it will fill the rest out for you automatically. Even if you don't have that capability, there is a way to make this less painful than typing our person., person., person.first_name. So that's one, two, three, four, five, six, seven extra characters that you have to type in. What you can do is you can alias the table name. So you can say person p, where p is the alias that you're giving to the person table. As you can imagine, that allows you to say p.first_name, p.last_name. And then, of course, don't forget the semicolon at the end. This is the way that you can simplify that task of always qualifying the column names with the table names, is that even in this simple case like this, by aliasing the table name, you've saved yourself 14 characters of typing. And it actually is a little bit easier to read because in your mind you can just do a replacement of, "Okay, p is the person table, "so I can sort of read this as SELECT person.first_name comma person.last_name FROM person." And p is just this nice convenient alias that helps you to write less characters and keep your SQL statement less cluttered. But, still, qualify all of the columns in your query. Again, once you get into multi-table queries, this will become even more important. But even in this simple case, it does make a difference. And, again, best practice, right? Best practice is to not only qualify all the column names but to alias each of the table names. Again, helps you simplify the query, keeps it much cleaner and much easier to read.

  16. Demo - FROM Clause Okay, so I'm ready to do the next demo. Now, remember our first demo, our simple SELECT Hello and SELECT World. I'm going to go ahead and comment that out in this file. That's another neat thing that you can do with SQL files, is you comment out sections of it. Now, remember, I said that you can highlight a section of a SQL file and just execute that. And you can do that regardless of whether that section, that selected section, is inside of comments or not. So that's a good way to sort of have a file with multiple SQL statements if you want to be able to execute them but not always execute all of them at once. So the thing that we were talking about here is aliasing. Aliasing allows you to do things like this. I can say select Hello as first word, select World as second word. And now when I execute this, notice that first word and second word become the column names. So this is just like aliasing the column names in the SELECT list if those items were from a table, like SELECT p.person_first_name as FirstName FROM person p. So I'm going to highlight that portion of my file and execute. And, notice, it tells me it doesn't know where to find person. It doesn't know how to find that entity. And the reason is is because my particular open session here isn't tied to a particular database. So I'm going to go ahead and use this keyword. I'm going to say, "USE contacts." And putting the USE contacts at the top of my SQL statement means that all of the other statements after that will be in the context of that database. Another way to do that in SQL Workbench is to right-click on the database. And you can basically then say, "I want a new "SQL tab for that database." This is another way to do that. I'm going to go ahead and execute this again. And so you see here I get first names, right? I get Jon, Shannon, Fritz, and Jon. That means that there are two Jons inside of my person table. Now, you may note that in the slides, I'm not using person underscore, I'm just using first_name, last_name. A simple reason for that is that person_firstname_name doesn't fit the very well on the slide. Also, kind of keep you on your toes if you are following along with the slides and then you're using the exercise files. You're going to have to, basically, hand type the SQL because you can't copy and paste it out of the slides. So those are the two reasons that I'm using the longer column name inside of the actual files.

  17. How to Constrain the Result Set Now, all of the queries that we've done so far have returned all of the columns from the table. Although you will sometimes run such a query, it is fairly unusual. Most of time, what you want is some subset of the data. You want to constrain the rows that get returned to your data set from the SQL statement. And so there are really two ways to constrain the number of results. One way is to add a WHERE clause to your SELECT statement. The WHERE clause, we're going to cover in detail on the next module. So I'm going to cover the other way that we can constrain the number of results. And that's by using the DISTINCT qualifier.

  18. DISTINCT and NOT DISTINCT So not distinct is, what are all the first names of all the people I know? Here's our question. Here's our query. SELECT p.first_name FROM person p, semicolon. So this query, this SELECT statement, returns a complete set of all of the first names from the person table. So not distinct, right? All of the names. So let's say that that particular result returns Jon, Jon, Fritz, Shannon, Jason, Jason, Brian, Brian. This means I have two people in my contacts in that person table who have the first name of Jon, one Fritz, one Shannon, two Jasons, and two Brians. So a pretty, sort of typical distribution of names. If I said distinct, that is if I said, "What are all unique first names "of the people I know?" This is one way to reduce the result set. The DISTINCT keyword goes after the SELECT statement, before the SELECT list, and then the rest of the query is the same. So reading the query is SELECT DISTINCT person.first_name FROM person p, where p, we are aliasing to the table name person. This result set would be different. This result set would just be Jon, Fritz, Shannon, Jason, Brian. The fact that there are two Jons, two Jasons, and two Brians isn't of consequence to this particular question, and, thus, to this particular query. So the DISTINCT keyword will take all of the values that your result set would have returned and it will find all of those values that are distinct and only return you one row for each of those distinct values. So this is a way to reduce your result set, by saying, "I don't want all of the items from that table, "I just want all of the distinct items from that table." And this is sometimes a useful question to ask. Think about a sales table. What are all the unique countries that we have sales to? SELECT DISTINCT country name or whatever the way that the country is formulated inside of that database, would give you a break down of all of the countries that you've ever sold items to. And that can be a useful piece of business information. It may inform the business on, "Hey, we've got some sales over here in this country. "Maybe we need to beef up the sales force in that country." Again, these are just examples of the way that getting the distinct list of values can sometimes really be the question that you want to ask.

  19. Demo - Distinct Okay, so let's talk about distinct and not distinct. Not distinct is the default. And when I go ahead and execute this query again, notice I get first name Jon, Shannon, Fritz, and Jon. Why does it look that way? Well, if I go look at all of the data in the table, I'll go to the person table, and I can right-click on it and say Select Rows. It's going to show me all the data that's inside of the person table. And you notice that there are two people with the first name of Jon. So that explains why. Over here, I'm saying person_first_name, I'm getting Jon, Shannon, Fritz, Jon because If I look at the table, I've got Jon, Shannon, Fritz, Jon. And this does a SELECT *. Again, this is okay for testing, but make sure to stay away from SELECT * when you're actually doing SQL for a real project. So if I go back over here, if I want just the distinct person names, I'll just type in DISTINCT. And if I type in DISTINCT person_first_name, that's going to only give me three results because the name Jon appears twice. Now, you have to be careful with DISTINCT. Because if I say DISTINCT p.person_first_name and then p.person_last_name, and notice SQL Workbench brings up a list of all the columns, and I can just select the one I want. And that makes typing in longer column names much easier. I'm going to go ahead and execute this script. Notice now it gives me one, two, three, four rows again. And even though I have the DISTINCT qualifier here, it's giving me DISTINCT for the combination of columns in the SELECT list, not necessarily the column that is first, that is nearest to the DISTINCT. So the DISTINCT applies to all of the columns in the column list, not just the first one.

  20. Summary So to summarize, we query data with the SELECT command. The SELECT list defines the number of columns. The FROM clause defines the table that we want to use. And remember that DISTINCT helps you to constrain the results to unique values.

  21. Filtering Results with the WHERE Clause The WHERE Clause Hi, this is Jon Flanders with Pluralsight. In this module, I'm going to talk about filtering results using the where clause. So, the where clause starts as most SQL clauses do, with the keyword, and the keyword is where. Where is going to tell the query, tell the database how to constrain the result set. It's actually a fairly decent correlation to English. I want to know X, where, a certain condition is true. The where cause is going to come after the from clause and it's going to contain Boolean expressions. For those of you that have never been introduced to Boolean expressions, it's very simple. A Boolean expression is, is this expression true, or is this expression false? A simple expression could be something like, does two equal to two? Yes, two equals to two. Does two equal to three? No, two does not equal to three. So, two equals two would be true, two equals three would be false. So, the where clause is going to contain one or more Boolean expressions. The database will then filter the result set by returning only those rows where that expression evaluates to true. If the expression evaluates to true, the row will be part of the result set. If the expression evaluates to false for that particular row then that row will not be part of the result set. As we already know from earlier modules, the select last_name is the select clause. The from person p is the from clause, right? The select clause contains a select list, the from clause contains the table name. What if we wanted to constrain this using where? Let's take an example. What is the last name of all the people I know whose first name is Jon? So, let's add a where clause to this SQL statement. Now, the semi colon, of course, moves to the end of the SQL statement, and the where clause is made up of the where key word plus one or more Boolean expressions. In this case, the Boolean expression is first_name equals to Jon. We could read this as, as the box in the upper right says, give me all of the rows inside of the person table. For each row, give me just the last name, but filter the result by where each row that has a first name of Jon is part of the result set, and each row where a first name does not equal Jon is not part of the result set, okay? So, select p.last_name from person p where p.first_name equals to Jon. In this case, our expression contains a column, an operator, and the value that we would like that column to have in the case of this operator which is the equals operator. Notice in the where clause that we should continue to qualify the column names, and because we're using an alias, in this case p, we can use the alias to qualify the column names. Again, still a good practice even in the where clause.

  22. Demo - WHERE Clause So, select from person p where p.person_first_name is equal to Jon. I'm going to go back up here and say p.person_last_name. I'm going to go ahead and execute that, and you can see that the result I get is person_last_name, Flanders and Ahern. Again, if we go look at all of the data by doing a select star, you can see that the rows are Jon Flanders, Shannon Ahern, Fritz Onion, Jon Ahern. So, two rows return for this particular where clause.

  23. Boolean Operators The last slide, I used the equals operator, and that's one of the potential Boolean operators inside of SQL. We also have not equal to, greater than, less than, greater or equal to, less than or equal to. And these are operators that, I think, most people are familiar with from just simple math, right? Something greater than the other, something less than the other, something equals to the other. Probably the only difference between the way we use these in simple math and the way we use them in SQL is that these can also be applied to other kinds of values other than numeric values. We can say, does one string equal to another? Does one date equal to another? Is one date greater than another? These operators can be applied to different kinds of data, not just numerical data, as would be the case if we were doing simple math. Although you may end up using just a single expression in some of your queries, a single expression can be quite limiting. You need a way to be able to ask more complex questions. What we'd like to do is be able to put multiple expressions together. In order to do that, we're going to need some additional keywords. It isn't like a select list where we can just have multiple expressions separated by commas, so we need additional keywords that are going to allow us to chain together in unique ways multiple expressions. We're going to talk about those additional keywords.

  24. The AND Keyword The first keyword we're going to talk about is and, and the and keyword is what you'd expect from the English word and. It's going to combine two expressions. If both expressions are true, the row is included. The combined expression returns true. If either returns false, the row is excluded. Only if this and that are true will the row be part of the result set. Let's look at an example. Here's our question. Who are all the people in my contact list that have the first name Jon, and have a birthday later than 1965? So, we've got our typical query that we've seen before. Select first_name last_name from person, then we've got person alias to p, so we can fully qualify the column names. We've got the select clause and the from clause. Let's add the necessary where clause. Where first_name equals to Jon, and birthdate is greater than 12/31/1965. The and clause is going to combine those two expressions. If both of those expressions evaluate to true for a particular row, that row will be included in the result. If it's not, then that row will not be included in the result.

  25. Demo - AND Okay, let's expand this query by adding an and. I'm going to say where person_first_name equals to Jon and p. and I need to pick some other column. In the slides, I referred to birthdate. Birthdate is not a column in the actual exercises file, so we'll have to pick out some other kind of data. Let's go look at the person table again and look at all of the data and try to come up with something that might be interesting to do. So, how about people who have the first name of Jon who I've contacted more than five times? So, p.person_contacted_number is greater than five. If I go ahead and execute that, you can see that I get one person last name, Ahern, whereas before if I executed just the where without the and, I got two people. Again, both of these expressions have to evaluate to true in order for a row to be included in the result set. Again, if I look over here, it's fairly easy to see where person_first_name equals to Jon. That brings us to row one and row four. Person_contacted_number greater than five, row one gets thrown out of the set, and we end up with just row four, just the set that includes the Jon Ahern row.

  26. The OR Keyword Another one of these new keywords that we're going to talk about to combine expressions is or, and again, or does exactly what it sounds like based upon the English. This or that. It combines two expressions. If either expression is true, the row is included. If both are false, the row is excluded. One of the two expressions has to evaluate to true. But either can evaluate to true, and in those cases, the row will be included in the result set. Our example question, who are all the people in my contact list that have the first name of Jon or a last name of Flanders? So, we've got our select and our from. Let's add the where clause. Where first_name equals Jon or last_name equals to Flanders. Our where clause with the or allows us to add this more complex expression, this or that. First name equals Jon, or last name equals to Flanders. Any row that matches either of those two conditions will be part of the result set. If a row matches neither of those two conditions, it will not be included in the result set.

  27. Demo - OR Let's modify this query and change the and to an or, and let's see what happens. Notice that we get two rows. We get the Flanders last name and the Ahern last name, and again, if we look at the data, that makes sense. If we look at the query, we're saying where person_first_name equals to Jon or person_contacted_number is greater than five. For sure, we know that both of these rows will be included in the result set, Flanders and Ahern. This one is also included because it matches the other or condition, but of course, that doesn't really make clear the way the or works because that row would be returned just from the where clause. Let's change the person_contacted_number value from five down to greater than zero. Now, again, if we look at the table, we're going to see one, two, three, four rows. The where clause will execute and it will pick out row one and row four, because both of those rows will evaluate to true. We're using the or, so then the SQL engine is going to look at this column and it's going to say, "Yes, this row would be included," but it's already included. "No, this row won't match the other expression, "and this one will." We should get back Flanders, Ahern, and Onion. Let's see if that actually happens. Flanders, Onion, and Ahern. The row order is different. We can talk more about that in a later module. That illustrates the usage of the or operator.

  28. Other Boolean Operators Now, in addition to those two keywords that combined expressions, there are some other keywords which are essentially additional Boolean operators, things that aren't like equal to, like between, like, in, is, and is not. Let's talk about all these in turn.

  29. BETWEEN We're going to talk about each of these in turn, in the order that I presented them, so we're going to start with between. The between operator acts on a columns value and two additional values. It's sort of a simplified way of saying, you know, where column name is equal to or greater than this value and is column name equal to or less than the next value? It's essentially a shorthand way to provide that expression in a single expression rather than in two Boolean expressions separated by the and. It's true if the column value is between those two values. It is inclusive, so it includes the two values. Again, it's like greater than or equal to and less than or equal to, rather than less than and greater than. Again, the values are included in the evaluation. Let's pose a question of our data. Let's assume that along with the first_name and last_name, we're also keeping track of every time I contact one of my contacts. Every time I have some interaction with them, it is logged into that database. Maybe this is a database, let's say, for salespeople. Knowing how often you contact each of your contacts is an important thing for somebody to know. So, who are all the people in my contact list that I've contacted at least once, but no more than 20 times? In this query, we have our select clause and our from clause, just like before. Our where clause is a little bit different. We're using that between keyword. Where p.contacted, where p is equal to the person table, where the contacted column is between the values of one and 20. So, who are all the people in my contact list that I've contacted at least once, but no more than 20 times? Between one and 20 would answer that question with this where clause that contains that between keyword.

  30. Demo - BETWEEN Okay, let's look at the between operator. What I'm going to do here is I'm going to select all of this part of the query and copy it and then paste it down here and then I'm going to go ahead and comment out this earlier query so we can keep it for future reference, and now I'm going to add the where clause. I want to use a column name, and I want to be able to say two values, right? Where a column is between value one and value two, and so let's look at the person table, let's look at all the data again. I think person_contacted_number is the right column to use. Let's give us the person_first_name where person_contacted_number is between one and 20, just like the slide. So, where p.person_contacted_number between one and 20. I'm going to go ahead and save and execute this. Notice it gives me the same result as this query did earlier. I'll go ahead and execute this query as well. You can see both of those queries give me the same exact result. Different queries, same data. Again, let's look at the four rows. We can kind of see how the query engine would do this. It would go through and see this number, matches. That is, it is between one and 20. This number is not between one and 20, this number is between one and 20, and this number is between one and 20. So, row one, three and four are what are going to go into the result set, and we can see that result set here. Next, let's look at like.

  31. LIKE The way I would describe like is as a more fuzzy version of equals. Like has actually some special syntax that allows you to put wild cards into a string, and what that allows the database to do is, again, to do sort of a fuzzy matching. Rather than where the first name is equal to Jon, we could say where the first name is like J percent, and that string, J percent, where the percent sign is the special wild card, allows the database to, again, do sort of a fuzzy matching. Match where the first name begins with J is another way to say it in English. Again, just like any other of these operators, if the match is true, that row is returned as part of the result set. Once again, our select clause and our from clause are exactly the same. We're going to add a where clause. The where clause will say where p.first_name like J percent symbol, the percent symbol being this special syntax that SQL supports and it allows me to put that anywhere in the string. It could be in the middle, it could be at the end, it could be at the beginning. The SQL engine of your database will do this sort of fuzzy matching, looking for, essentially, in this case, where the first name has its first letter as J. The where clause and the like clause allow us to ask this more complex question.

  32. Demo - LIKE I'm going to do the same thing I did during the last demo. I'm going to do a little copy and pasting here. I'm going to copy that. Notice in the SQL Workbench editor, I can hide and show sections of the file that are commented out. Let's go ahead and comment out this one as well. Let's just get it out of our way. I'm going to go ahead and put in select p.person_last_name from person p where person_first_name, and the operator I want to use here is like. I'll go ahead and put this on the next row. Like, and here's where I'm going to put in a string, and I'm going to put in one of those wild card characters. In this case, the wild card character is the percent symbol. I'm going to put in like percent O percent, closed quote, semi colon, and let's think about this in the context of the data. We're querying on the first_name column, so this is the column and these are the rows that would match. Notice in my query, I put in a wild card before and after the O. That means it's going to look for any name that has the letter O in it anywhere. I should get back Flanders, Ahern, and then Ahern. Let's go ahead and see if that is the case. Flanders, Ahern, and Ahern. That's exactly what I expected. Now, if I change this and I just said like, and I got rid of the first wild card and I put in a J here, I could go ahead and execute that, and notice, that's going to give me those two last names, Flanders and Ahern. Again, we can easily look over at the data and see how that one works. It's looking for any first name where the first letter is J and then there are any other letters after that. The wild card feature in like can be used for these particular scenarios where you want to match something but you don't have the exact value to match, you just have one of the characters or two of the characters and it can become a very useful operator to use in those circumstances.

  33. IN In is the next in my list that I said I would cover. In is kind of like a multi-value operator. Let's look at an example of the in clause. Here's our question. Who are all the people in my contact list that are named Jon or Fritz? Of course, we could put this together with a multi expression where clause with two expressions, with an or. In allows us to simplify a more complex expression in the where clause that would require multiple equals. Instead of having where first_name is equal to Jon or first_name is equal to Fritz or first_name is equal to Brian, or you can imagine you want to get a certain number of states, right? Where state is equal to California, or state is equal to Florida, or state is equal to New York. Rather than having those kinds of expressions, we can now instead replace those with the in clause. And that in keyword in the where clause enables us to simplify that kind of query.

  34. Demo - IN I want to use the in operator, so let's type in select p.person last_name from person p where some column name is in between or includes value one, value two, et cetera, et cetera. This is basically the way that that query works. Let's think about this for a second. Let's go over and look at the data. We could say where the person_first_name is in the set of Jon and Fritz. Let's go ahead and say that where p.person_first_name in and the set that I want to match against will include Jon and Fritz. Let's go ahead and save this and execute, and we get three rows. Flanders, Onion, and Ahern. How did we get these three rows? Let's go back and look at all the data. We are querying on person_first_name, and we're saying give us all of the rows that have the first name in the set of Jon and Fritz. So, the SQL engine looks at the first row. This one is true, so it's going to be part of the result set. This one is false, not going to be part of the result set. This one is true, it will be part. And again, four will not be. Whether you're querying against four rows or 40,000 rows or 400,000 rows, this is exactly how the in operator works. Whatever is in the set of values inside of the parentheses, that's what's going to match, and the result set will be returned accordingly.

  35. IS Is. Is is a sort of outlier of all of the other keywords that we've been looking at, because is only works against null. It's like an equals operator, but just specially for null. Remember, database column values can be null. Let's look at an example of this kind of expression. Here is my imagined question. Who are all the people in my contact list that don't have a last name? Let's assume that I'm some sort of entertainment lawyer or agent and I've got a lot of people in my contacts database like Madonna, like Snoop Dogg. Think of your other famous celebrity that only has one name. Kanye. I want to have my select first_name last_name. I'm going to say from person p, and then the where clause is going to look like this where p last_name is null. Is is always accompanied by null. There is no other is keyword pairing with any other value. In this case, the database is going to look at each row. When it gets to a row where the last_name column is null, that is, there's no value, it's going to return that particular row. Actually, the first_name and last_name from that row, because this expression will have evaluated to true.

  36. IS NOT Is not is the corollary to is, also just for null. It's like not equals for null. Whereas is is equal for null, is not is not equal for null. What this tells us is that databases like to handle null values in a very explicit way, because null values are sort of very special, special values that need this sort of special handling. So, who are all the people in my contact list that have a last name? Select first_name last_name, select clause, from person p from clause, and then the where clause looks like this. Where p.last_name is not null. The where clause with the is not is the corollary, or the reverse of the where clause with is null. Is not null is the opposite of is null. Just like is null, the only thing you're ever going to see in a query in the where clause is is not null, is and is not, again, just to emphasize, only work with columns that have values that could be null and they only work with the null value. Again, they're like equals and not equals for null values.

  37. Demo - IS AND IS NOT I think it makes sense to look at is and is not together. Remember, is and is not are related. They're both related to columns that can be null, and null is this very special value inside of SQL. Again, select p.person_last_name, and you might wonder as I'm typing this, from person p, why, when I type p.person_last_name, does the column name not come up? The reason is, until I've actually aliased the table in my from clause, SQL Workbench doesn't know what columns to display. It has no idea what p stands for until I type in the phrase person p. One way to do this if you want to save a little bit of time in SQL Workbench is to go ahead and type in your select. Don't put any columns in your select list, and then put in your from, alias your table, and then go back to the select list and you'll get this nice sort of statement completion, column completion. Let's go ahead and execute this statement by itself. Remember, this will give us four columns, Flanders, Ahern, Onion, and Ahern, because there are four rows in the table. Now, let's go ahead and say, from person p where p. person_first_name is null. Let's think about this. If we look at the whole set, that is, all the rows that are in the table, none of the rows have a value of null inside of person_first_name. So, when I run this, I would expect to get no results. And as you see, I got zero rows returned. I can run this query on that column, but because none of the data matches, I'm getting back no data. I want to emphasize the special nature of null. Null and zero are not the same thing. If you're a programmer or if you've done any sort of programming and some math, null and zero are sort of considered the same thing, but if I switch my query here to say where person_contacted_number is null and execute, I get back zero rows. I think some people would think that this is a null value, but zero is not a null value. Zero is an actual integer value here. Null is a very, very special character and that only comes up when a column actually has a column where a value can be null, as well as actual values of null. If I look at email_address, for example, and select all the rows there, notice that I do have a row here inside of email_address where email_address_person_id is null. If I was going to write a query where is null actually worked, let's go ahead and comment out this one, I would write select from email_address e where e. Notice that I'm still not getting that statement completion. That's because, at this point, my SQL has a syntax error. If I'm going to do that little trick that I referred to a moment ago, I have to go back to the select list first. I can't put in the where clause before I put in the select list. Now if I say e.email_address_person_id, I can go back and add my where clause. Where e.email_address_person_id is null. Let's go ahead and run that query. So, I get back null. Let's actually pick a different column to return from here. Or, let's add another column, e.email_address, and let me run this query again. You can see, I get back the row. If we look at email_address, again, it's is null that is email_address_person_id is null. This one's not null, this one's not null, this one's not null, this one is. That row matches the result set. Again, if I want to do the opposite, I can say is not null. If I go ahead and run that query, you'll see that I get the other three rows. The other three rows match this set where the email_address_person_id is not null. That's is null and is not null.

  38. Summary In summary, remember that the where clause is there to allow us to restrict the result of our queries. The more complex our questions become, the more complex our where clause becomes, right? Our select list and our from stay the same. Our where clause is where we're going to add additional expressions to make the result set answer our questions more completely.

  39. Shaping Results with ORDER BY and GROUP BY Introduction Hi, this is Jon Flanders with Pluralsight. In this module, I'm going to be talking about shaping your results using the ORDER BY or GROUP BY clause. Up to now in the course, we've been looking at result sets that are in the order that the database stores them. That is whatever the database has in terms of row order, our result set is in that same row order. That's just using a simple SELECT statement, select from and even with the WHERE clause, you're reducing the result set, but you're not ordering it. What we want to talk about in this module is the main ways that we can order that result set.

  40. ORDER BY The main way that we order a result set is using ORDER BY. The ORDER BY clause as its name suggests allows us to sort the result set. We're going to put this after the WHERE clause. Now, if there is no WHERE clause that's okay, you can still have ORDER BY after the FROM clause. You're going to specify one or more columns and the database will then order your result set by each of those columns in turn, so you could specify ORDER BY age and by first name and then by last name, whatever the resulting result set that you need, whatever order you need that in, you can do that in your query. Each column will be separated by commas and the default ordering is ascending, you use the ASC token to specify that you want it to be ascending, but you can leave that off because again it's the default. If you want the order to be descending, you specify that at the end of the ORDER BY clause, you'd say ORDER BY age descending or DESC is the keyword. Here's our question, who are all the people in my contact list ordered by last name? We've got our SELECT last name first name, our from person P, again aliased and then, our ORDER BY clause ORDER BY p.last name, so that will give us a result set, which is all of the people in my person table, ordered by last name.

  41. Demo - ORDER BY Let's look at ORDER BY, so I'm going to say SELECT from person P and what I want to select is p.person_first_name, p.person_last_name. I want to add an ORDER BY clause, ORDER BY p.person_last_name. I'm going to go ahead and execute this query. You can see that the ORDER BY has changed the order of the result set from the natural order. I can emphasize this by just selecting the select list and the from and execute that again, notice the order that it comes back is Jon Flanders, Shannon Ahern, Fritz Onion, Jon Ahern because again remember if we look at all the rows that's the default row order. If we don't give any sort of ORDER BY clause at all, SQL will just return the order that those rows are in the table. If we want to change the order to sort them in some way for the output, very common thing to want to do, we have to add the ORDER BY clause.

  42. Set Functions Now, a commonly used function with ORDER BY is one of the set functions. Set functions compute new values from column values. We'll use the set function in place of one of the columns in the SELECT clause and we basically pass the column name that we want the function to be applied to into the function. This is going to help us ask more interesting questions and by using set and order, now we can start to get into things like how many contacts do I have, where I've talked to those contacts at least 10 times, how many widgets have I sold, order them by the amount sold, then the country. These are ways that you can start using ORDER BY in more sophisticated ways to give you more sophisticated answers to more sophisticated questions. As we'll see, it's often used with the distinct qualifier. Let's go over the set functions. The first set function is COUNT that gives you the count of the number of rows for a particular column. You're going to pass in the name of the column to the COUNT function. Now, you can also pass in the special asterisks character. If you pass in the asterisks character that just gives you the count of rows in that particular table. In that case, if a particular column has a null value that row will still be included in the count. If you pass in a column name to the COUNT function and a row has a null value, it will not be included in the count. The MAX function gives you the maximum value for a particular column, again you're going to pass the column name into this function. This does not include null values. The MIN function does not include null values as well that gives you the minimum value of a column. Average or AVG is the function name will give you the average of all the values of that column. That does not include null values and it only works on numeric columns, where MAX, MIN and COUNT will work on any column type. SUM is another very useful set function. It also only works on numeric columns. It also does not include null values because that would just be an extra zero and that gives you the sum total of all the values of a column. This is commonly used in things like give me the sum of sales with some other query that gives you the parameters for reducing the result set. An example of using a SET function could be something like, how many times if I contacted all of my contacts that is how many times have I emailed or called on the phone etc. That query would look something like SELECT SUM p.contacted_number, notice we're passing in the column name into the function from person P. This is a SELECT clause with a SUM set function being used.

  43. Demo - Set Functions Let's look at SET functions and remember the SET functions we have are COUNT, MAX, MIN, average and SUM. Now, these are the standard ones from ANSI SQL. Your SQL implementation may have other SET functions that could be extremely useful and they work just like these. Let's start out with SELECT FROM person P. I know I'm going to do this for all of the different SET functions, so let's just go ahead and add them in, one by one. Let's start at the top. I want to SELECT account and if I say SELECT COUNT, star from person P, notice that's going to return four. In fact, even if I specify a column name, SELECT COUNT person_first_name, if I execute that I'm still going to get four. Now, if I added a WHERE clause here, like WHERE p.person_last_name is equal to Ahern that should give us a different result. The result in that case should be two. Remember if we look at the whole set, which is the table, we're going to first apply the WHERE clause. That's going to give us back this row and this row. Then, it's going to apply the function to the first name column, well of course there are only two first names in that set, so we'll get back two. Let's look at MAX, so let's say we want to select p.person contacted_number. Let's make that MAX, in capital letters. What should we expect to get? Well, we should expect to get the maximum value from that particular column from that particular table, we're not putting any WHERE clause in it. If we look at the person table, person contacted number five, zero, one, 10, the maximum value is 10 and so we should expect to get back 10 and we do. How about if we do the same thing for MIN instead of MAX? Change this to MIN and I'm going to go ahead and execute that query. MIN, we would expect to get zero because again if we look at the data for items in the data set, the minimum value is zero. Now, average that could be a little bit interesting. Average of person contacted that would give my sales manager if this was a sales database, what is the average number of contacts that Jon's had with his clients? If I execute that notice that the average comes back as four because it's going to do the math. It's going to add 10 to 1 to 0 to 5 that's 16. I've got four rows, so the average of that set is four, four divided by 16. SUM, let's get the sum of all my contacts. If I say, "SELECT SUM," I should get the value I just figured out, which is 16 because again here's my column, here's my values, add them all up, SUM them all up and I get 16. Again, these set functions are very similar to functions you might have in a cell inside of Excel. They work exactly the same way. We can use them with WHERE clauses or without WHERE clauses. There's a lot of very useful questions that you can answer by using these set functions.

  44. Set Function And Qualifiers Now, set functions are sometimes used with qualifiers like DISTINCT and when you're going to use one of those qualifiers, you add the qualifier inside of the function next to the column name. The most common example of course is DISTINCT, which would give you a distinct number of column values to work with in terms of passing to the function. You just think about what is the SELECT column name result set versus what is the SELECT DISTINCT result set and then, think about all of those values being passed to the function. When you don't use DISTINCT, it's all of the column values being passed to that function. When you use DISTINCT, it's going to be the distinct set being passed to that function. If you need a little refresher on DISTINCT, go back to the earlier module. An example query could be what is the count of unique first names among my contacts? SELECT COUNT is our set function. DISTINCT first name is the list of values from the person table that we want to pass in. Rather than passing in all of the first names, we're just passing in the distinct first names and that's going to give us a different count. If I just passed in first name that would give me a count of all the rows. This is a different query. This is giving me what are all the different first names in my person table and what's the count of the distinct values, so what is the count of the Shannon's or the Fritz's or the Aherns' or the Jon's, not give me a count of all the rows, but give me a count of the times each first name appears in my table.

  45. Demo - Set Function And Qualifiers Let's look at set functions with qualifiers. I'm going to go ahead and say SELECT FROM person p and let's do what we saw in the slides, SELECT COUNT p.person_first_name. Let's do this again without the qualifier. When I execute that I'm going to get four. There are four rows. The COUNT is going to give me the count of the rows and without any WHERE clause, this is a fairly uninteresting query. It really is just giving me how many rows are in this table, but if we add DISTINCT to this, now what we're saying is I want the count of the distinct first names, not just the first names. If we look at our table, we've got Jon, Shannon, Fritz. Jon and Jon obviously are the same first name. What we should get is not four, but instead would you get three, because there are three distinct first names in our table. This is often a very useful query to be able to execute, to find out interesting things about your data.

  46. GROUP BY GROUP BY is another keyword, another clause that we can add to our queries and this is going to allow our queries to be a little bit more sophisticated, get some more interesting results. What the GROUP BY clause does is it allows me to add multiple columns into the SELECT list and still use a set function inside of the SELECT list. By default, if I use a set function in a SELECT list, I can't also use another column and get a rational result. You get kind of a weird result. What GROUP BY does, you can think about it as breaking the result set into subsets. It then will run the set function against those subsets. What we end up with is one row per subset. The subset is dictated by the column name, specified in the GROUP BY. Now, the column you use in the GROUP BY has to also appear in the SELECT list. This is a hard and fast requirement. The GROUP BY clause appears after the FROM and/or the WHERE clauses, essentially at the end of your query. Let's look at an example, what is the count of every unique first name among my contacts? This is similar to the DISTINCT when we said SELECT COUNT distinct first name, but now we're going to get to see what are those first names and we're going to have this result return more than one row. This is going to return one row for each unique first name because when I say GROUP BY first name, it's going to take the query, it's going to take the SELECT first name. It's going to break that up into subsets, where each subset is the different first names. Then, it's going to give me the count for each of those. This would be a way to say what are the different first names that are in my database and what is the count for each of those. We've got the SELECT clause that includes the COUNT set function and we've got the GROUP BY clause. Again, notice that the p.first_name, the first name column appears in both the SELECT list and as the column name in the GROUP BY clause.

  47. Demo - GROUP BY Just to review, we're talking about things like using the set function with qualifiers and in this case, what we're going to get back remember is three when we say a COUNT DISTINCT person_first_name, but what if we want to see what those first names are at the same time? Well, I could take this column and add another reference to it in the SELECT list like so. Now, let's try to execute the query. The query actually tells me what I need to do. In an aggregated query without a GROUP BY, etc., etc., well what it's really telling me is I need a GROUP BY. Let me do a GROUP BY, GROUP BY p.person_first_name. Now adding that GROUP BY allows the query engine to be able to go through and group those rows in to sets, one and four will be a set, two will be a set and three will be a set. Then, it can run the COUNT DISTINCT on top of that set and then, be able to get me back the right results. What I should see are three rows and the three different names printed out along with the counts. Now, you look at the results and we kind of expected to get two for this result, but remember it's not running the COUNT DISTINCT against the subset that it created from the GROUP BY clause. If we get rid of the DISTINCT that's probably what we want. We want to know what are all the first names and what are the COUNT that is how many times does that first name appear in our list. That gives us the result of one, two and one, where Fritz and Shannon are one and Jon is two. This is what we would expect.

  48. HAVING HAVING is another interesting keyword. All of these keywords that we're talking about are sort of additive on top of each other. You can use just the ORDER BY, you can use just the GROUP BY, but then once you start wanting more particular results, again more specific results or sophisticated results, depending on how you want to look at it, you're going to add more and more of these keywords. HAVING, what it does, it works like the WHERE clause works against select, but it works against a query with a GROUP BY clause. This is going to restrict the result set that the GROUP BY clause is going to operate against. Remember the GROUP BY clause will take the whole result set, break it up into subsets and then, output one row per subset. The HAVING clause allows you to restrict the subset. What is the count of unique first names among my contacts list that appear more than once? I want to say what are all of these unique first names, how many of them do I have and then, adding on those that appear more than once. Going back to our original example, we've got the select, count first name, first name from person P, GROUP BY first name and then, we'll add a HAVING clause. HAVING COUNT p.first_name is greater than one. We're going to see only those names that appear in our person table more than once, then we're going to count each of those. Then, the GROUP BY is going to give us back the count plus the first name. To break it down, again we've got the SELECT clause with the count set function. We've got the GROUP BY clause. We've got the GROUP BY column in the select list, first name and then, now we're adding the HAVING clause. Again, this just allows us to hone in specifically on the data that we want.

  49. Demo - HAVING Let's talk about having, remember having is sort of like a WHERE clause against the GROUP BY, so let's review what we're getting back in terms of result set with our GROUP BY statement. We're saying, "Give me the count of the person first names "and that first name "from person, "GROUP BY person first name." Of course, because this is the column that's not part of a set function that is the column we need to put into the GROUP BY. This is our result, one Fritz, two Jon, one Shannon and again, to review if we go look at our data that makes sense. There are two rows with Jon, one with Fritz, one with Shannon that matches the result set that we see here, but what if we don't want to see all of these, we want to see just a subset of these? I could say HAVING p.person_first_name equals to Jon. Now, this is not a very interesting query, but it shows you that you can put anything inside of the HAVING, it's going to restrict from the result set that we get from the full query to the result set that will match the query with the HAVING, which would be just the row with the Jon. What if we want to do something, where we want to actually reference the COUNT here? Well, we could put in our HAVING clause, the count and we could say, "Where count is greater than one." Now again, we're going to get the same set. We could do count is greater than zero, which of course will give us the full set and that's sort of a redundant query, so let's just change it back to greater than one. Now, when we look at this query, I see something that we could improve a little bit and this is just an aside, it's not really about SQL itself, but remember we can qualify or alias not only table names, but also column names. I might do something like this say as FirstNameCount and I could take this alias and refer to it in the HAVING clause. This keeps me from having to type out this full set function twice. Now, if I run the query, I should get the exact same result. I don't change the result, but the query is a little bit cleaner because if I change this to some other set function, if I add a DISTINCT to it, now I've got this alias, which means I don't have to go and change the HAVING clause. That's the SELECT with a set function, also with HAVING.

  50. Summary To summarize, as you want to answer more complex questions about your data, you're going to have to start writing more complex queries. The ORDER BY allows you to sort the result set. SET functions essentially allow you to slice through the result set or roll-up values in the result set. The GROUP BY clause allows you to create subsets and the HAVING clause allows you to restrict those subsets in the GROUP BY clause.

  51. Matching Different Data Tables with JOINs Introduction Hi, this is Jon Flanders with Pluralsight. In this module, I'm going to talk about matching different data tables with JOINs. So the JOIN clause, the JOIN keyword is a new keyword that we're going to start talking about. It merges multiple tables into one result set. The FROM clause is going to have to include all the tables that we want to JOIN. We're going to separate each table with a comma. Typically, we're going to have a WHERE clause. Most of the time, we don't want to just JOIN the data whole from one table to another. Again, usually we're using these types of clauses in more sophisticated query scenarios. Now in the WHERE clause, we can restrict the result set from each different table. We can put in Boolean expressions that return true or false and those expressions can reference columns from all the tables. Now there are different kinds of JOINs so we're going to talk about each one in turn.

  52. CROSS JOIN The first JOIN we're going to talk about is the CROSS JOIN. This is the simplest JOIN. It's going to return all rows from both tables. No WHERE clause. This is also the least useful. It's very inefficient. It creates what's called a Cartesian product. And you don't actually have to use the CROSS JOIN or the JOIN keyword. It's implied just by the syntax of the query. So here's an example. What are all the first names and email addresses I have? So SELECT first_name from the table p which is person. SELECT email_address from the table e which is email address. We've got the SELECT clause and we've got a FROM clause with multiple tables. This is what's going to create for us our Cartesian product. This is bad practice. This is inefficient query. We don't want to use this kind of query.

  53. Demo - CROSS JOIN All right, I'm going to show you a CROSS JOIN. Again, just to reiterate, do not do this at home or work or anywhere for that matter. You don't want to run this type of query. Let me show you the query. SELECT FROM person p, email_address e, and I'm going to say SELECT p.person_first_name, p.person_last_name, and e.email_address. So what's wrong with this query is we've got multiple tables and there's no WHERE clause. That's really the thing that can let you know that you are doing a full CROSS JOIN and you don't want to be doing that. There is really no reason to do it except to know what to look for. So notice that I get 16 rows. Why do I get 16 rows? Because once you do all of the CROSS of all of the data, notice that here's Shannon Ahern with the jon.flanders email and Fritz Onion with the jon.flanders email, so everybody from the first table with every piece of data from the second table with no attempt at any sort of matching, so this is the thing you don't want to do. Look for FROM clauses where you have multiple tables and there's no WHERE clause. And again, you know you're doing something wrong at that point.

  54. INNER JOIN Now, a more sophisticated kind of JOIN is the INNER JOIN. This is the most typical JOIN we're going to use. This is emphasizing the relational nature of a database. What it's going to do is it's going to allow us to match the column value in one of the tables to the column value in another one of the tables. And in the simple case if we have two tables, that's a column from the first table to a column in the second table. Most of the time, we're going to use a primary key in one of the tables and a foreign key in one of the other tables. If you remember back to the introduction, the relational model is dependent on this idea of each table has a relationship to some other table. We have a primary key in the first table. We've got a foreign key in the second table. And that allows us to relate table one to table two. The INNER JOIN is the clause that's going to allow us to make that happen. So instead of saying, "What are all my contacts "and what are all my email addresses?" Which isn't a very useful result set, how about, "What are all my contacts' email addresses?" So I'm going to have a SELECT first_name, last_name, from the table p which is person, a SELECT email_address from the table e which is email address, and then I'm going to say FROM person p INNER JOIN email_address e ON p.person_id = e.email_address_person_id. So this is the way that we're able to say relate the primary key from my person table to the foreign key in the email address table. So the INNER JOIN clause and the ON clause are the key here. INNER JOIN says I want to join these two tables and then the ON clause specifies between those two tables what columns should we relate, which columns are part of this relational model.

  55. Demo - INNER JOIN Okay, so let's fix this query to something that is an actual JOIN that you'd want to run. Let's change this into an INNER JOIN. So after person p, I'm going to say INNER JOIN email_address e and then here is the key ON p.person_id equals to e.email_address_person_id. So remember that p.person_id is our primary key in our person table, email_address_person_id is our foreign key in the email address table. And so we should get back the set which is all the people in my person table. Let's go ahead and take a look at all those rows again. We've got those four rows. And then all the matches inside of the email address table. So in the end, what I should get is three people because we've got a person Shannon who doesn't have an email address and we've got an email address under Aaron that doesn't have a person. Now I can drive to that the data in order to illustrate these features. You typically would have constraints in place that wouldn't allow that to happen. Let's go ahead and run the query and you see I get those three people. So I only get the data where there is a match between the two columns in each of the tables.

  56. OUTER JOINs So now let's talk about OUTER JOINs. The difference between an INNER JOIN and an OUTER JOIN is that INNER JOINs don't deal with null values and OUTER JOINs do. That is, if there isn't data matching the query in one or both of the tables, INNER JOIN ignores that. OUTER JOINs will work even when there is no match in the second table. Then no columns matching in the second table would mean that those values in the result set will be null. Now a FULL OUTER JOIN will return all the rows from both tables that match the query and then null for the columns in the first table where the second table has data but the first table doesn't and all rows in the second table when the first table has data and the second table doesn't.

  57. LEFT OUTER JOIN Now the LEFT OUTER JOIN is a kind of OUTER JOIN that specifies all the rows from the left side table will be returned. Null for non-matching right side table rows. So an example is what are my contacts and their email addresses including those I don't have an email for? So SELECT first_name, last_name, email_address FROM person LEFT OUTER JOIN email_address ON person_id = email_address_person_id. So we're still using the primary key/foreign key system here. We're still using the relational system. But we're basically saying if there isn't a match in the right hand table, please return those rows. So the result set would look something like this. I've got three rows. The first two rows have data in both tables so we've got all the columns for those two rows filled out. The third column only has data in the person table. So the fact that there is no email address value for that person would in the INNER JOIN case disregard that row. In LEFT OUTER JOIN, we're going to get that row.

  58. Demo - LEFT OUTER JOIN So changing this query from an INNER JOIN to an OUTER JOIN, in this case a LEFT OUTER JOIN, is pretty easy. I'm just going to copy and paste the query and then I'm going to say instead of an INNER JOIN, this is a LEFT OUTER JOIN. And again, remember the difference between an INNER JOIN and an OUTER JOIN is the OUTER JOIN helps us to be able to take into account null values. And I'm going to run both of these queries at the same time and we can compare the two results. So result one is the first query and that's the result that we got in the last demo three rows. The second result is the second query and so you notice that there are two rows in the person table that don't have values in the email address table. So that's what a LEFT OUTER JOIN will give us.

  59. RIGHT OUTER JOIN Now RIGHT OUTER JOIN is the opposite. It gives me all the rows from the right side table and null for non-matching left side table rows. So what are the email addresses I have including those emails I don't have a person for? Now this is an unusual query in that normally we wouldn't have a system that would allow the tables to get into such a state. How we could have an email address and not a person associated with it? So this is more of a hypothetical example in this case, but in the real world, this is definitely possible in many database designs. So we've got the RIGHT OUTER JOIN this time and so the first two rows are the same as the last result set. The third row is I've got an email address, but that email address doesn't have a matching person and so the fact that there's no matching person means that those two columns are null.

  60. Demo - RIGHT OUTER JOIN All right, so let's go back and review. For doing an INNER JOIN, the INNER JOIN remember it ignores null values, okay? If we do a LEFT OUTER JOIN, we get the null values from the right hand table, the right hand side of the LEFT OUTER JOIN. All the rows from the left table are included. Null from the right hand side table. So let's see what happens when we switch that to RIGHT OUTER JOIN. And you should be able to reason about this at the moment because we know that we're going to be changing this from rows that are in the person table that aren't in the email address table, instead we're grabbing rows that are in the email address table that don't have a corresponding row in the person table. And again, I will say this one more time. This is not something that you typically would see in a database of this kind because normally we would have constraints in place to not allow this to happen. But if you have a badly written application on top of your database, that actually sometimes can happen. So let's go down to the last query and change this to RIGHT OUTER JOIN. So let's go ahead and run all these queries at once. We get three results. Here's our RIGHT OUTER JOIN result where we have the row from the email address table that doesn't have a corresponding row in the right hand table. Result 10 is the LEFT OUTER JOIN and result nine is the INNER JOIN which remember doesn't take into account null values at all.

  61. FULL OUTER JOIN And now we can look at the FULL OUTER JOIN. What are all my contacts and their email addresses including the ones missing an email address and the ones with an email address missing a contact? So this is where we're going to use the FULL OUTER JOIN syntax and the result set is essentially going to be a merge of the last two. So when you take the RIGHT OUTER JOIN and add it to the LEFT OUTER JOIN, at least add the differences between those two result sets, we get a result set with the first two rows again, the row that would come from the LEFT OUTER JOIN and then the row that would come from the RIGHT OUTER JOIN. Now just as a note, this particular syntax does not work in MySQL. MySQL just does not support FULL OUTER JOIN. It is part of the SQL standard. For whatever reason, they've decided not to support it. There is a workaround for this if you need to do this in MySQL, but this is actually a fairly uncommon kind of query and so I wouldn't go out of your way trying to do it. Just remember that it exists and its possible.

  62. Demo - FULL OUTER JOIN So as I said, there is no FULL OUTER JOIN support in MySQL. Syntax error, FULL is not a valid input at this position. I can actually simulate a FULL OUTER JOIN however and let's review the two OUTER JOINs. I'm going to comment out the INNER JOIN. And if we talk about the FULL OUTER JOIN, remember what we're really talking about is the results that would be from the RIGHT OUTER JOIN with the results that would be from the LEFT OUTER JOIN. How do we put those two results together? Well, there is a special keyword. It's slightly more advanced, but there is a keyword in SQL called UNION which allows me to union those two result sets into one. And I'm going to say UNION DISTINCT because that means I will only get the distinct rows. I won't get repeating rows. And let's execute that and here is the equivalent of the FULL OUTER JOIN. We have the rows from both tables that have null values in the corresponding other side table. We have the one that is the two rows that are from the LEFT OUTER JOIN and then the row that is from the RIGHT OUTER JOIN.

  63. SELF JOIN Now there's another kind of JOIN that's referred to as the SELF JOIN. Now SELF isn't a keyword. It's just this idea that I can JOIN a table to itself. This is odd but it actually is sometimes useful. Now there's no special syntax for this. The usefulness if often the case where you have a table that's hierarchical. So let's think of the person table where maybe not only am I keeping track of people, but I'm keeping track of who each individual person's superior is or manager is. So the person table then might have another key which is a foreign key back to itself which is for this particular person what is the ID of that person's manager and then a SELF JOIN would allow you to be able to find the person and that person's manager. So we're just going to put the same table name on the left and right side of the JOIN and this can work for any of the JOINs that we've talked about. It's not specific to a particular kind of JOIN. It's more of a conceptual way to query data where you've got the relational system working in a recursive fashion against the table itself.

  64. Summary Okay, so in summary, JOINs are the way that we really make the relational model come to life. The relational model is about having data separated out, normalized into multiple tables, but then able to bring it back together again when we want to see those relationships. And the different kinds of JOINs allow us to do that in slightly differently ways.

  65. Adding, Changing, and Removing Data Introduction Hi, this is Jon Flanders with Pluralsight. In this module, I'm going to talk about adding, changing, and removing data using SQL. In this course so far, all we've done is query data. Querying data is, of course, one of the main functions of SQL. But SQL is used for other things other than querying. It can be used for adding and modifying data as well. So the command we've been talking about in earlier modules of this course, is the select command. This is the read part of the CRUD acronym. But there's also insert, that is the create part of the CRUD acronym. Update, and delete. Now, why do people call it CRUD and not... Whatever word SUID would be, or ISUD? I don't know, I think people just came up with something that actually matched the semantics of what these commands do, not the exact command names or first letters. So CRUD stands for create, read, update and delete. That corresponds to insert, select, update and delete. No real easy way to remember that, other than trying to remember the mappings.

  66. INSERT So insert is the first other command we're going to talk about. And it does exactly what it says, it inserts data into a table inside of a database. Now the command technically is insert into. Insert by itself isn't a valid command. Insert into is the actual command. We put in the table name after the command, so insert into table name. You can only insert data into one table at a time. So there's no sort of, join, or inserting into multiple tables in one statement. Only one table at a time. After the table name, comes a list of the columns that you want to update. Now you have to update at least all of the required columns, but you don't have to update a column if those values are not required, or if there is a default value for that column. After the list of columns comes the values keyword. And then after the values, comes a list of values inside of parens as well. Now, the number of values and the number of columns have to be the same, right? You can't specify a column in the column list, and not specify a value in the value list, or vice versa. So here's an example. Insert into person, person id, first and last name, contacted number, date last contacted, date added. And we can pass in values one string string, zero, null and then the date we've added that contact to the table. So insert into is the command. Person is the table. Here's our column names inside of the parenthesis separated by commas. Then the values keyword, and then the values, separated by commas inside of parentheses.

  67. Demo - INSERT All right, so let's go ahead and do an insert. So, insert into, right, remember that's the keyword, person is the table I want to insert into, inside of the parentheses is going to be the column list. Then the keyword values, and then another set of parentheses. And then all of the values that I want to insert. Okay. And if I run this at the moment, notice it's giving me a warning, but it's not actually giving me an error. And if I go look at the person table, notice that I now have a row... With basically null data here, blank data here, zero here, zero here. So, this is not a very good statement, right? This was essentially a mistake. So what do we want to insert? Person id. Person first name. Person last name. Person... Contacted number. And then the last column is person date last contacted, and person date added. Now, person date added. Person... Date last contacted. Now notice I put these in a different order than they appear in the actual column list in the table. That's okay, right? As long as the values down here match, this is perfectly fine. I'm actually going to change this, because even though it is technically fine, I find it to be a little distracting. And for the most part you want to keep the column names in order unless you have a real good reason not to, because again, it just makes it easier to read and sort of understand what the statement is going to do. And so we need to put in some new values. Now, most of the time in a table like this, the primary key person id would be auto incrementing, that is every time I inserted a new row, I'd get a new person id. 'Cause we're really not going into those features in SQL in this course, I'm going to go ahead and put in a value for person id. And then I'll need to put in a string for person first name, a string for person last name, a number for contacted, and then a date, the date last contacted, and the date that the person was added. So I can put in null for the date last contacted, and for the date added I can put in a formatted date, let's say 2017, 05, 14, 11:43 and 42 seconds. Let's go ahead and run this and see what happens. Now, notice it says date last contacted cannot be null. So the table is enforcing a value to be inserted there. Some value, okay. Let me go ahead and just put in this value. But I wanted to show you that, to show you that not only do you have to have a matching number of columns to values, all of the values have to match as well. So what would happen, for example, if I took out one of these values and tried to run it this way? Again, column count doesn't match value count at row one. So I need to have a match there, and I need to have values for all the required columns. So I will name this person Foo Bar, and let's go ahead and insert this person. I get one row affected, and if I go look at the table, let me run this query again. Notice I now have this additional entry, right? Foo Bar, and it has all of those values that I specified in the values clause of my insert into statement.

  68. BULK INSERT Now, we can do what's referred to as a bulk insert as well. Remember, insert only allows one table and one column list, but it allows multiple rows inside of values. We can either list multiple values explicitly, or you can actually have a select statement following the table name. So with the select statement, we're essentially inserting into one table by selecting from one or more other tables. And this is useful when you're trying to create some sort of join table for fast querying, or roll up table. Think about all of the queries that we talk about with the group by, and having, and these are common queries to want to run, but you may not want to run them all the time. You may only want to run them once a day, and then cache those values. And inside of a database, one great way to cache those values is to put them into their own individual table. So here's an example. We're going to do insert into person, and instead of providing the column list, and the set of values explicitly, we're going to say, get the columns and the values from this other table. In this case we're saying old person, so this would be a good example of some sort of, you know, update to the database, right? Maybe we're trying to get rid of some data, or maybe we're redesigning something. And so we can take one table, take the values from that table, and insert into the other table. Right, so insert into is the key word, person is the table name, and then the select statement. And the select statement, the column list, is going to turn into the column list for the insert into statement, and the values returned from the select are going to end up being the values, right? We just don't have to specify the column list, or the values keyword.

  69. Demo - BULK INSERT So in the slide I showed you one way to do an insert into using a select. Let me go ahead and show you how to do it using multiple values. And because they are handy, I'm just going to go ahead and copy and paste the value statement a few times. And let's make this six, seven... And I can make this one 42. And I'll say this is Foo six, Foo seven, and Foo eight, because I don't really want to be creative and come up with different names. Let's go ahead and execute this query. And I see here, it affected three rows, and if I go back to person, and if I run that query again, you can see, there are the additional rows. So that's an example of a bulk insert using multiple values.

  70. UPDATE Now the update keyword, it modifies one or more columns in a single table, again, like insert, you can only use update against one table. The where clause will dictate which rows get updated. So the where clause in the update statement, is exactly like the where clause in the select statement. The set keyword is going to follow the table name, and the set keyword will define what are the columns that we want to update, and what are those values. So here's our example. We've got the update command. The table name email. The set keyword. The values, that is the column names and the values for those columns that we want to update. In this case we are only putting in one. If you had multiple columns you wanted to update, you'd separate them with commas. We've got the where clause, and then the expression inside of the where clause is email address equals to five. So you can think about how the database would do this. If the database would go to the email address table, it would find the row that corresponds to id equals to five, the expression and the where clause. It will find the email address column, and then it would update it, right? This would be very similar to the way it would work if you had a spreadsheet, and you knew that you needed to go to a particular row in a particular column and update a value, right? This is just the statement that allows the database to do it automatically.

  71. Demo - UPDATE Let's look at the update keyword. So, what I would suggest is when you do the update keyword, remember you're going to do update, and then you're going to specify the table name. And then you're going to have a set keyword, and then here you're going to have the column names and values, and then you want to have a where clause. And you want to put in the where clause first, like for example, where, p.person id is equal to zero. If we want to go update that zero row that I put into the table in the insert demo. The problem is, is if you do something like this, say update person p, set p.person first name, equal to Bob, okay. If you haven't put in the where clause first, there's a chance that you could run this statement by itself. Now, notice that my SQL says, you are using safe update mode, and you tried to update a table without a where that uses a key column. So it's basically telling you, this is a bad idea, right? You don't want to run this query, because what you're going to end up doing is setting all of the rows first name column, equal to Bob. So that's why you always want to write the where clause first. Now, in this case again, luckily I'm using my SQL, and there's this my SQL setting, but not all databases do this. The setting might be turned off by your database administrator. Just be safe, right? This is one of the places where I will tell you, you don't want to do what I did. I've done this many times. Accidentally not added the where clause, and updated a table, and all of a sudden, all of the rows of the table have the same value. Not very fun to do. Let's execute the one that does have the where clause, go over here and re execute this query, and now you can see that the person's first name is equal to Bob. Now we've could have done first name and last name at the same time, right? You can update multiple columns, you just can't update multiple tables at the same time. Let's run this again. Now, it's going to run again, it'll update the first name column as well, but that's okay, because we just updated it to the same value, so that's no big deal, right? We're already at the row, we're already doing an update, not a big deal to update an existing value, not a big deal. Okay, so that's the update statement.

  72. DELETE Now the delete statement, this deletes one or more rows in a table. Again, singular table. Remember, this is permanent, right? If you delete rows from a table, they are deleted forever, right? There's no trash can or recycle bin inside of a database. Those rows are gone. Delete from is actually the full command name. Now using a where clause is critical here, and in fact, some database tools, and some database settings require you to put in a where clause. This is to help avoid the potential fat fingered problem of deleting all of the rows in your table. Now, sometimes you want to delete all the rows in your table, but you're going to have to be very specific about it, if that's what you want. So here's an example, delete from person p. Now, this is what I said, is a bad practice, right? We don't want to do this. We don't want to do this, because A, some databases don't allow us to do this, but B, we're not being explicit and... The chances of us making a mistake and deleting all the rows is fairly high. Here's a better command. Delete from person p, where, person id equals to five. This is a much better practice. Now, if you wanted to delete all the rows in the person table, you could have an expression like, id is greater than zero, right? And presumably all of the ids would be positive integers, and you'd delete all of the people from your person table.

  73. Demo - DELETE Okay, now let's look at the delete statement. So I'm going to type in delete from, and then table name. And I'm going to say person. When I execute this query inside of my SQL workbench, notice that I get an error, right? My SQL itself is set in what's called a safe mode, and it doesn't let you do certain queries, and this is one of those queries. As I said in the slide, this is a very dangerous query. You always want to write a delete statement with a where clause so that you are very specific about what you are deleting. You can use any expression. So, I'm going to say, delete from person, where person id is greater than four. That's going to delete just those rows that I added during the last few demos, right? The Foo Bar, Foo six Bar, Foo seven Bar, et cetera. I'm going to go ahead and execute that statement. Notice that four rows are affected. And if I go back and I rerun the select statement, you notice that all of those extra rows are now gone, okay? So that's using the delete statement. Big thing to remember, always use a where clause when you're using the delete statement. Even if your database doesn't have a feature like my SQL or you're using my SQL and that feature is turned off, you want to learn from people like me, and their mistakes, and I've made that mistake numerous times in my career, and it's never fun. Okay, so always use a where clause.

  74. Summary Okay, so this is CRUD, right. Insert, update, and delete. These SQL commands added together with the select command allow you to query the database, and also allow you to modify the data inside of your database.

  75. Creating Database Tables Introduction Hi this is Jon Flanders with Pluralsight. In this module I'm going to talk about creating databases and database tables. So data definition language or DDL as it's known for short is a subset of the sequel standard for creating databases and tables, and all the accompanying things, right, columns, foreign keys, and all sorts of other database constructs. Now most database tools, most database products have a visual tool, and some of them have more than one visual tool that allows you to do this visually, right, using a point and click rather typing in commands. I think it's still a good idea to have an idea of what these commands are doing, and at the end you may end up scripting from those visual commands that you've executed a set of DDL files or a DDL file that you can use to recreate your database in another context or environment or as a backup for example.

  76. CREATE DATABASE Now the first thing we want to do is create a database. Oddly enough the create database command is not part of the SQL standard. It is however supported by almost all implementations, so this will be the first command that we will use to specify we want to create this container, and the container is going to contain some number of tables, right. We want to organize our tables in a logical way, related tables together inside of a database. There's lots of other reasons for partitioning databases in particular ways. Some have to do with querying, some have to do with performance, some have to with reliability and availability, all things beyond the scope of this course. Now once you have a database, if you want to execute a query inside of that database, there are two ways to do that. One is to use the database statement. All queries that happen after they use database statement will happen in the context of that database, or you can fully qualify the table name to the database instead. This is actually generally a best practice to fully qualify the table name. In some databases it's known that fully qualifying the table name can actually lead to an increase in performance in your queries. So here's our create database statement. This is going to create a database called contact. Now if we want to query a table inside of the contact database, we can either use the use database contact, once we execute that command, all the other queries that we execute after that in the same session will be scoped to the contact database until we say use database, some other database name. Again another way to do this is to write your queries in such a way that no matter where they're executed from, they are qualified to the name of the database and table. And again this is probably a slightly better way of doing it than the use database command.

  77. Demo - CREATE DATABASE So the create database statement, pretty straightforward, create database, name of the database that you want to create like contacts and then semicolon. Now when I execute this command, notice that the database comes back and says, you can't do this, there is already something named contacts, right, a database named contacts already exists. So you can't have more than one database inside of an instance of your database server that has the same name, right you can only have one. So I'll call this contacts_v2 just for demonstration purposes. So I'm going to go ahead and create that database. Now if I go over here to my navigator inside of MySQL Workbench, I'll see contacts_v2, and I'll see that there are no tables, but we'll create a table and talk about that next.

  78. CREATE TABLE After we've created the database, we probably want to create a table, and the create table statement is part of the SQL standard. After you have the create table statement, you're going to have the table name, then a list of column definitions. At a minimum for every column, you're going to have a column name and type. So you're going to specify what's the name of this column, and what type of data this column is going to contain. So here's our create table statement. We're going to follow that by the name of the table, and then we're going to follow the table name by the columns, and the columns are going to contain both the column name as well as the column type. So we've got three columns, two of which are integers or numeric values. And the third column is the email address column and that's going to be a string or a set of characters. And in this case we are restricting the value space of those characters to be no more than 55 characters. Now since we're talking about column names and column types, we should talk about what the standard data types are for SQL. So here are a list of the standard SQL data types. I'm not going to read all of these to you. If you really want to know about each one of these, I would suggest pausing the video at this point and reading through the list. Suffice it to say that we've got the typical kinds of data types that you would expect, a character data, binary data, numeric data, with different value spaces that is what are the potential values of the data in those columns, Boolean, date, time, and timestamp. So the thing to remember here is that when you're designing your database, you want to design it in a very extensible way, you don't want to restrict the questions that you build to ask of your database in the future. A big mistake that a lot of people make is, well I don't think I'll ever have more than 32,000 rows, so I'm going to have the key of my table be a small int for example, and then when you get to 33,000 rows, you're in trouble because you can't have 33,000 rows. You're going to have to go back and modify the database and update it, and that's really a pain. So you want to plan for the future, right, in terms of how much space that you're giving all of these different data types, especially now in the cloud era and the era of storage is really cheap. In the "old days" we had to be very careful about the data types because we didn't want to have the database tables taking up more space on our disks than they needed to because sometimes we do run out of disk space. Again in the modern world we generally have bigger disks, and if not bigger disks, multiple disks, and if not multiple disks, cloud-based disks, and so you don't have to worry about that so much.

  79. Demo - CREATE TABLE So let's say that I want to create a table and I'll call it person. Now this is going to be just fine unlike my database which has the same name as another database inside of my server. You can't have tables in multiple databases with the same name because they're all scoped by the name of the database. So this is perfectly fine. I can go ahead and have this. Now after create table, remember I'm going to put the parentheses, and then inside of the parentheses I'm going to put a list of the columns that I want to create. So something like person_id which would be an integer column. And person_first_name which is going to be some sort of character column. I'll have this be a varchar character column, and I can specify how big I want that to be, it depends on how long you think a person's first name could possibly be, so you might want to make that reasonably big. And I'll do person_last_name, also varchar, and I'll keep it the same size. And you just keep going like this and going like this until you have all of the table defined, and then you want to go ahead and run the query. Let me go ahead and comment out the create database part of this file so it doesn't run again, and I'm going to go ahead and run this statement. Now it says table_person already exists, how is that possible? There are no tables here as far as I can see. You have to remember that all of your commands, queries, and data definition language commands are all scoped to a particular database, and I happen to be in the contacts database, so how do I get out of the contacts database? Well I can say use contacts_v2. When I run this statement, that's going to put my execution contacts in the contacts of contacts_v2, and I'll be able to successfully create my table. There we go. Zero rows affected. And if I refresh my Tables View over here, you can see that there is in fact a new table called person and it has the columns that I specified.

  80. NULL Values Now we've talked about null values earlier in this course in terms of joins. And so how do you end up with a column that allows null values? Null is a very special type inside of SQL. It really means there's no value there, a lack of value, right, which is different than a zero value or an empty string value. Now columns can be marked as either required or not required. If they're required, that means they are not null, right. A required column means you cannot have null values, okay. If it's not required, then you can have null values. So in the null case, this is the default for a column. So if you don't define not null on a column, it by default is null. Not null has to be specified. Once you have a not null column, then inserting a null value causes an error. If you have a null column which again is the default, inserting null values are okay. Deciding whether a column should be null or not null is really just a matter of logic. Should this particular piece of data be required? Is it necessary? If it's necessary, you want it to be a not null column, you want it to be required. If it isn't necessary and some rows of your table will not have that value, it's better to allow null values because null values indicate a lack of value, right. Don't put in like a special value, right, like again zero or empty string. That's a bad database design. So create table, our create table command followed by the table name, and then our columns, and in this case email_address we're specifying as an integer, and it is not null. That is it's required. The next column is email_address_person_id, and we're saying this is nullable or a null column, not required column. Email_address, we're going to specify that that's not null, right, so that is also a required column. So what are we really saying with this create table statement? If I were to translate it into English, I'd say we're going to have a table that's going to contain email addresses, it's going to have a identifier, it's going to have an email address, and then optionally it's going to have a foreign key that relates back to another table, and this is the person_id column, right, relates back to the person table. And again what this really means is I can have an email address without it being associated with the person. If you go back earlier in the course, we ran some queries where this was the case. This is not necessarily a "great database design," it's just there to indicate to you this idea of when I have one table that relates to another, but sometimes one of the rows may not relate to the other table, you want to make your foreign key value null. This is one of the scenarios where you might use a null or nullable column, not required.

  81. PRIMARY KEY So keys are a very important part of database design. In fact they're essential to the relational system. And the first kind of key is a primary key. So a primary key column must have a unique value per row, that is every row has to have a unique primary key. Null can't be one of these values. Now you can have multiple columns be the primary key, this is what's called a compound key. This is in the case where you may query one or more columns, and querying a primary key typically is more efficient in a database for a variety of reasons beyond the scope of this course. Also sometimes you're not using a unique, like, auto-incremented value which is a very common thing to do. You're actually using columns that represent actual data about the rows, and each of those columns is not unique on its own, but would be unique when combined together. So our create table statement, when we are adding primary keys, basically stays the same, create table, table name, and then columns. And then when we define the columns, we're going to decide which column is the primary key. In this case we're saying email_address_id is an integer and is the primary key. The rest of the columns then are just the same.

  82. CONSTRAINT Now the constraint keyword is a useful keyword because instead of putting primary key or foreign key or all these different kinds of constraints, there are a few other constraints that you can add to columns, rather than putting them in line with the column definition, you can add them all at the end of the table definition inside of the column list. And this is convenient because it just allows you to look quickly at a create table statement, and look at the constraints all in one place, rather than having to look at this column and that column, right. Again it's just a way of organizing the constraints in a very explicit way. So we're going to add all of our keys and other constraints in just one grouping, right, and this can include primary or foreign keys, or again other constraints that you'll learn about as you dive deeper into SQL. So here's our create table, in this case we're creating a table called phone_number, we're saying the phone_number_id is not null, the phone_number_person_id is not null, and the phone_number itself is not null. Then we're going to add the constraint keyword. That's at the end of all the columns. In this case we're naming the constraint, you can name the constraint whatever you want, the constraint name could be food but it's a common convention to have pk and the name of the column as the identifier for a primary key, and fk as we'll see as the identifier for a foreign key. And so then we say constraint, name of constraint, type of constraint, it's a primary key, and it references the phone_number_id column, right. Again this is just an example of one constraint but as you create more complicated tables, having the constraints at the end is a very useful feature to take advantage of.

  83. ALTER TABLE Now at some point in the future, you may run into a situation where you've got to change the table, you have to change the DDL. You could take that data, export it, create the table, reimport it, or in certain circumstances you can just use the alter table command. This allows you to change an existing table in line while it's live in your database, you can do things like add or remove a column, you can change a column's data type, you can change column constraints, but everything you change has to match with the current data. So for example you couldn't change a column from not null to null if there were no values for some of the rows. That would cause an error to happen. Now you could go in and update all of the rows that have the null value and change it to some other value, and then run the alter table statement, but when you run the alter table statement, it has to match the data that is already in the table. So alter table, then the name of the table that we're altering, and then we use the add keyword. When we say we want to add, in this case, we want to add a constraint that's a foreign key, and the column that the foreign key corresponds to is email_address_person_id, and the foreign key constraint has a special extra syntax that the primary key constraint doesn't, because we have to say that this foreign key which is a column in the email_address table, the table we're altering, references another table. So the table that it references is person, and then what's the column inside of the person table that this foreign key references. And again as I've said during the course, primary keys and foreign keys are sort of the linchpin of the relational system, and this is the most straightforward simple example of that, right. We've got a table called person, each row in the person table is identified by the person_id. We have an email address table. Inside of the email address table, we have a column called email_address_person_id which is now a foreign key that references back to the person_id.

  84. DROP TABLE And finally we come to the drop table command. This removes a table and all of its data from the database. Remember this is permanent, not unlike the delete from command. This one is even more draconian because it not only removes all the data, but it removes the table itself, the definition of the table. So you want to be very careful when you use this. Now if you try to run drop table, and that table has data that is a foreign key to another table, in our previous slide example that would be the person table, we can't drop the person table until we drop the email address table. So these relational references between tables is going to be enforce by the database right. We can't delete things that are referenced by some other entity in our data. Drop table, fairly straightforward. Drop table, then the name of the table, semicolon. That will again delete all the data in the person table, and delete it from our database permanently.

  85. Demo - CREATE TABLE Okay so we've talked about alter table and drop table, and now I'm going to show you a couple of usages of both of those commands. Let's say that I want to make this column not null because I want it to be the primary key of this table, right. It makes sense that the person_id would be the primary key. There's two ways I could do this. One way is I could add the alter table statement, but because I haven't actually added any data yet, using the drop table statement on person and then rewriting the column definition to be not null, primary key, would be another way that I could do this. Now when I write these types of scripts, I always put the destructive command inside of a comment. That's to save me from accidentally executing the destructive command over and over again. I have to be very explicit and highlight it, and execute just the selected portion of the script. So when I do that, it drops the table, great. Now I can run this command, and I can run this command without having to highlight anything because it's uncommented out. And again that's just a little tip that I use in order to stop myself from making mistakes all the time because remember the drop table statement drops the table and all of the data, and the only reason that I'm using it here is because I had no data in that table. Now let's take the alter table statement into account. Let's say that I also want to create a table called email_address, and inside of email_address I'm going to have some columns, I'm going to have three columns. One is email_address_id which I want to be an integer, not null, and the primary key. I'm going to have email_address itself which is going to be a character column, I'm going to make it 256. And then I'm going to have email_address_person_id which is going to be a foreign key, and it's going to reference back to the person_id column in the person table. I could add that foreign key constraint in line inside of the create table statement by adding the constraint command, and adding in all of the syntax for the foreign key. But a very common way to do this and probably a more common way to do this would be to use the alter table statement outside of the creation statement. So I'm going to say alter table, email_address, add constraint, I'm going to call this foreign key_email_address_person. This is going to be a foreign key. That foreign key is going to reference the email_address_person_id column. And the reference to the other table is going to be person, and the column there is going to be person_id. So now this statement will change the email address table and add a constraint, and of course if I tried to execute that right now, we get an error because there is no table yet called email_address, I haven't created it yet. Now in this particular script because it's so simple, I could put that add constraint command into the create table statement for email_address. But imagine if the email address table were above the person table in the script, well now we have a forward dependency problem, right. We can't create the constraint here inside of the table statement. I can't create a constraint that references the person table because the person table hasn't been created yet. That's why you will typically see in database creation scripts, like you would see if you look at the one for this course, that most of the time people create all of their tables at once, and then create all of their constraints one after another. And that's so you don't have to worry about this forward dependency problem of, am I creating the tables in the right order because if you just create all your tables, and then you add all your constraints, and it works out great. So I'm going to go ahead and drop the person table again. And now I'm going to go ahead and clear my commands, and I'm going to go ahead and execute this whole script again. So it created the email_address table, it created the person table, and then it added the foreign key constraint to the email_address table which referenced back to the person_id table, and again that was okay because at the point that the alter table statement was executed, both of the other tables had been created.

  86. Summary So to summarize, knowing about DDL at least a little bit is a very useful thing when you're working with SQL. You're going to use it very much. Typically speaking the way you will use it is something like the way that you will be using it in this course which is somebody will give you a SQL file, you'll import that file, and execute it in the context of your database, and it will create the table and all of the data for you. But understanding what that does is a useful skill to have. Create table allows you to configure your table, your columns, and all your relations and constraints. Alter table lets you change that later. Again as long as that is matching with the data and the table definition that you already have. And drop table removes the table and all of its data from the database.