1 Querying Data with Transact-SQL
Notes taken during/inspired by the edX course ‘Querying Data with Transact-SQL - Microsoft: DAT201x’ by Graeme Malcolm and Geoff Allix.
Course Handouts
- Course Syllabus
- Getting Started Guide Including Install
- Adventure Works Entity Relationship Diagram
- Adeventure Works db install script
NOTE: Remember to ensure read only access for everyone to the folder containing the .SQL and other files - GitHib Repo for course including course materials, slids, labs etc
- A copy of the above materials should they be changed or removed
Other useful links * Transact-SQL Refrence
library(DBI)
## Loading required package: methods
# creates a connection to the SQL database
# note that "con" will be used later in each connection to the database
con <- DBI::dbConnect(odbc::odbc(),
Driver = "SQL Server",
Server = "localhost\\SQLEXPRESS",
Database = "AdventureWorksLT",
Trusted_Connection = "True")
# Sets knitr to use this connection as the default so we don't need to specify it for every chunk
knitr::opts_chunk$set(connection = "con")
1.1 Introduction to Transact-SQL
SQL or Structured Query Language was first developed in the 1970s by IBM as a way of interacting with databases. Other vendors have specfic versions of SQL for instance Oracle is PL/SQL, Microsoft’s implentation is TSQL or Transact SQL. Both SQL Server (on prem) and Azure SQL Databases (cloud) use the same query language, however Azure is a subset of full TSQL since it some commands relate to local files and data functions within .NET that relate only to SQL Server. However, as new features are added to Azure, some new commands are being added to Azure.
SQL is a declarative language - you express what it is that you want, the results - rather than specifying the steps taken to acheive that - it is not prodecural like other programming languages, it is set theory based. It is possible to write proceedural elements or steps within TSQL, however if this is ocurring a lot, it is perhaps better done in another language, which may also perform or run better.
In databases, we typically talk about entities - one type of thing - which is contained in each table.
* Entities are represented as relations (tables) * And entity attributes as domains (columns)
Most relationships are normalized, with relationships between primary and foreign keys. This helps to reduce duplication, however there are instances where de-normalised data is desired.
Schemas are namespaces for database objects - is shows a logical layout for all or part of a relational database, “As part of a data dictionary, a database schema indicates how the entities that make up the database relate to one another, including tables, views, stored procedures, and more.”(see Lucid Chart on Database Schemas). The process of creating a database schema is called data modelling.
When referring to objects in a database, we could use a fully qualified name, such as:
- [server_name.][database_name.][schema_name.]object_name
This is only really relevant for SQL Server, since Azure will only work with one database at a time. Most of the time we typically just use
- schema_name.object_name
The schema name somwtimes be discarded, but it is considered best practice to include this, since there is sometimes some ambiguity about tables e.g. if we have two tables - Product.Order and Customer.Order - which order table is being referred to, that in the customer or product schema?
SQL has a number of SQL Statement Types:
- DML or Data Manipulation Language - SELECT, INSERT, UPDATE, DELETE
- DDL or Data Definition Language - CREATE, ALTER, DROP
- DCL or Data Control Language - GRANT, REVOKE, DENY
The course focuses on DML which is typically for working with data.
SELECT statement has a number of possible sub-components:
- FROM [table]
- WHERE [condition for filtering rows]
- GROUP BY [arranges rows by groups]
- HAVING [condition for filtering groups]
- ORDER BY [sorts the output]
Whilst a SQL statement can look like English, it doesn’t neccessarily run from top to bottom in terms of the sequence of elements that are run in a query. For instance, the FROM is the first thing that will be run, then the WHERE filter will be run, then we GROUP BY, then SELECT the columns we are interested in and finally ORDER the results. This can be important when running some queries, which will be explored later in the course. When we run a query, it is not an actual table in a database that is return but a set of rows or record set or subset.
1.1.1 Data Types
There are a number of different data types in T-SQL as shown below, which are grouped in to a number of different types.
This is more relevant when designing a database, however it is useful to know when querying what data type you have in a broad sense - numeric, data, string and so on - as the types will determine what type of combinations can be combined together in expressions e.g. you can concatenate strings or add numbers together, but you can’t concatenate a string and a number together.
Sometimes it is neccessary to convert data from one type to another, there are two ways this could happen
- Implicit conversion - compatible data types are automatically converted
- Explicit conversion - requires an explicit function e.g. CAST / TRY_CAST, STR, PARSE ? TRY_PARSE, CONVERT / TRY_CONVERT
The TRY options will attempt a conversion and if it does not work, a NULL will be returned rather than an error in the non-TRY version.
1.1.2 Working with NULLs
There are recognised standards for treating NULL values - ANSI - which says that anythign involving a NULL should return a NULL. There are functions that help us handle NULL values:
- ISNULL(column/variable, value) - Returns value (which you can specify) if the column or variable is NULL
- NULLIF(column/variable, value) - Returns NULL if the column or variable is a value - we are almost recoding a non-null to a null
- COALESCE (column/variable1, column/variable2, …) - Returns the value of the first non-NULL column or variable in the list - for instance if contact details, someone might not have an email, so we might want a telephone number, if they don’t have that, return an address etc
NULL is used to indicate an unknown or missing value. NULL is not equivalent to zero or an empty string.
ISNULL can be used like an IF function in excel, for instance:
SELECT name, ISNULL(TRY_CAST(size AS Integer), 0) AS NumericSize
FROM SalesLT.Product;
In this instance, if there is a value that will be returned, if not, the NULL value will be returned as a 0.
We can also use a CASE statement to return a value whilst integrating NULL in to our query, e.g.
SELECT name,
CASE size
WHEN 'S' THEN 'SMALL'
WHEN 'M' THEN 'MEDIUM'
WHEN 'L' THEN 'LARGE'
WHEN 'XL' THEN 'EXTRA LARGE'
ELSE ISNULL(Size, 'N/A')
END AS PRODUCT
FROM SalesLT.Product;
name | PRODUCT |
---|---|
HL Road Frame - Black, 58 | 58 |
HL Road Frame - Red, 58 | 58 |
Sport-100 Helmet, Red | N/A |
Sport-100 Helmet, Black | N/A |
Mountain Bike Socks, M | MEDIUM |
Mountain Bike Socks, L | LARGE |
Sport-100 Helmet, Blue | N/A |
AWC Logo Cap | N/A |
Long-Sleeve Logo Jersey, S | SMALL |
Long-Sleeve Logo Jersey, M | MEDIUM |
1.1.3 Lab Exercises
AdventureWorks Cycles is a company that sells directly to retailers, who then sell products to consumers. Each retailer that is an AdventureWorks customer has provided a named contact for all communication from AdventureWorks.
The sales manager at AdventureWorks has asked you to generate some reports containing details of the company’s customers to support a direct sales campaign. Let’s start with some basic exploration.
First we display the sales person, the customer’s title, surname and telephone number
SELECT SalesPerson, Title + ' ' + LastName AS CustomerName, Phone
FROM SalesLT.Customer;
SalesPerson | CustomerName | Phone |
---|---|---|
adventure-works0 | Mr. Gee | 245-555-0173 |
adventure-works8 | Mr. Harris | 170-555-0127 |
adventure-works0 | Ms. Carreras | 279-555-0130 |
adventure-works0 | Ms. Gates | 710-555-0173 |
adventure-works0 | Mr. Harrington | 828-555-0186 |
adventure-works3 | Ms. Carroll | 244-555-0112 |
adventure-works0 | Mr. Gash | 192-555-0173 |
adventure-works1 | Ms. Garza | 150-555-0127 |
adventure-works1 | Ms. Harding | 926-555-0159 |
adventure-works1 | Mr. Caprio | 112-555-0191 |
Next we cast the CustomerID column to a VARCHAR and concatenate with the CompanyName column
SELECT CAST(CustomerID AS VARCHAR) + ': ' + CompanyName AS CustomerCompany
FROM SalesLT.Customer;
CustomerCompany |
---|
1: A Bike Store |
2: Progressive Sports |
3: Advanced Bike Components |
4: Modular Cycle Systems |
5: Metropolitan Sports Supply |
6: Aerobic Exercise Company |
7: Associated Bikes |
10: Rural Cycle Emporium |
11: Sharp Bikes |
12: Bikes and Motorbikes |
The SalesLT.SalesOrderHeader table contains records of sales orders. You have been asked to retrieve data for a report that shows:
The sales order number and revision number in the format
SELECT SalesOrderNumber + ' (' + STR(RevisionNumber, 1) + ')' AS OrderRevision,
CONVERT(NVARCHAR(30), OrderDate, 102) AS OrderDate
FROM SalesLT.SalesOrderHeader;
OrderRevision | OrderDate |
---|---|
SO71774 (2) | 2008.06.01 |
SO71776 (2) | 2008.06.01 |
SO71780 (2) | 2008.06.01 |
SO71782 (2) | 2008.06.01 |
SO71783 (2) | 2008.06.01 |
SO71784 (2) | 2008.06.01 |
SO71796 (2) | 2008.06.01 |
SO71797 (2) | 2008.06.01 |
SO71815 (2) | 2008.06.01 |
SO71816 (2) | 2008.06.01 |
Next we write a query that returns a list of customer names. We use ISNULL to check for middle names and concatenate with FirstName and LastName.
SELECT FirstName + ' ' + ISNULL(MiddleName + ' ', '') + LastName
AS CustomerName
FROM SalesLT.Customer;
CustomerName |
---|
Orlando N. Gee |
Keith Harris |
Donna F. Carreras |
Janet M. Gates |
Lucy Harrington |
Rosmarie J. Carroll |
Dominic P. Gash |
Kathleen M. Garza |
Katherine Harding |
Johnny A. Caprio |
Next, we will imagine that some data has been deleted - customer email addresses - then we try to find contact details in sequence.
UPDATE SalesLT.Customer
SET EmailAddress = NULL
WHERE CustomerID % 7 = 1;
Next we write a query that returns a list of customer IDs in one column, and a second column named PrimaryContact that contains the email address if known, and otherwise the phone number.
SELECT CustomerID, COALESCE(EmailAddress, Phone) AS PrimaryContact
FROM SalesLT.Customer;
CustomerID | PrimaryContact |
---|---|
1 | 245-555-0173 |
2 | keith0@adventure-works.com |
3 | donna0@adventure-works.com |
4 | janet1@adventure-works.com |
5 | lucy0@adventure-works.com |
6 | rosmarie0@adventure-works.com |
7 | dominic0@adventure-works.com |
10 | kathleen0@adventure-works.com |
11 | katherine0@adventure-works.com |
12 | johnny0@adventure-works.com |
You have been asked to create a query that returns a list of sales order IDs and order dates with a column named ShippingStatus that contains the text “Shipped” for orders with a known ship date, and “Awaiting Shipment” for orders with no ship date.
Again, we imagine that some data is missing by deleting some first.
UPDATE SalesLT.SalesOrderHeader
SET ShipDate = NULL
WHERE SalesOrderID > 71899;
SELECT SalesOrderID, OrderDate,
CASE
WHEN ShipDate IS NULL THEN 'Awaiting Shipment'
ELSE 'Shipped'
END AS ShippingStatus
FROM SalesLT.SalesOrderHeader;
SalesOrderID | OrderDate | ShippingStatus |
---|---|---|
71774 | 2008-06-01 | Shipped |
71776 | 2008-06-01 | Shipped |
71780 | 2008-06-01 | Shipped |
71782 | 2008-06-01 | Shipped |
71783 | 2008-06-01 | Shipped |
71784 | 2008-06-01 | Shipped |
71796 | 2008-06-01 | Shipped |
71797 | 2008-06-01 | Shipped |
71815 | 2008-06-01 | Shipped |
71816 | 2008-06-01 | Shipped |
1.2 Querying Tables with SELECT
1.2.1 Removing Duplicates
If we wanted to know what colours our products are, we would run something like the following.
SELECT Color
FROM SalesLT.Product;
Color |
---|
Black |
Red |
Red |
Black |
White |
White |
Blue |
Multi |
Multi |
Multi |
Here each product has a row and corresponding colour. However, if we just want colour, we are typically interested in removing duplicates to just show what colours we are actually producing. This is achieve using the DISTINCT keyword
SELECT DISTINCT Color
FROM SalesLT.Product;
Color |
---|
NA |
Black |
Blue |
Grey |
Multi |
Red |
Silver |
Silver/Black |
White |
Yellow |
These results are DISTINCT at the row level, so if we have two combinitions of columns - say size and colour - then it would be DISTINCT colour and size combinations that appear in the database.
SELECT DISTINCT Color, Size
FROM SalesLT.Product;
Color | Size |
---|---|
NA | NA |
Black | NA |
Black | 38 |
Black | 40 |
Black | 42 |
Black | 44 |
Black | 46 |
Black | 48 |
Black | 52 |
Black | 58 |
Here is just the size - here IS NULL will return a ‘None’ if the Size is missing (NA).
SELECT DISTINCT ISNULL(Size, 'None') AS Size
FROM SalesLT.Product;
Size |
---|
38 |
40 |
42 |
44 |
46 |
48 |
50 |
52 |
54 |
56 |
1.2.2 Sorting Results
ORDER By is how we sort the results. Any aliased fields used in the SELECT element are visible by ORDER BY. You can oder the results using columns that are not selected in the SELECT clause. You can also ORDER BY multiple columns, either ascending or descending.
We can also just show the top 10 products, e.g. the top 10 most expensive. This is done using the keywork TOP
SELECT TOP (10) ProductCategoryID AS Category, Name, ListPrice
FROM SalesLT.Product
ORDER BY ListPrice DESC, Category;
Category | Name | ListPrice |
---|---|---|
6 | Road-150 Red, 62 | 3578.27 |
6 | Road-150 Red, 44 | 3578.27 |
6 | Road-150 Red, 48 | 3578.27 |
6 | Road-150 Red, 52 | 3578.27 |
6 | Road-150 Red, 56 | 3578.27 |
5 | Mountain-100 Silver, 38 | 3399.99 |
5 | Mountain-100 Silver, 42 | 3399.99 |
5 | Mountain-100 Silver, 44 | 3399.99 |
5 | Mountain-100 Silver, 48 | 3399.99 |
5 | Mountain-100 Black, 38 | 3374.99 |
We can also use TOP (N) Percent or TOP (N) WITH TIES. If we say wanted the bottom 10 items - say those with the lowest price - there is no ‘BOTTOM’ Keyword, instead we would sort our data so that they are now in the the order we want - with those we are interested at the top - the use TOP again.
1.2.3 Paging through results
This is achieved through using the OFFSET-FETCH which is an extension of ORDER BY. This might be useful if you have a set of web page results and you want to see certain ones.
You first say how many rows you want to skip using e.g. OFFSET 10 ROWS, then use specify how many rows you are interested in retrieving from the database e.g. FETCH NEXT 10 ROWS ONLY.
1.2.4 Filtering and Using Predicates
We can use the WHERE clause with a number of coniditions or predicates. For instance = (equals) <> (not equals), IN, BETWEEN (is an inclusive statement e.g. BETWEEN 100 AND 200 includes 100 and 200), LIKE, AND, OR and NOT. IN can be more efficient in coding terms when testing multiple attributes, as you just say color IN (red, blue) rather than colour = ‘red’ OR colour = ‘blue’. This becomes more useful when testing multiple conditions e.g. IN (red, blue) AND size = large - this would have more typing with explicit code for each combition.
Like mathematics, SQL works on PEMDAS sequencing - parenthesis, exponents, multiplication, division, addition, subtraction.
Some examples.
First look for products that start with an FR:
SELECT Name, Color, Size, ProductNumber
FROM SalesLT.Product
WHERE ProductNumber LIKE 'FR%';
Name | Color | Size | ProductNumber |
---|---|---|---|
HL Road Frame - Black, 58 | Black | 58 | FR-R92B-58 |
HL Road Frame - Red, 58 | Red | 58 | FR-R92R-58 |
HL Road Frame - Red, 62 | Red | 62 | FR-R92R-62 |
HL Road Frame - Red, 44 | Red | 44 | FR-R92R-44 |
HL Road Frame - Red, 48 | Red | 48 | FR-R92R-48 |
HL Road Frame - Red, 52 | Red | 52 | FR-R92R-52 |
HL Road Frame - Red, 56 | Red | 56 | FR-R92R-56 |
LL Road Frame - Black, 58 | Black | 58 | FR-R38B-58 |
LL Road Frame - Black, 60 | Black | 60 | FR-R38B-60 |
LL Road Frame - Black, 62 | Black | 62 | FR-R38B-62 |
Or we can look for products that end in a 58:
SELECT Name, Color, Size, ProductNumber
FROM SalesLT.Product
WHERE ProductNumber LIKE '%58';
Name | Color | Size | ProductNumber |
---|---|---|---|
HL Road Frame - Black, 58 | Black | 58 | FR-R92B-58 |
HL Road Frame - Red, 58 | Red | 58 | FR-R92R-58 |
LL Road Frame - Black, 58 | Black | 58 | FR-R38B-58 |
LL Road Frame - Red, 58 | Red | 58 | FR-R38R-58 |
ML Road Frame - Red, 58 | Red | 58 | FR-R72R-58 |
Road-450 Red, 58 | Red | 58 | BK-R68R-58 |
Road-650 Red, 58 | Red | 58 | BK-R50R-58 |
Road-650 Black, 58 | Black | 58 | BK-R50B-58 |
Road-250 Red, 58 | Red | 58 | BK-R89R-58 |
Road-250 Black, 58 | Black | 58 | BK-R89B-58 |
Or we can use underscores to specify a number of charecters e.g. one _ is one missing charecter. A wildcard (%) would match any number of chars. The figures in brackets then are like regex, so if we want a numeric value between 0 and 9 we use [0-9]. Equally we could use a similar query to find things like email addresses in a string, or email addresses that end in a .co.uk.
SELECT Name, Color, Size, ProductNumber
FROM SalesLT.Product
WHERE ProductNumber LIKE 'BK-_[0-9][0-9]_-[0-9][0-9]';
Name | Color | Size | ProductNumber |
---|---|---|---|
Road-150 Red, 62 | Red | 62 | BK-R93R-62 |
Road-150 Red, 44 | Red | 44 | BK-R93R-44 |
Road-150 Red, 48 | Red | 48 | BK-R93R-48 |
Road-150 Red, 52 | Red | 52 | BK-R93R-52 |
Road-150 Red, 56 | Red | 56 | BK-R93R-56 |
Road-450 Red, 58 | Red | 58 | BK-R68R-58 |
Road-450 Red, 60 | Red | 60 | BK-R68R-60 |
Road-450 Red, 44 | Red | 44 | BK-R68R-44 |
Road-450 Red, 48 | Red | 48 | BK-R68R-48 |
Road-450 Red, 52 | Red | 52 | BK-R68R-52 |
We can use the BETWEEN clause on things like dates to select all products that were removed from sale in 2016:
SELECT Name
FROM SalesLT.Product
WHERE SellEndDate BETWEEN '2006/1/1' AND '2006/12/31';
Name |
---|
Mountain Bike Socks, M |
Mountain Bike Socks, L |
ML Road Frame - Red, 44 |
ML Road Frame - Red, 48 |
ML Road Frame - Red, 52 |
ML Road Frame - Red, 58 |
ML Road Frame - Red, 60 |
HL Mountain Frame - Silver, 44 |
HL Mountain Frame - Silver, 48 |
HL Mountain Frame - Black, 44 |
Note that it is often useful to order results in the order you want, even if it currently appears it the correct order. Sometimes these queries may change as data or the database does, so it is best to be explicit and use an ORDER BY.
1.2.5 Lab Exercises
You are being told that transportation costs are increasing and you need to identify the heaviest products.
-- select the top 10 percent from the Name column
SELECT TOP (10) Percent Name, Weight
FROM SalesLT.Product
-- order by the weight in descending order
ORDER BY Weight DESC;
Name | Weight |
---|---|
Touring-3000 Blue, 62 | 13607.70 |
Touring-3000 Yellow, 62 | 13607.70 |
Touring-3000 Blue, 58 | 13562.34 |
Touring-3000 Yellow, 58 | 13512.45 |
Touring-3000 Blue, 54 | 13462.55 |
Touring-3000 Yellow, 54 | 13344.62 |
Touring-3000 Yellow, 50 | 13213.08 |
Touring-3000 Blue, 50 | 13213.08 |
Touring-3000 Blue, 44 | 13049.78 |
Touring-3000 Yellow, 44 | 13049.78 |
Next, we want to ignore the first 10 records - to page through using offset
SELECT Name
FROM SalesLT.Product
ORDER BY Weight DESC
-- offset 10 rows and get the next 100
OFFSET 10 ROWS FETCH NEXT 100 ROWS ONLY;
Name |
---|
Mountain-500 Silver, 52 |
Mountain-500 Black, 52 |
Mountain-500 Black, 48 |
Mountain-500 Silver, 48 |
Mountain-500 Silver, 44 |
Mountain-500 Black, 44 |
Touring-2000 Blue, 60 |
Mountain-500 Black, 42 |
Mountain-500 Silver, 42 |
Touring-2000 Blue, 54 |
Next we create a query to find the names, colors, and sizes of the products with a product model ID of 1.
-- select the Name, Color, and Size columns
SELECT Name, Color, Size
FROM SalesLT.Product
-- check ProductModelID is 1
WHERE ProductModelID = 1;
Name | Color | Size |
---|---|---|
Classic Vest, S | Blue | S |
Classic Vest, M | Blue | M |
Classic Vest, L | Blue | L |
Now we would like more information on products of certain colors and sizes. We retrieve the product number and name of the products that have a Color of ‘Black’, ‘Red’, or ‘White’ and a Size of ‘S’ or ‘M’.
-- select the ProductNumber and Name columns
SELECT ProductNumber, Name
FROM SalesLT.Product
-- check that Color is one of 'Black', 'Red' or 'White'
-- check that Size is one of 'S' or 'M'
WHERE Color IN ('Black', 'Red', 'White') AND Size IN ('S', 'M');
ProductNumber | Name |
---|---|
SO-B909-M | Mountain Bike Socks, M |
SH-M897-S | Men’s Sports Shorts, S |
SH-M897-M | Men’s Sports Shorts, M |
TG-W091-S | Women’s Tights, S |
TG-W091-M | Women’s Tights, M |
GL-H102-S | Half-Finger Gloves, S |
GL-H102-M | Half-Finger Gloves, M |
GL-F110-S | Full-Finger Gloves, S |
GL-F110-M | Full-Finger Gloves, M |
SH-W890-S | Women’s Mountain Shorts, S |
Next you have been asked to retrieve the product number, name, and list price of products that have a product number beginning with ‘BK-’.
-- select the ProductNumber, Name, and ListPrice columns
SELECT ProductNumber, Name, ListPrice
FROM SalesLT.Product
-- filter for product numbers beginning with BK- using LIKE
WHERE ProductNumber LIKE 'BK-%';
ProductNumber | Name | ListPrice |
---|---|---|
BK-R93R-62 | Road-150 Red, 62 | 3578.27 |
BK-R93R-44 | Road-150 Red, 44 | 3578.27 |
BK-R93R-48 | Road-150 Red, 48 | 3578.27 |
BK-R93R-52 | Road-150 Red, 52 | 3578.27 |
BK-R93R-56 | Road-150 Red, 56 | 3578.27 |
BK-R68R-58 | Road-450 Red, 58 | 1457.99 |
BK-R68R-60 | Road-450 Red, 60 | 1457.99 |
BK-R68R-44 | Road-450 Red, 44 | 1457.99 |
BK-R68R-48 | Road-450 Red, 48 | 1457.99 |
BK-R68R-52 | Road-450 Red, 52 | 1457.99 |
Finally, the product manager is interested in a slight variation of the last request regarding product numbers with a particular prefix.
We are interested in products with product number beginning with ‘BK-’ followed by any character other than ‘R’, and ending with a ‘-’ followed by any two numerals. Not an R is [^R].
-- select the ProductNumber, Name, and ListPrice columns
SELECT ProductNumber, Name, ListPrice
FROM SalesLT.Product
-- filter for ProductNumbers
WHERE ProductNumber LIKE 'BK-[^R]%-[0-9][0-9]';
ProductNumber | Name | ListPrice |
---|---|---|
BK-M82S-38 | Mountain-100 Silver, 38 | 3399.99 |
BK-M82S-42 | Mountain-100 Silver, 42 | 3399.99 |
BK-M82S-44 | Mountain-100 Silver, 44 | 3399.99 |
BK-M82S-48 | Mountain-100 Silver, 48 | 3399.99 |
BK-M82B-38 | Mountain-100 Black, 38 | 3374.99 |
BK-M82B-42 | Mountain-100 Black, 42 | 3374.99 |
BK-M82B-44 | Mountain-100 Black, 44 | 3374.99 |
BK-M82B-48 | Mountain-100 Black, 48 | 3374.99 |
BK-M68S-38 | Mountain-200 Silver, 38 | 2319.99 |
BK-M68S-42 | Mountain-200 Silver, 42 | 2319.99 |
1.3 Querying Tables with Joins
We usually join tables based on primary key - foreign key relationships. We don’t run two queries then join, but match at the time of the query. When we are talking about joins, we typically represent them as Venn diagrams.
The convention (ANSI SQL-92) is to specify the JOIN operator in the FROM clause:
SELECT … FROM Table1 JOIN Table 2 ON
;
There is an older standard (ANSI SQL-89) where the tables are joined using commas in the FROM clause and using a WHERE operator, but this can lead to accidental cartesian (aka cross) products.
1.3.1 INNER Joins
INNER Joins are typically the most common join type. It involves a join only where a match is found in both input tables. You can add multiple joins after each other.
Some examples - first a basic inner join where the schema, table and column are explicitly stated.
SELECT SalesLT.Product.Name AS ProductName, SalesLT.ProductCategory.Name AS Category
FROM SalesLT.Product
INNER JOIN SalesLT.ProductCategory
ON SalesLT.Product.ProductCategoryID = SalesLT.ProductCategory.ProductCategoryID;
ProductName | Category |
---|---|
Mountain-100 Silver, 38 | Mountain Bikes |
Mountain-100 Silver, 42 | Mountain Bikes |
Mountain-100 Silver, 44 | Mountain Bikes |
Mountain-100 Silver, 48 | Mountain Bikes |
Mountain-100 Black, 38 | Mountain Bikes |
Mountain-100 Black, 42 | Mountain Bikes |
Mountain-100 Black, 44 | Mountain Bikes |
Mountain-100 Black, 48 | Mountain Bikes |
Mountain-200 Silver, 38 | Mountain Bikes |
Mountain-200 Silver, 42 | Mountain Bikes |
Next, we can do the same query but make this less cumbersome by using table aliases.
SELECT p.Name AS ProductName, c.Name AS Category
FROM SalesLT.Product AS p
INNER JOIN SalesLT.ProductCategory AS c
on p.ProductCategoryID = c.ProductCategoryID;
ProductName | Category |
---|---|
Mountain-100 Silver, 38 | Mountain Bikes |
Mountain-100 Silver, 42 | Mountain Bikes |
Mountain-100 Silver, 44 | Mountain Bikes |
Mountain-100 Silver, 48 | Mountain Bikes |
Mountain-100 Black, 38 | Mountain Bikes |
Mountain-100 Black, 42 | Mountain Bikes |
Mountain-100 Black, 44 | Mountain Bikes |
Mountain-100 Black, 48 | Mountain Bikes |
Mountain-200 Silver, 38 | Mountain Bikes |
Mountain-200 Silver, 42 | Mountain Bikes |
Next, we look at joining multiple tables, where we want the sales, including order level details, the products in the order and the product details. If we don’t specify the join type, the assumption is it is a INNER join, as shown below.
SELECT oh.OrderDate, oh.SalesOrderNumber, p.Name As ProductName, od.OrderQty, od.UnitPrice, od.LineTotal
FROM SalesLT.SalesOrderHeader AS oh
JOIN SalesLT.SalesOrderDetail AS od
ON od.SalesOrderID = oh.SalesOrderID
JOIN SalesLT.Product AS p
ON od.ProductID = p.ProductID
ORDER BY oh.OrderDate, oh.SalesOrderID, od.SalesOrderDetailID;
OrderDate | SalesOrderNumber | ProductName | OrderQty | UnitPrice | LineTotal |
---|---|---|---|---|---|
2008-06-01 | SO71774 | ML Road Frame-W - Yellow, 48 | 1 | 356.898 | 356.8980 |
2008-06-01 | SO71774 | ML Road Frame-W - Yellow, 38 | 1 | 356.898 | 356.8980 |
2008-06-01 | SO71776 | Rear Brakes | 1 | 63.900 | 63.9000 |
2008-06-01 | SO71780 | ML Mountain Frame-W - Silver, 42 | 4 | 218.454 | 873.8160 |
2008-06-01 | SO71780 | Mountain-400-W Silver, 46 | 2 | 461.694 | 923.3880 |
2008-06-01 | SO71780 | Mountain-500 Silver, 52 | 6 | 112.998 | 406.7928 |
2008-06-01 | SO71780 | HL Mountain Frame - Silver, 38 | 2 | 818.700 | 1637.4000 |
2008-06-01 | SO71780 | Mountain-500 Black, 42 | 1 | 323.994 | 323.9940 |
2008-06-01 | SO71780 | LL Mountain Frame - Black, 48 | 1 | 149.874 | 149.8740 |
2008-06-01 | SO71780 | HL Mountain Frame - Black, 42 | 1 | 809.760 | 809.7600 |
It is possible to do joins based on more than one criteria, e.g. we could join based on the productID and where the ListPrice is less than the unitprice i.e. there has been a discount.
-- Multiple join predicates
SELECT oh.OrderDate, oh.SalesOrderNumber, p.Name As ProductName, od.OrderQty, od.UnitPrice, od.LineTotal
FROM SalesLT.SalesOrderHeader AS oh
JOIN SalesLT.SalesOrderDetail AS od
ON od.SalesOrderID = oh.SalesOrderID
JOIN SalesLT.Product AS p
ON od.ProductID = p.ProductID AND od.UnitPrice < p.ListPrice --Note multiple predicates
ORDER BY oh.OrderDate, oh.SalesOrderID, od.SalesOrderDetailID;
OrderDate | SalesOrderNumber | ProductName | OrderQty | UnitPrice | LineTotal |
---|---|---|---|---|---|
2008-06-01 | SO71774 | ML Road Frame-W - Yellow, 48 | 1 | 356.898 | 356.8980 |
2008-06-01 | SO71774 | ML Road Frame-W - Yellow, 38 | 1 | 356.898 | 356.8980 |
2008-06-01 | SO71776 | Rear Brakes | 1 | 63.900 | 63.9000 |
2008-06-01 | SO71780 | ML Mountain Frame-W - Silver, 42 | 4 | 218.454 | 873.8160 |
2008-06-01 | SO71780 | Mountain-400-W Silver, 46 | 2 | 461.694 | 923.3880 |
2008-06-01 | SO71780 | Mountain-500 Silver, 52 | 6 | 112.998 | 406.7928 |
2008-06-01 | SO71780 | HL Mountain Frame - Silver, 38 | 2 | 818.700 | 1637.4000 |
2008-06-01 | SO71780 | Mountain-500 Black, 42 | 1 | 323.994 | 323.9940 |
2008-06-01 | SO71780 | LL Mountain Frame - Black, 48 | 1 | 149.874 | 149.8740 |
2008-06-01 | SO71780 | HL Mountain Frame - Black, 42 | 1 | 809.760 | 809.7600 |
1.3.2 OUTER Joins
In an outer join we return all the rows from one table, and any matching rows from the second table. The records in the ‘outer’ table are preserved, typically we use language such as LEFT, RIGHT and FULL keywords. We are pulling in records from that OUTER table. FULL keeps records from both tables, but are typically not seen in practice. OUTER is OPTIONAL e.g. LEFT JOIN is the same as LEFT OUTER JOIN.
Now some examples. First, we bring up a list of customers, with any matching sales records i.e. we have a list of customers who HAVE bought someting and those WHO HAVE NOT.
--Get all customers, with sales orders for those who've bought anything
SELECT c.FirstName, c.LastName, oh.SalesOrderNumber
FROM SalesLT.Customer AS c
LEFT OUTER JOIN SalesLT.SalesOrderHeader AS oh
ON c.CustomerID = oh.CustomerID
ORDER BY c.CustomerID;
FirstName | LastName | SalesOrderNumber |
---|---|---|
Orlando | Gee | NA |
Keith | Harris | NA |
Donna | Carreras | NA |
Janet | Gates | NA |
Lucy | Harrington | NA |
Rosmarie | Carroll | NA |
Dominic | Gash | NA |
Kathleen | Garza | NA |
Katherine | Harding | NA |
Johnny | Caprio | NA |
Next we can look just for those customers who have not purchased anything using IS NULL.
--Return only customers who haven't purchased anything
SELECT c.FirstName, c.LastName, oh.SalesOrderNumber
FROM SalesLT.Customer AS c
LEFT OUTER JOIN SalesLT.SalesOrderHeader AS oh
ON c.CustomerID = oh.CustomerID
WHERE oh.SalesOrderNumber IS NULL
ORDER BY c.CustomerID;
FirstName | LastName | SalesOrderNumber |
---|---|---|
Orlando | Gee | NA |
Keith | Harris | NA |
Donna | Carreras | NA |
Janet | Gates | NA |
Lucy | Harrington | NA |
Rosmarie | Carroll | NA |
Dominic | Gash | NA |
Kathleen | Garza | NA |
Katherine | Harding | NA |
Johnny | Caprio | NA |
Next we add records from multiple tables. If we add tables on to the chain of tables, having first decleared a left or right join, you have to keep using LEFT joins. You could use an INNER Join but you would loose some records e.g. if you used an INNER join on the second table below, you would loose those records (products) that had never been sold.
Sometimes it is neccessary to go through one table to get to another e.g. to Products -> Orders requires going through order details first. Even if, we are not brining back any tables from the intermediary table.
--More than 2 tables
SELECT p.Name As ProductName, oh.SalesOrderNumber
FROM SalesLT.Product AS p
LEFT JOIN SalesLT.SalesOrderDetail AS od
ON p.ProductID = od.ProductID
LEFT JOIN SalesLT.SalesOrderHeader AS oh --Additional tables added to the right must also use a left join
ON od.SalesOrderID = oh.SalesOrderID
ORDER BY p.ProductID;
ProductName | SalesOrderNumber |
---|---|
HL Road Frame - Black, 58 | NA |
HL Road Frame - Red, 58 | NA |
Sport-100 Helmet, Red | SO71782 |
Sport-100 Helmet, Red | SO71783 |
Sport-100 Helmet, Red | SO71784 |
Sport-100 Helmet, Red | SO71797 |
Sport-100 Helmet, Red | SO71902 |
Sport-100 Helmet, Red | SO71936 |
Sport-100 Helmet, Red | SO71938 |
Sport-100 Helmet, Black | SO71782 |
Next another example with multiple tables, but this time the order of the tables is different. We SELECT from the product table, then we LEFT Join the Order details so we can identify products that have never sold, then we LEFT JOIN to the order header table so we can get the order number where we use a left outer join as we are left joining those records to our original table. Then we INNER Join product category, because we are joining product category back to the first table - Products. The final table joins back to our original table, before the outer joins. So whether you need to use an OUTER or INNER join depends on where you wish to place the records based on the current list of tables, however you could reorder the tables to do it differently e.g. do the INNER JOIN first, then then LEFT JOINS.
SELECT p.Name As ProductName, c.Name AS Category, oh.SalesOrderNumber
FROM SalesLT.Product AS p
LEFT OUTER JOIN SalesLT.SalesOrderDetail AS od
ON p.ProductID = od.ProductID
LEFT OUTER JOIN SalesLT.SalesOrderHeader AS oh
ON od.SalesOrderID = oh.SalesOrderID
INNER JOIN SalesLT.ProductCategory AS c --Added to the left, so can use inner join
ON p.ProductCategoryID = c.ProductCategoryID
ORDER BY p.ProductID;
ProductName | Category | SalesOrderNumber |
---|---|---|
HL Road Frame - Black, 58 | Road Frames | NA |
HL Road Frame - Red, 58 | Road Frames | NA |
Sport-100 Helmet, Red | Helmets | SO71782 |
Sport-100 Helmet, Red | Helmets | SO71783 |
Sport-100 Helmet, Red | Helmets | SO71784 |
Sport-100 Helmet, Red | Helmets | SO71797 |
Sport-100 Helmet, Red | Helmets | SO71902 |
Sport-100 Helmet, Red | Helmets | SO71936 |
Sport-100 Helmet, Red | Helmets | SO71938 |
Sport-100 Helmet, Black | Helmets | SO71782 |
** Key Points**
- Use a Left Outer Join to include all rows from the first table and values from matched rows in the second table. Columns in the second table for which no matching rows exist are populated with NULLs.
- Use a Right Outer Join to include all rows from the second table and values from matched rows in the first table. Columns in the first table for which no matching rows exist are populated with NULLs.
- Use a Full Outer Join to include all rows from the first and second tables. Columns in the either table for which no matching rows exist are populated with NULLs.
1.3.3 Cross Joins
Cross Joins create caretesian products - they combine each row from the first table with each row from the second table - to give all possible combinations of products.
One example of it being used would be if we have a list of staff with their attributes, we might want to compare them to the attributes required for all jobs to see which they match, perhaps as an internal job search. Another example is if we have two list of addresses and we calculate the edit distance between the two addresses from the different sources, to try and match those addresses. We would rank the addresses based on their edit distance, with those scoring highest the closest and most likely to be the same address. It can be used to generate test data also.
An example
--Call each customer once per product - perhaps not the most realistic example!
SELECT p.Name, c.FirstName, c.LastName, c.Phone
FROM SalesLT.Product as p
CROSS JOIN SalesLT.Customer as c;
Name | FirstName | LastName | Phone |
---|---|---|---|
All-Purpose Bike Stand | Orlando | Gee | 245-555-0173 |
All-Purpose Bike Stand | Keith | Harris | 170-555-0127 |
All-Purpose Bike Stand | Donna | Carreras | 279-555-0130 |
All-Purpose Bike Stand | Janet | Gates | 710-555-0173 |
All-Purpose Bike Stand | Lucy | Harrington | 828-555-0186 |
All-Purpose Bike Stand | Rosmarie | Carroll | 244-555-0112 |
All-Purpose Bike Stand | Dominic | Gash | 192-555-0173 |
All-Purpose Bike Stand | Kathleen | Garza | 150-555-0127 |
All-Purpose Bike Stand | Katherine | Harding | 926-555-0159 |
All-Purpose Bike Stand | Johnny | Caprio | 112-555-0191 |
1.3.4 Self Joins
You might want to join data on to itself but in a different sequence. For instance, we might want to join a person’s manager on to their employee(s), but managers are also employees, so this would be a self join. We would use aliases for the table names as we have the same table twice. So when defining a self-join, you must specify an alias for at least one instance of the table being joined.
So our original table looks like this:
SELECT *
FROM SalesLT.Employee;
EmployeeID | EmployeeName | ManagerID |
---|---|---|
1 | adventure-works8 | 8 |
2 | adventure-works1 | 1 |
3 | adventure-works0 | NA |
4 | adventure-works0 | 3 |
5 | adventure-works1 | 1 |
6 | adventure-works3 | 3 |
7 | adventure-works9 | 9 |
8 | adventure-works0 | 3 |
9 | adventure-works0 | 3 |
Then the actual self join query - we use a left join as some people i.e. the CEO, will not have a manager.
SELECT e.EmployeeName, m.EmployeeName AS ManagerName
FROM SalesLT.Employee AS e
LEFT JOIN SalesLT.Employee AS m
ON e.ManagerID = m.EmployeeID
ORDER BY e.ManagerID;
EmployeeName | ManagerName |
---|---|
adventure-works0 | NA |
adventure-works1 | adventure-works8 |
adventure-works1 | adventure-works8 |
adventure-works3 | adventure-works0 |
adventure-works0 | adventure-works0 |
adventure-works0 | adventure-works0 |
adventure-works0 | adventure-works0 |
adventure-works8 | adventure-works0 |
adventure-works9 | adventure-works0 |
1.3.5 Lab Exercises
Write a query that returns the company name from the Sale.Customer table, the sales order ID and total due from the SalesLT.SalesOrderHeader table.
-- select the CompanyName, SalesOrderId, and TotalDue columns from the appropriate tables
SELECT c.CompanyName, oh.SalesOrderId, oh.TotalDue
FROM SalesLT.Customer AS c
JOIN SalesLT.SalesOrderHeader AS oh
-- join tables based on CustomerID
ON c.CustomerID = oh.CustomerID;
CompanyName | SalesOrderId | TotalDue |
---|---|---|
Professional Sales and Service | 71782 | 43962.7901 |
Remarkable Bike Store | 71935 | 7330.8972 |
Bulk Discount Store | 71938 | 98138.2131 |
Coalition Bike Company | 71899 | 2669.3183 |
Futuristic Bikes | 71895 | 272.6468 |
Channel Outlet | 71885 | 608.1766 |
Aerobic Exercise Company | 71915 | 2361.6403 |
Vigorous Sports Store | 71867 | 1170.5376 |
Thrilling Bike Tours | 71858 | 15275.1977 |
Extreme Riding Supplies | 71796 | 63686.2708 |
In order to send out invoices to the customers, we need their addresses. Extend your customer orders query to include the main office address for each customer, including the full street address, city, state or province, postal code, and country or region.
SELECT c.CompanyName, a.AddressLine1, ISNULL(a.AddressLine2, '') AS AddressLine2, a.City, a.StateProvince, a.PostalCode, a.CountryRegion, oh.SalesOrderID, oh.TotalDue
FROM SalesLT.Customer AS c
-- join the SalesOrderHeader table
JOIN SalesLT.SalesOrderHeader AS oh
ON oh.CustomerID = c.CustomerID
-- join the CustomerAddress table
JOIN SalesLT.CustomerAddress AS ca
-- filter for where the AddressType is 'Main Office'
ON c.CustomerID = ca.CustomerID AND AddressType = 'Main Office'
JOIN SalesLT.Address AS a
ON ca.AddressID = a.AddressID;
CompanyName | AddressLine1 | AddressLine2 | City | StateProvince | PostalCode | CountryRegion | SalesOrderID | TotalDue |
---|---|---|---|---|---|---|---|---|
Good Toys | 99700 Bell Road | Auburn | California | 95603 | United States | 71774 | 972.7850 | |
West Side Mart | 251 The Metro Center | Wokingham | England | RG41 1QW | United Kingdom | 71776 | 87.0851 | |
Nearby Cycle Shop | Burgess Hill | Edward Way | West Sussex | England | RH15 9UD | United Kingdom | 71780 | 42452.6519 |
Professional Sales and Service | 57251 Serene Blvd | Van Nuys | California | 91411 | United States | 71782 | 43962.7901 | |
Eastside Department Store | 9992 Whipple Rd | Union City | California | 94587 | United States | 71783 | 92663.5609 | |
Action Bicycle Specialists | Warrington Ldc Unit 25/2 | Woolston | England | WA1 4SY | United Kingdom | 71784 | 119960.8240 | |
Extreme Riding Supplies | Riverside | Sherman Oaks | California | 91403 | United States | 71796 | 63686.2708 | |
Riding Cycles | Galashiels | Liverpool | England | L4 4HB | United Kingdom | 71797 | 86222.8072 | |
Thrifty Parts and Sales | Oxnard Outlet | Oxnard | California | 93030 | United States | 71815 | 1261.4440 | |
Engineered Bike Systems | 123 Camelia Avenue | Oxnard | California | 93030 | United States | 71816 | 3754.9733 |
The sales manager wants a list of all customer companies and their contacts (first name and last name), showing the sales order ID and total due for each order they have placed. Customers who have not placed any orders should be included at the bottom of the list with NULL values for the order ID and total due.
-- select the CompanyName, FirstName, LastName, SalesOrderID and TotalDue columns
-- from the appropriate tables
SELECT c.CompanyName, c.FirstName, c.LastName, oh.SalesOrderID, oh.TotalDue
FROM SalesLT.Customer AS c
LEFT JOIN SalesLT.SalesOrderHeader AS oh
-- join based on CustomerID
ON oh.CustomerID = c.CustomerID
-- order the SalesOrderIDs from highest to lowest
ORDER BY oh.SalesOrderID DESC;
CompanyName | FirstName | LastName | SalesOrderID | TotalDue |
---|---|---|---|---|
Central Bicycle Specialists | Janeth | Esteves | 71946 | 43.0437 |
Bulk Discount Store | Christopher | Beck | 71938 | 98138.2131 |
Metropolitan Bicycle Supply | Krishna | Sunkammurali | 71936 | 108597.9536 |
Remarkable Bike Store | Cory | Booth | 71935 | 7330.8972 |
The Bicycle Accessories Company | Guy | Gilbert | 71923 | 117.7276 |
Discount Tours | Melissa | Marple | 71920 | 3293.7761 |
Essential Bike Works | Linda | Mitchell | 71917 | 45.1995 |
Aerobic Exercise Company | Rosmarie | Carroll | 71915 | 2361.6403 |
Many Bikes Store | Jeffrey | Kurtz | 71902 | 81834.9826 |
Coalition Bike Company | Donald | Blanton | 71899 | 2669.3183 |
A sales employee has noticed that AdventureWorks does not have address information for all customers. Write a query that returns a list of customer IDs, company names, contact names (first name and last name), and phone numbers for customers with no address stored in the database.
SELECT c.CompanyName, c.FirstName, c.LastName, c.Phone
FROM SalesLT.Customer AS c
LEFT JOIN SalesLT.CustomerAddress AS ca
-- join based on CustomerID
ON c.CustomerID = ca.CustomerID
-- filter for when the AddressID doesn't exist
WHERE ca.AddressID IS NULL;
CompanyName | FirstName | LastName | Phone |
---|---|---|---|
A Bike Store | Orlando | Gee | 245-555-0173 |
Progressive Sports | Keith | Harris | 170-555-0127 |
Advanced Bike Components | Donna | Carreras | 279-555-0130 |
Modular Cycle Systems | Janet | Gates | 710-555-0173 |
Metropolitan Sports Supply | Lucy | Harrington | 828-555-0186 |
Aerobic Exercise Company | Rosmarie | Carroll | 244-555-0112 |
Associated Bikes | Dominic | Gash | 192-555-0173 |
Rural Cycle Emporium | Kathleen | Garza | 150-555-0127 |
Sharp Bikes | Katherine | Harding | 926-555-0159 |
Bikes and Motorbikes | Johnny | Caprio | 112-555-0191 |
Some customers have never placed orders, and some products have never been ordered.
Write a query that returns a column of customer IDs for customers who have never placed an order, and a column of product IDs for products that have never been ordered.
Each row with a customer ID should have a NULL product ID (because the customer has never ordered a product) and each row with a product ID should have a NULL customer ID (because the product has never been ordered by a customer).
SELECT c.CustomerID, p.ProductID
FROM SalesLT.Customer AS c
FULL JOIN SalesLT.SalesOrderHeader AS oh
ON c.CustomerID = oh.CustomerID
FULL JOIN SalesLT.SalesOrderDetail AS od
-- join based on the SalesOrderID
ON od.SalesOrderID = oh.SalesOrderID
FULL JOIN SalesLT.Product AS p
-- join based on the ProductID
ON p.ProductID = od.ProductID
-- filter for nonexistent SalesOrderIDs
WHERE oh.SalesOrderID IS NULL
ORDER BY ProductID, CustomerID;
CustomerID | ProductID |
---|---|
1 | NA |
2 | NA |
3 | NA |
4 | NA |
5 | NA |
6 | NA |
7 | NA |
10 | NA |
11 | NA |
12 | NA |
1.4 Using Set Operators
A union query is unlike a join, where as a join adds more columns, a union typically adds more rows. You put all records from one query on to the records at the end of another query. NOTE that it is a list of distinct (non duplicate) records - this will be checked every single row in one table and then checks if that record exists across every single row in the other table. Obviously as more tables are added, checking this can become more time consuming and affects performance.
UNION ALL will not undertake this checking, leading to more performance, but some duplicates. John Smith may appear in the employees table but he may also appear in the customers table. Sometimes this is the desired result. In such an instance it makes sense to add a new field using an alias at the time of ther query, such as record type, so we know where this record occurs.
When using Union:
- It is a good idea to use column aliases, so we know which table the column occurs in. However, only aliases in the first query are recognised, so any aliases should be set against the first table
- The number of columns must be the same across tables, you can add an additional column but this must be given a specific value or a NULL
- The data types must be approximatley similar i.e. we can do an implicit or explict conversion so they match.
The following example joins the Employees to the customer using Union i.e. removing duplicates.
NOTE: In the following example, views were created (using code provided in the course), before running the queries on the database. Therefore to reproduce these results, it is neccessary to run that same code (Module 4, Union sql file, in the CourseFiles zip) before the following queries will run on the AdventureWorksLT db.
SELECT FirstName, LastName
FROM SalesLT.Employees
UNION
SELECT FirstName, LastName
FROM SalesLT.Customers
ORDER BY LastName;
FirstName | LastName |
---|---|
Catherine | Abel |
Kim | Abercrombie |
Frances | Adams |
Jay | Adams |
Samuel | Agcaoili |
Robert | Ahlering |
Stanley | Alan |
Amy | Alberts |
Paul | Alcorn |
Gregory | Alderson |
This gives 440 rows, if however we use UNION ALL
SELECT FirstName, LastName
FROM SalesLT.Employees
UNION ALL
SELECT FirstName, LastName
FROM SalesLT.Customers
ORDER BY LastName;
FirstName | LastName |
---|---|
Catherine | Abel |
Kim | Abercrombie |
Frances | Adams |
Jay | Adams |
Samuel | Agcaoili |
Robert | Ahlering |
Stanley | Alan |
Amy | Alberts |
Paul | Alcorn |
Gregory | Alderson |
This results in 441 rows, so there is 1 row which occurs in both - a Donna Carreras appears both in the customer and emplyee table. This could be the same person or indeed just someone with the same name.
It might be useful to know which table the record occurs in, which is where we can introduce a Type column. Not that after the first alias AS Type, subsequent fields e.g. ‘Customer’ as shown below do not need to be explicitly aliased, however it can make it easier to understand if we do include the AS Type.
SELECT FirstName, LastName, 'Employee' AS Type
FROM SalesLT.Employees
UNION
SELECT FirstName, LastName, 'Customer' AS Type
FROM SalesLT.Customers
ORDER BY LastName
OFFSET 100 ROWS FETCH NEXT 5 ROWS ONLY;
FirstName | LastName | Type |
---|---|---|
Andy | Carothers | Employee |
Donna | Carreras | Customer |
Donna | Carreras | Employee |
Rosmarie | Carroll | Employee |
Joseph | Castellucio | Employee |
Note that even though we have used UNION rather than UNION ALL, this would result in the same number of rows as UNION ALL (441), because the presence of the type column means this is no longer the same person - it is not a duplicate record but a unique one. However, it is worth using UNION ALL to limit the number of records that have to be checked for duplicates, so improving peformance.
By default, UNION eliminates duplicate rows. Specify the ALL option to include duplicates (or to avoid the overhead of checking for duplicates when you know in advance that there are none).
1.4.1 INTERSECT and EXCEPT Queries
In INTERSECT we look at only rows that appear in each set - we are looking for the duplicates. Whereas using a JOIN would append columns to the source table, in set theory, we are looking for rows, so in an INTERSECT we are looking for those rows that appear in both datasets.
IN EXCEPT we are looking for records that appear in one source but not the other. In such a scenario, the order of the tables matters.
If we now try and identify the person who appears in both our datasets, we can identify the person directly.
SELECT FirstName, LastName
FROM SalesLT.Customers
INTERSECT
SELECT FirstName, LastName
FROM SalesLT.Employees;
FirstName | LastName |
---|---|
Donna | Carreras |
Next we want to find the EXCEPT - so let’s find customers who are not employees, so we don’t try and sel products to employees. There are 104 customers, so we should get 103 rows, as we remove Donna Carreras.
SELECT FirstName, LastName
FROM SalesLT.Customers
EXCEPT
SELECT FirstName, LastName
FROM SalesLT.Employees;
FirstName | LastName |
---|---|
Abraham | Swearengin |
Ajay | Manchepalli |
Alan | Steiner |
Alice | Steiner |
Andrea | Thomsen |
Ben | Miller |
Billy | Trent |
Brad | Sutton |
Caroline | Vicknair |
Cecelia | Marshall |
1.4.2 Lab Exercises
Customers can have two kinds of address: a main office address and a shipping address. The accounts department wants to ensure that the main office address is always used for billing, and have asked you to write a query that clearly identifies the different types of address for each customer.
-- select the CompanyName, AddressLine1 columns
-- alias as per the instructions
SELECT CompanyName, AddressLine1, City, 'Billing' AS AddressType
FROM SalesLT.Customer AS c
JOIN SalesLT.CustomerAddress AS ca
-- join based on CustomerID
ON c.CustomerID = ca.CustomerID
-- join another table
JOIN SalesLT.Address AS a
-- join based on AddressID
ON ca.AddressID = a.AddressID
-- filter for where the correct AddressType
WHERE ca.AddressType = 'Main Office';
CompanyName | AddressLine1 | City | AddressType |
---|---|---|---|
Professional Sales and Service | 57251 Serene Blvd | Van Nuys | Billing |
Riders Company | Tanger Factory | Branch | Billing |
Area Bike Accessories | 6900 Sisk Road | Modesto | Billing |
Bicycle Accessories and Kits | Lewiston Mall | Lewiston | Billing |
Valley Bicycle Specialists | Blue Ridge Mall | Kansas City | Billing |
Vinyl and Plastic Goods Corporation | No. 25800-130 King Street West | Toronto | Billing |
Fun Toys and Bikes | 6500 East Grant Road | Tucson | Billing |
Great Bikes | Eastridge Mall | Casper | Billing |
Valley Toy Store | 252851 Rowan Place | Richmond | Billing |
Major Sport Suppliers | White Mountain Mall | Rock Springs | Billing |
Adapt the query to retrieve the company name, first line of the street address, city, and a column named AddressType with the value ‘Shipping’ for customers where the address type in the SalesLT.CustomerAddress table is ‘Shipping’.
SELECT c.CompanyName, a.AddressLine1, a.City, 'Shipping' AS AddressType
FROM SalesLT.Customer AS c
JOIN SalesLT.CustomerAddress AS ca
ON c.CustomerID = ca.CustomerID
JOIN SalesLT.Address AS a
ON ca.AddressID = a.AddressID
WHERE ca.AddressType = 'Shipping';
CompanyName | AddressLine1 | City | AddressType |
---|---|---|---|
Family’s Favorite Bike Shop | 26910 Indela Road | Montreal | Shipping |
Center Cycle Shop | 1318 Lasalle Street | Bothell | Shipping |
Safe Cycles Shop | 2681 Eagle Peak | Bellevue | Shipping |
Modular Cycle Systems | 165 North Main | Austin | Shipping |
Progressive Sports | 7943 Walnut Ave | Renton | Shipping |
Hardware Components | 99 Front Street | Minneapolis | Shipping |
Sample Bike Store | 2000 300th Street | Denver | Shipping |
Racing Toys | 9228 Via Del Sol | Phoenix | Shipping |
Elite Bikes | 9178 Jumping St. | Dallas | Shipping |
All Cycle Shop | 8713 Yosemite Ct. | Bothell | Shipping |
Next we can union all these records together to create a list of shipping and billing addresses, using UNION ALL.
SELECT c.CompanyName, a.AddressLine1, a.City, 'Billing' AS AddressType
FROM SalesLT.Customer AS c
JOIN SalesLT.CustomerAddress AS ca
ON c.CustomerID = ca.CustomerID
JOIN SalesLT.Address AS a
ON ca.AddressID = a.AddressID
WHERE ca.AddressType = 'Main Office'
-- UNION
UNION ALL
SELECT c.CompanyName, a.AddressLine1, a.City, 'Shipping' AS AddressType
FROM SalesLT.Customer AS c
JOIN SalesLT.CustomerAddress AS ca
ON c.CustomerID = ca.CustomerID
JOIN SalesLT.Address AS a
ON ca.AddressID = a.AddressID
WHERE ca.AddressType = 'Shipping'
ORDER BY c.CompanyName
OFFSET 10 ROWS;
CompanyName | AddressLine1 | City | AddressType |
---|---|---|---|
All Cycle Shop | 8713 Yosemite Ct. | Bothell | Shipping |
All Cycle Shop | 25111 228th St Sw | Bothell | Billing |
All Seasons Sports Supply | Ohms Road | Houston | Billing |
Alpine Ski House | 7505 Laguna Boulevard | Elk Grove | Billing |
Alternative Vehicles | 3307 Evergreen Blvd | Washougal | Billing |
Another Bicycle Company | 567 Sw Mcloughlin Blvd | Milwaukie | Billing |
Area Bike Accessories | 6900 Sisk Road | Modesto | Billing |
Area Sheet Metal Supply | 399 Clearing Green | London | Billing |
Associated Bikes | 5420 West 22500 South | Salt Lake City | Billing |
Authentic Sales and Service | 99 Dean Street, Soho | London | Billing |
You have created a master list of all customer addresses, but now you have been asked to create filtered lists that show which customers have only a main office address, and which customers have both a main office and a shipping address.
SELECT c.CompanyName
FROM SalesLT.Customer AS c
INNER JOIN SalesLT.CustomerAddress AS ca
ON c.CustomerID = ca.CustomerID
INNER JOIN SalesLT.Address AS a
ON a.AddressID = ca.AddressID
WHERE ca.AddressType = 'Main Office' --Filters out shipping addresses. This is a table of all Billing addresses.
EXCEPT
SELECT c.CompanyName
FROM SalesLT.Customer AS c
INNER JOIN SalesLT.CustomerAddress AS ca
ON c.CustomerID = ca.CustomerID
INNER JOIN SalesLT.Address AS a
ON a.AddressID = ca.AddressID
WHERE ca.AddressType = 'Shipping'
ORDER BY c.CompanyName;
CompanyName |
---|
A Bike Store |
A Great Bicycle Company |
A Typical Bike Shop |
Acceptable Sales & Service |
Action Bicycle Specialists |
Active Life Toys |
Active Systems |
Advanced Bike Components |
Aerobic Exercise Company |
Affordable Sports Equipment |
Or the INTERSECT version to identify the company name of each company that appears in a table of customers with a ‘Main Office’ address, and also in a table of customers with a ‘Shipping’ address.
SELECT c.CompanyName
FROM SalesLT.Customer AS c
INNER JOIN SalesLT.CustomerAddress AS ca
ON c.CustomerID = ca.CustomerID
INNER JOIN SalesLT.Address AS a
ON a.AddressID = ca.AddressID
WHERE ca.AddressType = 'Main Office' --Filters out shipping addresses. This is a table of all Billing addresses.
INTERSECT
SELECT c.CompanyName
FROM SalesLT.Customer AS c
INNER JOIN SalesLT.CustomerAddress AS ca
ON c.CustomerID = ca.CustomerID
INNER JOIN SalesLT.Address AS a
ON a.AddressID = ca.AddressID
WHERE ca.AddressType = 'Shipping'
ORDER BY c.CompanyName;
CompanyName |
---|
All Cycle Shop |
Center Cycle Shop |
Elite Bikes |
Family’s Favorite Bike Shop |
Hardware Components |
Modular Cycle Systems |
Progressive Sports |
Racing Toys |
Safe Cycles Shop |
Sample Bike Store |
1.5 Using Functions and Aggregating Data
Now we are looking to not just bring back individual rows but perform calculations, such as aggregations, on those results.
The functions we will look at are shown below.
1.5.1 Scalar Functions
A scalar function returns a single value, not a row or column or table. In database design things can be deterministic or non-deterministic
- Deterministic - we know what the result will be, assuming the data hasn’t changed. The data going in and coming out will be the same
- Non-deterministic - we cannot gaurantee what the result will be. For instance if we tested if todays date was less than a value in a table, it would depend on the time and day that the query was run. We can’t say what the result will be, based on the data going in/from the db.
A scalar function can be either deterministic or non-deterministic. Scalar is a set of functions, and can do multiple things - date and times, text and image, mathematical, system and system statistical, metadata and so on.
We might want to extract the year from a date, for instance to determine when a product was first sold.
SELECT YEAR(SellStartDate) SellStartYear, ProductID, Name
FROM SalesLT.Product
ORDER BY SellStartYear;
SellStartYear | ProductID | Name |
---|---|---|
2002 | 680 | HL Road Frame - Black, 58 |
2002 | 706 | HL Road Frame - Red, 58 |
2005 | 707 | Sport-100 Helmet, Red |
2005 | 708 | Sport-100 Helmet, Black |
2005 | 709 | Mountain Bike Socks, M |
2005 | 710 | Mountain Bike Socks, L |
2005 | 711 | Sport-100 Helmet, Blue |
2005 | 712 | AWC Logo Cap |
2005 | 713 | Long-Sleeve Logo Jersey, S |
2005 | 714 | Long-Sleeve Logo Jersey, M |
Other examples might include being able to extract certain parts of the date, like the month or day of the week, this can be achived with the DATENAME function
SELECT YEAR(SellStartDate) SellStartYear, DATENAME(mm,SellStartDate) SellStartMonth,
DAY(SellStartDate) SellStartDay, DATENAME(dw, SellStartDate) SellStartWeekday,
ProductID, Name
FROM SalesLT.Product
ORDER BY SellStartYear;
SellStartYear | SellStartMonth | SellStartDay | SellStartWeekday | ProductID | Name |
---|---|---|---|---|---|
2002 | June | 1 | Saturday | 680 | HL Road Frame - Black, 58 |
2002 | June | 1 | Saturday | 706 | HL Road Frame - Red, 58 |
2005 | July | 1 | Friday | 707 | Sport-100 Helmet, Red |
2005 | July | 1 | Friday | 708 | Sport-100 Helmet, Black |
2005 | July | 1 | Friday | 709 | Mountain Bike Socks, M |
2005 | July | 1 | Friday | 710 | Mountain Bike Socks, L |
2005 | July | 1 | Friday | 711 | Sport-100 Helmet, Blue |
2005 | July | 1 | Friday | 712 | AWC Logo Cap |
2005 | July | 1 | Friday | 713 | Long-Sleeve Logo Jersey, S |
2005 | July | 1 | Friday | 714 | Long-Sleeve Logo Jersey, M |
Or we might want to calculate how long a product has been sold, we can use the DATEDIFF function to work this out.
SELECT DATEDIFF(yy,SellStartDate, GETDATE()) YearsSold,
DATEDIFF(mm,SellStartDate, GETDATE()) MonthsSold,
ProductID, Name
FROM SalesLT.Product
ORDER BY ProductID;
YearsSold | MonthsSold | ProductID | Name |
---|---|---|---|
16 | 190 | 680 | HL Road Frame - Black, 58 |
16 | 190 | 706 | HL Road Frame - Red, 58 |
13 | 153 | 707 | Sport-100 Helmet, Red |
13 | 153 | 708 | Sport-100 Helmet, Black |
13 | 153 | 709 | Mountain Bike Socks, M |
13 | 153 | 710 | Mountain Bike Socks, L |
13 | 153 | 711 | Sport-100 Helmet, Blue |
13 | 153 | 712 | AWC Logo Cap |
13 | 153 | 713 | Long-Sleeve Logo Jersey, S |
13 | 153 | 714 | Long-Sleeve Logo Jersey, M |
Another common example might be to convert text to upper case.
SELECT UPPER(Name) AS ProductName
FROM SalesLT.Product;
ProductName |
---|
ALL-PURPOSE BIKE STAND |
AWC LOGO CAP |
BIKE WASH - DISSOLVER |
CABLE LOCK |
CHAIN |
CLASSIC VEST, L |
CLASSIC VEST, M |
CLASSIC VEST, S |
FENDER SET - MOUNTAIN |
FRONT BRAKES |
We might want to add two strings together but with a space or other text, so we could use the CONCAT function to achieve this.
SELECT CONCAT(FirstName + ' ', LastName) AS FullName,
FirstName, LastName
FROM SalesLT.Customer;
FullName | FirstName | LastName |
---|---|---|
Orlando Gee | Orlando | Gee |
Keith Harris | Keith | Harris |
Donna Carreras | Donna | Carreras |
Janet Gates | Janet | Gates |
Lucy Harrington | Lucy | Harrington |
Rosmarie Carroll | Rosmarie | Carroll |
Dominic Gash | Dominic | Gash |
Kathleen Garza | Kathleen | Garza |
Katherine Harding | Katherine | Harding |
Johnny Caprio | Johnny | Caprio |
Or we might want to return just a specific number of charecters from a string, perhaps there is a structure in the sequence - like the first two chars relate to a product type - so we want to just extract these first two.
SELECT Name, ProductNumber, LEFT(ProductNumber, 2) AS ProductType
FROM SalesLT.Product;
Name | ProductNumber | ProductType |
---|---|---|
HL Road Frame - Black, 58 | FR-R92B-58 | FR |
HL Road Frame - Red, 58 | FR-R92R-58 | FR |
Sport-100 Helmet, Red | HL-U509-R | HL |
Sport-100 Helmet, Black | HL-U509 | HL |
Mountain Bike Socks, M | SO-B909-M | SO |
Mountain Bike Socks, L | SO-B909-L | SO |
Sport-100 Helmet, Blue | HL-U509-B | HL |
AWC Logo Cap | CA-1098 | CA |
Long-Sleeve Logo Jersey, S | LJ-0192-S | LJ |
Long-Sleeve Logo Jersey, M | LJ-0192-M | LJ |
Or we might have something more complex, where we want to find certain elements of text, we can use CHARINDEX to identify where a char occurs, then combine it with SUBSTRING to extract just a portion of the full string which is n chars before or after a particular charecter.
SELECT Name, ProductNumber, LEFT(ProductNumber, 2) AS ProductType,
SUBSTRING(ProductNumber,CHARINDEX('-', ProductNumber) + 1, 4) AS ModelCode,
SUBSTRING(ProductNumber, LEN(ProductNumber) - CHARINDEX('-', REVERSE(RIGHT(ProductNumber, 3))) + 2, 2) AS SizeCode
FROM SalesLT.Product;
Name | ProductNumber | ProductType | ModelCode | SizeCode |
---|---|---|---|---|
HL Road Frame - Black, 58 | FR-R92B-58 | FR | R92B | 58 |
HL Road Frame - Red, 58 | FR-R92R-58 | FR | R92R | 58 |
Sport-100 Helmet, Red | HL-U509-R | HL | U509 | R |
Sport-100 Helmet, Black | HL-U509 | HL | U509 | |
Mountain Bike Socks, M | SO-B909-M | SO | B909 | M |
Mountain Bike Socks, L | SO-B909-L | SO | B909 | L |
Sport-100 Helmet, Blue | HL-U509-B | HL | U509 | B |
AWC Logo Cap | CA-1098 | CA | 1098 | |
Long-Sleeve Logo Jersey, S | LJ-0192-S | LJ | 0192 | S |
Long-Sleeve Logo Jersey, M | LJ-0192-M | LJ | 0192 | M |
1.5.2 Logical Functions
Logical functions test if someting is true or not i.e. traditional boolean. But we can use logical functions as filters by using CHOOSE for example, you could use IIF which is also a logical, or we could use CASE to achieve the same result.
An example might be if we want to return some numeric sizes, for instance if we have a table of data with different size formats:
SELECT Name, Size
FROM SalesLT.Product;
Name | Size |
---|---|
HL Road Frame - Black, 58 | 58 |
HL Road Frame - Red, 58 | 58 |
Sport-100 Helmet, Red | NA |
Sport-100 Helmet, Black | NA |
Mountain Bike Socks, M | M |
Mountain Bike Socks, L | L |
Sport-100 Helmet, Blue | NA |
AWC Logo Cap | NA |
Long-Sleeve Logo Jersey, S | S |
Long-Sleeve Logo Jersey, M | M |
We might only be interested in those that have a numeric type, even though the data is in a char data string, 1 in this instance means true:
SELECT Name, Size AS NumericSize
FROM SalesLT.Product
WHERE ISNUMERIC(Size) = 1;
Name | NumericSize |
---|---|
HL Road Frame - Black, 58 | 58 |
HL Road Frame - Red, 58 | 58 |
HL Road Frame - Red, 62 | 62 |
HL Road Frame - Red, 44 | 44 |
HL Road Frame - Red, 48 | 48 |
HL Road Frame - Red, 52 | 52 |
HL Road Frame - Red, 56 | 56 |
LL Road Frame - Black, 58 | 58 |
LL Road Frame - Black, 60 | 60 |
LL Road Frame - Black, 62 | 62 |
Or we might want to assign a value to something, like a product type of bike to certain values and other to everything else.
SELECT Name, IIF(ProductCategoryID IN (5,6,7), 'Bike', 'Other') AS ProductType
FROM SalesLT.Product;
Name | ProductType |
---|---|
HL Road Frame - Black, 58 | Other |
HL Road Frame - Red, 58 | Other |
Sport-100 Helmet, Red | Other |
Sport-100 Helmet, Black | Other |
Mountain Bike Socks, M | Other |
Mountain Bike Socks, L | Other |
Sport-100 Helmet, Blue | Other |
AWC Logo Cap | Other |
Long-Sleeve Logo Jersey, S | Other |
Long-Sleeve Logo Jersey, M | Other |
Or a more complicated query where we assign mutiple product types.
SELECT prd.Name AS ProductName, cat.Name AS Category,
CHOOSE (cat.ParentProductCategoryID, 'Bikes','Components','Clothing','Accessories') AS ProductType
FROM SalesLT.Product AS prd
JOIN SalesLT.ProductCategory AS cat
ON prd.ProductCategoryID = cat.ProductCategoryID;
ProductName | Category | ProductType |
---|---|---|
Mountain-100 Silver, 38 | Mountain Bikes | Bikes |
Mountain-100 Silver, 42 | Mountain Bikes | Bikes |
Mountain-100 Silver, 44 | Mountain Bikes | Bikes |
Mountain-100 Silver, 48 | Mountain Bikes | Bikes |
Mountain-100 Black, 38 | Mountain Bikes | Bikes |
Mountain-100 Black, 42 | Mountain Bikes | Bikes |
Mountain-100 Black, 44 | Mountain Bikes | Bikes |
Mountain-100 Black, 48 | Mountain Bikes | Bikes |
Mountain-200 Silver, 38 | Mountain Bikes | Bikes |
Mountain-200 Silver, 42 | Mountain Bikes | Bikes |
1.5.3 Window Functions
A window in this context is a set of rows - a window in to the database or table. For instance, we might want to RANK a set (or window) of data. The query below will pull out the top 100 more expensive products (ordered by list price) then creates a ranking based on this list price, then orders the results by this ranking.
SELECT TOP(100) ProductID, Name, ListPrice,
RANK() OVER(ORDER BY ListPrice DESC) AS RankByPrice
FROM SalesLT.Product AS p
ORDER BY RankByPrice;
ProductID | Name | ListPrice | RankByPrice |
---|---|---|---|
749 | Road-150 Red, 62 | 3578.27 | 1 |
750 | Road-150 Red, 44 | 3578.27 | 1 |
751 | Road-150 Red, 48 | 3578.27 | 1 |
752 | Road-150 Red, 52 | 3578.27 | 1 |
753 | Road-150 Red, 56 | 3578.27 | 1 |
771 | Mountain-100 Silver, 38 | 3399.99 | 6 |
772 | Mountain-100 Silver, 42 | 3399.99 | 6 |
773 | Mountain-100 Silver, 44 | 3399.99 | 6 |
774 | Mountain-100 Silver, 48 | 3399.99 | 6 |
775 | Mountain-100 Black, 38 | 3374.99 | 10 |
Or we might want to partition the results by product category, similar to grouping by product category. This will result in a ranking for each product category. The results might look a little odd at first glance.
SELECT c.Name AS Category, p.Name AS Product, ListPrice,
RANK() OVER(PARTITION BY c.Name ORDER BY ListPrice DESC) AS RankByPrice
FROM SalesLT.Product AS p
JOIN SalesLT.ProductCategory AS c
ON p.ProductCategoryID = c.ProductcategoryID
ORDER BY Category, RankByPrice;
Category | Product | ListPrice | RankByPrice |
---|---|---|---|
Bib-Shorts | Men’s Bib-Shorts, S | 89.99 | 1 |
Bib-Shorts | Men’s Bib-Shorts, M | 89.99 | 1 |
Bib-Shorts | Men’s Bib-Shorts, L | 89.99 | 1 |
Bike Racks | Hitch Rack - 4-Bike | 120.00 | 1 |
Bike Stands | All-Purpose Bike Stand | 159.00 | 1 |
Bottles and Cages | Mountain Bottle Cage | 9.99 | 1 |
Bottles and Cages | Road Bottle Cage | 8.99 | 2 |
Bottles and Cages | Water Bottle - 30 oz. | 4.99 | 3 |
Bottom Brackets | HL Bottom Bracket | 121.49 | 1 |
Bottom Brackets | ML Bottom Bracket | 101.24 | 2 |
1.5.4 Aggregate Functions
There are a lot of aggregate functions including some statistical functions like s.d. as well as some more standard things like MIN, MAX, SUM etc over a set of data. We can use GROUP BY.
First we can get some headline stats of what is in the table.
SELECT COUNT(*) AS Products, COUNT(DISTINCT ProductCategoryID) AS Categories, AVG(ListPrice) AS AveragePrice
FROM SalesLT.Product;
Products | Categories | AveragePrice |
---|---|---|
296 | 37 | 742.1235 |
Or we might be interested in summary figures for just one product type, like bikes. Note that we use COUNT then the specific item, since using COUNT(*) would return the number of rows, which may include duplicates.
SELECT COUNT(p.ProductID) BikeModels, AVG(p.ListPrice) AveragePrice
FROM SalesLT.Product AS p
JOIN SalesLT.ProductCategory AS c
ON p.ProductCategoryID = c.ProductCategoryID
WHERE c.Name LIKE '%Bikes';
BikeModels | AveragePrice |
---|---|
97 | 1586.737 |
However, our aggregate functions are currently returnign totals. In practice, we may want to return figures by groups, so we need to use GROUP BY in our queries. When we use a GROUP BY which groups up the results, we have to group up everything in the select row that isn’t being aggregated. You can’t have something in the SELECT which is neither being aggregated nor being grouped. As we are grouping the results, we don’t see individual rows of data anymore.
The following example calulates the sales revenue for each sales person.
SELECT c.Salesperson, SUM(oh.SubTotal) SalesRevenue
FROM SalesLT.Customer c
JOIN SalesLT.SalesOrderHeader oh
ON c.CustomerID = oh.CustomerID
GROUP BY c.Salesperson
ORDER BY SalesRevenue DESC;
Salesperson | SalesRevenue |
---|---|
adventure-works0 | 518096.4 |
adventure-works3 | 209219.8 |
adventure-works0 | 138116.9 |
Could it be that some sales people have no sales? If so, we need to return values where the total is NULL, by assigning NULL a value of zero. WE also use the LEFT JOIN rather than the INNER JOIN previously, we get sales people included who haven’t sold anything.
SELECT c.Salesperson, ISNULL(SUM(oh.SubTotal), 0.00) SalesRevenue
FROM SalesLT.Customer c
LEFT JOIN SalesLT.SalesOrderHeader oh
ON c.CustomerID = oh.CustomerID
GROUP BY c.Salesperson
ORDER BY SalesRevenue DESC;
Salesperson | SalesRevenue |
---|---|
adventure-works0 | 518096.4 |
adventure-works3 | 209219.8 |
adventure-works0 | 138116.9 |
adventure-works9 | 0.0 |
adventure-works0 | 0.0 |
adventure-works0 | 0.0 |
adventure-works1 | 0.0 |
adventure-works8 | 0.0 |
adventure-works1 | 0.0 |
Or we might want to know the number of customers each sales person has.
SELECT Salesperson, COUNT(CustomerID) Customers
FROM SalesLT.Customer
GROUP BY Salesperson
ORDER BY Salesperson;
Salesperson | Customers |
---|---|
adventure-works8 | 73 |
adventure-works1 | 78 |
adventure-works0 | 78 |
adventure-works0 | 148 |
adventure-works1 | 142 |
adventure-works3 | 71 |
adventure-works9 | 32 |
adventure-works0 | 74 |
adventure-works0 | 151 |
We might also want to know the sales Revenue by sales person and customer. Note that we now includer the customer details, we cannot use the CUSTOMER alias from the SELECT query in the GROUP BY, as the SELECT query is run after the GROUP BY.
SELECT c.Salesperson, CONCAT(c.FirstName +' ', c.LastName) AS Customer, ISNULL(SUM(oh.SubTotal), 0.00) SalesRevenue
FROM SalesLT.Customer c
LEFT JOIN SalesLT.SalesOrderHeader oh
ON c.CustomerID = oh.CustomerID
GROUP BY c.Salesperson, CONCAT(c.FirstName +' ', c.LastName)
ORDER BY SalesRevenue DESC, Customer;
Salesperson | Customer | SalesRevenue |
---|---|---|
adventure-works0 | Terry Eminhizer | 108561.83 |
adventure-works0 | Krishna Sunkammurali | 98278.69 |
adventure-works0 | Christopher Beck | 88812.86 |
adventure-works3 | Kevin Liu | 83858.43 |
adventure-works0 | Jon Grande | 78029.69 |
adventure-works0 | Jeffrey Kurtz | 74058.81 |
adventure-works0 | Rebecca Laszlo | 63980.99 |
adventure-works3 | Anthony Chor | 57634.63 |
adventure-works0 | Frank Campbell | 41622.05 |
adventure-works3 | Catherine Abel | 39785.33 |
Or the number of products in each category.
SELECT c.Name AS Category, COUNT(p.ProductID) AS Products
FROM SalesLT.Product AS p
JOIN SalesLT.ProductCategory AS c
ON p.ProductCategoryID = c.ProductCategoryID
GROUP BY c.Name
ORDER BY Category;
Category | Products |
---|---|
Bib-Shorts | 3 |
Bike Racks | 1 |
Bike Stands | 1 |
Bottles and Cages | 3 |
Bottom Brackets | 3 |
Brakes | 2 |
Caps | 1 |
Chains | 1 |
Cleaners | 1 |
Cranksets | 3 |
1.5.5 Filtering Groups
If we want to filter, we can use the WHERE clause as we have seen in some previous examples. However, most of the time we filter with the HAVING caluse. HAVING will filter the resutls RATHER than the input.
Having clause provides a search condition that each group must satisfy, for instance where the number of orders by a customer is greater than 10.
In the example below, we are interested in finding out which sales people have more than 100 customers.
SELECT Salesperson, COUNT(CustomerID) Customers
FROM SalesLT.Customer
GROUP BY Salesperson
HAVING COUNT(CustomerID) > 100
ORDER BY Salesperson;
Salesperson | Customers |
---|---|
adventure-works0 | 148 |
adventure-works1 | 142 |
adventure-works0 | 151 |
** Key Points **
- You can use GROUP BY with aggregate functions to return aggregations grouped by one or more columns or expressions.
- All columns in the SELECT clause that are not aggregate function expressions must be included in a GROUP BY clause.
- The order in which columns or expressions are listed in the GROUP BY clause determines the grouping hierarchy.
- You can filter the groups that are included in the query results by specifying a HAVING clause.
1.5.6 Lab Exercises
Write a query to return the product ID of each product, together with the product name formatted as upper case and a column named ApproxWeight with the weight of each product rounded to the nearest whole unit.
SELECT ProductID, UPPER(Name) AS ProductName, ROUND(WEIGHT, 0) AS ApproxWeight
FROM SalesLT.Product;
ProductID | ProductName | ApproxWeight |
---|---|---|
680 | HL ROAD FRAME - BLACK, 58 | 1016 |
706 | HL ROAD FRAME - RED, 58 | 1016 |
707 | SPORT-100 HELMET, RED | NA |
708 | SPORT-100 HELMET, BLACK | NA |
709 | MOUNTAIN BIKE SOCKS, M | NA |
710 | MOUNTAIN BIKE SOCKS, L | NA |
711 | SPORT-100 HELMET, BLUE | NA |
712 | AWC LOGO CAP | NA |
713 | LONG-SLEEVE LOGO JERSEY, S | NA |
714 | LONG-SLEEVE LOGO JERSEY, M | NA |
Extend your query to include columns named SellStartYear and SellStartMonth containing the year and month in which AdventureWorks started selling each product. The month should be displayed as the month name (e.g. ‘January’).
SELECT ProductID, UPPER(Name) AS ProductName, ROUND(Weight, 0) AS ApproxWeight,
YEAR(SellStartDate) as SellStartYear,
DATENAME(m, SellStartDate) as SellStartMonth
FROM SalesLT.Product;
ProductID | ProductName | ApproxWeight | SellStartYear | SellStartMonth |
---|---|---|---|---|
680 | HL ROAD FRAME - BLACK, 58 | 1016 | 2002 | June |
706 | HL ROAD FRAME - RED, 58 | 1016 | 2002 | June |
707 | SPORT-100 HELMET, RED | NA | 2005 | July |
708 | SPORT-100 HELMET, BLACK | NA | 2005 | July |
709 | MOUNTAIN BIKE SOCKS, M | NA | 2005 | July |
710 | MOUNTAIN BIKE SOCKS, L | NA | 2005 | July |
711 | SPORT-100 HELMET, BLUE | NA | 2005 | July |
712 | AWC LOGO CAP | NA | 2005 | July |
713 | LONG-SLEEVE LOGO JERSEY, S | NA | 2005 | July |
714 | LONG-SLEEVE LOGO JERSEY, M | NA | 2005 | July |
Extend your query to include a column named ProductType that contains the leftmost two characters from the product number.
SELECT ProductID, UPPER(Name) AS ProductName, ROUND(Weight, 0) AS ApproxWeight,
YEAR(SellStartDate) as SellStartYear,
DATENAME(m, SellStartDate) as SellStartMonth,
LEFT(ProductNumber, 2) AS ProductType
FROM SalesLT.Product;
ProductID | ProductName | ApproxWeight | SellStartYear | SellStartMonth | ProductType |
---|---|---|---|---|---|
680 | HL ROAD FRAME - BLACK, 58 | 1016 | 2002 | June | FR |
706 | HL ROAD FRAME - RED, 58 | 1016 | 2002 | June | FR |
707 | SPORT-100 HELMET, RED | NA | 2005 | July | HL |
708 | SPORT-100 HELMET, BLACK | NA | 2005 | July | HL |
709 | MOUNTAIN BIKE SOCKS, M | NA | 2005 | July | SO |
710 | MOUNTAIN BIKE SOCKS, L | NA | 2005 | July | SO |
711 | SPORT-100 HELMET, BLUE | NA | 2005 | July | HL |
712 | AWC LOGO CAP | NA | 2005 | July | CA |
713 | LONG-SLEEVE LOGO JERSEY, S | NA | 2005 | July | LJ |
714 | LONG-SLEEVE LOGO JERSEY, M | NA | 2005 | July | LJ |
Extend your query to filter the product returned so that only products with a numeric size are included.
SELECT ProductID, UPPER(Name) AS ProductName, ROUND(Weight, 0) AS ApproxWeight,
YEAR(SellStartDate) as SellStartYear,
DATENAME(m, SellStartDate) as SellStartMonth,
LEFT(ProductNumber, 2) AS ProductType
FROM SalesLT.Product
WHERE ISNUMERIC(SIZE) = 1;
ProductID | ProductName | ApproxWeight | SellStartYear | SellStartMonth | ProductType |
---|---|---|---|---|---|
680 | HL ROAD FRAME - BLACK, 58 | 1016 | 2002 | June | FR |
706 | HL ROAD FRAME - RED, 58 | 1016 | 2002 | June | FR |
717 | HL ROAD FRAME - RED, 62 | 1043 | 2005 | July | FR |
718 | HL ROAD FRAME - RED, 44 | 962 | 2005 | July | FR |
719 | HL ROAD FRAME - RED, 48 | 980 | 2005 | July | FR |
720 | HL ROAD FRAME - RED, 52 | 998 | 2005 | July | FR |
721 | HL ROAD FRAME - RED, 56 | 1016 | 2005 | July | FR |
722 | LL ROAD FRAME - BLACK, 58 | 1116 | 2005 | July | FR |
723 | LL ROAD FRAME - BLACK, 60 | 1125 | 2005 | July | FR |
724 | LL ROAD FRAME - BLACK, 62 | 1134 | 2005 | July | FR |
Write a query that returns a list of company names with a ranking of their place in a list of highest TotalDue values from the SalesOrderHeader table.
SELECT CompanyName, TotalDue AS Revenue,
RANK() OVER (ORDER BY TotalDue DESC) AS RankByRevenue
FROM SalesLT.SalesOrderHeader AS SOH
LEFT JOIN SalesLT.Customer AS C
ON SOH.CustomerID = C.CustomerID;
CompanyName | Revenue | RankByRevenue |
---|---|---|
Action Bicycle Specialists | 119960.82 | 1 |
Metropolitan Bicycle Supply | 108597.95 | 2 |
Bulk Discount Store | 98138.21 | 3 |
Eastside Department Store | 92663.56 | 4 |
Riding Cycles | 86222.81 | 5 |
Many Bikes Store | 81834.98 | 6 |
Instruments and Parts Company | 70698.99 | 7 |
Extreme Riding Supplies | 63686.27 | 8 |
Trailblazing Sports | 45992.37 | 9 |
Professional Sales and Service | 43962.79 | 10 |
Write a query to retrieve a list of the product names and the total revenue calculated as the sum of the LineTotal from the SalesLT.SalesOrderDetail table, with the results sorted in descending order of total revenue.
SELECT Name, SUM(LineTotal) AS TotalRevenue
FROM SalesLT.SalesOrderDetail AS SOD
LEFT JOIN SalesLT.Product AS P
ON SOD.ProductID = p.ProductID
GROUP BY P.Name
ORDER BY TotalRevenue DESC;
Name | TotalRevenue |
---|---|
Touring-1000 Blue, 60 | 37191.49 |
Mountain-200 Black, 42 | 37178.84 |
Road-350-W Yellow, 48 | 36486.24 |
Mountain-200 Black, 38 | 35801.84 |
Touring-1000 Yellow, 60 | 23413.47 |
Touring-1000 Blue, 50 | 22887.07 |
Mountain-200 Silver, 42 | 20879.91 |
Road-350-W Yellow, 40 | 20411.88 |
Mountain-200 Black, 46 | 19277.92 |
Road-350-W Yellow, 42 | 18692.52 |
Modify the previous query to include sales totals for products that have a list price of more than 1000.
SELECT Name, SUM(LineTotal) AS TotalRevenue
FROM SalesLT.SalesOrderDetail AS SOD
JOIN SalesLT.Product AS P
ON SOD.ProductID = P.ProductID
WHERE ListPrice > 1000
GROUP BY P.Name
ORDER BY TotalRevenue DESC;
Name | TotalRevenue |
---|---|
Touring-1000 Blue, 60 | 37191.49 |
Mountain-200 Black, 42 | 37178.84 |
Road-350-W Yellow, 48 | 36486.24 |
Mountain-200 Black, 38 | 35801.84 |
Touring-1000 Yellow, 60 | 23413.47 |
Touring-1000 Blue, 50 | 22887.07 |
Mountain-200 Silver, 42 | 20879.91 |
Road-350-W Yellow, 40 | 20411.88 |
Mountain-200 Black, 46 | 19277.92 |
Road-350-W Yellow, 42 | 18692.52 |
Modify the previous query to only include products with total sales greater than 20000.
SELECT Name, SUM(LineTotal) AS TotalRevenue
FROM SalesLT.SalesOrderDetail AS SOD
JOIN SalesLT.Product AS P
ON SOD.ProductID = P.ProductID
WHERE P.ListPrice > 1000
GROUP BY P.Name
-- add having clause as per instructions
HAVING SUM(LineTotal) > 20000
ORDER BY TotalRevenue DESC;
Name | TotalRevenue |
---|---|
Touring-1000 Blue, 60 | 37191.49 |
Mountain-200 Black, 42 | 37178.84 |
Road-350-W Yellow, 48 | 36486.24 |
Mountain-200 Black, 38 | 35801.84 |
Touring-1000 Yellow, 60 | 23413.47 |
Touring-1000 Blue, 50 | 22887.07 |
Mountain-200 Silver, 42 | 20879.91 |
Road-350-W Yellow, 40 | 20411.88 |
1.6 Sub-queries and Apply
Subqueries are queries within queries (nested). The results of the inner or nested query are pasted to the outer query and behave much like an expression.
Our subquery might be scalr, for instance we might have a single value which is passed to the outer query. So we might be interested in the last order (the maxium id) and the details from that order, where the information is held in two different tables.
SELECT salesorderid, productid, unitprice, orderqty
From SalesLT.SalesOrderDetail
WHERE salesorderid =
(SELECT MAX(salesorderid) AS lastorder
FROM SalesLT.SalesOrderHeader);
salesorderid | productid | unitprice | orderqty |
---|---|---|---|
71946 | 916 | 31.584 | 1 |
The subquery is in brackets, and the system will treat whatever is in brackets as an individual unit.
In a multi-valued query, mutiple values are returned but as a single column set to the outer query. So we might be interested in every customer in Mexico. One will get returned from the subquery is a one dimensional array - a vector.
1.6.2 The Apply Operator
Using an apply operator and a table function can achieve something similar to a correlated query, but in less complicated code. A CROSS APPLY applies the right table expression to each row in the left table, a bit like the correlated sub-query, with CROSS APPLY being similar to a CROSS JOIN.
Similarily OUTER APPLY adds rows for those with NULL in columns for the right table, similar to LEFT OUTER JOIN. In these instances, the right table is the table returned by the APPLY function.
The first thing we need to do is to create a function. Notice we are passing the function the SalesOrderID, and the most expensive item in the order gets passed back.
-- Setup
CREATE FUNCTION SalesLT.udfMaxUnitPrice (@SalesOrderID int)
RETURNS TABLE
AS
RETURN
SELECT SalesOrderID,Max(UnitPrice) as MaxUnitPrice FROM
SalesLT.SalesOrderDetail
WHERE SalesOrderID=@SalesOrderID
GROUP BY SalesOrderID;
To see what code created a function, to help understand what it does at a later time or if created by someone else, we can use the sp_helptext command.
sp_helptext 'saleslt.udfmaxunitprice'
Text |
---|
– Setup |
CREATE FUNCTION SalesLT.udfMaxUnitPrice ((???) int) RETURNS TABLE AS RETURN SELECT SalesOrderID,Max(UnitPrice) as MaxUnitPrice FROM SalesLT.SalesOrderDetail WHERE SalesOrderID=@SalesOrderID GROUP BY SalesOrderID; |
Then we can use that query to get the results. Note this could have been achieved with a correlated sub-query, but this is perahps a little easier to interpret.
SELECT SOH.SalesOrderID, MUP.MaxUnitPrice
FROM SalesLT.SalesOrderDetail AS SOH
CROSS APPLY SalesLT.udfMaxUnitPrice(SOH.SalesOrderID) AS MUP
ORDER BY SOH.SalesOrderID;
SalesOrderID | MaxUnitPrice |
---|---|
71774 | 356.898 |
71774 | 356.898 |
71776 | 63.900 |
71780 | 1391.994 |
71780 | 1391.994 |
71780 | 1391.994 |
71780 | 1391.994 |
71780 | 1391.994 |
71780 | 1391.994 |
71780 | 1391.994 |
** Key Points **
- The APPLY operator enables you to execute a table-valued function for each row in a rowset returned by a SELECT statement. Conceptually, this approach is similar to a correlated subquery.
- CROSS APPLY returns matching rows, similar to an inner join. OUTER APPLY returns all rows in the original SELECT query results with NULL values for rows where no match was found.
1.6.3 Lab Exercises
AdventureWorks products each have a standard cost that indicates the cost of manufacturing the product, and a list price that indicates the recommended selling price for the product. This data is stored in the SalesLT.Product table.
Whenever a product is ordered, the actual unit price at which it was sold is also recorded in the SalesLT.SalesOrderDetail table.
Use subqueries to compare the cost and list prices for each product with the unit prices charged in each sale.
SELECT ProductID, Name, ListPrice
FROM SalesLT.Product
WHERE ListPrice >
(SELECT AVG(UnitPRice) FROM SalesLT.SalesOrderDetail)
ORDER BY ProductID;
ProductID | Name | ListPrice |
---|---|---|
680 | HL Road Frame - Black, 58 | 1431.50 |
706 | HL Road Frame - Red, 58 | 1431.50 |
717 | HL Road Frame - Red, 62 | 1431.50 |
718 | HL Road Frame - Red, 44 | 1431.50 |
719 | HL Road Frame - Red, 48 | 1431.50 |
720 | HL Road Frame - Red, 52 | 1431.50 |
721 | HL Road Frame - Red, 56 | 1431.50 |
731 | ML Road Frame - Red, 44 | 594.83 |
732 | ML Road Frame - Red, 48 | 594.83 |
733 | ML Road Frame - Red, 52 | 594.83 |
AdventureWorks is interested in finding out which products are being sold at a loss. Retrieve the product ID, name, and list price for each product where the list price is 100 or more, and the product has been sold for (strictly) less than 100.
SELECT ProductID, Name, ListPrice
FROM SalesLT.Product
WHERE ProductID IN
(SELECT ProductID FROM SalesLT.SalesOrderDetail
WHERE UnitPrice < 100)
AND ListPrice >= 100
ORDER BY ProductID;
ProductID | Name | ListPrice |
---|---|---|
810 | HL Mountain Handlebars | 120.27 |
813 | HL Road Handlebars | 120.27 |
876 | Hitch Rack - 4-Bike | 120.00 |
894 | Rear Derailleur | 121.46 |
907 | Rear Brakes | 106.50 |
948 | Front Brakes | 106.50 |
996 | HL Bottom Bracket | 121.49 |
In order to get an idea of how many products are selling above or below list price, you want to gather some aggregate product data. Retrieve the product ID, name, cost, and list price for each product along with the average unit price for which that product has been sold.
SELECT ProductID, Name, StandardCost, ListPrice,
(SELECT AVG(UnitPrice)
FROM SalesLT.SalesOrderDetail AS SOD
WHERE P.ProductID = SOD.ProductID) AS AvgSellingPrice
FROM SalesLT.Product AS P
ORDER BY P.ProductID;
ProductID | Name | StandardCost | ListPrice | AvgSellingPrice |
---|---|---|---|---|
680 | HL Road Frame - Black, 58 | 1059.3100 | 1431.50 | NA |
706 | HL Road Frame - Red, 58 | 1059.3100 | 1431.50 | NA |
707 | Sport-100 Helmet, Red | 13.0863 | 34.99 | 20.9940 |
708 | Sport-100 Helmet, Black | 13.0863 | 34.99 | 20.6441 |
709 | Mountain Bike Socks, M | 3.3963 | 9.50 | NA |
710 | Mountain Bike Socks, L | 3.3963 | 9.50 | NA |
711 | Sport-100 Helmet, Blue | 13.0863 | 34.99 | 20.7440 |
712 | AWC Logo Cap | 6.9223 | 8.99 | 5.3740 |
713 | Long-Sleeve Logo Jersey, S | 38.4923 | 49.99 | NA |
714 | Long-Sleeve Logo Jersey, M | 38.4923 | 49.99 | 29.9940 |
AdventureWorks is interested in finding out which products are costing more than they’re being sold for, on average. Filter the query for the previous exercise to include only products where the cost is higher than the average selling price.
SELECT ProductID, Name, StandardCost, ListPrice,
(SELECT AVG(UnitPrice)
FROM SalesLT.SalesOrderDetail AS SOD
WHERE P.ProductID = SOD.ProductID) AS AvgSellingPrice
FROM SalesLT.Product AS P
WHERE StandardCost >
(SELECT AVG(UnitPrice)
FROM SalesLT.SalesOrderDetail AS SOD
WHERE P.ProductID = SOD.ProductID)
ORDER BY P.ProductID;
ProductID | Name | StandardCost | ListPrice | AvgSellingPrice |
---|---|---|---|---|
712 | AWC Logo Cap | 6.9223 | 8.99 | 5.374 |
714 | Long-Sleeve Logo Jersey, M | 38.4923 | 49.99 | 29.994 |
715 | Long-Sleeve Logo Jersey, L | 38.4923 | 49.99 | 29.744 |
716 | Long-Sleeve Logo Jersey, XL | 38.4923 | 49.99 | 29.994 |
717 | HL Road Frame - Red, 62 | 868.6342 | 1431.50 | 858.900 |
718 | HL Road Frame - Red, 44 | 868.6342 | 1431.50 | 858.900 |
722 | LL Road Frame - Black, 58 | 204.6251 | 337.22 | 202.332 |
738 | LL Road Frame - Black, 52 | 204.6251 | 337.22 | 202.332 |
792 | Road-250 Red, 58 | 1554.9479 | 2443.35 | 1466.010 |
793 | Road-250 Black, 44 | 1554.9479 | 2443.35 | 1466.010 |
The AdventureWorksLT database includes a table-valued user-defined function named dbo.ufnGetCustomerInformation. Use this function to retrieve details of customers based on customer ID values retrieved from tables in the database.
Retrieve the sales order ID, customer ID, first name, last name, and total due for all sales orders from the SalesLT.SalesOrderHeader table and the dbo.ufnGetCustomerInformation function.
SELECT SOH.SalesOrderID, SOH.CustomerID, CI.FirstName, CI.LastName, SOH.TotalDue
FROM SalesLT.SalesOrderHeader AS SOH
CROSS APPLY dbo.ufnGetCustomerInformation(SOH.CustomerID) AS CI
ORDER BY SOH.SalesOrderID;
SalesOrderID | CustomerID | FirstName | LastName | TotalDue |
---|---|---|---|---|
71774 | 29847 | David | Hodgson | 972.7850 |
71776 | 30072 | Andrea | Thomsen | 87.0851 |
71780 | 30113 | Raja | Venugopal | 42452.6519 |
71782 | 29485 | Catherine | Abel | 43962.7901 |
71783 | 29957 | Kevin | Liu | 92663.5609 |
71784 | 29736 | Terry | Eminhizer | 119960.8240 |
71796 | 29660 | Anthony | Chor | 63686.2708 |
71797 | 29796 | Jon | Grande | 86222.8072 |
71815 | 30089 | Michael John | Troyer | 1261.4440 |
71816 | 30027 | Joseph | Mitzner | 3754.9733 |
Use the table-valued user-defined function dbo.ufnGetCustomerInformation again to retrieve details of customers based on customer ID values retrieved from tables in the database. Retrieve the customer ID, first name, last name, address line 1 and city for all customers from the SalesLT.Address and SalesLT.CustomerAddress tables, using the dbo.ufnGetCustomerInformation function.
SELECT CA.CustomerID, CI.FirstName, CI.LastName, A.AddressLine1, A.City
FROM SalesLT.Address AS A
JOIN SalesLT.CustomerAddress AS CA
ON A.AddressID = CA.AddressID
CROSS APPLY dbo.ufnGetCustomerInformation(CA.CustomerID) AS CI
ORDER BY CA.CustomerID;
CustomerID | FirstName | LastName | AddressLine1 | City |
---|---|---|---|---|
29485 | Catherine | Abel | 57251 Serene Blvd | Van Nuys |
29486 | Kim | Abercrombie | Tanger Factory | Branch |
29489 | Frances | Adams | 6900 Sisk Road | Modesto |
29490 | Margaret | Smith | Lewiston Mall | Lewiston |
29492 | Jay | Adams | Blue Ridge Mall | Kansas City |
29494 | Samuel | Agcaoili | No. 25800-130 King Street West | Toronto |
29496 | Robert | Ahlering | 6500 East Grant Road | Tucson |
29497 | François | Ferrier | Eastridge Mall | Casper |
29499 | Amy | Alberts | 252851 Rowan Place | Richmond |
29502 | Paul | Alcorn | White Mountain Mall | Rock Springs |
1.7 Using Table Expressions
We can create tables in a number of ways that can then be re-usd, rather than being a one-off query.
1.7.1 Views
One way we can do this is using a view. Rather than repeatadly joining and selected fields from two tables, we can create a view that contains the join syntax then just select the view. This view can be used and queried as if it were a table. The data is still in the underlying tables, but it is a view or presentation on top of the tables. It’s like a named query, which can make this simpler, plus we can add data at the view level, so they can view rights for the view table, but not the underlying data. Note that whilst you are able to use insert to add new rows, you can only do this to one of the underlying tables.
First we create a view
-- Create a view
CREATE VIEW SalesLT.vCustomerAddress
AS
SELECT C.CustomerID, FirstName, LastName, AddressLine1, City, StateProvince
FROM
SalesLT.Customer C JOIN SalesLT.CustomerAddress CA
ON C.CustomerID=CA.CustomerID
JOIN SalesLT.Address A
ON CA.AddressID=A.AddressID
Then we can query the view.
SELECT CustomerID, City
FROM SalesLT.vCustomerAddress
CustomerID | City |
---|---|
29698 | Burnaby |
29997 | Seattle |
29854 | Joliet |
30027 | Oxnard |
30023 | Peoria |
29637 | Irving |
29545 | Bothell |
29890 | Port Orchard |
29772 | Austin |
29883 | Denver |
And we can join data to that view.
SELECT c.StateProvince, c.City, ISNULL(SUM(s.TotalDue), 0.00) AS Revenue
FROM SalesLT.vCustomerAddress AS c
LEFT JOIN SalesLT.SalesOrderHeader AS s
ON s.CustomerID = c.CustomerID
GROUP BY c.StateProvince, c.City
ORDER BY c.StateProvince, Revenue DESC;
StateProvince | City | Revenue |
---|---|---|
Alberta | Calgary | 0 |
Alberta | Edmonton | 0 |
Arizona | Chandler | 0 |
Arizona | Gilbert | 0 |
Arizona | Mesa | 0 |
Arizona | Phoenix | 0 |
Arizona | Scottsdale | 0 |
Arizona | Surprise | 0 |
Arizona | Tucson | 0 |
British Columbia | Vancouver | 0 |
1.7.2 Using Temporary Tables and Table Variables
Views are persistent database objects, however we might want a temporary table. We prefix the object with a # symbol and they are created in a seperate temporary database called tempdb, rather than the db you are working in. Typically using a single # will mean the table exists for the current user session. If you wanted it to persistent over multiple sessions, use the ## prefix.
An alternative approach is to use table variables, which were introduced to cause performance problems if used again, as the table has to be re-created each user session, which can become a problem if there are many tables. If we want to use one we prefix it with an @ sign to signify it is a variable e.g. (???) table so we define the variable as a table. This is connected to the batch rather than the session, so as long as we are in the block of code. It only works well on small databases or queries. They are for temporary situations or a temporary holding space.
In the next block of code we show how we create a temporary table then update and query it.
CREATE TABLE #Colors
(Color varchar(15));
INSERT INTO #Colors
SELECT DISTINCT Color FROM SalesLT.Product;
SELECT * FROM #Colors;
A table variable is similar, with some syntax differences, but this works on the code set (batch of commans ran at the same time) rather than being for the entire session.
DECLARE @Colors AS TABLE (Color varchar(15));
INSERT INTO @Colors
SELECT DISTINCT Color FROM SalesLT.Product;
SELECT * FROM @Colors;
** Key Points **
- Temporary tables are prefixed with a # symbol (You can also create global temporary tables that can be accessed by other processes by prefixing the name with ##)
- Local temporary tables are automatically deleted when the session in which they were created ends. Global temporary tables are deleted when the last user sessions referencing them is closed.
- Table variables are prefixed with a @ symbol.
- Table variables are scoped to the batch in which they are created.
1.7.3 Table Value Functions (TVF)
This is a specific type of function which returns a table. It is a permanently define database object. Unlike views, we can pass values in to a TVF.
So first we create a TVF.
CREATE FUNCTION SalesLT.udfCustomersByCity
(@City AS VARCHAR(20))
RETURNS TABLE
AS
RETURN
(SELECT C.CustomerID, FirstName, LastName, AddressLine1, City, StateProvince
FROM SalesLT.Customer C JOIN SalesLT.CustomerAddress CA
ON C.CustomerID=CA.CustomerID
JOIN SalesLT.Address A ON CA.AddressID=A.AddressID
WHERE City=@City);
Then we can call the TVF passing a parameter - in this instance we prvoide a city name of Bellevue - which will then return the results of the SELECT statement which is a list of customers in that city.
SELECT * FROM SalesLT.udfCustomersByCity('Bellevue')
CustomerID | FirstName | LastName | AddressLine1 | City | StateProvince |
---|---|---|---|---|---|
29559 | Robert | Bernacchi | 2681 Eagle Peak | Bellevue | Washington |
29559 | Robert | Bernacchi | 25915 140th Ave Ne | Bellevue | Washington |
1.7.4 Derived Tables
These are derived tables that return a multi column table. They are a virtual table to simplify a query. The table only exists within that SELECT query. It is a programming construct within a SELECT statement, it is used for programming purposes. They can use internal (aka inline) within the derived table element, or external aliases for columns - outside the () of the derived table.
In the example derived table below, the alias ProdCats is provided external to the query (outside the parentheses). Other elements such as Category as defined as internal aliases.
SELECT Category, COUNT(ProductID) AS Products
FROM
(SELECT p.ProductID, p.Name AS Product, c.Name AS Category
FROM SalesLT.Product AS p
JOIN SalesLT.ProductCategory AS c
ON p.ProductCategoryID = c.ProductCategoryID) AS ProdCats
GROUP BY Category
ORDER BY Category;
Category | Products |
---|---|
Bib-Shorts | 3 |
Bike Racks | 1 |
Bike Stands | 1 |
Bottles and Cages | 3 |
Bottom Brackets | 3 |
Brakes | 2 |
Caps | 1 |
Chains | 1 |
Cleaners | 1 |
Cranksets | 3 |
1.7.5 Using Common Table Expressions (CTEs)
Similar to a TVF before we are defining a temporary table object which is used within the scope of the query. But this time aorund we define the CTE first, then make calls to that CTE afterwards. Unlike a derived query however, you can refer to the CTE mutiple times within the same query/code batch, but it won’t live on like a view. We can also use a CTE for recursive elements like loops, so you might want to get everyone within a management tier, then say you want to go through that process three times for the top 3 levels, using the OPTION(MAXRECURSION 3).
So in the following example, we first define the CTE, then we make reference to it. The result here is the same as our derived query previously, but perhaps a little easiter to understand.
WITH ProductsByCategory (ProductID, ProductName, Category)
AS
(
SELECT p.ProductID, p.Name, c.Name AS Category
FROM SalesLT.Product AS p
JOIN SalesLT.ProductCategory AS c
ON p.ProductCategoryID = c.ProductCategoryID
)
SELECT Category, COUNT(ProductID) AS Products
FROM ProductsByCategory
GROUP BY Category
ORDER BY Category;
Category | Products |
---|---|
Bib-Shorts | 3 |
Bike Racks | 1 |
Bike Stands | 1 |
Bottles and Cages | 3 |
Bottom Brackets | 3 |
Brakes | 2 |
Caps | 1 |
Chains | 1 |
Cleaners | 1 |
Cranksets | 3 |
Next we will look at the our organisation chart example as previously mentioned. First let’s look at our employee table.
SELECT * FROM SalesLT.Employee
EmployeeID | EmployeeName | ManagerID |
---|---|---|
1 | adventure-works8 | 8 |
2 | adventure-works1 | 1 |
3 | adventure-works0 | NA |
4 | adventure-works0 | 3 |
5 | adventure-works1 | 1 |
6 | adventure-works3 | 3 |
7 | adventure-works9 | 9 |
8 | adventure-works0 | 3 |
9 | adventure-works0 | 3 |
We can see that each employee has a manager ID apart from Employee 3 who is the CEO/MD. So we have a hierarchy of employees. We will go 3 levels deep this time (three recursions).
WITH OrgReport (ManagerID, EmployeeID, EmployeeName, Level)
AS
(
-- Anchor query
SELECT e.ManagerID, e.EmployeeID, EmployeeName, 0
FROM SalesLT.Employee AS e
WHERE ManagerID IS NULL
UNION ALL
-- Recursive query
SELECT e.ManagerID, e.EmployeeID, e.EmployeeName, Level + 1
FROM SalesLT.Employee AS e
INNER JOIN OrgReport AS o ON e.ManagerID = o.EmployeeID
)
SELECT * FROM OrgReport
OPTION (MAXRECURSION 3);
ManagerID | EmployeeID | EmployeeName | Level |
---|---|---|---|
NA | 3 | adventure-works0 | 0 |
3 | 4 | adventure-works0 | 1 |
3 | 6 | adventure-works3 | 1 |
3 | 8 | adventure-works0 | 1 |
3 | 9 | adventure-works0 | 1 |
9 | 7 | adventure-works9 | 2 |
8 | 1 | adventure-works8 | 2 |
1 | 2 | adventure-works1 | 3 |
1 | 5 | adventure-works1 | 3 |
** Key Points **
- A derived table is a subquery that generates a multicolumn rowset. You must use the AS clause to define an alias for a derived query.
- Common Table Expressions (CTEs) provide a more intuitive syntax or defining rowsets than derived tables, and can be used mulitple times in the same query.
- You can use CTEs to define recursive queries.
1.7.6 Lab Exercises
AdventureWorks sells many products that are variants of the same product model. You must write queries that retrieve information about these products. Retrieve the product ID, product name, product model name, and product model summary for each product from the SalesLT.Product table and the SalesLT.vProductModelCatalogDescription view.
SELECT P.ProductID, P.Name AS ProductName, PM.Name AS ProductModel, PM.Summary
FROM SalesLT.Product AS P
JOIN SalesLT.vProductModelCatalogDescription AS PM
ON P.ProductModelID = PM.ProductModelID
ORDER BY ProductID;
ProductID | ProductName | ProductModel | Summary |
---|---|---|---|
749 | Road-150 Red, 62 | Road-150 | This bike is ridden by race winners. Developed with the |
Adventure Works Cyc | les professiona | l race team, it has a extremely light | |
heat-treated alumin | um frame, and s | teering that allows precision control. | |
750 | Road-150 Red, 44 | Road-150 | This bike is ridden by race winners. Developed with the |
Adventure Works Cyc | les professiona | l race team, it has a extremely light | |
heat-treated alumin | um frame, and s | teering that allows precision control. | |
751 | Road-150 Red, 48 | Road-150 | This bike is ridden by race winners. Developed with the |
Adventure Works Cyc | les professiona | l race team, it has a extremely light | |
heat-treated alumin | um frame, and s | teering that allows precision control. | |
752 | Road-150 Red, 52 | Road-150 | This bike is ridden by race winners. Developed with the |
Adventure Works Cyc | les professiona | l race team, it has a extremely light | |
heat-treated alumin | um frame, and s | teering that allows precision control. | |
753 | Road-150 Red, 56 | Road-150 | This bike is ridden by race winners. Developed with the |
Adventure Works Cyc | les professiona | l race team, it has a extremely light | |
heat-treated alumin | um frame, and s | teering that allows precision control. | |
754 | Road-450 Red, 58 | Road-450 | A true multi-sport bike that offers streamlined riding and a revolutionary design. |
Aerodynamic | design lets you | ride with the pros, and the gearing will conquer hilly roads. | |
755 | Road-450 Red, 60 | Road-450 | A true multi-sport bike that offers streamlined riding and a revolutionary design. |
Aerodynamic | design lets you | ride with the pros, and the gearing will conquer hilly roads. | |
756 | Road-450 Red, 44 | Road-450 | A true multi-sport bike that offers streamlined riding and a revolutionary design. |
Aerodynamic | design lets you | ride with the pros, and the gearing will conquer hilly roads. | |
757 | Road-450 Red, 48 | Road-450 | A true multi-sport bike that offers streamlined riding and a revolutionary design. |
Aerodynamic | design lets you | ride with the pros, and the gearing will conquer hilly roads. | |
758 | Road-450 Red, 52 | Road-450 | A true multi-sport bike that offers streamlined riding and a revolutionary design. |
Aerodynamic | design lets you | ride with the pros, and the gearing will conquer hilly roads. |
You are only interested in products which have a listed color in the database. Create a table variable and populate it with a list of distinct colors from the SalesLT.Product table. Then use the table variable to filter a query that returns the product ID, name, and color from the SalesLT.Product table so that only products with a color listed in the table variable are returned.
DECLARE @Colors AS TABLE (Color NVARCHAR(15));
INSERT INTO @Colors
SELECT DISTINCT Color FROM SalesLT.Product;
SELECT ProductID, Name, Color
FROM SalesLT.Product
WHERE Color IN (SELECT Color FROM @Colors);
The AdventureWorksLT database includes a table-valued function named dbo.ufnGetAllCategories, which returns a table of product categories (e.g. ‘Road Bikes’) and parent categories (for example ‘Bikes’). Write a query that uses this function to return a list of all products including their parent category and their own category.
SELECT C.ParentProductCategoryName AS ParentCategory,
C.ProductCategoryName AS Category,
P.ProductID, P.Name AS ProductName
FROM SalesLT.Product AS P
JOIN dbo.ufnGetAllCategories() AS C
ON P.ProductCategoryID = C.ProductCategoryID
ORDER BY ParentCategory, Category, ProductName;
ParentCategory | Category | ProductID | ProductName |
---|---|---|---|
Accessories | Bike Racks | 876 | Hitch Rack - 4-Bike |
Accessories | Bike Stands | 879 | All-Purpose Bike Stand |
Accessories | Bottles and Cages | 871 | Mountain Bottle Cage |
Accessories | Bottles and Cages | 872 | Road Bottle Cage |
Accessories | Bottles and Cages | 870 | Water Bottle - 30 oz. |
Accessories | Cleaners | 877 | Bike Wash - Dissolver |
Accessories | Fenders | 878 | Fender Set - Mountain |
Accessories | Helmets | 708 | Sport-100 Helmet, Black |
Accessories | Helmets | 711 | Sport-100 Helmet, Blue |
Accessories | Helmets | 707 | Sport-100 Helmet, Red |
Each AdventureWorks customer is a retail company with a named contact. You must create queries that return the total revenue for each customer, including the company and customer contact names. Retrieve a list of customers in the format Company (Contact Name) together with the total revenue for each customer. Use a derived table or a common table expression to retrieve the details for each sales order, and then query the derived table or CTE to aggregate and group the data.
SELECT CompanyContact, SUM(SalesAmount) AS Revenue
FROM
(SELECT CONCAT(c.CompanyName, CONCAT(' (' + c.FirstName + ' ', c.LastName + ')')), SOH.TotalDue
FROM SalesLT.SalesOrderHeader AS SOH
JOIN SalesLT.Customer AS c
ON SOH.CustomerID = c.CustomerID) AS CustomerSales(CompanyContact, SalesAmount)
GROUP BY CompanyContact
ORDER BY CompanyContact;
CompanyContact | Revenue |
---|---|
Action Bicycle Specialists (Terry Eminhizer) | 119960.8240 |
Aerobic Exercise Company (Rosmarie Carroll) | 2361.6403 |
Bulk Discount Store (Christopher Beck) | 98138.2131 |
Central Bicycle Specialists (Janeth Esteves) | 43.0437 |
Channel Outlet (Richard Byham) | 608.1766 |
Closest Bicycle Store (Pamala Kotc) | 39531.6085 |
Coalition Bike Company (Donald Blanton) | 2669.3183 |
Discount Tours (Melissa Marple) | 3293.7761 |
Eastside Department Store (Kevin Liu) | 92663.5609 |
Engineered Bike Systems (Joseph Mitzner) | 3754.9733 |
1.8 Grouping Sets and Pivoting Data
These are both ways of summarising data - grouping data includes cubing the data.
Grouping sets allows you to define multiple groupings within the same query. So you may haved a number of subtotals for different product types, then a total of all the subtotals combined.
So you would have something like
SELECTFROM
GROUP BY GROUPING SETS (
, () – empty parentheses if aggregating all rows )
A result line will appear with NULLs in all column(s) which is the grand total.
There are other ways to aggregate the data such as ROLLUP and CUBE. ROLLUP is useful if you haave hierarchy - people in households, towns in regions. Cube will show every possible aggregation, rather than specifying each aggregation.
Both are specified in the GROUP BY X () where x is either ROLLUP or CUBE. You can add in GROUPING_ID AS
fiels to make the interpretation easier when using multiple grouping columns. You can then see the grand total more easily, since it will have a 1 in each groupid column. So a usual group by might look something like this - with a high level Parent Category followed by a Product Category and a count of items.
SELECT cat.ParentProductCategoryName, cat.ProductCategoryName, count(prd.ProductID) AS Products FROM SalesLT.vGetAllCategories as cat LEFT JOIN SalesLT.Product AS prd ON prd.ProductCategoryID = cat.ProductcategoryID GROUP BY cat.ParentProductCategoryName, cat.ProductCategoryName ORDER BY cat.ParentProductCategoryName, cat.ProductCategoryName;
Table 1.88: Displaying records 1 - 10 ParentProductCategoryName ProductCategoryName Products Accessories Bike Racks 1 Accessories Bike Stands 1 Accessories Bottles and Cages 3 Accessories Cleaners 1 Accessories Fenders 1 Accessories Helmets 3 Accessories Hydration Packs 1 Accessories Lights 4 Accessories Locks 1 Accessories Panniers 1 Here there are no grand total by parent categories. Also, there might be certain items appear in more than one category without totals e.g. bike racks might be in mutiple parent categories.
Alternatively, we might create GROUPING SETS which also allows us to create totals.
SELECT cat.ParentProductCategoryName, cat.ProductCategoryName, count(prd.ProductID) AS Products FROM SalesLT.vGetAllCategories as cat LEFT JOIN SalesLT.Product AS prd ON prd.ProductCategoryID = cat.ProductcategoryID GROUP BY GROUPING SETS(cat.ParentProductCategoryName, cat.ProductCategoryName, ()) ORDER BY cat.ParentProductCategoryName, cat.ProductCategoryName;
Table 1.89: Displaying records 1 - 10 ParentProductCategoryName ProductCategoryName Products NA NA 296 NA Bib-Shorts 3 NA Bike Racks 1 NA Bike Stands 1 NA Bottles and Cages 3 NA Bottom Brackets 3 NA Brakes 2 NA Caps 1 NA Chains 1 NA Cleaners 1 We now start getting totals, we can see there is a total of 295 products, followed by 3 bib-shorts (irrespective of parent), and later in the table we get the totals by parent categories. But we have no loss some of the relationships with parent categories with this query. It is possible to write a query where these things might appear, GROUPING SETS is the most flexible of the grouping commands, but it might be easier to use one of the other grouping options as required, next shows the the ROLLUP command.
SELECT cat.ParentProductCategoryName, cat.ProductCategoryName, count(prd.ProductID) AS Products FROM SalesLT.vGetAllCategories as cat LEFT JOIN SalesLT.Product AS prd ON prd.ProductCategoryID = cat.ProductcategoryID GROUP BY ROLLUP (cat.ParentProductCategoryName, cat.ProductCategoryName) ORDER BY cat.ParentProductCategoryName, cat.ProductCategoryName;
Table 1.90: Displaying records 1 - 10 ParentProductCategoryName ProductCategoryName Products NA NA 296 Accessories NA 30 Accessories Bike Racks 1 Accessories Bike Stands 1 Accessories Bottles and Cages 3 Accessories Cleaners 1 Accessories Fenders 1 Accessories Helmets 3 Accessories Hydration Packs 1 Accessories Lights 4 It is now giving us totals for parents and products. However, we are now missing situations where items might appear in more than one parent category e.g. bottles and cages that are in a category other than accessories. So we can use CUBE to get these other aggregation options, which will give us all possible aggregations.
SELECT cat.ParentProductCategoryName, cat.ProductCategoryName, count(prd.ProductID) AS Products FROM SalesLT.vGetAllCategories as cat LEFT JOIN SalesLT.Product AS prd ON prd.ProductCategoryID = cat.ProductcategoryID GROUP BY CUBE (cat.ParentProductCategoryName, cat.ProductCategoryName) ORDER BY cat.ParentProductCategoryName, cat.ProductCategoryName;
Table 1.91: Displaying records 1 - 10 ParentProductCategoryName ProductCategoryName Products NA NA 296 NA Bib-Shorts 3 NA Bike Racks 1 NA Bike Stands 1 NA Bottles and Cages 3 NA Bottom Brackets 3 NA Brakes 2 NA Caps 1 NA Chains 1 NA Cleaners 1 ** Key Points ** * Use GROUPING SETS to define custom groupings.
* Use ROLLUP to include subtotals and a grand total for hierarchical groupings.
* Use CUBE to include all possible groupings.1.8.1 Pivoting Data
Pivoting is another form of summarising data. We take row based items and pivot them in to single, summarised, columns in a new table. We simply use the PIVOT command. It is possible to undue this using UNPIVOT however this may loose some of the detail that existed in the original data. For instance line level data, such as which particular clothing items were purchased and at which value, is now lost as we just get a clothing total, but re-orientated to match the original data format.
We might want to be able to see the totals by colour in our data and by cateory. You have to know which columns you wish to create using the PIVOT table, for instance in the code below we are listing the actual colours our in the PIVOT command. If one colour was missed, the query would work, but you obviously wouldn’t get that colour returned or inlcuded in totals.
SELECT * FROM (SELECT P.ProductID, PC.Name,ISNULL(P.Color, 'Uncolored') AS Color FROM saleslt.productcategory AS PC JOIN SalesLT.Product AS P ON PC.ProductCategoryID=P.ProductCategoryID ) AS PPC PIVOT(COUNT(ProductID) FOR Color IN([Red],[Blue],[Black],[Silver],[Yellow],[Grey], [Multi], [Uncolored])) as pvt ORDER BY Name;
Table 1.92: Displaying records 1 - 10 Name Red Blue Black Silver Yellow Grey Multi Uncolored Bib-Shorts 0 0 0 0 0 0 3 0 Bike Racks 0 0 0 0 0 0 0 1 Bike Stands 0 0 0 0 0 0 0 1 Bottles and Cages 0 0 0 0 0 0 0 3 Bottom Brackets 0 0 0 0 0 0 0 3 Brakes 0 0 0 2 0 0 0 0 Caps 0 0 0 0 0 0 1 0 Chains 0 0 0 1 0 0 0 0 Cleaners 0 0 0 0 0 0 0 1 Cranksets 0 0 3 0 0 0 0 0 If we wanted to, we could copy/archive this in to a new table, which can be re-used again or compared and retrieved in the future.
CREATE TABLE #ProductColorPivot (Name varchar(50), Red int, Blue int, Black int, Silver int, Yellow int, Grey int , multi int, uncolored int); INSERT INTO #ProductColorPivot SELECT * FROM (SELECT P.ProductID, PC.Name,ISNULL(P.Color, 'Uncolored') AS Color FROM saleslt.productcategory AS PC JOIN SalesLT.Product AS P ON PC.ProductCategoryID=P.ProductCategoryID ) AS PPC PIVOT(COUNT(ProductID) FOR Color IN([Red],[Blue],[Black],[Silver],[Yellow],[Grey], [Multi], [Uncolored])) as pvt ORDER BY Name;
If we then wanted to unpvivot the data we would do so as follows.
SELECT Name, Color, ProductCount FROM (SELECT Name, [Red],[Blue],[Black],[Silver],[Yellow],[Grey], [Multi], [Uncolored] FROM #ProductColorPivot) pcp UNPIVOT (ProductCount FOR Color IN ([Red],[Blue],[Black],[Silver],[Yellow],[Grey], [Multi], [Uncolored]) ) AS ProductCounts
** Key Points ** * Use PIVOT to re-orient a rowset by generating mulitple columns from values in a single column.
* Use UNPIVOT to re-orient mulitple columns in a an existing rowset into a single column.1.8.2 Lab Exercises
AdventureWorks sells products to customers in multiple country/regions around the world.
An existing report uses the query provided in the editor to return total sales revenue grouped by country/region and state/province. Modify the query so that the results include a grand total for all sales revenue and a subtotal for each country/region in addition to the state/province subtotals that are already returned.
SELECT a.CountryRegion, a.StateProvince, SUM(soh.TotalDue) AS Revenue FROM SalesLT.Address AS a JOIN SalesLT.CustomerAddress AS ca ON a.AddressID = ca.AddressID JOIN SalesLT.Customer AS c ON ca.CustomerID = c.CustomerID JOIN SalesLT.SalesOrderHeader as soh ON c.CustomerID = soh.CustomerID GROUP BY ROLLUP (a.CountryRegion, a.StateProvince) ORDER BY a.CountryRegion, a.StateProvince;
Table 1.93: 9 records CountryRegion StateProvince Revenue NA NA 956303.5949 United Kingdom NA 572496.5594 United Kingdom England 572496.5594 United States NA 383807.0355 United States California 346517.6072 United States Colorado 14017.9083 United States Nevada 7330.8972 United States New Mexico 15275.1977 United States Utah 665.4251 Modify your query to include a column named Level that indicates at which level in the total, country/region, and state/province hierarchy the revenue figure in the row is aggregated.
SELECT a.CountryRegion, a.StateProvince, IIF(GROUPING_ID(a.CountryRegion) = 1 AND GROUPING_ID(a.StateProvince) = 1, 'Total', IIF(GROUPING_ID(a.StateProvince) = 1, a.CountryRegion + ' Subtotal', a.StateProvince + ' Subtotal')) AS Level, SUM(soh.TotalDue) AS Revenue FROM SalesLT.Address AS a JOIN SalesLT.CustomerAddress AS ca ON a.AddressID = ca.AddressID JOIN SalesLT.Customer AS c ON ca.CustomerID = c.CustomerID JOIN SalesLT.SalesOrderHeader as soh ON c.CustomerID = soh.CustomerID GROUP BY ROLLUP(a.CountryRegion, a.StateProvince) ORDER BY a.CountryRegion, a.StateProvince;
Table 1.94: 9 records CountryRegion StateProvince Level Revenue NA NA Total 956303.5949 United Kingdom NA United Kingdom Subtotal 572496.5594 United Kingdom England England Subtotal 572496.5594 United States NA United States Subtotal 383807.0355 United States California California Subtotal 346517.6072 United States Colorado Colorado Subtotal 14017.9083 United States Nevada Nevada Subtotal 7330.8972 United States New Mexico New Mexico Subtotal 15275.1977 United States Utah Utah Subtotal 665.4251 Or to extend your query to include a grouping for individual cities.
SELECT a.CountryRegion, a.StateProvince, a.City, CHOOSE (1 + GROUPING_ID(a.CountryRegion) + GROUPING_ID(a.StateProvince) + GROUPING_ID(a.City), a.City + ' Subtotal', a.StateProvince + ' Subtotal', a.CountryRegion + ' Subtotal', 'Total') AS Level, SUM(soh.TotalDue) AS Revenue FROM SalesLT.Address AS a JOIN SalesLT.CustomerAddress AS ca ON a.AddressID = ca.AddressID JOIN SalesLT.Customer AS c ON ca.CustomerID = c.CustomerID JOIN SalesLT.SalesOrderHeader as soh ON c.CustomerID = soh.CustomerID GROUP BY ROLLUP(a.CountryRegion, a.StateProvince, a.City) ORDER BY a.CountryRegion, a.StateProvince, a.City;
Table 1.95: Displaying records 1 - 10 CountryRegion StateProvince City Level Revenue NA NA NA Total 956303.5949 United Kingdom NA NA United Kingdom Subtotal 572496.5594 United Kingdom England NA England Subtotal 572496.5594 United Kingdom England Abingdon Abingdon Subtotal 45.1995 United Kingdom England Cambridge Cambridge Subtotal 2711.4098 United Kingdom England Gloucestershire Gloucestershire Subtotal 70698.9922 United Kingdom England High Wycombe High Wycombe Subtotal 608.1766 United Kingdom England Liverpool Liverpool Subtotal 86222.8072 United Kingdom England London London Subtotal 206736.1667 United Kingdom England Maidenhead Maidenhead Subtotal 43.0437 AdventureWorks products are grouped into categories, which in turn have parent categories (defined in the SalesLT.vGetAllCategories view).
AdventureWorks customers are retail companies, and they may place orders for products of any category. The revenue for each product in an order is recorded as the LineTotal value in the SalesLT.SalesOrderDetail table.
Retrieve a list of customer company names together with their total revenue for each parent category in Accessories, Bikes, Clothing, and Components. Make sure to use the aliases provided, and default column names elsewhere.
SELECT * FROM (SELECT cat.ParentProductCategoryName, cust.CompanyName, sod.LineTotal FROM SalesLT.SalesOrderDetail AS sod JOIN SalesLT.SalesOrderHeader AS soh ON sod.SalesOrderID = soh.SalesOrderID JOIN SalesLT.Customer AS cust ON soh.CustomerID = cust.CustomerID JOIN SalesLT.Product AS prod ON sod.ProductID = prod.ProductID JOIN SalesLT.vGetAllCategories AS cat ON prod.ProductcategoryID = cat.ProductCategoryID) AS catsales PIVOT (SUM(LineTotal) FOR ParentProductCategoryName IN ([Accessories], [Bikes], [Clothing], [Components])) AS pivotedsales ORDER BY CompanyName;
Table 1.96: Displaying records 1 - 10 CompanyName Accessories Bikes Clothing Components Action Bicycle Specialists 1299.885 76613.6518 2461.6573 9494.082 Aerobic Exercise Company NA NA NA 1732.890 Bulk Discount Store 730.464 70597.2840 851.5620 1980.918 Central Bicycle Specialists NA NA NA 31.584 Channel Outlet 216.000 NA 308.6640 NA Closest Bicycle Store NA 20389.6680 559.1641 8001.846 Coalition Bike Company NA 529.4928 124.7760 1201.938 Discount Tours 72.000 2041.1880 341.0580 72.882 Eastside Department Store 1220.236 51096.0548 2772.2296 10594.848 Engineered Bike Systems NA 2604.7620 178.7460 63.900 1.9 Modifying Data
This section isn’t about querying data, but adding, deleting and updating data in our database.
To add data we use the INSERT … VALUES command, where you can leave blank columns that allow NULLs and cols with default constraints. You can also leave identify/primary key fields blank since they will be generated automatically, depending on what seed and increment values were specified when the table was set up, however it is possible to overide these values if you wish. You can add a NULL value to a column if you wish.
You can also insert the results of another query in to an existing table using INSERT … SELECT or INSERT … EXEC where EXEC is the execution of a stored query.
Another option is SELECT … INTO will explicity create a new table, inserting data based on query. However this newly created table will not have contraints, defaults, indexes or primary keys, just a set of columns with the results of the query. This is currently not supported in Azure, however you could create the table first with the items needed such as columns and pk, then use the INSERT INTO for Azure.
Our Identity property of a column generates sequential numbers for insertations of new data into a database and we can set the parameters of this identity column. If you insert a new item (row) in to a table, you might want to then get back this identity column for this new item e.g. you automatically generate a new sales ID but a new order, however you might want to then present this back to the customer, this is acheived using @(???), but this returns the last identity referenced, so if you are adding the sales order first then the individual items in a seperate table, it will retrieve the idenity column from the sales order details rather than sales order header. This command therefore returns the last identity from the current connected session.
An alternative is to use SELECT SCOPE_IDENTITY() AS ORDERID which wil bring back the last identity column for the specified table in the current session, but again this can be problematic. The alternative is to use IDENT_CURRENT(’
) which passes the last identity inserted into a particular table, however this is across all sessions, so if others are inserting the table this can be problematic. So generally the best option is to use SCOPE_IDENTITY and present that, before then going on to do other things. If you perhaps wanted to have two different tables for two different order types - say in store purchases and online purchases as two dinstinct tables - but to both use the same sequential list for order ID, you could do this using sequences. They exist independently of tables so can be referenced by mutiple tables, the query will ask the server what the next id is in the sequence and be provided with that. It is like a central store that issues new numbers on request.
So to create a new table for customer call logs we do the following:
CREATE TABLE SalesLT.CallLog ( CallID int IDENTITY PRIMARY KEY NOT NULL, -- no seed or increment set, so will start at 1 then add 1 CallTime datetime NOT NULL DEFAULT GETDATE(), -- if not date and time set, it will get the current date and time SalesPerson nvarchar(256) NOT NULL, CustomerID int NOT NULL REFERENCES SalesLT.Customer(CustomerID), -- references a foreign key from the customer table PhoneNumber nvarchar(25) NOT NULL, Notes nvarchar(max) NULL -- the only column which is allowed to be NULL, since perhaps the customer does not answer ); GO
If we then wanted to explicitly add or insert a row in to the datbase we do the following - note the CallID does not need to be given since this is automatically created :
– INT is optional, we state values as we are explicitly stating the values
INSERT INTO SalesLT.CallLog VALUES ('2015-01-01T12:30:00', 'adventure-works\pamela0', 1, '245-555-0173', 'Returning call re: enquiry about delivery');
If we wantedc to accept the default options, we can do that by using DEFAULT (e.g. for data/time) and if there are no notes we can use NULL.
INSERT INTO SalesLT.CallLog VALUES (DEFAULT, 'adventure-works\david8', 2, '170-555-0127', NULL);
Perhaps we need to re-order our fields rather than accepting the default sequence, perhaps it is a modified table or a table form another source. To do this, we explicitly state the columns.
INSERT INTO SalesLT.CallLog (SalesPerson, CustomerID, PhoneNumber) VALUES ('adventure-works\jillian0', 3, '279-555-0130');
If we wanted to add two records, we can do this at the same time, and even use some different options for the different rows.
INSERT INTO SalesLT.CallLog VALUES (DATEADD(mi,-2, GETDATE()), 'adventure-works\jillian0', 4, '710-555-0173', NULL), (DEFAULT, 'adventure-works\shu0', 5, '828-555-0186', 'Called to arrange deliver of order 10987');
There are more examples with adding query results into tables and getting identity values in the demo files.
** Key Points ** * Use the INSERT statement to insert one or more rows into a table.
* When inserting explicit values, you can omit identity columns, columns that allow NULLs, and columns on which a default constraint is defined.
* Identity columns generate a unique integer identifier for each row. You can also use a sequence to generate unique values that can be used in multiple tables.1.9.1 Updating and Deleting Data
The UPDATE command will update all the rows in a table or view and can be filtered using WHERE and the columns can be selected using FROM. It is more likely we will want to select a particular set of rows, so we will be using the WHERE clause in most instances. We use the SET clause in the UPDATE statement to assign a value and we can use a calculation and we can specify multiple columns e.g. SET unitprice = (unitprice *1.04), notes = ‘price increase by 4% due to inflation’. You can also update values in a table based on the results of a query. Only columns in the set clause are affected. There are a number of examples in the MSDN TSQL pages Transact-SQL Refrence.
A common challenge in data warehousing is what is sometimes referred to as ‘upserting’ - we are updating records that already exist AND we insert new records if they do not. To do this, we use the MERGE command, which modifies data based on a condition. We look at situations of when the source data matches the target data. So we might check if the ProductID already exists, and where it does we use an UPDATE. But if there is a record in the source table but on the target table (db) we will INSERT a new product. The code below is an example
MERGE INTO Production.Products as P USING Production.ProductsStaging as S ON P.ProductID=S.ProductID WHEN MATCHED THEN UPDATE SET P.UnitPrice = S.UnitPrice, P.Discontinued=S.Discontinued WHEN NOT MATCHED THEN INSERT (ProductName, CategoryID, UnitPrice, Discontinued) VALUES (S.ProductName, S.CategoryID, S.UnitPrice, S.Discontinued);
We can delete data from a table, however we need to be careful, if we don’t specify a WHERE clause we may delete a lot of data!
DELETE FROM Sales.OrderDetails WHERE orderid = 10248;
There is an overhead with this approach, since SQL Server will record this transaction. If we are deleting a whole table and wish to deallocate the entire table space, we are based to use TRUNCATE. TRUNCATE TABLE Will fail if the table is referenced by a foreign key constraint in another table, the same will happen with the DELETE command. The table definition remains, but all the data is removed and the space reallocated. This is often used in staging tables, so we might load new data in to a staging table, then use the MERGE command for your target table, then empty the staging table but keep the strucutre, so we use TRUNCATE.
So if we wanted to add a value to where a value is currently NULL, for instance to state that no notes were recorded, we can do this as:
UPDATE SalesLT.CallLog SET Notes = 'No notes' WHERE Notes IS NULL;
Or we can update multiple columns, in the following example we set two empty strings using ’’ - note this does it for the ENTIRE table, as there is no WHERE clause.
UPDATE SalesLT.CallLog SET SalesPerson = '', PhoneNumber = '';
Now we want to re-add the data just deleted, using a query, getting Sales Person and Phone Number from the customer table, where the CustomerID matches.
UPDATE SalesLT.CallLog SET SalesPerson = c.SalesPerson, PhoneNumber = c.Phone FROM SalesLT.Customer AS c WHERE c.CustomerID = SalesLT.CallLog.CustomerID;
We might want to archive off then delete anything older than this week. This can be achived by using DELETE and WHERE.
DELETE FROM SalesLT.CallLog WHERE CallTime < DATEADD(dd, -7, GETDATE());
Or if we want to complete wipe the call log table we use TRUNCATE.
TRUNCATE TABLE SalesLT.CallLog;
** Key Points ** * Use the UPDATE statement to modify the values of one or more columns in specified rows of a table.
* Use the DELETE statement to delete specified rows in a table.
* Use the MERGE statement to insert, update, and delete rows in a target table based on data in a source table.1.9.2 Lab Exercises
AdventureWorks has started selling the new product - LED Lights. Insert it into the SalesLT.Product table, using default or NULL values for unspecified columns. Once you’ve inserted the product, run SELECT SCOPE_IDENTITY(); to get the last identity value that was inserted. Add a query to view the row for the product in the SalesLT.Product table.
INSERT INTO SalesLT.Product (Name, ProductNumber, StandardCost, ListPrice, ProductCategoryID, SellStartDate) VALUES ('LED Lights', 'LT-L123', 2.56, 12.99, 37, GETDATE()); SELECT SCOPE_IDENTITY(); SELECT * FROM SalesLT.Product WHERE ProductID = SCOPE_IDENTITY();
Insert the two new products with the appropriate ProductCategoryID value, based on the product details above.
Finish the query to join the SalesLT.Product and SalesLT.ProductCategory tables. That way, you can verify that the data has been inserted. Make sure to use the aliases provided, and default column names elsewhere.
INSERT INTO SalesLT.ProductCategory (ParentProductCategoryID, Name) VALUES (4, 'Bells and Horns'); INSERT INTO SalesLT.Product (Name, ProductNumber, StandardCost, ListPrice, ProductCategoryID, SellStartDate) VALUES ('Bicycle Bell', 'BB-RING', 2.47, 4.99, IDENT_CURRENT('SalesLT.ProductCategory'), GETDATE()), ('Bicycle Horn', 'BB-PARP', 1.29, 3.75, IDENT_CURRENT('SalesLT.ProductCategory'), GETDATE()); SELECT c.Name As Category, p.Name AS Product FROM SalesLT.Product AS p JOIN SalesLT.ProductCategory as c ON p.ProductCategoryID = c.ProductCategoryID WHERE p.ProductCategoryID = IDENT_CURRENT('SalesLT.ProductCategory');
You must now update the records you have previously inserted to reflect the correct pricing. The sales manager at AdventureWorks has mandated a 10% price increase for all products in the Bells and Horns category. Update the rows in the SalesLT.Product table for these products to increase their price by 10%.
UPDATE SalesLT.Product SET ListPrice = ListPrice * 1.1 WHERE ProductCategoryID = (SELECT ProductCategoryID FROM SalesLT.ProductCategory WHERE Name = 'Bells and Horns');
The new LED lights you inserted in the previous challenge are to replace all previous light products. Update the SalesLT.Product table to set the DiscontinuedDate to today’s date for all products in the Lights category (ProductCategoryID 37) other than the LED Lights product you inserted previously. So we delete records from the SalesLT.Product table first then delete records from the SalesLT.ProductCategory table.
DELETE FROM SalesLT.Product WHERE ProductCategoryID = (SELECT ProductCategoryID FROM SalesLT.ProductCategory WHERE Name = 'Bells and Horns'); DELETE FROM SalesLT.ProductCategory WHERE ProductCategoryID = (SELECT ProductCategoryID FROM SalesLT.ProductCategory WHERE Name = 'Bells and Horns');
UPDATE SalesLT.Product SET DiscontinuedDate = GETDATE() WHERE ProductCategoryID = 37 AND ProductNumber <> 'LT-L123';
The Bells and Horns category has not been successful and it must be deleted from the database. Delete the records for the Bells and Horns category and its products. You must ensure that you delete the records from the tables in the correct order to avoid a foreign-key constraint violation.
1.10 Programming with Transact-SQL
We will start looking at procedural logic in this section.
1.10.1 Batches
Batches are sets of commands to to SQL Server as a unit. We can seperate out elements using the GO command, which will send that batch. GO is not a T-SQL command, it interpretor which uses the GO command to send that batch.
1.10.3 Variables
Variables - usually single scalar values - can be used for many things. We delare the variable, give it a data type, then initialise it with a value if you like, or assign a value later. They are declaring using DECLARE (???) nvarchar(15)
DECLARE @City VARCHAR(20)='Toronto';
But rather than declaring a variables value, we can have a variable which receives something e.g. the results of a select query.
DECLARE @Result money SELECT @Result=MAX(TotalDue) FROM SalesLT.SalesOrderHeader PRINT @Result
** Key Points ** * A batch defines a group of Transact-SQL command submitted by a client application for execution. Some commands can only be executed at the start of a new batch, and variable values cannot span batches.
* Use comments to document your Transact-SQL code. Inline comments are prefixed by –, and multi-line comment blocks are enclosed in /* and /.
Declare variables by using the DECLARE keyword, specifying a name (prefixed with @) and a data type. You can optionally assign an initial value.
* Assign values to variables by using the SET keyword or in a SELECT statement.1.10.4 Conditional Branching
An IF statement is one such an example, do one thing or something else or indeed nothing. We can go further using IF and ELSE. We can also use this for providing messages, as shown below where we print a message if the row is <1 i.e. it is not found.
It is not neccessary to add a BEGIN adn END in all cases, but it can be easier to read and helps when possibly expanding the IF list later.
UPDATE SalesLT.Product SET DiscontinuedDate=getdate() WHERE ProductID=1; IF @@ROWCOUNT<1 BEGIN PRINT 'Product was not found' END ELSE BEGIN PRINT 'Product Updated' END
** Key Points ** * Use the IF keyword to execute a task based on the results of a conditional test.
* Use an ELSE clause if you need to execute an alternative task if the conditional test returns false.
* Enclose mulitple statements in an IF or ELSE clause between BEGIN and END keywords.1.10.5 Looping
Typically in SQL Server we don’t want to loop through our data, we work on sets of data, as looping can be slow, so the ideal is to work with sets. A loop will end where a condition (predicate) evaluates to FALSE or unknown. Either we have a BREAK which stops everything or continue, which will move on to the next set of code. So in the example below 5 rows will be affected, as we set the counter as <= 5, adding a row. Looping is best used when you might need to check for something, where we don’t know how many times we might need to do something, a situation which is not that common in SQL Server.
DECLARE @Counter int=1 WHILE @Counter <=5 BEGIN INSERT SalesLT.DemoTable(Description) VALUES ('ROW '+CONVERT(varchar(5),@Counter)) SET @Counter=@Counter+1 END SELECT Description FROM SalesLT.DemoTable
** Key Points **
- Use a WHILE loop if you need to repeat one or more statements until a specified condition is true.
- Use BREAK and CONTINUE to exit or restart the loop.
- Avoid using loops to iteratively update or retrieve single records - in most cases, you should use set-based operations to retrieve and modify data.
1.10.6 Stored Procedures
This is encapsulated a block of code in to a named block. They are setup to take parameters as inputs and can use conidtional branching (IF ELSE) and other such items. Sounds an awful lot like a function in R. In the example below, if we don’t specify a value (the (???)) then we run a query which will select all the products (the IF element). If I do specify a category, then return the products in the category (ELSE).
CREATE PROCEDURE SalesLT.GetProductsByCategory (@CategoryID INT = NULL) AS IF @CategoryID IS NULL SELECT ProductID, Name, Color, Size, ListPrice FROM SalesLT.Product ELSE SELECT ProductID, Name, Color, Size, ListPrice FROM SalesLT.Product WHERE ProductCategoryID = @CategoryID;
The first time around, the object - the stored proceedure - will be created. To use it, we must use an EXEC or EXECUTE (UTE is optional) statement.
EXEC SalesLT.GetProductsByCategory
If we want to pass a paramter e.g. category 6, we do a similar exercise.
EXEC SalesLT.GetProductsByCategory 6
** Key Points ** * Use stored procedures to encapsulate Transact-SQL code in a reusable database objects.
* You can define parameters for a stored procedure, and use them as variables in the Transact-SQL code it contains.
* Stored procedures can return rowsets (usually the results of a SELECT statement). They can also return output parameters, and they always return a return value, which is used to indicate status.1.10.7 Lab Exercises
You want to create reusable scripts that make it easy to insert sales orders. You plan to create a script to insert the order header record, and a separate script to insert order detail records for a specified order header.
Both scripts will make use of variables to make them easy to reuse. Your script to insert an order header must enable users to specify values for the order date, due date, and customer ID.
DECLARE @OrderDate datetime = GETDATE(); DECLARE @DueDate datetime = DATEADD(dd, 7, GETDATE()); DECLARE @CustomerID int = 1; INSERT INTO SalesLT.SalesOrderHeader (OrderDate, DueDate, CustomerID, ShipMethod) VALUES (@OrderDate, @DueDate, @CustomerID, 'CARGO TRANSPORT 5'); PRINT SCOPE_IDENTITY();
As a next step, you want to insert an order detail. The script for this must enable users to specify a sales order ID, a product ID, a quantity, and a unit price.
The script should check if the specified sales order ID exists in the SalesLT.SalesOrderHeader table. This can be done with the EXISTS predicate.
If it does, the code should insert the order details into the SalesLT.SalesOrderDetail table (using default values or NULL for unspecified columns).
If the sales order ID does not exist in the SalesLT.SalesOrderHeader table, the code should print the message ‘The order does not exist’.
-- Code from previous exercise DECLARE @OrderDate datetime = GETDATE(); DECLARE @DueDate datetime = DATEADD(dd, 7, GETDATE()); DECLARE @CustomerID int = 1; INSERT INTO SalesLT.SalesOrderHeader (OrderDate, DueDate, CustomerID, ShipMethod) VALUES (@OrderDate, @DueDate, @CustomerID, 'CARGO TRANSPORT 5'); DECLARE @OrderID int = SCOPE_IDENTITY(); -- Additional script to complete DECLARE @ProductID int = 760; DECLARE @Quantity int = 1; DECLARE @UnitPrice money = 782.99; IF EXISTS (SELECT * FROM SalesLT.SalesOrderDetail WHERE SalesOrderID = @OrderID) BEGIN INSERT INTO SalesLT.SalesOrderDetail (SalesOrderID, OrderQty, ProductID, UnitPrice) VALUES (@OrderID, @Quantity, @ProductID, @UnitPrice) END ELSE BEGIN PRINT 'The order does not exist' END
Adventure Works has determined that the market average price for a bike is $2,000, and consumer research has indicated that the maximum price any customer would be likely to pay for a bike is $5,000.
You must write some Transact-SQL logic that incrementally increases the list price for all bike products by 10% until the average list price for a bike is at least the same as the market average, or until the most expensive bike is priced above the acceptable maximum indicated by the consumer research.
The product categories in the Bikes parent category can be determined from the SalesLT.vGetAllCategories view.
** Key Points **
- The loop should execute only if the average list price of a product in the ‘Bikes’ parent category is less than the market average.
- Update all products that are in the ‘Bikes’ parent category, increasing the list price by 10%.
- Determine the new average and maximum selling price for products that are in the ‘Bikes’ parent category.
- If the new maximum price is greater than or equal to the maximum acceptable price, exit the loop; otherwise continue.
DECLARE @MarketAverage money = 2000; DECLARE @MarketMax money = 5000; DECLARE @AWMax money; DECLARE @AWAverage money; SELECT @AWAverage = AVG(ListPrice), @AWMax = MAX(ListPrice) FROM SalesLT.Product WHERE ProductCategoryID IN (SELECT DISTINCT ProductCategoryID FROM SalesLT.vGetAllCategories WHERE ParentProductCategoryName = 'Bikes'); WHILE @AWAverage < @MarketAverage BEGIN UPDATE SalesLT.Product SET ListPrice = ListPrice * 1.1 WHERE ProductCategoryID IN (SELECT DISTINCT ProductCategoryID FROM SalesLT.vGetAllCategories WHERE ParentProductCategoryName = 'Bikes'); SELECT @AWAverage = AVG(ListPrice), @AWMax = MAX(ListPrice) FROM SalesLT.Product WHERE ProductCategoryID IN (SELECT DISTINCT ProductCategoryID FROM SalesLT.vGetAllCategories WHERE ParentProductCategoryName = 'Bikes'); IF @AWMax >= @MarketMax END ELSE CONTINUE END PRINT 'New average bike price:' + CONVERT(VARCHAR, @AWAverage); PRINT 'New maximum bike price:' + CONVERT(VARCHAR, @AWMax);
1.11 Error Handling and Transactions
You can raise errors yourself because of a condition, and not just some system error. For instance you can introduce business logic, so whilst a transaction in to the database might in principle be ok, you might not see that some business rules are valide i.e. too high of a discount price for instance. You can also pass error mesages back to client applications.
The errors have different levels of severity from 1 to 25 and includes the line number which caused the error. A line with proceedure will state which procedure or sub-proceedure caused the problem. If the error message is particularly high, SQL Server will also udnertake some certain actions to mitigte some of these problems like disconnect from the server if the disk is corrupt. A state is also given, which can be useful when seeking assistance for troubleshooting. Within the database error messages are stored in the sys.messages table in the database and you can add custom messages using sp_addmessage to this table (note this is for SQL Server and not Azure). It is better and more ‘current’ to throw custom error messages within your code, rather than in the db table, and this approach will work with SQL on Azure.
1.11.1 Raising or Throwing Errors
if you wish to raise an error you can do so using the RAISERROR command - this works in both SQL Server and Azure. You can specify the serverity and state when doing so.
Another way of raising errors which is newer syntax, it will in time replace RAISERROR, which is THROW. Such user defined errors can only be generated for lower levels of severity, so higher level problems don’t cause SQL Server to take mitigating actions like disconnecting the server. The customer error message must be above 50000.
In the following example, the SQL query would run, but we introduce a rule based on business logic which will return an error if no rows are affected by the query. Nothing has failed, but nothing actually happended.
UPDATE SalesLT.Product SET DiscontinuedDate = GETDATE() WHERE ProductID = 0; IF @@ROWCOUNT < 1 RAISERROR('The product was not found - no products have been updated', 16, 0);
Or we can use the newer THROW command.
UPDATE SalesLT.Product SET DiscontinuedDate = GETDATE() WHERE ProductID = 0; IF @@ROWCOUNT < 1 THROW 50001, 'The product was not found - no products have been updated', 0;
** Key Points ** * System errors have pre-defined numbers, messages, severity levels, and other characteristics that you can use to troubleshoot issues.
* You can use RAISERROR and THROW to raise custom errors.1.11.2 Catching and Handling Errors
This is sometimes called stuctured exception handling, where we have a TRY block of code which is something we want to run, then we have a CATCH block of code so that if the TRY code fails we catch and handle that error. We either deal with the rror and fix it or thrown an error to the compiler. IF the TRY block code works, no error will be thrown. There are a number of system global variables we can call for errors depending on what we are trying to catch. We can throw the original message, or we can throw a custom error.
In the first example below, we update the product table using the ProductNumber as ProductID divided by the weight. If the weight is missing, we add 0 in to the divide, which is likely to cause problems since this divides by infinity. If something does go wrong, we will print an error message of what occured. No red error text will occur, since the application is correctly running - the rror is captured by the code.
BEGIN TRY UPDATE SalesLT.Product SET ProductNumber = ProductID / ISNULL(Weight, 0); END TRY BEGIN CATCH PRINT 'The following error occurred:'; PRINT ERROR_MESSAGE(); END CATCH;
Another approach is to return the error to the application, so that it can do it’s own error handling. Here we re THROW the error back to the client application. This time the error will be returned in red as an error.
BEGIN TRY UPDATE SalesLT.Product SET ProductNumber = ProductID / ISNULL(Weight, 0); END TRY BEGIN CATCH PRINT 'The following error occurred:'; PRINT ERROR_MESSAGE(); THROW; END CATCH;
Anotehr more advanced way would be to create some variables in the CATCH block and use a stored procedure in the database. This stored procedure will store the error in to an error log table and throws that error back to the stored procedure. This means you can transport the strored proceedure across different databases without having to re-write code and still have a consistent way to handle errors.
BEGIN TRY UPDATE SalesLT.Product SET ProductNumber = ProductID / ISNULL(Weight, 0); END TRY BEGIN CATCH DECLARE @ErrorLogID as int, @ErrorMsg AS varchar(250); EXECUTE dbo.uspLogError @ErrorLogID OUTPUT; SET @ErrorMsg = 'The update failed because of an error. View error #' + CAST(@ErrorLogID AS varchar) + ' in the error log for details.'; THROW 50001, @ErrorMsg, 0; END CATCH;
Then we can view the error log if we want.
SELECT * FROM dbo.ErrorLog;
Table 1.97: Displaying records 1 - 10 ErrorLogID ErrorTime UserName ErrorNumber ErrorSeverity ErrorState ErrorProcedure ErrorLine ErrorMessage 1 2018-03-14 12:16:08 dbo 8134 16 1 NA 2 Divide by zero error encountered. 2 2018-03-16 13:25:48 dbo 8134 16 1 NA 2 Divide by zero error encountered. 3 2018-03-16 13:27:24 dbo 8134 16 1 NA 2 Divide by zero error encountered. 4 2018-03-16 13:27:33 dbo 8134 16 1 NA 2 Divide by zero error encountered. 5 2018-03-16 13:27:50 dbo 8134 16 1 NA 2 Divide by zero error encountered. 6 2018-03-23 04:03:28 dbo 8134 16 1 NA 2 Divide by zero error encountered. 7 2018-03-23 04:08:49 dbo 8134 16 1 NA 2 Divide by zero error encountered. 8 2018-03-23 04:20:56 dbo 8134 16 1 NA 2 Divide by zero error encountered. 9 2018-03-23 04:21:37 dbo 8134 16 1 NA 2 Divide by zero error encountered. 10 2018-03-23 04:22:19 dbo 8134 16 1 NA 2 Divide by zero error encountered. ** Key Points ** * Use TRY…CATCH blocks in your Transact-SQL code to catch and handle exceptions.
* A common exception handling pattern is to log the error, and then if the operation cannot be completed successfully, throw it (or a new custom error) to the calling application.1.11.3 Transactions
We are concerned about protecting the integrity of the database. A transaction is a group of tasks, possibly inserting in to multiple tables. A transaction can be ACID
- Atomic - one unit of work that everythign contributes towards, you can split it up in to seperate parts
- Consistent - when the transaction is finished, there won’t be any orphaned records or rows
- Isolated - whilst that transaction is happening, other transactions or queries are not effected, they won’t get to see the in flight data
- Durable - one finished the data will be stored in the database unless we explicity DELETE it
A transaction must either fully succeed or wholly fail, we cannot have a partial succeed. So for instance if we add a new order, we might add summary info in to a order header table then all the prodcuts into the order detail table. We don’t want half of this to work - just inserts in to one table - the whole thing, the transaction, must take place for it to be successful. If for some reason it something doesn’t work, perhaps as a consequence for a system fail or network issue, we want the operation to fail not partially succeed or complete. SQL Server will handle this by itself, you don’t need to add any transaction error codes or error handling, it will lock or isolate the data as required.
However, if there are multiple statements that you want to execute, it is best done as an encapsulated transaction explicitly. This is done by BEGIN TRANSACTION
then COMMIT TRANSACTION if there is an error will can throw an error and use ROLLBACK TRANSACTION. These can be nexted TRANSACTIONS which can get complicated. If you ROLLBACK it will ROLLBACK right back to the beggining - so it is Atomic and Consistent. Rather than use ROLLBACK you can use XACT_ABORT which will rollback all transactions if an error is encountered. So if we had some code as shown below, this would lead to a record being adding to the order header, but no details being added to the product table, since the product we are trying to add (ProductID 9999) does not exist. This would lead to an orphaned sales order header.
BEGIN TRY INSERT INTO SalesLT.SalesOrderHeader (DueDate, CustomerID, ShipMethod) VALUES (DATEADD(dd, 7, GETDATE()), 1, 'STD DELIVERY'); DECLARE @SalesOrderID int = SCOPE_IDENTITY(); INSERT INTO SalesLT.SalesOrderDetail (SalesOrderID, OrderQty, ProductID, UnitPrice, UnitPriceDiscount) VALUES (@SalesOrderID, 1, 99999, 1431.50, 0.00); END TRY BEGIN CATCH PRINT ERROR_MESSAGE(); END CATCH;
So we want to add this code within a single transaction, so if one part fails the whole thing will. In this instance it wil try to do both parts individually, but the whole thing will fail as the second part - adding in an invalid productID - fails. So in effect it will undo the transaction - the first part which succeeds adding the order header. If you have many rows, this can therefore be time consuming as it might try to add 10,000 rows, then fails at a later stage, then has to go back and undo all those 10,000 rows. So you may want to add some logic at the beggining of the transaction, before this addition and error followed by rollback occurs.
BEGIN TRY BEGIN TRANSACTION INSERT INTO SalesLT.SalesOrderHeader (DueDate, CustomerID, ShipMethod) VALUES (DATEADD(dd, 7, GETDATE()), 1, 'STD DELIVERY'); DECLARE @SalesOrderID int = SCOPE_IDENTITY(); INSERT INTO SalesLT.SalesOrderDetail (SalesOrderID, OrderQty, ProductID, UnitPrice, UnitPriceDiscount) VALUES (@SalesOrderID, 1, 99999, 1431.50, 0.00); COMMIT TRANSACTION END TRY BEGIN CATCH IF @@TRANCOUNT > 0 BEGIN PRINT XACT_STATE(); ROLLBACK TRANSACTION; END PRINT ERROR_MESSAGE(); THROW 50001,'An insert failed. The transaction was cancelled.', 0; END CATCH;
As previously discussed, we can use XACT_ABORT rather than ROLLBACK. Again we might want to add some logic at the beggining to check for validity before adding and undoing mutiple
SET XACT_ABORT ON; BEGIN TRY BEGIN TRANSACTION INSERT INTO SalesLT.SalesOrderHeader (DueDate, CustomerID, ShipMethod) VALUES (DATEADD(dd, 7, GETDATE()), 1, 'STD DELIVERY'); DECLARE @SalesOrderID int = SCOPE_IDENTITY(); INSERT INTO SalesLT.SalesOrderDetail (SalesOrderID, OrderQty, ProductID, UnitPrice, UnitPriceDiscount) VALUES (@SalesOrderID, 1, 99999, 1431.50, 0.00); COMMIT TRANSACTION END TRY BEGIN CATCH PRINT ERROR_MESSAGE(); THROW 50001,'An insert failed. The transaction was cancelled.', 0; END CATCH; SET XACT_ABORT OFF;
** Key Points ** * Transactions are used to protect data integrity by ensuring that all data changes within a transaction succeed or fail as a unit.
* Individual Transact-SQL statements are inherently treated as transactions, and you can define explicit transactions that encompass mulitple * statements.
* Use the BEGIN TRANSACTION, COMMIT TRANSACTION, and ROLLBACK TRANSACTION statements to manage transactions.
* Enable the XACT_ABORT option to automatically rollback all transactions if an exception occurs.
* Use the @(???) system variable and XACT_STATE system function to determine transaction status.1.11.4 Lab Exercises
You are implementing a Transact-SQL script to delete orders, and you want to handle any errors that occur during the deletion process.
DECLARE @OrderID int = 0 -- Declare a custom error if the specified order doesn't exist DECLARE @error VARCHAR(25) = 'Order #' + cast(@OrderID as VARCHAR) + ' does not exist'; IF NOT NULL (SELECT * FROM SalesLT.SalesOrderHeader WHERE SalesOrderID = @OrderID) BEGIN THROW 50001, @error, 0; END ELSE BEGIN SELECT FROM SalesLT.SalesOrderDetail WHERE SalesOrderID = @OrderID; DELETE FROM SalesLT.SalesOrderHeader WHERE SalesOrderID = @OrderID; END
Now your code now throws an error if the specified order does not exist. Refine your code to catch this (or any other) error and print the error message to the user interface using the PRINT command. You can use BEGIN TRY, END TRY, BEGIN CATCH and END CATCH for this.
DECLARE @OrderID int = 71774 DECLARE @error VARCHAR(25) = 'Order #' + cast(@OrderID as VARCHAR) + ' does not exist'; -- Wrap IF ELSE in a TRY block TRY CATCH IF NOT EXISTS (SELECT * FROM SalesLT.SalesOrderHeader WHERE SalesOrderID = @OrderID) BEGIN THROW 50001, @error, 0 END ELSE BEGIN DELETE FROM SalesLT.SalesOrderDetail WHERE SalesOrderID = @OrderID; DELETE FROM SalesLT.SalesOrderHeader WHERE SalesOrderID = @OrderID; END END CATCH -- Add a CATCH block to print out the error BEGIN CATCH PRINT ERROR_MESSAGE(); END CATCH
You have implemented error handling logic in some Transact-SQL code that deletes order details and order headers. However, you are concerned that a failure partway through the process will result in data inconsistency in the form of undeleted order headers for which the order details have been deleted.
Your task is to enhance the code you created in the previous challenge so that the two DELETE statements are treated as a single transactional unit of work.
DECLARE @OrderID int = 0 DECLARE @error VARCHAR(25) = 'Order #' + cast(@OrderID as VARCHAR) + ' does not exist'; BEGIN TRY IF NOT EXISTS (SELECT * FROM SalesLT.SalesOrderHeader WHERE SalesOrderID = @OrderID) BEGIN THROW 50001, @error, 0 END ELSE BEGIN BEGIN TRANSACTION DELETE FROM SalesLT.SalesOrderDetail WHERE SalesOrderID = @OrderID; DELETE FROM SalesLT.SalesOrderHeader WHERE SalesOrderID = @OrderID; END TRANSACTION END END TRY BEGIN CATCH IF @@TRANCOUNT > 0 BEGIN ROLLBACK TRANSACTION; END ELSE BEGIN PRINT ERROR_MESSAGE(); END END CATCH
1.12 Final Assessment
1.12.1 Section 1
Select the quantity per unit for all products in the Products table.
SELECT QuantityPerUnit FROM Products;
Select the unique category IDs from the Products table.
SELECT DISTINCT CategoryID FROM Products;
Select the names of products from the Products table which have more than 20 units left in stock.
SELECT ProductName FROM Products WHERE UnitsInStock > 20;
Select the product ID, product name, and unit price of the 10 most expensive products from the Products table.
SELECT TOP (10) ProductID, ProductName, UnitPrice FROM Products ORDER BY UnitPrice DESC;
Select the product ID, product name, and quantity per unit for all products in the Products table. Sort your results alphabetically by product name.
SELECT ProductID, ProductName, QuantityPerUnit FROM Products ORDER BY ProductName;
Select the product ID, product name, and unit price of all products in the Products table. Sort your results by number of units in stock, from greatest to least.
Skip the first 10 results and get the next 5 after that.
SELECT ProductID, ProductName, UnitPrice FROM Products ORDER BY UnitsInStock DESC OFFSET 10 ROWS FETCH NEXT 5 ROWS ONLY;
Use STR, CONVERT, and NVARCHAR(30) where appropriate to display the first name, employee ID and birthdate (as Unicode in ISO 8601 format) for each employee in the Employees table.
SELECT FirstName + ' has an EmployeeID of ' + STR(EmployeeID, 1) + ' and was born ' + CONVERT(NVARCHAR(30), BirthDate, 126) FROM Employees;
Select from the Orders table.
The first column of your result should be a single string in exactly the following format:
<
> is from < > If there is no ShipCity, then you should select ShipRegion, and if there is no ShipRegion you should select ShipCountry.
SELECT ShipName + ' is from ' + CASE WHEN ShipCity IS NULL THEN 'ShipRegion' ELSE ShipCity END FROM Orders;
The above works, but doesn’t properly handle ShipCountry, a better option is to use the COALESCE which returns the first non-null value. The AS destination is optional, if not specified the column will not have a title.
SELECT ShipName + ' is from ' + COALESCE(ShipCity, ShipRegion,ShipCountry) AS destination FROM Orders;
Select the ship name and ship postal code from the Orders table. If the postal code is missing, display ‘unknown’. The following answer may be differnt for other dbs or flavours of SQL.
SELECT ShipName, ISNULL(ShipPostalCode, 'unknown') FROM Orders;
Using the Suppliers table, select the company name, and use a simple CASE expression to display ‘outdated’ if the company has a fax number, or ‘modern’ if it doesn’t.
SELECT CompanyName, CASE WHEN Fax IS NOT NULL THEN 'outdated' ELSE 'modern' END AS Status FROM Suppliers;
1.12.2 Section 2
Get the order ID and unit price for each order by joining the Orders table and the Order Details table. Note that you need to use [Order Details] since the table name contains whitespace.
SELECT o.OrderID, od.UnitPrice FROM Orders AS o INNER JOIN [Order Details] AS od ON o.OrderID = od.OrderID;
Get the order ID and first name of the associated employee by joining the Orders and Employees tables.
SELECT o.OrderID, e.FirstName FROM Orders AS o INNER JOIN Employees as e ON o.EmployeeID = e.EmployeeID;
Get the employee ID and related territory description for each territory an employee is in, by joining the Employees, EmployeeTerritories and Territories tables.
SELECT e.EmployeeID, t.TerritoryDescription FROM Employees AS e INNER JOIN EmployeeTerritories AS et ON e.EmployeeID = et.EmployeeID INNER JOIN Territories AS t ON et.TerritoryID = t.TerritoryID;
Select all the different countries from the Customers table and the Suppliers table using UNION.
SELECT Country FROM Customers UNION SELECT Country FROM Suppliers ORDER BY Country;
Select all the countries, including duplicates, from the Customers table and the Suppliers table using UNION ALL.
SELECT Country FROM Customers UNION ALL SELECT Country FROM Suppliers ORDER BY Country;
Using the Products table, get the unit price of each product, rounded to the nearest dollar.
SELECT ROUND(UnitPrice, 0) FROM Products;
Using the Products table, get the total number of units in stock across all products.
SELECT SUM(UnitsInStock) FROM Products;
Using the Orders table, get the order ID and year of the order by using YEAR(). Alias the year as OrderYear.
SELECT OrderID, YEAR(OrderDate) AS OrderYear FROM Orders;
Using the Orders table, get the order ID and month of the order by using DATENAME(). Alias the month as OrderMonth.
SELECT OrderID, DATENAME(mm, OrderDate) AS OrderMonth FROM Orders;
Use LEFT() to get the first two letters of each region description from the Region table.
SELECT LEFT(RegionDescription, 2) FROM Region;
Using the Suppliers table, select the city and postal code for each supplier, using WHERE and ISNUMERIC() to select only those postal codes which have no letters in them.
SELECT City, PostalCode FROM Suppliers WHERE ISNUMERIC(PostalCode) = 1;
Use LEFT() and UPPER() to get the first letter (capitalized) of each region description from the Region table.
SELECT UPPER(LEFT(RegionDescription, 1)) FROM Region;
1.12.3 Section 3
Use a subquery to get the product name and unit price of products from the Products table which have a unit price greater than the average unit price from the Order Details table.
Note that you need to use [Order Details] since the table name contains whitespace.
SELECT ProductName, UnitPrice FROM Products WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM [Order Details]);
Select from the Employees and Orders tables. Use a subquery to get the first name and employee ID for employees who were associated with orders which shipped from the USA.
SELECT FirstName, EmployeeID FROM Employees WHERE EmployeeID IN (SELECT DISTINCT(EmployeeID) FROM Orders WHERE ShipCountry = 'USA');
Create a new temporary table called ProductNames which has one field called ProductName (a VARCHAR of max length 40). Insert into this table the names of every product from the Products table. Select all columns from the ProductNames table you created.
CREATE TABLE #ProductNames (ProductName VARCHAR(40)) INSERT INTO #ProductNames SELECT ProductName FROM Products SELECT * FROM #ProductNames;
1.12.4 Section 4
Use CHOOSE() and MONTH() to get the season in which each order was shipped from the Orders table. You should select the order ID, shipped date, and then the season aliased as ShippedSeason. Be careful to filter out any NULL shipped dates.
SELECT OrderID, ShippedDate, CHOOSE (MONTH(ShippedDate), 'Winter', 'Winter', 'Spring', 'Spring', 'Spring', 'Summer', 'Summer', 'Summer', 'Autumn', 'Autumn', 'Autumn', 'Winter') AS ShippedSeason FROM Orders WHERE ShippedDate IS NOT NULL;
Using the Suppliers table, select the company name and use a simple IIF expression to display ‘outdated’ if a company has a fax number, or ‘modern’ if it doesn’t. Alias the result of the IIF expression to Status.
SELECT CompanyName, IIF(FAX IS NOT NULL, 'outdated', 'modern') AS Status FROM Suppliers;
Select from the Customers, Orders, and Order Details tables. Note that you need to use [Order Details] since the table name contains whitespace.
Use GROUP BY and ROLLUP() to get the total quantity ordered by all countries, while maintaining the total per country in your result set.
Your first column should be the country, and the second column the total quantity ordered by that country, aliased as TotalQuantity.
SELECT c.Country, SUM(od.Quantity) AS TotalQuantity FROM Customers AS c JOIN Orders as o ON c.CustomerID = o.CustomerID JOIN [Order Details] AS od ON o.OrderID = od.OrderID GROUP BY ROLLUP (c.Country);
From the Customers table, use GROUP BY to select the country, contact title, and count of that contact title aliased as Count, grouped by country and contact title (in that order).
Then use CASE WHEN, GROUPING_ID(), and ROLLUP() to add a column called Legend, which shows one of two things:
When the GROUPING_ID is 0, show ‘’ (i.e., nothing) When the GROUPING_ID is 1, show Subtotal for << Country >>’
SELECT Country, ContactTitle, COUNT(ContactTitle) AS Count, CASE WHEN GROUPING_ID(Country, ContactTitle) = 0 THEN '' WHEN GROUPING_ID(Country, ContactTitle) = 1 THEN 'Subtotal for ' + Country END AS Legend FROM Customers GROUP BY ROLLUP (Country, ContactTitle);
Convert the following query to be pivoted, using PIVOT().
SELECT CategoryID, AVG(UnitPrice) FROM Products GROUP BY CategoryID;
Note the first part of the answer defines the item that appears in the row, the second ‘Per Category’ is the table title.
SELECT 'Average Unit Price' AS 'Per Category', [1], [2], [3], [4], [5], [6], [7], [8] FROM ( SELECT CategoryID, UnitPrice FROM Products) AS SourceTable PIVOT ( AVG(UnitPrice) FOR CategoryID IN ([1], [2], [3], [4], [5], [6], [7], [8])) AS PivotTable;
So we go from long to wide format as illustrated below.
Insert into the Region table the region ID 5 and the description ‘Space’.
Then, in a second query, select the newly inserted data from the table using a WHERE clause.
INSERT INTO Region VALUES (5, 'Space'); SELECT * FROM Region WHERE RegionID = 5;
Update the region descriptions in the Region table to be all uppercase, using SET and UPPER().
Next, select all data from the table to view your updates.
UPDATE Region SET RegionDescription = UPPER(RegionDescription); SELECT * FROM Region;
Declare a custom region (???) called ‘Space’, of type NVARCHAR(25).
Use IF NOT EXISTS, ELSE, and BEGIN..END to throw an error message ‘Error!’ if (???) is not in the Region table. Do this check using SELECT *.
If the specified description does exist, you should select all columns for that region from the Region table.
DECLARE @region AS NVARCHAR(25) = 'Space'; TRY CATCH IF NOT EXISTS (SELECT * FROM Region WHERE RegionDescription = @region) BEGIN PRINT 'Error!' END ELSE BEGIN SELECT * FROM Region; END END CATCH;
1.10.2 Comments
We can add a code block using /* Comment */ for multiple lines or a comment block, this can come in useful to explain what the code is for or why you are doing something a certain way.