Study Notes
Preface
1
Importing data - Part 1
1.1
Introduction
1.2
Reading CSV files
1.3
Reading tab-delimited files or other table formats
1.4
Readr and data.table
1.4.1
data.table fread
1.5
Reading Excel files
1.5.1
Alternatives for importing Excel files
1.6
XLConnect - read and write to Excel
2
Importing data - Part 2
2.1
Importing from Databases - 1
2.2
SQL Queries Inside R
2.3
Web Data
2.4
JSON and APIs
2.5
Importing from other statistical software
References
3
Writing Functions in R
3.1
Function overview
3.1.1
Scoping
3.1.2
Data Structures
3.1.3
For loops
3.2
When and how you should write a function
3.2.1
Rescale example
3.2.2
Write a function step by step
3.2.3
How can you write a good function?
3.3
Functional Programming
3.3.1
Using purrr
3.3.2
Shortcuts
3.4
Advanced Inputs and Outputs
3.4.1
Maps over multiple arguments
3.4.2
Side effect functions
3.5
Robust Functions
3.5.1
Unstable types
3.5.2
Non standard evaluation (NSE)
3.5.3
Hidden arguments
4
Joining Data in R with dplyr
4.1
Mutating joins
4.1.1
Keys
4.1.2
Left and right joins
4.1.3
Inner and full joins
4.2
Filtering joins and set operations
4.3
Set Operations
4.4
Bind in dplyr
4.4.1
Data frames
4.4.2
Data Types
4.5
Advanced Joining
4.6
Joining multiple tables
4.7
Other implementations
4.8
Case Study - Lahman DB
5
Cleaning Data
5.1
Tidying data
5.2
Preparing data for analysis
5.3
String manipulation
5.4
Missing, Specials and Outliers
5.5
Examples
6
Importing & Cleaning Data in R: Case Studies
6.1
Ticket Sales Data
6.1.1
Removing redundant info
6.2
Working with dates
6.3
MBTA Ridership Data
6.4
World Food Facts
6.5
School Attendance Data
7
Data Visualization
7.1
Base R Graphics
7.1.1
Avoiding pie charts
7.2
Different Plot Types
7.3
Adding details to plots
7.4
Adding text
7.4.1
Adding details to plots
7.5
How much is too much
7.5.1
Multiple Plots
7.6
Advanced Plot Customisation
7.6.1
Using colour effectively
7.7
Other graphics systems
8
Intermediate ggplot2
8.1
Statistics
8.1.1
Stats outside geoms
8.2
Coordinates Layer
8.3
Facets
8.4
Themes
8.4.1
Recycling themes
8.5
Best Practice
8.5.1
Bar Plots
8.5.2
Pie Charts
8.5.3
Heat Maps
8.6
Case Study - Descriptive statistics
8.6.1
Mosaic Plots
9
Advanced ggplot2
9.1
Refresher
9.2
Statistical plots
9.2.1
Box plots
9.2.2
Density Plots
9.3
Multiple groups or variables
9.4
Plots for Specific Data 1
9.4.1
Big data
10
Introduction to Data
10.1
Language of Data
10.2
Observational Studies and Experiments
10.3
Sampling strategies and experimental design
References
11
Foundations of Inference
11.1
Introduction to Inference
11.2
Home Ownership by Gender
11.3
Density Plots
11.4
Gender Discrimination (p-values)
11.5
Opportunity Cost
11.6
Type I and Type II errors
11.7
Bootstrapping
References
12
Exploratory Data Analysis
12.1
Categorical Data
12.2
Numerical Data
12.3
Numerical Summaries
12.3.1
Transformations
12.3.2
Outliers
12.4
Email Case Study
13
Correlation and Regression
13.1
Visualizing two variables
13.1.1
Transformations
13.1.2
Identifying Outliers
13.2
Correlation
13.2.1
Anscombe Dataset
13.3
Linear Regression
13.3.1
Regression to the Mean
13.3.2
Fitting linear models
13.4
Model fit
13.4.1
Unusual points
13.4.2
High leverage Points
14
Supervised Learning
14.1
Tree Based Models
14.1.1
Random Forests
14.1.2
One-Hot-Encoding Categorical Variables
14.2
Gradient Boosting Machines
15
Dimensional Modelling
15.1
Introduction to Dimensional Data
15.1.1
Data Modelling levels
15.2
Architecture considerations
15.3
Graphical Representations
15.4
Kimball Approach
15.5
Four-Step Dimensional Design Process
15.6
Tips
References