class: center, middle, inverse, title-slide # R for Sports Data ## β½οΈ π π βΎοΈ π₯ πΎ π π --- class: left, top # Why R? R is an incredibly versatile, free and open source programming language, which is capable of performing a range of tasks that are useful when dealing with sports data. These include: - Importing data direct from an Athlete Management System - Scraping match or performance data from the web - Wrangling and cleaning data - Calculating statistics - Visualising data in various forms - Building statistical models - Generating reproducible reports or presentations - Building web applications <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/R_logo.svg/724px-R_logo.svg.png" width="20%" height=10% align="right" /> --- class: left, top # R & RStudio <center> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/R_logo.svg/724px-R_logo.svg.png" width="10%" height=5% align="center" /> </center> To download R to your laptop, go to the CRAN (comprehensive R archive network). To determine the closest mirror to you, follow this link to the *[cloud mirror](https://cloud.r-project.org)*. Download the version that is suitable for your operating system (e.g., Mac, Linux, Windows). <center> <img src="https://i.stack.imgur.com/bUpIh.png" width="10%" height=5% align="center" /> </center> RStudio is an IDE (integrated development environment). To download it on your laptop, go to the *[RStudio website](http://www.rstudio.com/download)*. Download the version that is suitable for your operating system (e.g., Mac, Linux, Windows). --- class: centre, top # Typical R Workflow <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/openscapes/environmental-data-science-r4ds-general.png" width="100%" height=50% align="middle" /> --- class: centre, top # Tidy Data <img src="https://d33wubrfki0l68.cloudfront.net/b88ef926a004b0fce72b2526b0b5c4413666a4cb/24a30/cover.png" width="16%" height=8% align="right" /> Tidy data formatting has 3 main principles. These are: - Variables are in columns - Observations are in rows - Values are in cells <img src="https://d33wubrfki0l68.cloudfront.net/6f1ddb544fc5c69a2478e444ab8112fb0eea23f8/91adc/images/tidy-1.png" width="80%" height=40% align="bottom" /> .footnote[[R for Data Science - ebook](https://r4ds.had.co.nz/)] --- class: left, top # Classes of objects in R <center> <img src="https://bookdown.org/rdpeng/rprogdatascience/cover_sm.png" width="18%" height=9% align="right" /> </center> There are five basic ("atomic") types of data that R recognises. These are: - Character (e.g., words; "hello") - Factor (e.g., limited set options; "Female", "Male", "Non-binary") - Logical (e.g., true or false; "TRUE"/"T") - Numeric (e.g., real numbers; "6.2") - Integer (e.g., whole number; "6" - Note: use "L" to assign as integer) *Complex (special uses - rarely needed) .footnote[[R Programming for Data Science - ebook](https://bookdown.org/rdpeng/rprogdatascience/)] --- class: centre, top # Data structures in R <center> <img src="http://venus.ifca.unican.es/Rintro/_images/dataStructuresNew.png" width="80%" height=40% align="left" /> </center> --- class: centre, top # Consistency in R <center> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/other-stats-artwork/coding_cases.png" width="80%" height=40% align="left" /> </center> --- class: centre, top # Packages and their Functions <center> <img src="https://tidy-ds.wjakethompson.com/img/hex-stickers.png" width="70%" height=35% align="left" /> </center> --- class: centre, top # {dplyr} <img src="https://d33wubrfki0l68.cloudfront.net/621a9c8c5d7b47c4b6d72e8f01f28d14310e8370/193fc/css/images/hex/dplyr.png" width="20%" height=10% align="right" /> <center> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/dplyr_filter.jpg" width="80%" height=40% align="left" /> </center> --- class: centre, top # {dplyr} <img src="https://d33wubrfki0l68.cloudfront.net/621a9c8c5d7b47c4b6d72e8f01f28d14310e8370/193fc/css/images/hex/dplyr.png" width="20%" height=10% align="right" /> <center> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/dplyr_case_when.png" width="80%" height=40% align="left" /> </center> --- class: centre, top # {janitor} <img src="https://github.com/sfirke/janitor/blob/master/man/figures/logo_small.png?raw=true" width="20%" height=10% align="right" /> <center> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/janitor_clean_names.png" width="80%" height=40% align="left" /> </center> --- class: centre, top # {lubridate} <img src="https://github.com/tidyverse/lubridate/raw/master/man/figures/logo.png" width="20%" height=10% align="right" /> <center> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/lubridate_ymd.png" width="80%" height=40% align="left" /> </center> --- class: centre, top # {ggplot2} <img src="https://github.com/tidyverse/ggplot2/raw/master/man/figures/logo.png" width="20%" height=10% align="right" /> <center> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/ggplot2_masterpiece.png" width="80%" height=40% align="left" /> </center> --- class: center, top #Here are some example outputs from R packages: <center> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/exploder.gif" width="50%" height=25% align="centre" /> </center> --- class: left, top #Netball Dashboard - {shiny} Here is an example of a dynamic dashboard that automatically scrapes match statistics from the web and presents them in a structured format. <img src="https://github.com/edbrooks/R_for_sports_data/blob/main/images/netball_shiny_app.png?raw=true" width="90%" height=45% align="centre" /> --- class: left, top #Netball Plot - {ggplot2} Here is an example of a simple score-flow plot that can be produced and customised using ggplot2 in R. <img src="https://github.com/edbrooks/R_for_sports_data/blob/main/images/R1G1_2019.png?raw=true" width="90%" height=45% align="centre" /> --- class: centre, middle # Welcome to R, let's get started! <center> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/welcome_to_rstats_twitter.png" width="70%" height=35% align="right" /> </center> --- class: centre, middle # Today we'll be learning a few basics: - Loading packages - Using functions - Importing data - Data wrangling - Summary statistics - Basic visualisation <center> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/rstats-artwork/data_cowboy.png" width="60%" height=30% align="top" /> </center> --- class: center # Acknowledgements **Slides using 'xaringan' package β - credit:[@xieyihui](https://twitter.com/xieyihui)** https://bookdown.org/yihui/rmarkdown/xaringan.html <img src="https://user-images.githubusercontent.com/163582/45438104-ea200600-b67b-11e8-80fa-d9f2a99a03b0.png" width="15%" height=10% align="top" /> **R illustrations - credit: [@allison_horst](https://twitter.com/allison_horst)** π¨ Artist-in-residence [@RStudio](https://twitter.com/RStudio) https://github.com/allisonhorst/stats-illustrations <img src="https://www.allisonhorst.com/post/2019-11-19-rstudio-artist-in-residence/featured_horst_air_hu30ff77530c1fd0633263481c1b3351da_1645822_720x0_resize_lanczos_2.png" width="25%" height=10% align="top" />