*Part 2 will take place on Thursday 6th August from 6:00pm - 8:30pm
AT A GLANCE
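The most broadly accepted definition of data science comes from computational political scientist Drew Conway, whose well-known Venn diagram places data science at the intersection of hacking skills, maths and statistics knowledge, and substantive expertise.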
That is, data science is what we call the application of rigorous statistical routines and computation to help us solve problems in our own domains of expertise. Becoming a data scientist requires building competency in these three fields simultaneously. This is hard work, but it makes good data scientists indispensable members, and leaders, of modern organisations.
Because data scientists can add a huge amount of value for their employers, they are in high demand. According to Glassdoor, median data science salaries in the US are a shade over $120K. In Australia, IAPA’s survey respondents said that they earn $125K on average. It’s not bad work if you can get it!
But learning data science is not easy. Learning to do good data science is not just a matter of learning a piece of software (though software is useful) or a set of algorithms (which are useful too). Good data science is the application of difficult-to-learn skills to difficult tasks. It is human-centric, far closer to management than to IT. It is creative, and it is practical.
Good data science is also scientific; it rests on a desire to discover truth from an overwhelming amount of data. As economist John Hicks put it, “in the clouds of absorbing detail, we want to discover the shapes that do repeat among the many that do not”. Good data science looks past simple correlations, which are often highly compelling and just as often highly misleading, to learn the underlying structure. It explores biases in our data and methods that can steer us away from the truth. It is difficult, and it is fun.
ABOUT THIS COURSE
In this short course, running 2 nights a week for 4 weeks, you will build solid foundational skills from across the data science skill spectrum. The course aims to make you comfortable with several widely used, cutting-edge techniques, and to give you a deep understanding of the fundamental issues in data science. This will make learning new techniques on your own much easier, long after you have finished the course.
The course is highly practical. You will work through real-world examples using a variety of fun data sources, including transaction-level data from a café, a quarter of a million tax returns, and a survey of cheating spouses. If you have access to a dataset that you would like to use for your homework, bring it along!
A big part of learning data science is getting plugged into the community. To this end, you will be trying to solve difficult problems with other interesting people in the class. You will make friends.
WHAT YOU WILL LEARN
- A solid foundation of skills from across the data science spectrum: extracting and tidying data, building predictive and inferential models, and visualising your analysis;
- How to recognise and avoid pitfalls: how to think deeply about biases in data and analytical techniques, and how to go about fixing them;
- How to perform swift, replicable analysis using the freely available R statistical computing language.
WHAT YOU GET
- Comfort in extracting large amounts of data from databases and spreadsheets
- Understanding of the principles of tidy data, and understanding of how tidy data makes analysis easy and fun
- Ability to use the split-apply-combine philosophy to produce high-quality output quickly
- Ability to build a simple predictive model using regression and machine-learning techniques
- Intuition for how to build inferential models to answer “what-if” questions
- Ability to plot data using the powerful ggplot2 package
- Top ten tricks to generate amazing analysis in a heartbeat
- A collection of 1-page vignettes on the main issues and code examples discussed in the course
- A roadmap of the easiest ways to learn the skills not covered in the course
WEEK 1: DATA MUNGING
- Introduction to R
- Reading in data from CSV files
- Introduction to databases
- Extracting data from databases
- Merging data tables
- Tidy data
- Writing functions
- The split-apply-combine strategy using dplyr
- Generating summary statistics for arbitrary sub-groups
- Writing data to files & databases