Schedule

November 14, 2017

8:30 - 9:25am

Welcome and Registration

9:25 - 9:30am

Introduction

9:30 - 10:00am

Making Data Science Tangible

Today, more than ever before, maps are being used to bring data to life. In this presentation I will demonstrate how geoviz can make data science more tangible by providing an interactive canvas for spatial data. I will show several examples of how maps are being used to enhance how we communicate data and how this applies across all scales: spatial, temporal, and data size.

10:00 - 10:30am

Reproducible dashboards and other great things to do with Jupyter

10:30 - 11:00am

Better data through data science: using Bayesian methods to clean up human labels

In the "real world", data science is as much about creating and enhancing data as it is about creating models based on data that is already available. This talk will demonstrate how results from the research literature can be applied to improve the quality of manually annotated training data sets. After a brief introduction to Bayesian graphical models, the presentation will illustrate their application to the task at hand using the PyStan framework, and provide empirical results. You may never trust your annotators again.

11:00 - 11:30am

Coffee Break

11:30am - 12:15pm

Summertime Analytics: Predicting E. Coli and West Nile Virus

12:45 - 1:45pm

Lunch

2:45 - 3:15pm

Leveraging Data Science in Automotive Industry

3:15 - 3:45pm

Supporting innovation in insurance with randomized experimentation

3:45 - 4:15pm

The Proliferation of New Database Technologies and Implications for Data Science Workflows

4:15 - 4:30pm

Coffee Break

4:30 - 5:00pm

Data Quality Analytics: Understanding What is in Your Data, Before Using It

Analytics and data science are ever-growing fields, as business decision makers continue to use data to drive decisions. The pinnacle of these fields is the models and their accuracy and fit, but what about the data? Is your data clean, and how do you know that? Our discussion will focus on best practices for data preprocessing for analytic uses, beginning with essential distributional checks of a dataset and moving to a proposed method for automated data validation during ETL for transactional data.

5:00 - 5:15pm

Lightning Talk: Racial Bias in Policing: an analysis of Illinois traffic stops data

Since 2004, Illinois has collected demographic information about traffic stops conducted by police in an effort to identify racial bias. This data has been used by groups such as the ACLU and the Stanford Open Policing Project to identify key markers that suggest racial bias in policing. We have applied exploratory data analysis to investigate whether systemic racial bias appears in the data and, if so, to what extent. This talk will walk the audience through the insights gleaned from exploring this data, along with the challenges posed and the ongoing questions raised.

6:00 - 6:05pm

Closing Remarks

6:05 - 8:00pm

Networking Reception

Enjoy drinks & bites to wrap up the day

  • Domino Data Lab
  • Allstate
  • Git
  • Bank of America
  • O'Reilly
  • Plotly