Today, more than ever before, maps are being used to bring data to life. In this presentation I will demonstrate how geoviz can make data science more tangible by providing an interactive canvas for spatial data. I will show several examples of how maps are being used to enhance the way we communicate data, and how this applies across all scales: spatial, temporal, and data size.
In the "real world", data science is as much about creating and enhancing data as it is about building models from data that is already available. This talk will demonstrate how results from the research literature can be applied to improve the quality of manually annotated training data sets. After a brief introduction to Bayesian graphical models, the presentation will illustrate their application to the task at hand using the PyStan framework and present empirical results. You may never trust your annotators again.
Analytics and data science are ever-growing fields, as business decision makers continue to use data to drive decisions. The pinnacle of these fields is the models and their accuracy or fit, but what about the data? Is your data clean, and how do you know that? Our discussion will focus on best practices for data preprocessing for analytic uses. We will begin with essential distributional checks of a dataset and move on to a proposed method for automated data validation during ETL for transactional data.
Since 2004, Illinois has collected demographic information about traffic stops conducted by police in an effort to identify racial bias. This data has been used by groups such as the ACLU and the Stanford Open Policing Project to identify key markers that suggest racial bias in policing. We have applied exploratory data analysis to investigate whether systemic racial bias appears in the data and, if so, to what extent. This talk will walk the audience through the insights gleaned from exploring this data, along with the challenges it posed and the ongoing questions it raises.
Enjoy drinks & bites to wrap up the day