This workshop will go way beyond, “Hey, you should use LIME.” We will share concrete tips for how to navigate the need to balance interpretable and predictive modeling in the wild. Specific goals for the workshop are to: 1) Explain the difference between interpretable and predictive models, and the difference between local and global interpretability. 2) Cover a few concepts about data collection and data preparation related to interpretable vs predictive models. 3) Dive into which algorithms predict well, which interpret well, and which do a bit of both. 4) Share a decision flow chart showing how to approach a project based on the need for interpretability and/or predictive power. 5) Review Python code examples of what it means to navigate between interpretable and predictive models (yes, LIME is included in the examples). Perhaps the most pervasive argument for model interpretability today is that model consumers and model stakeholders need to trust model recommendations and understand how to incorporate them into their decision making. Without trust and understanding, your model runs a real risk of becoming shelf-ware or being misused. Because of this reality, an understanding of the concepts discussed in this talk is absolutely vital to the success of a data scientist.
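The distinction between global and local interpretability can be illustrated with a minimal sketch. It is not taken from the workshop materials: the linear model, its coefficients, and the feature names are all hypothetical, and the features are assumed to be on comparable scales so that coefficient magnitudes can be compared.

```python
# Hypothetical linear model; every name and number here is illustrative only.
COEFFICIENTS = {"age": 0.5, "income": -2.0, "tenure": 1.0}
INTERCEPT = 1.0


def predict(row):
    """Score one row with the linear model."""
    return INTERCEPT + sum(COEFFICIENTS[name] * row[name] for name in COEFFICIENTS)


def global_explanation():
    """Global view: rank features by absolute coefficient, for the model as a whole."""
    return sorted(COEFFICIENTS, key=lambda name: abs(COEFFICIENTS[name]), reverse=True)


def local_explanation(row):
    """Local view: contribution of each feature to the prediction for one row."""
    return {name: COEFFICIENTS[name] * row[name] for name in COEFFICIENTS}


row = {"age": 4.0, "income": 0.5, "tenure": 3.0}
print(global_explanation())    # feature ranking for the whole model
print(local_explanation(row))  # feature contributions for this row only
print(predict(row))            # intercept plus the sum of the contributions
```

In this sketch the global ranking puts income first, while for the example row tenure contributes the most, which is the kind of gap between global and local explanations that tools such as LIME are meant to surface for less transparent models.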
According to Forrester Research, only 22% of companies are currently seeing a significant return from data science expenditures. Most data science implementations are either high-cost IT projects, local applications that are not built to scale for production workflows, or laptop decision support projects that never impact customers. Despite the high failure rate, we keep hearing the same mantra and solutions over and over again. Everybody talks about how to create models, but not many people talk about getting them into production where they can impact customers. This presentation will be an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale used at companies like Facebook, Uber, LinkedIn, Twitter, and eBay. The key to adding value through DataOps is to adapt and borrow principles from Agile, Lean and DevOps. However, DataOps is not just about shipping working machine learning models; it starts with better alignment of data science with the rest of the organisation and its goals. This talk will demonstrate experience-based solutions for increasing your velocity of value creation, including: agile prioritisation and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. DataOps methodology will enable you to eliminate daily barriers, putting data scientists in control of delivering ever-faster cutting-edge innovation for their organisations and customers.
Computer vision is considered one of the main approaches encapsulated within artificial intelligence. The ability to automate analytical model building, apply image processing techniques, build deep learning models and deploy this insight is what is enabling SAS customers to drive real business value. In this session we will explore real case scenarios where this approach has been applied. We will also present a project implemented by SAS to help a large manufacturing organisation assess the application of image processing. The project explored images captured from the manufacturing line on which the organisation produces devices. The main objective was to correctly predict which images would lead to a defective device, thus reducing their false negative rate. By employing a novel combination of unsupervised and supervised learning approaches through the Python SWAT API and the SAS DLPy package, we were able to correctly predict the classification of images with 99% accuracy.
In data science we can and should learn from the test-driven development principles used in software engineering. At BT Research we are looking at tools and ways of working that help data scientists create reproducible analytics in collaborative environments. Data scientists are familiar with model testing and with using principles like cross-validation when evaluating the performance of models. However, less focus is often spent on testing and validating the data that is used for modelling or later in production for scoring. Assertions on data can help guide the outcome of data wrangling processes and guarantee that only good-quality data is fed into model building and scoring. In this presentation I will review approaches to using assertions in data analytics and present BT's best-practice efforts in data science.
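As a rough illustration of what an assertion on data might look like, the following sketch checks a batch of records before it reaches model building or scoring. It is not BT's tooling: the function names, the `customer_id` and `age` fields, and the allowed age range are all hypothetical.

```python
def check_rows(rows):
    """Return a list of human-readable assertion failures (empty if all pass)."""
    failures = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Assertion 1: every row has a unique, non-missing identifier.
        customer_id = row.get("customer_id")
        if customer_id is None:
            failures.append(f"row {i}: customer_id is missing")
        elif customer_id in seen_ids:
            failures.append(f"row {i}: duplicate customer_id {customer_id}")
        else:
            seen_ids.add(customer_id)
        # Assertion 2: age is numeric and within a plausible range (assumed bounds).
        age = row.get("age")
        if not isinstance(age, (int, float)) or not 0 <= age <= 120:
            failures.append(f"row {i}: age {age!r} outside [0, 120]")
    return failures


def assert_valid(rows):
    """Raise if any assertion fails; otherwise pass the data through unchanged."""
    failures = check_rows(rows)
    if failures:
        raise ValueError("data assertions failed:\n" + "\n".join(failures))
    return rows
```

A wrangling step could end with `clean = assert_valid(rows)`, so that a violated assumption stops the pipeline before bad data reaches a model, in the same way a failing unit test stops a build.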
As data science teams scale, they're constantly generating new insights and knowledge that often aren’t adequately captured, stored, or leveraged. This leads to re-work and missed opportunities for research breakthroughs, which frustrate data scientists and can tarnish the team's ability to make a business impact. Data science teams are tasked with doing the work they love while implementing systems and processes that will help them do so at scale. In this session, we will discuss best practices for instilling knowledge management into the data science team's culture without slowing data scientists down. Attendees will leave with practical advice to help them build reproducibility into their everyday workflows in a way that is sustainable over time.
Every company needs Artificial Intelligence in production to stay competitive. This is hard both technically and from a process standpoint. Many engineering organizations struggle to provide their data scientists with the tools needed to effectively deploy models into production and continuously iterate. How does Salesforce manage to make data science an agile partner to over 100,000 customers? We will share the nuts and bolts of the platform and our agile process. The foundational elements of any platform are the ability for data scientists to experiment and to rapidly deploy to production. With our open-source autoML library (TransmogrifAI), we make it easy for our data scientists to contribute new ways of solving challenging problems and to evaluate them at scale using experimentation frameworks. Our platform helps them ship the code to production for all customers simultaneously, automating the process of retraining thousands of models and shipping billions of predictions per day. With modeling, of course, comes the need to detect issues and identify opportunities for improvement. We will cover how we use alerting and monitoring to keep track of the individual models that our 100,000+ customers can build in a completely automated way, and to drive our data science backlog. Throughout, we will share lessons learned around rapid iteration and how to ensure data science innovation continues in a truly agile methodology.
Mindshare has over 110 offices, of which at least 60 have at least one person working on one or another type of analytics. Three hubs also hold significant analytical resources. With every office managing independent P&Ls and in many cases different clients, local pressures take priority over global coordination. Add to this background a fast-moving competitive category and management's strong desire to avoid duplication, share best practice and, above all, innovate and scale solutions rapidly. Mindshare's journey to success has included hits and misses. In this session, Giovanni will discuss the learnings that they've collected through this journey (including the don’ts that they've learned painfully) and thoughts on how to continue pushing forward with faster innovation and effective implementation of solutions.
In this mini workshop, we will walk through a framework for successfully managing data science in the enterprise that covers people, process, and technology. We will step through the key stages of the data science lifecycle, from ideation through to delivery and monitoring, discussing common pitfalls and best practices in each based on Domino’s experience working with leading data science teams. Attendees will be provided with examples of Domino’s Lifecycle Assessment and be guided through an interactive exercise to evaluate the bottlenecks in their own organizations. They will leave with a customized physical artifact that can be used to prioritize investment in hiring, process management, or technology acquisition.