This workshop will go way beyond, “Hey, you should use LIME.” Josh will share concrete tips for how to navigate the need to balance interpretable and predictive modeling in the wild. Specific goals for the workshop are to: 1) Explain the difference between interpretable and predictive models, and the difference between local and global interpretability. 2) Cover a few concepts about data collection and data preparation related to interpretable vs predictive models. 3) Dive into which algorithms predict well, which interpret well, and which do a bit of both. 4) Share a decision flow chart showing how to approach a project based on the need for interpretability and/or predictive power. 5) Review Python code examples of what it means to navigate between interpretable and predictive models (yes, LIME is included in the examples). Perhaps the most pervasive argument for model interpretability today is that model consumers and model stakeholders need to trust model recommendations and understand how to incorporate them into their decision making. Without trust and understanding, your model runs a real risk of becoming shelf-ware or being misused. Because of this reality, an understanding of the concepts discussed in this talk is vital to the success of a data scientist.
Cars.com is a two-sided automotive marketplace that creates meaningful connections between buyers and sellers. We believe that everyone can be swept off their feet by a car if the right connection is made. To achieve this and help 10MM+ active users make such connections, we leverage Machine Learning in various ways. From topic modeling on vehicle and dealership reviews to automated taxonomy generation and fraud alerts, NLP is used in numerous components of our ML system. In this talk, I will review Cars.com's approach to using NLP as a tool to build smart yet simple products, our progress toward achieving near-term goals, and opportunities on the horizon of a transforming automotive industry.
Suppose you want to distribute information about Ebola in Nigeria. Seems easy enough. Print off some pamphlets and start handing them out. But which of the 500+ languages used in Nigeria should you target? How do you know that the population you are targeting is literate? Can you distribute audio or digital media? If so, on what kinds of devices and in what language? As you can see, this gets complicated very quickly, and Wikipedia doesn't have all the answers. In this presentation, we will explore how data science and machine learning can help us better understand the language situation. We will discuss how one can represent language and language engagement in a graph structure, and we will explore additional layers that can be built on this graph, which can have both humanitarian and commercial impact.
As larger quantities of data are being stored and managed by enterprises of all kinds, NoSQL storage solutions are becoming more popular. Elasticsearch is a popular, high-performance NoSQL data storage option, but it is often unfamiliar to end users and difficult to navigate for day-to-day analytic tasks. This presentation will briefly discuss the structure and benefits of Elasticsearch data storage, and describe in detail, with examples, how to efficiently and smoothly transfer data between R or Python and this kind of data storage. Attendees will be introduced to three packages designed for this work: elastic (R), elasticsearch-py (Python), and uptasticsearch (R and Python). They will also see hands-on examples of how to use them.
Gogo is the inflight internet company whose worldwide inflight Wi-Fi services have made internet and video entertainment a regular part of flying. To provide connectivity on the flight, Gogo installs equipment on the aircraft which, when it breaks, is replaced by the maintenance crew and sent to Gogo’s testing facility. Sometimes when this equipment is tested at the test bench, no faults are found with it. Based on the data (airborne logs) we have access to, we devised a method to identify which devices are truly broken and need replacement, and validated it with real cases by conducting a proof of concept. In the process, we built a fault isolation tree that recommends specific actions to be performed by the field technicians. To put this process into production, we built a cloud-based solution that allows the technicians to answer a series of questions based on the data presented to them and presents recommendations based on their selections. Building this solution included identifying several data sources that contain information about different attributes of the tail, setting up a data pipeline in the AWS cloud to build features that were part of the fault isolation tree, and creating reports to expose the data. We are also exploring the opportunity of automating this solution by applying machine learning techniques like Bayesian belief networks. From this talk, the audience will not only take away how to provide a solution to a real-world business problem but also learn the importance of identifying the right metrics and following a consistent approach to provide value to the business.
As data science teams scale, they're constantly generating new insights and knowledge that often aren’t adequately captured, stored, or leveraged. This leads to rework and missed opportunities for research breakthroughs, which frustrate data scientists and can tarnish the team's ability to make a business impact. The leaders of data science teams are tasked with building and retaining a team of rockstars, while implementing systems and processes that will help them deliver meaningful results at scale. They must figure out how to create a data science flywheel. In this panel, we will discuss best practices for instilling knowledge management into the data science team's culture. Attendees will leave with practical advice to help them build a team that accelerates its output with scale, rather than succumbing to complexity.
Can we trust an autonomous vehicle to get us from point A to point B? Can we train a model on a vehicle to respond to unexpected real-life driving situations? Self-driving vehicles are one of the main cognitive innovations in the world of AI and Machine Learning. The idea of an autonomous vehicle is exciting. But how safe is it for such a vehicle to drive autonomously on a busy urban road full of pedestrians and objects, or even on highways full of lorries, cars and motorbikes? Once, vehicles were just a way to get from A to B, but digital technology is moving fast, and the rapid growth of AI, ML and IoT is transforming vehicles into automated and intelligent devices. Vehicles will soon be equipped with a comprehensive end-to-end telematics platform. This will enable drivers to be in touch with the outside world, accessing applications as well as enjoying in-car entertainment at their fingertips. This presentation will focus on the concept of building a cloud-based system that provides recommendations to drivers on using Adaptive Cruise Control (ACC) functionality in a safe and effective manner. We will drill into specific modeling tools and techniques that harness the power of AI, Machine Learning and IoT, with a demo towards the end. We will also discuss the data ethics considerations that need to be accounted for throughout the model development and deployment associated with building autonomous vehicles.
In this mini workshop, we will walk through a framework for successfully managing data science in the enterprise that covers people, process, and technology. We will step through the key stages of the data science lifecycle, from ideation through to delivery and monitoring, discussing common pitfalls and best practices in each based on Domino’s experience working with leading data science teams. Attendees will be provided with examples of Domino’s Lifecycle Assessment and be guided through an interactive exercise to evaluate the bottlenecks in their own organizations. They will leave with a customized physical artifact that can be used to prioritize investment in hiring, process management, or technology acquisition.