Boston Schedule

November 7, 2019

12:30 - 1:00pm

Registration

1:00 - 1:15pm

Introduction: The Era of the Model-Driven Business

1:15 - 1:45pm

Controlling for Confounding with Causal Random Forests

The xBK team is responsible for securities trading at Mellon Bank, and as part of that work we analyze execution costs. Execution data is observational rather than experimental: for a variety of practical, legal, and ethical reasons, it is not possible to run randomized trading experiments. Without recourse to experiment, we must contend with confounding among the variables in our data. In this study, we use Causal Random Forests, an extension of traditional Random Forest algorithms, to control for these confounding relationships and to estimate treatment effects. We present the motivation, the algorithm, and the outcome in detail.

1:45 - 2:45pm

Workshop: Turbo-charging Data Science with AutoML

Although the number of commercial AutoML products is growing, the open-source ecosystem has been innovating in this space as well. In the early days of the AutoML movement, the focus was on people who wanted to leverage the power of ML models without a background in data science: citizen data scientists. Today, however, AutoML tools have a lot to offer experts too. In this presentation, Domino Chief Data Scientist Josh Poduska will dive into popular open-source AutoML tools such as auto-sklearn, TPOT, MLBox, and AutoKeras. He will walk through hands-on examples of how to install and use these tools and highlight the special features of each, providing Jupyter notebooks so you can start using these technologies in your work right away. Those who wish to follow along interactively during the presentation or download the notebooks can do so by creating a free trial account and signing into Domino’s trial version.

2:45 - 2:55pm

Break

2:55 - 3:25pm

Trends in Applied Machine Learning

Over the last few years, applications of Machine Learning (ML) have proliferated across many problem domains, making it a broad topic. In this session, we will survey diverse applications of various ML models in the consumer and industrial spaces. At a minimum, Manimala will discuss conversational ML, image recognition, and ML for diagnostics. Time permitting, she will also cover arbitrage ML, speech recognition, search, and scheduling. She will share a sampling of use cases, problem-solving approaches, tools, and categories of problem types, along with their commonalities and differences. You will leave with a bird’s-eye view of applied ML in action.

3:25 - 3:55pm

Partnering with Stakeholders Across the Enterprise

The relationship between Data Science and Engineering, IT, and other organizations can be complicated, yet the end goal is shared: to help the business win through model-driven innovation. Data scientists drive innovation by building models that automate or inform business processes. Technology groups provide and manage the technology landscape that makes it all possible. Business stakeholders must take Data Science outputs and make them actionable.

All stakeholders need to partner on the journey to become model-driven. In particular, Data Science and Technology groups must create a mutually beneficial environment: a shared space that brings together infrastructure, data, and tooling to foster efficient model building, testing, validation, deployment, and monitoring. But there’s tension between fueling innovation with an open environment that embraces the most cutting-edge tools and providing a place to work that is safe, governed, cost-controlled, scalable, and compliant.

In this session, our speakers will share their experiences, lessons learned, and even some battle scars as they discuss the successes and failures they’ve seen when partnering with stakeholders across the enterprise on Data Science programs. The goal of this session is to help attendees learn from collective past experience so they can establish clearer lines of communication between Data Science and other groups going forward.

3:55 - 4:05pm

Break

4:05 - 4:50pm

Workshop: Building a Data Science Center of Excellence

The best organizational structure for data science teams is a subject of frequent debate. Should the data science team be an independent function, reporting into a head of Analytics or IT and supporting various lines of business? Should data scientists be federated, reporting into business units and operating somewhat independently of each other? Or does a matrix organizational structure provide the best of both worlds? One increasingly common approach is to establish a Data Science Center of Excellence (CoE), a core team of data science experts that sets the course for a company’s data science strategy, best practices and operating procedures, and technology ecosystem. This team can offer guidance to citizen data scientists, data scientists reporting directly into business functions, and other stakeholders leveraging data science across the organization. But how do you go about establishing a Data Science Center of Excellence? During this workshop, we’ll walk through organizational designs and a how-to guide for getting started, and we’ll share best practices and lessons learned from first-hand experience building a CoE in the field. We’ll discuss which capabilities belong in a CoE, and whether and how to balance deep ML expertise against broad analytics capabilities. Attendees will leave with an actionable plan to get started on this path.

4:45 -

Networking Reception