Determining the best organizational structure for data science teams is often debated. Should the data science team be an independent function, reporting into a head of Analytics or IT and providing support to various lines of business? Should data scientists be federated, reporting into business units and operating somewhat independently from each other? Or does a matrix organizational structure provide the best of both worlds? One approach that's becoming increasingly common is to establish a Data Science Center of Excellence (CoE): a core team of data science experts that sets the course for a company's data science strategy, best practices and operating procedures, and technology ecosystem. This team can offer guidance to citizen data scientists, to data scientists reporting directly into business functions, and to other stakeholders leveraging data science across the organization. But how do you go about establishing a Data Science Center of Excellence? During this workshop, we'll walk through organizational designs, offer a how-to guide for getting started, and share best practices and lessons learned from first-hand experience building a CoE in the field. We'll discuss which capabilities belong in a CoE and how to balance deep ML expertise against broad analytics capabilities. Attendees will leave with an actionable plan to get started on this path.
Audubon applies data science in its mission to understand how climate change will affect birds, which species will be most vulnerable, and which places will suffer the most climate-related threats. This past summer, Audubon released a report summarizing research from an unprecedented, model-driven assessment of the potential future impacts of climate change on 38 species of grassland birds, all of which was performed on Domino. In this session, Dr. Chad Wilsey, Audubon's interim chief scientist, will discuss:
A key driver of a successful data science platform is the ability to customize and extend beyond the default stock behavior. Domino enables this through its "open" approach, which lets data scientists use their preferred languages, tools, and data sources. Additionally, a growing requirement for accelerating data science is the ability to quickly obtain compute resources, especially for distributed and parallel processing. In this session, we will explore the data science platform's ability to quickly and dynamically spin up distributed Spark and Dask clusters, giving users access to the latest data science tools.
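To make the cluster capability concrete, here is a minimal sketch using the open-source dask.distributed API. The worker counts, chunk sizes, and the toy computation are illustrative assumptions; a managed platform would provision and size the cluster for you rather than through `LocalCluster`.

```python
# Sketch: spin up a Dask cluster and run a parallel computation on it.
# LocalCluster stands in for a dynamically provisioned distributed cluster;
# sizes here are illustrative, not a recommendation.
from dask.distributed import Client, LocalCluster
import dask.array as da

cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)

# A chunked 4,000 x 4,000 random array; the mean is computed
# in parallel across the cluster's workers.
x = da.random.random((4_000, 4_000), chunks=(1_000, 1_000))
result = x.mean().compute()
print(result)  # close to 0.5 for a uniform [0, 1) distribution

client.close()
cluster.close()
```

The same pattern scales from a laptop to a remote cluster by swapping the cluster object, which is what makes on-demand provisioning attractive: the analysis code does not change.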
The relationship between Data Science and IT: It's complicated. The end goal is aligned: to help the business win through model-driven innovation. Data scientists drive innovation by building models that automate or inform business processes. IT provides and manages the technology landscape that makes it all possible. IT and Data Science need to partner on the journey to create a mutually beneficial environment, a shared space that brings together infrastructure, data, and tooling to foster efficient model building, testing, validation, deployment, and monitoring. But there's tension between fueling innovation with an open environment that embraces the most cutting-edge tools, and providing a place to work that is safe, governed, cost-controlled, scalable, and compliant. In this session, our speakers will share their experiences, lessons learned, and even some battle scars as they address questions around successes and failures they've seen partnering with IT on Data Science programs. The goal of this engaging session is to help attendees learn from collective past experiences in order to establish clearer lines of communication between Data Science and IT moving forward.
The Climate Corporation provides decision-support tools for farmers. These tools can help farmers make data-driven decisions on which seed to plant given a soil type and climate, how much fertilizer to apply, and the best pest management strategy. Our FieldView product allows farmers to collect, store, and visualize critical data coming from on-farm machinery or third-party data sources. Developing recommendations for growers requires processing large volumes of data effectively, taking into account environmental, genetic, and management factors. Domino has been a key component of our data science innovation pipeline. We combine Domino with Spark and AWS SageMaker to access and summarize large volumes of data and manage our distributed deep learning jobs. In this presentation, we will cover how we use Spark and SageMaker within Domino, using satellite image-based yield prediction as a case study.