Paco Nathan

Computable content: Notebooks, containers, and data-centric organizational learning

Director, Learning Group at O’Reilly Media

@pacoid

Known as a “player/coach” data scientist, Paco has led innovative data teams building large-scale apps for several years. A recognized expert in distributed systems, machine learning, and enterprise data workflows, he is also an advisor for Amplify Partners. He has 30+ years of technology industry experience, ranging from Bell Labs to early-stage start-ups. Newsletter and “official” website: http://liber118.com/pxn/

Abstract

_Computable content_ was described by Dr. Lorena Barba in a “2015 lecture”:https://bids.berkeley.edu/events/computational-thinking-and-pedagogy-computable-content at the UC Berkeley Institute for Data Science. The approach uses “Jupyter notebooks”:https://jupyter.org/ to make learning materials more powerful by integrating compute engines, data sources, and other resources directly into them.
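To make the idea concrete: a Jupyter notebook is a JSON document that interleaves prose (Markdown) cells with executable code cells. The sketch below is a minimal illustration using only Python's standard library to build one in the nbformat 4 layout. The `make_notebook` helper and the lesson text are hypothetical; real projects would more likely use the `nbformat` package.

```python
import json

def make_notebook(cells):
    """Build a minimal nbformat 4 notebook from (kind, source) pairs.

    `kind` is "markdown" for prose or "code" for an executable cell.
    """
    nb_cells = []
    for kind, source in cells:
        if kind == "markdown":
            nb_cells.append({"cell_type": "markdown", "metadata": {}, "source": source})
        elif kind == "code":
            nb_cells.append({
                "cell_type": "code",
                "execution_count": None,  # not yet executed
                "metadata": {},
                "outputs": [],            # filled in by the kernel when run
                "source": source,
            })
        else:
            raise ValueError(f"unknown cell kind: {kind!r}")
    return {
        "cells": nb_cells,
        "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
        "nbformat": 4,
        "nbformat_minor": 4,
    }

# Hypothetical lesson: explanation first, then code the reader can run and modify.
lesson = make_notebook([
    ("markdown", "## Matching words\nDoes every word contain the letter `a`?"),
    ("code", "import re\nwords = ['alpha', 'beta']\nall(re.search('a', w) for w in words)"),
])

notebook_json = json.dumps(lesson, indent=1)
```

Writing `notebook_json` to a file named `lesson.ipynb` should produce a notebook that Jupyter can open, with prose and live code side by side.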

O’Reilly Media extended this approach by “publishing notebooks”:https://www.oreilly.com/ideas/jupyter-at-oreilly from authors, paired with video timelines, to create “Oriole”:http://www.oreilly.com/oriole/index.html, a new online tutorial medium. A free public tutorial, “Regex Golf”:https://www.oreilly.com/learning/regex-golf-with-peter-norvig by Peter Norvig, demonstrates what this integration of technologies makes possible.

Each user session launches its own “Docker container”:https://www.docker.com/ on a “Mesos cluster”:http://mesos.apache.org/, giving every user a fully personalized compute environment. The UX is entirely browser-based, and it is instrumented for data collection and analytics so that it can also serve as an _assessment_ platform.
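Per-session isolation can be sketched on a single host as one named container per session. This is a simplification, assuming a Jupyter-capable image that listens on port 8888; the image name and the `container_command` helper are hypothetical, and on a Mesos cluster a scheduler, rather than a direct `docker run`, would place the container. The code only builds the command line and starts nothing.

```python
import shlex
import uuid

# Hypothetical image name; assumed to run a Jupyter server on port 8888.
NOTEBOOK_IMAGE = "example/notebook-image:latest"

def container_command(session_id, image=NOTEBOOK_IMAGE, memory="512m"):
    """Return a `docker run` argument list for one isolated user session.

    Single-host simplification: no cluster scheduling is modeled here.
    """
    return [
        "docker", "run",
        "--detach",                      # run in the background
        "--rm",                          # remove the container when it stops
        "--name", f"session-{session_id}",  # one named container per session
        "--memory", memory,              # cap memory so sessions can't starve each other
        "--publish", "127.0.0.1::8888",  # map port 8888 to a random localhost port
        image,
    ]

session = uuid.uuid4().hex[:12]
cmd = container_command(session)
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the container; a reverse proxy such as Nginx could then route the user's browser to the published port.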

Project Jupyter supports more than 50 different compute environments (kernels). Docker makes it possible to add further frameworks (such as Dato) and data services, while an HTML front end brings JavaScript and other browser-based technologies into the mix. Vital portions of this software architecture have been released on GitHub as “Thebe”:https://github.com/oreillymedia/thebe.

This talk will present:

  * the system architecture based on Jupyter as middleware, plus Thebe, Docker, Mesos, Nginx, etc.

  * data analytics and project experiences based on delivering _computable content_ at scale

  * supporting theory for this pedagogical approach, including Knuth’s _Literate Programming_

  * media production techniques that use the video as _subtext_
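As a rough illustration of how instrumented sessions can feed analytics and assessment, the sketch below aggregates per-user events into a few simple metrics. The event schema (`user`, `session`, `type`, `ok`) and the sample records are hypothetical and are not data from the system described here.

```python
from collections import defaultdict

# Hypothetical events, as an instrumented notebook front end might log them.
events = [
    {"user": "u1", "session": "s1", "type": "execute", "ok": True},
    {"user": "u1", "session": "s1", "type": "execute", "ok": False},
    {"user": "u1", "session": "s1", "type": "execute", "ok": True},
    {"user": "u2", "session": "s2", "type": "open"},
    {"user": "u2", "session": "s2", "type": "execute", "ok": True},
]

def summarize(events):
    """Per-user session count, cell executions, and error rate."""
    stats = defaultdict(lambda: {"sessions": set(), "runs": 0, "errors": 0})
    for e in events:
        s = stats[e["user"]]
        s["sessions"].add(e["session"])
        if e["type"] == "execute":
            s["runs"] += 1
            if not e["ok"]:
                s["errors"] += 1
    return {
        user: {
            "sessions": len(s["sessions"]),
            "runs": s["runs"],
            "error_rate": s["errors"] / s["runs"] if s["runs"] else 0.0,
        }
        for user, s in stats.items()
    }

summary = summarize(events)
```

Metrics like these are only a starting point; an assessment platform would likely track richer signals, such as which cells learners modify and how long they spend on each step.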

We will also consider the use of notebooks (Jupyter and others) in an organizational context: How do notebooks help teams share and learn? What impact might notebooks have on developer collaboration, which is currently focused on IDEs? The resulting medium provides highly effective tooling for a data-centric organization.
