Productionizing your ML code seamlessly



Data science and machine learning are hot topics right now for software engineers and beyond. There are a lot of Python tools that let you hack together a notebook to quickly gain insight into your data, or to train a model to predict or classify. Or you might have inherited some data-wrangling and modeling notebook code (Jupyter or Zeppelin) from someone else, such as the resident data scientist.

The code works on test data when you run the cells in the right order (skipping cell 22), and you believe the insight gained from this work could be a valuable game changer. But how do you take this experimental code into production and keep it up to date with a regular retraining schedule? And what do you need to do after that to ensure it remains reliable and brings value in the long term?

This talk will answer these questions, focusing on two main themes: what does running an ML model in production involve, and how can you improve your development workflow to make the path to production easier? It will draw examples from real projects at Yelp, such as migrating a pandas and scikit-learn classification project into production with PySpark, while aiming to give advice that does not depend on specific frameworks or tools and is useful for listeners from all backgrounds.