Solving Data Science Problems using a Jupyter Notebook and SAP HANA's in-database Machine Learning Libraries


Companies store their data in databases with highly restricted access regulations. The latest regulatory changes require working on these datasets inside this controlled environment without creating additional external copies. However, Data Scientists prefer to work with the tools they are most familiar with, such as Python, R and Jupyter Notebooks, relying on a large number of open-source packages (numpy, matplotlib, pandas, ...). SAP HANA provides highly optimized in-database machine learning libraries. In this talk we will present how Data Scientists can work in the environment they are most familiar with and access the data stored in SAP HANA through SAP HANA's machine learning libraries, which offer a scikit-learn-style interface. The data remains in the database and is exposed as dataframes (similar to pandas dataframes). We will explain the software architecture and present a complete end-to-end use case in a Jupyter Notebook.
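The core architectural idea is that the "dataframe" is only a proxy holding a query. Filters and model fitting are translated into SQL that runs inside the database, and rows leave the database only when explicitly collected. The minimal sketch below illustrates this pattern under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for SAP HANA, and the classes `DbFrame` and `InDbMean` are hypothetical illustrations, not the API of SAP HANA's machine learning libraries.

```python
import sqlite3


class DbFrame:
    """Hypothetical dataframe proxy: stores a SQL query, not the rows."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql

    def filter(self, condition):
        # Wrap the current query in a subquery; nothing is executed yet.
        return DbFrame(self.conn, f"SELECT * FROM ({self.sql}) AS t WHERE {condition}")

    def select(self, *cols):
        # Project columns, again only by composing SQL.
        return DbFrame(self.conn, f"SELECT {', '.join(cols)} FROM ({self.sql}) AS t")

    def collect(self):
        # Only here are rows transferred out of the database.
        return self.conn.execute(self.sql).fetchall()


class InDbMean:
    """Hypothetical scikit-learn-style estimator whose fit() runs in SQL."""

    def fit(self, frame, column):
        # The aggregation executes inside the database; only one number returns.
        row = frame.conn.execute(
            f"SELECT AVG({column}) FROM ({frame.sql}) AS t"
        ).fetchone()
        self.mean_ = row[0]
        return self


# Usage: an in-memory SQLite table stands in for a SAP HANA table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 10.0), ("EU", 30.0), ("US", 5.0)],
)

eu = DbFrame(conn, "SELECT * FROM sales").filter("region = 'EU'")
model = InDbMean().fit(eu, "amount")
print(model.mean_)  # 20.0
print(sorted(eu.select("amount").collect()))  # [(10.0,), (30.0,)]
```

The design choice mirrors what the abstract describes: the user writes dataframe- and estimator-style Python, but the heavy lifting is pushed down to the database engine, so no external copy of the data is created.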