scikit-learn and Tabular Data: Closing the Gap



Scikit-learn has traditionally centered its data model around NumPy arrays. However, in an important subset of scikit-learn's use cases, the original data in the machine learning pipeline is tabular: heterogeneously typed and labeled. Meanwhile, pandas has become very popular and is increasingly used to represent such tabular data, but scikit-learn does not always play well with heterogeneous DataFrames. This talk will give an overview of the challenges and current bottlenecks when working with tabular data in scikit-learn. It will then show the ongoing developments in scikit-learn to improve this situation and highlight some third-party libraries that try to ease these problems.
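As a rough illustration of the problem, here is a minimal sketch using `ColumnTransformer` (added in scikit-learn 0.20), one scikit-learn tool for applying different preprocessing to different columns of a heterogeneous DataFrame. Whether the talk covers this exact API is an assumption, and the column names and toy data are hypothetical. Numeric columns are standardized, the string column is one-hot encoded, and the result feeds a classifier inside a single pipeline that accepts the DataFrame directly.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: a heterogeneous DataFrame mixing numeric and string columns
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "income": [40_000, 52_000, 81_000, 95_000, 60_000, 45_000],
    "city": ["Paris", "Berlin", "Paris", "London", "Berlin", "London"],
})
y = np.array([0, 0, 1, 1, 1, 0])

# Route each group of columns (selected by name) to its own transformer
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# make_pipeline names steps after their lowercased class names
model = make_pipeline(preprocess, LogisticRegression())
model.fit(df, y)
predictions = model.predict(df)

# Inspect the numeric matrix the classifier actually sees:
# 2 scaled numeric columns + 3 one-hot columns (Berlin, London, Paris)
X_transformed = model.named_steps["columntransformer"].transform(df)
print(X_transformed.shape)  # (6, 5)
```

Note that the labels are lost at the transformed stage: the classifier receives an unlabeled array, which is one of the friction points between DataFrames and scikit-learn's array-centric data model.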
