Desktop Site (Beta)

Introductory Video

scikit learn and Tabular Data Closing the Gap


Follow to receive video recommendations   a   A
Speaker: Are you the speaker?

Scikit-learn traditionally centered its data model around numpy arrays. However, in an important subset of scikit-learn's use cases, the original data in the machine learning pipeline is tabular: heterogeneously typed and labeled. In the meantime, pandas has become very popular, and increasingly used to represent such tabular data, but scikit-learn does not always play well with heterogeneous DataFrames. This talk will give an overview of the challenges and current bottlenecks when working with tabular data and scikit-learn. Then it will show the ungoing developments in sckikit-learn to improve this situation and highlight some third-party libraries that try to ease those problems.

Editors Note:

I would like to work with open source projects to create a branch of the tree with all of the best videos for your open source project. Please send me an email if you are interested.  

Comment On Twitter