Performing Dimension Reduction at Scale with Applications to Public Sentiment Models

We discuss our experience with dimension reduction for big datasets. We investigate the controlled performance decrease of our public sentiment models under transformations that reduce the number of features in the dataset. This feature reduction speeds up our real-time data science tools and helps to counter the curse of dimensionality. We outline the Python workflow that both produces and validates the quality of these transformations at scale in the AWS ecosystem, and we detail our programming and design choices, touching on the scikit-learn API, configuration versus code, SQL templatization, and our open-source API client.
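
To make the idea of a "controlled performance decrease" concrete, here is a minimal sketch of how such a check might look with the scikit-learn API. It is not the speakers' workflow: the synthetic data from `make_classification`, the choice of PCA as the reduction, the logistic regression model, and the helper name `score_with_reduction` are all assumptions made for illustration. The sketch compares cross-validated accuracy with and without a reduction step, for several target dimensions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def score_with_reduction(X, y, n_components=None, cv=5):
    """Mean cross-validated accuracy, optionally after PCA (hypothetical helper)."""
    steps = [StandardScaler()]
    if n_components is not None:
        # PCA stands in for whatever transformation is being evaluated.
        steps.append(PCA(n_components=n_components, random_state=0))
    steps.append(LogisticRegression(max_iter=1000))
    return cross_val_score(make_pipeline(*steps), X, y, cv=cv).mean()


# Synthetic stand-in data (assumption): 100 features, 10 of them informative.
X, y = make_classification(
    n_samples=500, n_features=100, n_informative=10, random_state=0
)

baseline = score_with_reduction(X, y)
results = {k: score_with_reduction(X, y, n_components=k) for k in (5, 10, 20, 50)}

for k, acc in results.items():
    print(f"{k:>3} components: accuracy {acc:.3f} "
          f"(baseline {baseline:.3f}, change {acc - baseline:+.3f})")
```

Placing the reduction inside the pipeline means it is refit on each training fold, so the validation score is not contaminated by the held-out data. A production version would swap in the real features, model, and an acceptance threshold for the allowed decrease.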