We discuss our experience with dimensionality reduction for large datasets. We investigate the controlled performance decrease of our public-sentiment models under transformations that reduce the number of features in the dataset. This feature reduction speeds up our real-time data science tools and helps to counter the curse of dimensionality. We outline the Python workflow that both produces these transformations and validates their quality at scale in the AWS ecosystem, and we detail our programming and design choices, touching on the scikit-learn API, configuration versus code, SQL templatization, and our open source API client.
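The abstract does not include code, but as a hedged illustration of the kind of feature-reducing transformation it describes, the sketch below applies the hashing trick to bag-of-words text features: an arbitrarily large vocabulary is mapped onto a fixed, smaller number of feature buckets. The function name, bucket count, and sample sentence are hypothetical, not taken from the talk; the real workflow presumably uses the scikit-learn transformer API (fit/transform) mentioned above.

```python
import hashlib

def hash_features(tokens, n_buckets=64):
    """Reduce a large bag-of-words vocabulary to a fixed-length vector
    via the hashing trick, a simple dimensionality-reduction transform.

    A stable hash (md5 rather than Python's salted hash()) keeps the
    mapping reproducible across runs, which matters when the same
    transformation must be produced and validated separately at scale.
    """
    vec = [0] * n_buckets
    for tok in tokens:
        # map each token deterministically to one of n_buckets slots
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        vec[h % n_buckets] += 1
    return vec

# Hypothetical usage: a short sentiment snippet reduced to 8 features.
reduced = hash_features("the service was great great".split(), n_buckets=8)
```

Because the output dimensionality is fixed regardless of vocabulary size, a downstream sentiment model trained on these vectors sees far fewer features, at the cost of occasional hash collisions — the kind of controlled performance trade-off the abstract investigates.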