The Central Pacific Ocean is a looking glass into both the past and the future ocean. The recent ProteOMZ expedition provides more than 6 million data points on the distribution of microbial proteins in this region, coupled with more than 6000 chemical and physical measurements. We demonstrate our approach to analyzing this multivariate data with the Python tools Pandas, Scikit-learn, Biopython, Bokeh, and METATRYP. We found interactivity to be key to human-led pattern discovery. We discuss challenges and solutions in piecing together multiple data types and Python tools, and describe our experience as scientists learning to apply Python packages in our research.
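The abstract does not spell out how the protein and chemistry data are combined. As a rough illustration only, the sketch below shows one common way to piece two such data types together with Pandas and Scikit-learn. It pivots a long protein table into one row per sample, joins it to chemical measurements on shared sample keys, then standardizes the features and runs PCA. The column names (`station`, `depth_m`, `protein`, `spectral_counts`, `oxygen_umol_kg`, `nitrate_umol_kg`), the helper `merge_and_ordinate`, the toy values, and the choice of PCA are all hypothetical assumptions, not the ProteOMZ pipeline.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def merge_and_ordinate(proteins: pd.DataFrame, chemistry: pd.DataFrame,
                       keys=("station", "depth_m"), n_components=2) -> pd.DataFrame:
    """Hypothetical helper: join protein abundances to chemistry, then run PCA."""
    keys = list(keys)
    # Reshape long protein records into one row per sample, one column per protein.
    wide = proteins.pivot_table(index=keys, columns="protein",
                                values="spectral_counts", aggfunc="sum",
                                fill_value=0).reset_index()
    wide.columns.name = None
    # Inner join keeps only samples that have both protein and chemistry data.
    merged = wide.merge(chemistry, on=keys, how="inner")
    features = merged.drop(columns=keys)
    # Standardize so counts and concentrations are on comparable scales.
    scaled = StandardScaler().fit_transform(features)
    scores = PCA(n_components=n_components).fit_transform(scaled)
    for i in range(n_components):
        merged[f"PC{i + 1}"] = scores[:, i]
    return merged


# Toy, made-up data: two hypothetical proteins at two stations and two depths.
proteins = pd.DataFrame({
    "station": [1, 1, 1, 1, 2, 2, 2, 2],
    "depth_m": [50, 50, 200, 200, 50, 50, 200, 200],
    "protein": ["protein_A", "protein_B"] * 4,
    "spectral_counts": [3, 40, 25, 2, 5, 35, 30, 1],
})
# Station 3 has chemistry but no proteins, so the inner join drops it.
chemistry = pd.DataFrame({
    "station": [1, 1, 2, 2, 3],
    "depth_m": [50, 200, 50, 200, 50],
    "oxygen_umol_kg": [210.0, 5.0, 200.0, 3.0, 190.0],
    "nitrate_umol_kg": [1.0, 30.0, 2.0, 32.0, 1.5],
})

result = merge_and_ordinate(proteins, chemistry)
print(result[["station", "depth_m", "PC1", "PC2"]])
```

An inner join is used here so that every sample passed to PCA has a complete feature vector. In practice the resulting score table is the kind of output one might then hand to an interactive Bokeh plot for exploration.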