AnacondaCon 2018. One of the challenges of working with high-dimensional data is quickly finding the independent variables that most strongly influence the dependent variable. Unfortunately, the higher the dimensionality, the more likely it is that a variable which looks influential was selected by random luck. Enter Monte Carlo Permutation Testing (MCPT). By permuting the dependent variable many times, calculating an information measure on each permutation, and comparing those measures to the measure computed on the actual data, a practitioner can be more confident that a selected variable truly is informative. Until recently, executing something like this efficiently in Python was rather challenging. However, with the recently added ParallelAccelerator functionality in Numba, it can be executed in a single memory space, at a blisteringly fast pace, all in native Python. The goal of this talk is not only to introduce the concept of MCPT, but also to inspire others to explore Numba's ParallelAccelerator functionality for significant speed advantages.
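The MCPT loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the talk's code: it assumes the features and target are already discretized into integer bins, uses plug-in mutual information as the information measure, and reaches ParallelAccelerator through Numba's `parallel=True` option and `prange`. The function names `mutual_information` and `mcpt_pvalues` are hypothetical. To account for selecting the best of many candidates, each variable's observed score is compared against the maximum score over all variables in each permutation (a max-statistic permutation test); that choice is an assumption of this sketch.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fall back to plain Python so the sketch still runs without Numba
    prange = range

    def njit(*args, **kwargs):
        # Supports both @njit and @njit(parallel=True) as no-op decorators.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f


@njit
def mutual_information(x, y, n_bins):
    """Plug-in mutual information (in nats) between two arrays of bin indices."""
    n = x.shape[0]
    joint = np.zeros((n_bins, n_bins))
    for i in range(n):
        joint[x[i], y[i]] += 1.0
    joint /= n
    # Marginal distributions of x and y.
    px = np.zeros(n_bins)
    py = np.zeros(n_bins)
    for a in range(n_bins):
        for b in range(n_bins):
            px[a] += joint[a, b]
            py[b] += joint[a, b]
    mi = 0.0
    for a in range(n_bins):
        for b in range(n_bins):
            p = joint[a, b]
            if p > 0.0:
                mi += p * np.log(p / (px[a] * py[b]))
    return mi


@njit(parallel=True)
def mcpt_pvalues(X, y, n_bins, n_perms):
    """Observed MI per feature and max-statistic permutation p-values."""
    n_features = X.shape[1]
    observed = np.empty(n_features)
    for j in range(n_features):
        observed[j] = mutual_information(X[:, j], y, n_bins)
    # Each permutation is independent, so prange can spread them across threads
    # that all share the same X and y in memory.
    null_max = np.empty(n_perms)
    for i in prange(n_perms):
        y_perm = np.random.permutation(y)
        best = 0.0
        for j in range(n_features):
            mi = mutual_information(X[:, j], y_perm, n_bins)
            if mi > best:
                best = mi
        null_max[i] = best
    # Fraction of permutations whose best feature beats this feature's real score.
    pvals = np.empty(n_features)
    for j in range(n_features):
        count = 0
        for i in range(n_perms):
            if null_max[i] >= observed[j]:
                count += 1
        pvals[j] = (count + 1) / (n_perms + 1)
    return observed, pvals


# Synthetic demo data: only feature 0 is genuinely related to y.
rng = np.random.default_rng(0)
n_samples, n_features, n_bins = 500, 5, 4
X = rng.integers(0, n_bins, size=(n_samples, n_features))
y = np.where(rng.random(n_samples) < 0.8, X[:, 0],
             rng.integers(0, n_bins, size=n_samples))
observed, pvals = mcpt_pvalues(X, y, n_bins, 200)
print(observed.round(3), pvals.round(3))
```

With real data, a small p-value for a feature suggests its score is unlikely to be explained by the luck of picking the best of many random candidates. The per-permutation maximum is what ties the test to the selection problem: comparing each feature only against its own null distribution would ignore how many candidates were examined.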