Reproducibility and Selection Bias in Machine Learning



*Reproducibility* (the ability to recompute results) and *replicability* (the chances that other experimenters will achieve a consistent result) [1] are among the core principles of the *scientific method*. Surprisingly, these two aspects are often underestimated, or not even considered, when setting up scientific experimental pipelines. One of the main threats to replicability is *selection bias*, that is, the error in choosing the individuals or groups to take part in a study. Selection bias comes in different flavours: the selection of the population of samples in the dataset (*sample bias*); the selection of the features used by the learning models, which is particularly delicate in the case of high dimensionality; and the selection of the hyperparameters that perform best on specific dataset(s). If not properly considered, selection bias may strongly affect the validity of the derived conclusions, as well as the reliability of the learning model.

In this talk I will provide a solid introduction to the topics of reproducibility and selection bias, with examples taken from biomedical research, in which reliability is paramount.

From a more technological perspective, to date the scientific Python ecosystem still lacks tools to consolidate experimental pipelines in research that can be used together with machine and deep learning frameworks (e.g. ``sklearn`` and ``keras``). In this talk, I will present ``reproducible-lern``, a new Python framework for reproducible research to be used for machine and deep learning. During the talk, the main features of the framework will be presented, along with several examples, technical insights and implementation choices to be discussed with the audience.

The talk is intended for *intermediate* PyData researchers and practitioners. Basic prior knowledge of the main Machine Learning concepts is assumed for the first part of the talk. Good proficiency with the Python language and with scientific Python libraries (e.g. ``numpy``, ``sklearn``) is required for the second part.

--

[1] *Reproducible research can still be wrong: Adopting a prevention approach* by Jeffrey T. Leek and Roger D. Peng

[2] Dictionary of Cancer Terms -> 'selection bias'
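As an illustration of two ideas from the abstract, the sketch below fixes the random seed so that a train/test split can be recomputed, and contrasts feature selection on all samples with feature selection on the training samples only. It is a minimal sketch on synthetic data using only the standard library; the helper names (``split_indices``, ``agreement``, ``select_feature``) and the data sizes are assumptions made for this example and are not taken from the talk or from ``reproducible-lern``.

```python
import random


def split_indices(n_samples, test_fraction, seed):
    """Shuffle sample indices with a private, seeded RNG and split them."""
    rng = random.Random(seed)  # local RNG: no hidden global state
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def agreement(feature, labels, rows):
    """Fraction of the given rows on which a binary feature equals the label."""
    return sum(feature[i] == labels[i] for i in rows) / len(rows)


def select_feature(features, labels, rows):
    """Index of the feature that best matches the labels on `rows` only."""
    return max(range(len(features)),
               key=lambda j: agreement(features[j], labels, rows))


# Synthetic data (hypothetical): 500 binary features that are pure noise
# with respect to the labels, so no feature is genuinely predictive.
data_rng = random.Random(0)
n_samples, n_features = 40, 500
labels = [data_rng.randint(0, 1) for _ in range(n_samples)]
features = [[data_rng.randint(0, 1) for _ in range(n_samples)]
            for _ in range(n_features)]

train, test = split_indices(n_samples, test_fraction=0.5, seed=42)

# Biased protocol: the feature is chosen using every sample, test rows included.
biased_choice = select_feature(features, labels, train + test)
# Unbiased protocol: the feature is chosen on the training rows only.
unbiased_choice = select_feature(features, labels, train)

biased_score = agreement(features[biased_choice], labels, test)
unbiased_score = agreement(features[unbiased_choice], labels, test)
print(f"selection on all rows:      test agreement = {biased_score:.2f}")
print(f"selection on training rows: test agreement = {unbiased_score:.2f}")
```

Because every feature is noise, an honest estimate should hover around 0.5. When the test rows take part in the selection, the reported agreement tends to be optimistic, which is the selection bias in question. Re-running the script with the same seeds yields the same split and the same numbers.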
