AnacondaCon 2018. Jenny X. Lin.

This session deals with how to conscientiously approach causal inference in the large, messy data sets common in tech, in the absence of an experiment (or when the experimental setup was not ideal). In the real world, correlation alone is sometimes not enough basis for a million-dollar business decision. That is where causal inference comes in: it establishes a causal link between a treatment X and an outcome Y, and is often necessary for making critical and expensive business choices. Many pitfalls can render "simple" causal analyses entirely misleading and potentially costly.

Here, Jenny will discuss some of the approaches taken at Yelp to determine causality when faced with a question common across tech firms: how do we know that our implementation of feature X caused an effect on metric Y, and what was the size of that effect? Factors to correct for when inferring causality include selection bias into the comparison groups, time trends in the outcome metric, time-period mismatches across observations, multicollinearity, clustered standard errors, and more. Jenny will walk through a stylized example of a causal inference problem she ran into at Yelp and show how easily one can arrive at a very misleading conclusion without correcting for these issues.
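To make the pitfalls concrete, here is a minimal sketch (not taken from the talk) of how selection bias and a time trend can inflate a naive feature-effect estimate, and how a difference-in-differences regression with group and period fixed effects plus group-clustered standard errors recovers the true effect. All names, the simulated data, and the chosen effect sizes are illustrative assumptions; the estimation uses the standard `statsmodels` formula API.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_periods = 50, 12

# Hypothetical panel: one row per (group, period).
df = pd.DataFrame(
    [(g, t) for g in range(n_groups) for t in range(n_periods)],
    columns=["group", "t"],
)
treated = df["group"] < 25   # assumption: half the groups adopt the feature
post = df["t"] >= 6          # assumption: the feature launches at period 6
df["D"] = (treated & post).astype(int)

# Simulated outcome with the three confounders the abstract warns about baked in:
df["y"] = (
    0.5 * df["t"]                 # secular time trend in the metric
    + 2.0 * treated               # selection: adopters differ at baseline
    + 1.0 * df["D"]               # true causal effect of the feature (= 1.0)
    + rng.normal(0, 1, len(df))
)

# Naive comparison of treated vs. untreated observations conflates the
# baseline difference and the time trend with the treatment effect.
naive = df.loc[df["D"] == 1, "y"].mean() - df.loc[df["D"] == 0, "y"].mean()

# Difference-in-differences: group and period fixed effects absorb the
# selection and trend; standard errors are clustered at the group level.
did = smf.ols("y ~ D + C(group) + C(t)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["group"]}
)

print(f"naive estimate: {naive:.2f}")        # badly biased upward
print(f"DiD estimate:   {did.params['D']:.2f}")  # close to the true effect of 1.0
```

The clustered covariance matters because observations within a group are correlated over time; with the default (i.i.d.) standard errors, the estimated effect would look far more precise than it really is.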