I’m Paul, a pug-loving scientist living in San Francisco. A main interest of mine is the language and statistics of causation. Does A cause B? With what strength? Through what path? My PhD dissertation explored a sub-field of causal inference known as quantitative bias analysis. Bias analysis involves making quantitative assumptions about potential sources of bias and seeing how they would change the final causal effect estimate. I found this area particularly attractive because it allows for greater transparency in epidemiological studies and lets researchers go beyond mere speculation about potential biases. It also gives a much more realistic estimate of the uncertainty in the results by quantifying the combined impact of random error and systematic error.
In the world of programming, I fell in love with the idea of open-source R & Python packages that can be downloaded by anyone throughout the world. I made it my mission during my PhD to somehow incorporate package development work into my dissertation. The result of this work was a tool called multibias that allows for the simultaneous adjustment of multiple biases in causal inference. A few years later, I gave it a new coat of paint and finally got the package accepted onto CRAN (the Comprehensive R Archive Network). I continue to maintain and promote this package with the goal of bringing bias analysis into more mainstream use.
Another emerging passion of mine is the world of blockchains, particularly Ethereum, the world’s first programmable blockchain. Having an entire decentralized network of permissionless applications is endlessly fascinating to me, in much the same way that open-source R & Python packages are. There’s something special about being able to make an application that essentially lasts forever. As a data scientist, I’m exploring ways to make it easier to query useful information from these different chains. I may also dig deeper into quantifying the decentralization of each network.
Throughout my journey, I’ve maintained a passion for baseball analytics. I have been playing competitive fantasy baseball for over 15 years and thrive in that competitive, analytical environment. During graduate school I always chose baseball as the subject for biostatistics class projects. I’ve also contributed a bit to Pitcher List, which has a very fun and welcoming community. I will keep returning to baseball data as an outlet to explore and learn new data science concepts. If this work happens to produce insights that help my fantasy baseball team, that’s a win-win!