I realize this post is over a year old, but it seems like it might need some editing. I loaded the data from cattaneo2.dta and ran a basic logistic regression to estimate the probability of observing mbsmoke=='smoker' conditional on age, and found the coefficient on mbage to be significantly negative. These estimates agree with a basic visual inspection of the data: older women in this data set appear less likely to smoke during pregnancy. This observation is at odds with what is stated in the blog post, namely that problems arise when estimating the impact of smoking on birthweight because older women tended to give birth to heavier babies regardless of smoking status, and that older mothers were more likely to smoke during pregnancy. I realize that this claim was made for illustrative purposes, but it is very confusing to try to follow along with a post that 1) suggests we load the cattaneo2.dta data set and then 2) proceeds to claim the existence of patterns that do not appear in that data set.
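For anyone who wants to repeat the check, a minimal sketch along these lines should be enough. It assumes the data set is available through webuse and that mother's age is stored under the name mage, which is my recollection of the shipped copy; substitute whatever name your copy uses.

```stata
* Load the example data set (assumes the copy served by webuse)
webuse cattaneo2, clear

* Logistic regression of smoking status on mother's age
* Assumption: mother's age is named mage; change it if your copy differs
logit mbsmoke mage

* Rough check of the raw data: share of smokers within each age
tabulate mage mbsmoke, row
```

The sign of the coefficient on age in the logit output, together with the row percentages from the table, is what to compare against the pattern described in the post.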