Hi Bruce,
Thanks for sharing.
I was interested in looking at what happened with "small" samples and heteroskedasticity in linear models.
I started playing with some simulations and found that the claim that HC3 is unambiguously best did not hold, especially once I added discrete covariates to my simulations.
I then looked for the asymptotic metaphor that best described what I found. The Cattaneo et al. paper fit the bill. I also found its relation to the Chesher work interesting. At the time, I had not seen your paper. I was aware of what Matt Webb and his coauthors are working on, but their asymptotic metaphor did not seem as well suited to my results as the one in Cattaneo et al.
In particular, I wanted to emphasize that a small number of observations per parameter is problematic, even if N is large. I am not making any specific recommendation as to which method is best. I only wanted to share something I found about heteroskedasticity and "small" samples. I was particularly concerned with the idea that there is a threshold number of observations that determines when to use one method or another.
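To make the observations-per-parameter point concrete, here is a minimal sketch in Python. It is not the simulation described above. It uses the fact that the leverages h_ii (the diagonal of the hat matrix) average exactly k/N, so with N fixed, adding parameters raises the typical leverage and widens the gap between the HC0, HC2, and HC3 corrections. The design (an intercept plus normal covariates), the error process, the seed, and the helper names leverage and hc_se are all assumptions made for illustration.

```python
import numpy as np

def leverage(X):
    """Diagonal of the hat matrix X (X'X)^-1 X', computed from a reduced QR."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def hc_se(X, y, kind="HC3"):
    """OLS standard errors under the HC0, HC1, HC2, or HC3 correction."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                      # OLS residuals
    h = leverage(X)
    weights = {
        "HC0": np.ones(n),                # no correction
        "HC1": np.full(n, n / (n - k)),   # degrees-of-freedom scaling
        "HC2": 1.0 / (1.0 - h),           # leverage correction
        "HC3": 1.0 / (1.0 - h) ** 2,      # stronger leverage correction
    }[kind]
    meat = (X * (weights * e**2)[:, None]).T @ X
    V = XtX_inv @ meat @ XtX_inv          # sandwich estimator
    return np.sqrt(np.diag(V))

rng = np.random.default_rng(0)            # arbitrary seed (assumption)
n = 1000                                  # N is "large" in both designs
results = {}
for k in (5, 250):                        # 200 vs. 4 observations per parameter
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    # Hypothetical heteroskedastic errors: scale grows with |x_1|
    y = X[:, 1] + rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))
    h = leverage(X)
    results[k] = {
        "mean_h": h.mean(),               # equals k / n
        "max_h": h.max(),
        "se": {kind: hc_se(X, y, kind) for kind in ("HC0", "HC1", "HC2", "HC3")},
    }
    print(f"k={k}: obs/param={n / k:.0f}, mean leverage={h.mean():.3f}, "
          f"HC3/HC0 SE ratio on x_1={results[k]['se']['HC3'][1] / results[k]['se']['HC0'][1]:.3f}")
```

Both designs have the same N, so any difference between them comes from the number of observations per parameter rather than from the sample size. The sketch only shows how far apart the corrections are; it says nothing about which one has the best coverage, which would require repeating the draw many times.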