r - Clustered Standard Errors with data containing NAs -
i'm unable cluster standard errors using r , guidance based on post. cl function returns error:
error in tapply(x, cluster1, sum) : arguments must have same length
after reading on tapply
i'm still not sure why cluster argument wrong length, , causing error.
here link data set i'm using.
https://www.dropbox.com/s/y2od7um9pp4vn0s/ec%201820%20-%20dd%20data%20with%20controls.csv
here r code:
# read in data charter<-read.csv(file.choose()) view(charter) colnames(charter) # standardize naep scores charter$naep.standardized <- (charter$naep - mean(charter$naep, na.rm=t))/sd(charter$naep, na.rm=t) # change nas in year.passed column 2014 charter$year.passed[is.na(charter$year.passed)]<-2014 # add column indicator in treatment (passed legislation) charter$treatment<-ifelse(charter$year.passed<=charter$year,1,0) # fit model charter.model<-lm(naep ~ factor(year) + factor(state) + treatment, data = charter) summary(charter.model) # account clustered standard errors state cl(dat=charter, fm=charter.model, cluster=charter$state) # accounting controls charter.model.controls<-lm(naep~factor) # clustered standard errors # --------- # function calculates clustered standard errors # source: http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/ cl <- function(dat, fm, cluster){ require(sandwich, quietly = true) require(lmtest, quietly = true) m <- length(unique(cluster)) n <- length(cluster) k <- fm$rank dfc <- (m/(m-1))*((n-1)/(n-k)) print(k) uj <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum)); vcovcl <- dfc*sandwich(fm, meat=crossprod(uj)/n) coeftest(fm, vcovcl) } # calculate clustered standard errors cl(charter, charter.model, charter$state)
the inner workings of function little on head.
when execute code, notice there missing observations in linear model:
> summary(charter.model) call: lm(formula = naep ~ factor(year) + factor(state) + treatment, data = charter) residuals: min 1q median 3q max -15.2420 -1.6740 -0.2024 1.8345 12.3580 coefficients: estimate std. error t value pr(>|t|) (intercept) 250.4983 1.2115 206.767 < 2e-16 *** factor(year)1992 3.7970 0.7198 5.275 2.17e-07 *** factor(year)1996 7.0436 0.8607 8.183 3.64e-15 *** [..] residual standard error: 3.128 on 404 degrees of freedom (759 observations deleted due missingness) multiple r-squared: 0.9337, adjusted r-squared: 0.9239 f-statistic: 94.85 on 60 , 404 df, p-value: < 2.2e-16
this causing error in tapply(x, cluster1, sum) : arguments must have same length
error message see.
in cl(dat=charter, fm=charter.model, cluster=charter$state)
cluster variable charter$state
should have exact same length number of observations used in regression estimation (which due nas not same number of rows in original data frame).
to work around can following.
first off you're using older version of arai's function (
cl
) (see fama-macbeth , cluster-robust (by firm , time) standard errors in r references both old or new versions, latter being calledclx
).second think arai's original approach function bit convoluted, , doesn't follow standard interface of
vcov*
functionssandwich
. that's why came modified version ofclx
. made code bit more readable, , interface more expectsandwich
vcov*
function:vcovcl <- function(x, cluster.by, type="sss", dfcw=1){ # r-codes (www.r-project.org) computing # clustered-standard errors. mahmood arai, jan 26, 2008. # arguments of function are: # fitted model, cluster1 , cluster2 # need install libraries `sandwich' , `lmtest' # reweighting var-cov matrix within model require(sandwich) cluster <- cluster.by m <- length(unique(cluster)) n <- length(cluster) stopifnot(n == length(x$residuals)) k <- x$rank ##only stata small-sample correction supported right ##see plm >= 1.5-4 stopifnot(type=="sss") if(type=="sss"){ dfc <- (m/(m-1))*((n-1)/(n-k)) } uj <- apply(estfun(x), 2, function(y) tapply(y, cluster, sum)) mycov <- dfc * sandwich(x, meat=crossprod(uj)/n) * dfcw return(mycov) }
if try function on data see catches specific issue:
> coeftest(charter.model, vcov=function(x) vcovcl(x, charter$state)) error: n == length(x$residuals) not true
to avoid issue proceed follows:
> charter.x <- na.omit(charter[ , c("state", all.vars(formula(charter.model)))]) > coeftest(charter.model, vcov=function(x) vcovcl(x, charter.x$state)) t test of coefficients: estimate std. error t value pr(>|t|) (intercept) 2.5050e+02 9.3781e-01 2.6711e+02 < 2.2e-16 *** factor(year)1992 3.7970e+00 5.6019e-01 6.7780e+00 4.330e-11 *** factor(year)1996 7.0436e+00 8.8574e-01 7.9522e+00 1.856e-14 *** factor(year)2000 8.4313e+00 1.0906e+00 7.7311e+00 8.560e-14 *** factor(year)2003 1.2392e+01 1.1670e+00 1.0619e+01 < 2.2e-16 *** factor(year)2005 1.3490e+01 1.1747e+00 1.1484e+01 < 2.2e-16 *** factor(year)2007 1.6334e+01 1.2469e+00 1.3100e+01 < 2.2e-16 *** factor(year)2009 1.8118e+01 1.2556e+00 1.4430e+01 < 2.2e-16 *** factor(year)2011 1.9110e+01 1.3459e+00 1.4199e+01 < 2.2e-16 *** factor(year)2013 1.9301e+01 1.4896e+00 1.2957e+01 < 2.2e-16 *** factor(state)alaska 1.4178e+01 8.7686e-01 1.6169e+01 < 2.2e-16 *** factor(state)arizona 8.6313e+00 8.1439e-01 1.0598e+01 < 2.2e-16 *** factor(state)arkansas 4.3313e+00 8.1439e-01 5.3185e+00 1.736e-07 *** factor(state)california 3.1103e+00 9.1619e-01 3.3948e+00 0.0007549 *** factor(state)colorado 1.7939e+01 7.9736e-01 2.2498e+01 < 2.2e-16 *** factor(state)connecticut 1.8031e+01 8.1439e-01 2.2141e+01 < 2.2e-16 *** factor(state)d.c. -1.8369e+01 8.1439e-01 -2.2555e+01 < 2.2e-16 *** factor(state)delaware 1.2050e+01 7.9736e-01 1.5113e+01 < 2.2e-16 *** factor(state)florida 7.3838e+00 7.9736e-01 9.2602e+00 < 2.2e-16 *** factor(state)georgia 6.4313e+00 8.1439e-01 7.8971e+00 2.724e-14 *** factor(state)hawaii 3.3313e+00 8.1439e-01 4.0906e+00 5.196e-05 *** factor(state)idaho 1.7118e+01 7.8321e-01 2.1857e+01 < 2.2e-16 *** factor(state)illinois 1.2670e+01 8.2224e-01 1.5409e+01 < 2.2e-16 *** factor(state)indianna 1.7174e+01 6.1079e-01 2.8117e+01 < 2.2e-16 *** factor(state)iowa 2.0074e+01 6.8460e-01 2.9322e+01 < 2.2e-16 *** factor(state)kansas 2.0123e+01 8.6796e-01 2.3184e+01 < 2.2e-16 *** factor(state)kentucky 1.0200e+01 4.1999e-14 2.4287e+14 < 2.2e-16 *** factor(state)louisiana -1.6866e-01 8.1439e-01 -2.0710e-01 0.8360322 factor(state)maine 2.0231e+01 1.7564e-01 1.1518e+02 < 2.2e-16 *** factor(state)maryland 1.4274e+01 6.1079e-01 2.3369e+01 < 2.2e-16 *** factor(state)massachusetts 2.4868e+01 8.3960e-01 2.9619e+01 < 2.2e-16 *** factor(state)michigan 1.2031e+01 8.1439e-01 1.4773e+01 < 2.2e-16 *** factor(state)minnesota 2.5110e+01 9.1619e-01 2.7407e+01 < 2.2e-16 *** factor(state)mississippi -3.5470e+00 1.7564e-01 -2.0195e+01 < 2.2e-16 *** factor(state)missouri 1.3447e+01 7.2706e-01 1.8495e+01 < 2.2e-16 *** factor(state)montana 2.2512e+01 8.4814e-01 2.6543e+01 < 2.2e-16 *** factor(state)nebraska 1.9600e+01 4.3105e-14 4.5471e+14 < 2.2e-16 *** factor(state)nevada 4.9800e+00 8.6796e-01 5.7375e+00 1.887e-08 *** factor(state)new hampshire 2.2026e+01 7.6338e-01 2.8853e+01 < 2.2e-16 *** factor(state)new jersey 2.0651e+01 7.6338e-01 2.7052e+01 < 2.2e-16 *** factor(state)new mexico 1.5313e+00 8.1439e-01 1.8803e+00 0.0607809 . factor(state)new york 1.2152e+01 7.1259e-01 1.7054e+01 < 2.2e-16 *** factor(state)north carolina 1.2231e+01 8.1439e-01 1.5019e+01 < 2.2e-16 *** factor(state)north dakota 2.4278e+01 1.0420e-01 2.3299e+02 < 2.2e-16 *** factor(state)ohio 1.7118e+01 7.8321e-01 2.1857e+01 < 2.2e-16 *** factor(state)oklahoma 8.4518e+00 7.8321e-01 1.0791e+01 < 2.2e-16 *** factor(state)oregon 1.6535e+01 7.3538e-01 2.2486e+01 < 2.2e-16 *** factor(state)pennsylvania 1.6651e+01 7.6338e-01 2.1812e+01 < 2.2e-16 *** factor(state)rhode island 9.5313e+00 8.1439e-01 1.1704e+01 < 2.2e-16 *** factor(state)south carolina 9.5346e+00 8.3960e-01 1.1356e+01 < 2.2e-16 *** factor(state)south dakota 2.1211e+01 3.5103e-01 6.0425e+01 < 2.2e-16 *** factor(state)tennessee 4.9148e+00 6.1473e-01 7.9951e+00 1.375e-14 *** factor(state)texas 1.4231e+01 8.1439e-01 1.7475e+01 < 2.2e-16 *** factor(state)utah 1.5114e+01 7.2706e-01 2.0787e+01 < 2.2e-16 *** factor(state)vermont 2.3474e+01 2.0299e-01 1.1564e+02 < 2.2e-16 *** factor(state)virginia 1.6252e+01 7.1259e-01 2.2807e+01 < 2.2e-16 *** factor(state)washington 1.9073e+01 1.8183e-01 1.0489e+02 < 2.2e-16 *** factor(state)west virginia 5.0000e+00 4.2022e-14 1.1899e+14 < 2.2e-16 *** factor(state)wisconsin 1.9994e+01 8.2447e-01 2.4251e+01 < 2.2e-16 *** factor(state)wyoming 1.8231e+01 8.1439e-01 2.2386e+01 < 2.2e-16 *** treatment 1.2108e+00 1.0180e+00 1.1894e+00 0.2349682 --- signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
it's not nice, gets job done. cl
work fine , yield same result above:
cl(dat=charter, fm=charter.model, cluster=charter.x$state)
a better way go use multiwayvcov
package. per packages's website, improvement upon arai's code:
transparent handling of observations dropped due missingness
using petersen data simulated nas , cluster.vcov()
:
library("lmtest") library("multiwayvcov") data(petersen) set.seed(123) petersen[ sample(1:5000, 15), 3] <- na m1 <- lm(y ~ x, data = petersen) summary(m1) ## ## call: ## lm(formula = y ~ x, data = petersen) ## ## residuals: ## min 1q median 3q max ## -6.759 -1.371 -0.018 1.340 8.680 ## ## coefficients: ## estimate std. error t value pr(>|t|) ## (intercept) 0.02793 0.02842 0.983 0.326 ## x 1.03635 0.02865 36.175 <2e-16 *** ## --- ## signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 ## ## residual standard error: 2.007 on 4983 degrees of freedom ## (15 observations deleted due missingness) ## multiple r-squared: 0.208, adjusted r-squared: 0.2078 ## f-statistic: 1309 on 1 , 4983 df, p-value: < 2.2e-16 coeftest(m1, vcov=function(x) cluster.vcov(x, petersen$firmid)) ## ## t test of coefficients: ## ## estimate std. error t value pr(>|t|) ## (intercept) 0.027932 0.067198 0.4157 0.6777 ## x 1.036354 0.050700 20.4407 <2e-16 *** ## --- ## signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
for different approach using plm
package see:
Comments
Post a Comment