· Explain in your own words how to do the tests and whether or not you reject the hypotheses. For all tests, report p-values.
· Cut and paste your software output for all commands that you execute. In Stata, an easy way to copy is to highlight, right-click, and choose ‘Copy as Picture.’
· You don’t need to provide any data, but we do need to see the regression estimates and the commands that you executed.
Use the data in the file mlbsal17f. The file contains a cross section of 353 observations on salaries and performance measures for baseball players. (When importing the data into Stata, make sure to check ‘first row is variable names.’)
Suppose you want to model a player’s salary as a function of the player’s total games played, at-bats, runs scored, hits, runs batted in, home runs, doubles, and triples. The units of salary are thousands of dollars.
The log-linear model takes the form:
ln(salary) = β0 + β1games + β2ab + β3hits + β4runs + β5rbi + β6doubles + β7triples + β8hr + u.
The linear model takes the form:
salary = b0 + b1games + b2ab + b3hits + b4runs + b5rbi + b6doubles + b7triples + b8hr + v.
The double-log model takes the form:
ln(salary) = B0 + B1ln(games) + B2ln(ab) + B3ln(hits) + B4ln(runs) + B5ln(rbi) + B6doubles + B7triples + B8hr + w.
(We don’t take the logs of doubles, triples, or hr because some observations are zero.)
a. For all three models, report and interpret the estimated coefficient for runs.
b. Which of the three models fits the data best?
c. Consider the 2nd player in the sample, with 918 games played, 3333 at-bats, etc. What salary does the double-log model predict for such a player?
d. Compute a test statistic for testing the hypothesis that the double-log model has no explanatory power.
H0: R2 = 0.
State the distribution of the test statistic. (Remember, for every test in this exam, to report the p-value.)
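As a hedged illustration of part d, the overall F statistic can be written in terms of R2 alone. The sketch below is in Python rather than Stata, and the R2 value is hypothetical (it is not an estimate from mlbsal17f). It assumes that all 353 observations are used and that the model has k = 8 slope coefficients; if taking logs drops any observations, n changes accordingly.

```python
def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero (i.e. R2 = 0)."""
    df_num = k          # number of restrictions, one per slope
    df_den = n - k - 1  # residual degrees of freedom
    f = (r2 / df_num) / ((1.0 - r2) / df_den)
    return f, df_num, df_den

# r2 = 0.5 is a made-up value used only to show the arithmetic.
f, df1, df2 = overall_f(r2=0.5, n=353, k=8)
print(f, df1, df2)
```

In Stata, the upper-tail probability of an F statistic can be displayed with Ftail(df1, df2, F), which should match the p-value printed in the header of the regression output.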
e. For the double-log model, test at the .10 level the hypothesis that, all else equal, doubles has no influence on ln(salary).
H0: B6 = 0.
Now do the same for hr.
H0: B8 = 0.
How many degrees of freedom do the test statistics have?
f. Can you use the results from part e to make any inferences about the hypothesis that doubles and hr together have no effect on ln(salary),
H0: B6 = B8 = 0?
Perform an F-test of the above hypothesis at the .10 level. State the degrees of freedom for the test, and show how the test statistic can be computed from sums of squared residuals. Can you reject H0?
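The computation from sums of squared residuals in part f can be sketched as follows. This is a minimal Python illustration, not Stata output: the two SSR values are hypothetical, q = 2 is the number of restrictions in H0, and 344 = 353 - 8 - 1 assumes that no observations are lost when logs are taken.

```python
def f_from_ssr(ssr_r, ssr_u, q, df_resid):
    """F = [(SSR_r - SSR_u) / q] / [SSR_u / df_resid].

    ssr_r    : SSR of the restricted model (doubles and hr dropped)
    ssr_u    : SSR of the unrestricted double-log model
    q        : number of restrictions
    df_resid : residual degrees of freedom of the unrestricted model
    """
    return ((ssr_r - ssr_u) / q) / (ssr_u / df_resid)

# Made-up SSR values, used only to show the arithmetic.
f_joint = f_from_ssr(ssr_r=110.0, ssr_u=100.0, q=2, df_resid=344)
print(f_joint)
```

In Stata, each SSR appears as the Residual SS in the regression table, and the same statistic should be reproduced by running test doubles hr after the unrestricted regression.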
g. In the presence of heteroscedasticity, are the coefficient estimates still BLUE? Are conventional test statistics still valid? Describe the properties of the estimators of the coefficients and the standard errors under heteroscedasticity.
h. Use a Breusch-Pagan test to test for heteroscedasticity in the double-log model. After estimating the model in Stata, you can perform the test using estat hettest followed by the list of explanatory variables. Do you find evidence of heteroscedasticity?
i. Can you show how the same Breusch-Pagan test statistic in part h can be computed from a regression involving the squared (and scaled) residuals?
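One way to organize the auxiliary regression in part i is sketched below in plain Python on synthetic data (not the baseball data). It assumes the version of the test that scales the squared residuals by SSR/n and takes half the explained sum of squares of the auxiliary regression, which is believed to be what estat hettest reports by default when a variable list is given; check this against your own Stata output. The helper names ols, fitted, and breusch_pagan are made up for this example.

```python
import random

def ols(X, y):
    """OLS coefficients of y on the columns of X (X includes a constant)."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                m = A[r][c] / A[c][c]
                A[r] = [a - m * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def fitted(X, b):
    return [sum(xi * bi for xi, bi in zip(row, b)) for row in X]

def breusch_pagan(X, y):
    """LM statistic = ESS / 2 from regressing scaled squared residuals on X."""
    n = len(y)
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted(X, ols(X, y)))]
    sigma2 = sum(e2) / n                # SSR / n, not SSR / (n - k - 1)
    g = [v / sigma2 for v in e2]        # scaled squared residuals, mean 1
    g_hat = fitted(X, ols(X, g))        # auxiliary regression
    g_bar = sum(g) / n
    ess = sum((f - g_bar) ** 2 for f in g_hat)
    return ess / 2.0, len(X[0]) - 1     # statistic, chi-square df

# Synthetic heteroscedastic data: the error spread grows with x.
random.seed(0)
n = 200
x = [random.uniform(1.0, 5.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + xi * random.gauss(0.0, 1.0) for xi in x]
X = [[1.0, xi] for xi in x]
lm, df = breusch_pagan(X, y)
print(lm, df)
```

For the double-log model the auxiliary regression would include all eight explanatory variables, so the statistic would be compared with a chi-square distribution with 8 degrees of freedom.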
j. Perform a heteroscedasticity-consistent test of the hypothesis that, in the double-log model, the coefficient on ln(ab) equals zero,
H0: B2 = 0.
Does correcting for heteroscedasticity change your inference on this coefficient in this model?
k. Find the double-log model that maximizes the adjusted R2. State the rule that gives a necessary condition for the adjusted R2 to be maximized. Does the adjusted R2 have an interpretation as a percentage of variation?
l. Stata reports the Standard Error of the Regression (aka Standard Error of the Estimate) under “Root MSE.” Show for the double-log model how this statistic can be computed from the sum of squared residuals.
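The relationship in part l can be sketched in a few lines of Python. The SSR value below is hypothetical, and n = 353 with k = 8 slopes assumes that every observation enters the double-log regression.

```python
import math

def root_mse(ssr, n, k):
    """Standard error of the regression: sqrt(SSR / (n - k - 1))."""
    return math.sqrt(ssr / (n - k - 1))

# Made-up SSR, chosen so the arithmetic is easy to follow.
rmse = root_mse(ssr=344.0, n=353, k=8)
print(rmse)
```

In the Stata regression table, the Residual SS divided by its df gives the Residual MS, and the square root of that number should match Root MSE.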
m. Show that the bias in the coefficient on ln(games) due to omitting from the model ln(ab) equals the product of two terms: (1) the coefficient on ln(ab) and (2) the coefficient on ln(games) from a regression of ln(ab) on all the other explanatory variables.
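The decomposition in part m holds exactly in any sample, which can be checked numerically. The sketch below uses synthetic data in plain Python, not the baseball data: x1 stands in for ln(games), x2 for the omitted ln(ab), and x3 for the remaining regressors. The ols helper and all variable names are made up for this example.

```python
import random

def ols(X, y):
    """OLS coefficients of y on the columns of X (X includes a constant)."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                m = A[r][c] / A[c][c]
                A[r] = [a - m * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(1)
n = 200
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x3 = [random.gauss(0.0, 1.0) for _ in range(n)]
# x2 is correlated with x1, so omitting it shifts the x1 coefficient.
x2 = [0.8 * a + 0.3 * c + random.gauss(0.0, 0.5) for a, c in zip(x1, x3)]
y = [1.0 + 0.5 * a + 1.5 * b - 0.7 * c + random.gauss(0.0, 1.0)
     for a, b, c in zip(x1, x2, x3)]

full = ols([[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)], y)  # includes x2
short = ols([[1.0, a, c] for a, c in zip(x1, x3)], y)           # omits x2
aux = ols([[1.0, a, c] for a, c in zip(x1, x3)], x2)            # x2 on the rest

# Change in the x1 coefficient = (coef on x2) * (coef on x1 in aux regression).
gap = short[1] - full[1]
product = full[2] * aux[1]
print(gap, product)
```

In Stata the same check would use three regress commands: the full double-log model, the model without ln(ab), and ln(ab) on the remaining explanatory variables.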