11.6. Exercises
1. Expected minimum and maximum of i.i.d. uniform variables
Let \(U_1, U_2, \ldots, U_n\) be i.i.d. uniform on the interval \((0, 1)\), and let \(L_n = \max(U_1, U_2, \ldots, U_n)\). That is, let \(L_n\) be the largest of \(U_1, U_2, \ldots, U_n\).
[It is a fact about independent continuous random variables that the chance that they are equal is \(0\). So you don’t have to worry about “ties”. That is, you can assume that \(U_1, U_2, \ldots, U_n\) are \(n\) distinct values.]
a) For \(0 < x < 1\), find \(P(U_1 \le x)\). Hence find \(P(L_n \le x)\).
b) Find the density of \(L_n\).
c) Find \(E(L_n)\).
d) To interpret the answer to Part c, let \(n=2\) for a start. Imagine marking the two values \(U_1\) and \(U_2\) on the unit interval. These two random values split the unit interval \((0, 1)\) into three pieces of random lengths. It is a fact (and makes intuitive sense) that the lengths of the three pieces are identically distributed. Use this to interpret your answer for \(E(L_2)\), and then generalize the interpretation to \(E(L_n)\).
e) Now let \(M_n = \min(U_1, U_2, \ldots, U_n)\) be the smallest of \(U_1, U_2, \ldots, U_n\). Use the idea in Part d to find \(E(M_n)\).
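If you want an empirical check on your answers to Parts c and e, here is a minimal simulation sketch (not part of the exercise). It assumes NumPy; the values of \(n\), the seed, and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 100_000

# reps independent samples, each consisting of n i.i.d. uniform (0, 1) values
u = rng.uniform(0, 1, size=(reps, n))
print("empirical E(L_n):", u.max(axis=1).mean())
print("empirical E(M_n):", u.min(axis=1).mean())
```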
2. Range of a uniform sample
[This problem will go faster if you have done the previous one.]
Let \(\theta_1 < \theta_2\) and suppose \(X_1, X_2, \ldots, X_n\) are i.i.d. uniform on the interval \((\theta_1, \theta_2)\). Let \(\theta = \theta_2 - \theta_1\) be the length of the interval.
a) Let \(M_n = \min(X_1, X_2, \ldots, X_n)\) be the sample minimum and \(L_n = \max(X_1, X_2, \ldots, X_n)\) the sample maximum. The statistic \(R_n = L_n - M_n\) is called the range of the sample and is a natural estimator of \(\theta\). Without calculation, explain why \(R_n\) is biased, and say whether it underestimates or overestimates \(\theta\).
b) Find the bias of \(R_n\) and confirm that its sign is consistent with your answer to Part a. For large \(n\), is the size of the bias large or small?
c) Use \(R_n\) to construct \(T_n\), an unbiased estimator of \(\theta\).
d) Compare \(SD(R_n)\) and \(SD(T_n)\). Which one is bigger? For large \(n\), is it a lot bigger or just a bit bigger?
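As with the previous exercise, a quick simulation can check the sign and size of the bias you find in Part b. This is a sketch only; the interval endpoints, \(n\), and replication count below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_1, theta_2, n, reps = 2.0, 7.0, 10, 100_000
theta = theta_2 - theta_1

x = rng.uniform(theta_1, theta_2, size=(reps, n))
r_n = x.max(axis=1) - x.min(axis=1)          # the range of each sample
print("empirical bias E(R_n) - theta:", r_n.mean() - theta)
```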
3. Regression estimates
For a random mother-daughter pair, let \(X\) be the height of the mother and \(Y\) the height of the daughter. In the notation of Section 11.3, suppose \(\mu_X = 63.5\), \(\mu_Y = 63.7\), \(\sigma_X = \sigma_Y = 2\), and \(r(X, Y) = 0.6\).
a) Find the equation of the regression line for estimating \(Y\) based on \(X\).
b) Find the regression estimate of \(Y\) given that \(X = 62\) inches.
c) Find the regression estimate of \(Y\) given that \(X\) is \(2\) standard deviations above \(\mu_X\). You should be able to do this without finding the value of \(X\) in inches.
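After working Parts a and b by hand, you can check the arithmetic with a few lines of Python. All the numbers below come from the problem statement.

```python
mu_x, mu_y = 63.5, 63.7        # means from the problem statement
sigma_x, sigma_y, r = 2, 2, 0.6

slope = r * sigma_y / sigma_x              # slope of the regression line
intercept = mu_y - slope * mu_x
print("estimate of Y at X = 62:", slope * 62 + intercept)
```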
4. Estimating percentile ranks
It can be shown that for football-shaped scatter plots, it is OK to assume that each of the two variables is normally distributed.
Suppose that a large number of students take two tests (like the Math and Verbal SAT), and suppose that the scatter plot of the two scores is football-shaped with a correlation of 0.6.
a) Let \((X, Y)\) be the scores of a randomly picked student, and suppose \(X\) is on the 90th percentile. Estimate the percentile rank of \(Y\).
b) Let \((X, Y)\) be the scores of a randomly picked student, and suppose \(Y\) is on the 78th percentile. Estimate the percentile rank of \(X\).
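Here is one way to organize the calculation for Part a, assuming (as the exercise suggests) that each score is normally distributed. The sketch uses SciPy's standard normal cdf and ppf (the inverse cdf).

```python
from scipy.stats import norm

r = 0.6
z_x = norm.ppf(0.90)       # X's score in standard units
z_y_hat = r * z_x          # regression estimate of Y, in standard units
print("estimated percentile rank of Y:", norm.cdf(z_y_hat))
```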
5. Least squares constant predictor
Let \(X\) be a random variable with expectation \(\mu_X\) and SD \(\sigma_X\). Suppose you are going to use a constant \(c\) as your predictor of \(X\).
a) Let \(MSE(c)\) be the mean squared error of the predictor \(c\). Write a formula for \(MSE(c)\).
b) Guess the value of \(\hat{c}\), the least squares constant predictor. Then prove that it is the least squares constant predictor.
c) Find \(MSE(\hat{c})\).
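To support your guess in Part b, you can compute the empirical \(MSE(c)\) on a grid for one specific distribution and see where the minimum lands. The choice of distribution below (exponential with mean 2) is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2, size=100_000)     # an arbitrary choice of X

grid = np.linspace(0, 4, 401)
mse = [np.mean((x - c) ** 2) for c in grid]    # empirical MSE(c) on a grid
print("empirical minimizer of MSE(c):", grid[int(np.argmin(mse))])
```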
6. No-intercept regression
Sometimes data scientists want to fit a linear model that has no intercept term. For example, this might be the case when the data are from a scientific experiment in which the attribute \(X\) can have values near \(0\) and there is a physical reason why the response \(Y\) must be \(0\) when \(X=0\).
So let \((X, Y)\) be a random point and suppose you want to predict \(Y\) by an estimator of the form \(aX\) for some \(a\). Find the least squares predictor \(\hat{Y}\) among all predictors of this form.
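A simulation sketch for checking your formula for the least squares slope: generate a joint distribution of \((X, Y)\), minimize the empirical mean squared error over a grid of slopes, and compare the minimizer with the value your formula gives. The joint distribution below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100_000)
y = 3 * x + rng.normal(0, 0.5, size=100_000)    # Y roughly proportional to X

grid = np.linspace(0, 6, 601)
mse = [np.mean((y - a * x) ** 2) for a in grid]  # empirical MSE of aX
print("empirical least squares slope:", grid[int(np.argmin(mse))])
```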
7. Uncorrelated versus independent
Let \(X\) have the uniform distribution on the three points \(-1\), \(0\), and \(1\). Let \(Y = X^2\).
a) Show that \(X\) and \(Y\) are uncorrelated.
b) Are \(X\) and \(Y\) independent?
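Part a takes only a short hand calculation, since \(X\) is uniform on just three points. Here is the same calculation organized as code, if you want a check.

```python
import numpy as np

x = np.array([-1, 0, 1])
p = np.array([1, 1, 1]) / 3       # uniform distribution on the three points
y = x ** 2

cov = np.sum(x * y * p) - np.sum(x * p) * np.sum(y * p)   # E(XY) - E(X)E(Y)
print("Cov(X, Y):", cov)
```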
8. Regression equation
The regression equation can be written in multiple forms. For any particular purpose, one of the forms might be more convenient than the others. So it is a good idea to recognize them.
For \(a^* = r\frac{\sigma_Y}{\sigma_X}\), which of the following is the equation of the regression line for estimating \(Y\) based on \(X\)? More than one is correct.
(i) \(Y = a^*X + (\mu_Y - a^*\mu_X)\)
(ii) \(\hat{Y} = a^*X + (\mu_Y - a^*\mu_X)\)
(iii) \(\hat{Y} = a^*(X - \mu_X) + \mu_Y\)
(iv) \(\displaystyle{\hat{Y} = r\frac{X - \mu_X}{\sigma_X}}\)
(v) \(\displaystyle{\frac{\hat{Y} - \mu_Y}{\sigma_Y} = r\frac{X - \mu_X}{\sigma_X}}\)
9. Average of the residuals
a) In Data 8 we say that the regression line passes through the point of averages. Show this by setting \(X = \mu_X\) and finding the corresponding value of \(\hat{Y}\).
b) Find \(E(\hat{Y})\). In Data 8 language, this is the average of the fitted values.
c) Let \(D = Y - \hat{Y}\) be the residual as in Section 11.5. Find the expectation of the residual and confirm that the answer justifies the following statement from Data 8:
“No matter what the shape of the scatter diagram, the average of the residuals is 0.”
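For Part c, a simulation with a deliberately non-linear scatter makes the statement vivid: the residuals still average out to \(0\). This is a sketch only; the joint distribution below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2 + rng.normal(0, 0.1, size=100_000)   # a deliberately curved scatter

slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # r * sigma_Y / sigma_X
y_hat = y.mean() + slope * (x - x.mean())           # regression estimates
print("average residual:", np.mean(y - y_hat))
```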
10. Variance decomposition
In this exercise you will find the relation between the variances of \(Y\), its regression estimate \(\hat{Y}\), and the residual \(D = Y - \hat{Y}\).
a) Find \(Var(\hat{Y})\).
b) Show that the answer to Part a justifies the following statement from Data 8:
“No matter what the shape of the scatter diagram, the SD of the fitted values is a fraction of the SD of the observed values of \(y\). The fraction is \(|r|\).”
Note: Usually, the result above is stated in terms of variances instead of SDs, and hence \(r^2\) is sometimes called “the proportion of variability explained by the linear model”.
c) Justify the decomposition of variance formula \(Var(Y) = Var(\hat{Y}) + Var(D)\).
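You can also watch the decomposition in Part c hold (up to floating point) on simulated data, since the corresponding identity holds exactly for the least squares line fit to any data set. The joint distribution below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, size=100_000)
y = 2 * x + rng.normal(0, 1, size=100_000)

slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
y_hat = y.mean() + slope * (x - x.mean())
d = y - y_hat                                  # the residuals
print("Var(Y):             ", np.var(y))
print("Var(Y_hat) + Var(D):", np.var(y_hat) + np.var(d))
```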
11. Regression accuracy
For a random mother-daughter pair, let \(X\) be the height of the mother and \(Y\) the height of the daughter. Suppose the correlation is \(r(X, Y) = 0.6\) and let \(\sigma_Y = 2\) inches.
Let \(\hat{Y}\) be the regression estimate of the daughter’s height \(Y\) based on the mother’s height \(X\), and let \(D = Y - \hat{Y}\) be the residual or error in the regression estimate.
a) Find \(\sigma_D\).
b) Fill in the blank with a percentage: There is at least \(\underline{~~~~~~~~~~}\) chance that the estimate \(\hat{Y}\) is correct to within \(3.2\) inches.
Find the best bound you can, and justify your answer.
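Once you have solved both parts, here is one way to organize the arithmetic as a check. It uses only the numbers given in the problem, your formula for \(\sigma_D\) from Part a, and Chebyshev's inequality (note that \(E(D) = 0\) by Exercise 9).

```python
import numpy as np

sigma_y, r = 2, 0.6
sigma_d = sigma_y * np.sqrt(1 - r ** 2)      # from Part a
k = 3.2 / sigma_d                            # 3.2 inches in units of sigma_D
print("Chebyshev lower bound on P(|D| < 3.2):", 1 - 1 / k ** 2)
```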