# 11.6. Exercises

**1. Expected minimum and maximum of i.i.d. uniform variables**

Let \(U_1, U_2, \ldots, U_n\) be i.i.d. uniform on the interval \((0, 1)\), and let \(L_n = \max(U_1, U_2, \ldots, U_n)\). That is, let \(L_n\) be the largest of \(U_1, U_2, \ldots, U_n\).

[It is a fact about independent continuous random variables that the chance that they are equal is \(0\). So you don’t have to worry about “ties”. That is, you can assume that \(U_1, U_2, \ldots, U_n\) are \(n\) distinct values.]

**a)** For \(0 < x < 1\), find \(P(U_1 \le x)\). Hence find \(P(L_n \le x)\).

**b)** Find the density of \(L_n\).

**c)** Find \(E(L_n)\).

**d)** To interpret the answer to Part **c**, let \(n=2\) for a start. Imagine marking the two values \(U_1\) and \(U_2\) on the unit interval. These two random values split the unit interval \((0, 1)\) into 3 pieces of random lengths. It is a fact (and makes intuitive sense) that the lengths of the 3 pieces are identically distributed. Use this to interpret your answer to \(E(L_2)\), and then generalize the interpretation to \(E(L_n)\).

**e)** Now let \(M_n = \min(U_1, U_2, \ldots, U_n)\) be the smallest of \(U_1, U_2, \ldots, U_n\). Use the idea in Part **d** to find \(E(M_n)\).
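If you want a numerical check on your answers to Parts **c** and **e**, a quick NumPy simulation will do; this is a sketch, and the choices of \(n\) and the number of repetitions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000

# Each row is one sample of n i.i.d. uniform (0, 1) variables.
samples = rng.random((reps, n))

# Empirical means of the sample maximum L_n and sample minimum M_n.
mean_max = samples.max(axis=1).mean()
mean_min = samples.min(axis=1).mean()

print(mean_max, mean_min)  # compare with your formulas for E(L_n) and E(M_n)
```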

**2. Range of a uniform sample**

[This problem will go faster if you have done the previous one.]

Let \(\theta_1 < \theta_2\) and suppose \(X_1, X_2, \ldots, X_n\) are i.i.d. uniform on the interval \((\theta_1, \theta_2)\). Let \(\theta = \theta_2 - \theta_1\) be the length of the interval.

**a)** Let \(M_n = \min(X_1, X_2, \ldots, X_n)\) be the sample minimum and \(L_n = \max(X_1, X_2, \ldots, X_n)\) the sample maximum. The statistic \(R_n = L_n - M_n\) is called the *range* of the sample and is a natural estimator of \(\theta\). Without calculation, explain why \(R_n\) is biased, and say whether it underestimates or overestimates \(\theta\).

**b)** Find the bias of \(R_n\) and confirm that its sign is consistent with your answer to Part **a**. For large \(n\), is the size of the bias large or small?

**c)** Use \(R_n\) to construct \(T_n\), an unbiased estimator of \(\theta\).

**d)** Compare \(SD(R_n)\) and \(SD(T_n)\). Which one is bigger? For large \(n\), is it a lot bigger or just a bit bigger?
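A simulation along the same lines can confirm the sign and size of the bias in Part **b**. The interval below is an arbitrary choice for the check; only its length matters.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_1, theta_2 = 2.0, 5.0          # arbitrary interval; theta = 3
theta = theta_2 - theta_1
n, reps = 10, 100_000

samples = rng.uniform(theta_1, theta_2, size=(reps, n))
ranges = samples.max(axis=1) - samples.min(axis=1)   # R_n for each sample

empirical_bias = ranges.mean() - theta
print(empirical_bias)  # compare with your formula from Part b
```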

**3. Regression estimates**

For a random mother-daughter pair, let \(X\) be the height of the mother and \(Y\) the height of the daughter. In the notation of Section 11.3, suppose \(\mu_X = 63.5\), \(\mu_Y = 63.7\), \(\sigma_X = \sigma_Y = 2\), and \(r(X, Y) = 0.6\).

**a)** Find the equation of the regression line for estimating \(Y\) based on \(X\).

**b)** Find the regression estimate of \(Y\) given that \(X = 62\) inches.

**c)** Find the regression estimate of \(Y\) given that \(X\) is \(2\) standard deviations above \(\mu_X\). You should be able to do this without finding the value of \(X\) in inches.
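If you want to check your arithmetic, the regression equation of Section 11.3 can be wrapped in a small helper. The function below is ours, not part of the text.

```python
def regression_estimate(x, mu_x, mu_y, sigma_x, sigma_y, r):
    """Regression estimate of Y given X = x, using the equation of Section 11.3."""
    slope = r * sigma_y / sigma_x
    intercept = mu_y - slope * mu_x
    return slope * x + intercept

# Part b: plug in the given mother-daughter values with x = 62.
print(regression_estimate(62, 63.5, 63.7, 2, 2, 0.6))
```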

**4. Estimating percentile ranks**

It can be shown that for football shaped scatter plots it is OK to assume that each of the two variables is normally distributed.

Suppose that a large number of students take two tests (like the Math and Verbal SAT), and suppose that the scatter plot of the two scores is football shaped with a correlation of 0.6.

**a)** Let \((X, Y)\) be the scores of a randomly picked student, and suppose \(X\) is on the 90th percentile. Estimate the percentile rank of \(Y\).

**b)** Let \((X, Y)\) be the scores of a randomly picked student, and suppose \(Y\) is on the 78th percentile. Estimate the percentile rank of \(X\).
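For the conversions between percentile ranks and standard units that this problem requires, `scipy.stats.norm` is convenient; a minimal sketch, assuming SciPy is available:

```python
from scipy.stats import norm

z = norm.ppf(0.90)   # standard units corresponding to the 90th percentile
p = norm.cdf(z)      # and back to a percentile rank

print(z, p)
```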

**5. Least squares constant predictor**

Let \(X\) be a random variable with expectation \(\mu_X\) and SD \(\sigma_X\). Suppose you are going to use a constant \(c\) as your predictor of \(X\).

**a)** Let \(MSE(c)\) be the mean squared error of the predictor \(c\). Write a formula for \(MSE(c)\).

**b)** Guess the value of \(\hat{c}\), the least squares constant predictor. Then prove that it is the least squares constant predictor.

**c)** Find \(MSE(\hat{c})\).
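To check your guess in Part **b** numerically, you can evaluate the empirical MSE on a grid of constants. This sketch uses an arbitrary distribution for \(X\).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10, 3, size=100_000)   # any distribution with a known mean works

def mse(c):
    """Empirical mean squared error of the constant predictor c."""
    return np.mean((x - c) ** 2)

grid = np.linspace(5, 15, 201)
best_c = grid[np.argmin([mse(c) for c in grid])]
print(best_c)  # compare with your guess for the least squares constant
```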

**6. No-intercept regression**

Sometimes data scientists want to fit a linear model that has no intercept term. For example, this might be the case when the data are from a scientific experiment in which the attribute \(X\) can have values near \(0\) and there is a physical reason why the response \(Y\) must be \(0\) when \(X=0\).

So let \((X, Y)\) be a random point and suppose you want to predict \(Y\) by an estimator of the form \(aX\) for some \(a\). Find the least squares predictor \(\hat{Y}\) among all predictors of this form.
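As a numerical check on your answer, you can simulate data that pass through the origin, evaluate the empirical MSE of \(aX\) over a grid of slopes, and compare the minimizing \(a\) with your formula. The simulated relation and noise level below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 4, size=100_000)
y = 2 * x + rng.normal(0, 1, size=100_000)   # true slope 2, no intercept

def mse(a):
    """Empirical mean squared error of the no-intercept predictor a*x."""
    return np.mean((y - a * x) ** 2)

grid = np.linspace(0, 4, 401)
a_hat = grid[np.argmin([mse(a) for a in grid])]
print(a_hat)  # compare with your formula for the least squares slope
```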

**7. Uncorrelated versus independent**

Let \(X\) have the uniform distribution on the three points \(-1\), \(0\), and \(1\). Let \(Y = X^2\).

**a)** Show that \(X\) and \(Y\) are uncorrelated.

**b)** Are \(X\) and \(Y\) independent?
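Since the distribution has only three points, Part **a** can be checked by direct computation; here is a sketch.

```python
# X is uniform on {-1, 0, 1} and Y = X^2, so E(XY) = E(X^3).
xs = [-1, 0, 1]
p = 1 / 3

e_x = sum(p * x for x in xs)
e_y = sum(p * x ** 2 for x in xs)
e_xy = sum(p * x ** 3 for x in xs)

cov = e_xy - e_x * e_y   # Cov(X, Y) = E(XY) - E(X)E(Y)
print(cov)
```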

**8. Regression equation**

The regression equation can be written in multiple forms. For any particular purpose, one of the forms might be more convenient than the others. So it is a good idea to recognize them.

For \(a^* = r\frac{\sigma_Y}{\sigma_X}\), which of the following is the equation of the regression line for estimating \(Y\) based on \(X\)? More than one is correct.

(i) \(Y = a^*X + (\mu_Y - a^*\mu_X)\)

(ii) \(\hat{Y} = a^*X + (\mu_Y - a^*\mu_X)\)

(iii) \(\hat{Y} = a^*(X - \mu_X) + \mu_Y\)

(iv) \(\displaystyle{\hat{Y} = r\frac{X - \mu_X}{\sigma_X}}\)

(v) \(\displaystyle{\frac{\hat{Y} - \mu_Y}{\sigma_Y} = r\frac{X - \mu_X}{\sigma_X}}\)

**9. Average of the residuals**

**a)** In Data 8 we say that the regression line passes through the point of averages. Show this by setting \(X = \mu_X\) and finding the corresponding value of \(\hat{Y}\).

**b)** Find \(E(\hat{Y})\). In Data 8 language, this is the average of the fitted values.

**c)** Let \(D = Y - \hat{Y}\) be the residual as in Section 11.5. Find the expectation of the residual and confirm that the answer justifies the following statement from Data 8:

“No matter what the shape of the scatter diagram, the average of the residuals is 0.”

**10. Variance decomposition**

In this exercise you will find the relation between the variances of \(Y\), its regression estimate \(\hat{Y}\), and the residual \(D = Y - \hat{Y}\).

**a)** Find \(Var(\hat{Y})\).

**b)** Show that the answer to Part **a** justifies the following statement from Data 8: the SD of the fitted values is \(|r|\) times the SD of the observed values of \(Y\).

Note: Usually, the result above is stated in terms of variances instead of SDs, and hence \(r^2\) is sometimes called “the proportion of variability explained by the linear model”.

**c)** Justify the *decomposition of variance* formula \(Var(Y) = Var(\hat{Y}) + Var(D)\).
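You can watch the decomposition hold in simulation. In the sketch below, the slope is the least squares slope \(r\sigma_Y/\sigma_X\) from Section 11.3, and the simulated joint distribution is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, size=100_000)
y = 0.6 * x + rng.normal(0, 0.8, size=100_000)

r = np.corrcoef(x, y)[0, 1]
slope = r * np.std(y) / np.std(x)
y_hat = slope * (x - x.mean()) + y.mean()   # regression estimate of Y
d = y - y_hat                               # residual

print(np.var(y), np.var(y_hat) + np.var(d))  # the two should match
```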

**11. Regression accuracy**

For a random mother-daughter pair, let \(X\) be the height of the mother and \(Y\) the height of the daughter. Suppose the correlation is \(r(X, Y) = 0.6\) and let \(\sigma_Y = 2\) inches.

Let \(\hat{Y}\) be the regression estimate of the daughter’s height \(Y\) based on the mother’s height \(X\), and let \(D = Y - \hat{Y}\) be the residual or error in the regression estimate.

**a)** Find \(\sigma_D\).

**b)** Fill in the blank with a percentage: There is at least \(\underline{~~~~~~~~~~}\) chance that the estimate \(\hat{Y}\) is correct to within \(3.2\) inches.

Find the best bound you can, and justify your answer.