EC3380
UNIVERSITY OF WARWICK
Summer Examinations 2015/16
Econometrics 2: Microeconometrics
Time Allowed: 1.5 Hours
Answer ALL SIX questions in Section A (72 marks) and ONE question in Section B (28 marks).
Answer Section A questions in one booklet and Section B questions in a separate booklet.
Approved pocket calculators are allowed.
Read carefully the instructions on the answer book provided and make sure that the particulars
required are entered on each answer book. If you answer more questions than are required and
do not indicate which answers should be ignored, we will mark the requisite number of answers
in the order in which they appear in the answer book(s): answers beyond that number will not
be considered.
Section A: Answer ALL SIX questions
1. Imagine we want to estimate the impact of receiving some treatment (e.g. participating in
a job training programme) on a continuous outcome (e.g. wages).
(a) Explain what is meant by “selection on observables” or the “conditional independence
assumption” in this context.
(3 marks)
(b) Explain briefly how matching works and discuss the situations in which matching
might be more appropriate than OLS (without matching) for estimating this treatment
effect.
(3 marks)
(c) Which of the estimators covered on the course would be inconsistent if there was
“selection on unobservables” or “unobserved heterogeneity”?
(2 marks)
(d) Which of the estimators covered on the course would be consistent if the only source
of unobserved heterogeneity was constant within units (e.g. did not vary within
individuals over time)?
(2 marks)
2. Consider a panel data model of the following form:
$$y_{it} = x_{it}'\beta + c_i + u_{it}, \qquad u_{it} \sim \mathrm{IID}(0, \sigma_u^2),$$
where $i$ indexes individuals ($i = 1, \dots, N$), $t$ indexes time periods ($t = 1, \dots, T$) and
the $u_{it}$ are assumed to be independently and identically distributed over $i$ and $t$, with
mean zero and variance $\sigma_u^2$. $x_{it}$ is a vector of explanatory variables and $\beta$ is a vector of
parameters.
(a) State the additional assumptions you would need to make in order to consistently
and efficiently estimate the above model using the random effects (RE) estimator.
(2 marks)
(b) Write down the RE estimator in a way that makes clear it is a weighted average of
the within and between estimators.
(2 marks)
(c) Define the weighting parameter and discuss the circumstances under which the random
effects estimator would approach the within estimator or the pooled OLS estimator.
(8 marks)
3. During the late 1980s and early 1990s, the government gave all secondary schools in
England the option of becoming “grant maintained” (GM). GM schools were granted
independence from the local authority, offering greater control over their land and buildings,
and power to set their own admissions policies. To become grant maintained, schools had
to secure a majority vote amongst parents of children at the school (i.e. more than 50%
had to vote in favour). Assume that the vote is binding, i.e. that the school had to become
grant maintained if parents voted in favour, and had to stay under local authority control
if they voted against.
(a) How might you estimate the causal effect of a school becoming grant maintained
on the exam results of its pupils in this scenario? Describe the intuition behind the
method you would use. State and explain the assumptions required for this method to
consistently estimate the treatment effect, and whether you believe these assumptions
are likely to hold in this case.
(9 marks)
The results of this evaluation suggest that becoming grant maintained significantly
increases the average test scores of children in the school. On the basis of these results, the
government decides to force all schools to become grant maintained.
(b) Discuss any concerns you might have about the results being used in this way.
(5 marks)
4. Let $Y_1, Y_2, \dots, Y_n$ be $n$ independent and identically $N(\mu, 1)$ distributed random variables,
and let $y_1, \dots, y_n$ denote their realisations.
(a) Write down the log-likelihood function for observation $i$.
(1 mark)
Hint: $Y \sim N(\mu, 1)$ if:
$$f_Y(y; \mu) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(y - \mu)^2\right).$$
(b) Briefly motivating each step, derive the Maximum Likelihood (ML) estimator of $\mu$.
(4 marks)
5. Suppose we are interested in studying married women’s labour force participation. For this
purpose, we observe a random sample of married women in the UK $\{(x_i, y_i) : i = 1, \dots, N\}$,
where $y_i$ takes value 1 if woman $i$ is working and 0 otherwise, and $x_i$ is a $1 \times K$ vector
of explanatory variables including, among other things, an intercept, husband’s income,
age, experience, years of education, number of children of 4 years of age or younger, and
number of children above age 4. Suppose
$$P(y_i = 1 \mid x_i) = \Lambda(x_i\beta) \equiv \frac{e^{x_i\beta}}{1 + e^{x_i\beta}}.$$
(a) Derive the log-likelihood function for individual $i$.
(1 mark)
(b) Let $x_j$ be one of the continuous variables in $x$. Write both the “average partial effect”
(APE) and the “partial effect at the average” (PEA) of $x_j$ on $y$. If you were asked to
report only one of them, which one would you choose in this empirical application?
Briefly motivate your answer.
(4 marks)
(c) Let $x_K$ be a discrete variable in $x$. Write both APE and PEA of $x_K$ on $y$. If you
were asked to report only one of them, which one would you choose in this empirical
application? Briefly motivate your answer.
(4 marks)
(d) Let $x_i\beta = w_i\delta + z_i\gamma$, where $\delta$ and $\gamma$ are two vectors of length $H$ and $Q$, respectively;
$H + Q = K$. Describe how you would test the null hypothesis $H_0 : \gamma = 0$ against the
alternative $H_1 : \gamma \neq 0$.
(4 marks)
Let $\hat{\beta}_N$ be the ML estimate of $\beta$ obtained by using the observations $\{(x_i, y_i) : i = 1, \dots, N\}$.
For each $i$, you could compute the predictions as
$$\tilde{y}_i = \begin{cases} 1 & \text{if } \Lambda(x_i\hat{\beta}_N) \geq 1/2 \\ 0 & \text{otherwise.} \end{cases}$$
The percent correctly predicted is the percentage of times that $\tilde{y}_i = y_i$, $i = 1, \dots, N$. Some
have criticised this prediction rule for two reasons: firstly, because the threshold value 1/2
is arbitrary. Secondly, because it is possible to get high correctly predicted percentages
even when the least likely outcome is poorly predicted.
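As an illustration only (not part of the original question), the prediction rule and the percent correctly predicted described above could be computed along the following lines. This is a minimal sketch: the coefficient values in `beta_hat` and the four observations are hypothetical numbers made up for the example, where in practice `beta_hat` would come from a logit ML fit.

```python
import math

def logistic(z):
    """Logistic CDF, Lambda(z) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x_rows, beta_hat, threshold=0.5):
    """Return 1 if Lambda(x_i beta_hat) >= threshold, else 0, for each row."""
    preds = []
    for x in x_rows:
        index = sum(xk * bk for xk, bk in zip(x, beta_hat))
        preds.append(1 if logistic(index) >= threshold else 0)
    return preds

def percent_correctly_predicted(y, y_tilde):
    """Percentage of observations whose prediction equals the outcome."""
    hits = sum(1 for yi, pi in zip(y, y_tilde) if yi == pi)
    return 100.0 * hits / len(y)

# Hypothetical data: an intercept plus one regressor, made-up estimates.
beta_hat = [-1.0, 0.5]
x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 1, 1, 1]

y_tilde = predict(x_rows, beta_hat)
pcp = percent_correctly_predicted(y, y_tilde)
```

The `threshold` argument is exposed so that the 1/2 cut-off mentioned in the text can be varied.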
(e) Show formally why the second concern may arise.
(7 marks)
6. Some response variables in economics can come in the form of a duration, which is the time
elapsed before or since a certain event occurs. A few examples include weeks unemployed,
months spent on welfare, and days until next arrest after incarceration.
Let $T^*$ denote the duration of some event, such as unemployment, measured in continuous
time. Consider the following model:
$$T^* = \exp(x\beta + u), \qquad u \mid x \sim N(0, \sigma^2),$$
where $x$ is a vector of observable explanatory variables, $\beta$ is a vector of unknown parameters
of interest, and $u$ is the unobservable error term.
Suppose $T^*$ is not fully observable; instead of observing $T^*$, we are only able to observe
$T = \min(T^*, c)$, where $c > 0$ is a censoring constant.
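As an illustration only (not part of the original question), the censoring mechanism above can be simulated as follows. This is a minimal sketch under stated assumptions: a single scalar regressor drawn uniformly on [0, 1], and made-up values for the intercept `beta0`, the slope `beta1`, `sigma` and the censoring constant `c`.

```python
import math
import random

def simulate_censored_durations(n, beta0, beta1, sigma, c, seed=0):
    """Draw (x, T) pairs with T = min(T*, c), T* = exp(beta0 + beta1*x + u)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)                 # hypothetical scalar regressor
        u = rng.gauss(0.0, sigma)                 # u | x ~ N(0, sigma^2)
        t_star = math.exp(beta0 + beta1 * x + u)  # latent duration T*
        sample.append((x, min(t_star, c)))        # observed duration T
    return sample

def share_censored(sample, c):
    """Fraction of observations sitting exactly at the censoring point c."""
    return sum(1 for _, t in sample if t == c) / len(sample)

# Made-up parameter values, for illustration only.
data = simulate_censored_durations(n=5000, beta0=1.0, beta1=0.5, sigma=1.0, c=10.0)
frac = share_censored(data, c=10.0)
```

Here `frac` is the sample share of observations with the observed duration equal to the censoring constant.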
(a) Briefly motivating each step, derive $P(T = c \mid x)$ in terms of the standard normal
CDF $\Phi(\cdot)$, $c$, $x$, $\beta$ and $\sigma$.
(8 marks)
(b) What happens to $P(T = c \mid x)$ when $c \to \infty$? Explain briefly the intuition behind
this result.
(3 marks)
Section B: Answer ONE question
7. (Inspired by Ashenfelter and Krueger 1994, and Bonjour et al., 2003). In this exercise, we
are interested in estimating returns to education using a data set of female, genetically
identical U.K. twins. Each twin was asked questions about, among other things, their
earnings, their own schooling, and the schooling of the other twin.
Suppose the (log of) wage of twin $i$ in family $f$ is determined by
$$y_{if} = \alpha + \beta S_{if} + A_{if} + \varepsilon_{if}, \qquad (i = 1, 2;\ f = 1, \dots, n),$$
where $S_{if}$ is years of schooling, $A_{if}$ is “ability”, broadly defined as all other variables
affecting wages outside those of schooling (intelligence, motivation, family background,
access to educational funds, etc.), and $\varepsilon_{if}$ is an independently and identically distributed
(i.i.d.) error. $u_{if} = A_{if} + \varepsilon_{if}$ is unobserved. $n$ is the number of twin-pairs, and $N = 2n$ is
the total number of observations in the dataset. From now onwards, we will assume that,
for all $i, j = 1, 2$,
$$\mathrm{cov}(\varepsilon_{if}, S_{jf}) = 0 \tag{A1}$$
$$\mathrm{cov}(\varepsilon_{if}, A_{jf}) = 0. \tag{A2}$$
It is possible to show (but you are not asked to do it) that, as $n \to \infty$,
$$\hat{\beta}_N^{(OLS,1)} = \frac{\widehat{\mathrm{cov}}(S_{if}, y_{if})}{\hat{V}(S_{if})} \xrightarrow{p} \frac{\mathrm{cov}(S_{if}, y_{if})}{V(S_{if})}.$$
(a) Show that, unless $\mathrm{cov}(S_{if}, A_{if}) = 0$, $\hat{\beta}_N^{(OLS,1)}$ is an inconsistent estimator of $\beta$. Would
you expect $\hat{\beta}_N^{(OLS,1)}$ to be larger or smaller than $\beta$? Explain.
(4 marks)
Hint 1: Recall that an estimator $\hat{\beta}$ is said to be inconsistent for $\beta$ when $\hat{\beta} \xrightarrow{p} c \neq \beta$.
Hint 2: Recall that, for any two random variables $X$ and $Y$, and any two scalars $a$
and $b$,
$$\mathrm{cov}(Y, a + bX) = b\,\mathrm{cov}(Y, X), \qquad V(aY + bX) = a^2 V(Y) + b^2 V(X) + 2ab\,\mathrm{cov}(Y, X).$$
Given that the data set consists of identical twins, one could assume that $A_{1,f} = A_{2,f}$ for
all $f = 1, \dots, n$, and estimate $\beta$ by using variables in differences instead of in levels:
$$\Delta y_f = \beta \Delta S_f + \Delta\varepsilon_f, \qquad (f = 1, \dots, n),$$
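where $\Delta S_f = S_{1,f} - S_{2,f}$ ($\Delta y_f$ and $\Delta\varepsilon_f$ are defined similarly). It is possible to show (but
you are not asked to do it) that, as $n \to \infty$,
$$\hat{\beta}_n^{(OLS,2)} = \frac{\widehat{\mathrm{cov}}(\Delta S_f, \Delta y_f)}{\hat{V}(\Delta S_f)} \xrightarrow{p} \frac{\mathrm{cov}(\Delta S_f, \Delta y_f)}{V(\Delta S_f)}.$$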
(b) Explain briefly what $A_{1,f} = A_{2,f}$ means and why it is useful in this context.
(3 marks)
(c) Do you think that $A_{1,f} = A_{2,f}$, $f = 1, \dots, n$, is a reasonable assumption? Would you
be able to provide at least one argument in favour, and one against it?
(3 marks)
(d) Show that, if $A_{1,f} = A_{2,f}$, $f = 1, \dots, n$, $\hat{\beta}_n^{(OLS,2)}$ is a consistent estimator of $\beta$.
(2 marks)
The variable “years of schooling” ($S_{if}$) is often observed with error. In other words, we
would like to observe $S_{if}$, whereas we actually observe $S'_{if}$, where
$$S'_{if} = S_{if} + v'_{if}, \qquad (i = 1, 2;\ f = 1, \dots, n),$$
and $v'_{if}$ is an unobserved measurement error. From here onwards, we will assume that, for
all $i, j = 1, 2$,
$$\mathrm{cov}(v'_{if}, S_{jf}) = 0 \tag{B1}$$
$$\mathrm{cov}(v'_{if}, A_{jf}) = 0 \tag{B2}$$
$$\mathrm{cov}(v'_{if}, \varepsilon_{jf}) = 0 \tag{B3}$$
and
$$\mathrm{cov}(v'_{1,f}, v'_{2,f}) = 0. \tag{B4}$$
The OLS estimator in a regression of $y_{if}$ on a constant and $S'_{if}$ is equal to
$$\hat{\beta}_N^{(OLS,3)} = \frac{\widehat{\mathrm{cov}}(S'_{if}, y_{if})}{\hat{V}(S'_{if})} \xrightarrow{p} \frac{\mathrm{cov}(S'_{if}, y_{if})}{V(S'_{if})}.$$
(e) Show that, unless $\mathrm{cov}(S_{if}, A_{if}) = 0$ and $V(v'_{if}) = 0$, $\hat{\beta}_N^{(OLS,3)}$ is an inconsistent estimator
of $\beta$. What happens to $\hat{\beta}_N^{(OLS,3)}$ when $V(v'_{if})$ becomes larger and larger? Provide
some intuition for why this is the case.
(10 marks)
Hint 3: You may want to substitute $S_{if} = S'_{if} - v'_{if}$ in the equation of $y_{if}$.
(f) Assume $V(v'_{if}) \neq 0$, i.e. that there is unobserved measurement error. How could you
obtain a consistent estimator of $\beta$ using the data at your disposal in this exercise?
Explain the method you would use and how this helps to overcome the measurement
error problem.
(6 marks)
8. Some corner solution responses can take on two or more values with positive probability.
For instance, when the response variable is a fraction or a percent, the corners are usually
at zero and one, or zero and 100, respectively. Another example is when institutional
constraints impose corners at other values. For instance, if workers are allowed to contribute
at most 15% of their earnings to a tax-deferred pension plan, and $y_i$ is the fraction of
income contributed for worker $i$, the corners are at zero and 0.15.
Generally, let $a_1 < a_2$, and consider the following model:
$$y = \begin{cases} a_1 & \text{if } y^* \leq a_1 \\ y^* & \text{if } a_1 < y^* < a_2 \\ a_2 & \text{if } y^* \geq a_2 \end{cases}$$
$$y^* = x\beta + u, \qquad u \mid x \sim N(0, \sigma^2).$$
This specification ensures that $P(y = a_1) > 0$ and $P(y = a_2) > 0$ but $P(y = a) = 0$ for
$a_1 < a < a_2$. Therefore, this model is applicable only when we actually see pileups at the
two endpoints and then a (roughly) continuous distribution in between.
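As an illustration only (not part of the original question), the pileups at the two corners can be seen in a small simulation of the model above. This is a minimal sketch under stated assumptions: a single scalar regressor drawn uniformly on [0, 1], corners taken from the pension example (0 and 0.15), and made-up values for `beta0`, `beta1` and `sigma`.

```python
import random

def two_limit_response(y_star, a1, a2):
    """Map the latent y* to the observed y with corners at a1 and a2."""
    if y_star <= a1:
        return a1
    if y_star >= a2:
        return a2
    return y_star

def simulate(n, beta0, beta1, sigma, a1, a2, seed=0):
    """Draw n observed responses from y* = beta0 + beta1*x + u, u ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)  # hypothetical scalar regressor
        y_star = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        ys.append(two_limit_response(y_star, a1, a2))
    return ys

# Made-up parameter values, for illustration only.
ys = simulate(n=5000, beta0=0.02, beta1=0.10, sigma=0.08, a1=0.0, a2=0.15)
share_low = sum(1 for y in ys if y == 0.0) / len(ys)    # mass at a1
share_high = sum(1 for y in ys if y == 0.15) / len(ys)  # mass at a2
share_interior = 1.0 - share_low - share_high           # strictly inside
```

With these values a non-trivial share of draws lands on each corner, while the remaining draws are spread continuously over the interior.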
(a) Briefly motivating each step, derive $P(y = a_1 \mid x)$ and $P(y = a_2 \mid x)$ in terms of
the standard normal CDF $\Phi(\cdot)$, $x$, $\beta$ and $\sigma$. Also, derive the PDF of $y$ given $x$ for
realisations of $y$ in the interval $(a_1, a_2)$.
(7 marks)
(b) Briefly motivating each step, derive $E(y \mid x, a_1 < y < a_2)$ and $E(y \mid x)$.
(14 marks)
Hint: If $Z \sim N(0, 1)$, then for $c_1 < c_2$ we have that
$$E(Z \mid c_1 < Z < c_2) = \frac{\phi(c_1) - \phi(c_2)}{\Phi(c_2) - \Phi(c_1)}.$$
(c) Consider the following method for estimating $\beta$. Using only the nonlimit observations,
that is, observations for which $a_1 < y_i < a_2$, run the OLS regression of $y_i$ on
$x_i$. Explain why this does not generally produce a consistent estimator of $\beta$.
(3 marks)
(d) How would you estimate $E(y \mid x, a_1 < y < a_2)$ and $E(y \mid x)$?
(4 marks)
(End)