Applied Econometrics Project With R
- There are four problems with a total of 100 points. The points awarded for each problem are indicated in square brackets.
- The workload depends on the actual number of persons in a project team:
- One-person team: Answer two problems, one from Problem 1 or 2 and one from Problem 3 or 4.
- Two-person team: Answer three problems. Follow the same rules as for a one-person team and select one of the remaining two problems.
- Three- or four-person team: Answer all four problems.
- Please either use the designed cover-page for Microsoft Word or the R Markdown template, available on PANDA.
- Writing the report using R Markdown is only recommended if you have previous experience using it.
- Please either use the designed template for Microsoft Word or the R Markdown template, available on PANDA.
- Please submit your digital sources according to the guidelines on
- Submit your project and all sources into the provided upload section on
- Important: On the cover page of your report, please indicate the contribution of each group member in bullet points.
W4479 - Econometrics, Project, SS 2025 Prof. Dr. Y. Feng / O. K. Ayensu / H. Pham
Problem 1: Monte Carlo Study
In this study, you are required to investigate how the OLS estimators behave under different assumptions about the error distribution and varying sample sizes.
The population data are generated from the linear regression model
$$Y_i = 2.75 + 0.85 X_i + u_i, \quad i = 1, \ldots, 10000,$$
where $X_i$ are non-stochastic and real-valued over the interval $[10, 200]$. The error term $u_i$ is drawn from one of the following two distributions:
(i) $u_i \overset{\text{i.i.d.}}{\sim} N(0, 25)$, (ii) $u_i = \dfrac{5(e_i - 5)}{\sqrt{10}}$ with $e_i \overset{\text{i.i.d.}}{\sim} \chi^2(5)$.
Note that in both cases, $E[u_i] = 0$ and $\mathrm{Var}(u_i) = 25$. For each error distribution, 1000 random samples are drawn at each of the sample sizes n = 20, 80, 320, and the regression coefficients are estimated for each sample.
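As a rough illustration only (this is not the accompanying R file), the data-generating process described above could be sketched as follows. The seed, the equally spaced grid used for the non-stochastic regressor, and all object names are assumptions made for this sketch.

```r
# Minimal sketch of the population model; all object names are hypothetical.
set.seed(1)                                # arbitrary seed (assumption)
N <- 10000
x <- seq(10, 200, length.out = N)          # assumption: equally spaced values on [10, 200]

# Case (i): normal errors with variance 25
u_norm <- rnorm(N, mean = 0, sd = 5)

# Case (ii): chi-square(5) draws, centred at their mean (5), scaled by
# their standard deviation (sqrt(10)), then multiplied by 5
e     <- rchisq(N, df = 5)
u_chi <- 5 * (e - 5) / sqrt(10)

y_norm <- 2.75 + 0.85 * x + u_norm
y_chi  <- 2.75 + 0.85 * x + u_chi

# Both error types should show a mean near 0 and a variance near 25
round(c(mean_i = mean(u_norm), var_i = var(u_norm),
        mean_ii = mean(u_chi), var_ii = var(u_chi)), 2)

# One random sample of size n = 20 and its OLS estimates
idx <- sample(N, 20)
coef(lm(y_chi[idx] ~ x[idx]))
```

The chi-square case keeps the first two moments of case (i) but is right-skewed, which is what makes the comparison across sample sizes informative.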
The accompanying R file W4479-Project-Problem 1-SS2025-Group XY.R produces a data frame named sim_df containing all coefficient estimates, labeled by sample size and error distribution. You are required to continue working in the R file after line 90 using your own R code to answer the following questions based on sim_df:
a) For each combination of sample size and error distribution, calculate the sample mean and the sample variance of the 1000 intercept estimates, and separately of the 1000 slope estimates. Present all results in a single table, and comment on your findings relative to the true parameter values. [8]
b) Create Q-Q plots for the intercept and slope estimates under all settings (i.e., for each error distribution and sample size). Based on these plots, assess whether the OLS estimates appear approximately normally distributed. [6]
c) Perform Jarque-Bera tests for normality on the intercept and slope estimates for all combinations of error distribution and sample size. Summarize and interpret your results. [5]
d) Using your results from parts a), b) and c), discuss how the distribution of the OLS estimators depends on sample size and the error distribution. Do your empirical results align with classical econometric theory? [6]
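The summary table, Q-Q plot and Jarque-Bera steps listed above might look roughly like the sketch below. The layout of the real sim_df is not shown in this document, so the sketch builds a small stand-in with hypothetical column names (n, dist, b1, b2) and only 200 replications per setting. The Jarque-Bera statistic is computed by hand in base R; packaged versions such as `tseries::jarque.bera.test` could be used instead.

```r
# Stand-in for sim_df; column names and the 200 replications are assumptions.
set.seed(42)
N <- 10000
x <- seq(10, 200, length.out = N)
draw_u <- function(dist, size) {
  if (dist == "normal") rnorm(size, 0, 5) else 5 * (rchisq(size, df = 5) - 5) / sqrt(10)
}
grid <- expand.grid(rep = 1:200, n = c(20, 80, 320),
                    dist = c("normal", "chisq"), stringsAsFactors = FALSE)
est <- t(mapply(function(n, dist) {
  idx <- sample(N, n)
  xs  <- x[idx]
  ys  <- 2.75 + 0.85 * xs + draw_u(dist, n)
  b2  <- cov(xs, ys) / var(xs)              # closed-form OLS slope
  c(b1 = mean(ys) - b2 * mean(xs), b2 = b2) # closed-form OLS intercept
}, grid$n, grid$dist))
sim_df <- data.frame(n = grid$n, dist = grid$dist, est)

# Mean and variance of the estimates per setting, in one table
summ <- aggregate(cbind(b1, b2) ~ n + dist, data = sim_df,
                  FUN = function(v) c(mean = mean(v), var = var(v)))
summ <- do.call(data.frame, summ)           # flatten matrix columns
print(summ)

# Q-Q plot for a single setting (repeat for the others)
sub <- subset(sim_df, n == 20 & dist == "chisq")
qqnorm(sub$b2, main = "Slope estimates, n = 20, chi-square errors")
qqline(sub$b2)

# Hand-rolled Jarque-Bera test: JB = n/6 * (S^2 + (K - 3)^2 / 4), chi-square(2) under H0
jb_test <- function(v) {
  n  <- length(v)
  z  <- v - mean(v)
  s2 <- mean(z^2)
  S  <- mean(z^3) / s2^1.5                  # sample skewness
  K  <- mean(z^4) / s2^2                    # sample kurtosis
  stat <- n / 6 * (S^2 + (K - 3)^2 / 4)
  c(JB = stat, p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}
jb <- aggregate(cbind(b1, b2) ~ n + dist, data = sim_df,
                FUN = function(v) jb_test(v)[["p.value"]])
print(jb)                                   # p-values per setting
```

With the real sim_df, only the column names in the formulas would need to be adapted.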
[Total: 25 points]
Problem 2: Simple Linear Regression
Complete the following tasks and write your own R code in the designated script file named W4479-Project-Problem 2-SS2025-Group XY.R.
a) Find an economic dataset containing at least 52 observations and two relevant variables. One variable should serve as the dependent variable $Y$, and the other as the independent variable $X$, such that a linear relationship between the two is plausible. Clearly state the source of your data, provide a brief description of both variables, and construct a summary table that includes, for each variable: number of observations, minimum value, maximum value, sample mean, sample variance. Briefly comment on the key features revealed by the summary table. [6]
b) Fit a simple linear regression model using OLS and state the estimated regression equation. Plot the data points and overlay the fitted regression line. Comment on the visual fit. [6]
c) Provide a logical interpretation of the estimated intercept and slope coefficients in b). [3]
d) Assess the statistical significance of the slope coefficient $\beta_2$. Calculate and report the 99% confidence intervals for the intercept $\beta_1$ and slope $\beta_2$. [5]
e) Formulate a meaningful one-tailed hypothesis test regarding the slope coefficient $\beta_2$ at a 5% significance level. Clearly state the null and alternative hypotheses and justify their choice in the context of your data. Perform the test and interpret the result. [5]
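The workflow in the tasks above can be sketched with R's built-in `longley` data. This is purely illustrative: `longley` has only 16 observations and therefore does not satisfy the size requirement of this problem, and the choice of `Employed` as Y and `GNP` as X, as well as the direction of the one-tailed test, are assumptions made for the sketch.

```r
# Illustrative only: longley is too small for the actual task.
d <- longley[c("Employed", "GNP")]

# Summary table: n, min, max, mean, variance for each variable
summ <- sapply(d, function(v) c(n = length(v), min = min(v), max = max(v),
                                mean = mean(v), var = var(v)))
print(round(summ, 3))

# OLS fit, scatter plot and fitted line
fit <- lm(Employed ~ GNP, data = d)
print(coef(summary(fit)))                   # estimates, std. errors, t values, p-values
plot(Employed ~ GNP, data = d)
abline(fit)

# 99% confidence intervals for intercept and slope
ci99 <- confint(fit, level = 0.99)
print(ci99)

# One-tailed test at the 5% level: H0: beta2 <= 0 against H1: beta2 > 0
b2     <- coef(fit)[["GNP"]]
se2    <- coef(summary(fit))["GNP", "Std. Error"]
df_res <- df.residual(fit)
t_stat <- (b2 - 0) / se2
t_crit <- qt(0.95, df = df_res)             # upper 5% critical value
p_one  <- pt(t_stat, df = df_res, lower.tail = FALSE)
reject <- t_stat > t_crit
c(t_stat = t_stat, t_crit = t_crit, p_one = p_one)
```

The one-tailed p-value is computed directly from the t distribution because `summary()` reports two-sided p-values only.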
[Total: 25 points]
Problem 3: Modelling Nonlinear Relationships
Complete the following tasks and write your own R code in the designated script file named W4479-Project-Problem 3-SS2025-Group XY.R.
a) Find a suitable economic dataset comprising two strictly positive variables with at least 30 observations, and use it to illustrate the application of the simple linear regression (SLR) model following an appropriate nonlinear transformation. The relationship between the two variables should be clearly nonlinear. State the source of the data, briefly describe the two variables, and report the sample size. [4]
b) Present a graphical representation of the original data. Justify why an SLR model may not be suitable for the original data. [2]
c) Define the transformed variables as $X_i^* = \log(X_i)$, $X_i^{**} = 1/X_i$, and $Y_i^* = \log(Y_i)$. Display scatter plots of the variable pairs $(X_i^*, Y_i)$, $(X_i^{**}, Y_i)$, $(X_i, Y_i^*)$, and $(X_i^*, Y_i^*)$. Provide concise commentary on the characteristics observed in each plot, particularly in relation to potential linearity. [3]
d) Fit a simple linear regression model to each of the following data combinations: $(X_i, Y_i)$, $(X_i^*, Y_i)$, $(X_i^{**}, Y_i)$, $(X_i, Y_i^*)$, and $(X_i^*, Y_i^*)$. Present the fitted models in appropriate table(s), including the estimated coefficients, the corresponding standard errors, the t-values, and the coefficients of determination $R^2$. [4]
e) For the transformed models $(X_i, Y_i^*)$ and $(X_i^*, Y_i^*)$, state the corresponding retransformed regression equations. [2]
f) Compute the fitted values $\hat{Y}_{ij}$, the residuals $\hat{u}_{ij} = Y_i - \hat{Y}_{ij}$, and the residual sum of squares $RSS_j = \sum_i \hat{u}_{ij}^2$ for all five models, where $\hat{Y}_{ij}$ denotes the fitted value of observation $i$ under model $j$, for $j = 1, 2, \ldots, 5$. Identify the preferred model according to the RSS criterion. [3]
g) Display the fitted regression lines from all five models in a single figure. Comment on the quality of fit across models. Critically assess whether the preferred model in f) offers a significantly better fit than the original SLR model. [4]
h) For the preferred model in f), construct the 95% prediction interval for the individual observations. Display the transformed (or original, as appropriate) data, the fitted regression line, and the prediction interval in a single figure. [3]
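A possible shape for the five fits, the RSS comparison and the prediction interval is sketched below on simulated data. The data are artificial (a power-law relationship with multiplicative noise) and all names are hypothetical. For the two models with log(Y) as the dependent variable, fitted values are mapped back to the original scale by simply exponentiating, which is an assumption of this sketch and ignores any retransformation bias.

```r
# Simulated, strictly positive stand-in data (assumption, not real economic data)
set.seed(7)
x <- runif(60, 1, 50)
y <- 3 * x^0.6 * exp(rnorm(60, sd = 0.15))
d <- data.frame(x, y)

# The five models: (X, Y), (log X, Y), (1/X, Y), (X, log Y), (log X, log Y)
fits <- list(
  lin_lin = lm(y ~ x, data = d),
  lin_log = lm(y ~ log(x), data = d),
  recip   = lm(y ~ I(1 / x), data = d),
  log_lin = lm(log(y) ~ x, data = d),
  log_log = lm(log(y) ~ log(x), data = d)
)
log_y <- c(lin_lin = FALSE, lin_log = FALSE, recip = FALSE,
           log_lin = TRUE, log_log = TRUE)

# Slope estimate, standard error, t value and R^2 per model
tab <- t(sapply(fits, function(f) c(coef(summary(f))[2, 1:3],
                                    R2 = summary(f)$r.squared)))
print(round(tab, 4))

# Fitted values on the original Y scale and RSS_j for each model
y_hat <- mapply(function(f, is_log) if (is_log) exp(fitted(f)) else fitted(f),
                fits, log_y)
rss <- colSums((d$y - y_hat)^2)
print(rss)
best_name <- names(which.min(rss))

# 95% prediction interval for the model with the smallest RSS
best <- fits[[best_name]]
new  <- data.frame(x = seq(min(d$x), max(d$x), length.out = 100))
pint <- predict(best, newdata = new, interval = "prediction", level = 0.95)
if (log_y[[best_name]]) pint <- exp(pint)   # back to the original scale

plot(d$x, d$y, xlab = "X", ylab = "Y")
matlines(new$x, pint, lty = c(1, 2, 2), col = 1)
```

Comparing RSS on the original scale of Y keeps the five models comparable, since residuals from log(Y) models are otherwise measured in different units.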
[Total: 25 points]
Problem 4: Multiple Linear Regression
Complete the following tasks and write your own R code in the designated script file named W4479-Project-Problem 4-SS2025-Group XY.R.
a) Find an economic dataset with four variables (one regressand $Y$ and three regressors $X_2$, $X_3$, and $X_4$) and at least 50 observations. A multiple linear regression model should be suitable for modeling $Y$ using $X_2$, $X_3$, and $X_4$. Provide the source of your data, describe each variable, and present a table showing, for each variable: number of observations, minimum value, maximum value, sample mean, and sample variance. Briefly comment on the table. [5]
b) Based on your understanding of the variables and economic theory, state your prior expectations regarding the direction and nature of the relationships between the dependent variable and each of the independent variables. [2]
c) Regress $Y$ on the following sets of regressors using OLS (comprising a total of seven fitted models): $(X_2)$, $(X_3)$, $(X_4)$, $(X_2, X_3)$, $(X_2, X_4)$, $(X_3, X_4)$, and $(X_2, X_3, X_4)$. Present the estimated coefficients (including the intercept) and their corresponding p-values in a table. [5]
d) Using the table from c), comment on which of the models is preferred.¹ Clearly state the selected fitted model. [2]
e) For all of the models fitted in c), report the corresponding $R^2$ and adjusted $R^2$ ($\bar{R}^2$) in a table. Comment on the table and explain which model is preferred based on these measures. Clearly state the selected fitted model. [4]
f) For the preferred model identified in e), calculate the 95% and 99% confidence intervals for the estimated model coefficients. [3]
g) Assuming normality of the errors, calculate and report the point prediction $\hat{Y}_0$ when $X_2$, $X_3$, and $X_4$ are set to their respective sample means. Calculate and present the corresponding 99% prediction interval for the individual observation. [4]
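The seven regressions, the model comparison and the prediction step could be organised along the following lines. The sketch again uses the built-in `longley` data, which has only 16 observations and so does not meet the size requirement here. The assignment of `Employed` to Y and of `GNP`, `Unemployed` and `Armed.Forces` to X2, X3 and X4 is an assumption, as is selecting the model by the largest adjusted R-squared.

```r
# Illustrative only: longley is too small for the actual task.
y_name  <- "Employed"
x_names <- c("GNP", "Unemployed", "Armed.Forces")

# All seven non-empty subsets of the three regressors, singles first
subsets <- unlist(lapply(1:3, function(k) combn(x_names, k, simplify = FALSE)),
                  recursive = FALSE)
fits <- lapply(subsets, function(v)
  lm(reformulate(v, response = y_name), data = longley))
names(fits) <- sapply(subsets, paste, collapse = " + ")

# Coefficients and p-values for each model
coef_tabs <- lapply(fits, function(f)
  round(coef(summary(f))[, c("Estimate", "Pr(>|t|)")], 4))
print(coef_tabs[["GNP + Unemployed + Armed.Forces"]])

# R^2 and adjusted R^2 for all seven models
fit_stats <- t(sapply(fits, function(f) {
  s <- summary(f)
  c(R2 = s$r.squared, adj_R2 = s$adj.r.squared)
}))
print(round(fit_stats, 4))

# Model with the largest adjusted R^2 and its confidence intervals
best <- fits[[which.max(fit_stats[, "adj_R2"])]]
print(confint(best, level = 0.95))
print(confint(best, level = 0.99))

# Point prediction and 99% prediction interval at the regressor sample means
x0   <- as.data.frame(as.list(colMeans(longley[x_names])))
pred <- predict(best, newdata = x0, interval = "prediction", level = 0.99)
print(pred)
```

Because every model contains an intercept, the point prediction at the regressor sample means coincides with the sample mean of the dependent variable, which gives a quick sanity check.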
[Total: 25 points]
¹ For your argumentation, consider the insignificance of some effects in the models.