This assessment is an essay in applied econometrics based on an empirical project. In economics, theory often suggests a set of relationships between variables. In this assessment you are asked to find an empirical specification corresponding to these theories or hypotheses. The goal of this assessment is to help you continue the transition from “student” to “researcher”. This assignment builds on research work already undertaken in CDA Assignment 1 (Appendix A has some advice on writing an essay). In this research project you will be responsible for deciding which interesting question you are asking and then organizing your analysis in order to best answer that question.

In part, this assessment tries to prepare you for the next step along the way to independent research. For most students, that will be the final year module, Research in Applied Economics (RAE, EC331). This assignment enables you to use some of the econometric techniques that have been taught in this module and to link them to economic theory. It will also act as a good basis for revision of the module, as you will learn to understand and apply techniques that may previously have appeared only as abstract ideas.

This assessment is a group project. Groups are required to have no more than FIVE members. You need to have signed up to your group of FIVE people by 12th January 2024. Each member of the group should contribute equally to the project. This requires careful management and planning by the whole group, to ensure that every individual is able to contribute and that no one is excluded from activities. There are some tips on group work in Appendix B of this document, and you should read these before embarking on the work. However, we understand that some individuals might try to free-ride on the efforts of others. Group members should therefore submit a Group Project Evaluation Form, which allows you to assess the contribution of the other individuals within your group; this evaluation score will affect the final mark each individual is given for the project. The details of how this works are described in the section entitled Marking Scheme below.

This assessment cannot be more than 2,500 words long (excluding the appendix and references). An accompanying appendix (of 8 pages or fewer) should contain the results of your regressions, any plots/figures you wish to include, and a list of the variable names (please use variable names that have some obvious interpretation). The assessment MUST be written in LaTeX and you should submit both the tex and PDF documents electronically. Note that you will also submit a Stata DO/R file as a text document, as well as the Stata/R data file. The deadline for submission is 1 May 2024. Please note that only ONE person per group should upload the THREE files needed for this assignment. That person is submitting on behalf of the group, so the whole group should get reassurance that the files have been uploaded correctly.

Replace XX in the file names with your actual group number. Please ensure you do this very carefully. As you are submitting multiple files into Tabula, you must upload these files together. To do so, make sure all the files are in the same folder on your computer, hold down the Ctrl key and click on each file until they are all highlighted, and then upload them.

There are two approaches to starting a project:

1. Find a dataset and identify the dependent variable and the main explanatory variable within the dataset. You will then need to find empirical/theoretical articles which deal with your (or a closely related) topic in order to give you a starting point for the key variables you are analysing (as well as other variables which may be important). The articles will also provide a motivation for why this is an interesting question to analyse.

2. Use published articles to decide on an empirical research question, and then find data to answer your chosen question. The problem with this approach is that you may struggle to find appropriate data to answer the research question. If you are using time series data (or even macro panel data), this approach might be more appropriate, since datasets in, e.g., macroeconomics and finance contain many series for different geographic locations (countries) and different time periods.

However you start the project, it is important to understand that you should not simply replicate the analysis and arguments of someone else (indeed, this is plagiarism). Instead, read papers analyzing the same question with different data.

After reading about the subject you should then estimate your chosen equation(s) by appropriate estimation techniques covered in lectures, e.g. OLS, IV, Dynamic Models, Limited Dependent Variable (LDV), Difference-in-Differences, Regression Discontinuity, or Panel Data methods. You should use as many of the variables in the dataset as are relevant, but focus on the variables at the heart of your research question, as these are the important variables in your model (the other variables are control variables, and are there to ensure that the model is not suffering from obvious omitted variable bias). How you specify your equation is left entirely up to you, but you should investigate the validity of your preferred equation.
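The role of control variables described above can be illustrated with a small simulation. The sketch below is written in Python purely for illustration (the assessment itself requires Stata or R), and every name and number in it is an assumption made for the example: a hypothetical outcome depends on a main regressor (true coefficient 2.0) and on a control (true coefficient 3.0) that is correlated with the main regressor. Leaving the control out pushes the estimated slope away from 2.0; including it, here via the Frisch-Waugh-Lovell partialling-out step, brings the estimate back close to the true value.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    return cov_xy / var_x


def residuals(x, y):
    """Residuals from regressing y on x with an intercept."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def simulate(n=5000, seed=226):
    """Hypothetical data: the control is correlated with the main regressor."""
    rng = random.Random(seed)
    control = [rng.gauss(0, 1) for _ in range(n)]
    main = [0.5 * c + rng.gauss(0, 1) for c in control]
    outcome = [
        1.0 + 2.0 * m + 3.0 * c + rng.gauss(0, 1)  # assumed true model
        for m, c in zip(main, control)
    ]
    return outcome, main, control


outcome, main, control = simulate()

# "Short" regression: the control is omitted, so the slope absorbs part of its effect.
short_slope = slope(main, outcome)

# "Long" regression via Frisch-Waugh-Lovell: partial the control out of the
# main regressor, then regress the outcome on what is left.
main_resid = residuals(control, main)
long_slope = slope(main_resid, outcome)

print(f"slope without control: {short_slope:.2f}")
print(f"slope with control:    {long_slope:.2f}")
```

In this set-up the short-regression slope should settle near 3.2 rather than 2.0, because the bias equals the control's coefficient (3.0) times the slope from regressing the control on the main regressor (0.5 / 1.25 = 0.4). Comparing estimates with and without plausible controls is one simple way to probe the validity of a preferred equation.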


The essay should have the following elements. Use your judgment for how much space to give each part. The right answer depends on how important each section is for your specific project.

Introduction

You should give a motivation for your work and discuss the policy (or other) issues which you are trying to address. Given this context, you should then describe your detailed research question(s). You should lay out the crucial parts of your argument. It is often nice to finish the introduction with a road map which tells the reader the structure of the essay and gives a very brief summary of your findings.

A Literature Survey

You should give an overview of the relevant literature. Explain how your work differs from, and fits into, other research in this area.

The Data Set

You should start by describing the institutional details of the industry/market/country you are looking at. Do not, however, inundate the reader with facts; simply tell them the aspects that are relevant.

Give a brief description of the data set that you are using in the analysis. Be sure to give definitions of the variables you will be using. Report summary statistics. Provide tables and/or figures of the interesting features that you want to model with your data.

Datasets/Research Question

One of the most difficult decisions will be one of your first: what dataset should you use? Or, even harder, which research question? You should look at places like the UK Data Archive (or a similar resource, such as the FRED dataset at the St. Louis Fed); other useful sources of data are listed on the EC226 website under Coursework Material within the Sources for Data sets section. The UK Data Archive provides an enormous number of datasets that you can explore, but there are also other sources of data from different countries. The FRED dataset is a collection of time series variables for the US and OECD countries, and for time series/panel data this might be very useful, as might data from the IMF and the World Bank.

Data are easily downloadable in Excel format, and it is then straightforward to import them into Stata/R (both packages also enable you to import other data formats). However, to figure out which series you need to download, you first have to decide which model you want to estimate and/or which hypothesis tests you want to carry out. I strongly encourage you to try to find a dataset that allows you to investigate a topic of personal interest to the majority of members of your group.

Stata Do/R files

It is good practice for future project work to get into the habit of writing all your instructions into a Stata DO/R file. The reason is that, in the very likely event that you make a mistake, you can easily rectify it by correcting the code and rerunning it. You must therefore submit your Stata DO/R file (along with your data and your project report).
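The idea of keeping every step in one rerunnable script can be sketched as follows. The example is in Python only to illustrate the habit; your submission must still be a Stata DO or R file. The data, the variable names (wage, educ) and the missing-value code of -99 are all hypothetical. Each stage (load, clean, estimate) is a separate function called from a single entry point, so a mistake in any stage is fixed in one place and the whole analysis is rerun from raw data.

```python
import csv
import io

# Hypothetical raw data, standing in for the file a group would load from disk.
RAW_CSV = """wage,educ
10.0,10
12.5,12
-99,12
15.0,14
18.5,16
"""

MISSING_CODE = -99.0  # assumed placeholder for a missing wage


def load(text):
    """Read the raw data into a list of rows with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


def clean(rows):
    """Drop observations whose wage is recorded with the missing-value code."""
    return [row for row in rows if row["wage"] != MISSING_CODE]


def ols_slope(rows, y_name, x_name):
    """OLS slope from regressing y on x with an intercept."""
    x = [row[x_name] for row in rows]
    y = [row[y_name] for row in rows]
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    return cov_xy / var_x


def main():
    """Single entry point: rerunning this repeats every step from raw data."""
    rows = clean(load(RAW_CSV))
    return len(rows), ols_slope(rows, "wage", "educ")


n_obs, slope = main()
print(f"observations used: {n_obs}, slope on educ: {slope:.2f}")
```

If the cleaning rule turned out to be wrong (say the missing code were really -9), only `MISSING_CODE` would need changing before rerunning, which is exactly the benefit of scripting rather than typing commands interactively.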