
Julian Cross

23 Apr 2026 • 7 min read

Multivariate Data Analysis

Instructions:

One of the more interesting topics for laypeople and academics alike is adolescent substance use. Suffice it to say, illuminating the many covariates of this behavior is a topic of great relevance and concern for scholars, community leaders, and most importantly, parents. Based on this consideration, you have been tasked with uncovering the most consequential correlates of adolescent substance use.

There is probably no better inventory of adolescent substance use in the United States than the Monitoring the Future (MTF) survey, conducted by sociologists and criminologists at the University of Michigan’s Institute for Social Research. This time-series study (first wave of data collected in the 1970s) follows trajectories in substance use, along with the many factors related to this behavior. Now that UNA is an ICPSR member institution, we have access to leading data sets in the fields of sociology, political science, and criminology, including MTF. In this project, you will directly analyze data from one of the most recent waves of the 12th-grade sample of the MTF. In particular, you will be looking for causal relationships between a host of independent/control variables and the frequency of adolescent substance abuse across seven different drug types.

Going back to the 1960s, and perhaps even earlier, there has been a pervasive fear of the encroachment of drug use into the suburbs. The fear of free thought and the Hippie movement bred an active distrust of certain types of music, and even certain brands of politicians. One would imagine that left-leaning progressives and liberals hold a more permissive view of drug use and, in turn, are more likely to use a variety of drugs themselves. Hence, the opening assumption that students will examine is that individuals who identify themselves as politically liberal will be more likely to use drugs.

That being said, perhaps there is more to the story than one’s personal leanings. For instance, your instructor has an undying passion for the scientific study of religion and personal religiosity. His Master’s thesis positioned religiosity as a potential correlate of criminal behavior, and his doctoral dissertation treated it as a conditioning factor or buffer in the relationship between negative life events and criminal coping. Unfortunately, both projects led to the conclusion that religiosity isn’t particularly consequential when it comes to the prevention of serious criminal behavior, but perhaps it has more of an impact on acts that are somewhat analogous to crime (e.g., sexual deviance, suicidal tendencies, substance use). If religiosity in fact “matters”, it is most likely to matter here. Thus, your instructor strongly contends that religiosity will add a degree of explanatory power to the regression model predicting frequency of drug use across seven different types of drugs. The model will also include a host of other demographic (e.g., sex, race, paternal education – a proxy for social class) and theoretically relevant (e.g., religiosity, school abilities, employment) control variables.

So, let’s get busy. I have provided students with a reduced version of the 2012 MTF survey in Canvas (Module V/Class Projects/reduced MTF for class project VIII). Students will be using this data to answer a series of questions relating to output on two separate regression equations. The first equation will be a simple regression between political ideology and drug use frequency; and the second will be a fully nested model predicting drug use with the following control variables: sex, race, father’s education, religiosity (church attendance frequency), school abilities, and hours worked. A description of the variables that you will be using is provided below.

R_DRUG_FREQ: This is a measure of the frequency of drug use across seven different types of drugs (pot; LSD; other hallucinogens, like PCP/shrooms; amphetamines; barbiturates or sedatives; tranquilizers like Xanax or Valium; narcotics like cocaine or heroin) over the past year. Higher scores indicate greater frequency.

POLITICS: 6-point measure of respondent’s political views (1 = Extremely Conservative, 6 = Radical Liberal).

V2169: This is a measure of respondent’s religiosity (attendance at religious service), with scores ranging from 1 = NEVER, to 4 = ONCE A WEEK OR MORE.

CONTROLS

V2150: Sex of Respondent (1 = Male, 2 = Female)

NONWHITE: Dummy Variable for Nonwhite (0 = “White”, 1 = “Nonwhite”)

PA_EDUC: Highest Education attained by Father (1 = “Grade School”, 6 = “Grad School”)

V2173: Respondent’s assessment of school abilities, compared to others (1 = “Far Below Average” … 7 = “Far Above Average”)

V2191: On average, how many hours per week do you work during the school year? (1 = “None”, 8 = “Over 30 Hours”)

Exercise I

Students will first perform a baseline, simple regression between political ideology (POLITICS) and frequent drug use (R_DRUG_FREQ). Since a fully specified/nested model follows this simple regression, students should choose the NEXT option in the regression window (above and slightly to the right of the Independent box in regression).
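For students curious about what the software computes behind the scenes, here is a minimal Python sketch of a simple regression. It is not the required procedure for this assignment. The function name `simple_regression` and the six data points are made up for illustration and are not MTF values. The sketch returns the unstandardized slope and intercept, Pearson's r (which equals the standardized Beta when there is only one predictor), R-square, and the model F statistic with 1 and n - 2 degrees of freedom.

```python
from math import sqrt


def simple_regression(x, y):
    """Ordinary least squares with one predictor, computed from sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)   # variation in the predictor
    syy = sum((yi - y_bar) ** 2 for yi in y)   # variation in the outcome
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx                          # unstandardized coefficient (b)
    intercept = y_bar - slope * x_bar          # constant (a)
    r = sxy / sqrt(sxx * syy)                  # Pearson's r = Beta with one predictor
    r_squared = r ** 2                         # share of variance explained
    f_stat = r_squared / (1 - r_squared) * (n - 2)  # model F, df = (1, n - 2)
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r_squared,
        "f": f_stat,
        "df": (1, n - 2),
    }


# Made-up illustration data (NOT from the MTF file).
x = [1, 2, 3, 4, 5, 6]   # stand-in for a 6-point predictor
y = [0, 1, 1, 2, 4, 4]   # stand-in for an outcome score
result = simple_regression(x, y)
print(result)
```

With a single predictor, the model F test and the t test on the slope agree (F equals t squared), so either one indicates whether the predictor is significant.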

Based on the regression output, students are to answer the following questions.

1. What are the independent and dependent variables in this regression equation?

2. Does the simple regression model significantly predict our dependent variable? How were you able to determine this – i.e., what statistical test did you use to make this decision?

a. How good a “fit” does our simple regression model offer for our dependent variable? Give a specific explanation (interpret how good a fit this model offers).

b. What coefficient did you use to make this determination?

3. Does the independent variable significantly predict the dependent variable?

a. Indicate the direction and strength of this association.

b. What coefficients did you use to determine strength?

4. What can you say about the relationship between political orientation and frequent drug use?

a. Based on the analysis performed (simple regression), can we say this relationship is causal in nature? Why or why not?

Exercise II

Due to the methodological limitations of your first regression equation (it was simple, after all), you now wish to offer a more comprehensive explanation of the correlates of frequent drug use, and to test the hypothesis that the relationship between political ideology and frequent drug use is causal in nature, by performing a fully specified regression model that contains a host of theoretically relevant and demographic controls. You also wish to assess your professor’s claim that religiosity will significantly affect only less serious, analogous measures of crime/deviance (e.g., drug use). In particular, your job is to (a) assess whether adding these theoretical/demographic control variables significantly improves the ability of the model to predict or fit frequent drug use; (b) identify the significant predictors and their direction; and (c) identify the strongest predictors of frequent drug use, based on the regression results. To do this, enter the control variables as block 2 of the regression analysis (the simple regression was block 1), and select the R-Square Change option in the statistics tab of the regression analysis. (Hint: You are more than welcome to use the PowerPoint presentation delivered in class, Chapter 16 on Multiple Regression, to help with the proper procedures.)
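The R-Square Change option reports an F test on the improvement from block 1 to block 2. As a rough sketch of that arithmetic, assuming ordinary least squares with an intercept and the same cases in both blocks, the helper below (hypothetically named `r_square_change_f`) computes the change statistic from the two R-square values. The numbers in the example are invented placeholders, not results from the MTF data.

```python
def r_square_change_f(r2_block1, r2_block2, n, k_block1, k_block2):
    """F test for the increase in R-square when block 2 adds predictors.

    n        : number of cases used in both blocks
    k_block1 : number of predictors in block 1
    k_block2 : total number of predictors once block 2 is entered
    """
    df1 = k_block2 - k_block1      # number of predictors added in block 2
    df2 = n - k_block2 - 1         # residual degrees of freedom, full model
    if df1 <= 0 or df2 <= 0:
        raise ValueError("block 2 must add predictors and leave residual df")
    # Explained variance gained per added predictor, relative to the
    # unexplained variance per residual degree of freedom in the full model.
    f_change = ((r2_block2 - r2_block1) / df1) / ((1 - r2_block2) / df2)
    return f_change, df1, df2


# Invented placeholder values (NOT MTF results): one predictor in block 1,
# six controls added in block 2, 1007 cases.
f_change, df1, df2 = r_square_change_f(0.02, 0.10, n=1007, k_block1=1, k_block2=7)
print(f"F change = {f_change:.2f} on ({df1}, {df2}) df")
```

The resulting F is judged against an F distribution with df1 and df2 degrees of freedom. The regression output reports the corresponding significance level next to the R-square change, so no table lookup should be needed.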

Based on your regression output, please answer the following questions.

1. Does the fully specified regression model significantly predict our dependent variable? How were you able to determine this – i.e., what statistical test did you use to make this decision?

a. How good a “fit” does our fully specified regression model offer for our dependent variable? Give a specific explanation (interpret how good a fit this model offers).

b. What coefficient did you use to make this determination?

2. Does this full regression model, with all of the demographic (race, sex, paternal education) and theoretical (religiosity, employment, school abilities) controls, SIGNIFICANTLY improve our ability to predict the dependent variable when compared to the simple regression model in Exercise I? (Remember the R-Square change test.)

a. Offer a full explanation of this significance in change (if relevant) in the two regression models.

b. What coefficients/significance tests did you use to determine strength?

3. What variables SIGNIFICANTLY predict frequent drug use?

a. Specify the direction of these relationships – i.e., talk a little about the relationship.

b. Discuss the coefficients used to make this determination.

4. Discuss the STRENGTH or substantive significance of the predictors.

a. i.e., which variable exerts the strongest effect on frequent drug use?

b. What coefficients did you use to make this determination?

5. Now that you have done the relevant analyses, speak to the primary predictors of frequent drug use, offering justification for your choices.

a. How important is political orientation?

b. What about your professor’s fascination with religiosity as an inhibitor of ascetic deviance – is he on to something?

6. Based on the analysis performed (multiple regression), can we say these relationships properly address nomothetic causal criterion #3, and are causal in nature? Why or why not?
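On the question of strength (question 4 above): unstandardized coefficients depend on the units of each variable, so predictors measured on different scales are usually compared through standardized coefficients (Betas). The sketch below shows the conversion, Beta = b × (SD of predictor / SD of outcome), and ranks predictors by absolute Beta. The function names, predictor labels, and every number are hypothetical placeholders and not MTF results.

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized coefficient b into a standardized Beta."""
    return b * sd_x / sd_y


def rank_by_strength(coefficients, sd_y):
    """Rank predictors by |Beta|, strongest first.

    coefficients maps a predictor name to (b, sd_x).
    Returns a list of (name, beta) tuples.
    """
    betas = {
        name: standardized_beta(b, sd_x, sd_y)
        for name, (b, sd_x) in coefficients.items()
    }
    return sorted(betas.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical placeholder values (NOT MTF results).
example = {
    "predictor_a": (0.30, 1.0),    # (unstandardized b, SD of predictor)
    "predictor_b": (-0.50, 0.8),
    "predictor_c": (0.05, 2.0),
}
ranking = rank_by_strength(example, sd_y=2.0)
for name, beta in ranking:
    print(f"{name}: Beta = {beta:+.3f}")
```

The sign of Beta gives the direction of the relationship, while its absolute size is what the ranking compares, so a strong negative predictor can outrank a weaker positive one.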

Format:

Assignments MUST be typed, with 1” margins throughout the document. All students must use 12-point Times New Roman font. Students should single-space within each answer (including the letters, e.g., 1a, 1b), but double-space between answers (e.g., 1, 2, 3).
