Re-Assessment Details:
Title: Assessment 2 - Final project
Style: Coursework and academic report
Rationale: This assessment gives students the opportunity to develop an end-to-end project in social media analytics, starting from data collection and ending with extracting insights and drawing conclusions. The project covers the full social media analytics lifecycle, mirroring the setup of an industry project.
Description: Assessment 2 is a group assessment undertaken in pairs (groups of two students). It tests students' ability to analyze social media data using Natural Language Processing (NLP) techniques and statistical methods.
The deliverables for this assessment:
Final project code and report for both Parts A and B (65%).
Learning Outcomes to be Assessed:
Utilize various Application Programming Interface (API) services to collect data from different social media sources.
Conduct basic social network and statistical analysis to render network visualisations and to understand network characteristics.
Derive insights and discover patterns in structured social media data using methods such as correlation, regression, and classification.
Extrapolate and analyse trends in unstructured-text data using natural language processing methods such as sentiment analysis and topic classification.
Challenge Title
Understanding Sentiment, Topics, and Influence in Construction Discourse
Background and Motivation
Online social media platforms such as Reddit have become important spaces where professionals, practitioners, and the public discuss complex socio-technical issues. In the construction domain, Reddit users frequently discuss topics such as cost overruns, delays, regulation, sustainability, and the adoption of AI and digital technologies.
However, these discussions are:
Large-scale and unstructured
Emotionally charged
Socially influenced, where certain users shape narratives more than others
The challenge is to determine what people are talking about, how they feel, and who influences the conversation, using computational social media analytics techniques.
Solving this challenge helps organisations and researchers:
Understand public and professional sentiment
Identify emerging themes and concerns
Detect influential voices and communities
Reflect on the societal implications of digital discourse
Challenge Questions
Students should frame their analysis around the following core challenge questions:
What are the dominant topics discussed in construction-related Reddit communities?
What sentiments and emotions are associated with these topics, and how do they vary across time and communities?
Who are the most influential users shaping these discussions, and how are communities structured?
(Advanced / distinction level) How do LLM-based interpretations compare with traditional statistical and NLP-based findings?
This assessment adopts a Data Study Group-inspired approach, in which ethical framing, responsible data use, and reflective practice are treated as integral components of the analytical process rather than as post-hoc considerations.
Dataset
Data Provided
Students will work with a Reddit dataset containing approximately 10,000 comments, collected from construction-related subreddits.
The dataset includes (but is not limited to):
body – comment text
author – Reddit username
created_utc – timestamp
subreddit – community name
score – upvotes/downvotes
comment_id, parent_id, link_id – discussion structure
Data is collected programmatically using a provided Reddit API script, demonstrating ethical and reproducible data access.
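As a rough illustration of how the fields listed above might be inspected, the sketch below computes a few descriptive statistics: total volume, comments per subreddit, comments per day, and mean score. It uses only the Python standard library. The sample records, the subreddit labels, and the helper name `describe_comments` are invented for illustration; the output format of the provided collection script is not specified here, so the list-of-dictionaries layout is an assumption.

```python
from collections import Counter
from datetime import datetime, timezone
from statistics import mean

# Invented sample rows using the field names listed above (not real data).
SAMPLE_COMMENTS = [
    {"comment_id": "c1", "parent_id": "t3_p1", "link_id": "t3_p1",
     "author": "user_a", "subreddit": "subreddit_a", "score": 5,
     "created_utc": 1700000000, "body": "Costs keep rising on this job."},
    {"comment_id": "c2", "parent_id": "t1_c1", "link_id": "t3_p1",
     "author": "user_b", "subreddit": "subreddit_a", "score": -1,
     "created_utc": 1700003600, "body": "Same here, and the delays are worse."},
    {"comment_id": "c3", "parent_id": "t3_p2", "link_id": "t3_p2",
     "author": "user_c", "subreddit": "subreddit_b", "score": 2,
     "created_utc": 1700086400, "body": "Has anyone tried AI scheduling tools?"},
]


def to_utc_day(created_utc):
    """Convert a Unix timestamp (seconds) to an ISO date string in UTC."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).date().isoformat()


def describe_comments(comments):
    """Return simple volume, activity, and score summaries (hypothetical helper)."""
    per_subreddit = Counter(c["subreddit"] for c in comments)
    per_day = Counter(to_utc_day(c["created_utc"]) for c in comments)
    return {
        "n_comments": len(comments),
        "per_subreddit": dict(per_subreddit),
        "per_day": dict(sorted(per_day.items())),
        "mean_score": mean(c["score"] for c in comments),
    }


summary = describe_comments(SAMPLE_COMMENTS)
print(summary)
```

The per-day counts are the kind of series that would feed a temporal plot, and the per-subreddit counts a simple bar chart.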
Data Considerations
Data is publicly available
Usernames must not be deanonymized
Ethical use and limitations of social media data must be discussed
The Challenge Tasks (Assessment Structure)
The challenge is divided into two parts, aligned with CMP7202 learning outcomes.
Part A: Statistical analysis
Challenge Task A1: Data Collection and Exploration
Students must:
Run the provided Reddit data collection script
Load and inspect the dataset
Perform descriptive statistical analysis (volume, activity, scores, time trends)
Challenge Task A2: Network and Graph Analysis
Students must:
Construct a user interaction graph
Apply centrality measures to identify influential users
Detect and visualise communities
Interpret what network structure reveals about discourse dynamics
Challenge focus:
Influence is not measured by opinion alone, but by position in the network.
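To make the idea of position in the network concrete, the following minimal sketch builds a directed reply graph (replier to replied-to user) and computes in-degree centrality by hand using only the standard library. It assumes that `parent_id` carries Reddit's type prefixes (`t1_` for a parent comment, `t3_` for a parent submission) and that `comment_id` is stored without a prefix; both should be checked against the actual dataset. The sample records and the function names are invented for illustration.

```python
from collections import defaultdict

# Invented sample comments (not real data). Assumes parent_id uses the "t1_"
# prefix for comment parents and "t3_" for submission parents, and that
# comment_id is stored without a prefix.
SAMPLE_COMMENTS = [
    {"comment_id": "c1", "parent_id": "t3_p1", "author": "user_a"},
    {"comment_id": "c2", "parent_id": "t1_c1", "author": "user_b"},
    {"comment_id": "c3", "parent_id": "t1_c1", "author": "user_c"},
    {"comment_id": "c4", "parent_id": "t1_c2", "author": "user_a"},
    {"comment_id": "c5", "parent_id": "t1_c1", "author": "user_b"},
]


def build_reply_edges(comments):
    """Return a set of (replier, replied_to) pairs between distinct users."""
    author_of = {c["comment_id"]: c["author"] for c in comments}
    edges = set()
    for c in comments:
        kind, _, parent = c["parent_id"].partition("_")
        if kind != "t1":
            continue  # top-level comment: the parent is the submission itself
        target = author_of.get(parent)
        if target is not None and target != c["author"]:
            edges.add((c["author"], target))
    return edges


def in_degree_centrality(edges):
    """Fraction of the other users who replied to each user at least once."""
    nodes = {user for edge in edges for user in edge}
    repliers = defaultdict(set)
    for source, target in edges:
        repliers[target].add(source)
    denom = max(len(nodes) - 1, 1)
    return {node: len(repliers[node]) / denom for node in nodes}


edges = build_reply_edges(SAMPLE_COMMENTS)
centrality = in_degree_centrality(edges)
most_replied_to = max(centrality, key=centrality.get)
print(most_replied_to, centrality)
```

In-degree is only one notion of influence. A graph library such as NetworkX offers further centrality measures and community detection, which this hand-rolled sketch does not attempt.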
Students must support their statistical and network analysis with appropriate visualizations, such as temporal plots, distributions, and network graphs. All key analytical findings in Part A must be visually represented and clearly explained.
Part B – Understanding Meaning and Emotion
Challenge Task B1: Sentiment Analysis
Students must:
Apply sentiment analysis to Reddit comments
Analyse sentiment distribution: overall, by subreddit, and over time
Discuss limitations of automated sentiment detection
Challenge Task B2: Topic Modelling
Students must:
Apply topic modelling (LDA, NMF, or BERTopic)
Identify and label dominant themes
Analyse how topics evolve and co-exist
Challenge focus:
Topics are not fixed — they emerge, shift, and overlap.
Challenge Task B3: LLM-Assisted Interpretation (Advanced)
Students may use LLMs to:
Label topics
Summarise clusters of discussion
Reflect on framing, stance, or narrative patterns
LLM use must be:
Clearly documented
Critically evaluated
Used for analysis, not report writing
All analyses in Part B must be accompanied by clear and interpretable visualisations, including but not limited to sentiment distributions, topic frequency plots, topic evolution over time, and thematic representations. Visualisations should be used to support interpretation and discussion, not merely presented without explanation.
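For the sentiment breakdowns required in Task B1 (overall, by subreddit, and over time), the aggregation step can be sketched as follows. The scorer here is a deliberately tiny, invented word list and is only a stand-in; it is not an established sentiment tool, and in practice a validated model or lexicon (VADER is one well-known option) would supply the scores. The sample comments, subreddit labels, and function names are likewise hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

# Tiny invented lexicon: a stand-in for a real sentiment model, not a real tool.
TOY_LEXICON = {"great": 1, "good": 1, "safe": 1, "bad": -1, "delay": -1, "overrun": -1}

# Invented sample comments (not real data).
SAMPLE_COMMENTS = [
    {"subreddit": "subreddit_a", "created_utc": 1700000000,
     "body": "Great crew, good progress"},
    {"subreddit": "subreddit_a", "created_utc": 1700086400,
     "body": "Another delay and a cost overrun"},
    {"subreddit": "subreddit_b", "created_utc": 1700086400,
     "body": "Bad weather but the site is safe"},
]


def toy_score(text):
    """Sum lexicon weights of the words in text, clipped to the range [-1, 1]."""
    total = sum(TOY_LEXICON.get(w.strip(".,!?").lower(), 0) for w in text.split())
    return max(-1, min(1, total))


def day_of(comment):
    """ISO date (UTC) of a comment's created_utc timestamp."""
    ts = datetime.fromtimestamp(comment["created_utc"], tz=timezone.utc)
    return ts.date().isoformat()


def mean_sentiment_by(comments, key_fn):
    """Group comments by key_fn and return the mean toy score per group."""
    groups = defaultdict(list)
    for c in comments:
        groups[key_fn(c)].append(toy_score(c["body"]))
    return {key: mean(scores) for key, scores in sorted(groups.items())}


overall = mean(toy_score(c["body"]) for c in SAMPLE_COMMENTS)
by_subreddit = mean_sentiment_by(SAMPLE_COMMENTS, lambda c: c["subreddit"])
by_day = mean_sentiment_by(SAMPLE_COMMENTS, day_of)
print(overall, by_subreddit, by_day)
```

The third sample comment scores as neutral because its negative and positive words cancel out, which is a small instance of the limitations of automated sentiment detection that the report is expected to discuss.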
Expected output:
Students must submit a single PDF report of a maximum of 2,000 words that presents the challenge background, methodology, results, discussion, limitations, and conclusion. In addition, students must submit an executable codebase or notebook that demonstrates the data collection process, analytical methods applied, and the visualisations used to support the findings.
For advice on writing style, referencing and academic skills, please make use of the Centre for Academic Success (student support, Birmingham City University, bcu.ac.uk).
Workload: 30 hours for a 2,000-word report and a presentation of 1,000 words.
Transferable skills: Completing this assessment helps students develop both technical and transferable skills, including:
Problem solving
Programming skills
Analytical skills
Time management
Project management
Verbal and written communication skills
Marking Criteria