Research Projects for MSc AI and Data Science

Current cohort: Sep 2023

Welcome to the MSc research project on urban big data analytics. I'm delighted to have you on board as we delve into topics in urban science and crime science. By the end of this learning journey, I expect you will be confident in most aspects of urban big data analytics and able to meet the requirements for the research report. The university stresses the importance of avoiding academic misconduct every academic year, and I must re-emphasise that any form of misconduct is unacceptable in this project. Please prepare thoroughly and be patient at each stage; this will lead to a high-quality report and, ultimately, to the award of your MSc degree.

1 Research proposal

A research proposal outlines a roadmap for the research project, including the objectives, methodology, timeline, and expected outcomes. In urban analytics, a research proposal sets out how data science and AI methods will be used to investigate a specific urban issue and what insights into the urban environment the project is expected to deliver.

Here are the typical sections included in a research proposal:

Specifically, Table 1 lists all the important research elements for research proposals in urban analytics. Please incorporate all of these elements in your research proposal and include the completed table as an appendix to the proposal.

Table 1 Checklist of the important elements in a research proposal.

Element | Description | Example
Data type | The format of the information collected. In urban science, data can be categorised into various types for different purposes. For example, crime data can come from crime surveys, police-recorded data, self-report data and so on. Urban mobility data can be categorised into mobile phone call detail records, mobile phone GPS data, underground smart card data, WiFi data, social media data and so on. | Underground smart card data, police-recorded data
Data resource | The sources from which you collected the data, each under its own usage licence (e.g., education and research licences). Please read the data use policy carefully if you access open data. | London Datastore
Independents/features/predictors (X) | Features or independent variables are the attributes of the data that are used to predict the outcome. | Population mobility variables (measured by travel behaviours from smart card data)
Dependents/targets/responses (y) | The target or dependent variable is the outcome you aim to predict or understand based on the feature variables. For supervised learning tasks, y typically consists of the labels or responses (e.g., a column) associated with each set of feature variables in the dataset. | Theft counts
Spatial unit of analysis | The geographic level or scale of the research analysis. It defines the spatial resolution or granularity at which spatial patterns are examined. Clearly defining the geographical unit of analysis can help to avoid the ecological fallacy in the research findings. | Lower Super Output Area (LSOA)
Temporal unit of analysis | The time scale or interval of the analysis, e.g., for examining temporal patterns, trends, or relationships. It can be hourly, daily, weekly, monthly or yearly. In some prediction tasks, it also denotes the time scale at which the trained model predicts, e.g., the model predicts the next week (week level) for each LSOA. | Monthly
Study area and city | The specific urban areas of the selected cities used as the case study in the analysis, e.g., the City of London within Greater London. | All urban areas in Greater London
Observation period | The time period covered by the observations in the experimental analysis, e.g., 2021 to 2022 (two years). | The year 2021
Model/method | The main method or model that will be used or trained to answer the research questions, such as statistical models or machine learning models. | Random Forest regressor
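
As a minimal sketch of how the spatial unit, temporal unit and target variable in Table 1 fit together, the Python snippet below aggregates individual crime records into theft counts per LSOA per month. The record layout, the LSOA codes and the function name monthly_counts are hypothetical and serve only as an illustration; real records would come from your chosen data resource.

```python
from collections import Counter
from datetime import date

# Hypothetical point-level records: (LSOA code, date of offence, crime type).
records = [
    ("E01000001", date(2021, 1, 5), "theft"),
    ("E01000001", date(2021, 1, 19), "theft"),
    ("E01000001", date(2021, 2, 2), "burglary"),
    ("E01000002", date(2021, 1, 7), "theft"),
    ("E01000002", date(2021, 2, 11), "theft"),
]


def monthly_counts(records, crime_type):
    """Count records of one crime type per (LSOA, "YYYY-MM") cell."""
    counts = Counter()
    for lsoa, day, kind in records:
        if kind == crime_type:
            # Spatial unit = LSOA, temporal unit = calendar month.
            counts[(lsoa, day.strftime("%Y-%m"))] += 1
    return counts


# y: the target variable (theft counts) keyed by spatial and temporal unit.
y = monthly_counts(records, "theft")
for (lsoa, month), n in sorted(y.items()):
    print(lsoa, month, n)
```

Each key corresponds to one row of a modelling table. The feature variables (X) for the same LSOA and month would typically be joined on that key before fitting a model such as the Random Forest regressor given in the example column.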

The research project for this cohort starts on 30 May 2024.

2 Recommended reading

The recommended reading section offers review papers and empirical works to help you grasp the fundamental concepts, methodologies, and data types used within specific topics. However, because advanced methods (especially in AI and data science) and new data types are emerging rapidly in these research areas, the list may not cover all relevant literature in each domain. You are strongly advised to explore specific topics further through Google Scholar or the university library. The number following each reference is the index of the related topic, which can be found in the Appendix.

3 Data sources

Crime data

Nowadays, crime data is readily available through numerous resources for various purposes. Several resources from different cities in the UK and US are listed below for your reference. Please take note of the data usage license and consider the data quality, particularly regarding spatial and temporal resolution issues, as detailed in the provided resources. Additionally, some city open data portals offer additional urban data that can be linked to the crime data for further analysis.

Urban data

Urban data is accessible through various resources, encompassing socio-economic data, population statistics, transportation data, geographical boundaries, and other urban environmental data. Below are several urban data repositories. Please be aware that some data sources require registration and obtaining an educational license for usage. You are encouraged to delve deeper into additional data resources or utilise your own data sets.

4 Analysis tools

Several tools can be used to analyse urban data, particularly for geospatial and temporal data processing and modelling. Selected Python packages and software are provided for your reference:

5 Project management

It is highly recommended to use GitHub for project management and for writing code under version control. Further guidance on GitHub usage can be found at GitHub Docs. You can also find online courses on LinkedIn Learning, Udemy or Coursera. The simplest approach is to use GitHub Desktop to commit and push your local Jupyter Notebook project.

Appendix

Table A1 Current research topics

No | Title | Description
1 | Geospatial analysis for urban crimes | Geospatial analysis of urban crime can help to develop tailored place management strategies that prevent potential crime in urban areas. This project focuses on detecting crime patterns (e.g., crime hotspots and concentration) or exploring how urban socio-economic and environmental factors influence crime patterns in urban areas, using geospatial techniques ranging from statistical geospatial models to advanced machine learning models.
2 | Spatio-temporal analysis for urban crimes | Comprehending spatial and temporal patterns of urban crime is essential not only for unravelling the mechanisms behind when and where crimes occur but also for gaining valuable insights to formulate targeted intervention strategies. This project aims to employ advanced data science methods to investigate the spatial and temporal patterns of crimes in urban areas. In the context of urban complexity, it can help to identify specific crime patterns such as near-repeat victimisation in burglary (via spatiotemporal clustering) in neighbourhood areas.
3 | Crime prediction using machine learning/deep learning | This project aims to develop a crime prediction framework or method by leveraging advanced machine learning and deep learning techniques. By integrating historical crime data with relevant socio-economic and environmental factors, the project seeks to enhance the accuracy and efficiency of crime prediction in space and time. The implementation will be designed to fit existing law enforcement systems and so improve public safety.
4 | Causal inference for crime pattern shifting | This project applies causal inference to understand and address crime pattern shifting or replacement across urban areas. Crime patterns can shift due to various factors, such as changes in law enforcement interventions (e.g., police patrols) and disruptions to social conditions (e.g., pandemics, natural hazards). By revealing the associations between neighbourhood characteristics and the resulting shifts/replacements in crime patterns via causal inference models, the project can provide insightful intervention assessments and strategies for public policy.
5 | Exploring the urban mobility patterns using big data analytics | Understanding and optimising mobility are crucial for sustainable and efficient modern urban development. The objective of this project is to explore the complex patterns in urban mobility by analysing diverse big data sources, such as public transportation records (e.g., smart card data), social media big data and mobile phone big data. Other tasks of this project can focus on identifying which key factors (e.g., urban facilities and functional land use) influence the population’s mobility patterns (e.g., commuting behaviours), or predicting the volume of the population’s mobility (e.g., origin and destination flows) across different urban areas. Employing alternative methods such as machine learning and geospatial analysis will be pivotal in extracting meaningful insights for urban planning or public resource management.
6 | Exploring urban inequality using big data analytics | The project aims to reveal the multifaceted dimensions of urban inequality through the lens of advanced big data analytics. With the development of advanced geospatial analytic techniques, it is possible to use emerging big data sets to explore various inequality topics in the urban context, such as the inequality of residents’ travelling behaviours across different neighbourhood areas, or residents’ accessibility to greenspace across neighbourhood areas. The main task focuses on uncovering the spatial distribution of disparities across UK city areas (or regions) and identifying the underlying factors that contribute to specific types of urban inequality (for building the prediction model). To achieve these goals, alternative methods such as machine learning algorithms and spatial network/graph analysis will be explored to enhance the accuracy of the findings, ensuring a nuanced examination of urban inequality in contemporary societies.
7 | Urban transport analytics | The project aims to analyse the heterogeneity in urban transportation demand, usage/ridership, or mode choices across urban areas to understand the diverse patterns of mobility within urban settings. By employing advanced geospatial data analytics and machine learning techniques, different types of transport mode patterns (e.g., taxi, cycling, public transportation) can be sensed, visualised, and analysed from various geo big data, such as smart card data, bike-sharing docking station data and mobile phone data.
8 | Urban traffic accident analytics | This project aims to analyse and predict traffic accidents (involving cars, bikes or pedestrians) within urban areas. With the increasing complexity of modern urban transportation networks, this project seeks to employ advanced AI techniques to extract meaningful insights from vast datasets related to traffic accidents. The primary task is to identify patterns, trends, and contributing factors (e.g., road features) leading to traffic accidents, thereby facilitating evidence-based decision-making for city planners. As an alternative to traditional methods, this project focuses on integrating explainable machine learning and geospatial analysis to provide a comprehensive and dynamic approach to accident prevention and contribute to a safer urban environment.
9 | Evaluating urban vitality/vibrancy using geo-big data | This project aims to evaluate urban vitality across retail areas (or high streets) through the footfall traffic sensed from geo-big data. The primary task of this project is to explore the daily rhythms of vitality (represented by footfall traffic at place venues) across different urban land use areas. Second, it seeks to identify key factors influencing vitality, encompassing aspects such as economic revitalisation, social cohesion, and environmental sustainability. To achieve this objective, alternative research methods, including spatial and temporal analyses and machine learning models, will be explored to provide a comprehensive understanding of the dynamics of urban areas.
10 | Sensing urban functions through big data analysis | In the context of dynamic urban environments, this project endeavours to employ AI tools to sense and comprehend various urban functions from geo big data. Against the backdrop of rapidly evolving cities, understanding how urban populations interact with the diverse functions of the urban landscape is crucial for effective urban planning and management. The primary objective of this project is to detect and portray dynamic urban function zones from human activity patterns sensed from geo big data (e.g., social media data, mobile phone data, smart card data, street view data and remote sensing data).
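
Topic 2 mentions near-repeat victimisation identified via spatiotemporal clustering. As a rough illustration of the underlying idea only, the Python sketch below counts pairs of events that fall within a chosen distance and time window of each other. It is not a full statistical test: a formal analysis would also compare the count against what is expected by chance. The coordinates, thresholds and the function name near_repeat_pairs are hypothetical.

```python
from itertools import combinations
from math import hypot

# Hypothetical burglary events: (x in metres, y in metres, day number).
events = [
    (0.0, 0.0, 1),
    (50.0, 0.0, 3),
    (60.0, 80.0, 4),
    (2000.0, 2000.0, 5),
    (10.0, 10.0, 60),
]


def near_repeat_pairs(events, max_dist, max_days):
    """Return index pairs of events that are close in both space and time."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(events), 2):
        close_in_space = hypot(a[0] - b[0], a[1] - b[1]) <= max_dist
        close_in_time = abs(a[2] - b[2]) <= max_days
        if close_in_space and close_in_time:
            pairs.append((i, j))
    return pairs


# Assumed thresholds for illustration: 200 metres and 14 days.
pairs = near_repeat_pairs(events, max_dist=200.0, max_days=14)
print(pairs)
```

The pairwise loop grows quadratically with the number of events, so for city-scale data a spatial index or a dedicated library would be a more practical choice.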

© 2024 Tongxin Chen. All rights reserved.

Contact: Tongxin.Chen@hull.ac.uk