rendzina/BigDataAndVisualisation


Big Data and Visualisation Project

Overview

This repository contains examples and tutorials for the Big Data and Visualisation module at MK:U (Milton Keynes University). It demonstrates big data processing techniques across several platforms and tools, including Apache Spark, MongoDB, and a range of visualisation libraries.

Course Information: MK:U Apprenticeships - Big Data and Visualisation

Project Structure

This repository is organised into several key directories, each focusing on different aspects of big data processing and visualisation:

📁 Colab/ - Google Colaboratory Notebooks

Interactive Jupyter notebooks designed to run in the Google Colab environment, featuring:

  • Spark data processing examples
  • Environmental data analysis
  • Geographic mapping and visualisation
  • API data integration

📁 HDInsight/ - Microsoft Azure HDInsight Notebooks

Specialised notebooks for Azure HDInsight clusters, including:

  • Spark-based data processing solutions
  • Cloud-based big data analytics
  • Enterprise-grade data processing workflows

📁 Python/ - MongoDB and Python Integration

Scripts and notebooks for a local Python development environment, featuring:

  • MongoDB database operations
  • Noise mapping data analysis
  • Environmental data processing
  • Database querying and visualisation
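As an illustration of the kind of query the MongoDB examples involve, here is a minimal sketch that is not taken from the repository. The database name `environment`, the collection name `noise_readings`, and the field names `site` and `level_db` are all hypothetical. It assumes the `pymongo` driver and a MongoDB server on the default local port.

```python
def build_noise_filter(min_level_db, site=None):
    """Build a MongoDB filter for readings at or above a noise level."""
    # Field names are hypothetical; adapt them to the real collection.
    query = {"level_db": {"$gte": min_level_db}}
    if site is not None:
        query["site"] = site
    return query


def build_site_average_pipeline(min_level_db):
    """Build an aggregation pipeline giving the mean level per site."""
    return [
        {"$match": build_noise_filter(min_level_db)},
        {
            "$group": {
                "_id": "$site",
                "mean_level_db": {"$avg": "$level_db"},
            }
        },
        # Loudest sites first.
        {"$sort": {"mean_level_db": -1}},
    ]


def find_loud_readings(min_level_db, uri="mongodb://localhost:27017"):
    """Query a local MongoDB instance for readings above a threshold.

    Requires pymongo and a running server. The database and collection
    names below are assumptions made for this sketch.
    """
    # Imported lazily so the pure helpers above work without pymongo.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        collection = client["environment"]["noise_readings"]
        return list(collection.find(build_noise_filter(min_level_db)))
    finally:
        client.close()
```

Keeping the filter and pipeline as plain dictionaries means they can be inspected or tested without a database connection, then passed to `collection.find()` or `collection.aggregate()` once a server is available.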

📁 Zeppelin/ - Apache Zeppelin Notebooks

Apache Zeppelin notebook examples for:

  • Interactive data analysis
  • Real-time data processing
  • Collaborative data science workflows

Key Features

  • Multi-Platform Support: Examples for Google Colab, Azure HDInsight, and local development
  • Real-World Data: Practical examples using environmental, property, and fuel price datasets
  • Interactive Visualisations: Maps, charts, and graphs using various plotting libraries
  • Database Integration: MongoDB operations and data persistence
  • API Integration: Real-time data fetching and processing
  • Educational Focus: Step-by-step tutorials with comprehensive documentation

Getting Started

Prerequisites

  • Python 3.7+: Required for local development
  • MongoDB: For database examples (local installation)
  • Google Colab Account: For cloud-based notebooks
  • Azure Subscription: For HDInsight examples (optional)
  • Apache Zeppelin: For Zeppelin notebook examples (optional)

Quick Start

  1. For Google Colab: Navigate to the Colab/ directory and open notebooks directly in Colab
  2. For Local Development: Set up the Python environment in the Python/ directory
  3. For Azure HDInsight: Use notebooks from the HDInsight/ directory
  4. For Zeppelin: Import notebooks from the Zeppelin/ directory
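Before working through the local examples, it may help to confirm the prerequisites listed above. The following is a minimal sketch, not part of the repository: it checks the Python 3.7+ requirement and reports whether two packages can be imported. The package names `pyspark` and `pymongo` are assumptions, since this README does not name the exact packages the notebooks depend on.

```python
import importlib.util
import sys

# Minimum version stated in the Prerequisites section.
MIN_PYTHON = (3, 7)

# Assumed package names; the README does not list exact dependencies.
OPTIONAL_PACKAGES = ("pyspark", "pymongo")


def python_version_ok(version_info=None):
    """Return True if the interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def package_available(name):
    """Return True if the named top-level package can be imported."""
    return importlib.util.find_spec(name) is not None


def environment_report():
    """Build a dictionary summarising the local environment."""
    report = {"python_ok": python_version_ok()}
    for name in OPTIONAL_PACKAGES:
        report[name] = package_available(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {'yes' if value else 'no'}")
```

Running the script prints one line per check. A `no` against a package only matters for the examples that use it, and the Colab, HDInsight, and Zeppelin notebooks run in their own hosted environments rather than this local one.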

Educational Objectives

This project supports learning objectives in:

  • Big Data Processing: Apache Spark, data transformation, and analysis
  • Data Visualisation: Creating meaningful charts, graphs, and maps
  • Database Operations: MongoDB integration and querying
  • Cloud Computing: Working with cloud-based big data platforms
  • Real-Time Data: API integration and streaming data processing

Contributing

This is an educational project designed for students at MK:U. Contributions that enhance learning outcomes are welcome, including:

  • Additional examples and tutorials
  • Improved documentation
  • Bug fixes and code improvements
  • New visualisation techniques

License

This project is for educational purposes. Please ensure you have appropriate permissions for any external data sources used.

Author

S. Hallett
Course: MK:U, Big Data and Visualisation
Date: 18/06/2025


This project uses UK spelling conventions throughout and follows PEP 8 coding standards for Python code.

About

Matters pertaining to the Big Data and Visualisation module on the MK:U Data Scientist course
