This repository contains comprehensive examples and tutorials for the Big Data and Visualisation module at MK:U (Milton Keynes University). The project demonstrates big data processing techniques across several platforms and tools, including Apache Spark, MongoDB, and a range of visualisation libraries.
Course Information: MK:U Apprenticeships - Big Data and Visualisation
This repository is organised into several key directories, each focusing on different aspects of big data processing and visualisation:
Interactive Jupyter notebooks designed to run in the Google Colab environment, featuring:
- Spark data processing examples (see the sketch after this list)
- Environmental data analysis
- Geographic mapping and visualisation
- API data integration
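As an illustration of the kind of processing these notebooks cover, here is a minimal PySpark sketch. The file name `air_quality.csv` and the column names are hypothetical stand-ins, not files shipped with the repository.

```python
# Minimal PySpark sketch: load a CSV of (hypothetical) environmental
# readings and compute a simple aggregate. Names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("colab-example").getOrCreate()

readings = spark.read.csv("air_quality.csv", header=True, inferSchema=True)

# Average pollutant level per monitoring site, highest first
summary = (
    readings.groupBy("site")
            .agg(F.avg("no2").alias("mean_no2"))
            .orderBy("mean_no2", ascending=False)
)
summary.show(10)
```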
Specialised notebooks for Azure HDInsight clusters, including:
- Spark-based data processing solutions (a Spark SQL sketch follows this list)
- Cloud-based big data analytics
- Enterprise-grade data processing workflows
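Below is a hedged sketch of an HDInsight-style workflow: querying a Hive table with Spark SQL on a cluster. The table and column names (`property_sales`, `region`, `price`) are illustrative assumptions rather than objects defined by these notebooks.

```python
# Sketch of a cluster workflow: query a Hive table registered in the
# metastore using Spark SQL. Table and column names are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hdinsight-example")
    .enableHiveSupport()   # HDInsight Spark clusters expose a Hive metastore
    .getOrCreate()
)

result = spark.sql("""
    SELECT region, AVG(price) AS mean_price
    FROM property_sales
    GROUP BY region
    ORDER BY mean_price DESC
""")
result.show()
```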
Local Python development environment featuring:
- MongoDB database operations (see the pymongo sketch after this list)
- Noise mapping data analysis
- Environmental data processing
- Database querying and visualisation
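The following is a minimal pymongo sketch of the database operations this environment covers, assuming a local MongoDB instance on the default port; the `noise_mapping` database and `readings` collection names are hypothetical.

```python
# Minimal pymongo sketch against a local MongoDB instance.
# Database, collection, and field names are illustrative only.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["noise_mapping"]
readings = db["readings"]

# Insert a sample document, then query readings above a threshold
readings.insert_one({"site": "Site A", "noise_db": 68, "hour": 9})
for doc in readings.find({"noise_db": {"$gt": 60}}).limit(5):
    print(doc)
```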
Apache Zeppelin notebook examples for:
- Interactive data analysis (see the example paragraph after this list)
- Real-time data processing
- Collaborative data science workflows
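The snippet below sketches what a single Zeppelin paragraph might look like, assuming the standard `%pyspark` interpreter, which exposes the shared `spark` session and the `z` display helper. The data source and column names are illustrative only.

```python
%pyspark
# A Zeppelin paragraph sketch: the %pyspark interpreter provides the shared
# SparkSession as `spark` and Zeppelin's display context as `z`.
# The file and column names below are illustrative only.
df = spark.read.json("sensor_events.json")
z.show(df.groupBy("sensor_id").count())   # renders as an interactive table/chart
```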
- Multi-Platform Support: Examples for Google Colab, Azure HDInsight, and local development
- Real-World Data: Practical examples using environmental, property, and fuel price datasets
- Interactive Visualisations: Maps, charts, and graphs using various plotting libraries
- Database Integration: MongoDB operations and data persistence
- API Integration: Real-time data fetching and processing (see the sketch after this list)
- Educational Focus: Step-by-step tutorials with comprehensive documentation
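To illustrate the API-integration pattern mentioned above, here is a hedged sketch that fetches JSON from a placeholder endpoint and loads it into pandas; the URL is not a real data source used by the repository.

```python
# Minimal sketch of the API-integration pattern: fetch JSON from a
# (placeholder) open-data endpoint and load it into pandas for analysis.
import requests
import pandas as pd

URL = "https://example.org/api/fuel-prices"   # placeholder endpoint

response = requests.get(URL, timeout=30)
response.raise_for_status()
records = response.json()

df = pd.DataFrame(records)
print(df.head())
```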
- Python 3.7+: Required for local development
- MongoDB: For database examples (local installation)
- Google Colab Account: For cloud-based notebooks
- Azure Subscription: For HDInsight examples (optional)
- Apache Zeppelin: For Zeppelin notebook examples (optional)
- For Google Colab: Navigate to the `Colab/` directory and open notebooks directly in Colab
- For Local Development: Set up the Python environment in the `Python/` directory
- For Azure HDInsight: Use notebooks from the `HDInsight/` directory
- For Zeppelin: Import notebooks from the `Zeppelin/` directory
This project supports learning objectives in:
- Big Data Processing: Apache Spark, data transformation, and analysis
- Data Visualisation: Creating meaningful charts, graphs, and maps (see the mapping sketch after this list)
- Database Operations: MongoDB integration and querying
- Cloud Computing: Working with cloud-based big data platforms
- Real-Time Data: API integration and streaming data processing
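As a small illustration of the mapping side of these objectives, the sketch below plots a couple of made-up monitoring sites on an interactive folium map; the coordinates and noise values are invented for the example.

```python
# Hedged sketch of the geographic-mapping style used in the notebooks:
# plot a few made-up monitoring sites on an interactive folium map.
import folium

sites = [
    {"name": "Site A", "lat": 52.04, "lon": -0.76, "noise_db": 68},
    {"name": "Site B", "lat": 52.06, "lon": -0.78, "noise_db": 55},
]

m = folium.Map(location=[52.05, -0.77], zoom_start=13)
for site in sites:
    folium.CircleMarker(
        location=[site["lat"], site["lon"]],
        radius=site["noise_db"] / 10,          # scale marker by noise level
        popup=f"{site['name']}: {site['noise_db']} dB",
    ).add_to(m)

m.save("noise_map.html")   # open in a browser to explore interactively
```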
This is an educational project designed for students at MK:U. Contributions that enhance learning outcomes are welcome, including:
- Additional examples and tutorials
- Improved documentation
- Bug fixes and code improvements
- New visualisation techniques
This project is for educational purposes. Please ensure you have appropriate permissions for any external data sources used.
Author: S. Hallett
Course: MK:U, Big Data and Visualisation
Date: 18/06/2025
This project uses UK spelling conventions throughout and follows PEP 8 coding standards for Python code.