This project was part of the curriculum of the Computer Engineering and Informatics Department (CEID) at the University of Patras.
The goal of this project was to write queries against a large dataset and measure the time each query took to return its results. The queries were run on a local machine and on a virtual machine (with different configurations) set up by the University. The queries were written in PySpark and executed with Apache Spark.
- Java
- Python
- PySpark
- Apache Spark
- Jupyter Notebook
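
As an illustration of the kind of measurement described above, here is a minimal sketch of timing a PySpark query. It is not the project's actual code: the helper `time_action`, the function `run_spark_example`, the app name, and the synthetic `spark.range` dataset are all hypothetical stand-ins. Spark evaluates transformations lazily, so the timer has to wrap an action such as `count()` for the query to actually execute.

```python
import time


def time_action(action, repeats=3):
    """Call a zero-argument callable `repeats` times.

    Returns (last_result, list_of_elapsed_seconds).
    """
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = action()  # for Spark, this should trigger an action
        timings.append(time.perf_counter() - start)
    return result, timings


def run_spark_example():
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    # Assumption: a local session; the project's VM configurations are not shown here.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("query-timing-sketch")  # hypothetical name
        .getOrCreate()
    )
    try:
        # Assumption: synthetic data standing in for the real dataset.
        df = spark.range(1_000_000)
        even_ids = df.filter(df.id % 2 == 0)  # lazy: nothing runs yet

        # count() is an action, so each call executes the query.
        count, timings = time_action(even_ids.count)
        print(f"rows: {count}")
        for i, seconds in enumerate(timings, start=1):
            print(f"run {i}: {seconds:.3f} s")
    finally:
        spark.stop()
```

Calling `run_spark_example()` in a notebook cell prints one elapsed time per run. The first run is often slower than later ones because of session start-up and warm-up effects, which is why the sketch repeats the measurement instead of timing a single call.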