This Python project applies several machine learning classifiers to the MMTF-14K dataset, which contains visual, audio, and metadata features for films. The aim of the task was to predict each film's genre from this data. See the report PDF for further detail.

jockharkness/Movie-Genre-Classification---COMP90049-Introduction-to-Machine-Learning-Assignment-2

{\rtf1\ansi\ansicpg1252\cocoartf2513
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\paperw11900\paperh16840\margl1440\margr1440\vieww6780\viewh11380\viewkind0
\pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural\partightenfactor0

\f0\fs24 \cf0 Below is a description of each of the files submitted:\
\
\
get_data.py: extracts the data from the TSV files\
\
baseline.py: performs the Zero-R baseline\
\
feature_extraction.py: this is where the preprocessing occurs. The datasets are concatenated, and processing is performed on the joint dataset so that the vectorisation is consistent across them. The tag and title features are lemmatised and stop words are removed. There is a getter for each feature used in the analysis.\
\
classifiers.py: this file was used in the preliminary testing of the features. The program iterates through four classifiers, outputting results for each of them.\
\
decisiontree.py: this file contains the code that implements and tests the decision tree. It also contains the code for the pruning component. The figures plot the effects of pruning on accuracy.\
\
neuralnet_gridsearch.py: this file was used to test which parameter settings may increase accuracy for the MLP classifier. GridSearchCV from the sklearn package was used to iterate through the different parameter settings.\
\
\
}
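
The roles of get_data.py and baseline.py can be illustrated with a minimal sketch. This is not the submitted code: the column names (`title`, `genre`), the inline sample rows, and the helper names `read_tsv`, `zero_r`, and `accuracy` are all assumptions made for illustration. It reads tab-separated rows with the standard library and then applies Zero-R, which always predicts the most frequent class among the training labels.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for one of the dataset's TSV files.
SAMPLE_TSV = (
    "title\tgenre\n"
    "Film A\tDrama\n"
    "Film B\tComedy\n"
    "Film C\tDrama\n"
    "Film D\tThriller\n"
)


def read_tsv(handle):
    """Return the rows of a tab-separated file as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def zero_r(train_labels):
    """Return the majority class; Zero-R predicts it for every instance."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


rows = read_tsv(io.StringIO(SAMPLE_TSV))
labels = [row["genre"] for row in rows]

# Zero-R ignores every feature and repeats the majority label.
majority = zero_r(labels)
predictions = [majority] * len(labels)
baseline_accuracy = accuracy(predictions, labels)
```

The baseline is scored here on the same rows it was fitted on only to keep the sketch short. In practice the majority class would be taken from the training set and scored on held-out data.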

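The reason feature_extraction.py concatenates the datasets before vectorising can be shown with a small bag-of-words sketch. The stop-word list, the sample titles, and the helper names `tokenise`, `build_vocabulary`, and `vectorise` are assumptions for illustration, and lemmatisation is left out because it needs an external library. Building the vocabulary from the joint data gives every split the same columns in the same order.

```python
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "of", "and"}


def tokenise(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def build_vocabulary(documents):
    """Map each distinct token to a fixed column index."""
    tokens = sorted({tok for doc in documents for tok in tokenise(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorise(text, vocabulary):
    """Return a count vector for text over the shared vocabulary."""
    counts = Counter(tokenise(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical title features from two splits.
train_titles = ["The Return of the King", "A Night at the Opera"]
test_titles = ["The King and the Opera Ghost"]

# The vocabulary is built once, from the concatenated data.
vocabulary = build_vocabulary(train_titles + test_titles)
train_vectors = [vectorise(t, vocabulary) for t in train_titles]
test_vectors = [vectorise(t, vocabulary) for t in test_titles]
```

If the vocabulary were built separately per split, the word "ghost" would have no column in the training vectors and the two sets of vectors would not line up.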