GitHub - jockharkness/Movie-Genre-Classification---COMP90049-Introduction-to-Machine-Learning-Assignment-2: This Python project applies several machine learning classifiers to the MMTF-14K dataset, which consists of visual, audio and metadata features of films. The aim of the task was to predict each film's genre from this data. See the report PDF for further detail.

Files in the repository:
COMP90049_2020S1_proj2-spec (1).pdf
README.rtf
README_dataset
Report.pdf
baseline.py
classifiers.py
dataprocessor2.py
decisiontree.py
feature_extraction.py
get_data.py
neuralnet_gridsearch.py
precision_recall.py
test_features.tsv
train_features.tsv
train_labels.tsv
valid_features.tsv
valid_labels.tsv

{\rtf1\ansi\ansicpg1252\cocoartf2513
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\paperw11900\paperh16840\margl1440\margr1440\vieww6780\viewh11380\viewkind0
\pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural\partightenfactor0

\f0\fs24 \cf0 Below is a description of each of the files submitted:\
\
\
get_data.py: extracts the data from the TSV files\
\
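As a rough illustration of this loading step, the sketch below reads a tab-separated file with Python's standard `csv` module. It assumes each file starts with a header row; the function name `read_tsv` and the sample columns are hypothetical and not taken from get_data.py.

```python
import csv
import io


def read_tsv(handle):
    """Return (header, rows) from a tab-separated file-like object."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)          # assumption: the first row holds column names
    rows = [row for row in reader]
    return header, rows


# Hypothetical in-memory sample; a real file would be opened with
# open(path, newline="", encoding="utf-8") and passed in the same way.
sample = io.StringIO("movieId\ttitle\n1\tToy Story\n2\tHeat\n")
header, rows = read_tsv(sample)
```

Every value comes back as a string, so numeric feature columns would still need converting before they are passed to a classifier.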
baseline.py: implements the Zero-R baseline, which always predicts the most frequent genre in the training labels\
\
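Zero-R ignores the features and predicts the majority class for every instance. A minimal sketch, using hypothetical function names and toy labels rather than the code in baseline.py:

```python
from collections import Counter


def zero_r_fit(train_labels):
    """Return the most frequent label in the training set."""
    return Counter(train_labels).most_common(1)[0][0]


def zero_r_accuracy(majority_label, labels):
    """Fraction of labels that match the constant prediction."""
    return sum(1 for y in labels if y == majority_label) / len(labels)


# Hypothetical toy labels, not drawn from the dataset
train = ["Drama", "Comedy", "Drama", "Thriller"]
valid = ["Drama", "Comedy", "Comedy", "Drama"]

majority = zero_r_fit(train)
accuracy = zero_r_accuracy(majority, valid)
```

Any classifier worth keeping should score above this accuracy on the validation set.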
feature_extraction.py: this is where the preprocessing occurs. The datasets are concatenated and processed jointly so that the vectorisation is consistent across them. The tags and title features are lemmatised and stop words are removed. There is a getter for each feature used in the analysis.\
\
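The reason for vectorising the joined data can be sketched with a small bag-of-words example in plain Python. This is an illustration under assumptions, not the code in feature_extraction.py: the stop word list is a tiny hypothetical one, lemmatisation is left out because the README does not say which library was used, and all function names are invented here.

```python
STOP_WORDS = frozenset(["the", "a", "of", "and"])  # tiny hypothetical list


def tokenise(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def build_vocabulary(documents):
    """Map each distinct token across all documents to a column index."""
    tokens = sorted(set(tok for doc in documents for tok in tokenise(doc)))
    return dict((tok, i) for i, tok in enumerate(tokens))


def vectorise(text, vocabulary):
    """Bag-of-words count vector with one column per vocabulary entry."""
    vector = [0] * len(vocabulary)
    for tok in tokenise(text):
        if tok in vocabulary:
            vector[vocabulary[tok]] += 1
    return vector


# Hypothetical titles standing in for two of the splits
train_titles = ["The Lord of the Rings", "A Quiet Place"]
test_titles = ["Lord of War"]

# Build the vocabulary once from the concatenated data so that every
# split is mapped onto the same columns in the same order.
vocabulary = build_vocabulary(train_titles + test_titles)
train_vectors = [vectorise(t, vocabulary) for t in train_titles]
test_vectors = [vectorise(t, vocabulary) for t in test_titles]
```

Because the vocabulary comes from the joined data, the training and test vectors have identical lengths and column meanings, which is what a classifier trained on one split and applied to another requires.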
classifiers.py: this file was used in the preliminary testing of the features. The program iterates through four classifiers, outputting results for each of them.\
\
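A loop of this kind might look like the sketch below. The README does not name the four classifiers, so the four scikit-learn models chosen here are assumptions, and synthetic data stands in for the extracted features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature matrix and genre labels
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Assumption: these four models are illustrative choices only
classifiers = [
    ("GaussianNB", GaussianNB()),
    ("LogisticRegression", LogisticRegression(max_iter=1000)),
    ("KNeighbors", KNeighborsClassifier(n_neighbors=5)),
    ("DecisionTree", DecisionTreeClassifier(random_state=0)),
]

results = []
for name, model in classifiers:
    model.fit(X_train, y_train)
    # score() reports mean accuracy on the held-out validation split
    results.append((name, model.score(X_valid, y_valid)))

for name, accuracy in results:
    print(name, round(accuracy, 3))
```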
decisiontree.py: this file implements the decision tree classifier and its testing. It also contains the code for the pruning component. The figures it produces plot the effect of pruning on accuracy.\
\
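The README does not say which pruning method was used. One way to study the effect of pruning on accuracy is scikit-learn's cost-complexity pruning, sketched below on synthetic data. Treat the choice of method, the data and the variable names as assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features and labels
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Candidate pruning strengths computed from the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)
# Clamp at zero to guard against tiny negative values from rounding
alphas = [max(float(a), 0.0) for a in path.ccp_alphas]

scores = []
for alpha in alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    # Record pruning strength, tree size and validation accuracy
    scores.append((alpha, tree.get_n_leaves(), tree.score(X_valid, y_valid)))
```

Plotting the third element of each tuple against the first gives an accuracy-versus-pruning curve of the kind the figures describe.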
neuralnet_gridsearch.py: this file was used to test which parameters may increase accuracy for the MLP classifier. The GridSearchCV functionality from the sklearn package was used to iterate through different parameter settings.\
\
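A cross-validated grid search over MLP settings can be sketched as follows. The parameter grid, the fold count and the synthetic data are assumptions chosen to keep the example fast; they are not the settings used in neuralnet_gridsearch.py.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real features and labels
X, y = make_classification(n_samples=150, n_features=8, n_informative=4,
                           random_state=0)

# Assumption: a deliberately small, illustrative grid
param_grid = dict(
    hidden_layer_sizes=[(8,), (16,)],
    alpha=[0.0001, 0.01],
)

search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid,
    cv=3,                 # 3-fold cross-validation per parameter combination
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_   # combination with the best mean CV score
best_score = search.best_score_     # that mean cross-validated accuracy
```

With the default `refit=True`, `search.best_estimator_` is retrained on all of the supplied data using the best combination and can be used directly for prediction.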
\
}