Analyzing News Headlines with SpaCy

spaCy wraps industrial-strength natural language processing capabilities into a Python library with an elegant and powerful API. The notebook in this repo demonstrates its use for Named Entity Recognition (NER) on a real-world news dataset.

We take a public domain dataset of Reuters news headlines and use spaCy to extract named entities. We demonstrate three example downstream use cases:

investigating the organizations that appeared most often in Reuters in 2020
viewing the mentions of any given organization over time
inspecting which organizations appear in headlines together
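
The three use cases above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `en_core_web_sm` model name, the helper function names, and the ISO-formatted date strings ("YYYY-MM-DD") are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def extract_orgs(headlines, model_name="en_core_web_sm"):
    """Return one list of ORG entity strings per headline, using spaCy NER.

    The model name is an assumption; the notebook may use a different one.
    """
    import spacy  # imported lazily so the helpers below work without spaCy

    nlp = spacy.load(model_name)  # requires the model to be installed already
    return [
        [ent.text for ent in doc.ents if ent.label_ == "ORG"]
        for doc in nlp.pipe(headlines)
    ]


def top_orgs(orgs_per_headline, n=10):
    """Use case 1: organizations ranked by the number of headlines naming them."""
    counts = Counter(org for orgs in orgs_per_headline for org in set(orgs))
    return counts.most_common(n)


def mentions_over_time(dates, orgs_per_headline, org):
    """Use case 2: headlines mentioning `org`, counted per month ("YYYY-MM").

    Assumes each date is an ISO string such as "2020-03-17".
    """
    per_month = Counter(
        date[:7] for date, orgs in zip(dates, orgs_per_headline) if org in orgs
    )
    return dict(sorted(per_month.items()))


def cooccurrences(orgs_per_headline):
    """Use case 3: count how often each pair of organizations shares a headline."""
    pairs = Counter()
    for orgs in orgs_per_headline:
        # Sort so that each pair is always stored in the same order.
        pairs.update(combinations(sorted(set(orgs)), 2))
    return pairs
```

With spaCy and a model installed, `extract_orgs(list_of_headlines)` produces the per-headline entity lists that the other three helpers consume. Keeping the counting logic separate from the NER step means it can be tried out on hand-written entity lists before running the model over the full dataset.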

Deploying on Cloudera Machine Learning (CML)

There are three ways to launch this notebook on CML:

From Prototype Catalog - Navigate to the Prototype Catalog in a CML workspace, select the "Analyzing News Headlines with SpaCy" tile, click "Launch as Project", then click "Configure Project".
As ML Prototype - In a CML workspace, click "New Project", add a Project Name, select "ML Prototype" as the Initial Setup option, copy in the repo URL, click "Create Project", then click "Configure Project".
Manual Setup - In a CML workspace, click "New Project", add a Project Name, select "Git" as the Initial Setup option, copy in the repo URL, then click "Create Project".

Once the project has been initialized in a CML workspace, run the notebook by starting a Python 3 Jupyter notebook server session. All library and model dependencies are installed inline in the notebook.

Happy hacking!

Repository: cloudera/CML_AMP_SpaCy_Entity_Extraction
