akseljoonas/biotech-news-sentiment

Press Release Sentiment Analysis

In the U.S., 62% of adults own investments such as stocks, treasury bonds, or commodities. Information about public companies, such as earnings reports, product launches, and financial metrics, is widely accessible online.

These updates are largely factual, but they often include analysts' opinions, which makes them well suited to sentiment analysis. This project uses Large Language Models to automate that analysis, aiming to anticipate market reactions faster than competitors such as investment banks or private investors. The goal is to let users profit from predicted market responses to news by buying or shorting stocks accordingly.
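To make the idea of trading on predicted sentiment concrete, here is a minimal sketch in Python. It assumes a three-label scheme (`positive`, `neutral`, `negative`) and a simple rule that is an illustration rather than the project's documented strategy: go long on positive news, short on negative news, and stay out on neutral news. The function name `gross_profit`, the fixed stake, and the sample returns are all hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical rule, not taken from the repository:
# +1 = buy (long), -1 = short, 0 = no trade.
POSITION = {"positive": 1, "neutral": 0, "negative": -1}


def gross_profit(trades: Iterable[Tuple[str, float]], stake: float = 100.0) -> float:
    """Sum the profit of trading a fixed stake on each predicted label.

    `trades` yields (predicted_label, realized_return) pairs, where
    realized_return is the fractional price change after the news
    (0.05 means the price rose by 5%).
    """
    total = 0.0
    for label, realized_return in trades:
        total += POSITION[label] * stake * realized_return
    return total


# Made-up example data: three press releases and the price move that followed.
example = [("positive", 0.04), ("negative", -0.10), ("neutral", 0.02)]
print(round(gross_profit(example), 2))
```

A correct `negative` prediction earns money here because a short position gains when the price falls, while a wrong prediction in either direction produces a loss of the same size.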

🏃‍♂️ Running Source Code

🛠️ Set-Up

Clone the Repository: Start by cloning the repository to your local machine.

git clone https://github.com/akseljoonas/news-sentiment.git
cd news-sentiment

Tip

Before installing the requirements in the next step, we recommend creating and activating a virtual environment, as this project has many dependencies.

Install all dependencies by running the following command:

pip install -r requirements.txt

🏋️‍♂️ Training the Models

To train the models, run the desired notebook from the notebooks folder. See the next section for more details.

🌳 Project Structure

.
├── data                          <- Folder with the datasets developed by the team
│   ├── processed                 <- Ready-to-use datasets
│   │   ├── finetuning_3_labels.. <- Dataset with 3 labels used for training
│   │   ├── finetuning_5_labels.. <- Dataset with 5 labels used for training
│   │   ├── news_prices-new-2     <- Dataset with labels and prices used for evaluation and label creation
│   │   └── rest                  <- Legacy versions of the datasets
│   └── raw                       <- Raw data (not recommended to use)
│       └── ...
├── notebooks                     <- Jupyter notebooks for analysis
│   ├── OLD-3-LABELS              <- Legacy file with discontinued functionality, not recommended to use
│   ├── fine_tuning_3_labels      <- File for full fine-tuning on 3 labels, main file used in research
│   ├── fine_tuning_5_labels      <- File for full fine-tuning on 5 labels (not used in research due to poor performance)
│   └── lora_tuning               <- PEFT tuning on 5 labels (not used in research because full fine-tuning was already fast)
├── papers                        <- Main papers we build upon + our own
│   ├── ...                       
│   └── THIS-RESEARCH-PAPER       <- the paper we wrote while working on the project
├── src/data_pipeline             <- Code used to process the datasets
│   ├── biotech_validated         <- File with stock market tickers of the companies used in the research
│   ├── ib_import_price           <- Used to import prices to our news articles from IBKR. Will not work for external users, but should be used as a reference.
│   ├── import_news               <- Used to import news articles into the CSV files. Will not work for external users, but should be used as a reference.
│   └── reformat_fine_tune        <- Used for early preprocessing such as topic pruning or labeling.
├── README.md                     <- Repository documentation
└── requirements.txt              <- Dependency list
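The tree above mentions that labels are created from price data, but the exact rule is not spelled out in this README. As a rough sketch only, one common approach is to threshold the relative price change that follows a press release. The function name `label_from_prices` and the 2% threshold below are assumptions for illustration, not values taken from the repository.

```python
def label_from_prices(price_before: float, price_after: float,
                      threshold: float = 0.02) -> str:
    """Map a price move to one of three sentiment labels.

    The threshold (2% by default) is an illustrative assumption.
    """
    if price_before <= 0:
        raise ValueError("price_before must be positive")
    # Relative change: 0.05 means the price rose by 5%.
    change = (price_after - price_before) / price_before
    if change > threshold:
        return "positive"
    if change < -threshold:
        return "negative"
    return "neutral"


print(label_from_prices(10.0, 10.5))  # +5% move
print(label_from_prices(10.0, 9.0))   # -10% move
print(label_from_prices(10.0, 10.1))  # +1% move
```

A 5-label variant could follow the same pattern with two thresholds per direction, though the cut-offs would again need to be chosen from the data.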

🤗 Hugging Face models used in this research:

  • Best F1 score: mrm8488/deberta-v3-ft-financial-news-sentiment-analysis
  • Best Gross Profit: ProsusAI/finbert
  • Base model: google-bert/bert-base-uncased
  • mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis
  • nickmuchi/distilroberta-finetuned-financial-text-classification
  • ncbi/MedCPT-Article-Encoder
  • dmis-lab/biobert-v1.1
  • microsoft/deberta-v3-base
  • marcev/financebert
  • ahmedrachid/FinancialBERT-Sentiment-Analysis
  • yiyanghkust/finbert-tone
  • Narsil/finbert2
  • StephanAkkerman/FinTwitBERT-sentiment
  • nickmuchi/sec-bert-finetuned-finance-classification
  • FacebookAI/roberta-base
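The checkpoints listed above are hosted on the Hugging Face Hub and can be loaded with the `transformers` library. The sketch below is not code from the notebooks. It assumes `transformers` and a backend such as PyTorch are installed, uses `ProsusAI/finbert` as an example, and introduces the hypothetical helper names `classify` and `count_labels`. Label names differ between checkpoints, and the base or encoder models in the list have no sentiment head until they are fine-tuned, so check each model card before comparing outputs.

```python
from collections import Counter
from typing import Dict, List


def classify(texts: List[str], model_name: str = "ProsusAI/finbert") -> List[Dict]:
    """Run a sentiment checkpoint over press-release texts.

    Returns one {"label": ..., "score": ...} dict per input text.
    The model is downloaded from the Hub on first use.
    """
    # Imported lazily so count_labels works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name)
    # Press releases can exceed the model's input limit, so truncate.
    return classifier(texts, truncation=True)


def count_labels(predictions: List[Dict]) -> Counter:
    """Tally how many texts received each predicted label."""
    return Counter(p["label"] for p in predictions)
```

Calling `count_labels(classify(headlines))` on a list of headline strings gives a quick view of the predicted label distribution, which is a useful sanity check before any fine-tuning.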

📈 Areas for improvement of the project

Read our future research section to find out how to improve the project :))

⚠️ Disclosure of copyright

Important

This research project was conducted during the Language Technology Practical course at the University of Groningen, but it was developed independently by the research team and should therefore be referenced using the link to this GitHub repository. The team permits only non-commercial use of the models and methodologies, subject to standard citation practices. Users of our code take full responsibility for model outputs.

📚 Papers we recommend for the curious reader

💡 Interesting stuff
