Organizations

@narwhals-dev


Hi 👋, I'm Magdalena Kowalczuk

data & ML fan with a soft spot for OSS

Driven by curiosity and a passion for solving real-world problems with data. Comfortable across the full data lifecycle, from ingestion and preprocessing to modeling, deployment, and insight delivery. Proficient in Python for data analysis and machine learning, with experience in cloud platforms (AWS, Azure, GCP) and data visualization for clear communication.

Currently seeking a Data Scientist or Data Engineering role where I can apply my analytical, programming, and communication skills.


🚧 I'm currently working on

🛰️🤖 Flow-Based Bot Detection Pipeline

Building a scalable pipeline for processing and analyzing network flow data, with a focus on anomaly detection and bot activity.


🔧 Key Components

  • ⚙️ Ingest and sample large network datasets with Polars
  • 🧱 Transform raw flow logs into a feature-rich tabular format
  • 🛠️ Develop a modular ETL pipeline for local or streamed flow data
  • 🧠 Integrate anomaly detection and classification models
    (e.g., Isolation Forest, LOF, Random Forest, LightGBM)
  • 🧪 Evaluate models under real-world class imbalance

🧠 Skills Showcased

  • 📊 Working with large tabular datasets
  • 🧮 Handling class imbalance in cybersecurity contexts
  • 🔍 Anomaly detection techniques
  • 📈 Practical ML evaluation
  • 🧰 Prototyping realistic data pipelines
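Why class imbalance matters for evaluation, as a small pure-Python sketch with made-up numbers: with 10 bots among 1,000 flows, a model that never flags anything reaches 99% accuracy, while a detector that flags 20 flows and catches 8 bots has slightly lower accuracy but is far more useful, which precision, recall and F1 make visible. The `evaluate` helper and `BinaryReport` class are hypothetical; `sklearn.metrics` provides equivalent functions such as `precision_score` and `recall_score`.

```python
from dataclasses import dataclass


@dataclass
class BinaryReport:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true: list[int], y_pred: list[int]) -> BinaryReport:
    """Binary metrics where 1 marks the rare positive class (e.g. a bot flow)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged flows that are bots
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of bots that got flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryReport((tp + tn) / len(pairs), precision, recall, f1)


# Made-up example: 10 bots among 1,000 flows.
y_true = [1] * 10 + [0] * 990
never_flags = [0] * 1000
detector = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978  # catches 8 bots, 12 false alarms

print(evaluate(y_true, never_flags))  # accuracy 0.99, but recall and F1 are 0.0
print(evaluate(y_true, detector))     # accuracy 0.986, precision 0.4, recall 0.8
```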

🌱 I'm currently learning 🐻‍❄️ Polars

And that's Ritchie Vink, creator of Polars, with my graffiti:

Ritchie Vink

🎨 Selected Projects
┣━━ Data Science Content Intern at NannyML:
┃   ┣━━ 📈Post-Deployment Data Science blogs
┃   ┃   ┣━━ 📉Data Quality and Covariate Shift
┃   ┃   ┗━━ 🌀Models aren't Forever
┃   ┣━━ contributed to the Research team on Anomaly Detection by evaluating multiple detection algorithms and generating synthetic datasets
┃   ┗━━ contributed to docs
┃
┣━━ PyData and PyLadiesCon speaker and volunteer at:
┃   ┣━━ 💽PyData Amsterdam 2024 Talk: Alice in Open Source Land
┃   ┣━━ 🤖PyLadiesCon 2024 Talk
┃   ┗━━ 🏃PyData Open Source Sprint
┃
┣━━ Contributed to OSS at:
┃   ┣━━ 🧱scikit-lego
┃   ┃   ┣━━ contributed to docs
┃   ┃   ┗━━ made ColumnSelector dataframe-agnostic using Narwhals
┃   ┣━━ 🐳🦄narwhals
┃   ┃   ┣━━ worked on pyarrow/dask backend implementation
┃   ┃   ┗━━ contributed to docs and tests
┃   ┗━━ 💡embetter
┃       ┣━━ deprecated a method
┃       ┗━━ added pre-commit hooks
┃
┣━━ Juniors_vs_ChatGPT
┃   - Did ChatGPT replace juniors and interns?
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/API/polars🐻‍❄️/hvplot📊
┃
┣━━ Compensation Prediction
┃   - How much do engineers earn?
┃   ┣━━ data modeling
┃   ┣━━ model evaluation
┃   ┣━━ containerization using Docker
┃   ┣━━ building a Streamlit app
┃   ┗━━ python🐍/scikit-learn/streamlit📈/docker📦
┃
┣━━ MaskMap: Decoding the Hidden Spectrum
┃   - Prototype of a diagnosis support tool using the power of NLP to identify symptoms of Autistic Masking
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling
┃   ┣━━ deploying
┃   ┗━━ python🐍/pandas🐼/FastAPI
┃
┣━━ Equity in Healthcare: Women in Data Science Datathon 2024
┃   - WiDS Datathon project predicting timely diagnosis of metastatic cancer
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/pandas🐼/ensemble🌳/keras🧠
┃
┣━━ Relative Search Volumes Analysis
┃   - Search Volumes for Autism vs Autism Spectrum Disorder around the world
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling WIP
┃   ┗━━ python🐍/pandas🐼
┃
┣━━ Steelplate Defect Visual EDA
┃   - Colorful joyplots for visual EDA
┃   ┣━━ data visualization
┃   ┣━━ ensemble
┃   ┗━━ python🐍/pandas🐼/xgb🌳/seaborn🎨
┃
┣━━ hossenfelder - 🦺WIP
┃   - Data analysis and prediction of views on Sabine Hossenfelder's YouTube channel
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling WIP
┃   ┗━━ python🐍/pandas🐼
┃
┗━━ MyFalaClassifier - 🦺WIP
    - Detector of surfable waves
    ┣━━ live-stream scraping
    ┣━━ image processing
    ┣━━ transfer learning
    ┣━━ deploying
    ┗━━ python🐍/keras🧠

Languages and Tools:

pandas polars scikit_learn python seaborn bash git postgresql tensorflow go gcp

Connect with me:

anopsy madkowalczuk anopsy anopsy_amsterdam @anopsy28


Pinned

  1. Juniors_vs_ChatGPT (Public)

    Inspired by personal curiosity and a 2023 hackathon challenge (won in the 'Most Polished' category). This project investigates the impact of large language models like ChatGPT on entry-level roles …

    Jupyter Notebook 1

  2. Compensation-prediction (Public)

    An integrated data modeling and model experimentation project, packaged as a Streamlit app for predicting estimated compensation in engineering jobs

    Jupyter Notebook 2

  3. MaskMap (Public)

    Prototype of a diagnosis support tool using the power of NLP to identify symptoms of Autistic Masking (AM) and help medical staff and patients differentiate between anxiety, depression, and the lon…

    Jupyter Notebook

  4. Equity_in_Healthcare (Public)

    Predicting a timely diagnosis in metastatic cancer patients. Data cleaning, feature engineering, and hyperparameter tuning of a classification model ensemble.

    Jupyter Notebook 1

  5. koaning/scikit-lego (Public)

    Extra blocks for scikit-learn pipelines.

    Python 1.4k 121

  6. narwhals-dev/narwhals (Public)

    Lightweight and extensible compatibility layer between dataframe libraries!

    Python 1.2k 161