Driven by curiosity and a passion for solving real-world problems with data. Comfortable across the full data lifecycle, from ingestion and preprocessing to modeling, deployment, and insight delivery. Proficient in Python for data analysis and machine learning, with experience in cloud platforms (AWS, Azure, GCP) and data visualization for clear communication.
Currently seeking a Data Scientist or Data Engineering role where I can apply my analytical, programming, and communication skills.
Building a scalable pipeline for processing and analyzing network flow data, with a focus on anomaly detection and bot activity.
- Ingest and sample large network datasets with Polars
- Transform raw flow logs into a feature-rich tabular format
- Develop a modular ETL pipeline for local or streamed flow data
- Integrate anomaly detection and classification models (e.g. Isolation Forest, LOF, Random Forest, LGBM)
- Evaluate under real-world class imbalance
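
To illustrate why evaluation under class imbalance matters, here is a minimal sketch in plain Python. It is not the project's actual code: the label counts, the `always_benign` baseline, and the `detector` predictions are made-up values chosen only to show how accuracy can look strong while a model misses every bot flow.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 990 benign flows (0) followed by 10 bot flows (1).
y_true = [0] * 990 + [1] * 10

# Baseline that never flags anything.
always_benign = [0] * 1000

# Hypothetical detector: 20 false alarms, catches 8 of the 10 bots.
detector = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2

baseline_acc = accuracy(y_true, always_benign)
baseline_pr = precision_recall(y_true, always_benign)
detector_acc = accuracy(y_true, detector)
detector_pr = precision_recall(y_true, detector)

print(f"baseline: acc={baseline_acc:.3f} precision={baseline_pr[0]:.3f} recall={baseline_pr[1]:.3f}")
print(f"detector: acc={detector_acc:.3f} precision={detector_pr[0]:.3f} recall={detector_pr[1]:.3f}")
```

In this toy setup the do-nothing baseline scores higher on accuracy than the detector, even though it finds no bots at all. Precision and recall on the rare class make that difference visible.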
- Working with large tabular datasets
- Handling class imbalance in cybersecurity contexts
- Anomaly detection techniques
- Practical ML evaluation
- Prototyping realistic data pipelines
And that's Ritchie Vink, creator of Polars, with my graffiti:

- All of my projects are available at https://github.com/anopsy
- If you'd like to hire me, check my CV
- How to reach me: [email protected]
- Fun fact: I paint graffiti portraits

Selected Projects

- Data Science Content Intern at NannyML:
  - Post-Deployment Data Science blogs
    - Data Quality and Covariate Shift
    - Models aren't Forever
  - contributed to the Research team on Anomaly Detection by evaluating multiple detection algorithms and generating synthetic datasets
  - contributed to docs
- PyData and PyLadies Con speaker and volunteer at:
  - PyData Amsterdam 2024 Talk: Alice in Open Source Land
  - PyLadiesCon 2024 Talk
  - PyData Open Source Sprint
- Contributed to OSS at:
  - scikit-lego
    - contributed to docs
    - made ColumnSelector dataframe agnostic using Narwhals
  - narwhals
    - worked on pyarrow/dask backend implementation
    - contributed to docs and tests
  - embetter
    - deprecated a method
    - added pre-commit hooks
- Juniors_vs_ChatGPT: Did ChatGPT replace Juniors and Interns?
  - data cleaning
  - data wrangling
  - data analysis
  - modeling
  - python/API/polars/hvplot
- Compensation Prediction: How much do Engineers earn?
  - data modeling
  - model evaluation
  - containerization using docker
  - building streamlit app
  - python/scikit-learn/streamlit/docker
- MaskMap: Decoding the Hidden Spectrum. Prototype of a diagnosis support tool that uses NLP to identify symptoms of Autistic Masking
  - data scraping
  - data cleaning
  - modeling
  - deploying
  - python/pandas/FastAPI
- Equity in Healthcare: Women in Data Science Datathon 2024. WIDS Datathon project predicting a timely diagnosis of Metastatic Cancer
  - data cleaning
  - data wrangling
  - data analysis
  - modeling
  - python/pandas/ensemble/keras
- Relative Search Volumes Analysis: Search Volumes for Autism vs Autism Spectrum Disorder around the world
  - data scraping
  - data cleaning
  - modeling (WIP)
  - python/pandas
- Steelplate Defect Visual EDA: Colorful joyplots for Visual EDA
  - data visualization
  - ensemble
  - python/pandas/xgb/seaborn
- hossenfelder (WIP): Data Analysis and Prediction of views on Sabine Hossenfelder YT channel
  - data scraping
  - data cleaning
  - modeling (WIP)
  - python/pandas
- MyFalaClassifier (WIP): Detector of surfable waves
  - live-stream scraping
  - image processing
  - transfer learning
  - deploying
  - python/keras