GitHub - manvirag/2025-ai-engineer

AI use cases 2025:

Perception
- What it does: Extract features from images/videos.
- Industries: Healthcare (medical imaging), Automotive (autonomous driving), Security (surveillance).
- Tech: CNNs, Vision Transformers (ViT), Image Segmentation (U-Net); Deep Learning (Computer Vision).

Speech Recognition
- What it does: Convert spoken audio to text.
- Industries: Customer Support, Consumer Electronics (voice assistants), Automotive (voice commands).
- Tech: RNNs, LSTMs, Transformers, CTC loss; Deep Learning (Speech Processing/NLP).

Text Understanding
- What it does: Comprehend text intent, entities, sentiment.
- Industries: Finance (document analysis), Legal (contract review), Customer Service.
- Tech: Transformers (BERT, RoBERTa), Named Entity Recognition (NER); Deep Learning (NLP).

Text Generation
- What it does: Produce coherent language output.
- Industries: Marketing (content creation), Media (summarization), Education (tutoring).
- Tech: Autoregressive Transformers (GPT family), Seq2Seq models; Deep Learning (NLP).

Knowledge Retrieval
- What it does: Retrieve relevant external info for tasks.
- Industries: Tech Support, Research, Healthcare.
- Tech: Dense vector retrieval with k-NN, embedding models (BERT embeddings), combined with LLMs (RAG); ML + DL (Information Retrieval + NLP).

Multimodal Fusion
- What it does: Align and integrate multiple data types.
- Industries: Retail (visual search), Entertainment (video captioning), Autonomous Systems.
- Tech: Multimodal Transformers, Cross-Attention; Deep Learning (Multimodal AI).

Prediction
- What it does: Forecast or detect anomalies from data.
- Industries: Finance (fraud detection), Manufacturing (predictive maintenance), Energy (demand forecasting).
- Tech: Regression, Random Forest, Gradient Boosting, LSTM; Machine Learning + Deep Learning (Time-Series Analysis).

Decision Making
- What it does: Optimize actions/plans based on goals.
- Industries: Logistics (route planning), Robotics, Gaming.
- Tech: Reinforcement Learning (Q-learning, Policy Gradients), Heuristic Search; Machine Learning (Reinforcement Learning).

Generative Content Creation
- What it does: Create new images, audio, code, etc.
- Industries: Advertising, Software Dev, Arts & Music.
- Tech: GANs, Diffusion Models, Autoregressive models (Codex); Deep Learning (Generative Models).

Autonomous Agents
- What it does: Autonomous perception, reasoning, and action.
- Industries: Autonomous Vehicles, Virtual Assistants, Industrial Automation.
- Tech: Integration of CNNs, Transformers, RL, Planning Algorithms; AI Systems + ML + DL (Agent-based AI).

Different sections in AI:

Artificial Intelligence (AI)

AI is the broad field of building systems that perform tasks normally associated with human intelligence, such as reasoning, perception, language, and decision making.

1. Symbolic AI / Classical AI

- Rule-Based Systems
- Knowledge Graphs
- Expert Systems
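Rule-based and expert systems derive conclusions by applying hand-written if-then rules to known facts instead of learning from data. Below is a minimal forward-chaining sketch, assuming each rule is stored as a (premises, conclusion) pair; the rule and fact names are hypothetical examples, not part of any particular expert-system shell.

```python
# Each rule: (set of premises that must all be known, conclusion to add).
# These animal-identification rules are hypothetical illustrations.
RULES = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "cannot_fly", "swims"}, "is_penguin"),
    ({"is_bird", "can_fly"}, "is_flying_bird"),
]


def forward_chain(facts, rules):
    """Fire every rule whose premises are satisfied until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # premises <= known is a subset test: all premises are already known.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    print(sorted(forward_chain({"has_feathers", "cannot_fly", "swims"}, RULES)))
```

The loop repeats because one rule's conclusion (here `is_bird`) can unlock another rule's premises.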

2. Machine Learning (ML)

ML is a subset of AI focused on systems that learn from data.

2.1 Learning Paradigms

Supervised Learning
- Tasks: Regression, Classification
- Algorithms:
  - Linear Regression
  - Logistic Regression
  - Decision Trees
  - Random Forest
  - Support Vector Machine (SVM)
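As a concrete supervised-learning example, linear regression with one feature can be fit in closed form by ordinary least squares: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x). This is a sketch in plain Python on toy data; libraries such as scikit-learn handle many features and numerical edge cases.

```python
def fit_linear_regression(xs, ys):
    """Fit y = w * x + b by ordinary least squares (closed form, one feature)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("x values must not all be equal")
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def predict(w, b, x):
    return w * x + b


if __name__ == "__main__":
    # Toy labelled data generated from y = 2x + 1 with no noise.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    w, b = fit_linear_regression(xs, ys)
    print(w, b, predict(w, b, 10.0))
```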
Unsupervised Learning
- Tasks: Clustering, Dimensionality Reduction
- Algorithms:
  - K-Means
  - DBSCAN
  - PCA
  - t-SNE
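K-Means (Lloyd's algorithm) illustrates unsupervised learning: no labels are given, and the algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. A minimal sketch, assuming 2-D points and, for determinism, the first k points as initial centroids (practical implementations use random or k-means++ initialisation).

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm for K-Means clustering."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplification: first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignment


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, 2))
```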
Semi-Supervised Learning
Self-Supervised Learning
Reinforcement Learning (RL)
- Algorithms:
  - Q-Learning
  - SARSA
  - Deep Q-Network (DQN)
  - Proximal Policy Optimization (PPO)
  - A3C, DDPG, etc.
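Tabular Q-learning is the simplest of the RL algorithms listed: after each step it moves Q(s, a) toward r + gamma * max over a' of Q(s', a'). The sketch below assumes a hypothetical five-state "chain" environment where the agent starts at state 0 and earns reward 1 only on reaching state 4; the hyperparameter values are illustrative, not tuned.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the terminal goal (hypothetical env)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy chain: reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore randomly (or on ties), else take the best action.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q = q_learning()
    print(["right" if row[1] > row[0] else "left" for row in q[:-1]])
```

Deep variants such as DQN keep the same update target but replace the table `q` with a neural network.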

2.2 ML Techniques

Classical ML – Uses the non-neural algorithms above (e.g., linear models, trees, SVMs, K-Means, PCA)
Deep Learning (DL) – Uses neural networks:
- Feedforward Neural Networks (FNN / DNN)
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
  - LSTM, GRU
- Autoencoders (AE, VAE)
- Generative Adversarial Networks (GANs)
- Transformers
  - Used in NLP, LLMs, Vision, Speech
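The core operation inside Transformers is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V: each query scores every key, the scores become weights, and the output is a weighted average of the values. This plain-Python sketch shows a single head only; real models add learned projections, multiple heads, masking, and use tensor libraries.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V, with one list (vector) per token."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


if __name__ == "__main__":
    print(scaled_dot_product_attention([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```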

3. Application Domains

3.1 Natural Language Processing (NLP)

Processes and understands human language.

Tasks:
- Text Classification
- Named Entity Recognition (NER)
- Machine Translation
- Summarization, QA
Models:
- RNN, LSTM
- BERT, RoBERTa
- GPT Series
- T5, XLNet
Includes: Large Language Models (LLMs)
- GPT-3, GPT-4, GPT-4o
- LLaMA, Claude, PaLM
- Used in chatbots, agents, RAG systems
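GPT-style LLMs generate text autoregressively: predict the next token from the tokens so far, append it, and repeat. To show only the decoding loop, this sketch swaps the neural network for hypothetical bigram counts (each token conditions only on the previous one) and uses greedy decoding; real LLMs condition on the whole context and often sample instead of taking the single most likely token.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count which token follows which (a toy stand-in for a trained LM)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, max_tokens=10):
    """Greedy autoregressive decoding: append the most likely next token each step."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break
        # most_common breaks ties by first-seen order.
        next_token = followers.most_common(1)[0][0]
        if next_token == "</s>":
            break  # end-of-sequence token
        tokens.append(next_token)
    return " ".join(tokens[1:])


if __name__ == "__main__":
    model = train_bigram_model(["language models predict the next token", "language models generate text"])
    print(generate(model))
```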

3.2 Computer Vision (CV)

Processes and understands visual data (images, videos).

Tasks:
- Image Classification
- Object Detection
- Image Segmentation
- Image Generation
Models:
- CNNs: VGG, ResNet, EfficientNet
- Vision Transformers (ViT, DINO)
- GANs: StyleGAN, CycleGAN
Applications:
- Facial Recognition, OCR, Medical Imaging
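CNNs are built from convolution layers that slide a small kernel over the image and take a dot product at each position, usually followed by a nonlinearity such as ReLU. A plain-Python sketch of one "valid" (no padding), stride-1 convolution, computed as cross-correlation the way deep learning libraries do; the edge-detecting kernel is a hand-picked example, whereas a CNN learns its kernels.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation), stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it.
            row.append(sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw)))
        output.append(row)
    return output


def relu(feature_map):
    return [[max(0, v) for v in row] for row in feature_map]


if __name__ == "__main__":
    # Left half dark (0), right half bright (1): a vertical edge.
    image = [[0, 0, 1, 1] for _ in range(4)]
    vertical_edge = [[-1, 1], [-1, 1]]
    print(relu(conv2d(image, vertical_edge)))
```

The feature map responds only in the column where dark meets bright, which is the kind of local pattern early CNN layers pick up.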

3.3 Speech / Audio Processing

Processes audio and speech.

Tasks:
- ASR (Automatic Speech Recognition)
- TTS (Text-to-Speech)
- Speaker Identification
Models:
- RNNs, CNNs
- Transformers (e.g., Whisper)
- WaveNet, Tacotron
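ASR models trained with CTC loss (listed under Speech Recognition above) output a probability over characters plus a special "blank" for every audio frame. Greedy CTC decoding takes the best symbol per frame, collapses consecutive repeats, then removes blanks. A sketch assuming `_` as the blank symbol (a hypothetical choice) and a small character alphabet.

```python
BLANK = "_"  # hypothetical symbol used for the CTC blank token


def ctc_greedy_decode(frame_probs, alphabet):
    """frame_probs: one probability list per audio frame, aligned with `alphabet`."""
    # 1. Most likely symbol for every frame.
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    # 2. Collapse consecutive repeats, then 3. drop blanks.
    decoded = []
    previous = None
    for symbol in best:
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


if __name__ == "__main__":
    alphabet = ["_", "h", "e", "l", "o"]
    frames = "hhe_ll_lo"  # per-frame best symbols, written as one-hot probabilities below
    probs = [[1.0 if a == s else 0.0 for a in alphabet] for s in frames]
    print(ctc_greedy_decode(probs, alphabet))
```

The blank between the two `l` runs is what lets CTC emit a genuine double letter.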

3.4 Multimodal AI

Combines multiple input types: text + image + audio + video.

Examples:
- CLIP (text + image)
- Whisper (speech + text)
- Flamingo, GPT-4o, Gemini, Sora
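CLIP-style models map images and text into a shared embedding space, so an image can be matched to captions by cosine similarity (zero-shot classification). The sketch assumes the embeddings already exist; the 3-dimensional vectors below are made-up stand-ins for what a real model's image and text encoders would produce in a much higher-dimensional space.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_classify(image_embedding, label_embeddings):
    """Return the caption whose text embedding is closest to the image embedding."""
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(image_embedding, label_embeddings[label]),
    )


if __name__ == "__main__":
    # Hypothetical embeddings, not real CLIP outputs.
    labels = {
        "a photo of a dog": [0.9, 0.1, 0.0],
        "a photo of a cat": [0.1, 0.9, 0.1],
    }
    image = [0.8, 0.2, 0.1]
    print(zero_shot_classify(image, labels))
```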

3.5 Retrieval-Augmented Generation (RAG)

Combines LLMs with external data sources.

Components:
- Embedding Models
- Vector databases and similarity-search libraries (e.g., Pinecone, FAISS)
- LLMs for answer generation
Use Cases:
- Chat over documents
- Internal knowledge bots
- QA over web, PDFs, databases
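The RAG components above fit together as: embed the documents and the question, retrieve the most similar documents, and paste them into the LLM prompt as context. A minimal sketch under stated assumptions: bag-of-words counts stand in for a real embedding model, a sorted Python list stands in for the vector database, and the final LLM call is left out; the document texts are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in 'embedding': bag-of-words counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Rank documents by similarity to the query (the vector-search step)."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, documents, k=2):
    """Augment the question with retrieved context before sending it to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "The office is closed on public holidays.",
        "Expense reports are due on the fifth of each month.",
        "The VPN requires two-factor authentication.",
    ]
    print(build_prompt("When are expense reports due?", docs, k=1))
```

In production the document embeddings are computed once and stored in the vector database, so only the query is embedded at question time.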

NLP vs LLM (Brief)

NLP is the science of understanding and working with language.
LLMs are large neural language models (such as the GPT models behind ChatGPT) used within NLP to understand and generate text.
Generative AI (Gen AI) is the broader umbrella that includes LLMs as well as tools that generate:
- Images (like DALL·E, Midjourney)
- Music (like Suno)
- Videos (like Sora)
- Code (like GitHub Copilot)
