manvirag/2025-ai-engineer

AI use cases 2025:

  1. Perception
    • What it does: Extract features from images/videos.
    • Industries: Healthcare (medical imaging), Automotive (autonomous driving), Security (surveillance).
    • Tech: CNNs, Vision Transformers (ViT), Image Segmentation (U-Net).
    • Field: Deep Learning (Computer Vision).

  2. Speech Recognition
    • What it does: Convert spoken audio to text.
    • Industries: Customer Support, Consumer Electronics (voice assistants), Automotive (voice commands).
    • Tech: RNNs, LSTMs, Transformers, CTC loss.
    • Field: Deep Learning (Speech Processing/NLP).

  3. Text Understanding
    • What it does: Comprehend text intent, entities, sentiment.
    • Industries: Finance (document analysis), Legal (contract review), Customer Service.
    • Tech: Transformers (BERT, RoBERTa), Named Entity Recognition (NER).
    • Field: Deep Learning (NLP).

  4. Text Generation
    • What it does: Produce coherent language output.
    • Industries: Marketing (content creation), Media (summarization), Education (tutoring).
    • Tech: Autoregressive Transformers (GPT family), Seq2Seq models.
    • Field: Deep Learning (NLP).

  5. Knowledge Retrieval
    • What it does: Retrieve relevant external info for tasks.
    • Industries: Tech Support, Research, Healthcare.
    • Tech: Dense vector retrieval with k-NN, embedding models (BERT embeddings), combined with LLMs (RAG).
    • Field: ML + DL (Information Retrieval + NLP).

  6. Multimodal Fusion
    • What it does: Align and integrate multiple data types.
    • Industries: Retail (visual search), Entertainment (video captioning), Autonomous Systems.
    • Tech: Multimodal Transformers, Cross-Attention.
    • Field: Deep Learning (Multimodal AI).

  7. Prediction
    • What it does: Forecast or detect anomalies from data.
    • Industries: Finance (fraud detection), Manufacturing (predictive maintenance), Energy (demand forecasting).
    • Tech: Regression, Random Forest, Gradient Boosting, LSTM.
    • Field: Machine Learning + Deep Learning (Time-Series Analysis).

  8. Decision Making
    • What it does: Optimize actions/plans based on goals.
    • Industries: Logistics (route planning), Robotics, Gaming.
    • Tech: Reinforcement Learning (Q-learning, Policy Gradients), Heuristic Search.
    • Field: Machine Learning (Reinforcement Learning).

  9. Generative Content Creation
    • What it does: Create new images, audio, code, etc.
    • Industries: Advertising, Software Dev, Arts & Music.
    • Tech: GANs, Diffusion Models, Autoregressive models (Codex).
    • Field: Deep Learning (Generative Models).

  10. Autonomous Agents
    • What it does: Autonomous perception, reasoning, and action.
    • Industries: Autonomous Vehicles, Virtual Assistants, Industrial Automation.
    • Tech: Integration of CNNs, Transformers, RL, Planning Algorithms.
    • Field: AI Systems + ML + DL (Agent-based AI).

Different sections of AI:

Artificial Intelligence (AI)

AI is the broad field of building systems that perform tasks normally requiring human intelligence, such as perception, reasoning, and language use.

1. Symbolic AI / Classical AI

  • Rule-Based Systems
  • Knowledge Graphs
  • Expert Systems

2. Machine Learning (ML)

ML is a subset of AI focused on systems that learn from data.

2.1 Learning Paradigms

  • Supervised Learning

    • Tasks: Regression, Classification
    • Algorithms:
      • Linear Regression
      • Logistic Regression
      • Decision Trees
      • Random Forest
      • Support Vector Machine (SVM)
  • Unsupervised Learning

    • Tasks: Clustering, Dimensionality Reduction
    • Algorithms:
      • K-Means
      • DBSCAN
      • PCA
      • t-SNE
  • Semi-Supervised Learning

  • Self-Supervised Learning

  • Reinforcement Learning (RL)

    • Algorithms:
      • Q-Learning
      • SARSA
      • Deep Q-Network (DQN)
      • Proximal Policy Optimization (PPO)
      • A3C, DDPG, etc.
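
To make the reinforcement-learning entries above more concrete, here is a minimal sketch of tabular Q-learning. The 5-state corridor environment, the reward of 1.0 at the rightmost state, and the hyperparameters are all assumptions chosen for illustration; they do not come from this repository.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
# Greedy action per non-terminal state (1 means "move right").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

In this toy setup the learned greedy policy should move right in every non-terminal state. SARSA would differ only in the target: it bootstraps from the action actually taken next rather than the max.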

2.2 ML Techniques

  • Classical ML – Uses the algorithms listed in 2.1 (e.g., linear models, decision trees, SVMs, K-Means)
  • Deep Learning (DL) – Uses neural networks:
    • Feedforward Neural Networks (FNN / DNN)
    • Convolutional Neural Networks (CNN)
    • Recurrent Neural Networks (RNN)
      • LSTM, GRU
    • Autoencoders (AE, VAE)
    • Generative Adversarial Networks (GANs)
    • Transformers
      • Used in NLP, LLMs, Vision, Speech
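
As a rough illustration of the core operation inside Transformers, the sketch below implements single-head scaled dot-product attention in plain Python. The toy `Q`, `K`, `V` values and the function names are assumptions for the example; real models use batched tensors, learned projections, and multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V on lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy inputs: 2 queries, 3 key/value pairs, dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
attn_out, attn_weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `attn_weights` sums to 1, so every output vector is a convex combination of the rows of `V`.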

3. Application Domains

3.1 Natural Language Processing (NLP)

Processes and understands human language.

  • Tasks:

    • Text Classification
    • Named Entity Recognition (NER)
    • Machine Translation
    • Summarization, Question Answering (QA)
  • Models:

    • RNN, LSTM
    • BERT, RoBERTa
    • GPT Series
    • T5, XLNet
  • Includes: Large Language Models (LLMs)

    • GPT-3, GPT-4, GPT-4o
    • LLaMA, Claude, PaLM
    • Used in chatbots, agents, RAG systems

3.2 Computer Vision (CV)

Processes and understands visual data (images, videos).

  • Tasks:

    • Image Classification
    • Object Detection
    • Image Segmentation
    • Image Generation
  • Models:

    • CNNs: VGG, ResNet, EfficientNet
    • Vision Transformers (ViT; DINO as a self-supervised training method for ViTs)
    • GANs: StyleGAN, CycleGAN
  • Applications:

    • Facial Recognition, OCR, Medical Imaging
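
To show what the CNNs listed above compute at their core, here is a small sketch of a 2D convolution (strictly a cross-correlation, as in most deep learning libraries) applied to a toy image. The 4x4 image and the hand-picked vertical-edge kernel are assumptions for illustration; in a trained CNN the kernel values are learned.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation over a 2D list."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the dark-to-bright edge sits, which is the sense in which a convolutional filter "detects" a local pattern.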

3.3 Speech / Audio Processing

Processes audio and speech.

  • Tasks:

    • ASR (Automatic Speech Recognition)
    • TTS (Text-to-Speech)
    • Speaker Identification
  • Models:

    • RNNs, CNNs
    • Transformers (e.g., Whisper)
    • WaveNet, Tacotron

3.4 Multimodal AI

Combines multiple input types: text + image + audio + video.

  • Examples:
    • CLIP (text + image)
    • Whisper (speech + text)
    • Flamingo, GPT-4o, Gemini, Sora

3.5 Retrieval-Augmented Generation (RAG)

Combines LLMs with external data sources.

  • Components:

    • Embedding Models
    • Vector stores (e.g., the FAISS index library, the Pinecone managed database)
    • LLMs for answer generation
  • Use Cases:

    • Chat over documents
    • Internal knowledge bots
    • QA over web, PDFs, databases
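
The three components above can be sketched end to end as follows. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, `InMemoryVectorStore` is a hypothetical brute-force stand-in for a vector database, and the final LLM call is left out, so the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: brute-force k-NN search."""

    def __init__(self, documents):
        self.entries = [(doc, embed(doc)) for doc in documents]

    def search(self, query, k=2):
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(question, contexts):
    """Assemble the text that would be sent to an LLM for answer generation."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


docs = [
    "FAISS is a library for similarity search over dense vectors.",
    "Whisper is a speech recognition model.",
    "CLIP aligns text and image embeddings.",
]
store = InMemoryVectorStore(docs)
question = "Which library does similarity search over vectors?"
top_docs = store.search(question, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

In a real system, `embed` would call a learned embedding model, the store would be an approximate nearest-neighbour index, and `prompt` would be passed to an LLM.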

NLP vs LLM (Brief)

  • NLP is the field concerned with understanding and working with human language.
  • LLMs are large neural models (such as those behind ChatGPT) used within NLP to understand and generate text.
  • Gen AI is the broader umbrella: it includes LLMs as well as tools that generate:
    • Images (like DALL·E, Midjourney)
    • Music (like Suno)
    • Videos (like Sora)
    • Code (like GitHub Copilot)

RAG Deep dive:

AI agents deep dive:

MCP server:

High-level understanding of multimodal AI:

LLM architecture deep dive:
