
The History of Artificial Intelligence

From Early Foundations to Agentic AI

Governor House IT Initiative Programme
Quarter 4 - Prompt Engineering Assignment
Date: October 29, 2025


Presentation Agenda

  1. Introduction to Artificial Intelligence
  2. Early Foundations (1940s-1970s)
  3. Classical AI Era (1980s-1990s)
  4. Modern AI Renaissance (2000s-2010s)
  5. How Large Language Models Work
  6. Major Breakthroughs Enabling LLMs
  7. The LLM Revolution (2017-2023)
  8. Agentic AI Era (2023-Present)
  9. Current Landscape and Future
  10. Conclusion and Key Takeaways

SECTION 1: INTRODUCTION


What is Artificial Intelligence?

Definition:

The science and engineering of creating intelligent machines that can perform tasks typically requiring human intelligence.

Key Capabilities:


Why Study AI History?

Understanding Evolution = Better Implementation

  1. Learn from Past Failures: AI winters and overhyping
  2. Appreciate Current Capabilities: How we got here
  3. Predict Future Trends: Where we’re heading
  4. Make Informed Decisions: Strategic AI adoption
  5. Understand Limitations: What AI can and cannot do

AI Evolution Timeline Overview

1950s: Birth of AI (Turing Test, Dartmouth Conference)
1960s-70s: Early Optimism & First AI Winter
1980s: Expert Systems Boom
1987-93: Second AI Winter
1990s-2000s: Machine Learning Renaissance
2012: Deep Learning Revolution
2017: Transformer Architecture
2022: ChatGPT & Mass Adoption
2023+: Agentic AI Era

SECTION 2: EARLY FOUNDATIONS (1940s-1970s)


Pre-AI Era: Theoretical Foundations

1940s - The Mathematical Groundwork

Key Insight:

“Intelligence could be described precisely enough that a machine could simulate it”


1950: The Turing Test

Alan Turing: “Computing Machinery and Intelligence”

The Imitation Game: a human judge converses by text with both a human and a machine; if the judge cannot reliably tell which is which, the machine passes.

Turing’s Question:

“Can machines think?” → “Can machines do what we (as thinking entities) can do?”


1956: Birth of AI - Dartmouth Conference

The Founding Moment of AI as a Field

Organizers: John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon

Bold Claim:

“Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it”


Early AI Programs (1957-1960)

Logic Theorist (1956)

General Problem Solver (1957)

Perceptron (1958)


1960s-Early 1970s: Early Achievements

Major Developments:

  1. ELIZA (1964) - First chatbot by Joseph Weizenbaum
  2. DENDRAL (1965) - First expert system for chemistry
  3. Shakey Robot (1969) - First mobile robot with reasoning
  4. MYCIN (1972) - Medical diagnosis expert system

Problem: Overconfidence led to unrealistic expectations


First AI Winter (1974-1980)

Why Did AI Fail?

❄️ Computational Limitations

❄️ Lack of Data

❄️ Overpromising

❄️ Fundamental Issues


SECTION 3: CLASSICAL AI ERA (1980s-1990s)


Expert Systems Boom (1980-1987)

The Golden Age of Rule-Based AI

Success Stories:

How Expert Systems Worked:

IF condition1 AND condition2 THEN action
IF patient has fever AND cough THEN likely flu
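A minimal Python sketch of the same idea (the rules and symptoms below are illustrative, not taken from any real expert system):

# Toy rule-based inference: each rule is IF all conditions hold THEN conclusion.
rules = [
    ({"fever", "cough"}, "likely flu"),
    ({"sneezing", "itchy eyes"}, "likely allergy"),
]

def diagnose(symptoms):
    # Return every conclusion whose conditions are all present in the symptoms.
    return [conclusion for conditions, conclusion in rules if conditions.issubset(symptoms)]

print(diagnose({"fever", "cough", "headache"}))  # ['likely flu']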

1986: Backpropagation Returns

Rumelhart, Hinton & Williams Popularize Backpropagation

Breakthrough:

Impact:


Second AI Winter (1987-1993)

The Crash of Expert Systems

Failures:

Funding Dried Up:


SECTION 4: MODERN RENAISSANCE (1990s-2010s)


Statistical AI Emerges (1990s)

Shift from Symbolic to Statistical Approaches

Key Developments:

Philosophy Change:

From “Program intelligence” to “Learn intelligence from data”


Landmark Achievements (1997-2011)

1997: Deep Blue Defeats Kasparov

2005: DARPA Grand Challenge

2011: IBM Watson Wins Jeopardy!


2012: The Deep Learning Revolution

AlexNet Wins ImageNet Competition

Why It Matters:

Result: Deep learning becomes dominant paradigm


Deep Learning Success (2012-2016)

Breakthroughs Across Domains:

Key Enabler: GPUs + Big Data + Better Algorithms


SECTION 5: HOW LARGE LANGUAGE MODELS WORK


What Are Large Language Models?

Definition:

Sophisticated prediction engines that generate text by predicting the next most likely word (token) based on patterns learned from massive datasets.

Not Intelligence, But:

Key Insight: They don’t “understand” - they predict patterns


The Fundamental Process: Tokens

Token-by-Token Generation

What is a Token?

Example Tokenization:

Text: "ChatGPT is amazing!"
Tokens: ["Chat", "G", "PT", " is", " amazing", "!"]
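The exact split depends on the model's vocabulary. A small sketch using the tiktoken library (an assumption here; any BPE tokenizer behaves similarly) shows how text maps to token IDs and back:

# Tokenize text with tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # vocabulary used by GPT-3.5/GPT-4-era models
text = "ChatGPT is amazing!"
ids = enc.encode(text)                      # list of integer token IDs
pieces = [enc.decode([i]) for i in ids]     # the text fragment behind each ID
print(ids)
print(pieces)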

How LLMs Generate Text

Step-by-Step Process:

  1. Input Processing: Convert prompt to tokens → embeddings
  2. Context Analysis: Self-attention analyzes relationships
  3. Prediction: Generate probability for each possible next token
  4. Selection: Choose token based on strategy (greedy/sampling)
  5. Iteration: Add selected token and repeat
  6. Stopping: Continue until end-of-sequence or max length

Key Point: Each token depends on ALL previous tokens
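The loop below is a toy sketch of this autoregressive process; model_next_token_probs is a hypothetical stand-in for a real LLM forward pass, not an actual model:

import numpy as np

VOCAB_SIZE = 50_000
END_OF_SEQUENCE = 0

def model_next_token_probs(tokens):
    # Hypothetical stand-in: returns a probability for every vocabulary token.
    rng = np.random.default_rng(len(tokens))
    logits = rng.normal(size=VOCAB_SIZE)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(prompt_tokens, max_new_tokens=20, greedy=True):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model_next_token_probs(tokens)       # steps 1-3: context -> distribution
        if greedy:
            next_token = int(np.argmax(probs))       # step 4: greedy selection
        else:
            next_token = int(np.random.choice(VOCAB_SIZE, p=probs))  # or sampling
        if next_token == END_OF_SEQUENCE:            # step 6: stop condition
            break
        tokens.append(next_token)                    # step 5: feed it back in and repeat
    return tokens

print(generate([101, 2023, 2003]))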


LLM Training: Two Phases

Phase 1: Pre-training (Learning Language)

Phase 2: Fine-tuning (Alignment)


Key Configuration Parameters

Temperature (0.0 - 1.0): controls randomness; lower values give more focused, deterministic output, higher values give more varied, creative output

Top-K / Top-P: restrict sampling to the K most likely tokens, or to the smallest set of tokens whose cumulative probability reaches P (nucleus sampling)

Context Window: the maximum number of tokens the model can attend to at once (prompt plus generated output)
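A rough numpy sketch of how these knobs reshape the next-token distribution before sampling (the logits are made-up numbers for illustration):

import numpy as np

def sample_token(logits, temperature=1.0, top_k=None, top_p=None):
    # Apply temperature, then optional top-k / top-p filtering, then sample one token index.
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k is not None:                             # keep only the k most likely tokens
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
    if top_p is not None:                             # keep the smallest set reaching cumulative prob p
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

print(sample_token([2.0, 1.0, 0.5, 0.1], temperature=0.2))             # near-deterministic
print(sample_token([2.0, 1.0, 0.5, 0.1], temperature=1.0, top_p=0.9))  # more diverse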


LLM Capabilities

What LLMs Excel At:
✅ Language understanding and generation
✅ Pattern recognition and completion
✅ Translation and summarization
✅ Code generation
✅ Question answering
✅ Creative writing
✅ Few-shot learning


LLM Limitations

Critical Limitations:

❌ Hallucinations: Generate plausible but false information
❌ No Real Understanding: Pattern matching, not comprehension
❌ Knowledge Cutoff: Only knows training data
❌ No Learning: Cannot update from corrections
❌ Context Limits: Finite memory window
❌ Biases: Reflect training data biases
❌ No Verification: Cannot check own outputs


10 Essential Questions About LLMs

From MIT Sloan Review:

  1. How do LLMs decide when to stop generating?
  2. Can LLMs update from corrections immediately?
  3. How do LLMs “remember” past conversations?
  4. How do they answer questions after training cutoff?
  5. Can we force LLMs to use only provided documents?
  6. Can we trust LLM citations?
  7. Is RAG still necessary with long contexts?
  8. Can hallucinations be eliminated?
  9. How to efficiently check LLM outputs?
  10. Can we guarantee identical answers?

Answer to most: Partially, with proper engineering


SECTION 6: MAJOR BREAKTHROUGHS ENABLING LLMs


Breakthrough #1: The Transformer (2017)

“Attention Is All You Need” - Vaswani et al.

Revolutionary Changes:

Impact:

Foundation for GPT, BERT, T5, and all modern LLMs


The Self-Attention Mechanism

Key Innovation: Words Attend to All Other Words

How It Works:

  1. Each word gets Query, Key, Value representations
  2. Calculate attention scores between all word pairs
  3. Weighted sum produces contextualized representation

Result:

Formula: Attention(Q,K,V) = softmax(QK^T/√d_k)V
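A compact numpy sketch of that formula for a single attention head (toy random matrices, just to make the shapes concrete):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # every query scored against every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the keys
    return weights @ V                                    # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))     # 4 "words", 8-dim representations
print(scaled_dot_product_attention(Q, K, V).shape)        # (4, 8)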


Breakthrough #2: Scaling Laws (2020)

Kaplan et al.: “Scaling Laws for Neural Language Models”

Key Discovery:

Model performance scales predictably with size, data, and compute

Power Laws:

Strategic Impact:


Model Size Evolution

The Exponential Growth:

Moore’s Law for AI:

Model sizes doubling every ~6 months


Breakthrough #3: Transfer Learning

Pre-train, Then Fine-tune Paradigm

Two-Stage Training:

Stage 1: Pre-training

Stage 2: Fine-tuning

Efficiency Gain: 1 expensive pre-training → Many cheap fine-tunings


Breakthrough #4: Hardware Revolution

GPUs Transform AI Training

NVIDIA CUDA Ecosystem:

Specialized AI Hardware:

Cost Impact: What took years now takes weeks


Breakthrough #5: Massive Datasets

Data is the New Oil

Dataset Evolution:

Data Quality Matters:


Breakthrough #6: Attention Mechanisms

Evolution of Attention:

2014: Bahdanau Attention (for RNNs)

2015: Luong Attention

2017: Self-Attention (Transformer)

Impact: Made long-range understanding possible


Breakthrough #7: Mixture of Experts (MoE)

Sparse Activation for Efficiency

Concept:

Benefits:

Examples: Switch Transformer, Grok, and (reportedly) GPT-4


Breakthrough #8: Advanced Training Techniques

Algorithmic Innovations:

  1. Adam Optimizer: Adaptive learning rates
  2. Layer Normalization: Stable deep network training
  3. Gradient Clipping: Prevent exploding gradients
  4. Mixed Precision: FP16/FP32 for speed
  5. Learning Rate Scheduling: Cosine annealing, warmup

Result: Reliable training of 100B+ parameter models
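A short PyTorch sketch that combines several of these pieces (toy model and random data, not a real LLM setup; mixed precision and warmup are omitted for brevity):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)                      # adaptive learning rates
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)   # cosine schedule
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(32, 64)                      # toy batch
    y = torch.randint(0, 10, (32,))
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)           # gradient clipping
    optimizer.step()
    scheduler.step()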


Breakthrough #9: RLHF (Reinforcement Learning from Human Feedback)

Aligning AI with Human Intent

Three-Step Process:

  1. Supervised Fine-tuning: Train on quality examples
  2. Reward Modeling: Learn human preferences
  3. PPO Training: Optimize using reward model

Impact:

Key to: ChatGPT’s success and user-friendliness
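At the core of the reward-modeling step is a pairwise preference loss: the reward model should score the human-preferred response above the rejected one. A small illustration of that loss (the reward values are placeholders, not outputs of a real model):

import numpy as np

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry style objective: -log sigmoid(r_chosen - r_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-(reward_chosen - reward_rejected))))

print(preference_loss(reward_chosen=2.0, reward_rejected=0.5))  # small loss, ranking already correct (~0.20)
print(preference_loss(reward_chosen=0.5, reward_rejected=2.0))  # large loss, ranking is wrong (~1.70)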


Breakthrough #10: Software Ecosystem

Tools That Enabled the Revolution

Deep Learning Frameworks:

LLM Tools:

Impact: Lowered barriers to AI development


SECTION 7: THE LLM REVOLUTION (2017-2023)


2017-2018: Transformer Models Emerge

BERT (2018) - Google

GPT-1 (2018) - OpenAI

Impact: Transformers dominate NLP research


2019: GPT-2 Controversy

1.5B Parameters - “Too Dangerous to Release”

OpenAI’s Concerns:

Reality Check:

Lesson: Balance innovation with responsibility


2020: GPT-3 Breakthrough

175B Parameters - Emergence of New Capabilities

Surprising Abilities:

Industry Impact:


2021-2022: Multimodal and Specialized Models

Expanding Beyond Text:

CLIP (2021): Text-image understanding
Codex (2021): Code generation (GitHub Copilot)
DALL-E 2 (2022): Text-to-image generation
Whisper (2022): Speech recognition

Trend: From narrow to general-purpose AI


November 30, 2022: ChatGPT Launch

The Moment AI Went Mainstream

Unprecedented Growth:

Why It Succeeded:

Impact: AI became household term


2023: GPT-4 and Multimodal AI

GPT-4 - Most Capable Model Yet

Capabilities:

Other Developments:


The Open vs. Closed Debate

Two Philosophies:

Closed/Proprietary (OpenAI, Anthropic, Google)

Open Source (Meta, Mistral, Stability AI)

Current State: Hybrid approaches emerging


SECTION 8: AGENTIC AI ERA (2023-PRESENT)


What is Agentic AI?

From Reactive to Proactive Intelligence

Traditional LLMs (Reactive):

Agentic AI (Proactive):

Paradigm Shift: From chatbot to autonomous agent


Core Components of Agentic AI

1. Advanced Reasoning

2. Tool Use

3. Memory Systems

4. Multi-Agent Collaboration


Advanced Reasoning Techniques

Chain-of-Thought (CoT)

“Let’s think step by step”
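In practice, a chain-of-thought prompt can be as simple as appending that trigger phrase to the task; the question and wording below are only an illustration, not a prescribed template:

# Build a simple chain-of-thought prompt (illustrative example).
question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
cot_prompt = (
    "Answer the question. Show your reasoning before giving the final answer.\n\n"
    f"Question: {question}\n"
    "Let's think step by step."
)
print(cot_prompt)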

Tree of Thoughts (ToT)

ReAct (Reasoning + Acting)


Tool Use and Function Calling

LLMs Can Now Use External Tools

Tool Categories:

  1. Information Retrieval: Web search, databases
  2. Computation: Calculators, code execution
  3. Communication: Email, messaging, APIs
  4. Creative Tools: Image generation, editing
  5. Business Systems: CRM, analytics, automation

Example:

{
  "tool": "web_search",
  "query": "latest AI developments 2024",
  "num_results": 10
}
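On the application side, a structured tool call like the one above has to be parsed and routed to real code. A minimal hypothetical dispatcher might look like this (web_search here is a stand-in function, not a real search API):

import json

def web_search(query, num_results=10):
    # Hypothetical stand-in for a real search integration.
    return [f"result {i} for '{query}'" for i in range(num_results)]

TOOLS = {"web_search": web_search}

def dispatch(tool_call_json):
    call = json.loads(tool_call_json)
    tool = TOOLS[call.pop("tool")]   # look up the requested tool
    return tool(**call)              # pass the remaining fields as arguments

model_output = '{"tool": "web_search", "query": "latest AI developments 2024", "num_results": 3}'
print(dispatch(model_output))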

Multi-Agent Systems

Multiple AI Agents Working Together

Coordination Approaches:

  1. Orchestration: Central coordinator
  2. Peer-to-peer: Direct agent communication
  3. Hierarchical: Manager and worker agents

Benefits:

Frameworks: AutoGPT, CrewAI, LangChain Agents
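A toy sketch of the orchestration pattern: a coordinator splits a task among specialist "agents" and merges their outputs (the agents here are placeholder functions, not real LLM calls):

def research_agent(task):
    return f"[research notes on: {task}]"

def writing_agent(task, notes):
    return f"Draft about '{task}' using {notes}"

def review_agent(draft):
    return draft + " (reviewed)"

def orchestrator(task):
    notes = research_agent(task)        # worker 1: gather information
    draft = writing_agent(task, notes)  # worker 2: produce a draft
    return review_agent(draft)          # worker 3: quality check

print(orchestrator("history of AI"))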


Prompt Engineering for Agentic AI

Context Engineering vs. Prompt Engineering

Prompt Engineering:

Context Engineering:

Best Practice: Combine both for optimal results


Mixture of Experts (MoE) in Agentic AI

Modern Models Use Specialized Experts

How MoE Works:

  1. Router analyzes input
  2. Selects relevant experts (typically 2 out of 8-64)
  3. Activates only chosen experts
  4. Combines expert outputs
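A tiny numpy sketch of top-k routing: the router scores all experts, only the best two run, and their outputs are mixed by the router weights (the experts are random linear maps purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, DIM, TOP_K = 8, 16, 2
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]  # toy expert "layers"
router = rng.normal(size=(DIM, NUM_EXPERTS))                         # router weights

def moe_layer(x):
    scores = x @ router                                  # 1. router scores every expert
    top = np.argsort(scores)[-TOP_K:]                    # 2. select the top-k experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
    # 3-4. run only the chosen experts and combine their outputs by router weight
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.normal(size=DIM)
print(moe_layer(x).shape)  # (16,)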

Prompt Engineering for MoE:


Current Agentic AI Models (2024-2025)

Leading Systems:

GPT-4 Turbo & GPT-4o

Claude 3.5 Sonnet & Claude 4

Gemini 2.5 Pro

Grok 4 (xAI)


Real-World Agentic Applications

Business Automation:

Personal Productivity:

Scientific Discovery:


Challenges in Agentic AI

Technical Challenges:

Safety Concerns:

Solution: Robust oversight and safety measures


SECTION 9: CURRENT LANDSCAPE & FUTURE


Leading AI Models Comparison (2025)

Model Leaderboard (by capabilities):

Model | Parameters | Context | Strengths
GPT-5 | ~2T (MoE) | 128K | Reasoning, coding
Gemini 2.5 Pro | ~2T (MoE) | 2M | Multimodal, long context
Claude 4 | ~400B | 200K | Safety, helpfulness
Grok 4 | ~500B (MoE) | 128K | Real-time data
DeepSeek-V3 | 671B (37B active) | 64K | Cost-effective

Trend: MoE dominates, context lengths growing


Key Players in AI

Major Organizations:

🏢 OpenAI: GPT series, ChatGPT, DALL-E
🏢 Google DeepMind: Gemini, AlphaGo, AlphaFold
🏢 Anthropic: Claude, Constitutional AI
🏢 Meta: LLaMA, open-source focus
🏢 Microsoft: Copilot integration, Azure AI
🏢 xAI: Grok, Twitter integration
🏢 Mistral: European, open-source commercial

Market Size: $200B+ by 2030 (projected)


Ethical Considerations

Critical Issues:

1. Bias and Fairness

2. Privacy and Security

3. Job Displacement

4. Misinformation

5. Environmental Impact


AI Safety and Alignment

Ensuring Beneficial AI

Technical Approaches:

Governance:

Open Challenge: Aligning AGI if achieved


Future Directions (2025-2030)

Near-Term Innovations:

  1. Efficiency Improvements: Smaller, faster models
  2. Better Reasoning: Causal and logical thinking
  3. Persistent Memory: Long-term learning
  4. Multimodal Integration: Seamless cross-modal understanding
  5. Edge Deployment: Running LLMs locally
  6. Embodied AI: Physical world integration

The Path to AGI?

Artificial General Intelligence

Definition:

AI systems matching or exceeding human cognitive abilities across all domains

Current Gaps:

Timeline Predictions:

Reality: Uncertainty remains high


SECTION 10: CONCLUSION & KEY TAKEAWAYS


The AI Journey: Key Milestones

1950s: Birth of AI concept (Turing, Dartmouth)
1960s-70s: Early optimism and first winter
1980s: Expert systems boom and bust
1990s-2000s: Statistical ML renaissance
2012: Deep learning revolution (AlexNet)
2017: Transformer architecture
2020: Scaling laws validated (GPT-3)
2022: Mass adoption (ChatGPT)
2023+: Agentic AI emerges

Lesson: Progress isn’t linear - expect ups and downs


Major Paradigm Shifts

1. Symbolic → Statistical (1990s)

2. Shallow → Deep (2010s)

3. Narrow → General (2020s)

4. Passive → Agentic (2023+)

Each shift: Unlocked new capabilities and applications


Key Breakthroughs Summary

Top 10 Breakthroughs:

  1. ⚡ Transformer architecture (2017)
  2. 📈 Scaling laws discovery (2020)
  3. 🎓 Transfer learning paradigm
  4. 🖥️ GPU/TPU hardware revolution
  5. 📚 Massive dataset creation
  6. 👁️ Attention mechanisms
  7. 🧩 Mixture of Experts (MoE)
  8. 🔄 RLHF and alignment
  9. 🛠️ Software ecosystem (PyTorch, HuggingFace)
  10. 💰 Economic models (APIs, cloud)

Result: Convergence enabled modern AI


Understanding LLMs: Essential Points

What They Are:

What They’re Not:

Best Practice:


Agentic AI: The New Frontier

From Chatbots to Agents:

Key Capabilities:
✅ Autonomous planning and reasoning
✅ Tool use and system integration
✅ Multi-step task execution
✅ Persistent memory and learning

Current State:

Future: Foundation for next AI revolution


Implications for Society

Opportunities:

Challenges:

Need: Balanced, thoughtful approach


Strategic Takeaways for Implementation

For Organizations:

  1. Start with clear use cases: Don’t use AI for AI’s sake
  2. Invest in infrastructure: Data, compute, talent
  3. Prioritize safety and oversight: Human-in-the-loop
  4. Build responsibly: Ethics and privacy first
  5. Stay informed: Rapid evolution requires continuous learning

For Individuals:


The Road Ahead

What We Know:

What’s Uncertain:

What’s Certain:

We’re still in early days of this revolution!


Key Resources and References

Academic Papers:

Educational Resources:

Online Platforms:


Questions for Discussion

Technical:

  1. How do attention mechanisms differ from previous approaches?
  2. Why did scaling laws change AI development strategy?
  3. What makes agentic AI different from traditional LLMs?

Strategic:

  1. What industries will AI transform most in next 5 years?
  2. How should organizations prepare for AI adoption?
  3. What skills remain uniquely human?

Ethical:

  1. How do we ensure AI remains beneficial?
  2. What regulations are needed?
  3. How do we address job displacement?

Thank You!

The History of Artificial Intelligence

From Early Foundations to Agentic AI

Key Message:

We’ve journeyed from symbolic logic to neural networks, from reactive systems to autonomous agents. Understanding this history helps us build a better AI future.

Remember:

Let’s shape the future of AI together!


Backup Slides & Additional Information


Detailed Timeline: 1950-1980

Year | Event | Significance
1950 | Turing Test | Conceptual foundation
1956 | Dartmouth | AI field founded
1957 | Perceptron | Neural network learning
1965 | DENDRAL | First expert system
1969 | Perceptrons book | Showed limitations
1974-80 | First AI Winter | Funding dried up

Detailed Timeline: 1980-2010

Year | Event | Significance
1980 | XCON | Commercial expert system
1986 | Backprop | Neural network training
1987-93 | Second AI Winter | Expert systems fail
1997 | Deep Blue | Chess victory
2006 | Deep learning | Hinton’s breakthrough
2011 | Watson | Jeopardy! win

Detailed Timeline: 2012-Present

Year | Event | Significance
2012 | AlexNet | Deep learning proves out
2017 | Transformer | Architecture revolution
2018 | GPT-1, BERT | Foundation models emerge
2020 | GPT-3 | Scaling breakthrough
2022 | ChatGPT | Mass adoption
2023 | GPT-4 | Multimodal + agents
2024+ | Agentic AI | Autonomous systems

Compute Growth in AI Training

Exponential Increase:

Cost Evolution:

Trend: 10x increase every 2 years


Economic Impact of AI

Market Size:

Investment:

Job Impact:


AI Ethics Frameworks

Key Principles:

  1. Fairness: Reduce bias and discrimination
  2. Transparency: Explainable AI decisions
  3. Privacy: Data protection and consent
  4. Accountability: Clear responsibility
  5. Safety: Prevent harmful outputs
  6. Beneficence: Maximize positive impact

Implementation: Ongoing challenge across industry


Thank You for Your Attention!

Questions?

Contact Information:

Resources:


END OF PRESENTATION