
What Is ChatGPT Doing … and Why Does It Work?

Stephen Wolfram examines ChatGPT's language-generation mechanisms, neural-network foundations, and training logic from first principles, exploring the underlying "science of language" and "laws of thought" behind the system's success.

Detail

Published: 22/12/2025

Key Chapter Titles

  1. It Just Adds One Word at a Time
  2. Where Do the Probabilities Come From?
  3. What Is a Model?
  4. Models for Human-Like Tasks
  5. Neural Networks
  6. Machine Learning and Neural Net Training
  7. The Practice and Lore of Neural Net Training
  8. “Surely a Network That’s Big Enough Can Do Anything!”
  9. The Concept of Embeddings
  10. Inside ChatGPT
  11. The Training of ChatGPT
  12. Beyond Basic Training
  13. What Really Lets ChatGPT Work?
  14. The Space of Meaning and the Laws of Semantic Motion
  15. Semantic Grammar and the Power of Computational Language
  16. So … What Is ChatGPT Doing, and Why Does It Work?
  17. Wolfram|Alpha as a Way to Give ChatGPT Computational Knowledge Superpowers

Document Introduction

This report, authored by computer scientist Stephen Wolfram, aims to explain from first principles how ChatGPT works and why it succeeds. It is not merely a technical analysis: it also draws on scientific, philosophical, and interdisciplinary perspectives, synthesizing centuries of thought and discovery to understand the logic behind an artificial intelligence system that generates strikingly human-like language.

The report begins at an intuitive level, pointing out that ChatGPT's core task is to generate a plausible continuation of the text it has received: essentially, predicting the most probable next word (or token), one at a time. This process relies on probability distributions learned from vast amounts of human text, and a temperature parameter controls the randomness to balance creativity and coherence in the generated text. Using a simpler language model (GPT-2) as an example, the author demonstrates how text is generated from probability tables, contrasting deterministic zero-temperature generation with sampling at higher temperatures, and laying the groundwork for the deeper analysis that follows.
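The word-at-a-time sampling described above can be sketched in a few lines of Python. This is a toy illustration, not GPT-2's actual code: word scores (logits) are turned into probabilities with a softmax, and the temperature parameter controls how sharply the distribution favors the top-scoring word.

```python
import math
import random

def sample_next_word(logits, temperature=1.0, rng=None):
    """Pick the next word from a dict of word -> score (logit).

    temperature = 0 always takes the highest-scoring word;
    higher temperatures flatten the distribution, adding variety.
    """
    rng = rng or random.Random(0)
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy, deterministic
    # Softmax with temperature: p_i is proportional to exp(logit_i / T)
    scaled = {w: s / temperature for w, s in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scaled.items()}
    total = sum(exps.values())
    # Draw one word according to the resulting probabilities
    r, acc = rng.random(), 0.0
    for w, e in exps.items():
        acc += e / total
        if r < acc:
            return w
    return w  # guard against floating-point rounding

# Hypothetical logits for three candidate next words
logits = {"cat": 2.0, "dog": 1.0, "fish": 0.1}
print(sample_next_word(logits, temperature=0))  # always "cat"
```

At temperature 0 the model repeats its single most likely continuation; at higher temperatures "dog" and "fish" are sometimes chosen, which is the creativity/coherence trade-off the report describes.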

Subsequently, the report systematically explores the origin of probabilities, the basic concept of models, and how neural networks—idealized models simulating the workings of the human brain—are used to handle complex tasks like language. Through classic machine learning cases such as image recognition (e.g., handwritten digit classification), the author explains how neural networks extract features through hierarchical processing and, during training, adjust weights to minimize a loss function, gradually approximating the target function. The report particularly emphasizes the artistry and engineering practices in current neural network training, including key aspects like architecture selection, data preparation, and hyperparameter tuning. It notes that ChatGPT's key advantage in handling massive data lies in its ability to learn language patterns from raw text through unsupervised learning.
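The training loop sketched above, adjusting weights to minimize a loss function, can be illustrated with the smallest possible "network": a single artificial neuron fit to a line by gradient descent. This is a didactic sketch under invented data, not code from the report.

```python
def train(points, steps=2000, lr=0.05):
    """Fit y = w*x + b to (x, y) pairs by gradient descent
    on the mean-squared-error loss."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        dw = sum(2 * (w * x + b - y) * x for x, y in points) / n
        db = sum(2 * (w * x + b - y) for x, y in points) / n
        # Step each weight a little way downhill
        w -= lr * dw
        b -= lr * db
    return w, b

pts = [(0, 1), (1, 3), (2, 5)]  # data generated by y = 2x + 1
w, b = train(pts)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

Real neural-net training is this same idea scaled up: billions of weights, a loss computed over text predictions, and gradients obtained by backpropagation rather than by hand-written formulas.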

When delving into ChatGPT's internal structure, the report highlights its Transformer-based neural network design, explaining how the embedding layer, attention mechanism, and stacked attention blocks work together to transform text sequences into high-dimensional vectors, ultimately outputting a probability distribution for the next word. The author notes that although the network contains 175 billion weights, its basic computational units are still simple artificial neurons, collectively forming a feedforward network with no recurrent loops. The success of this design suggests that human language may possess a simpler, more regular structure than previously imagined, and that ChatGPT implicitly captures these patterns through training.
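A single attention head, the core operation inside each Transformer block, can be sketched in plain Python. The dimensions and inputs below are invented for illustration; real models use learned projection matrices to produce queries, keys, and values, and work in hundreds of dimensions.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention (a sketch, not GPT's weights):
    each position mixes all positions' values, weighted by how well
    its query matches each key (dot products, scaled and softmaxed)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # sums to 1 across positions
        # Output = attention-weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Three token positions with 2-dimensional toy embeddings;
# self-attention uses the same vectors as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = attention(x, x, x)
```

Each output row is a convex combination of the input rows, which is why attention is often described as letting every token "look at" every other token when building its representation.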

The report further explores the deeper implications behind ChatGPT's success, introducing concepts like semantic grammar and computational language. The author posits that the meaning of human language may follow certain rules akin to laws of semantic motion, and ChatGPT's embedding space maps this semantic structure to some extent. Simultaneously, by integrating ChatGPT with Wolfram|Alpha, it can be endowed with superpowers for structured computation and knowledge retrieval, thereby compensating for its shortcomings in precise calculation, fact-checking, and complex reasoning. This achieves a complementary synergy between statistically-driven and symbolically-driven AI.
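The idea that meaning lives in an embedding space can be made concrete with cosine similarity: words used in similar contexts end up with vectors pointing in similar directions. The three-dimensional vectors below are invented for illustration; real embeddings have hundreds of learned dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical toy embeddings, chosen so that related words are close
emb = {
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.1],
    "table": [0.0, 0.1, 0.9],
}
# Semantically close words point in similar directions
assert cosine(emb["cat"], emb["dog"]) > cosine(emb["cat"], emb["table"])
```

Distances and directions in this space are what the report's "laws of semantic motion" refer to: a text's trajectory through embedding space traces the movement of its meaning.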

Finally, the report summarizes the essence of ChatGPT: it is a system trained on vast amounts of human text, generating text word-by-word through a feedforward neural network. Its success reveals that human language and the underlying thought processes may be more structured and regular than previously believed. This discovery holds significant scientific importance for the development of artificial intelligence and points the way toward building more powerful, interpretable intelligent systems that combine human linguistic creativity with computational precision.