—Brian Christian—1 min summary

The Alignment Problem

Machine Learning and Human Values

Summary

A really good book that I would highly recommend to anyone interested in AI, machine learning, and how training those systems links back to our understanding of our own brains and the ways we learn and act.

Some of the core topics covered:

Training fundamentals: How different methods of LLM training work
Bias: How biases in the training data and process can affect the capabilities of the LLM. Especially interesting are the cases where these models could be really useful for identifying social biases, but too often we use them for actual decision making instead (e.g. crime predictions and sentencing suggestions)
LLMs and humans: Various examples of how our simplified view of “how things learn” has proven to be wrong, and of where modelling training on our understanding of how humans learn and behave led to major milestones in the ML space