Unsupervised pre-training is a term from the fields of Artificial Intelligence and Digital Transformation. It describes a method by which computers learn independently from large amounts of data, without a human explicitly telling them what is right or wrong. The goal is for the systems to discover correlations and structures in the data, so that this knowledge can be reused for various tasks later on.
Imagine a clever computer reading millions of texts on the internet to understand the German language. During unsupervised pre-training, the computer is given these texts, but no one tells it, for example, what a "dog" is. The system therefore looks for patterns on its own – for instance, that the word "Hund" (dog) often appears alongside "bellen" (to bark) – and stores this knowledge.
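The pattern-finding described above can be illustrated with a minimal sketch: counting which words appear together in the same sentence, with no labels involved. The tiny corpus below is a hypothetical stand-in for the millions of real texts a system would actually be trained on, and real pre-training uses far more sophisticated objectives than raw co-occurrence counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; real pre-training would use millions of documents.
corpus = [
    "der Hund kann laut bellen",
    "der Hund will bellen",
    "die Katze kann miauen",
]

def cooccurrence_counts(sentences):
    """Count how often each pair of distinct words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts

counts = cooccurrence_counts(corpus)

# "Hund" and "bellen" co-occur in two sentences, so the system can
# associate them without any human-provided labels.
print(counts[("Hund", "bellen")])  # → 2
```

No one told the program that dogs bark; the association emerges purely from the statistics of the data, which is the core idea behind learning without explicit supervision.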
Later, the trained system can be adapted to specific tasks, such as generating texts automatically or answering questions. For example, unsupervised pre-training is widely used today in voice assistants such as Siri or Alexa, making them more capable and helping them give answers that better match natural language and users' needs.