A token is not a character or a word but a sequence of characters that frequently appear together. Tokens are the fundamental unit an LLM works with.

Words, punctuation, and special characters are chunked into tokens through a process called tokenization.

For example, “unhappiness!” might be chunked into “un”, “happi”, “ness”, and “!”.
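To see this in practice, here is a minimal sketch using OpenAI's `tiktoken` library (one tokenizer among many, chosen here for illustration; the exact splits vary by tokenizer, so the pieces printed may differ from the example above):

```python
import tiktoken  # pip install tiktoken

# Load a byte-pair-encoding tokenizer (the encoding used by GPT-4-era models).
enc = tiktoken.get_encoding("cl100k_base")

# Encode a string into token IDs, then inspect the text each ID maps to.
token_ids = enc.encode("unhappiness!")
for tid in token_ids:
    print(tid, enc.decode_single_token_bytes(tid))

# The token IDs round-trip losslessly back to the original string.
assert enc.decode(token_ids) == "unhappiness!"
```

Running this prints each token ID alongside the character sequence it represents, making it easy to inspect how any given tokenizer chunks a string.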