A token is not a character or a word but a sequence of characters that often appear together. Tokens are the fundamental units an LLM works with.
Words, punctuation, and special characters are chunked into tokens through a process called tokenization.
For example, “unhappiness!” might be chunked as “un”, “happi”, “ness”, and “!”.
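The splitting above can be sketched with a toy greedy longest-match tokenizer. This is a simplification: real tokenizers (such as byte-pair encoding) learn their vocabulary from data, whereas the tiny vocabulary here is hand-picked purely to illustrate how a word breaks into subword tokens.

```python
# Hand-picked toy vocabulary; a real tokenizer's vocabulary is learned
# from a large corpus and contains tens of thousands of entries.
VOCAB = {"un", "happi", "ness", "!"}

def tokenize(text: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible substring first, shrinking until a match.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No match: fall back to a single character. (Real tokenizers
            # use byte-level fallbacks so no input is out-of-vocabulary.)
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("unhappiness!"))  # → ['un', 'happi', 'ness', '!']
```

Running it on “unhappiness!” reproduces the split from the example above.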