Natural Language Processing (NLP) — Models, Embeddings, Transformers, and Evaluation
Technical survey of modern NLP: subword tokenization, embeddings, sequence models, transformer architectures, pretraining/fine-tuning, evaluation metrics, and deployment patterns.
Foundational Components
Tokenization: Byte-Pair Encoding (BPE), WordPiece, and SentencePiece produce subword units, balancing vocabulary size against out-of-vocabulary (OOV) handling.
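The merge step at the heart of BPE can be sketched in a few lines of Python. This is a toy trainer on the classic low/lower/newest/widest example corpus (an illustration, not an optimized implementation; real tokenizers also handle byte-level fallback and pretokenization):

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair.
    `words` maps a word (tuple of symbols) to its corpus frequency."""
    vocab = dict(words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair
        merges.append(best)
        merged = best[0] + best[1]
        new_vocab = {}
        for word, freq in vocab.items():   # apply the merge everywhere
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges, vocab

# "</w>" marks end-of-word, so merges cannot cross word boundaries.
corpus = {tuple("low") + ("</w>",): 5, tuple("lower") + ("</w>",): 2,
          tuple("newest") + ("</w>",): 6, tuple("widest") + ("</w>",): 3}
merges, vocab = bpe_merges(corpus, 3)
```

After a few merges, frequent substrings such as "es" and "est" become single vocabulary symbols, which is how subword vocabularies stay compact while still covering rare words character-by-character.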
Embeddings: Context-free (word2vec, GloVe) vs. contextual embeddings (ELMo, BERT) where vector representations vary with sentence context.
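A context-free embedding is just a fixed lookup table of vectors, and semantic relatedness is typically measured by cosine similarity. A minimal sketch, using made-up toy vectors (real word2vec/GloVe vectors are learned from corpus co-occurrence):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy static embedding table; the values are invented for illustration.
emb = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.82, 0.15]),
    "apple": np.array([0.10, 0.20, 0.95]),
}

sim_related   = cosine(emb["king"], emb["queen"])  # high: similar direction
sim_unrelated = cosine(emb["king"], emb["apple"])  # low: different direction
```

Contextual models drop the fixed table: the same surface word gets a different vector in each sentence, computed by the full network.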
Transformer Architecture
Self-attention computes pairwise token interactions with scaled dot-product attention. Transformers stack encoder/decoder layers, using multi-head attention and feed-forward blocks with layer normalization and residual connections. Pretraining objectives include masked language modeling and next-token prediction.
Scaled dot-product attention: Output = softmax(QKᵀ / √dₖ) · V, where Q, K, and V are the query, key, and value matrices and dₖ is the key dimension.
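The attention operation above can be sketched directly in NumPy for a single head (no masking, random toy inputs; a real layer adds learned projections, multiple heads, and causal or padding masks):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise token scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V, weights                   # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value vectors, with the mixing weights determined by query-key similarity; the 1/√dₖ scaling keeps the softmax from saturating as dimension grows.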
Training Paradigms
Models are pretrained on large corpora with self-supervised objectives, then fine-tuned on downstream tasks. Transfer learning dominates: foundation models are adapted to classification, QA, summarization, and generation.
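One common lightweight adaptation pattern is the linear probe: freeze the pretrained backbone and train only a small classifier head on its features. A self-contained sketch with synthetic stand-in features (real usage would extract embeddings from an actual pretrained model):

```python
import numpy as np

rng = np.random.default_rng(42)
feats = rng.normal(size=(200, 16))           # stand-in for frozen pretrained features
true_w = rng.normal(size=16)
labels = (feats @ true_w > 0).astype(float)  # synthetic binary downstream task

# Train only the head: logistic regression via gradient descent.
w = np.zeros(16)
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))          # sigmoid predictions
    w -= lr * feats.T @ (p - labels) / len(labels)  # mean logistic-loss gradient

preds = 1.0 / (1.0 + np.exp(-(feats @ w))) > 0.5
accuracy = (preds == labels).mean()
```

Full fine-tuning instead updates all backbone weights; parameter-efficient variants (adapters, LoRA) sit between the two extremes.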
Evaluation Metrics & Safety
Metrics: accuracy, F1, BLEU/ROUGE (generation), perplexity (language modeling). Safety: toxicity detection, calibration, prompt robustness, and alignment considerations when deploying generative models.
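Perplexity is the exponential of the mean negative log-likelihood per token, so a model that assigns uniform probability 1/V over a vocabulary of size V has perplexity exactly V. A minimal check:

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -np.log(np.asarray(token_probs, dtype=float))
    return float(np.exp(nll.mean()))

# Uniform 1/4 probability over each of 4 tokens -> perplexity 4.0.
ppl_uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# Higher assigned probabilities -> lower perplexity (better model).
ppl_confident = perplexity([0.9, 0.8, 0.95, 0.7])
```

Lower perplexity means the model is, on average, less "surprised" by the held-out text; it is comparable only across models sharing the same tokenization.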
Deployment
Serving strategies: server-side large model inference with batching and TPU/GPU acceleration; edge/quantized models for on-device inference. Retrieval-augmented generation (RAG) combines retrieval with generative models for up-to-date knowledge.
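The retrieval half of RAG reduces to: embed the query, score it against document embeddings, and prepend the top-k hits to the generation prompt. A deterministic toy sketch with a bag-of-words encoder and an invented three-document corpus (production systems use learned dense encoders and an approximate-nearest-neighbor index):

```python
import numpy as np

# Hypothetical toy corpus.
docs = ["The Transformer uses self-attention.",
        "BPE merges frequent symbol pairs.",
        "RAG augments generation with retrieved text."]

vocab = {}  # shared token -> index map, built on the fly

def embed(text, dim=64):
    """Bag-of-words embedding: a crude, deterministic stand-in for a
    learned dense encoder, used only to keep this sketch self-contained."""
    v = np.zeros(dim)
    for tok in text.lower().replace("?", "").replace(".", "").split():
        idx = vocab.setdefault(tok, len(vocab) % dim)
        v[idx] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query, k=2):
    """Return the k documents most cosine-similar to the query."""
    q = embed(query)
    scores = [float(q @ embed(d)) for d in docs]
    top = sorted(range(len(docs)), key=lambda i: -scores[i])[:k]
    return [docs[i] for i in top]

question = "how does self-attention work?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"  # fed to the generator
```

The generator then conditions on the retrieved passages, so factual knowledge can be updated by re-indexing documents rather than retraining the model.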
References
- Vaswani et al., “Attention Is All You Need,” NeurIPS 2017.
- Devlin et al., “BERT: Pre-training of Deep Bidirectional Transformers,” 2019.
- Radford et al., GPT series technical reports, OpenAI, 2018–2019.