What is Named Entity Recognition
Extracting named entities from text
Named Entity Recognition (NER) is a natural language processing (NLP) task that automatically identifies and classifies named entities in text: names of people, organizations, geographical locations, dates, monetary amounts, and other categories.
Entity Types
- PER — person names (John Smith, Elon Musk)
- ORG — organizations (Google, Apple, UN)
- LOC — locations (New York, USA, Mount Everest)
- DATE — dates and times (January 1, 2024, yesterday)
- MONEY — monetary amounts ($100, 5000 EUR)
- PRODUCT — products (iPhone 15, Tesla Model 3)
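Recognized entities are commonly represented as labeled character spans over the source text. The sketch below illustrates that idea under stated assumptions: the `Entity` class, the `annotate` helper, and the sample sentence are hypothetical and hand-written for this example, not the output of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A labeled character span; `end` is exclusive, as in Python slicing."""
    start: int
    end: int
    label: str

    def text(self, source: str) -> str:
        return source[self.start:self.end]


def annotate(source: str, phrase: str, label: str) -> Entity:
    """Locate the first occurrence of `phrase` and wrap it as an Entity."""
    start = source.index(phrase)  # raises ValueError if the phrase is absent
    return Entity(start, start + len(phrase), label)


# Hand-labeled example sentence (an assumption for illustration).
sentence = "John Smith bought an iPhone 15 in New York for $100 on January 1, 2024."
entities = [
    annotate(sentence, "John Smith", "PER"),
    annotate(sentence, "iPhone 15", "PRODUCT"),
    annotate(sentence, "New York", "LOC"),
    annotate(sentence, "$100", "MONEY"),
    annotate(sentence, "January 1, 2024", "DATE"),
]

for ent in entities:
    print(f"{ent.label:8} {ent.text(sentence)!r} [{ent.start}:{ent.end}]")
```

Storing offsets instead of copied strings keeps each entity tied to its exact position in the text, which is what evaluation and highlighting usually need.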
NER Methods
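- Rules and dictionaries — hand-written patterns (regular expressions) and lists of known names; simple to build, but limited in coverage
- Machine learning — statistical models such as CRF or SVM trained on labeled data
- Deep learning — neural architectures such as BiLSTM-CRF and transformer models (BERT, RoBERTa)
- Transfer learning — fine-tuning pre-trained models on task-specific labeled data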
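The first approach in the list, rules and dictionaries, can be sketched with the standard library alone. This is a minimal illustration, not a production recognizer: the patterns, the `ORG_DICTIONARY` set, and the `rule_based_ner` function are assumptions made up for this example.

```python
import re

# Hypothetical hand-written patterns; real systems need far broader coverage.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
PATTERNS = {
    "DATE": re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b"),
    "MONEY": re.compile(r"\$\d+(?:\.\d{2})?|\b\d+(?:\.\d{2})? (?:EUR|USD)\b"),
}
# Small illustrative dictionary of known organization names.
ORG_DICTIONARY = {"Google", "Apple", "UN"}


def rule_based_ner(text):
    """Return (start, end, label, surface) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), label, m.group()))
    for name in ORG_DICTIONARY:
        # Word boundaries keep "UN" from matching inside longer words.
        for m in re.finditer(rf"\b{re.escape(name)}\b", text):
            found.append((m.start(), m.end(), "ORG", name))
    return sorted(found)


text = "Google paid 5000 EUR on January 1, 2024, and Apple paid $100."
results = rule_based_ner(text)
for start, end, label, surface in results:
    print(label, surface)
```

A dictionary lookup like this labels every occurrence of "Apple" as an organization, including the fruit, which is one reason statistical and neural methods are preferred when context matters.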
Applications
- Search engines and information retrieval
- Chatbots and virtual assistants
- News analysis and media monitoring
- Data extraction from documents
- Compliance and sanctions list checking
Libraries and Tools
- spaCy — fast NLP with built-in NER
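- NLTK — classic NLP library with a basic named entity chunker
- Hugging Face Transformers — pre-trained transformer models (BERT and others) for token classification, including NER
- Stanford NER — Java library based on CRF models
- Flair — PyTorch-based NLP framework with pre-trained NER models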
Quality Metrics
- Precision — share of predicted entities that are correct
- Recall — share of actual entities that the system found
- F1-score — harmonic mean of precision and recall
- Entity-level vs token-level — a prediction is scored either as a whole entity (span and type must both match) or token by token
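The difference between entity-level and token-level scoring is easiest to see on a small example. The sketch below assumes exact-match scoring over sets and, as a simplification, counts only tokens that belong to an entity; the function names and the gold and predicted annotations are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Score two collections of hashable items by exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def to_token_labels(entities):
    """Expand (start_token, end_token, label) spans into (token_index, label) pairs."""
    return {(i, label) for start, end, label in entities for i in range(start, end)}


# Tokens: ["John", "Smith", "works", "at", "Google", "in", "New", "York"]
gold = [(0, 2, "PER"), (4, 5, "ORG"), (6, 8, "LOC")]
# Hypothetical model output that truncates the person name and the location.
predicted = [(0, 1, "PER"), (4, 5, "ORG"), (6, 7, "LOC")]

entity_scores = precision_recall_f1(gold, predicted)
token_scores = precision_recall_f1(to_token_labels(gold), to_token_labels(predicted))

print("entity-level P/R/F1:", [round(x, 2) for x in entity_scores])
print("token-level  P/R/F1:", [round(x, 2) for x in token_scores])
```

Here only the ORG span matches exactly, so entity-level precision, recall, and F1 are all about 0.33, while token-level scoring gives partial credit for the truncated spans (precision 1.0, recall 0.6, F1 0.75).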
Challenges
- Homonymy (Apple — company or fruit?)
- Nested entities (University of California, Los Angeles is an organization whose name contains the locations California and Los Angeles)
- Rare and emerging entities
- Multilingual support
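Nested entities are hard partly because a tagger that assigns a single label to each token cannot mark the outer and inner entities at the same time. The sketch below only shows what nesting looks like when entities are stored as character spans; the `find_nested` helper and the annotations are assumptions made for illustration.

```python
def find_nested(entities):
    """Return (inner, outer) pairs where inner lies within a different, larger span."""
    pairs = []
    for inner in entities:
        for outer in entities:
            if inner is outer:
                continue
            i_start, i_end, _ = inner
            o_start, o_end, _ = outer
            contained = o_start <= i_start and i_end <= o_end
            if contained and (i_start, i_end) != (o_start, o_end):
                pairs.append((inner, outer))
    return pairs


text = "University of California, Los Angeles"
# Hand-written (start, end, label) annotations with exclusive end offsets.
entities = [
    (0, len(text), "ORG"),  # the whole university name
    (text.index("California"), text.index(","), "LOC"),
    (text.index("Los Angeles"), len(text), "LOC"),
]

nested = find_nested(entities)
for (i_start, i_end, i_label), (o_start, o_end, o_label) in nested:
    print(f"{i_label} {text[i_start:i_end]!r} inside {o_label} {text[o_start:o_end]!r}")
```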