Thai-English Full-Text Search API — FastAPI + PostgreSQL + PyThaiNLP
A ready-to-use full-text search API that supports both Thai and English. Built with FastAPI, PostgreSQL tsvector, and PyThaiNLP — handles Thai word segmentation out of the box.
What's Included
- 4 core Python modules — main.py, database.py, models.py, nlp_utils.py
- 10 API endpoints — article creation, full-text search with pagination, custom dictionary management (list, add, batch add, CSV import, delete), re-index, and auto-generated API docs
- Auto schema creation — Tables and GIN indexes are created automatically on startup
- HTML documentation — Comprehensive single-page docs with architecture diagrams and API reference
- Example CSV template — Sample custom dictionary file for bulk import
Tech Stack
- FastAPI (async REST API framework)
- PostgreSQL tsvector + GIN index (full-text search engine)
- PyThaiNLP newmm engine (Thai word segmentation)
- SQLAlchemy 2.0 + asyncpg (async database access with connection pooling)
- python-dotenv (environment configuration)
Key Features
- Thai + English full-text search with relevance ranking (ts_rank)
- Thai word segmentation using PyThaiNLP with ~60,000 base dictionary words
- Custom dictionary management via API — add, delete, batch add, CSV import
- In-memory Trie cache for fast tokenization
- GIN index for high-performance search queries
- HTML tag stripping and Thai text normalization
- Pagination support (limit/offset)
- Re-index endpoint to update search vectors after dictionary changes
- Auto schema creation on startup — no manual SQL needed
- Async throughout — non-blocking database access with connection pooling
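The custom dictionary and Trie cache features above can be illustrated with a small sketch that uses only the standard library. It is not the PyThaiNLP newmm engine: it applies plain greedy longest matching over a toy dictionary, and the names `Trie`, `tokenize`, and `base_words` are hypothetical, chosen only for illustration. The aim is to show why adding a domain term changes how text is segmented.

```python
class Trie:
    """Minimal prefix tree; a stand-in for the in-memory dictionary cache."""

    _END = ""  # marker key: a dictionary word ends at this node

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def longest_match(self, text, start):
        """Return the end index of the longest word starting at `start`, or 0."""
        node, end = self.root, 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if self._END in node:
                end = i + 1
        return end


def tokenize(text, trie):
    """Greedy longest-match segmentation (a simplification, not newmm)."""
    tokens, unknown = [], []

    def flush():
        # Emit any buffered characters that matched no dictionary word.
        if unknown:
            tokens.append("".join(unknown))
            unknown.clear()

    i = 0
    while i < len(text):
        if text[i].isspace():
            flush()
            i += 1
            continue
        end = trie.longest_match(text, i)
        if end:
            flush()
            tokens.append(text[i:end])
            i = end
        else:
            unknown.append(text[i])
            i += 1
    flush()
    return tokens


# Toy dictionary (assumption); the real base dictionary is far larger.
base_words = ["ราคา", "บิต", "คอย", "วันนี้"]
text = "ราคาบิตคอยน์วันนี้"  # roughly "Bitcoin price today"

before = tokenize(text, Trie(base_words))
after = tokenize(text, Trie(base_words + ["บิตคอยน์"]))
print(before)
print(after)
```

With only the toy base words, the term "บิตคอยน์" is split into fragments. Once it is added as a custom word it comes out as a single token, which is also why stored search vectors need a re-index after the dictionary changes.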
API Endpoints (10 total)
- POST /articles/ — Create article with auto-tokenized search vector
- GET /search/?q= — Full-text search with pagination and relevance ranking
- GET /custom-words/ — List all custom dictionary words
- POST /custom-words/ — Add a single custom word
- POST /custom-words/batch — Add multiple words at once
- POST /custom-words/import — Import words from CSV file
- DELETE /custom-words/{word} — Delete a custom word
- POST /reindex/ — Re-tokenize all articles after dictionary changes
- GET /docs — Swagger UI (auto-generated)
- GET /redoc — ReDoc (auto-generated)
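As a rough illustration of calling the endpoints listed above, the sketch below builds request URLs with the standard library. Thai text must be percent-encoded in both query strings and path segments. The base URL, the helper names `search_url` and `delete_word_url`, and the assumption that `limit` and `offset` are passed as query parameters are illustrative guesses, not details confirmed by the listing.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumption: local development server


def search_url(query, limit=10, offset=0):
    """Build a GET /search/ URL with an encoded query and pagination."""
    params = urlencode({"q": query, "limit": limit, "offset": offset})
    return f"{BASE_URL}/search/?{params}"


def delete_word_url(word):
    """Build a DELETE /custom-words/{word} URL; the word is a path segment."""
    return f"{BASE_URL}/custom-words/{quote(word, safe='')}"


print(search_url("บิตคอยน์", limit=5, offset=10))
print(delete_word_url("บิตคอยน์"))
```

Most HTTP clients perform this encoding automatically when given a parameter dictionary. Building the URL by hand is mainly useful for debugging or for pasting into a browser.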
How Thai Search Works
Thai text has no spaces between words, so full-text search requires word segmentation first. This API uses PyThaiNLP to split Thai text into space-separated words, then indexes them with PostgreSQL's tsvector under the simple configuration, which applies no stemming and therefore leaves Thai tokens intact. Custom words can be added via the API for domain-specific terms such as "บิตคอยน์" (Bitcoin) and "ฟูลเท็กซ์เสิร์ช" (full-text search).
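The indexing and query steps described here might look roughly like the sketch below. It only prepares SQL text and bind parameters and does not connect to a database. The table `articles`, its columns `id`, `title`, and `search_vector`, the helper names, and the use of `plainto_tsquery` are all assumptions for illustration. The listing itself confirms only tsvector, the simple configuration, and ts_rank. The `:name` placeholders follow the named-parameter style accepted by SQLAlchemy's `text()`.

```python
# Hypothetical schema: articles(id, title, search_vector tsvector).
INDEX_SQL = """
UPDATE articles
SET search_vector = to_tsvector('simple', :segmented)
WHERE id = :id
"""

SEARCH_SQL = """
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM articles, plainto_tsquery('simple', :q) AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT :limit OFFSET :offset
"""


def segmented(tokens):
    """Join tokens with spaces so PostgreSQL's parser sees word boundaries."""
    return " ".join(t.strip() for t in tokens if t.strip())


def index_params(article_id, tokens):
    """Bind parameters for INDEX_SQL, given already-segmented article text."""
    return {"id": article_id, "segmented": segmented(tokens)}


def search_params(query_tokens, limit=10, offset=0):
    """Bind parameters for SEARCH_SQL, given an already-segmented query."""
    return {"q": segmented(query_tokens), "limit": limit, "offset": offset}


# Tokens would come from the Thai segmenter; they are hard-coded here.
print(search_params(["ราคา", "บิตคอยน์"], limit=5))
```

In a setup like this, the search query has to be segmented with the same tokenizer and dictionary as the stored articles, otherwise the query tokens will not match the indexed ones.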
Perfect For
- Thai content platforms that need search functionality
- News or blog sites with Thai articles
- E-commerce product search in Thai
- Internal knowledge base search
- Any application requiring Thai full-text search with custom vocabulary
Ready-to-deploy Thai search API: connect it to your PostgreSQL database and go.