Tell text files apart by extension, without extra dependencies
When it’s useful
Pipelines, upload handlers, and developer tools often need a quick answer: “Does this path look like text?” Inspecting file contents is comparatively expensive, and a hand-maintained extension list drifts out of date. This library answers from a curated set of known text extensions so you can branch logic early.
What you can do
- Check an extension or a full path for a text type, with case-insensitive, dot-aware matching (including dotfiles such as .gitignore).
- Rely on a large, immutable list (300+ text extensions covering source, markup, configs, data formats, and docs, aligned with the widely used text-extensions npm list this project ports).
- Look up in constant time via a frozenset so membership checks stay cheap at scale.
- Import the raw sets (TEXT_EXTENSIONS, TEXT_EXTENSIONS_LOWER) when you need custom rules on top of the defaults.
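To make the matching rules above concrete, here is a minimal, self-contained sketch of extension-based classification against a frozenset. It is not the library's implementation or API: the helper name looks_like_text and the five-entry SAMPLE_TEXT_EXTENSIONS_LOWER set are hypothetical stand-ins chosen for illustration, and the exact edge-case behavior of the real package may differ.

```python
import os

# Hypothetical stand-in for the library's lowercase set (the real list has 300+ entries).
SAMPLE_TEXT_EXTENSIONS_LOWER = frozenset({"txt", "md", "py", "json", "gitignore"})


def looks_like_text(path_or_ext: str) -> bool:
    """Illustrative helper: classify by extension only, never by file content."""
    # Normalize Windows separators, then keep only the final path component.
    name = os.path.basename(path_or_ext.replace("\\", "/"))
    # rpartition splits on the last dot; `dot` is empty when the name has none.
    _, dot, ext = name.rpartition(".")
    # With no dot, treat the whole input as a bare extension such as "py".
    candidate = ext if dot else name
    # Lowercasing gives case-insensitive matching; frozenset lookup is O(1) on average.
    return candidate.lower() in SAMPLE_TEXT_EXTENSIONS_LOWER


print(looks_like_text("docs/README.MD"))  # True
print(looks_like_text(".gitignore"))      # True
print(looks_like_text("photo.png"))       # False
```

Because a dotless input is treated as a bare extension in this sketch, a file literally named py would also match; stricter rules can be layered on top if that matters for your use case.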
Limits and fit
Classification is by extension and path shape, not by reading file bytes or detecting encodings. If you need to know whether a file is actually UTF-8 text, combine this with encoding checks or parsers. Requires Python 3.8+. Install from PyPI (text-extensions). API details, changelog, and tests live in the GitHub repository.
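One possible encoding check to pair with an extension match is sketched below using only the standard library. The function name is_probably_utf8 and the 8192-byte sample size are assumptions made for this example and are not part of this package. It is a heuristic: it decodes only a leading sample and treats NUL bytes as a sign of binary data, even though NUL is technically valid UTF-8.

```python
import codecs


def is_probably_utf8(path: str, sample_size: int = 8192) -> bool:
    """Heuristic sketch: does the start of the file decode as UTF-8?"""
    with open(path, "rb") as fh:
        chunk = fh.read(sample_size)
        # If nothing follows the sample, the chunk is the whole file.
        at_eof = fh.read(1) == b""
    # NUL bytes are rare in text files and common in binary formats.
    if b"\x00" in chunk:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multibyte character cut off at the sample
        # boundary; final=True rejects a file that ends mid-character.
        decoder.decode(chunk, final=at_eof)
    except UnicodeDecodeError:
        return False
    return True
```

A typical flow would run the cheap extension check first and call a content check like this only for paths that pass it, or for files whose extension is unknown.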



