The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
#Natural Language Processing# Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.