Gensim
Gensim is implemented in Python and Cython for performance. It is designed to handle large text collections using data streaming and incremental online algorithms, which differentiates it from most other machine learning software packages that target only in-memory processing.
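A minimal sketch of this streamed, out-of-core style of processing is shown below. The file name corpus.txt, the one-document-per-line layout, and the whitespace tokenisation are assumptions made for illustration; any iterable that yields one document at a time works the same way.

```python
from gensim import corpora

class StreamedCorpus:
    """Yield one bag-of-words vector at a time; the full corpus never sits in RAM."""
    def __init__(self, path, dictionary):
        self.path = path
        self.dictionary = dictionary

    def __iter__(self):
        with open(self.path, encoding="utf8") as f:
            for line in f:  # one document per line (assumption for this sketch)
                yield self.dictionary.doc2bow(line.lower().split())

# Build the vocabulary in a single streamed pass as well.
dictionary = corpora.Dictionary(
    line.lower().split() for line in open("corpus.txt", encoding="utf8")
)
corpus = StreamedCorpus("corpus.txt", dictionary)  # input larger than RAM is fine
```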
Clients: Amazon, Cisco, Channel 4, Juju, Issuu, 12K Research, Stillwater Supercomputing, SiteGround, Capital One.

Use cases
- Amazon (Retail): Document similarity.
- National Institutes of Health (NIH): Processing grants and publications with word2vec.
- Cisco (Security): Large-scale fraud detection.
- Mindseye (Legal): Similarities in legal documents.
- Channel 4 (Media): Recommendation engine.
- Talentpair (HR): Candidate matching in high-touch recruiting.
- Juju (HR): Provide non-obvious related job suggestions.
- Tailwind (Media): Post interesting and relevant content to Pinterest.
- Issuu (Media): "Gensim's LDA module lies at the very core of the analysis we perform on each uploaded publication to figure out what it's all about." (See the LDA sketch after this list.)
- Search Metrics (Content Marketing): Gensim word2vec used for entity disambiguation in Search Engine Optimisation.
- 12K Research: Document similarity analysis on media articles.
- Stillwater Supercomputing (Hardware): Document comprehension and association with word2vec.
- SiteGround (Web hosting): An ensemble search engine that uses different embedding models and similarities, including word2vec, WMD, and LDA.
- Capital One (Finance): Topic modeling for customer complaints exploration.
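As a rough illustration of the topic-modeling use cases above (Issuu's per-publication analysis, Capital One's complaint exploration), the sketch below trains Gensim's LDA model on a streamed corpus. The `corpus` and `dictionary` objects are assumed to come from a pipeline like the earlier streaming sketch, and `num_topics=20` is an arbitrary illustrative choice, not a recommendation.

```python
from gensim.models import LdaModel

# Train LDA on the streamed bag-of-words corpus.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=20, passes=5)

# Inspect what each topic is "about" via its highest-probability words.
for topic_id, words in lda.show_topics(num_topics=5, num_words=8, formatted=False):
    print(topic_id, [word for word, _ in words])

# Infer the topic mixture of a new, unseen document.
new_doc = dictionary.doc2bow("quarterly report on credit card complaints".lower().split())
print(lda[new_doc])  # list of (topic_id, probability) pairs
```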

Benefits
- All algorithms are memory-independent with respect to the corpus size: input larger than RAM can be processed in a streamed, out-of-core fashion.
- Intuitive interfaces: easy to plug in your own input corpus/datastream (trivial streaming API) and easy to extend with other Vector Space algorithms (trivial transformation API); see the sketch below.
- Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP), and word2vec deep learning.
- Distributed computing: Latent Semantic Analysis and Latent Dirichlet Allocation can run on a cluster of computers.
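The sketch below illustrates the transformation API mentioned above: TF-IDF and LSI transformations are chained over the same streamed corpus and then queried for similar documents. The `corpus` and `dictionary` objects are again assumed from the first sketch, and `num_topics=200` is an illustrative value.

```python
from gensim import models, similarities

tfidf = models.TfidfModel(corpus)  # first transformation
lsi = models.LsiModel(tfidf[corpus], id2word=dictionary, num_topics=200)  # chained transformation

# Index every document in the LSI space (in-memory index for this sketch).
index = similarities.MatrixSimilarity(lsi[corpus])

# Query: similarity of a new document against the whole collection.
query = dictionary.doc2bow("large scale document similarity".lower().split())
sims = index[lsi[tfidf[query]]]
print(sorted(enumerate(sims), key=lambda x: -x[1])[:5])  # top 5 most similar documents
```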
