Recent Releases of Turftopic
Turftopic - v0.17.1
Bugfixes:
- Fixed issues with `label_binarize`

## Convenience:
- Added `Top2Vec` and `BERTopic` as separate models, as some users had issues figuring out how to use these in Turftopic
- Switched to Poetry 2.0 standard
- Made plots nicer by switching to Roboto Mono

## Multimodal Features
- Added
- Added a multimodal topic browser with a slider that also allows you to see top documents at the same time.
- Added an image compass to $S^3$
Scientific Software - Peer-reviewed
- Python
Published by x-tabdeveloping 7 months ago
Turftopic - v0.11.0
New in version 0.11.0: Vectorizers Module
You can now use a set of custom vectorizers for topic modeling over phrases, as well as lemmata and stems.
```python
from turftopic import KeyNMF
from turftopic.vectorizers.spacy import NounPhraseCountVectorizer

model = KeyNMF(
    n_components=10,
    vectorizer=NounPhraseCountVectorizer("en_core_web_sm"),
)
model.fit(corpus)
model.print_topics()
```
| Topic ID | Highest Ranking |
| - | - |
| | ... |
| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism |
| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index |
| | ... |
Turftopic now also comes with a Chinese vectorizer for easier use, as well as a generalist multilingual vectorizer.
```python
from turftopic.vectorizers.chinese import default_chinese_vectorizer
from turftopic.vectorizers.spacy import TokenCountVectorizer

chinese_vectorizer = default_chinese_vectorizer()
arabic_vectorizer = TokenCountVectorizer("ar", remove_stopwords=True)
danish_vectorizer = TokenCountVectorizer("da", remove_stopwords=True)
...
```
Published by x-tabdeveloping about 1 year ago
Turftopic - v0.8.0
Automated Topic Naming
Turftopic now lets you automatically assign human-readable names to topics using LLMs or n-gram retrieval!
```python
from turftopic import KeyNMF
from turftopic.namers import OpenAITopicNamer

model = KeyNMF(10).fit(corpus)

namer = OpenAITopicNamer("gpt-4o-mini")
model.rename_topics(namer)
model.print_topics()
```
| Topic ID | Topic Name | Highest Ranking |
| - | - | - |
| 0 | Operating Systems and Software | windows, dos, os, ms, microsoft, unix, nt, memory, program, apps |
| 1 | Atheism and Belief Systems | atheism, atheist, atheists, belief, religion, religious, theists, beliefs, believe, faith |
| 2 | Computer Architecture and Performance | motherboard, ram, memory, cpu, bios, isa, speed, 486, bus, performance |
| 3 | Storage Technologies | disk, drive, scsi, drives, disks, floppy, ide, dos, controller, boot |
| | ... |
Published by x-tabdeveloping about 1 year ago
Turftopic - v0.7.0
New in version 0.7.0
Component re-estimation, refitting and topic merging
Some models can now be modified efficiently after training, without recomputing all attributes from scratch. This is especially useful for clustering models and $S^3$.
```python
from turftopic import SemanticSignalSeparation, ClusteringTopicModel

s3_model = SemanticSignalSeparation(5, feature_importance="combined").fit(corpus)
# Re-estimating term importances
s3_model.estimate_components(feature_importance="angular")
# Refitting S^3 with a different number of topics (very fast)
s3_model.refit(n_components=10, random_seed=42)

clustering_model = ClusteringTopicModel().fit(corpus)
# Reduces number of topics automatically with a given method
clustering_model.reduce_topics(n_reduce_to=20, reduction_method="smallest")
# Merge topics manually
clustering_model.join_topics([0, 3, 4, 5])
# Resets original topics
clustering_model.reset_topics()
# Re-estimates term importances based on a different method
clustering_model.estimate_components(feature_importance="centroid")
```
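To make manual merging concrete, the toy sketch below shows what joining topics means at the level of document labels: every document assigned to one of the listed topics ends up under a single shared ID. This is an illustration in plain Python, not Turftopic's implementation; `merge_labels` is a hypothetical helper, and keeping the smallest ID as the joint ID is an assumption made for the example.

```python
def merge_labels(labels: list[int], to_join: list[int]) -> list[int]:
    """Relabel documents so that all topics in `to_join` share one ID (toy example)."""
    # Assumption for this sketch: the merged topic keeps the smallest ID.
    joint_id = min(to_join)
    members = set(to_join)
    return [joint_id if label in members else label for label in labels]


# Hypothetical topic labels for seven documents.
labels = [0, 1, 3, 4, 2, 5, 1]
merged = merge_labels(labels, [0, 3, 4, 5])
```

After merging, only the term importances of the affected topics need to be re-estimated, which is why such edits can be cheap compared with refitting.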
Manual topic naming
You can now manually label topics in all models in Turftopic.
```python
# you can specify a dict mapping IDs to names
model.rename_topics({0: "New name for topic 0", 5: "New name for topic 5"})
# or a list of topic names
model.rename_topics([f"Topic {i}" for i in range(10)])
```
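Both accepted forms are easy to build programmatically. The sketch below derives names from each topic's top words (taken from the v0.8.0 example table); `names_from_top_words` is a hypothetical helper, not part of Turftopic, and the list form is assumed to hold one name per topic in ID order.

```python
def names_from_top_words(
    top_words: dict[int, list[str]], n_words: int = 3
) -> dict[int, str]:
    """Build a topic ID -> name mapping from the first few top words (hypothetical helper)."""
    return {
        topic_id: "_".join(words[:n_words])
        for topic_id, words in top_words.items()
    }


# Top words copied from the v0.8.0 example table.
top_words = {
    0: ["windows", "dos", "os", "ms"],
    1: ["atheism", "atheist", "atheists", "belief"],
}

# Dict form: maps topic IDs to names.
name_by_id = names_from_top_words(top_words)
# List form: assumed to be one name per topic, in ID order.
name_list = [name_by_id[topic_id] for topic_id in sorted(name_by_id)]

# With a fitted Turftopic model, either object would then be passed on:
# model.rename_topics(name_by_id)
```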
Saving, loading and publishing to HF Hub
You can now load, save and publish models with dedicated functionality.
```python
from turftopic import load_model

model.to_disk("out_folder/")
model = load_model("out_folder/")

model.push_to_hub("your_user/model_name")
model = load_model("your_user/model_name")
```
Published by x-tabdeveloping about 1 year ago
Turftopic - v0.4.0
Release Highlights:
1. Online KeyNMF
KeyNMF can now be fitted in an online fashion in batches:
```python
from itertools import batched
from turftopic import KeyNMF

model = KeyNMF(10, top_n=5)

corpus = ["some string", "etc", ...]
for batch in batched(corpus, 200):
    batch = list(batch)
    model.partial_fit(batch)
```
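Note that `itertools.batched` is only available from Python 3.12 onwards. On older interpreters, a small stand-in along these lines should do the same job; `batched_compat` is a hypothetical helper name, not part of Turftopic or the standard library.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched_compat(iterable: Iterable[T], n: int) -> Iterator[tuple[T, ...]]:
    """Yield successive tuples of at most n items (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # Keep slicing n items off the iterator until it is exhausted.
    while batch := tuple(islice(iterator, n)):
        yield batch


batches = list(batched_compat(["a", "b", "c", "d", "e"], 2))
```

Each yielded tuple can be converted to a list and passed to `model.partial_fit`, exactly as in the loop over `batched(corpus, 200)`.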
2. Precompute keyword matrices in KeyNMF
You can precompute the keyword matrix of KeyNMF models and then use them in training.
```python
model.extract_keywords(["Cars are perhaps the most important invention of the last couple of centuries. They have revolutionized transportation in many ways."])
```
```python
[{'transportation': 0.44713873,
  'invention': 0.560524,
  'cars': 0.5046208,
  'revolutionized': 0.3339205,
  'important': 0.21803442}]
```
```python
keyword_matrix = model.extract_keywords(corpus)
model.fit(keywords=keyword_matrix)
```
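Judging by the example output, `extract_keywords` returns one dictionary per document, mapping each keyword to its importance. Under that assumption, the strongest keywords can be inspected with plain Python before fitting. `top_keywords` is a hypothetical helper, and the scores are copied from the example output.

```python
def top_keywords(doc_keywords: dict[str, float], n: int = 3) -> list[str]:
    """Return the n highest-scoring keywords of one document (hypothetical helper)."""
    ranked = sorted(doc_keywords.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Scores copied from the example output for the single-document call.
keyword_matrix = [
    {
        "transportation": 0.44713873,
        "invention": 0.560524,
        "cars": 0.5046208,
        "revolutionized": 0.3339205,
        "important": 0.21803442,
    }
]

strongest = [top_keywords(doc) for doc in keyword_matrix]
```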
3. Concept Compass in $S^3$
You can now produce a concept compass figure with $S^3$ similar to that in the paper:
```python
from turftopic import SemanticSignalSeparation

model = SemanticSignalSeparation(10).fit(corpus)

# You will need to pip install plotly before this.
fig = model.concept_compass(topic_x=1, topic_y=4)
fig.show()
```
4. Bugfixes in Dynamic Modeling
Binning in dynamic modeling is now fixed: it creates the requested number of time slices, and the first time slice is no longer left out.
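For readers unfamiliar with the term, binning means cutting the time range into a requested number of slices and assigning every document to one of them. The following is a minimal sketch of equal-width binning in plain Python, not Turftopic's implementation; `bin_timestamps` is a hypothetical helper, and whether Turftopic uses equal-width edges is an assumption.

```python
from datetime import datetime


def bin_timestamps(timestamps: list[datetime], bins: int) -> list[int]:
    """Assign each timestamp to one of `bins` equal-width time slices."""
    start, end = min(timestamps), max(timestamps)
    span = (end - start).total_seconds()
    if span == 0:
        # All documents share one timestamp, so there is only one slice.
        return [0] * len(timestamps)
    width = span / bins
    labels = []
    for ts in timestamps:
        index = int((ts - start).total_seconds() // width)
        # The latest timestamp sits on the right edge; keep it in the last slice.
        labels.append(min(index, bins - 1))
    return labels


timestamps = [datetime(2020, 1, day) for day in (1, 2, 3, 4, 5)]
slice_ids = bin_timestamps(timestamps, bins=2)
```

In this sketch the earliest documents land in slice 0 and the latest timestamp is clamped into the last slice, so every requested slice, including the first, can receive documents.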
Published by x-tabdeveloping over 1 year ago
Turftopic - v0.3.0
Highlight: Dynamic KeyNMF
From version 0.3.0 you can use KeyNMF for dynamic topic modeling:
```python
from datetime import datetime
from turftopic import KeyNMF

corpus: list[str] = [...]
timestamps: list[datetime] = [...]

model = KeyNMF(10)
doc_topic_matrix = model.fit_transform_dynamic(corpus, timestamps=timestamps, bins=10)

model.print_topics_over_time()
# This needs Plotly: pip install plotly
model.plot_topics_over_time()
```
Published by x-tabdeveloping over 1 year ago