Recent Releases of Turftopic

Turftopic - v0.17.2

Refactored multimodal encoding and fixed issues.

Scientific Software - Peer-reviewed - Python
Published by x-tabdeveloping 7 months ago

Turftopic - v0.17.1

Bugfixes:

  • Fixed issues with label_binarize

Convenience:

  • Added Top2Vec and BERTopic as separate models, as some users had issues figuring out how to use these in Turftopic
  • Switched to Poetry 2.0 standard
  • Made plots nicer by switching to Roboto Mono

Multimodal Features:

  • Added a multimodal topic browser with a slider that also allows you to see top documents at the same time.
  • Added an image compass to $S^3$

Published by x-tabdeveloping 7 months ago

Turftopic - v0.11.0

New in version 0.11.0: Vectorizers Module

You can now use a set of custom vectorizers for topic modeling over phrases, as well as lemmata and stems.

```python
from turftopic import KeyNMF
from turftopic.vectorizers.spacy import NounPhraseCountVectorizer

model = KeyNMF(
    n_components=10,
    vectorizer=NounPhraseCountVectorizer("en_core_web_sm"),
)
model.fit(corpus)
model.print_topics()
```

| Topic ID | Highest Ranking |
| - | - |
| | ... |
| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism |
| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index |
| | ... |

Turftopic now also comes with a Chinese vectorizer for easier use, as well as a generalist multilingual vectorizer.

```python
from turftopic.vectorizers.chinese import default_chinese_vectorizer
from turftopic.vectorizers.spacy import TokenCountVectorizer

chinese_vectorizer = default_chinese_vectorizer()
arabic_vectorizer = TokenCountVectorizer("ar", remove_stopwords=True)
danish_vectorizer = TokenCountVectorizer("da", remove_stopwords=True)
...
```

Published by x-tabdeveloping about 1 year ago

Turftopic - v0.8.0

Automated Topic Naming

Turftopic now allows you to automatically assign human-readable names to topics using LLMs or n-gram retrieval!

```python
from turftopic import KeyNMF
from turftopic.namers import OpenAITopicNamer

model = KeyNMF(10).fit(corpus)

namer = OpenAITopicNamer("gpt-4o-mini")
model.rename_topics(namer)
model.print_topics()
```

| Topic ID | Topic Name | Highest Ranking |
| - | - | - |
| 0 | Operating Systems and Software | windows, dos, os, ms, microsoft, unix, nt, memory, program, apps |
| 1 | Atheism and Belief Systems | atheism, atheist, atheists, belief, religion, religious, theists, beliefs, believe, faith |
| 2 | Computer Architecture and Performance | motherboard, ram, memory, cpu, bios, isa, speed, 486, bus, performance |
| 3 | Storage Technologies | disk, drive, scsi, drives, disks, floppy, ide, dos, controller, boot |
| | ... | |

Published by x-tabdeveloping about 1 year ago

Turftopic - v0.7.0

New in version 0.7.0

Component re-estimation, refitting and topic merging

Some models can now be modified efficiently after training, without recomputing all attributes from scratch. This is especially significant for clustering models and $S^3$.

```python
from turftopic import SemanticSignalSeparation, ClusteringTopicModel

s3_model = SemanticSignalSeparation(5, feature_importance="combined").fit(corpus)

# Re-estimating term importances
s3_model.estimate_components(feature_importance="angular")

# Refitting S^3 with a different number of topics (very fast)
s3_model.refit(n_components=10, random_seed=42)

clustering_model = ClusteringTopicModel().fit(corpus)

# Reduces number of topics automatically with a given method
clustering_model.reduce_topics(n_reduce_to=20, reduction_method="smallest")

# Merge topics manually
clustering_model.join_topics([0, 3, 4, 5])

# Resets original topics
clustering_model.reset_topics()

# Re-estimates term importances based on a different method
clustering_model.estimate_components(feature_importance="centroid")
```
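What a manual merge means for per-document topic labels can be illustrated in plain Python. This is only a sketch: the helper `join_labels` is hypothetical, and giving the merged topic the smallest of the joined IDs is an assumption, not a statement about how Turftopic numbers merged topics.

```python
def join_labels(labels: list[int], to_join: list[int]) -> list[int]:
    """Relabel every document in one of `to_join` with a single topic ID.

    Assumption: the merged topic takes the smallest of the joined IDs.
    """
    target = min(to_join)
    members = set(to_join)
    return [target if label in members else label for label in labels]


# Toy per-document topic labels, invented for illustration.
labels = [0, 1, 3, 4, 2, 5, 3]
merged = join_labels(labels, [0, 3, 4, 5])
```

Documents from topics 1 and 2 keep their labels, while all documents from the joined topics end up sharing one label.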

Manual topic naming

You can now manually label topics in all models in Turftopic.

```python
# you can specify a dict mapping IDs to names
model.rename_topics({0: "New name for topic 0", 5: "New name for topic 5"})

# or a list of topic names
model.rename_topics([f"Topic {i}" for i in range(10)])
```

Saving, loading and publishing to HF Hub

You can now load, save and publish models with dedicated functionality.

```python
from turftopic import load_model

model.to_disk("out_folder/")
model = load_model("out_folder/")

model.push_to_hub("your_user/model_name")
model = load_model("your_user/model_name")
```

Published by x-tabdeveloping about 1 year ago

Turftopic - v0.4.0

Release Highlights:

1. Online KeyNMF

KeyNMF can now be fitted in an online fashion in batches:

```python
from itertools import batched
from turftopic import KeyNMF

model = KeyNMF(10, top_n=5)

corpus = ["some string", "etc", ...]
for batch in batched(corpus, 200):
    batch = list(batch)
    model.partial_fit(batch)
```
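Note that `itertools.batched` was added in Python 3.12. On older interpreters, a small stand-in could look like the sketch below; the name `batched_compat` is hypothetical and not part of Turftopic or the standard library.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched_compat(iterable: Iterable[T], n: int) -> Iterator[list[T]]:
    """Yield successive lists of at most n items (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice pulls up to n items per pass; an empty list ends the loop.
    while batch := list(islice(iterator, n)):
        yield batch


# Toy corpus, invented for illustration.
corpus = ["doc a", "doc b", "doc c", "doc d", "doc e"]
batches = list(batched_compat(corpus, 2))
```

Each yielded batch is already a list, so it could be handed to `model.partial_fit(batch)` without the extra `list(batch)` conversion.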

2. Precompute keyword matrices in KeyNMF

You can precompute the keyword matrix of a KeyNMF model and then use it in training.

```python
model.extract_keywords(["Cars are perhaps the most important invention of the last couple of centuries. They have revolutionized transportation in many ways."])
```

```python
[{'transportation': 0.44713873, 'invention': 0.560524, 'cars': 0.5046208, 'revolutionized': 0.3339205, 'important': 0.21803442}]
```

```python
keyword_matrix = model.extract_keywords(corpus)
model.fit(keywords=keyword_matrix)
```
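The output shown above is a list with one dictionary per document, mapping each keyword to its importance. As a minimal sketch of how such a result could be inspected with plain Python, the snippet below ranks the keywords of each document; `rank_keywords` is a hypothetical helper, not a Turftopic function.

```python
# Values copied from the extract_keywords output above: one dict per document.
keywords = [
    {
        "transportation": 0.44713873,
        "invention": 0.560524,
        "cars": 0.5046208,
        "revolutionized": 0.3339205,
        "important": 0.21803442,
    }
]


def rank_keywords(doc_keywords: dict[str, float]) -> list[str]:
    """Return keywords ordered from most to least important (hypothetical helper)."""
    return sorted(doc_keywords, key=doc_keywords.get, reverse=True)


ranked = [rank_keywords(doc) for doc in keywords]
```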

3. Concept Compass in $S^3$

You can now produce a concept compass figure with $S^3$ similar to that in the paper:

```python
from turftopic import SemanticSignalSeparation

model = SemanticSignalSeparation(10).fit(corpus)

# You will need to pip install plotly before this.
fig = model.concept_compass(topic_x=1, topic_y=4)
fig.show()
```

4. Bugfixes in Dynamic Modeling

Binning in dynamic modeling is now fixed: the requested number of time slices is created, and the first time slice is no longer left out.
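To make the expected behaviour concrete, here is a minimal sketch of equal-width time binning in plain Python. It is not Turftopic's implementation; the function `bin_timestamps` and the equal-width strategy are assumptions chosen for illustration.

```python
from datetime import datetime


def bin_timestamps(timestamps: list[datetime], bins: int) -> list[int]:
    """Assign each timestamp to one of `bins` equal-width time slices."""
    if bins < 1:
        raise ValueError("bins must be at least one")
    start, end = min(timestamps), max(timestamps)
    span = (end - start).total_seconds()
    if span == 0:
        # All timestamps are identical, so everything falls in slice 0.
        return [0] * len(timestamps)
    labels = []
    for ts in timestamps:
        position = (ts - start).total_seconds() / span  # in [0, 1]
        # The latest timestamp would land on index `bins`, so clamp it
        # into the last slice; the earliest one falls into slice 0.
        labels.append(min(int(position * bins), bins - 1))
    return labels


# Toy data: the first day of each month in 2020.
timestamps = [datetime(2020, month, 1) for month in range(1, 13)]
labels = bin_timestamps(timestamps, bins=4)
```

With twelve monthly timestamps and `bins=4`, every slice from 0 to 3 receives documents, including the first one.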

Published by x-tabdeveloping over 1 year ago

Turftopic - v0.3.0

Highlight: Dynamic KeyNMF

From version 0.3.0 you can use KeyNMF for dynamic topic modeling:

```python
from datetime import datetime
from turftopic import KeyNMF

corpus: list[str] = [...]
timestamps: list[datetime] = [...]

model = KeyNMF(10)
doc_topic_matrix = model.fit_transform_dynamic(
    corpus, timestamps=timestamps, bins=10
)

model.print_topics_over_time()

# This needs Plotly: pip install plotly
model.plot_topics_over_time()
```

Published by x-tabdeveloping over 1 year ago