et-ai-hybrid

Hybrid Scientific Review Assistant - Intelligent Summaries and AI-Powered Citation Generation.

https://github.com/marcosgoncaf/et-ai-hybrid

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Unable to calculate vocabulary similarity
Last synced: 6 months ago

Repository

Hybrid Scientific Review Assistant - Intelligent Summaries and AI-Powered Citation Generation.

Basic Info
  • Host: GitHub
  • Owner: marcosgoncaf
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 9.77 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

et-ai-hybrid

Hybrid Scientific Review Assistant - Intelligent Summaries and AI-Powered Citation Generation.

Owner

  • Login: marcosgoncaf
  • Kind: user

Citation (citation_engine.py)

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

class EnhancedCitationGenerator:
    def __init__(self, pdf_data, num_matches=3):
        # pdf_data: mapping of document key -> {"nome": display name, "full_text": extracted text}
        self.pdf_data = pdf_data
        self.num_matches = num_matches
        # Lightweight SBERT model for sentence embeddings, forced onto CPU.
        self.sbert = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")

    def _chunk_sentences(self, text):
        # Naive sentence splitting on periods; empty fragments are dropped.
        return [s.strip() for s in text.split(".") if s.strip()]

    def generate(self, user_text):
        # User segments are delimited by "/" in the input text.
        segments = [seg.strip() for seg in user_text.split("/") if seg.strip()]
        emb_u = self.sbert.encode(segments)
        results = {}
        for seg, u in zip(segments, emb_u):
            scores = {}
            for key, info in self.pdf_data.items():
                sents = self._chunk_sentences(info["full_text"])
                if not sents:  # skip documents with no extractable sentences
                    continue
                emb_s = self.sbert.encode(sents, show_progress_bar=False)
                # Cosine similarity between this segment and every sentence in the document.
                sims = cosine_similarity([u], emb_s)[0]
                best_idx = int(np.argmax(sims))
                scores[key] = (float(sims[best_idx]), sents[best_idx])
            # Rank documents by their best-sentence similarity and keep the top matches.
            topk = sorted(scores.items(), key=lambda x: x[1][0], reverse=True)[:self.num_matches]
            refs = [{"source": self.pdf_data[k]["nome"], "score": sc, "excerpt": ex, "page": "N/D"}
                    for k, (sc, ex) in topk]
            results[seg] = refs
        return results
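The core of `generate` is a top-k cosine-similarity match between each user segment and every candidate sentence. A minimal numpy-only sketch of that step, with toy 3-dimensional vectors standing in for the SBERT embeddings (the helper name `top_k_matches` is illustrative, not from the repository):

```python
import numpy as np

def top_k_matches(query_emb, sent_embs, sentences, k=3):
    # Cosine similarity = dot product of L2-normalized vectors.
    q = query_emb / np.linalg.norm(query_emb)
    s = sent_embs / np.linalg.norm(sent_embs, axis=1, keepdims=True)
    sims = s @ q
    # Indices of the k highest similarities, best first.
    order = np.argsort(sims)[::-1][:k]
    return [(sentences[i], float(sims[i])) for i in order]

# Toy embeddings: "gamma" points almost the same way as "alpha".
sents = ["alpha", "beta", "gamma"]
embs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.9, 0.1, 0.0]])
query = np.array([1.0, 0.0, 0.0])
result = top_k_matches(query, embs, sents, k=2)
print(result)  # "alpha" ranks first, then "gamma"
```

In the repository's version, the query vector comes from encoding one `/`-separated segment and the sentence matrix from encoding a document's period-split sentences.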

GitHub Events

Total
  • Push event: 3
  • Create event: 2
Last Year
  • Push event: 3
  • Create event: 2

Dependencies

Dockerfile docker
  • python 3.11-slim build
requirements.txt pypi
  • PyMuPDF *
  • bibtexparser *
  • faiss-cpu *
  • huggingface_hub *
  • llama-cpp-python *
  • pandas *
  • sentence-transformers *
  • streamlit *
  • sumy *
  • torch *
  • transformers *