projects

https://github.com/avmehta/projects

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (0.2%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: avmehta
Language: Python
Default Branch: master
Size: 12.7 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 10 years ago · Last pushed over 10 years ago

Metadata Files

Readme Citation

README.md

Projects

Owner

Name: Avi
Login: avmehta
Kind: user
Location: NYC
Company: Columbia University Student

Repositories: 1
Profile: https://github.com/avmehta

Citation (citationextractor.py)

"""Extract single-line references from text documents.

More efficient if a method to determine the position of the final reference
is known.
"""
import os
from datetime import datetime

import pandas as pd

start_time = datetime.now()

# Directory of pdftotext output, one .txt file per document.
src_dir = '/Users/avi/Documents/Dec14/pdf/pdftotext'
# Destination directory for the per-document CSV files.
dest_dir = '/Users/Avi/Documents/use/codes/refs/'
# Assumption: the original script never defines `number`, the cap on lines
# kept per document. None keeps every candidate line.
number = None

for name in os.listdir(src_dir):
    if not name.endswith('.txt'):
        continue
    uid = name[:-4]
    # Open relative to src_dir rather than the current working directory.
    with open(os.path.join(src_dir, name), encoding='latin-1') as f:
        lines = f.readlines()
    # The reference list normally sits in the second half of a document.
    lines = lines[round(len(lines) * .5):]
    # Start from the first line that mentions "references", if there is one.
    for i, line in enumerate(lines):
        if 'references' in line.lower():
            lines = lines[i:]
            break
    # Keep only lines long enough to be a full single-line reference.
    lines = [line for line in lines if len(line) > 50]
    lines = lines[:number]
    # Write the CSV straight into dest_dir instead of renaming it afterwards.
    pd.DataFrame(lines).to_csv(os.path.join(dest_dir, uid + '.csv'))

print(datetime.now() - start_time)
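
The filtering heuristic in citationextractor.py can also be read on its own, separate from the file handling. The following is a minimal sketch, not the author's code: the function name `extract_reference_lines` and the parameters `min_length` and `limit` are hypothetical, with `limit` standing in for the undefined `number` in the script.

```python
def extract_reference_lines(lines, min_length=50, limit=None):
    """Return candidate reference lines from the second half of a document.

    `min_length` and `limit` are illustrative parameters, not names taken
    from the original script.
    """
    # Keep only the second half, where a reference list normally sits.
    tail = lines[round(len(lines) * 0.5):]
    # Skip ahead to the first line that mentions "references", if any.
    for i, line in enumerate(tail):
        if 'references' in line.lower():
            tail = tail[i:]
            break
    # Drop short lines such as headings, page numbers, and blank lines.
    candidates = [line for line in tail if len(line) > min_length]
    # A limit of None keeps every candidate.
    return candidates[:limit]


if __name__ == '__main__':
    # Made-up sample text, for illustration only.
    sample = [
        'Introduction\n',
        'Body text\n',
        'References\n',
        '[1] A. Author. A sufficiently long made-up reference entry, 2014.\n',
        '12\n',
    ]
    print(extract_reference_lines(sample))
```

Because the filter is a length threshold, long body lines that follow the "references" heading are kept along with real references, which is why the script's docstring notes that knowing where the final reference ends would help.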
