Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.2%) to scientific vocabulary
Repository
Transform csv to txt
Basic Info
- Host: GitHub
- Owner: UUDigitalHumanitieslab
- License: other
- Language: Python
- Default Branch: master
- Size: 10.7 KB
Statistics
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 1
- Releases: 3
Metadata Files
README.md
CSV to TXT transformer
This is a simple Python script that transforms csv files into txt files. By default it creates files whose names contain all fields except the last, with the data in the last column as the content of the file. There is also an option to aggregate rows into separate files, based on the unique values in a column.
The script has two required command line options and relies heavily on two conventions regarding the source files: filename conventions and data conventions.
Requirements
Python 3. (Developed against 3.6, but other versions should work.)
Command line options
| Command | Description |
| ------- | ---------- |
| --source, -s | Either a file or a folder full of files that will be transformed |
| --target, -t | The target folder where the new txt files will be placed. Will be created if it does not exist |
| --delimiter, -d | Optional. The symbol that delimits the fields in the source file. Defaults to ',' |
| --agg_col, -ac | Optional. The column to aggregate on. Can be either the name of the column as it appears in the top row, or a zero-based index. Rows will be divided into files named after unique values in this column |
| --help, -h | Display help menu |
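The options in the table could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual code: the function name `build_parser` and the help texts are assumptions, and `--help`/`-h` is provided by `argparse` automatically.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options table (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Transform csv files to txt files (illustrative sketch)."
    )
    # The two required options.
    parser.add_argument("--source", "-s", required=True,
                        help="A csv file or a folder full of csv files")
    parser.add_argument("--target", "-t", required=True,
                        help="Folder for the new txt files")
    # The two optional options.
    parser.add_argument("--delimiter", "-d", default=",",
                        help="Field delimiter in the source file")
    parser.add_argument("--agg_col", "-ac", default=None,
                        help="Column name or zero-based index to aggregate on")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["-s", "reviews", "-t", "out", "-ac", "0"])
print(args.source, args.target, repr(args.delimiter), args.agg_col)
```

Note that `--agg_col` arrives as a string either way, so telling a column name apart from a zero-based index has to happen after parsing.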
Source filename conventions
The script uses the source filenames to store the new .txt files for each source in a separate folder, and to give these files names that can be traced back to the csv. To facilitate this, filenames should start with a field indicating the source (e.g. GR for GoodReads.com) followed by an underscore (e.g. GR_). After that, any number of fields may follow, separated by underscores (e.g. GR_whatever_field_and_as_many_as_you_like.csv). However, the last field should contain a unique identifier to ensure strict separation of files and traceability.
Some examples of correct filenames:
```txt
GR_myreviews_NL.csv
GR_my_other_reviews_EN.csv
GR_my_other_reviews_EN001.csv
GR_my_other_reviews_EN002.csv
GR_myreviews_ALL.csv
```
If these files were in the source folder together you would end up with a directory tree like this:
```txt
target_folder
├── GR_NL
|   └── GR_NL_whatever_metadata_fields_were_found.txt
|   └── GR_NL_whatever_metadata_fields_were_found.txt
├── GR_EN
|   └── GR_EN_whatever_metadata_fields_were_found.txt
└── GR_ALL
    └── GR_ALL_whatever_metadata_fields_were_found.txt
```
Note that the last field in the final example is not a language indicator per se; any unique value would do.
Also, and importantly, the .txt files from files called GR_one_NL.csv, GR_two_NL.csv, and GR_three_NL.csv will all end up in the same folder, so try to avoid that.
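The folder naming described above can be sketched as a small function. This is an illustration under an assumed rule (folder name = first field + underscore + last field of the filename); the function name `source_folder_name` is hypothetical, and how suffixes such as EN001 are treated is not specified here, so the sketch leaves the last field unchanged.

```python
from pathlib import Path


def source_folder_name(csv_path):
    """Return '<first field>_<last field>' for a csv filename (assumed rule)."""
    # Drop the directory and the extension, then split on underscores.
    fields = Path(csv_path).stem.split("_")
    if len(fields) < 2:
        raise ValueError(
            "expected at least a source field and an identifier: %r" % csv_path
        )
    return "{}_{}".format(fields[0], fields[-1])


print(source_folder_name("GR_myreviews_NL.csv"))
print(source_folder_name("GR_one_NL.csv") == source_folder_name("GR_two_NL.csv"))
```

The second call shows why filenames that share both their first and last field are a problem: under this rule they map to the same folder.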
Aggregating by column
If you supply the --agg_col argument, the conventions above still apply. However, the metadata fields will be lost, i.e. they are not stored anywhere.
Instead, rows are aggregated into files named after the unique values found in the column you supplied. Example output:
```txt
target_folder
└── GR_NL
    └── unique_value_1.txt
    └── unique_value_2.txt
    ...
```
Note that the file unique_value_1.txt contains all the texts (i.e. the contents of the very last column) from rows that have unique_value_1 in the column you provided. A file will be created for every value found.
CAUTION: when running the script to aggregate, it appends texts to existing files. Therefore, if you run the script multiple times with the same output folder, you might corrupt your data, because the same text could be appended multiple times to the same file.
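The aggregation behaviour, including the append pitfall, can be illustrated with the sketch below. It is not the script's implementation: the function name `aggregate_rows`, the newline separator between texts, and the choice to try the column name before the index are all assumptions, and values are used as filenames without any sanitising.

```python
import csv
from collections import defaultdict
from pathlib import Path


def aggregate_rows(csv_path, target_dir, agg_col, delimiter=","):
    """Append last-column texts to one file per unique value in agg_col."""
    texts = defaultdict(list)
    with open(str(csv_path), newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)  # first row holds the column titles
        # Assumption: a matching column name wins over a numeric index.
        if agg_col in header:
            index = header.index(agg_col)
        else:
            index = int(agg_col)
        for row in reader:
            if row:  # skip empty lines
                texts[row[index]].append(row[-1])

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for value, parts in texts.items():
        # Mode "a" appends, so running this twice duplicates the texts.
        with open(str(target / (value + ".txt")), "a", encoding="utf-8") as out:
            out.write("\n".join(parts) + "\n")
    return sorted(texts)
```

Because the files are opened in append mode, a second run against the same target folder doubles their content. Writing to a fresh or emptied target folder on each run avoids that.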
Data conventions
The data in the csv should be structured as follows:
1) The first row holds title fields. This row is used by the script simply to count the number of columns but is ignored otherwise. So, if your csv doesn't contain a title row, insert a line with the correct number of columns to make sure the script does as promised. (Perhaps simply copy-pasting the top line of the file is a good option?)
2) The very last column contains the data that should be written to the txt.
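The two conventions above can be illustrated with a short sketch of the default (non-aggregating) mode. The function name `rows_to_documents`, the `prefix` parameter, joining metadata fields with underscores, and skipping rows whose length differs from the title row are all assumptions made for illustration; the script itself may handle these details differently.

```python
import csv
import io


def rows_to_documents(csv_text, prefix, delimiter=","):
    """Map each data row to (filename, content) following the conventions."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)      # title row: only used to count the columns
    n_columns = len(header)
    documents = {}
    for row in reader:
        if len(row) != n_columns:
            continue           # assumption: ignore rows of the wrong length
        metadata, text = row[:-1], row[-1]
        # All fields but the last go into the filename, the last is content.
        filename = "_".join([prefix] + metadata) + ".txt"
        documents[filename] = text
    return documents


sample = "id,rating,text\n1,5,Great book\n2,3,It was fine\n"
for name, content in rows_to_documents(sample, "GR_NL").items():
    print(name, "->", content)
```

If the title row were missing, the first data row would be consumed as the header here and its text would never be written, which is why the first convention asks for a title line.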
Owner
- Name: UU Digital Humanities Lab
- Login: UUDigitalHumanitieslab
- Kind: organization
- Email: digitalhumanities@uu.nl
- Location: Utrecht
- Website: https://cdh.uu.nl/rsl/
- Repositories: 102
- Profile: https://github.com/UUDigitalHumanitieslab
Research Software Lab · Centre for Digital Humanities · Utrecht University
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: csv-to-txt
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: >-
      Research Software Lab, Centre for Digital Humanities,
      Utrecht University
    city: Utrecht
    country: NL
    website: >-
      https://cdh.uu.nl/centre-for-digital-humanities/research-software-lab/
identifiers:
  - type: doi
    value: 10.5281/zenodo.10569032
repository-code: 'https://github.com/UUDigitalHumanitieslab/csv-to-txt'
abstract: >-
  This is a simple python script that can transform csv
  files to txt files.
  By default it will create files which contain all but one fields in the filename, and the data in the last column as content of the file.
  There is, however, also the option to aggregate rows into separate files, based in unique values in a column.
license: BSD-3-Clause
version: 0.1.2
date-released: '2024-01-25'
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 3
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JosedeKruif (3)