Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.4%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Armin-Smailzade
  • License: gpl-3.0
  • Language: Shell
  • Default Branch: main
  • Size: 128 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed over 4 years ago
Metadata Files
Readme License Citation

README.md

Hadoop-spark-ansible

  • Installs a Hadoop cluster with Ansible.
  • JDK: OpenJDK 1.8.
  • Hadoop: version 2.7.2.
  • Spark: version 2.4.7.
  • Hive: version 2.3.7.
  • Database: Postgres.

Before Install

Update the hosts/host inventory file.
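
The inventory itself is not shown in the README; a hypothetical hosts/host layout is sketched below. The group names are assumptions, so match them to whatever the playbooks actually target (only the master IP 172.16.251.70 appears elsewhere in this README; the worker IPs are placeholders):

```ini
[hadoop-master]
172.16.251.70

[hadoop-workers]
172.16.251.71
172.16.251.72
```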

Install Hadoop

  1. Download Hadoop to any path.
  2. Update {{ download_path }} in vars/var_hadoop.yml, vars/var_spark.yml, and vars/var_hive.yml.
  3. Ansible templates generate the Hadoop configuration, so if you want to add more properties, just update vars/var_hadoop.yml.
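
Steps 1 and 2 above can be scripted. A minimal sketch, where the tarball URL and the exact `download_path` variable spelling are assumptions to be checked against your actual vars files:

```shell
#!/bin/sh
# Step 1: download Hadoop 2.7.2 to a path of your choice
# (URL assumed from the Apache archive; uncomment to actually download).
download_path="$HOME/Downloads"
mkdir -p "$download_path"
# wget -P "$download_path" "https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz"

# Step 2: point each vars file at that path (no-op if a file is absent).
for vars in vars/var_hadoop.yml vars/var_spark.yml vars/var_hive.yml; do
  if [ -f "$vars" ]; then
    sed -i "s|^download_path:.*|download_path: \"$download_path\"|" "$vars"
  fi
done
```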

Install Hadoop Master

Run:

```
ansible-playbook -i hosts/host master.yml
```

Install Hadoop Workers

Pass the master's IP and hostname as extra variables; both must match your real Hadoop master:

```
ansible-playbook -i hosts/host workers.yml -e "master_ip=172.16.251.70 master_hostname=hadoop-master"
```

Install Spark Master

Run:

```
ansible-playbook -i hosts/host spark.yml
```

Install Spark Workers

Run:

```
ansible-playbook -i hosts/host spark_workers.yml
```

Install Postgres DB

Run:

```
ansible-playbook -i hosts/host postgres.yml
```

Install Hive

  1. Create the Postgres database first and grant it the right privileges.
  2. Check vars/var_hive.yml:

```
---
# hive basic vars
download_path: "/home/pippo/Downloads" # your download path
hive_version: "2.3.2" # your hive version
hive_path: "/home/hadoop"
hive_config_path: "/home/hadoop/apache-hive-{{ hive_version }}-bin/conf"
hive_tmp: "/home/hadoop/hive/tmp" # your hive tmp path

hive_create_path:
  - "{{ hive_tmp }}"

hive_warehouse: "/user/hive/warehouse" # your hdfs path
hive_scratchdir: "/user/hive/tmp"
hive_querylog_location: "/user/hive/log"

hive_hdfs_path:
  - "{{ hive_warehouse }}"
  - "{{ hive_scratchdir }}"
  - "{{ hive_querylog_location }}"

hive_logging_operation_log_location: "{{ hive_tmp }}/{{ user }}/operation_logs"

# database
db_type: "postgres" # use your db type, default is postgres
hive_connection_driver_name: "org.postgresql.Driver"
hive_connection_host: "172.16.251.33"
hive_connection_port: "5432"
hive_connection_dbname: "hive"
hive_connection_username: "hiveuser"
hive_connection_password: "nfsetso12fdds9s"
hive_connection_url: "jdbc:postgresql://{{ hive_connection_host }}:{{ hive_connection_port }}/{{ hive_connection_dbname }}?ssl=false"

# hive configuration: your hive site properties
hive_site_properties:
  - { "name": "hive.metastore.warehouse.dir", "value": "hdfs://{{ master_hostname }}:{{ hdfs_port }}{{ hive_warehouse }}" }
  - { "name": "hive.exec.scratchdir", "value": "{{ hive_scratchdir }}" }
  - { "name": "hive.querylog.location", "value": "{{ hive_querylog_location }}/hadoop" }
  - { "name": "javax.jdo.option.ConnectionURL", "value": "{{ hive_connection_url }}" }
  - { "name": "javax.jdo.option.ConnectionDriverName", "value": "{{ hive_connection_driver_name }}" }
  - { "name": "javax.jdo.option.ConnectionUserName", "value": "{{ hive_connection_username }}" }
  - { "name": "javax.jdo.option.ConnectionPassword", "value": "{{ hive_connection_password }}" }
  - { "name": "hive.server2.logging.operation.log.location", "value": "{{ hive_logging_operation_log_location }}" }

hive_server_port: 10000 # hive port
hive_hwi_port: 9999
hive_metastore_port: 9083

firewall_ports:
  - "{{ hive_server_port }}"
  - "{{ hive_hwi_port }}"
  - "{{ hive_metastore_port }}"
```
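
Step 1 above, creating the metastore database and granting privileges, might look like the following SQL, run as the postgres superuser. This is a sketch, not taken from the playbooks; the user, password, and database name come from the connection values in vars/var_hive.yml:

```sql
-- user/password/database must match the connection username,
-- password, and dbname values in var_hive.yml
CREATE USER hiveuser WITH PASSWORD 'nfsetso12fdds9s';
CREATE DATABASE hive OWNER hiveuser;
GRANT ALL PRIVILEGES ON DATABASE hive TO hiveuser;
```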

  3. Check hive.yml.
  4. Run it:

```
ansible-playbook -i hosts/host hive.yml
```

Owner

  • Name: Armin EsmaeilZadeh
  • Login: Armin-Smailzade
  • Kind: user
  • Location: Las Vegas, Nevada

Software Engineer

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "YOUR_NAME_HERE"
  given-names: "YOUR_NAME_HERE"
  orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Lisa"
  given-names: "Mona"
  orcid: "https://orcid.org/0000-0000-0000-0000"
title: "hadoop-spark-role"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2021-11-01
url: "https://github.com/Armin-Smailzade/hadoop-spark-role"
