hadoop-spark-role
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (6.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Armin-Smailzade
- License: gpl-3.0
- Language: Shell
- Default Branch: main
- Size: 128 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Hadoop-spark-ansible
- Installs a Hadoop cluster with Ansible.
- JDK: OpenJDK 1.8.
- Hadoop version 2.7.2.
- Spark version 2.4.7.
- Hive version 2.3.7.
- Postgres database.
Before Install
Update the hosts/host inventory file.
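The layout of hosts/host is not shown in this README; an INI-style Ansible inventory for this kind of cluster typically looks something like the sketch below. The group and host names here are illustrative assumptions, not taken from the repository:

```
[master]
hadoop-master ansible_host=172.16.251.70

[workers]
hadoop-worker1 ansible_host=172.16.251.71
hadoop-worker2 ansible_host=172.16.251.72
```

Whatever group names the playbooks expect, the hosts listed here are the machines the ansible-playbook commands below will target.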
Install Hadoop
- Download Hadoop to any path.
- Update {{ downloadpath }} in vars/var_hadoop.yml, var_spark.yml & var_hive.yml.
- Ansible templates generate the Hadoop configuration, so if you want to add more properties, just update vars/var_hadoop.yml.
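If you are unsure where to fetch the pinned versions, the download URLs can be derived from the version numbers above. This is a small helper sketch; the URL patterns follow archive.apache.org's standard directory layout (not something this role prescribes):

```shell
#!/bin/sh
# Build Apache archive download URLs for the versions pinned in this role.
HADOOP_VERSION="2.7.2"
SPARK_VERSION="2.4.7"
HIVE_VERSION="2.3.7"

HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz"
HIVE_URL="https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/apache-hive-${HIVE_VERSION}-bin.tar.gz"

# Print the URLs so they can be passed to wget/curl.
echo "$HADOOP_URL"
echo "$SPARK_URL"
echo "$HIVE_URL"
```

Download the tarballs into the directory that {{ downloadpath }} points at.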
Install Hadoop Master
Run:
```
ansible-playbook -i hosts/host master.yml
```
Install Hadoop Workers
Run, passing two extra variables that must match your real Hadoop master:
- masterip: your Hadoop master IP
- masterhostname: your Hadoop master hostname
```
ansible-playbook -i hosts/host workers.yml -e "masterip=172.16.251.70 masterhostname=hadoop-master"
```
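Because workers.yml relies on masterip being correct, a tiny pre-flight check can catch typos before Ansible starts touching hosts. This helper is a hypothetical addition, not part of the role:

```shell
#!/bin/sh
# Hypothetical pre-flight check: verify that masterip looks like an
# IPv4 address before invoking the workers playbook.
masterip="172.16.251.70"

if echo "$masterip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
  echo "masterip ok: $masterip"
else
  echo "masterip is not a valid IPv4 address: $masterip" >&2
  exit 1
fi
```

The same pattern extends to checking that masterhostname resolves (e.g. with getent hosts) before running the playbook.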
Install Spark Master
Run:
```
ansible-playbook -i hosts/host spark.yml
```
Install Spark Workers
Run:
```
ansible-playbook -i hosts/host spark_workers.yml
```
Install Postgres DB
Run:
```
ansible-playbook -i hosts/host postgres.yml
```
Install Hive
- Create the Postgres database first and grant it the right privileges.
- Check vars/var_hive.yml:
```
---
# hive basic vars
downloadpath: "/home/pippo/Downloads"   # your download path
hiveversion: "2.3.2"                    # your hive version
hivepath: "/home/hadoop"
hiveconfigpath: "/home/hadoop/apache-hive-{{ hiveversion }}-bin/conf"
hive_tmp: "/home/hadoop/hive/tmp"       # your hive tmp path

hivecreatepath:
  - "{{ hive_tmp }}"

hivewarehouse: "/user/hive/warehouse"   # your hdfs path
hivescratchdir: "/user/hive/tmp"
hivequeryloglocation: "/user/hive/log"

hivehdfspath:
  - "{{ hivewarehouse }}"
  - "{{ hivescratchdir }}"
  - "{{ hivequeryloglocation }}"

hiveloggingoperationloglocation: "{{ hive_tmp }}/{{ user }}/operationlogs"

# database
dbtype: "postgres"                      # your db type, default is postgres
hiveconnectiondrivername: "org.postgresql.Driver"
hiveconnectionhost: "172.16.251.33"
hiveconnectionport: "5432"
hiveconnectiondbname: "hive"
hiveconnectionusername: "hiveuser"
hiveconnectionpassword: "nfsetso12fdds9s"
hiveconnectionurl: "jdbc:postgresql://{{ hiveconnectionhost }}:{{ hiveconnectionport }}/{{ hiveconnectiondbname }}?ssl=false"

# hive configuration: your hive-site properties
hivesiteproperties:
  - { "name": "hive.metastore.warehouse.dir", "value": "hdfs://{{ masterhostname }}:{{ hdfsport }}{{ hivewarehouse }}" }
  - { "name": "hive.exec.scratchdir", "value": "{{ hivescratchdir }}" }
  - { "name": "hive.querylog.location", "value": "{{ hivequeryloglocation }}/hadoop" }
  - { "name": "javax.jdo.option.ConnectionURL", "value": "{{ hiveconnectionurl }}" }
  - { "name": "javax.jdo.option.ConnectionDriverName", "value": "{{ hiveconnectiondrivername }}" }
  - { "name": "javax.jdo.option.ConnectionUserName", "value": "{{ hiveconnectionusername }}" }
  - { "name": "javax.jdo.option.ConnectionPassword", "value": "{{ hiveconnectionpassword }}" }
  - { "name": "hive.server2.logging.operation.log.location", "value": "{{ hiveloggingoperationloglocation }}" }

hiveserverport: 10000                   # hive server port
hivehwiport: 9999
hivemetastore_port: 9083

firewallports:
  - "{{ hiveserverport }}"
  - "{{ hivehwiport }}"
  - "{{ hivemetastore_port }}"
```
- Check hive.yml.
- Run:
```
ansible-playbook -i hosts/host hive.yml
```
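The first step above ("create the Postgres database and grant privileges") is not spelled out in the README. With the default names in vars/var_hive.yml it would amount to statements like the following, run as the postgres superuser; adjust the database name, user, and password to your environment:

```
CREATE USER hiveuser WITH PASSWORD 'nfsetso12fdds9s';
CREATE DATABASE hive OWNER hiveuser;
GRANT ALL PRIVILEGES ON DATABASE hive TO hiveuser;
```

These can be fed to psql on the database host, after which the connection settings in var_hive.yml should match what you created.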
Owner
- Name: Armin EsmaeilZadeh
- Login: Armin-Smailzade
- Kind: user
- Location: Las Vegas, Nevada
- Website: https://www.linkedin.com/in/arminesmaeilzadeh
- Repositories: 2
- Profile: https://github.com/Armin-Smailzade
- Bio: Software Engineer
Citation (CITATION.cff)
```
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "YOUR_NAME_HERE"
    given-names: "YOUR_NAME_HERE"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: "Lisa"
    given-names: "Mona"
    orcid: "https://orcid.org/0000-0000-0000-0000"
title: "hadoop-spark-role"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2021-11-01
url: "https://github.com/Armin-Smailzade/hadoop-spark-role"
```