hadoop-spark-role
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (6.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Armin-Smailzade
- License: gpl-3.0
- Language: Shell
- Default Branch: main
- Size: 128 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Hadoop-spark-ansible
- Installs a Hadoop cluster with Ansible.
- JDK: OpenJDK 1.8.
- Hadoop version 2.7.2.
- Spark version 2.4.7.
- Hive version 2.3.7.
- Postgres database.
Before Install
Update the hosts/host inventory file.
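The layout of hosts/host is not shown in this README; an INI-style Ansible inventory for this kind of cluster typically looks something like the sketch below. The group and host names here are illustrative assumptions, not taken from the repository:

```
[master]
hadoop-master ansible_host=172.16.251.70

[workers]
hadoop-worker1 ansible_host=172.16.251.71
hadoop-worker2 ansible_host=172.16.251.72
```

Whatever group names the playbooks expect, the hosts listed here are the machines the ansible-playbook commands below will target.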
Install Hadoop
- Download Hadoop to any path.
- Update {{ downloadpath }} in vars/var_hadoop.yml, var_spark.yml & var_hive.yml.
- Ansible templates generate the Hadoop configuration, so if you want to add more properties, just update vars/var_hadoop.yml.
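If you are unsure where to fetch the pinned versions, the download URLs can be derived from the version numbers above. This is a small helper sketch; the URL patterns follow archive.apache.org's standard directory layout (not something this role prescribes):

```shell
#!/bin/sh
# Build Apache archive download URLs for the versions pinned in this role.
HADOOP_VERSION="2.7.2"
SPARK_VERSION="2.4.7"
HIVE_VERSION="2.3.7"

HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz"
HIVE_URL="https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/apache-hive-${HIVE_VERSION}-bin.tar.gz"

# Print the URLs so they can be passed to wget/curl.
echo "$HADOOP_URL"
echo "$SPARK_URL"
echo "$HIVE_URL"
```

Download the tarballs into the directory that {{ downloadpath }} points at.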
Install Hadoop Master
Run:
```
ansible-playbook -i hosts/host master.yml
```
Install Hadoop Workers
Run, passing two extra variables that must match your real Hadoop master:
- masterip: your Hadoop master IP
- masterhostname: your Hadoop master hostname
```
ansible-playbook -i hosts/host workers.yml -e "masterip=172.16.251.70 masterhostname=hadoop-master"
```
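Because workers.yml relies on masterip being correct, a tiny pre-flight check can catch typos before Ansible starts touching hosts. This helper is a hypothetical addition, not part of the role:

```shell
#!/bin/sh
# Hypothetical pre-flight check: verify that masterip looks like an
# IPv4 address before invoking the workers playbook.
masterip="172.16.251.70"

if echo "$masterip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
  echo "masterip ok: $masterip"
else
  echo "masterip is not a valid IPv4 address: $masterip" >&2
  exit 1
fi
```

The same pattern extends to checking that masterhostname resolves (e.g. with getent hosts) before running the playbook.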
Install Spark Master
Run:
```
ansible-playbook -i hosts/host spark.yml
```
Install Spark Workers
Run:
```
ansible-playbook -i hosts/host spark_workers.yml
```
Install Postgres DB
Run:
```
ansible-playbook -i hosts/host postgres.yml
```
Install Hive
- Create the Postgres database first and grant it the right privileges.
- Check vars/var_hive.yml:
```
---
# hive basic vars
downloadpath: "/home/pippo/Downloads"   # your download path
hiveversion: "2.3.2"                    # your hive version
hivepath: "/home/hadoop"
hiveconfigpath: "/home/hadoop/apache-hive-{{ hiveversion }}-bin/conf"
hive_tmp: "/home/hadoop/hive/tmp"       # your hive tmp path

hivecreatepath:
  - "{{ hive_tmp }}"

hivewarehouse: "/user/hive/warehouse"   # your hdfs path
hivescratchdir: "/user/hive/tmp"
hivequeryloglocation: "/user/hive/log"

hivehdfspath:
  - "{{ hivewarehouse }}"
  - "{{ hivescratchdir }}"
  - "{{ hivequeryloglocation }}"

hiveloggingoperationloglocation: "{{ hive_tmp }}/{{ user }}/operationlogs"

# database
dbtype: "postgres"                      # your db type, default is postgres
hiveconnectiondrivername: "org.postgresql.Driver"
hiveconnectionhost: "172.16.251.33"
hiveconnectionport: "5432"
hiveconnectiondbname: "hive"
hiveconnectionusername: "hiveuser"
hiveconnectionpassword: "nfsetso12fdds9s"
hiveconnectionurl: "jdbc:postgresql://{{ hiveconnectionhost }}:{{ hiveconnectionport }}/{{ hiveconnectiondbname }}?ssl=false"

# hive configuration: your hive-site properties
hivesiteproperties:
  - { "name": "hive.metastore.warehouse.dir", "value": "hdfs://{{ masterhostname }}:{{ hdfsport }}{{ hivewarehouse }}" }
  - { "name": "hive.exec.scratchdir", "value": "{{ hivescratchdir }}" }
  - { "name": "hive.querylog.location", "value": "{{ hivequeryloglocation }}/hadoop" }
  - { "name": "javax.jdo.option.ConnectionURL", "value": "{{ hiveconnectionurl }}" }
  - { "name": "javax.jdo.option.ConnectionDriverName", "value": "{{ hiveconnectiondrivername }}" }
  - { "name": "javax.jdo.option.ConnectionUserName", "value": "{{ hiveconnectionusername }}" }
  - { "name": "javax.jdo.option.ConnectionPassword", "value": "{{ hiveconnectionpassword }}" }
  - { "name": "hive.server2.logging.operation.log.location", "value": "{{ hiveloggingoperationloglocation }}" }

hiveserverport: 10000                   # hive server port
hivehwiport: 9999
hivemetastore_port: 9083

firewallports:
  - "{{ hiveserverport }}"
  - "{{ hivehwiport }}"
  - "{{ hivemetastore_port }}"
```
- Check hive.yml.
- Run:
```
ansible-playbook -i hosts/host hive.yml
```
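The first step above ("create the Postgres database and grant privileges") is not spelled out in the README. With the default names in vars/var_hive.yml it would amount to statements like the following, run as the postgres superuser; adjust the database name, user, and password to your environment:

```
CREATE USER hiveuser WITH PASSWORD 'nfsetso12fdds9s';
CREATE DATABASE hive OWNER hiveuser;
GRANT ALL PRIVILEGES ON DATABASE hive TO hiveuser;
```

These can be fed to psql on the database host, after which the connection settings in var_hive.yml should match what you created.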
Owner
- Name: Armin EsmaeilZadeh
- Login: Armin-Smailzade
- Kind: user
- Location: Las Vegas, Nevada
- Website: https://www.linkedin.com/in/arminesmaeilzadeh
- Repositories: 2
- Profile: https://github.com/Armin-Smailzade
- Bio: Software Engineer
Citation (CITATION.cff)
```
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "YOUR_NAME_HERE"
    given-names: "YOUR_NAME_HERE"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: "Lisa"
    given-names: "Mona"
    orcid: "https://orcid.org/0000-0000-0000-0000"
title: "hadoop-spark-role"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2021-11-01
url: "https://github.com/Armin-Smailzade/hadoop-spark-role"
```