crop-yield-estimate
Harness the power of machine learning to forecast rice and wheat crop yields per acre in India, aiming to empower smallholder farmers, combat poverty and malnutrition, utilizing data from Digital Green surveys to revolutionize agriculture and promote sustainable practices in the face of climate change for enhanced global food security.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: association-rosia
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://zindi.africa/competitions/digital-green-crop-yield-estimate-challenge
- Size: 22 MB
Statistics
- Stars: 13
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🌾 Crop Yield Estimate

This challenge focuses on leveraging machine learning to predict rice and wheat crop yields per acre in India, with the aim of empowering smallholder farmers and addressing issues of poverty and malnutrition. The data collected by Digital Green through surveys provides insights into farming practices, environmental conditions, and crop yields. The ultimate goal is to revolutionize Indian agriculture, offer a global model for smallholder farmers, and contribute to sustainable farming practices amid climate change, thereby advancing global food security.
This project was made possible by our compute partners 2CRSi and NVIDIA.
🏆 Challenge ranking
Submissions were scored by root mean squared error (RMSE); lower is better.
Our solution ranked first out of 678 teams, with an RMSE of 100.3610312 🎉.
The podium:
🥇 RosIA - 100.3610312
🥈 ihar - 100.6819477
🥉 belkasanek - 102.4325999
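For readers unfamiliar with the metric, here is a minimal sketch of how RMSE is computed. The function name and the sample values are illustrative assumptions, not taken from the challenge data or from this repository's code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared differences, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical yields, for illustration only.
actual = [500.0, 620.0, 480.0]
predicted = [510.0, 600.0, 470.0]
print(rmse(actual, predicted))  # about 14.14
```

Because errors are squared before averaging, a few large misses weigh more heavily than many small ones, and the result is expressed in the same units as the target.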
🛠️ Data processing
Pre-processing

GReaT
We used the GReaT implementation (an LLM-based generator built on GPT-2) to generate new observations and impute missing values.
Here is how this proposed method works to create and impute data:
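As a rough illustration of the idea behind GReaT (Borisov et al., 2022), each table row is serialised into a short sentence of "feature is value" clauses in random order. A language model is fine-tuned on those sentences, and the generated text is parsed back into rows. The sketch below covers only the encoding and decoding steps, in plain Python. The column names and helper functions are hypothetical and are not this repository's code or the `be-great` API.

```python
import random


def encode_row(row, rng):
    """Serialise one row as 'feature is value' clauses in a random order.

    Missing values (None) are skipped, so the text only states known facts.
    """
    items = [(name, value) for name, value in row.items() if value is not None]
    rng.shuffle(items)  # random feature order, so no fixed column order is learned
    return ", ".join(f"{name} is {value}" for name, value in items)


def decode_row(text, columns):
    """Parse generated text back into a row; columns not mentioned stay None.

    Simplification: assumes values never contain ', ' or ' is '.
    """
    row = {column: None for column in columns}
    for clause in text.split(", "):
        name, separator, value = clause.partition(" is ")
        if separator and name in row:
            row[name] = value  # values come back as strings
    return row


# Hypothetical columns and values, for illustration only.
columns = ["crop", "acre", "yield"]
incomplete = {"crop": "rice", "acre": 0.5, "yield": None}

prompt = encode_row(incomplete, random.Random(0))
print(prompt)                       # known features only, order depends on the seed
print(decode_row(prompt, columns))  # 'yield' is still None until a model fills it in
```

For imputation, the known clauses act as a prompt and the model is asked to continue the sentence with the missing features. For generating new observations, the model writes whole sentences from scratch.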

🏛️ Model architecture

#️⃣ Command lines
Launch a training
```bash
python src/models/train_model.py --estimator_name <estimator_name> --task <task> --nb_agents <nb_agents>
```
View the project's runs on WandB.
Create a submission
```bash
python src/models/predict_model.py --ensemble_strategy <ensemble_strategy> --class_id <class_id_1> <class_id_2> <class_id_3> --low_id <low_id_1> <low_id_2> <low_id_3> --medium_id <medium_id_1> <medium_id_2> <medium_id_3> --high_id <high_id_1> <high_id_2> <high_id_3>
```
🔬 References
Hwang, Y., & Song, J. (2023). Recent deep learning methods for tabular data. Communications for Statistical Applications and Methods, 30(2), 215-226.
Shwartz-Ziv, R., & Armon, A. (2022). Tabular data: Deep learning is not all you need. Information Fusion, 81, 84-90.
Kavita, M., & Mathur, P. (2020, October). Crop yield estimation in India using machine learning. In 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA) (pp. 220-224). IEEE.
Borisov, V., Seßler, K., Leemann, T., Pawelczyk, M., & Kasneci, G. (2022). Language models are realistic tabular data generators. arXiv preprint arXiv:2210.06280.
📝 Citing
@misc{UrgellReberga:2023,
  Author = {Baptiste Urgell and Louis Reberga},
  Title = {Crop Yield Estimate},
  Year = {2023},
  Publisher = {GitHub},
  Journal = {GitHub repository},
  Howpublished = {\url{https://github.com/association-rosia/crop-yield-estimate}}
}
🛡️ License
This project is distributed under the MIT License.
👨🏻‍💻 Contributors
Owner
- Name: RosIA
- Login: association-rosia
- Kind: organization
- Location: France
- Twitter: AssoRosIA
- Repositories: 1
- Profile: https://github.com/association-rosia
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "URGELL"
    given-names: "Baptiste"
  - family-names: "REBERGA"
    given-names: "Louis"
title: "Crop Yield Estimate"
publisher: "Github"
year: "2023"
version: 1.0
date-released: 2023-4-9
url: "https://github.com/association-rosia/crop-yield-estimate"
data: "Digital Green Crop Yield Estimate Challenge"
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Dependencies
- GPUtil *
- be-great *
- catboost *
- imbalanced-learn *
- jupyter *
- lightgbm *
- matplotlib *
- numpy *
- pandas *
- plotly *
- scikit-learn ==1.2
- umap-learn *
- wandb *
- xgboost *