Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/31755
Title: | Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection |
Authors: | Sam, S; Dabreo, SM |
Keywords: | crop recommendation model;random forest;support vector machines;Indian agriculture;exploratory data analysis |
Issue Date: | 27-May-2025 |
Publisher: | Cornell University |
Citation: | Sam, S. and Dabreo, S.M. (2025) 'Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection', arXiv preprint arXiv:2505.21201, pp. 1 - x. doi: 10.48550/arXiv.2505.21201. |
Abstract: | Agriculture constitutes a primary source of food production, economic growth and employment in India, but the sector is confronted with low farm productivity and yields aggravated by increased pressure on natural resources and adverse climate change variability. Efforts involving green revolution, land irrigations, improved seeds and organic farming have yielded suboptimal outcomes. The adoption of computational tools like crop recommendation systems offers a new way to provide insights and help farmers tackle low productivity. However, most agricultural recommendation systems in India focus narrowly on environmental factors and regions, limiting accurate predictions of high-yield, profitable crops. This study uses environmental and economic factors with 19 crops across 15 states to develop and evaluate Random Forest and SVM models using 10-fold Cross Validation, Time-series Split, and Lag Variables. The 10-fold cross validation showed high accuracy (RF: 99.96%, SVM: 94.71%) but raised overfitting concerns. Introducing temporal order, better reflecting real-world conditions, reduced performance (RF: 78.55%, SVM: 71.18%) in the Time-series Split. To further increase the model accuracy while maintaining the temporal order, the Lag Variables approach was employed, which resulted in improved performance (RF: 83.62%, SVM: 74.38%) compared to the 10-fold cross validation approach. Overall, the models in the Time-series Split and Lag Variable Approaches offer practical insights by handling temporal dependencies and enhancing their adaptability to changing agricultural conditions over time. Consequently, the study shows the Random Forest model developed based on the Lag Variables as the most preferred algorithm for optimal crop recommendation in the Indian context. |
Description: | Data availability – The data that support the findings of this study are openly available from three sources: Kaggle (https://www.kaggle.com/datasets/vihith12/crop-yield recommendationdataset) for environmental parameters, and the India Directorate of Economics and Statistics (https://eands.dacnet.nic.in/Cost_of_Cultivation.htm) and the Farmer’s Portal (https://farmer.gov.in/mspstatements.aspx) for economic parameters of cost and price. |
URI: | https://bura.brunel.ac.uk/handle/2438/31755 |
DOI: | https://doi.org/10.48550/arXiv.2505.21201 |
Other Identifiers: | ORCiD: Steven Sam https://orcid.org/0000-0002-4353-6118 |
Appears in Collections: | Dept of Computer Science Research Papers |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Preprint.pdf | Copyright © 2025 The Author(s). This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). | 1.62 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License