Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection

Sam, S; Dabreo, SM

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/31755

Title:	Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection
Authors:	Sam, S; Dabreo, SM
Keywords:	crop recommendation model;random forest;support vector machines;Indian agriculture;exploratory data analysis
Issue Date:	27-May-2025
Publisher:	Cornell University
Citation:	Sam, S. and Dabreo, S.M. (2025) 'Crop recommendation with machine learning: leveraging environmental and economic factors for optimal crop selection', arXiv preprint arXiv:2505.21201, pp. 1 - x. doi: 10.48550/arXiv.2505.21201.
Abstract:	Agriculture constitutes a primary source of food production, economic growth and employment in India, but the sector is confronted with low farm productivity and yields, aggravated by increased pressure on natural resources and adverse climate change variability. Efforts involving the green revolution, land irrigation, improved seeds and organic farming have yielded suboptimal outcomes. The adoption of computational tools like crop recommendation systems offers a new way to provide insights and help farmers tackle low productivity. However, most agricultural recommendation systems in India focus narrowly on environmental factors and regions, limiting accurate predictions of high-yield, profitable crops. This study uses environmental and economic factors with 19 crops across 15 states to develop and evaluate Random Forest and SVM models using 10-fold Cross Validation, Time-series Split, and Lag Variables. The 10-fold cross validation showed high accuracy (RF: 99.96%, SVM: 94.71%) but raised overfitting concerns. Introducing temporal order, which better reflects real-world conditions, reduced performance (RF: 78.55%, SVM: 71.18%) in the Time-series Split. To further increase the model accuracy while maintaining the temporal order, the Lag Variables approach was employed, which resulted in improved performance (RF: 83.62%, SVM: 74.38%) compared to the Time-series Split approach. Overall, the models in the Time-series Split and Lag Variables approaches offer practical insights by handling temporal dependencies and enhancing their adaptability to changing agricultural conditions over time. Consequently, the study identifies the Random Forest model developed with Lag Variables as the preferred algorithm for optimal crop recommendation in the Indian context.
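Illustrative note: the abstract contrasts shuffled cross validation with a time-ordered split and with lag variables. The sketch below shows one common way these two ideas are built, using only the Python standard library. It is not the authors' code; the function names `time_series_splits` and `add_lags`, the number of splits, the lag choices and the example values are all assumptions made for illustration. With data covering several crops and states, lags would normally be computed separately within each crop and state series.

```python
from typing import Iterator, List, Optional, Sequence, Tuple


def time_series_splits(
    n_samples: int, n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists with an expanding training window.

    Rows are assumed to be sorted by time. Every test fold lies strictly
    after its training fold, so no future observation leaks into training.
    """
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))


def add_lags(
    series: Sequence[float], lags: Sequence[int] = (1, 2)
) -> List[List[Optional[float]]]:
    """Return one row per time step holding the values seen `lag` steps earlier.

    Positions without enough history are filled with None.
    """
    return [
        [series[t - lag] if t - lag >= 0 else None for lag in lags]
        for t in range(len(series))
    ]


if __name__ == "__main__":
    # Hypothetical yearly prices for a single crop in a single state.
    prices = [10.0, 12.0, 11.0, 15.0]
    print(add_lags(prices))
    for train_idx, test_idx in time_series_splits(n_samples=10, n_splits=4):
        print(train_idx, test_idx)
```

A model evaluated this way only ever sees past rows when predicting later ones, which is the property the abstract credits for the more realistic, lower accuracy figures.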
Description:	Data availability – The data that support the findings of this study are openly available from three sources: Kaggle (https://www.kaggle.com/datasets/vihith12/crop-yield recommendationdataset) for the environmental parameters, and the India Directorate of Economics and Statistics (https://eands.dacnet.nic.in/Cost_of_Cultivation.htm) and the Farmer’s Portal (https://farmer.gov.in/mspstatements.aspx) for the economic parameters of cost and price.
URI:	https://bura.brunel.ac.uk/handle/2438/31755
DOI:	https://doi.org/10.48550/arXiv.2505.21201
Other Identifiers:	ORCiD: Steven Sam https://orcid.org/0000-0002-4353-6118
Appears in Collections:	Dept of Computer Science Research Papers

Files in This Item:

File	Description	Size	Format
Preprint.pdf	Copyright © 2025 The Author(s). This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).	1.62 MB	Adobe PDF

This item is licensed under a Creative Commons License