Predicting Ag Harvest using ArcGIS and Machine Learning

2025-02-28 13:40:33

By Aidan Thurling

A long time ago… a hunter-gatherer planted a seed in the ground and, thus, agriculture was born.

And ever since then, farmers have been analyzing patterns and drawing their own correlations in an effort to predict and control their crops’ performance. For example, the ancient Egyptians carefully planned their planting and harvest around the flooding patterns of the Nile River. Medieval farmers rotated their crops to maximize yield. And within the last few centuries, growers have used tools such as the Old Farmer’s Almanac to predict each year’s growing season conditions.

Today, advanced analytics and modeling techniques allow us to forecast crop performance accurately. These predictions help agricultural retailers and distributors plan their transportation and storage logistics, enable agricultural insurance companies to make informed loss estimates, and help agronomists make decisions about crop varieties and fertility treatments. We can train accurate machine learning models on environmental variables at a scale far beyond what a single farmer could access alone. We can discover and derive advanced metrics about our fields using technologies such as satellite imagery, soil analysis, elevation models, and more. Then we combine all the factors we suspect will influence yield with historical harvest data to build a machine learning model that can provide accurate, timely, and scalable predictions.

Let’s Train a Yield Forecasting Model…

As a member of Esri’s commercial agriculture team, I decided to run through the exercise of training a machine-learning model to forecast sugarcane yield using ArcGIS Pro. The first step in creating a yield forecasting model is to establish which variables you have access to. For this exercise, I used sugarcane field boundaries, historical sugarcane harvest data for those boundaries (7 years), irrigation method, cut stage, regional soil data, climatic data from TerraClimate, and Sentinel-2 multispectral satellite imagery.

Figure: Sugarcane NDVI (Normalized Difference Vegetation Index)

From the satellite imagery I derived the Normalized Difference Vegetation Index (NDVI), a measure of relative biomass that can serve as a proxy for vegetative health. Being able to quantify the health of the sugarcane field by field as we move through the growing season is paramount to estimating the crop’s ultimate yield.
I created a multidimensional mosaic dataset from 60 different “snapshots” of NDVI for my area of interest, as well as a multidimensional mosaic dataset for the annual climatic data.
I used Zonal Statistics to calculate the mean of each rasterized explanatory variable within each sugarcane field boundary for every time period, and then joined the resulting table to the field boundary layer.
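NDVI is computed per pixel from the red and near-infrared (NIR) bands as (NIR - Red) / (NIR + Red), and the Zonal Statistics step then reduces each raster to one mean value per field. The sketch below shows both calculations in plain Python under simplifying assumptions: small nested lists stand in for rasters, Sentinel-2 band 4 and band 8 are taken as red and NIR, and the `zones` grid of field IDs is hypothetical. It illustrates the arithmetic only; it is not how ArcGIS Pro implements these tools.

```python
# Minimal sketch, assuming plain nested lists stand in for rasters:
# red/nir hold reflectance for the same pixels (Sentinel-2 band 4 = red,
# band 8 = near infrared), and zones holds a hypothetical field ID per pixel
# (None for pixels outside every field boundary).


def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); None where the sum is zero."""
    result = []
    for red_row, nir_row in zip(red, nir):
        row = []
        for r, n in zip(red_row, nir_row):
            total = n + r
            row.append((n - r) / total if total != 0 else None)
        result.append(row)
    return result


def zonal_mean(values, zones):
    """Mean of the valid pixel values inside each zone (a MEAN zonal statistic)."""
    sums, counts = {}, {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value is None:
                continue  # skip pixels outside fields or with undefined NDVI
            sums[zone] = sums.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Illustrative 2 x 2 example with two fields, "A" and "B".
red = [[0.05, 0.06], [0.20, 0.10]]
nir = [[0.45, 0.42], [0.30, 0.10]]
zones = [["A", "A"], ["B", None]]
print(zonal_mean(ndvi(red, nir), zones))
```

Field "A" averages its two pixels (NDVI of about 0.8 and 0.75), while the pixel outside any field is ignored, which mirrors how per-field means ignore everything outside the boundary layer.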

Managing the Satellite Imagery

Because Sentinel-2 imagery is captured so frequently, I had to decide which timeframe of imagery to use when treating NDVI as an explanatory variable for yield. Sugarcane in the region of interest has a 12-month growing season, so I opted to use imagery from 6 months before each field’s harvest date (+/- 1 month, depending on imagery availability). This ensures that the model can be run in the middle of the growing season rather than in the brief period just before harvest (which would not be very helpful for predicting yield if harvest was about to happen anyway!). Ultimately, this produced a cumulative field boundary layer built from 60 “snapshots” across a 7-year period: 1,936 distinct field records, each containing all the explanatory variables associated with that record’s actual harvest date.
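The date-window rule above can be sketched as follows. This is a minimal illustration, assuming each field record has a known harvest date and the list of available imagery dates is known in advance. The function names `shift_months` and `pick_snapshot` are hypothetical, and choosing the snapshot closest to the target date is one reasonable reading of the "+/- 1 month" rule, not necessarily the exact selection used in this study.

```python
import calendar
from datetime import date


def shift_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the month's end."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def pick_snapshot(harvest_date, snapshot_dates, lead_months=6, tolerance_months=1):
    """Return the snapshot date closest to (harvest_date - lead_months), or None
    if no snapshot falls within +/- tolerance_months of that target date."""
    target = shift_months(harvest_date, -lead_months)
    earliest = shift_months(target, -tolerance_months)
    latest = shift_months(target, tolerance_months)
    candidates = [d for d in snapshot_dates if earliest <= d <= latest]
    if not candidates:
        return None  # no usable imagery for this field and season
    return min(candidates, key=lambda d: abs((d - target).days))


# Illustrative dates only: a field harvested mid-September 2021.
snapshots = [date(2021, 1, 20), date(2021, 3, 2), date(2021, 3, 25), date(2021, 5, 1)]
print(pick_snapshot(date(2021, 9, 15), snapshots))
```

With these sample dates the target is 15 March 2021 and the window runs from 15 February to 15 April. Two snapshots qualify, and 25 March is chosen because it is closest to the target.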

ArcGIS Pro provides many different machine learning tools and methodologies. Most are as simple to use as any geoprocessing tool: you supply your training data, specify the variable you want to predict (in our case, yield), and list the explanatory variables you suspect influence it, such as precipitation.
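To illustrate the shape of the inputs such a tool expects, the sketch below turns field records into a table of explanatory variables (X) plus the variable to predict (y). The field names (`yield`, `ndvi_mean`, `precip_mm`, `irrigation`) and the values are hypothetical. One-hot encoding the categorical irrigation method is a generic technique for libraries that need numeric inputs; the ArcGIS tool may handle categorical variables for you, so treat this as an assumption-laden illustration rather than a required step.

```python
def build_training_table(records, target, numeric_vars, categorical_vars):
    """Return (X, y, column_names) from a list of dict records.

    Numeric variables are copied as floats; each categorical variable is
    one-hot encoded into one 0/1 column per observed level.
    """
    levels = {var: sorted({rec[var] for rec in records}) for var in categorical_vars}
    columns = list(numeric_vars) + [
        f"{var}={level}" for var in categorical_vars for level in levels[var]
    ]
    X, y = [], []
    for rec in records:
        row = [float(rec[var]) for var in numeric_vars]
        for var in categorical_vars:
            row.extend(1.0 if rec[var] == level else 0.0 for level in levels[var])
        X.append(row)
        y.append(float(rec[target]))
    return X, y, columns


# Hypothetical field records; names and values are made up for illustration.
records = [
    {"yield": 95.0, "ndvi_mean": 0.71, "precip_mm": 820, "irrigation": "pivot"},
    {"yield": 78.0, "ndvi_mean": 0.58, "precip_mm": 640, "irrigation": "furrow"},
]
X, y, columns = build_training_table(
    records, "yield", ["ndvi_mean", "precip_mm"], ["irrigation"]
)
print(columns)
```

When scoring new fields, the same levels learned from the training records should be reused so that the columns line up.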

Random Forest Regression

For this exercise, I started with a random forest regression algorithm to train a model and determine which variables influence yield the most. You can learn more about the ArcGIS geoprocessing tool I used in the ArcGIS Pro documentation. I trained three versions of the model: the first used the random forest algorithm with 100 trees, the second used the random forest algorithm with 1,000 trees, and the third used the Extreme Gradient Boosting (XGBoost) algorithm with 1,000 trees. The third model produced the most accurate results.
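The difference between the random forest models and the boosted model can be made concrete with a toy, pure-Python sketch that uses one-split trees (stumps). The random-forest-style ensemble trains each stump on a bootstrap sample of rows and a random subset of columns, then averages them; the boosted ensemble adds stumps one at a time, each fitted to the residuals left so far. This is plain gradient boosting for squared error, not XGBoost itself (which adds regularization and second-order gradient information), and none of it is the ArcGIS implementation. All function names are hypothetical.

```python
import random


def fit_stump(X, y, features):
    """Fit a one-split regression tree that minimizes squared error."""
    best = None
    for j in features:
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - left_mean) ** 2 for t in left) + sum(
                (t - right_mean) ** 2 for t in right
            )
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # no valid split (e.g. one distinct value): predict the mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, j, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[j] <= threshold else right_mean


def fit_bagged_stumps(X, y, n_trees=100, seed=0):
    """Random-forest style: each stump sees a bootstrap sample of the rows and a
    random subset of the columns; the ensemble averages their predictions."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(1, p // 3)  # floor(p/3): the classic feature-subset size for regression
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]
        features = rng.sample(range(p), k)
        trees.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)


def fit_boosted_stumps(X, y, n_trees=100, learning_rate=0.1):
    """Gradient boosting for squared error: each new stump fits the residuals
    left by the ensemble so far, and its output is scaled by learning_rate."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(y, preds)]
        tree = fit_stump(X, residuals, range(len(X[0])))
        trees.append(tree)
        preds = [p + learning_rate * tree(row) for p, row in zip(preds, X)]
    return lambda row: base + learning_rate * sum(tree(row) for tree in trees)


# Toy data: one explanatory column; the target jumps from 0 to 10 halfway along.
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
forest = fit_bagged_stumps(X, y)
boosted = fit_boosted_stumps(X, y)
print(forest([0.0]), forest([9.0]), boosted([0.0]), boosted([9.0]))
```

Bagging mainly reduces variance by averaging many independently trained trees, while boosting mainly reduces bias by letting each new tree correct the errors of the ones before it. That is one intuition for why the two families can rank differently on the same data.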

Figure: Sugarcane pivots showing the model’s predicted versus actual yield means for 2022

Not only did the model train successfully, it also performed well when run independently on the 2022 field data. ArcGIS also produced many supplemental details about the regression analysis itself and how each variable fared during training. The variable importance chart below shows that, of all the variables in my training data, the most important for predicting sugarcane yield in this region of interest are the field’s mean NDVI six months before harvest, its irrigation method, solar radiation, and soil moisture.
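One way to set up an independent check like the 2022 run is to hold out an entire season: train only on earlier years, then compare predictions with actual yields for the held-out year. The exact evaluation used in this study is not described, so the year-based split, the `harvest_year` key, and the choice of metrics (MAE, RMSE, R^2) below are assumptions for illustration.

```python
import math


def split_by_year(records, test_year, year_key="harvest_year"):
    """Train on seasons before test_year and hold out test_year entirely."""
    train = [r for r in records if r[year_key] < test_year]
    test = [r for r in records if r[year_key] == test_year]
    return train, test


def regression_metrics(actual, predicted):
    """MAE, RMSE and R^2 for paired lists of actual and predicted yields."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Illustrative values only.
records = [{"harvest_year": 2019}, {"harvest_year": 2021}, {"harvest_year": 2022}]
train, test = split_by_year(records, 2022)
print(len(train), len(test))
print(regression_metrics([80.0, 90.0, 100.0], [82.0, 88.0, 100.0]))
```

Holding out a whole year, rather than random rows, keeps weather from the test season out of training, which is closer to how a forecast would actually be used.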

Figure: Summary of Variable Importance chart. NDVI mean for each field was the most important variable in predicting yield.
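Variable importance can be estimated in several ways, and the measure ArcGIS reports may be computed differently from the one shown here. Permutation importance, sketched below, is a model-agnostic alternative: shuffle one explanatory column, re-score the model, and record how much the error grows. The function names and the toy model are hypothetical.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Average increase in `metric` (an error measure) when one column of X is
    shuffled, breaking its link to y. Rows of X must be lists."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(metric(y, [predict(row) for row in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def mse(actual, predicted):
    """Mean squared error."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def model(row):
    """Toy model that uses only column 0, so column 1 should score zero."""
    return 2.0 * row[0]


X = [[float(i), float(10 - i)] for i in range(10)]
y = [2.0 * i for i in range(10)]
print(permutation_importance(model, X, y, mse))
```

A column the model ignores scores exactly zero, while shuffling a column the model relies on inflates the error. Correlated variables (such as NDVI and soil moisture) can share or mask each other's importance, so rankings are best read with that in mind.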

From a temporal perspective, I also compared the predicted annual yield means to the actual annual yield means for all my fields across nearly seven years. This comparison shows how closely the predictions track the actual yield results.
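The annual comparison amounts to grouping field records by harvest year and averaging predicted and actual yields within each year. A minimal sketch, assuming hypothetical `harvest_year`, `predicted`, and `actual` keys on each field record:

```python
from collections import defaultdict


def annual_means(records, year_key="harvest_year"):
    """Return {year: (mean predicted, mean actual)} averaged over field records."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # predicted sum, actual sum, count
    for rec in records:
        t = totals[rec[year_key]]
        t[0] += rec["predicted"]
        t[1] += rec["actual"]
        t[2] += 1
    return {year: (p / n, a / n) for year, (p, a, n) in sorted(totals.items())}


# Hypothetical per-field records; values are made up.
records = [
    {"harvest_year": 2021, "predicted": 90.0, "actual": 100.0},
    {"harvest_year": 2021, "predicted": 110.0, "actual": 100.0},
    {"harvest_year": 2022, "predicted": 95.0, "actual": 105.0},
]
print(annual_means(records))
```

This is an unweighted mean of field-level yields; weighting each field by its area would be an alternative if large and small fields should not count equally. Note that per-field errors can cancel out in an annual mean, as they do for 2021 in the toy data.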

Figure: Line graph of predicted versus actual annual yield means over harvest date

It is important to bear in mind that your own yield forecasting model will only be as accurate as the quality and quantity of data you’re able to procure. A model cannot accurately predict circumstances that it was never trained on. For example, a yield forecasting model that was trained on Iowa corn yield data will likely not be able to accurately forecast corn yield in North Carolina. Ultimately, the best yield forecasting models will be trained on thousands of high-resolution inputs spanning multiple years and many different variables. That includes decisions made by the grower and the grower’s team such as seed variety, fertility application, or planting rate as well as environmental variables that are not as easily controlled, such as rainfall and temperature.
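A simple, partial guard against asking a model about conditions it never saw is to compare each new input with the range of values present in the training data. The sketch below flags variables that fall outside those ranges; the column names are hypothetical. It only catches single-variable extrapolation: a combination of values can still be unfamiliar even when every variable is individually within range.

```python
def training_ranges(X):
    """(min, max) of each explanatory column in the training rows."""
    return [(min(col), max(col)) for col in zip(*X)]


def out_of_range(row, ranges, names):
    """Names of variables whose value falls outside the range seen in training."""
    return [
        name
        for value, (low, high), name in zip(row, ranges, names)
        if value < low or value > high
    ]


names = ["ndvi_mean", "precip_mm"]  # hypothetical column names
ranges = training_ranges([[0.55, 600.0], [0.80, 900.0], [0.62, 750.0]])
print(out_of_range([0.90, 700.0], ranges, names))
```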

I hope this case study excites you as much as it excites me. The ability to forecast yield can provide invaluable insight from year to year. It enables growers to predict profit margins and strategically adjust their field management over the course of the growing season. It provides agronomists, retailers, and distributors with the data needed to make informed, cost-saving logistical decisions. It can even help entire nations identify and plan actions against food insecurity.

We can think back to our farmer friends of both today and yesteryear and seek not to replace their centuries of collective traditions and methodologies, but rather to complement and augment tried-and-true farming practices with cutting-edge technology and advanced analytics made simple by ArcGIS.


Aidan Thurling

As a Solution Engineer on Esri's Natural Resources team, Aidan works on geospatial workflows within the agriculture, forestry, and energy industries. She holds a BS in Biology (Ecology) from Cal Poly SLO and a master's in GIS Tech from NC State. She has experience in geospatial data science & analytics, web GIS, remote sensing, process automation, and parallel parking large vehicles.


Summary

Advanced analytics and machine learning are transforming agricultural forecasting by enabling precise predictions of crop performance. Using historical harvest data, environmental variables, and satellite-derived indices such as NDVI from Sentinel-2 imagery, agronomists can train models to predict yields accurately. This aids logistics planning, insurance loss estimation, and informed decision-making about crop management. A case study using ArcGIS Pro demonstrates the effectiveness of random forest regression and XGBoost algorithms for sugarcane yield prediction, highlighting the importance of variables such as NDVI, irrigation method, solar radiation, and soil moisture. Accurate forecasting helps farmers optimize profits and manage resources efficiently, complementing traditional farming practices with modern technology.