Taylor Geospatial Releases First Global Agricultural Field Boundary Dataset
The first global dataset showing the boundaries of agricultural fields was released in late April 2026, after an 18-month campaign by geospatial experts in industry and academia. The initiative, led by the nonprofit Taylor Geospatial and the Microsoft AI for Good Lab, produced an open, publicly available dataset with applications in food security, carbon accounting, precision agriculture and water-quality analysis.
SpaceNews first reported the announcement from San Francisco. Jennifer Marcus, Taylor Geospatial vice president of strategic innovation programs, told the outlet that the project revealed the challenges of applying machine learning and computer vision to satellite data "to get insights at global scale."
Taylor Geospatial was established in St. Louis in 2026 to catalyze the development and commercialization of geospatial AI. The organization combines the Taylor Geospatial Institute and the Taylor Geospatial Engine. Marcus said its focus is applying AI, machine learning and computer vision to satellite imagery, with the ambitious goal of creating global datasets and publishing the training data, models and output data.
Despite rapid progress in applying AI to language and to photographs of everyday objects, hurdles remain in applying it to satellite imagery. The Fields of the World project began by bringing together geospatial experts in industry and academia to create a training dataset. Part of the industry's limitation is that training data tends to be concentrated in the U.S. or in European countries, whose governments publish data that can serve as ground truth. The project's participants set out to build a more globally representative training dataset.
Next, participants selected the best model architecture and published their results. The global dataset was then released with a confidence layer, showing "the model works better in some places than others," Marcus said. (This is the kind of honest uncertainty that most AI vendors would rather bury in a footnote.)
The technical documentation from Fields of the World reveals the actual scale of the operation. The global 10 m field boundary dataset covers 2024 (1.62 billion polygons) and 2025 (1.55 billion polygons) and was produced by running the FTW PRUE model worldwide. The dataset includes both model inputs (Sentinel-2–derived median composites in COG and Zarr v3 formats) and outputs (in Zarr, GeoParquet and PMTiles).
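The "10 m" figure describes ground distance, but the FTW mosaics are stored on a latitude/longitude grid (EPSG:4326), so the pixel spacing has to be fixed in degrees. The sketch below is a back-of-the-envelope check, not FTW code: it assumes a spherical Earth with the WGS84 equatorial radius, and the helper names are our own. It converts 10 m to degrees at the equator and shows how the east-west width of a fixed-degree pixel shrinks toward the poles.

```python
import math

# WGS84 semi-major (equatorial) axis in metres; a sphere of this radius is assumed.
EQUATORIAL_RADIUS_M = 6_378_137.0
METRES_PER_DEGREE = 2 * math.pi * EQUATORIAL_RADIUS_M / 360  # ~111.3 km

# Grid spacing quoted in the FTW documentation, in degrees.
FTW_PIXEL_DEG = 8.983119e-5


def degrees_for_metres(metres: float) -> float:
    """Angular size (degrees) of a ground distance measured along the equator."""
    return metres / METRES_PER_DEGREE


def pixel_width_m(pixel_deg: float, lat_deg: float) -> float:
    """Approximate east-west ground width of a pixel at a given latitude.

    Meridians converge toward the poles, so the width scales with cos(latitude);
    on a sphere the north-south height stays roughly pixel_deg * METRES_PER_DEGREE.
    """
    return pixel_deg * METRES_PER_DEGREE * math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    print(f"10 m at the equator ~= {degrees_for_metres(10):.6e} degrees")
    for lat in (0, 30, 45, 60):
        print(f"lat {lat:>2} deg: pixel width ~= {pixel_width_m(FTW_PIXEL_DEG, lat):.2f} m")
```

On this approximation, a pixel at 60° latitude covers roughly 5 m east-west but still about 10 m north-south, so a fixed-degree grid is not square on the ground at high latitudes.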
Input features are built by selecting day-of-year (DOY) ranges as planting and harvest heuristics and computing the median of masked pixels across approximately 5–10 scenes. All feature COGs are reprojected and resampled to EPSG:4326 at 8.983119e-5° (~10 m at the equator) using GDAL cubic resampling, producing a single Zarr mosaic with dimensions (time, band, y, x). The PRUE model is then run over these features to produce a Zarr dataset with bands [non_field_background, field, field_boundaries].
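The compositing step can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the FTW pipeline: each scene is reduced to a day of year plus a flat list of reflectance values, `None` marks a pixel removed by the mask, and the window bounds in the example are hypothetical rather than FTW's actual planting and harvest heuristics. Windows whose start is later than their end are treated as wrapping past 31 December, which matters for growing seasons that straddle the new year.

```python
from statistics import median
from typing import Optional, Sequence

# None marks a masked pixel (e.g. cloud, shadow or nodata).
Pixel = Optional[float]


def in_doy_window(doy: int, start: int, end: int) -> bool:
    """True if a day of year lies in [start, end]; start > end wraps past 31 December."""
    if start <= end:
        return start <= doy <= end
    return doy >= start or doy <= end


def median_composite(
    scenes: Sequence[tuple[int, Sequence[Pixel]]], start: int, end: int
) -> list[Pixel]:
    """Per-pixel median of unmasked values from scenes acquired inside the DOY window.

    Pixels with no valid observation in any selected scene stay None.
    """
    selected = [pixels for doy, pixels in scenes if in_doy_window(doy, start, end)]
    if not selected:
        raise ValueError(f"no scenes between DOY {start} and {end}")
    composite: list[Pixel] = []
    for i in range(len(selected[0])):
        valid = [pixels[i] for pixels in selected if pixels[i] is not None]
        composite.append(median(valid) if valid else None)
    return composite


# Hypothetical four-pixel scenes: (day of year, values).
scenes = [
    (95, [0.10, None, 0.30, None]),
    (110, [0.12, 0.20, None, None]),
    (125, [0.50, 0.22, 0.28, None]),  # 0.50: haze the mask missed
    (300, [0.90, 0.90, 0.90, 0.90]),  # outside the window, ignored
]
print(median_composite(scenes, start=90, end=130))
```

Taking the median rather than the mean means a single bright outlier that slipped past the mask (the 0.50 value in the first pixel) does not drag the composite upward, while a pixel masked in every scene inside the window simply stays empty.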
A GeoParquet vector dataset is derived from the prediction Zarr by thresholding the softmax outputs for [non_field_background, field, field_boundaries] at 0.5 and polygonizing the result. Files follow the GeoParquet v1.1.0 specification: roughly 8.2 billion rows across 1,001 files, totaling about 629 GB on S3. All Sentinel-2 scenes were sourced from s3://sentinel-cogs/sentinel-s2-l2a-cogs.
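The thresholding step has a useful property: because the three softmax probabilities sum to 1, at most one class can exceed 0.5 at any pixel, so the resulting class masks never overlap, and pixels where no class is confident stay unassigned. The sketch below illustrates that step and substitutes 4-connected component labelling for true polygonization; tools such as GDAL's `Polygonize` or `rasterio.features.shapes` go further and trace each region's outline into vector geometry. The logit values are made up, and which polygonization tool and pixel connectivity FTW actually uses is not stated here.

```python
import math
from collections import deque

CLASSES = ("non_field_background", "field", "field_boundaries")


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_mask(logit_grid, cls: str, threshold: float = 0.5) -> list[list[bool]]:
    """True where the softmax probability of `cls` exceeds the threshold."""
    k = CLASSES.index(cls)
    return [[softmax(px)[k] > threshold for px in row] for row in logit_grid]


def label_regions(mask: list[list[bool]]) -> tuple[list[list[int]], int]:
    """4-connected component labelling (BFS); label 0 means 'not in any region'."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical per-pixel logits in CLASSES order.
F = [0.0, 3.0, 0.0]  # confident field
B = [0.0, 0.0, 3.0]  # confident field boundary
N = [3.0, 0.0, 0.0]  # confident background
grid = [
    [F, F, B, F],
    [F, F, B, F],
    [N, N, N, N],
]
labels, n_fields = label_regions(class_mask(grid, "field"))
print(n_fields, labels)
```

In the example, the column of boundary pixels keeps the fields on either side from merging into a single region, which is the role the boundary class plays in this sketch.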
Users can pan a world map to browse 3.17 billion field boundaries, download polygons for any region, or run inference directly in the browser by picking a model and a Sentinel-2 tile, with no code, setup or data downloads needed. The full dataset is publicly available on Source Cooperative, and the predictions and Sentinel-2 mosaics are also available through the Earthmover Marketplace.
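Pulling polygons for a region usually starts with a cheap bounding-box test before any exact geometry check. GeoParquet 1.1 added an optional per-row bounding-box "covering" column for this purpose; whether the FTW files populate one is an assumption here, so the sketch works on plain dicts with hypothetical `id`, `xmin`, `ymin`, `xmax` and `ymax` keys rather than on the real schema. It also ignores regions that cross the antimeridian.

```python
from typing import Iterable, Iterator

# (min_lon, min_lat, max_lon, max_lat) in EPSG:4326 degrees.
BBox = tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """Axis-aligned overlap test; boxes that only touch at an edge count as overlapping."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def rows_in_region(rows: Iterable[dict], region: BBox) -> Iterator[dict]:
    """Yield rows whose (hypothetical) bbox columns overlap the query region."""
    for row in rows:
        box = (row["xmin"], row["ymin"], row["xmax"], row["ymax"])
        if intersects(box, region):
            yield row


# Hypothetical rows; real records would also carry geometries and other attributes.
rows = [
    {"id": "a", "xmin": 6.1, "ymin": 49.1, "xmax": 6.2, "ymax": 49.2},  # inside
    {"id": "b", "xmin": 8.0, "ymin": 49.0, "xmax": 8.1, "ymax": 49.1},  # east of region
    {"id": "c", "xmin": 6.9, "ymin": 49.9, "xmax": 7.1, "ymax": 50.1},  # straddles a corner
]
region = (6.0, 49.0, 7.0, 50.0)
print([row["id"] for row in rows_in_region(rows, region)])
```

A bbox match is only a candidate: row c's box overlaps the region, but its actual polygon might not, so an exact intersection test on the geometry would follow.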
Other global datasets could be created in a similar fashion. "We proved the people part of this," Marcus said. "No one organization can do this. It requires the modeling and machine-learning community" plus infrastructure and access to substantial GPU compute. A future Taylor Geospatial project called Features of the World will look at infrastructure that can be mapped at global scale.
As for Fields of the World itself, the United Nations Food and Agriculture Organization and NASA Harvest, a group that encourages adoption of Earth-observation imagery for food supply and agricultural production, are applying the dataset. "We're moving into the phase of stakeholder engagement, where we see how people might use it and make fixes," Marcus said. "Those local fixes will have a feedback loop into the model and the training data to continuously improve output from the model."
The dataset carries a CC-BY-4.0 license. Researchers using the data should cite the AAAI Conference on Artificial Intelligence paper from 2025 and the arXiv preprints from 2026. Contact for the project is [email protected].
Whether this dataset actually improves food security outcomes or just becomes another research paper citation remains to be seen. The infrastructure exists. The data is public. The real test is whether governments and organizations will integrate it into decision-making systems that affect real farmers in real fields.
Artūras Malašauskas is an AI Systems Integrator with 20+ years of production-grade web engineering experience. He has designed, shipped, and scaled enterprise Python/PHP systems for logistics, SaaS, and public-sector clients. For the past year, he has focused exclusively on AI integrations: deploying open-source LLMs, building generative media pipelines (image, audio, video), and engineering multi-agent workflows for real production environments. His standard: reproducibility, security, cost-efficient inference—no vaporware. He documents and evaluates emerging AI tooling, separating verified capabilities from marketing noise. Technical editor at: muza-ai.eu, ai-verslas.lt, ai-naujinos.lt