The truth about Ambee’s air quality data accuracy

January 5, 2025
2 min read

Air Quality (AQ) is a critical environmental factor impacting millions daily. But how can we ensure that the air quality data we rely on is accurate and actionable? At Ambee, we combine cutting-edge sensor technology, advanced machine learning models, and satellite data to deliver real-time, reliable air quality insights to individuals, businesses, and governments.

Here’s how we achieve that.

The basics: How air quality is measured

Air quality is commonly measured using sensors that detect pollutants such as particulate matter (PM2.5 and PM10) and gases such as nitrogen dioxide (NO₂). These sensors come in two main types:

1. Low-cost sensors: These are affordable, widely deployed sensors that provide basic readings. They’re great for creating dense sensor networks but can suffer from lower precision and accuracy.

2. Reference-grade sensors: These are high-precision sensors that meet stringent regulatory standards, like those used by the U.S. Environmental Protection Agency (US EPA). While highly accurate, these sensors are expensive and thus limited in number, leading to sparse geospatial coverage.

Relying on sensors alone, however, creates a significant challenge.

Geospatial coverage challenge

One of the main challenges in monitoring air quality is the lack of widespread sensor coverage, especially for reference-grade sensors. Given the cost and logistical constraints, these sensors are often concentrated in specific regions, leaving large areas unmonitored. This presents a significant hurdle in providing accurate air quality data for areas without sensors.

At Ambee, we combine geospatial interpolation algorithms, machine learning models, and satellite data to fill these gaps.

Filling the gaps: How we use interpolation, machine learning, and satellites

By using a combination of satellite imagery and machine learning models, we can estimate air quality in areas where no physical sensors exist. The models utilize spatial interpolation algorithms to predict air quality between the sparse sensor locations. Satellite data adds another layer of insight, helping capture broad regional patterns.
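
The article does not say which interpolation algorithm Ambee uses. As one illustration of spatial interpolation, here is a minimal sketch of inverse distance weighting (IDW), a common textbook technique, assuming sensor positions are given as planar (x, y) coordinates in kilometres. The function name `idw_estimate` and its `power` parameter are hypothetical, chosen only for this example.

```python
import math
from typing import Sequence, Tuple

# Hypothetical sensor record: (x_km, y_km, pm25_ugm3); planar coordinates assumed.
Sensor = Tuple[float, float, float]


def idw_estimate(sensors: Sequence[Sensor], x: float, y: float, power: float = 2.0) -> float:
    """Estimate PM2.5 at (x, y) by inverse distance weighting of sensor readings."""
    if not sensors:
        raise ValueError("at least one sensor reading is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in sensors:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The target sits exactly on a sensor: use its reading directly.
            return value
        weight = 1.0 / distance ** power  # nearer sensors get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    readings = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 40.0)]
    print(round(idw_estimate(readings, 2.0, 2.0), 2))
```

IDW uses distance alone; a production model would typically also draw on features such as satellite observations, which this sketch deliberately leaves out.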

Together, these techniques address the geospatial coverage challenge: interpolation and satellite data compensate for the lack of reference-grade sensor readings.

The question now is: How do we know this data is accurate?

Since there is no benchmark for comparison in this case, we need a different approach to ensure our models remain reliable.

In areas with no AQ sensors, there’s no direct measurement to compare our predictions against, making traditional accuracy calculation methods unsuitable. This is where the Leave-One-Out Cross-Validation method (LOO-CV) comes into play.

Ensuring accuracy with Leave-One-Out Cross-Validation (LOO-CV)

The leave-one-out method helps overcome the challenge of assessing accuracy where sensors are sparse. Here’s how it works:

  1. Pick a sensor location: We start by "leaving out" one sensor from our dataset.
  2. Train the model: The machine learning model is trained on the remaining sensor data, excluding the selected sensor.
  3. Predict for the left-out location: Once trained, the model predicts the air quality for the location of the sensor that was left out.
  4. Compare predictions: The predicted air quality value is compared with the actual value recorded by the sensor.
  5. Repeat for all sensors: This process is repeated for every sensor in the dataset, with each sensor being left out once.
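
To make the five steps concrete, here is a minimal Python sketch of LOO-CV. Because the article does not specify Ambee's model, it uses inverse distance weighting as a hypothetical stand-in; for IDW, "training" simply means predicting from the remaining sensors. The function names and the (x, y, PM2.5) record layout are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sensor record: (x_km, y_km, pm25_ugm3); planar coordinates assumed.
Sensor = Tuple[float, float, float]


def idw_estimate(sensors: Sequence[Sensor], x: float, y: float, power: float = 2.0) -> float:
    """Stand-in model: inverse-distance-weighted PM2.5 estimate at (x, y)."""
    weighted_sum = weight_total = 0.0
    for sx, sy, value in sensors:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def leave_one_out(sensors: Sequence[Sensor]) -> List[Tuple[float, float]]:
    """Return one (observed, predicted) pair per sensor, each left out once."""
    if len(sensors) < 2:
        raise ValueError("need at least two sensors to leave one out")
    pairs = []
    for i, (x, y, observed) in enumerate(sensors):  # 1. pick a sensor
        remaining = [s for j, s in enumerate(sensors) if j != i]  # 2. "train" without it
        predicted = idw_estimate(remaining, x, y)  # 3. predict at its location
        pairs.append((observed, predicted))  # 4. keep the pair for comparison
    return pairs  # 5. every sensor has been left out exactly once


def rmse(pairs: Sequence[Tuple[float, float]]) -> float:
    """Root mean square error over (observed, predicted) pairs."""
    return math.sqrt(sum((p - o) ** 2 for o, p in pairs) / len(pairs))


if __name__ == "__main__":
    network = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (20.0, 0.0, 30.0)]
    held_out = leave_one_out(network)
    print(held_out, rmse(held_out))
```

For this three-sensor toy network, the held-out predictions are 22, 20 and 18 µg/m³ against observed values of 10, 20 and 30 µg/m³, giving an RMSE of about 9.8 µg/m³. The edge sensors score worst because an IDW estimate is a weighted average and cannot fall outside the range of its neighbours' readings.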

By evaluating the model’s performance across all sensor locations, we can assess how well the model generalizes to new, unseen areas. This allows us to estimate the accuracy of predictions, even for regions where no sensors are installed.

There are some caveats to keep in mind that can affect the accuracy of air quality models in general:

  • Sensor availability and coverage: Limited reference-grade sensors in some regions may reduce prediction accuracy, especially in areas with sparse sensor networks.
  • Measurement inconsistencies: Different sensors can report varying PM2.5 levels due to differences in calibration and sensitivity.
  • Measurement/Reporting delays: Air quality sensors may be slow to update their data, or may revise data after it has already been published. Either can affect the accuracy of the models.

Our accuracy benchmark

The US EPA's performance targets for PM2.5 air sensors call for a Root Mean Square Error (RMSE) of at most 7 µg/m³ over 24-hour averages, or alternatively a normalized error of at most 30% (Reference: https://cfpub.epa.gov/si/si_public_record_Report.cfm?dirEntryId=350785&Lab=CEMM). We have set an even stricter internal standard, aiming for half of that value (3.5 µg/m³) to ensure that our AQ data meets stringent accuracy levels.
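
Checking a set of paired 24-hour PM2.5 averages against these thresholds can be sketched as follows. This assumes the 30% figure is a normalized RMSE, computed as RMSE divided by the mean reference concentration and expressed as a percentage, and treats the internal target as an RMSE of 3.5 µg/m³ only. The constant and function names are hypothetical.

```python
import math
from typing import Dict, Sequence

EPA_RMSE_TARGET = 7.0  # µg/m³, 24-hour average PM2.5
EPA_NRMSE_TARGET = 30.0  # percent (assumed to be a normalized RMSE)
INTERNAL_RMSE_TARGET = EPA_RMSE_TARGET / 2  # 3.5 µg/m³, half the EPA value


def check_targets(reference: Sequence[float], predicted: Sequence[float]) -> Dict[str, object]:
    """Compare paired 24-hour PM2.5 averages against the EPA and internal targets."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need equal-length, non-empty series")
    n = len(reference)
    rmse = math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)
    mean_reference = sum(reference) / n
    if mean_reference <= 0:
        raise ValueError("mean reference concentration must be positive")
    nrmse = 100.0 * rmse / mean_reference
    return {
        "rmse": rmse,
        "nrmse_percent": nrmse,
        # EPA target: RMSE <= 7 µg/m³ OR normalized RMSE <= 30 %
        "meets_epa": rmse <= EPA_RMSE_TARGET or nrmse <= EPA_NRMSE_TARGET,
        # Internal target (assumed): RMSE <= 3.5 µg/m³
        "meets_internal": rmse <= INTERNAL_RMSE_TARGET,
    }
```

For reference values of 10, 20 and 30 µg/m³ and predictions of 12, 18 and 33 µg/m³, the RMSE is about 2.38 µg/m³, inside both the EPA target and the stricter internal one.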

We regularly compare the past 48 hours of air quality data to ensure the accuracy and reliability of our air quality models. The comparison is done using multiple data sources, including station data from reference-grade sensors and predictions from our own models and other providers. 
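
Before station readings and model predictions can be compared, they have to be paired by timestamp. A minimal sketch, assuming both series are dictionaries keyed by hourly `datetime` values (a hypothetical layout), keeps only the hours that both sources reported within the last 48 hours. Hours missing from either side, for example because of reporting delays, are simply skipped.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

WINDOW = timedelta(hours=48)


def align_window(
    station: Dict[datetime, float],
    predicted: Dict[datetime, float],
    now: datetime,
    window: timedelta = WINDOW,
) -> List[Tuple[datetime, float, float]]:
    """Return (timestamp, station, predicted) rows for shared hours in (now - window, now]."""
    start = now - window
    # Intersect the two key sets, then keep timestamps inside the window, oldest first.
    shared = sorted(t for t in station.keys() & predicted.keys() if start < t <= now)
    return [(t, station[t], predicted[t]) for t in shared]
```

Intersecting the keys rather than filling gaps keeps the comparison honest: an hour is only scored when a real station reading exists for it.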

The graphs below display the trendlines of PM2.5 levels over the past 48 hours for several stations across different regions. PM2.5 was chosen for these comparisons because it is the most commonly measured pollutant at many stations. Additionally, PM2.5 is often the most prevalent pollutant in many areas, making it a key indicator of air quality and a priority for monitoring.

Each graph represents a comparison of PM2.5 levels across a 48-hour period.

X-axis (timestamp): The horizontal axis shows the measurement time across the 48-hour period.

Y-axis (PM2.5): The vertical axis shows the concentration of PM2.5 (particulate matter 2.5 micrometers or smaller in diameter), a key pollutant affecting air quality, in micrograms per cubic meter (µg/m³).

Black line (station): This solid black line represents the actual air quality readings from a reference-grade station. This serves as the "ground truth" or benchmark against which predictions from various models are compared.

Dashed lines (predictions): Each dashed line shows predictions from different sources. The legend lists each model along with its performance metrics:

  • Ambee algorithm: These are predictions from our algorithm using the leave-one-out method. This serves as an accuracy estimate for areas without sensors.
  • Breezometer/Google: Predictions from a competitor’s API, based on sensor data that their algorithm may modify in certain locations.
  • Ambee API: These are predictions from our API, where sensor data may be adjusted by our algorithm to account for anomalies or outliers.

Shaded blue area: The light blue shaded region represents a ±3.5 µg/m³ deviation range from the actual station data. This region gives a visual idea of how close each model is to the station data, with lines inside the shaded area representing more accurate predictions.

The closer a dashed line is to the solid black line (station data), the more accurate that model's predictions are for that time period. The shaded area emphasizes this: lines falling within the shaded region are within a reasonable range of error from the station data.
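
The shaded band can also be summarized as a number: the share of time steps in which a prediction falls inside it. The sketch below is a hypothetical helper for that reading; the legend itself reports MAE and RMSE rather than this share.

```python
from typing import Sequence

BAND_UGM3 = 3.5  # half-width of the shaded region in µg/m³


def share_within_band(
    station: Sequence[float], predicted: Sequence[float], band: float = BAND_UGM3
) -> float:
    """Fraction of time steps where the prediction lies within ±band of the station value."""
    if len(station) != len(predicted) or not station:
        raise ValueError("need equal-length, non-empty series")
    inside = sum(1 for s, p in zip(station, predicted) if abs(p - s) <= band)
    return inside / len(station)
```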

Performance metrics in legend: Next to each model in the legend, you will find performance metrics:

  • MAE (Mean Absolute Error): This measures the average magnitude of errors between the predicted and actual values.
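  • RMSE (Root Mean Square Error): This is the square root of the average squared difference between predicted and actual values. Because errors are squared before averaging, RMSE penalizes large errors more heavily than MAE. A lower RMSE indicates more accurate predictions.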
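
For n paired values, with station reading y_i and prediction ŷ_i, the two metrics are:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}
```

As a worked example, errors of 2, −2 and 3 µg/m³ give MAE = 7/3 ≈ 2.33 µg/m³ and RMSE = √(17/3) ≈ 2.38 µg/m³. RMSE is always at least as large as MAE, and the gap widens when a few errors are large.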

Results

[Figures 1 to 10: Air Quality Accuracy Test Report]

We regularly evaluate and refine our models based on these comparisons, aiming to keep deviations well within the benchmark we’ve set, ensuring highly reliable air quality insights across various geographies.

By continuously improving prediction accuracy and setting our internal benchmark at half of the US EPA's target RMSE, we demonstrate our commitment to delivering consistent and trustworthy data. The integration of satellite data, geospatial interpolation, and machine learning allows us to provide reliable information even in areas with sparse sensor coverage. This ongoing refinement ensures that our air quality insights remain actionable and relevant.

What does this mean for you?

For businesses, governments, and individuals, reliable air quality data can drive better decisions, whether for public health, urban planning, or personal wellness. By setting stringent accuracy benchmarks and continuously refining our models, Ambee ensures that our air quality insights are reliable, even in regions with sparse sensor coverage, making us your best climate data partner.
