The other day a colleague of ours, who lives in the Bay Area, muttered something on a conference call that shocked us: “It’s raining ash in San Jose. My car, the driveway, everything is covered with ash, as if a volcano had erupted.” We had all read about the deadly wildfires out west and their devastating effects: destroyed habitats, burnt homes, poor air quality, and lives lost. Yet the very thought of ash raining from the sky made it scary, almost apocalyptic, and difficult to comprehend.
After the call, the stunned silence gave way to a resolute determination. Why couldn’t we do something to stop the wildfires? The first step to stopping them is to understand their behavior – not just in general terms, but in a very precise manner. We were a bunch of data scientists; we decided to collect data, analyze it, and let it tell the story of how wildfires spread. So began our journey, an ambitious one, that led to a much better appreciation of what fans the wildfires, and culminated in the research report you are reading. Some observations were as expected, others caught us by surprise; the rest of this report discusses what we discovered.
There’s plenty of data publicly available on wildfires. True to our training as data scientists, we gathered as much as we could, without a thought for whether or not the data would be relevant to wildfires. After all, Machine Learning (ML) has taught us that it’s the job of the ML platform – feature engineering in particular – to retain influential variables and remove frivolous ones, and that humans, even subject matter experts, should resist the urge, no matter how strong the urge and principled the belief, to decide which variables matter and which don’t. The data were collected from a variety of sources – about 240 rows (records) in a spreadsheet:
- covering the relevant counties (Humboldt in the north, San Diego in the south, and everything in between)
- for the years 2013 through 2019
- with 17 different input variables, the columns in the spreadsheet (documenting weather conditions such as precipitation, wind, and temperature)
- to predict the outcome, acres burnt as a measure of the fire’s intensity
Some of the data sets were California WildFire Incidents (2013-2020) and California Fire Perimeters. They were coalesced into one giant data set – the training data – used to train the statistical model on the ML platform to predict how dangerously a wildfire could spread.
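The coalescing step can be pictured as a join of the incident records with the weather records, keyed on county and year. Below is a minimal sketch in pandas; the tiny inline tables and the column names (`acres_burned`, `wind_speed_ms`, `humidity`) are hypothetical stand-ins for the real exports, which carry all 17 input variables.

```python
import pandas as pd

# Hypothetical, simplified stand-ins for the public datasets;
# the real exports have many more columns (17 weather inputs).
incidents = pd.DataFrame({
    "county": ["Humboldt", "San Diego"],
    "year": [2017, 2018],
    "acres_burned": [1200.0, 4100.0],   # the outcome we predict
})
weather = pd.DataFrame({
    "county": ["Humboldt", "San Diego"],
    "year": [2017, 2018],
    "wind_speed_ms": [5.2, 6.8],        # sustained wind, m/s
    "humidity": [0.25, 0.18],           # relative humidity, 0-1
})

# Coalesce the sources into one training table, keyed on county and year
training = incidents.merge(weather, on=["county", "year"], how="inner")
print(training.shape)  # (2, 5)
```

With the real data this join yields the roughly 240-row spreadsheet described above, one row per county-year incident.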
Here we ran into a major challenge. The statistical model could accurately predict a wildfire’s intensity and spread, but couldn’t explain the reasons behind its predictions. Complex statistical models, such as the deep learning frameworks in use today, don’t suffer from model bias, but they are opaque and can’t intelligently explain how they arrive at a prediction: what the primary reasons behind it are. We were stuck. Our primary goal was to understand wildfire behavior – the reasons why various factors, when they occur together, make a fire so deadly. We needed a transparent machine – desperately! Unfortunately, none was available on the market.
For our business, we use an internal platform, EazyML, which, as the name suggests, is very easy to use, with the primary intention of democratizing ML – making it accessible to all, not just data scientists. We set about extending EazyML to make it transparent – next-gen explainable AI, as data scientists call it. It wasn’t easy. First, we had to invent a simpler, transparent model that approximated the complex model reasonably well in the neighborhood of the point being predicted; deriving high-fidelity training data for the simpler model was difficult. Second, armed with the explanations (reasons) for a prediction from the transparent model, we had to develop the intelligence to generalize those reasons and derive the important factors – their values and their relationships with each other – that make conditions ripe for a wildfire to spread.
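EazyML’s internals aren’t shown here, but the first step described above – a simple model that mimics the complex one near a single prediction – can be sketched generically. This is the local-surrogate idea (as in LIME), not EazyML’s actual algorithm: sample points around the input, query the black box, and fit a proximity-weighted linear model whose slopes serve as the explanation.

```python
import numpy as np

def local_surrogate(predict_fn, x0, n_samples=500, scale=0.1, seed=0):
    """Fit a proximity-weighted linear model that mimics a black-box
    predictor near x0 (the local-surrogate idea, not EazyML's algorithm)."""
    rng = np.random.default_rng(seed)
    # Perturb the input point to build local training data
    X = x0 + rng.normal(scale=scale, size=(n_samples, x0.size))
    y = predict_fn(X)
    # Weight samples by closeness to x0 (Gaussian kernel)
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * scale ** 2))
    # Weighted least squares with an intercept term
    A = np.hstack([np.ones((n_samples, 1)), X])
    W = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * W, y * W[:, 0], rcond=None)
    return coef  # [intercept, per-feature slopes] = local explanation

# Toy black box: intensity grows with wind, shrinks with humidity
black_box = lambda X: 3.0 * X[:, 0] - 2.0 * X[:, 1]
coef = local_surrogate(black_box, np.array([5.0, 0.3]))
```

For this toy linear black box the surrogate recovers the true slopes (about 3 for wind, about -2 for humidity); for a real opaque model the slopes hold only in the neighborhood of the queried point, which is exactly the fidelity challenge mentioned above.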
Without further ado, here’s what we found. We know, in general, that under dry conditions wildfires are fanned by winds. Now, for the first time – at least for us – we knew the rules and their specifics for what fans the flames of a wildfire and causes it to spread:
- probability of precipitation < 0.02, i.e., less than 2% (dry condition)
- humidity < 0.3, i.e., the amount of water vapor in the air, compared to its carrying capacity, is less than 30% (dry condition)
- precipitation intensity = 0, i.e., no rain (dry condition)
- wind speed > 4 m/s, i.e., 9 mph (sustained winds)
- wind gusts > 15 m/s, i.e., about 34 mph (gusty conditions, lasting 20 seconds or more in a burst)
- wind direction primarily between 170° and 270° (out of the southwest), as well as 90° to 180° (out of the southeast)
- Max Temperature > 86°F, Min Temperature > 55°F (hot conditions).
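The rules above translate directly into a simple checklist. Here is a minimal sketch that counts how many spread conditions a day’s observation meets; the dictionary keys are hypothetical field names, not EazyML’s schema, and the thresholds are taken verbatim from the bullets.

```python
def fire_spread_risk(obs):
    """Count how many of the discovered spread conditions are met.
    Thresholds come straight from the rules above; the dict keys
    (obs["humidity"], etc.) are hypothetical field names."""
    rules = [
        obs["precip_probability"] < 0.02,                  # dry
        obs["humidity"] < 0.3,                             # dry
        obs["precip_intensity"] == 0,                      # no rain
        obs["wind_speed_ms"] > 4,                          # sustained winds
        obs["wind_gust_ms"] > 15,                          # gusty
        (170 <= obs["wind_bearing_deg"] <= 270             # out of SW
         or 90 <= obs["wind_bearing_deg"] <= 180),         # out of SE
        obs["temp_max_f"] > 86 and obs["temp_min_f"] > 55, # hot
    ]
    return sum(rules)

danger_day = {"precip_probability": 0.01, "humidity": 0.2,
              "precip_intensity": 0, "wind_speed_ms": 6,
              "wind_gust_ms": 18, "wind_bearing_deg": 200,
              "temp_max_f": 95, "temp_min_f": 60}
print(fire_spread_risk(danger_day))  # 7: every rule is met
```

A count near 7 flags a day when the conditions for rapid spread coincide, which is the situation the deployment argument below is about.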
While temperature – Max Temperature and Min Temperature – played a role in creating ideal conditions for a fire to start, once a fire was underway, temperature had little influence on how it spread. This was surprising; we had expected high temperatures to be a prerequisite for wildfires to sustain themselves. Apparently not, at least not according to the data. We worried that our training data might be biased, but then, thankfully, we found corroborating evidence in Colorado, where wildfires spread rapidly even amid freezing temperatures: https://www.weather.gov/bou/fwf_combiner?zone=COZ226.
The information bulleted above is crucial. If the precise conditions under which wildfires become super-spreaders (like jumping the Continental Divide for the first time) are known, firefighting resources can be selectively deployed to the areas where many of the rules stated above are met.
Perhaps other studies have improved on what’s reported above, or can build on our research. One thing is clear about what we need to manage wildfires more intelligently: more data about them, and transparent ML that can extract insights (rules) from the data. One day, we’ll tame the wildfires.