Introduction

Dust storms and other dust events are natural phenomena in which strong winds carry large amounts of fine particles, with significant environmental and human impacts (Shepherd et al., 2016a; Tong et al., 2023). Such events pick up fine sediments and carry them through the air over distances ranging from a few kilometers to across continents. They last from hours to days (Prospero, 1999; Griffin et al., 2001; Kok et al., 2017) and spread large amounts of particulate matter that can be harmful to human health (Grineski et al., 2011). Dust storms and other dust events are common in semi-arid and arid regions around the world (Middleton, 2017), with a global increase in their occurrence since the pre-industrial period (Kok et al., 2023) and smaller, regionally varying trends over the last few decades (Shao et al., 2013; Indoitu et al., 2012; Logothetis et al., 2021). Dust source regions have strong winds, frequent soil and atmospheric droughts, and little vegetation (Ginoux et al., 2012), all of which contribute to dust emission. What these regions typically lack, however, are dense observing networks to catalog the occurrence of dust events.

Dust events have been studied in the United States (U.S.) for decades, typically with traditional observational data from satellites or ground-based (also called station) instruments (Wang, 2015). For example, the Interagency Monitoring of Protected Visual Environments (IMPROVE) program operates ground-level stations, many on National Park Service (NPS) land, whose aerosol measurements can be used to identify dust events (Malm et al., 1994). Data from these stations have been used in many studies of dust occurrence in the U.S. (e.g., Hand et al., 2017; Achakulwisut et al., 2017); Aryal and Evans (2022), for instance, used IMPROVE data to study the frequency and intensity of dust occurrences. However, these stations sample on a three-day cycle, which may result in missed detections of short-lived or less intense dust events. Satellite data, in turn, have been used to map the sources and extent of dust events. Satellite instruments such as the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Total Ozone Mapping Spectrometer (TOMS) have provided estimates used to map global dust sources (Ginoux et al., 2012; Prospero et al., 2002). Other data sources for mapping dust have included official reports from federal agencies, such as the Storm Events Database from the National Centers for Environmental Information (Crooks et al., 2016; Ardon-Dryer et al., 2023), weather codes from Surface Synoptic Observation stations (Shao et al., 2013), and other notable incidents documented by the National Aeronautics and Space Administration (NASA) (Lei & Wang, 2014).

While satellite instruments provide broad spatial coverage of atmospheric dust, they are not without limitations. The MODIS instruments aboard the Terra and Aqua satellites, for instance, have fixed overpass times near local noon (around 10:30 am for Terra and 1:30 pm for Aqua) (Salomonson et al., 1989), so many dust events that occur later in the afternoon are missed. On April 29, 2018, for example, a major dust storm in eastern Nebraska, which resulted in one fatality, 15 injuries, and the closure of Interstate 80, was missed by the MODIS instruments because it occurred later in the day (Blumer, 2018). In addition, dust events that occur beneath cloud cover are commonly missed by satellite instruments (Sayer et al., 2019). Ground monitoring stations, by contrast, provide high temporal resolution and surface concentration data, but each station observes only a single location. This restricts studies either to spatially extensive events or to very local analyses, such as the study of El Paso dust events (Novlan et al., 2007). In short, satellites provide broad spatial coverage but limited temporal sampling, while stations provide frequent but localized measurements, so comprehensive coverage remains elusive. While some short-lived and intense dust events are captured by traditional observing methods (e.g., Raman et al., 2014), these challenges potentially limit our ability to study such events, as they may go undetected by both weather observation stations and satellite instruments.

However, the recent advent of social media platforms has provided a unique opportunity to access a vast amount of crowdsourced data that can be used to address a plethora of scientific research objectives (e.g., Toivonen et al., 2019; McKitrick et al., 2023). As a demonstration, this research explores the use of social media data, particularly from Flickr and X (formerly Twitter), to detect dust event occurrences within the U.S. and compares them with both National Weather Service (NWS) advisories and satellite observations. These platforms were selected because they allowed data access via their APIs and because of their complementary strengths in documenting environmental events. X is widely used for real-time reporting, often capturing eyewitness accounts and the immediate impacts of dust events, while Flickr provides geotagged images that show visual evidence of dust event occurrences. We expect social media to record intense dust events more often than low levels of dust, which occur more frequently; because human impacts are much greater for the most intense events, a record of these events is the most important. Our work aims to ascertain the reliability of using crowdsourced data as a supplementary tool to fill gaps in dust event monitoring for research. To this end, we first provide a brief review of the significance of dust events (Sect. 2) before presenting our research methodology in Sect. 3. We then present our results (Sect. 4) and discuss them in the broader context of dust events within the U.S. (Sect. 5); finally, Sect. 6 summarizes the paper and discusses areas for further work.

Background

Windblown dust occurs globally, affecting regions such as North Africa, Central and East Asia, the Middle East, and Australia (Ginoux et al., 2012). In the U.S., the Southwest and surrounding areas have been significantly affected by dust events in the past few decades (Lei & Wang, 2014; Raman et al., 2014; Tong et al., 2017; Kelley & Ardon-Dryer, 2021). This is not a new phenomenon in the region: paleoclimatic data show that the Southwest has experienced severe and prolonged droughts over the last 1000 years (Cook et al., 2010), predisposing such areas to dust events. The Great Plains region also experiences blowing dust (Evans, 2024; Lambert et al., 2020; Kandakji et al., 2021) and was famously the location of the Dust Bowl environmental disaster of the 1930s (Cook et al., 2009). Blowing dust is much less common farther east in the U.S., though outbreaks still occur, including some severe enough to cause deadly traffic accidents (Reicosky et al., 2023).

Researchers have also documented that dust events cause significant environmental impacts. For example, mineral aerosols lofted by dust events affect cloud formation and precipitation (Strong et al., 2015; Evans et al., 2020). Dust particles both absorb and scatter solar radiation, which can alter the global and regional radiative energy balance, with impacts on both regional climate and the local environment (Shepherd et al., 2016b; Evans et al., 2019). Evans et al. (2019), for instance, showed how North African dust alters rainfall in the Sahel and thus affects vegetation productivity. The deposition of dust on the surface supplies minerals important for biological productivity in both forests and oceans (Tatlhego et al., 2020; Weis et al., 2024; Starr et al., 2023; Wang et al., 2023), while the loss of soil from dust source regions reduces agricultural productivity (Chappell et al., 2019).

Dust has many important human impacts as well. Several studies have linked exposure to dust events to asthma, acute bronchitis, and other respiratory diseases (e.g., Johnston et al., 2011; Grineski et al., 2011; Gyan et al., 2005; Tam et al., 2012; Schweitzer et al., 2018; Tong et al., 2023). For example, Goudie (2014) discussed how particulate matter levels during dust events can exceed healthy limits, causing respiratory and cardiovascular issues, conjunctivitis, and skin irritation. Dust events transport bacteria, viruses, and fungi such as Coccidioides immitis, which causes coccidioidomycosis (i.e., Valley Fever) in the southwestern U.S. (Reed & Nugent, 2018; Tong et al., 2017). Coccidioides immitis flourishes in arid soil and disperses spores during dry, windy weather (Griffin, 2007). In the last two decades, Coccidioides infections have risen in the southwestern U.S. through inhalation of dust associated with increased wind speeds and reduced precipitation (Reed & Nugent, 2018). In the Sahel, Achudume and Oladipo (2009) highlight the connection between meningococcal meningitis and the intrusion of Saharan dust into West Africa, finding that the risk of developing the disease increases with greater dust exposure.

Dust events also reduce horizontal visibility, with consequent economic impacts (Al-Hemoud et al., 2017). Airborne sand particles can reduce visibility to below 1000 m (Bhattachan et al., 2019; Indoitu et al., 2012), and extreme visibility reductions during dust events can lead to road accidents (Watson, 2002; Ashley et al., 2015; Reicosky et al., 2023). Miri et al. (2009) showed that an increase in traffic fatalities during summer in Iran can be attributed primarily to dust events. Bhattachan et al. (2019) showed that the probability of traffic accidents in southern California increased as blowing dust reduced visibility, and Tong et al. (2023) reported a total of 232 traffic deaths in the U.S. due to windblown dust events between 2007 and 2017. In regions prone to intense dust events, airport closures and cargo truck accidents affect the local economy (Al-Hemoud et al., 2017). Damage to machinery and industrial equipment, as well as the cost of removing sand piled up on roads and around residential buildings, are additional economic impacts of dust events (Miri et al., 2009). In the U.S., the economic costs of dust impacts, including those on healthcare, renewable energy, and agricultural productivity, exceed $150 billion, putting them on par with damages from hurricanes (Feng et al., 2025). Gholizadeh et al. (2021) showed how dust events widen income inequality by affecting small-scale farmers whose livelihoods depend primarily on small landholdings and livestock. These income inequality effects extend to both global and regional levels, as well as within the rural economy.

Airborne dust occurs in the U.S. during all seasons (Ginoux et al., 2012), but there is also a distinct seasonality: almost everywhere has the most dust in spring and summer (Orgill & Sehmel, 1976; Crooks et al., 2016). Most of the Southwest peaks in spring, while the Great Plains have more dust in summer (Aryal & Evans, 2022; Hand et al., 2017). Intense thunderstorms can produce dust events called "haboobs," most often in the Southwest, when strong thunderstorm downbursts create very strong surface winds that loft dust (Eagar et al., 2017). Haboobs are primarily a localized summer phenomenon of the late afternoon and evening (Raman et al., 2014). With climate models predicting that the Southwest U.S. will become drier, leading to more desertification and more frequent dust events (Pu & Ginoux, 2017), there is a growing need for research and monitoring of dust events. Traditional ground-based and satellite observation networks have limitations and potentially underreport short-lived or localized events. Therefore, leveraging unconventional data sources could help fill observational gaps and improve real-time dust event detection and analysis.

In addition to traditional methods for monitoring and observing the world around us, social media has grown into a useful tool for event detection and for monitoring natural hazards, including wildfires, earthquakes, flooding, and other environmental disasters (Finch et al., 2016). Social media platforms like X (Twitter), Facebook, and Flickr can provide vast amounts of crowdsourced data (McKitrick et al., 2023) and act as an efficient means of environmental surveillance, which researchers have used to complement traditional monitoring methods (Ghermandi & Sinclair, 2019). Tweets, for instance, have proven to be a valuable source of information for research (e.g., Kim et al., 2013; Liu et al., 2012): Crooks et al. (2013) showed how tweets could delineate the extent of earthquakes, while Panteras et al. (2015) showed how Flickr and Twitter, when combined, could detect and delineate a wildfire event. Several studies have also documented the usefulness of X data in capturing large-scale, non-conventional data for health and behavioral studies (e.g., Kurkcu et al., 2017; Alvaro et al., 2015; Hong & Ye, 2018; Chen et al., 2023), land use (e.g., Iranmanesh et al., 2022; Crooks et al., 2015), and disaster resilience (e.g., Moghadas et al., 2023). However, little or no attention has been paid to the use of social media data for dust event monitoring in the U.S. or elsewhere. Our study therefore explores the value of combining social media, in situ, and satellite observations as an improved approach to filling gaps in the observation of dust event occurrences.

Methodology

In what follows, we first present how we collected the data (Sect. 3.1) before turning to how we processed the social media data (Sect. 3.2). The overall workflow is presented in Fig. 1. The scripts and resulting social media datasets, with personal user information removed, are available at https://doi.org/10.6084/m9.figshare.28430873.

Fig. 1

Flowchart of our workflow

Data collection

Flickr

In this study, we use metadata from Flickr, an online photo-sharing and social networking platform. We chose Flickr because it provides a unique opportunity to capture large-scale social media data from a photo hosting website (Mislove et al., 2008). The accompanying metadata for each photo are particularly useful for research and educational purposes (as discussed in Sect. 2). Flickr is designed to manage a massive volume of media content effectively while offering features that promote user interaction and engagement (Zeng & Wei, 2013). Accessing the Flickr API has documented challenges, ranging from a limited ability to download images to missing metadata (Ding & Fan, 2019). The Photosearcher R package (Fox et al., 2020) was developed to overcome these limitations, and we use it to query the Flickr database with a defined set of criteria. Using its photo_search function, we target all available Flickr photos between 2001 and 2023 that contain dust event-related keywords in their titles, descriptions, or tags. These keywords include "dust storm," "sandstorm," "haboob," "blowing dust," "dust devil," "dust gale," "dust cloud," and "desert dust," with "dust storm," "haboob," and "sandstorm" returning the most results. Although we considered including more general terms such as "dust" or "dust event," we opted not to, as "dust" alone is highly ambiguous and could refer to household dust, industrial dust, or other non-relevant contexts. Furthermore, "dust event" is not commonly used in casual social media posts, making it less likely to yield relevant results. Because the public may use "sandstorm" and "dust storm" interchangeably despite the scientific distinction (dust particles are smaller than sand particles), we expect that our selected keywords sufficiently capture relevant dust-related events.

In this study we limit the focus to the U.S., and more specifically to entries from the Great Plains and westward. This geographic focus is informed by prior research demonstrating a higher frequency of dust events in these regions (Aryal & Evans, 2022; Ginoux et al., 2012). While dust events also occur in the eastern U.S., such as the May 2023 dust storm in Springfield, IL, which resulted in fatalities and major highway closures (Reicosky et al., 2023), these events are comparatively infrequent. Our focus therefore remains on the Great Plains and western regions, where dust events are more prevalent and better studied. We also chose the U.S. because it has a high-quality record of weather warnings and advisories. We therefore constrain the Flickr query to a geographical bounding box corresponding to the United States. The query yields a dataset of 5,444 metadata entries for dust event-tagged photos between 2001 and 2023. This metadata includes information such as photo ID, owner ID, title, date taken, date uploaded, tags, and longitude and latitude coordinates. The date taken field is generally reliable because it is typically generated automatically from camera metadata and therefore reflects when the dust event was observed. However, there can be a lag between date taken and date uploaded, as users may upload images later. Our analysis therefore relies on the date taken field rather than date uploaded, so that we capture the actual timing of dust events rather than when they were shared online. Next, we conduct a series of data refinement steps. We restrict our dataset to one photo per owner per day and remove duplicate entries by ensuring that each observation has a unique ID. A selection of the retrieved images is shown in Fig. 2. We acknowledge that a single dust event could be reported by multiple observers, potentially leading to multiple entries for the same event.
However, rather than removing such instances, we consider them valuable as they provide insights into the extent and visibility of the event from different locations, perspectives, and times. Since our focus is on identifying dust events rather than precise dust event measurements, retaining multiple reports enhances our ability to detect and confirm the occurrence of events across a broader spatial scale. Additionally, by constraining our dataset to one photo per user per day, we reduce excessive duplication while still capturing multiple independent observations of significant dust events.
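The refinement steps above (unique photo IDs, one photo per owner per day) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scripts in the figshare repository: the `PhotoRecord` type and its field names are hypothetical stand-ins for the Flickr metadata columns, and keeping the earliest photo of each owner-day is one possible tie-breaking choice.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PhotoRecord:
    # Hypothetical record type: field names are illustrative, not the Flickr API schema.
    photo_id: str
    owner_id: str
    date_taken: datetime
    title: str = ""


def one_per_owner_per_day(records):
    """Drop repeated photo IDs, then keep each owner's earliest photo per calendar day."""
    seen_ids = set()
    seen_owner_days = set()
    kept = []
    # Sort by capture time so the photo kept for an owner-day is the earliest one.
    for rec in sorted(records, key=lambda r: r.date_taken):
        if rec.photo_id in seen_ids:
            continue  # exact duplicate entry (same photo ID seen before)
        seen_ids.add(rec.photo_id)
        owner_day = (rec.owner_id, rec.date_taken.date())
        if owner_day in seen_owner_days:
            continue  # this owner already contributed a photo on this day
        seen_owner_days.add(owner_day)
        kept.append(rec)
    return kept
```

The "day" here is the calendar day of whatever clock `date_taken` comes from (typically camera local time); converting to a single time zone first would be a reasonable variant.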

Fig. 2

Selected images retrieved from Flickr showing active dust events

X (formerly Twitter)

We use Twitter (now X) data to complement the Flickr dataset, increasing the likelihood of capturing social media reports of dust events within the U.S. Our X dataset contains tweets sent in the U.S. between 2009 and 2014, collected with the geosocial system described by Croitoru et al. (2013) using the same keywords as for Flickr.

Our X query captures 6,511 tweets between August 27, 2009, and January 20, 2014. We filter the dataset to remove retweets and duplicates, only allowing one tweet per day per user as done with Flickr photos. To maintain consistency, we also limit the dataset to tweets from the same states as the Flickr data. The metadata includes information such as the tweet’s content, retweet status, tweet ID, user’s screen name, timestamp, and the longitude and latitude coordinates.

However, the accuracy of the geographical information associated with these tweets varies, as outlined in the X data dictionary (available at X (2024)). Some tweets include coordinates derived from the location field in the author’s profile, while others are geolocated by X using the location information specified in the tweet’s location field. Tweets from these geolocation methods generally have low to medium accuracy, as they are not necessarily based on the user’s actual location at the time of the tweet (Stefanidis et al., 2013). Our dataset also captures a set of high-accuracy tweets featuring coordinates provided directly by the client in the location field within the submitted tweet. This per-tweet geotagging uses the precise location from which the tweet was sent; however, only a small fraction of overall tweets use this feature (Cheng et al., 2010). In Fig. 3, we provide examples of tweets containing dust event related keywords which are extracted from X using the tweet ID from our metadata. These tweets are part of the X metadata entries and clearly showcase attached photos of dust events.
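The three levels of geolocation accuracy described above could be assigned with a helper like the one below. This is a sketch that assumes tweets stored in the Twitter API v1.1 tweet-object layout, where `coordinates` is a GeoJSON Point with longitude first and `place` is a tagged place object; the export from the geosocial system used here may name or structure these fields differently, and the function names are hypothetical.

```python
from typing import Any, Optional


def geolocation_precision(tweet: dict[str, Any]) -> str:
    """Classify how a tweet's location was obtained (assumes v1.1-style fields)."""
    coords = tweet.get("coordinates")
    if coords and coords.get("type") == "Point":
        return "exact"    # per-tweet geotag supplied by the client device
    if tweet.get("place"):
        return "place"    # tagged to a named place such as a city
    if (tweet.get("user") or {}).get("location"):
        return "profile"  # only the free-text profile location is available
    return "none"


def point_lat_lon(tweet: dict[str, Any]) -> Optional[tuple[float, float]]:
    """Return (lat, lon) for exact geotags only; GeoJSON stores [lon, lat]."""
    if geolocation_precision(tweet) != "exact":
        return None
    lon, lat = tweet["coordinates"]["coordinates"]
    return lat, lon
```

Only tweets classified as `"exact"` would correspond to the small, high-accuracy subset; the other classes are the low-to-medium-accuracy cases.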

Fig. 3

Selected posts retrieved from X showing active dust events

National Weather Service

NWS data were obtained from the Iowa Environmental Mesonet (IEM) VTEC Product browser (IEM, 2023), which archives metadata and geometries for National Weather Service Valid Time Event Code (VTEC) weather warnings and advisories. The NWS uses the VTEC system to issue alerts for hazardous weather conditions, including dust events. We retrieved metadata for all Dust Storm (code DS) and Blowing Dust (code DU) warnings and advisories recorded from 2006 to 2022 across all states from the Great Plains westward. Dust Storm (DS) warnings are issued when dust reduces visibility to 1/4 mile or less, while Blowing Dust (DU) advisories are issued when blowing dust significantly reduces visibility but does not meet the criteria for a dust storm warning. The retrieved metadata includes the location, issuance date, expiration date, and geographical extent of each warning. In total, we obtained 2,005 NWS metadata entries.
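The IEM browser already returns parsed metadata, but the DS and DU codes come from the P-VTEC string embedded in each NWS product, which has the form `/k.aaa.cccc.pp.s.####.yymmddThhnnZ-yymmddThhnnZ/` (product class, action, issuing office, phenomenon, significance, event number, start and end times). As a sketch for anyone working from raw product text instead, the parser below keeps only dust phenomena; the names `parse_dust_vtec` and `DustVTEC` are hypothetical, and the all-zero time `000000T0000Z` is treated as "not specified".

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

PVTEC_RE = re.compile(
    r"/(?P<k>[OTEX])\.(?P<action>[A-Z]{3})\.(?P<office>[A-Z]{4})\."
    r"(?P<phen>[A-Z]{2})\.(?P<sig>[A-Z])\.(?P<etn>\d{4})\."
    r"(?P<start>\d{6}T\d{4}Z)-(?P<end>\d{6}T\d{4}Z)/"
)
DUST_PHENOMENA = {"DS": "Dust Storm", "DU": "Blowing Dust"}
SIGNIFICANCE = {"W": "Warning", "Y": "Advisory", "A": "Watch", "S": "Statement"}


class DustVTEC(NamedTuple):
    action: str
    office: str
    phenomenon: str
    significance: str
    event_number: int
    start: Optional[datetime]
    end: Optional[datetime]


def _vtec_time(s: str) -> Optional[datetime]:
    # An all-zero timestamp means the time is not specified (e.g., already in effect).
    if s == "000000T0000Z":
        return None
    return datetime.strptime(s, "%y%m%dT%H%MZ").replace(tzinfo=timezone.utc)


def parse_dust_vtec(text: str) -> list[DustVTEC]:
    """Extract DS/DU entries from every P-VTEC string in a product's text."""
    out = []
    for m in PVTEC_RE.finditer(text):
        if m["phen"] not in DUST_PHENOMENA:
            continue  # skip non-dust hazards such as HW (High Wind)
        out.append(DustVTEC(
            action=m["action"],
            office=m["office"],
            phenomenon=DUST_PHENOMENA[m["phen"]],
            significance=SIGNIFICANCE.get(m["sig"], m["sig"]),
            event_number=int(m["etn"]),
            start=_vtec_time(m["start"]),
            end=_vtec_time(m["end"]),
        ))
    return out
```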

Satellite observations

The satellite observations of dust used in this research were acquired by the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument aboard the Suomi-NPP satellite and downloaded from NASA (2024). This instrument captures passive observations at multiple wavelengths around 1:30 pm local time (Cao et al., 2013). The raw VIIRS data can be processed into a diverse range of products, such as land and sea surface temperature, cloud properties, and dust detection (Jackson et al., 2013). For this study, we use the aerosol products derived from the Deep Blue retrieval algorithm (Hsu et al., 2013) between 2012 and 2022. The algorithm classifies the aerosols observed in each retrieval as one of several aerosol types, such as dust, smoke, and industrial pollutants, and we use all retrievals classified as dust. The data are available at daily temporal resolution and cover the entire globe on a gridded \(1^\circ \times 1^\circ\) spatial resolution. While the VIIRS Deep Blue dataset provides valuable large-scale observations of dust, its use in event-based analysis presents challenges. Because the sensor overpasses at approximately 1:30 pm local time, many short-lived dust events occurring outside this window may not be captured. Additionally, thick dust plumes or cloud cover can obscure retrievals, leading to data gaps. Despite these limitations, we include VIIRS dust retrievals in our analysis as an additional source of validation, comparing seasonal trends in dust event occurrence with social media reports (Flickr and X) and NWS advisories. Future studies may benefit from supplementing VIIRS-derived dust products with alternative approaches, such as examining true-color imagery from platforms like NASA Worldview, to enhance event detection and validation.
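Comparing a geotagged post with a daily \(1^\circ \times 1^\circ\) product requires finding the grid cell that contains the post's coordinates. The helper below is a sketch that assumes a 180 × 360 global grid with cell edges on whole degrees, row 0 spanning 90°S to 89°S and column 0 spanning 180°W to 179°W. That layout is an assumption; the actual VIIRS Deep Blue files carry their own latitude and longitude arrays, which should be checked (some products order rows from north to south).

```python
import math


def grid_cell_1deg(lat: float, lon: float) -> tuple[int, int]:
    """Return (row, col) of the 1-degree cell containing a point.

    Assumed layout: row 0 = 90S..89S, col 0 = 180W..179W (180 x 360 grid).
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range")
    lon = ((lon + 180.0) % 360.0) - 180.0           # wrap longitude to [-180, 180)
    row = min(int(math.floor(lat + 90.0)), 179)     # put lat = 90 in the top row
    col = min(int(math.floor(lon + 180.0)), 359)   # guard against float round-up
    return row, col
```

A post and a dust retrieval would then be treated as co-located when they share a cell on the same date; because of the single early-afternoon overpass, a missing retrieval in that cell does not by itself rule out a dust event.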

Table 1 summarizes the records used in the study from each dataset over its respective time period. The time periods differ primarily because of each platform's archival policies and the practical limitations of retrieving older data. As shown in Table 1, Flickr data span 2001 to 2023, while X (Twitter) data cover a shorter period from 2009 to 2014: historical tweets are available only through specific archival retrieval methods, whereas Flickr maintains a more extensive public archive of geotagged images.

Table 1 Summary of records used in the study from different datasets

Data processing

Unlike the more traditional datasets, the social media data were further processed to minimize noise and reduce false positives. We cleaned both the Flickr and X datasets by excluding observations whose title, description, or tags contained any of the following words: "raining," "thunderstorm," "tornado," "hurricane," "africa," "sahara," "RT," "snow," "monsoon," "snowing," "ice," "flooding," "blackrockcity," "BRC," "beach," "thunder," "rain," "rainstorm," "cyclone," "burningman," "burning," "rainy," "flood," "festival," "coast," "ocean," and "typhoon." We removed terms such as "blackrockcity" and "burningman" because they relate to the Burning Man festival, which takes place in Nevada between August and September and produces many social media posts. Following these data cleaning and noise removal steps, the final dataset comprises 987 Flickr metadata entries and 2,922 original tweets. These datasets are drawn solely from the Great Plains and the western U.S.
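The exclusion step above can be expressed as a small filter. The sketch below assumes whole-word, case-insensitive matching, which is a design choice not specified in the text: with plain substring matching, "rain" would also remove posts mentioning "terrain" and "ice" would remove posts mentioning "police". The function name `is_excluded` is hypothetical.

```python
import re

# Exclusion terms as listed in the text, lower-cased for case-insensitive matching.
EXCLUDE = {
    "raining", "thunderstorm", "tornado", "hurricane", "africa", "sahara", "rt",
    "snow", "monsoon", "snowing", "ice", "flooding", "blackrockcity", "brc",
    "beach", "thunder", "rain", "rainstorm", "cyclone", "burningman", "burning",
    "rainy", "flood", "festival", "coast", "ocean", "typhoon",
}
TOKEN_RE = re.compile(r"[a-z0-9]+")  # hashtags like "#burningman" yield "burningman"


def is_excluded(*fields) -> bool:
    """True if any title, description, or tag field contains an exclusion term as a whole word."""
    for field in fields:
        tokens = TOKEN_RE.findall((field or "").lower())
        if any(tok in EXCLUDE for tok in tokens):
            return True
    return False
```

Whole-word matching has its own trade-off: a post mentioning a road as "Rt. 66" would be dropped because "rt" is an exclusion term.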

Results

Our geolocation analysis of dust events using Flickr and X, combined with weather advisories and warnings from the National Weather Service (NWS), is shown in Fig. 4. This spatial overview integrates the multiple data sources used in our research to provide a view of dust event occurrences in the western U.S. Note that although 2,922 tweets were used in our analysis, only the 24 with precise tweet-level geolocation are displayed in Fig. 4. The Great Plains region appears to have limited Flickr data associated with dust events, as only a few dust event records were captured in Great Plains states north of Texas. The same is true for much of the Rockies, with only a few Flickr records in Idaho, Colorado, Wyoming, and Montana. In the Pacific Northwest, Washington has some Flickr records in the dry eastern part of the otherwise forested state, while only a few are captured in Oregon. Observations of dust event occurrences based on X and Flickr data are most common in Arizona, California, and Texas. Notably, both Flickr and X posts of dust events overlap substantially with areas covered by NWS blowing dust warnings (Fig. 4), indicating that, although the datasets shown in this figure span different years, most social media posts originate from regions that experience blowing dust.
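The overlap between posts and warning areas is described visually here; one way it could be quantified is the share of geotagged posts falling inside at least one NWS warning polygon. The sketch below uses an even-odd ray-casting point-in-polygon test on raw longitude and latitude, which is adequate for small polygons away from the poles and the antimeridian. The function names are hypothetical, and a GIS library such as Shapely would be the more robust choice in practice.

```python
def point_in_polygon(lon: float, lat: float, ring) -> bool:
    """Even-odd ray casting; ring is a list of (lon, lat) vertices (closing vertex optional)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the eastward ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def share_inside(points, polygons) -> float:
    """Fraction of (lon, lat) points that fall inside at least one polygon."""
    if not points:
        return 0.0
    hits = sum(
        any(point_in_polygon(lon, lat, poly) for poly in polygons)
        for lon, lat in points
    )
    return hits / len(points)
```

Because the social media and NWS records span different years, this measures spatial co-occurrence only, not whether a post and a warning refer to the same event.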

While Fig. 4 presents only 24 tweets with high spatial accuracy, we use all 2,922 tweets, most with lower spatial accuracy, to analyze state-level distributions and seasonality. Although city-level geolocation varies in precision, Cheng et al. (2010) found that 30% of X users could be located within 10 miles of their actual location and 51% within 100 miles, which increases our confidence in state-level geolocation. Figure 5 shows that the X data further confirm the high concentration of reports in the southwestern U.S., with Arizona, California, and Texas leading in frequency, following the same pattern as the Flickr results.

Turning to the seasonal analysis shown in Fig. 6, the highest number of dust event reports in both the Flickr and X datasets occurs during summer (June, July, and August), followed by spring (March, April, and May). While this pattern aligns with previous findings for the Southwest U.S. and is broadly consistent with the other datasets used in our study (NWS and VIIRS), it may be influenced by the high concentration of reports from states such as Arizona, where summer dust events are more frequent (Ardon-Dryer et al., 2023; White et al., 2023). Regions with different seasonality may instead show a spring peak (Hand et al., 2017; Aryal & Evans, 2022; Robinson & Ardon-Dryer, 2024).
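The seasonal grouping used here maps each report's date to a three-month season. The sketch below follows the summer (June to August) and spring (March to May) definitions in the text and assumes the standard meteorological groupings for the other two seasons: winter as December to February and fall as September to November. The function name is hypothetical.

```python
from collections import Counter
from datetime import date

# Summer and spring follow the text; winter (DJF) and fall (SON) are assumed.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def seasonal_counts(dates) -> dict[str, int]:
    """Count report dates per season, listing every season even when its count is zero."""
    counts = Counter(SEASON_BY_MONTH[d.month] for d in dates)
    return {season: counts[season] for season in ("Winter", "Spring", "Summer", "Fall")}
```

For posts, the input would be the Flickr date taken or the tweet timestamp; for NWS advisories, the issuance date.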

To further understand the social media posts that identify dust events, we performed a word cloud analysis on the full text of the Flickr titles and tags and of the X tweets after removing stop words. As Fig. 7 shows, the most prominent terms are those associated with dust events, such as "dust storm," "haboob," "sandstorm," and "desert storm," most of which were part of our search criteria. These terms reflect not only the frequency of dust events but also regional terminology and public perception of these events. In west Texas, for example, "haboob," derived from the Arabic word for strong wind, is frequently used for these dust events because they are clearly visible to any observer (Reed & Nugent, 2018). State names such as "Arizona," "California," and "Texas" also appear in the word cloud, consistent with the states where dust events are most frequently reported in the U.S. Other terms outside the search criteria that emerge from the word cloud include "cloud," "dusty," "southwest," "outdoor," "summer," "weather," and "nature."
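A word cloud is a rendering of term frequencies, so the underlying counting step can be sketched as follows. The stop-word set is a small hypothetical example rather than the list used in the study, and the sketch counts single words only; phrases such as "dust storm" would require bigram counting or treating each tag as one term.

```python
import re
from collections import Counter

# Small illustrative stop-word set (hypothetical); real lists are much longer.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "in", "on", "at", "to", "is", "it",
    "this", "for", "over", "from", "with", "rt",
}
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def term_frequencies(texts, top_n=None):
    """Count lower-cased words across texts after removing stop words."""
    counts = Counter()
    for text in texts:
        words = WORD_RE.findall((text or "").lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)  # (word, count) pairs, most frequent first
```

The resulting (word, count) pairs are what a word cloud tool scales into font sizes.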

Fig. 4

Map showing the distribution of Flickr-identified dust event occurrences, X-identified dust event occurrences, and National Weather Service dust advisories, including dust storm (DS) warnings and blowing dust (DU) advisories

Fig. 5

Regional distribution of social media reports mentioning dust events in the U.S. using Flickr and X identified metadata

Fig. 6

Seasonal cycle of dust events using social media metadata, the National Weather Service advisories, and the VIIRS satellite data

Fig. 7

Word cloud of dust event-related terms from Flickr tags and tweets. The size of each word reflects its frequency, with larger words occurring more often

Fig. 8

Examples of social media-identified dust events and satellite observations for the same day. Brown shaded pixels indicate locations where Suomi-NPP VIIRS observed dust. Any VTEC warnings issued by the NWS for the location are shown after the date of each dust event, with HWW and DSW indicating High Wind Warning and Dust Storm Warning, respectively

Discussion

This study highlights the growing relevance of crowdsourced data in environmental monitoring. Using geographical information from multiple sources (Flickr, X, NWS advisories, and satellite observations), our study shows that dust event occurrences in the U.S. are most frequent in the Southwest, with a secondary peak in the Great Plains; both regions have been noted as dust sources because their deserts and agricultural areas contribute to dust event occurrences (Ginoux et al., 2012; Joshi, 2021).

Our results identify Arizona as the state with the most dust event observations in both Flickr and X (Fig. 5). This geographic pattern aligns with the wide coverage of NWS dust warnings, suggesting a correspondence between social media-detected events and official weather advisories (Fig. 4). Notably, Arizona accounts for the plurality of dust observations despite having a much smaller population and population density than California and Texas, indicating that our results are not purely a product of population density. Dust event occurrences in Arizona are concentrated in southwestern Arizona, an agricultural region, and in south-central Arizona, which is part of the Sonoran Desert, the driest part of the state (Huang et al., 2015). Our study shows a dense concentration of dust events around Phoenix and Tucson. Because larger cities generate more social media data than smaller ones (Stefanidis et al., 2013), the populations of these two urban centers likely influence the overall spatial distribution of reported dust events in Arizona. Dust events in Phoenix also tend to be more severe than those at other locations (Nickling & Brazel, 1984). Our findings are consistent with previous research that has documented a higher frequency of dust events in southern Arizona, especially in summer. For example, Raman et al. (2014) found that around Casa Grande in southern Arizona, where we have several social media posts, farmland abandoned in recent years leaves exposed soil that summer winds can readily disperse.

After Arizona, our results identify California and Texas as the states with the most dust event observations from social media (Fig. 5). In California, the frequency of dust events is associated with water extraction from the Owens and Mono basins, as well as the drying of lakes such as the Salton Sea (Goudie & Middleton, 1992; Biddle et al., 2022). As water bodies like the Salton Sea recede, they expose fine sediments that are easily lifted by winds, making them significant sources of dust events (Dempsey, 2014; Biddle et al., 2022). In Texas, dust events are concentrated mostly in West Texas (Fig. 4). While dust events occur for different reasons (Kelley & Ardon-Dryer, 2021; Reed & Nugent, 2018), dust from west and southwest Texas is often transported by the wind to the Panhandle, where large cities such as Lubbock are affected. This is partly attributed to the vast expanse of easily eroded bare soil in this area during the winter and spring seasons (Kelley et al., 2020).

The Great Plains and Rockies regions show sparse Flickr and X data, indicating fewer records of dust events in these regions. This can be attributed partly to fewer dust events in these states and partly to fewer, more dispersed observers who could post to social media. The Pacific Northwest also shows limited Flickr and X data, even in its dry eastern regions (Fig. 4). The spatial variability of dust events shown by NWS warning coverage in Great Plains states such as Nebraska and Kansas (Fig. 4) has been linked to frequent land disturbance from agricultural activities (Singh et al., 2012; Lambert et al., 2020; Goudie & Middleton, 1992). Our analysis also shows a complete absence of NWS dust storm advisories and warnings for Utah, North Dakota, and South Dakota throughout our study period. However, Flickr identified numerous dust events in Utah (Fig. 5), a known dust source (Hahnenberger & Nicoll, 2012). Although the NWS maintains a database of issued dust event warnings, such data are lacking for some states, as seen here with Utah (Neff et al., 2013).

Our study shows a consistent seasonal pattern of dust events in the U.S. across data sources, with dust activity peaking during the summer months, followed by spring (Fig. 6), in line with previous research (Nickling & Brazel, 1984). For example, Ghosh and Pal (2014) reported an elevated frequency of summer dust events in the Southwest U.S. The secondary spring peak we find in the western U.S. is also consistent with earlier observations of dust seasonality. Orgill and Sehmel (1976) concluded that high dust levels are typically seen during the early and late spring months in many Southwest states, although certain regions may also experience increased dust during the summer. In southwestern Texas, most dust events occur during the spring months (Reed & Nugent, 2018; Rivera et al., 2009; Rivera, 2006).

Our research further demonstrates the validity and value of using crowdsourced data to capture dust events that may be missed by satellite and NWS observations. Figure 8 provides examples of specific dust events, illustrating the overlaps and discrepancies among dust events identified through social media, NWS warnings, and satellite observations. For instance, a Flickr photo documented a dust event along Highway 71 in western Nebraska on October 18, 2012. For this event, the NWS issued only high wind warnings, while satellite observations indicated blowing dust well outside the area captured in the Flickr photo. Similarly, a Flickr-identified dust event in Amarillo, Texas, on March 18, 2014, was well anticipated by an NWS warning, but satellite observations captured dust well outside the area shown in the Flickr photo. A dust event in Santa Fe, New Mexico, on January 31, 2014, identified on Flickr, was captured by the NWS only as a red flag warning for high winds, and satellite observations did not detect any dust. In another example, a Flickr-identified dust event in Glendale, Arizona, on June 27, 2015, was covered by an NWS dust storm warning and detected in satellite imagery. In Tempe, Arizona, on August 7, 2018, Flickr identified a large dust event that was also covered by an NWS dust storm warning and detected in satellite observations. A massive dust event in Ritzville, Washington, documented on Flickr on August 12, 2014, was not captured by the NWS but was reasonably well captured by satellite imagery. Finally, a large dust event in Death Valley, California, on May 28, 2011, was captured by a Flickr post but had no corresponding NWS dust storm warning.

Using social media data for dust event monitoring presents inherent challenges. One of the key strengths of crowdsourced data is its ability to capture real-time observations that traditional monitoring systems may not record; however, these data are inherently limited by reporting biases. Unlike systematic environmental monitoring, social media users do not report dust events continuously or uniformly across space and time. Social media reports are also more likely to come from areas with a higher concentration of users, such as cities and highways, than from remote desert regions where dust events may also occur. This inconsistency makes it difficult to derive long-term trends, as changes in reporting behavior, platform usage, and public awareness can introduce artificial patterns in the data. Previous studies, such as Ardon-Dryer et al. (2023), have raised concerns about the misuse of crowdsourced data in environmental research, emphasizing the need for careful interpretation and integration with more structured datasets. Therefore, while social media data provide valuable supplemental information, they should be used with caution, particularly when assessing temporal trends or attempting large-scale extrapolations.

Another limitation is the accuracy of user-generated content. Social media posts may contain misclassified events, incorrect geotags, or mentions of dust storms unrelated to an actual event. Additionally, only a small fraction of posts include precise geolocation metadata (Cheng et al., 2010), making it difficult to verify the exact location of reported dust events. These limitations reduce the number of high-accuracy posts available for further analysis. These issues highlight the need for improved data filtering techniques, such as natural language processing (NLP) methods to assess context and spatial validation approaches using satellite datasets. In addition, computer vision techniques for labeling image content, such as multimodal large language models (e.g., ChatGPT), could provide a way to validate images of dust events.
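To illustrate what a simple spatial validation step could look like, the following minimal Python sketch checks whether a geotagged post lies within a chosen radius of any satellite dust detection on the same day. This is not the procedure used in this study; the Observation record, the function names, the 50 km radius, the same-day window, and the example coordinates are all hypothetical choices for illustration. Distances are computed with the haversine great-circle formula on a spherical Earth.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


@dataclass
class Observation:
    """Hypothetical record: a point location (degrees) and a calendar day."""
    lat: float
    lon: float
    day: date


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_spatially_validated(post: Observation, detections: list,
                           radius_km: float = 50.0,
                           max_day_gap: int = 0) -> bool:
    """Return True if any detection lies within radius_km of the post
    and no more than max_day_gap days away from it (hypothetical thresholds)."""
    for det in detections:
        if abs((det.day - post.day).days) > max_day_gap:
            continue  # outside the temporal window
        if haversine_km(post.lat, post.lon, det.lat, det.lon) <= radius_km:
            return True
    return False


# Hypothetical example: a post and a detection about 15 km apart on the same day.
post = Observation(33.0, -112.0, date(2018, 8, 7))
near = Observation(33.1, -112.1, date(2018, 8, 7))
print(is_spatially_validated(post, [near]))  # True
```

The radius and day window trade off missed matches against false matches: widening them tolerates geotag error and coarse satellite timing but also admits unrelated detections.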

Beyond social media data, the traditional monitoring datasets used in this study also have constraints. National Weather Service (NWS) advisories are issued by authorized meteorologists based on established meteorological thresholds and observed or forecasted weather conditions. However, these advisories are limited to events that meet specific criteria and may not capture smaller, short-lived, or highly localized dust events that fall below these thresholds. Meanwhile, satellite observations provide large-scale coverage but are limited by the timing of sensor overpasses, cloud cover, and the difficulty of distinguishing dust from other aerosols. This means some dust events may be missed, particularly those occurring outside the satellite’s observation window.

Summary

While social media has been used to study a broad range of topics, little if any attention has been given to dust events. Our study indicates that combining social media, NWS warnings, and satellite observations could help fill gaps in the record of dust occurrences. The spatial distribution of dust events reveals a high concentration of events in Arizona, California, and Texas, with fewer occurrences in the Great Plains, Rockies, and Pacific Northwest regions. Dust events identified in the Southwest U.S. through Flickr and X agree well with NWS advisories. Arizona shows the highest frequency of dust events, concentrated around Phoenix and Tucson. Seasonal analysis across all data sources consistently indicates that dust event activity peaks during the summer months, followed by spring. This seasonal pattern is important for understanding the temporal dynamics of dust events and their potential impacts on public health, transportation, and other economic activities. Our research indicates the importance of incorporating crowdsourced data to capture dust events that may be missed by satellites and traditional weather stations. Examples of individual events illustrate the overlaps and discrepancies among social media-identified dust events, NWS warnings, and satellite observations, supporting the complementary role of social media in enhancing the spatial and temporal coverage of dust event monitoring in the U.S.; this approach could also be applied to other dust-prone regions of the world.