“Biophilic Cities”: Quantifying the Impact of Google Street View-Derived Greenspace Exposures on Socioeconomic Factors and Self-Reported Health

According to the biophilia hypothesis, humans have evolved to prefer natural environments that are essential to their thriving. With urbanization occurring at an unprecedented rate globally, urban greenspace has gained increased attention due to its environmental, health, and socioeconomic benefits. To unlock its full potential, an increased understanding of greenspace metrics is urgently required. In this first-of-a-kind study, we quantified street-level greenspace using 751 644 Google Street View images and computer vision methods for 125 274 locations in Ireland’s major cities. We quantified population-weighted exposure to greenspace and investigated the impact of greenspace on health and socioeconomic determinants. To investigate the association between greenspace and self-reported health, a negative binomial regression analysis was applied. While controlling for other factors, an interquartile range increase in street-level greenspace was associated with a 2.78% increase in self-reported “good or very good” health [95% confidence interval: 2.25–3.31]. Additionally, we observed that populations in upper quartiles of greenspace exposure had higher levels of income and education than those in lower quartiles. This study provides groundbreaking insights into how urban greenspace can be quantified in unprecedented resolution, accuracy, and scale while also having important implications for urban planning and environmental health research and policy.


INTRODUCTION
Urbanization is occurring at an unprecedented rate worldwide, a trend that poses immense challenges to urban sustainability, livability, and public health. Currently, 55% of the world's population live in urban areas, which is set to increase to 70% by 2050. 1 There is tremendous opportunity to counteract some of the negative impacts of urbanization on human health and well-being, 2 with positive environmental exposures such as urban greenspace. 3−7 Urban greenspace refers to natural environments, parks, and recreational spaces; green infrastructure including walking and cycling lanes; and natural vegetation including tree-lined streets, shrubs, gardens, lawns, green walls, and green roofs.
In environmental epidemiological literature, increased exposure to greenspace has been associated with lower mortality rates, 3,4,7 greater life satisfaction, 8 and improved mental health and well-being. 9,10 Additionally, urban residents are more likely to engage in physical activity, a key factor for noncommunicable disease prevention, in greenspace environments. 11,12 In the current technological era, promoting outdoor activity among younger generations through enhanced greenspace provision is extremely important. Dadvand et al. 13 found that children living in close proximity to greenspace had reduced screen time and lower obesity rates. Urban greenspace alleviates adverse environmental health impacts related to pollution and climate change 14−16 as it mitigates air pollution, reduces noise pollution, and enhances the thermal environment. 17−20 Urban greenspace has numerous social benefits including reduction in crime rates, increased perception of safety, 16 and encouraging interaction between residents, leading to social cohesion. 2 Greenspace enhances the esthetics of urban areas. It attracts people, investment, and development and is associated with higher property values. 21 Such green gentrification can exclude socially and economically vulnerable residents due to social isolation or forcing them to relocate. 22 Research has shown that greenspace provision is not spatially or socially equitably distributed. 23,24 Therefore, reducing disparities in greenspace availability is important to prevent widening of health, environmental, and social inequalities.
Due to the COVID-19 pandemic, governments have implemented public health guidance, including restrictions on travel and indoor gatherings. 25 This further highlights the need for "safe, inclusive and accessible green and public spaces". 26 However, greenspace development is hindered by increased demand for residential, transport, commercial, and infrastructure developments. 27 The successful management of fast-paced urbanization and urban greenspace is critical to ensure healthy and livable cities.
A major barrier to successful urban greenspace management is the lack of detail in greenspace exposure maps, including those used to examine the socioeconomic and human health benefits of urban greenspace. Established methods for benchmarking greenspace, including questionnaires, field audits, and geographic information systems (GIS), have several limitations. 28,29 Questionnaires and field audits are labor-intensive, time-consuming, and limited to small study domains. 30 GIS and satellite imagery offer objective and efficient aerial assessment of greenspace over large spatial scales. Established aerial greenspace metrics include the normalized difference vegetation index (NDVI), % greenspace, % street tree buffering, and distance to parks. 31 However, these aerial views fail to capture street-level greenspace as experienced by humans on the ground, and high-resolution street-level greenspace computed over large spatial scales is not routinely available.
Recent advances in street-level imagery availability, data processing, and computer vision methods have great potential for improving urban street-level greenspace assessment. 32−36 By obtaining publicly available high-resolution Google Street View (GSV) images and applying computer vision methods, an accurate measure of visible greenspace can be achieved. Over 10 million miles of GSV images have been captured globally. 37, 38 Although studies have shown significant potential to advance greenspace exposure assessment, they have not quantified and compared urban street-level greenspace in extremely high spatial resolution for all major cities in an entire country. Furthermore, previous research has neither examined population-weighted exposure to greenspace nor examined how health and socioeconomic variables vary with these greenspace metrics within and among cities on a national scale.
To advance our understanding of urban greenspace and unlock its full socioeconomic and health benefit potential, we quantified urban street-level greenspace for three major Irish cities in unprecedented accuracy, spatial resolution, and scale using 751 644 GSV images and computer vision methods. Street-level greenspace was compared to NDVI. Following this, population-weighted exposures to these greenspace metrics were quantified. Finally, all greenspace metrics and population-weighted exposures were investigated in relation to socioeconomic and health variables.

METHODS
2.1. Protocol and Study Domain. Three major Irish cities were included in our study, namely, Dublin city, Cork city, and Galway city. Located in the east, Dublin is the capital of Ireland with a population of 550 000. 39 Cork city, located in the south of Ireland, has a population of 210 000 and an area of 187 km². 39,40 Galway city, located in the west, has a population of 80 000 and covers an area of 54 km². 39,41 Urban greenspace was quantified at street level and overhead using GSV and satellite imagery, respectively, throughout our study domain. These greenspace metrics were examined with respect to socioeconomic and health measures, and associations between all greenspace metrics and self-reported "good or very good" health were then determined.
2.2. Quantifying Urban Greenspace Using GSV Imagery and Computer Vision Methods. Street-level urban greenspace was quantified using 751 644 GSV images and computer vision methods in high spatial resolution (125 274 point locations) for Dublin, Cork, and Galway cities. For each city, global positioning system (GPS) points were generated every 50 m on a road network shapefile. Using a Google Application Programming Interface (API) key, metadata of the GSV panoramas, including an ID, date, latitude, and longitude, were collected for each point. For each point location, six GSV images were downloaded capturing different horizontal viewing angles (every 60°) (see Figure 1). Across the three cities, GSV images were captured from 2009 to 2019, with 65% of the images taken between 2017 and 2019. Based on the seasonal visual appearance of greenery in Ireland, only GSV images captured in March to October inclusive ("green" months) were used to determine greenspace; 125 274 locations met this criterion, resulting in 751 644 GSV images. Existing Python scripts were adopted and modified to generate point locations and to process the GSV images. 42
To classify greenspace in each image, we adopted an object-based image analysis technique. Images were segmented into homogeneous, physically meaningful polygons, which minimizes the misclassification of green objects other than vegetation as greenspace. 43 The Python module pymeanshift was used to perform image segmentation. The contrast between greenspace pixels and nongreenspace pixels was enhanced by employing the Excess Green Index (ExG). This modifies the hue of the image and is computed as follows

$$\mathrm{ExG} = 2g - r - b$$

where r, g, and b are the red, green, and blue color model components, respectively. 44 Otsu's automatic thresholding method 45 was subsequently employed to identify optimum thresholds from the ExG image to extract greenspace vegetation pixels. Green vegetation has a high reflectance in the green band and a low reflectance in both red and blue bands.
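The ExG and Otsu steps described above can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: it skips the pymeanshift segmentation step, it assumes channels are scaled to [0, 1] by dividing by 255, and the function names (`excess_green`, `otsu_threshold`, `green_mask`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0-255


def excess_green(pixel: Pixel) -> float:
    """ExG = 2g - r - b, with channels scaled to [0, 1] (an assumption)."""
    r, g, b = (channel / 255.0 for channel in pixel)
    return 2.0 * g - r - b


def otsu_threshold(values: Sequence[float], bins: int = 256) -> float:
    """Return the cut that maximises the between-class variance (Otsu)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # flat image: nothing to separate
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    n = len(values)
    sum_all = sum(i * count for i, count in enumerate(hist))
    w0, sum0 = 0, 0
    best_var, best_bin = -1.0, 0
    for i, count in enumerate(hist):
        w0 += count          # pixels at or below bin i
        sum0 += i * count
        w1 = n - w0          # pixels above bin i
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i
    # Threshold sits at the upper edge of the best low-class bin.
    return lo + (best_bin + 1) * width


def green_mask(pixels: Sequence[Pixel]) -> List[bool]:
    """Flag pixels whose ExG exceeds the Otsu threshold as vegetation."""
    exg = [excess_green(p) for p in pixels]
    threshold = otsu_threshold(exg)
    return [v > threshold for v in exg]
```

On a toy image of four leaf-green pixels and four gray pixels, `green_mask` flags only the green ones; real street scenes would need the segmentation step to suppress green nonvegetation objects.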
Based on this reflectance principle, greenspace was identified from the GSV images and was subsequently quantified using the greenspace metric, Green View Index (GVI). 32−36 GVI is measured as a percentage from 0 to 100, with 100 having a maximum density of greenspace. GVI was calculated for each location using the following equation

$$\mathrm{GVI} = \frac{\sum_{i=1}^{6} \mathrm{Area}_{g_i}}{\sum_{i=1}^{6} \mathrm{Area}_{t_i}} \times 100$$

where Area_{g_i} is the number of green pixels in image i, and Area_{t_i} is the total number of pixels in image i. Li et al. 43 tested this classification algorithm by manually classifying greenspace from a random sample of GSV images (n = 100). A correlation coefficient of 0.94 was observed, supporting the accuracy of the methods employed in this study. The mean GVI was calculated for each Small Area.
2.3. Quantifying NDVI Using Satellite Imagery and Computational Algorithms. The NDVI greenspace metric was quantified for Dublin, Cork, and Galway cities. NDVI is measured between −1 and 1, with values closer to 1 indicating more greenspace. 32,36,46−48 NDVI is computed by applying computational algorithms to satellite imagery, based on the principle that green vegetation absorbs red light and reflects near-infrared light. 46 NDVI greenspace classification was completed using Landsat-8 satellite imagery in 30 m × 30 m grid-cells, downloaded from the U.S. Geological Survey Earth Explorer 49 for May 2020. Landsat images were screened for cloud, shadow, and water using an Fmask filter. 50,51 Across the three cities, 380 553 grid-cells satisfied the Fmask quality control conditions. NDVI was calculated in each grid-cell using Landsat-8 imagery as follows

$$\mathrm{NDVI} = \frac{\mathrm{Band\ 5} - \mathrm{Band\ 4}}{\mathrm{Band\ 5} + \mathrm{Band\ 4}}$$

where Band 4 measures red light and Band 5 measures near-infrared light. The mean NDVI was determined for each Small Area.
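The two greenspace metrics can be written as short functions. The sketch below is illustrative only: the function names and the input values are hypothetical, and the GVI function assumes that pixel counts are pooled over the six images of a location, as the definitions of Area g and Area t suggest.

```python
from statistics import mean
from typing import Sequence


def green_view_index(green_pixels: Sequence[int],
                     total_pixels: Sequence[int]) -> float:
    """GVI (%) for one location, pooling pixel counts over its images."""
    if len(green_pixels) != len(total_pixels):
        raise ValueError("one green count and one total count per image")
    return 100.0 * sum(green_pixels) / sum(total_pixels)


def ndvi(red: float, nir: float) -> float:
    """NDVI for one grid-cell from red (Band 4) and near-infrared (Band 5)."""
    denominator = nir + red
    if denominator == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / denominator


# Hypothetical example: six images of 1000 pixels each at one location.
location_gvi = green_view_index([100, 200, 0, 50, 150, 100], [1000] * 6)

# Hypothetical Small Area mean over three grid-cells (red, nir reflectance).
small_area_ndvi = mean(ndvi(r, n) for r, n in [(0.1, 0.5), (0.2, 0.4), (0.3, 0.3)])
```

Pooling counts before dividing means that every image at a location carries equal weight when all images have the same pixel dimensions.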
2.4. Health, Socioeconomic, and Air Quality Variables. Health and socioeconomic measures were obtained from 2016 census data. 39 The following variables were obtained for each Small Area: average age, percentage of females, self-reported health (number and percentage of residents whose self-evaluated health status was "good or very good"), unemployment rate, active transport mode usage rate (percentage of residents walking or cycling regularly), and third-level education attainment rate (percentage of residents aged 15 years and over with at least an ordinary bachelor's degree). Small Areas are regions within Electoral Divisions, which contain between 80 and 120 dwellings and are used for the compilation of census statistics. 39 The household median gross income was obtained for each Electoral Division and assigned to each Small Area within the respective Electoral Division. The 2016 annual PM2.5 concentrations, modeled in five zones for Dublin city using ADMS-Urban, were obtained from the Irish Environmental Protection Agency (see Figure S1). 52 The median PM2.5 concentration was assigned to each Small Area located within each zone.
2.5. Population-Weighted Exposure to Greenspace and Statistical Analysis. Following the determination of GVI and NDVI for the three study domains, univariate outliers were removed from the data. Subsequently, population-weighted exposure to GVI and NDVI was computed for each Small Area (pwe to GVI and pwe to NDVI, respectively) according to the following equation 53

$$\mathrm{pwe}_i = p_i \times G_i$$

where pwe_i is the population-weighted exposure to GVI or NDVI in each Small Area i; p_i is the population living in each Small Area i; and G_i is the GVI or NDVI greenspace in area i. The overall population-weighted exposure to GVI and NDVI for each city (PWE to GVI and PWE to NDVI, respectively) was calculated as follows

$$\mathrm{PWE}_j = \sum_{i=1}^{n} P_{ij} \times G_{ij}$$

where PWE_j is the overall population-weighted exposure in each city j; P_{ij} is the percentage of the total population living in each Small Area i in each city j; G_{ij} is the GVI or NDVI greenspace in Small Area i of city j; and n is the number of Small Areas in each city. Summary statistics were determined for all greenspace metrics and socioeconomic and health variables for each city utilizing the Small Area data set. Following this, the distributions of the greenspace metrics were examined. We also computed Pearson's correlations of the greenspace metrics (GVI, NDVI, and natural log-transformed pwe metrics).
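A minimal sketch of the two population-weighted quantities follows, reading pwe as the product of Small Area population and greenspace, and PWE as the population-share-weighted sum over a city's Small Areas. The function names and the numbers are hypothetical, and population shares are expressed as fractions of the city total rather than percentages.

```python
from typing import Sequence


def pwe_small_area(population: float, greenspace: float) -> float:
    """Population-weighted exposure for one Small Area: p_i * G_i."""
    return population * greenspace


def pwe_city(populations: Sequence[float],
             greenspace: Sequence[float]) -> float:
    """City-level PWE: sum of (population share) * (greenspace) over areas."""
    if len(populations) != len(greenspace):
        raise ValueError("one population and one greenspace value per area")
    total = sum(populations)
    return sum(p / total * g for p, g in zip(populations, greenspace))


# Hypothetical city with two Small Areas: 100 residents at GVI 10,
# and 300 residents at GVI 30.
city_pwe = pwe_city([100, 300], [10.0, 30.0])   # 0.25*10 + 0.75*30
area_pwe = pwe_small_area(100, 10.0)
```

In this toy case the city-level value (25) sits above the unweighted mean GVI (20) because most residents live in the greener area, which is the effect the weighting is meant to capture.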
The lower and upper quartiles (i.e., 25th and 75th percentiles) of GVI, NDVI, pwe to GVI, and pwe to NDVI were computed for the three cities and all three cities combined. Descriptive statistics for socioeconomic and health variables were computed for Small Areas corresponding to the lower and upper greenspace quartiles. For each variable, Mann−Whitney U-tests were applied to assess statistically significant differences between the lower and upper quartiles. P-values <0.05 were considered statistically significant, and those <0.001 were considered highly statistically significant. Statistical analyses were performed in Python version 2.7. 54
Environmental Science & Technology pubs.acs.org/est Article
A negative binomial regression model was employed to determine associations between greenspace exposures and counts of self-reported "good or very good" health for each Small Area. A negative binomial regression model is suitable for predicting overdispersed count data such as our self-reported "good or very good" health metric. 54−59 We controlled for average age, percentage of females, active transport mode usage rate, third-level education attainment rate, and household median income. We used the population of each Small Area as an offset variable. These variables were selected to control for known confounding effects and were employed in similar analyses exploring health effects of greenspace. 55−59 To reduce the skew of the distributions, the natural log was applied to average age, household median income, pwe to GVI, and pwe to NDVI. Models were run for each greenspace exposure metric (GVI, NDVI, pwe to GVI, and pwe to NDVI), for each city, and the combined city data set. Furthermore, we ran a model for an additional greenspace metric, which considered both GVI and NDVI in combination. For this, we normalized GVI and NDVI and subsequently computed a GVI/NDVI ratio for each Small Area. 32 Sensitivity tests were performed by including PM2.5 concentration levels (natural log-transformed) in the regression models for the Dublin city data set.
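To make the reporting scale concrete, the sketch below shows the conventional way a coefficient from a log-link count model (such as a negative binomial regression with a log population offset) is converted to a percentage change in expected counts per interquartile range (IQR) increase in exposure. The source does not spell out this conversion, so treat it as an assumption; the coefficient and exposure values are hypothetical.

```python
import math
import statistics
from typing import Sequence


def interquartile_range(values: Sequence[float]) -> float:
    """IQR = 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def percent_change_per_iqr(beta: float, iqr: float) -> float:
    """Percent change in expected count for an IQR increase in exposure.

    With a log link, log(E[count]) = log(population) + b0 + beta * x + ...,
    so raising x by one IQR multiplies E[count] by exp(beta * iqr).
    """
    return (math.exp(beta * iqr) - 1.0) * 100.0


# Hypothetical exposure values and coefficient, for illustration only.
exposure = [0.0, 10.0, 20.0, 30.0, 40.0]
iqr = interquartile_range(exposure)            # 30 - 10
change = percent_change_per_iqr(0.001, iqr)    # exp(0.02) - 1, in percent
```

Because the population enters as an offset, the percentage change applies equally to the expected count and to the expected rate per resident.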
We also conducted stratified analyses to assess potential effect modification by age, gender, and socioeconomic status. Small Areas were stratified into quartiles based on average age, percentage of females, third-level education attainment rate, and household median income. Base models were run, whereby the variable that the data were stratified by was excluded from the model, and subsequently, models were run for all Small Areas corresponding to each quartile. The regression models were performed in R version 4.0.3 using "MASS" package version 7.3.

RESULTS
See Table 1 for a summary of the results.
We examined the spatial differences and similarities of GVI and NDVI within the three cities (see Figures 2 and S2). While lower greenspace levels were consistently observed in urban centers, the level of greenspace varied in many Small Areas depending on the greenspace metric used. Dublin city suburbs had highly variable levels of GVI, while greenspace determined by NDVI was more consistent. Cork city suburbs were identified as areas in the top quartile of both greenspace metrics. This was similar for Galway city, while Dublin city center had a high number of Small Areas within the lower quartile of both metrics.
Population-weighted exposure to greenspace was determined in all Small Areas, and the overall PWE to greenspace was computed for each city. Cork city had the highest mean pwe to GVI and NDVI, while both metrics were notably lower for Dublin city. A similar trend was observed when comparing the overall PWE to GVI and NDVI (see Table 1). Figure S3 shows the pwe to GVI and pwe to NDVI mapped for each city. Changes from Figures 2 to S3 include a higher number of Small Areas in the upper quartile of pwe to GVI in Dublin city center. A similar trend was identified across the city for pwe to NDVI. To gain insight into such changes, an additional metric was developed (see the note in the Supporting Information).
The distributions of Small Area GVI, pwe to GVI, and pwe to NDVI in each city were examined and found to be right-skewed (see Figure S6). In contrast, NDVI was normally distributed. Pearson's correlations between all greenspace metrics were positive. GVI and NDVI had a strong positive correlation of 0.71, while a correlation of 0.81 was determined for pwe to GVI and pwe to NDVI (see Table S1).
3.2. Comparing Socioeconomic and Health Determinants by Greenspace Quartiles. Lower and upper quartiles of GVI, NDVI, pwe to GVI, and pwe to NDVI were computed (see Tables 2 and S2). The percentage of residents whose self-reported health was "good or very good" was consistently higher in areas in the upper quartiles of greenspace. For the combined data set, the percentage of the population with self-reported "good or very good" health was 7% higher in the upper quartile of GVI than in the lower quartile. Notably, the population was older in the upper quartile. Lower unemployment rates and higher median gross income were observed in areas within the upper quartiles of greenspace. Third-level education attainment rates were consistently higher for people living in areas in the upper quartile of GVI. A similar trend was observed for third-level education rates for NDVI in Cork city. However, the rates were similar in the upper and lower quartiles of NDVI in Dublin and Galway cities. Relative differences in socioeconomic and health variables observed between Small Areas in the lower and upper quartiles of population-weighted exposure to GVI and NDVI were examined (see Tables S3−S6). The magnitude of these differences was attenuated when examining population-weighted exposure to greenspace (pwe to GVI and pwe to NDVI). This trend was observed across most variables for all cities, with some exceptions (see Tables S3−S6). We examined differences between the distributions of the lower and upper quartiles of GVI, NDVI, pwe to GVI, pwe to NDVI, and socioeconomic and health variables. Differences were observed for almost all variables in Dublin, Cork, and Galway cities, with some minor exceptions (see Tables S3−S6).
Figure 2. Map of the GVI and NDVI greenspace metrics per Small Area in Dublin, Cork, and Galway cities, where GVI and NDVI have been categorized as octiles. The GVI was computed using a combination of street-level GSV imagery and computer vision methods, while the NDVI was determined by processing satellite imagery using computational algorithms.
a This includes the mean (μ) and standard deviation (δ) for greenspace metrics and socioeconomic and health variables for Dublin, Cork, and Galway cities. b The percentage of residents aged 15 years and above with at least an ordinary bachelor's degree.
3.3. Associations between Greenspace Exposure and Health. Higher greenspace exposures (GVI and NDVI) were associated with higher levels of self-reported "good or very good" health (see Table 3). A sensitivity test was conducted by additionally controlling for PM2.5 concentration levels in Dublin city. A slight decrease (approximately 0.5%) was observed in the change in counts of self-reported "good or very good" health for an IQR increase in GVI and NDVI greenspace exposures.
Table 3. Difference in Counts of Self-Reported "Good or Very Good" Health Outcomes (95% CI) Associated with an IQR Increase in Exposure to Greenspace (GVI, NDVI, GVI/NDVI, pwe to GVI, and pwe to NDVI) as Observed for All Small Areas in Dublin City (n = 2179), Cork City (n = 848), and Galway City (n = 308) and All Cities (n = 3335) a,b
a All models were adjusted for average age, percentage of females, active transport mode usage rate, third-level education attainment rate, and household median income. b The population of each Small Area was used as an offset variable. c Natural log-transformed. d Sensitivity test, included an additional variable (PM2.5 concentration levels) in the regression model.
Table 4. Difference in Counts of Self-Reported "Good or Very Good" Health Outcomes (95% CI) Associated with an IQR Increase in Exposure to Greenspace (GVI, NDVI, GVI/NDVI, pwe to GVI, and pwe to NDVI) as Observed for Stratified Quartiles for All Small Areas for the Combined City Data Set a,b,c
a In addition to the stratified quartile models, base models were run for the full data set (n = 3335). The variable by which the data were stratified was omitted from the models. b In each set of models, we excluded the effect modifier category variable. Models were adjusted for average age, percentage of females, active transport mode usage rate, third-level education attainment rate, and household median income. c The population of each Small Area was used as an offset variable.
The results were modified when the data were stratified by Small Areas corresponding to the quartiles of average age, percentage of females, third-level education attainment rate, and household median income relative to their base model (see Table 4 for details). Associations between greenspace metrics and self-reported "good or very good" health were strongest in the third quartile for age. The differences in counts of self-reported "good or very good" health outcomes associated with an IQR increase in greenspace exposure were highest in the Small Areas where the percentage of females and third-level education attainment rate were lowest. Increases in self-reported "good or very good" health in response to greenspace were highest in quartile 3 for income. There is evidence of effect modification.

DISCUSSION
Exposure to street-level greenspace is currently unquantified in high resolution and understudied worldwide. An increased understanding of exposure to greenspace is needed to fully yield its socioeconomic and health benefits. This study sought to address this major knowledge gap. The methodology utilized in this research demonstrated a scalable and novel approach to quantifying greenspace and population-weighted exposure to greenspace within and among cities on a national scale. In this first-of-a-kind study, GVI was computed for 125 274 locations in three Irish cities using 751 644 GSV images and computer vision methods. Further analyses provided insight into how greenspace and population-weighted exposure to greenspace varied with socioeconomic factors and self-reported health.
The spatial similarities and differences of GVI and NDVI were examined, identifying low greenspace levels in city centers. GVI and NDVI were not matched in greenspace quantity in some Small Areas. Similar findings were observed by Lu et al. 61 who compared greenspace distribution in Hong Kong. Lack of greenspace development within city centers may be due to demand for commercial space. Modern greenspaces such as skyrise greenery and green roofs must be considered in areas where competition exists for open space.
In our study, higher median income and lower unemployment levels were observed in upper quartile greenspace areas. Previous studies have reported that areas with lower household income had significantly less greenspace development. 23,32 With urbanization continuing at a rapid rate, socioeconomic inequities in greenspace accessibility are at risk of widening further. It is important that future greenspace development is distributed equitably across society to eliminate these inequalities.
We observed that the percentage of people who use active transport modes was higher in the lower quartiles of greenspace than the upper quartiles. Residents in city centers, which account for a significant number of Small Areas in the lower quartiles, are more likely to actively commute as their journeys are shorter. With that in mind, greenspace development is vital in city centers and peripheries as commuters are beneficiaries of greenspace. Studies have shown that greenspace alleviates environmental issues such as air pollution 17 while also acting as a natural barrier against such emissions, reducing exposure to those who travel adjacent to them. Commuters' journeys could be enhanced and their health improved as a result of higher greenspace exposure.
A directed acyclic graph (DAG) was used to identify confounders, mediators, and effect modifiers (see Figure S7). 55−59,62,63 While controlling for identified factors, an interquartile range increase in street-level greenspace was associated with a 2.78% increase in counts of self-reported "good or very good" health [95% CI: 2.25−3.31]. We also found evidence of effect modification by age, gender, and socioeconomic status in associations between counts of self-reported "good or very good" health and greenspace exposure. These associations between greenspace and self-reported health were determined at the Small Area level. Although this is the smallest unit of aggregated data used for the compilation of census statistics, we cannot exclude the possibility of ecological fallacy. 64 Therefore, we are unable to assume that such associations exist at the individual level.
Previous studies have compared traditional greenspace metrics, such as NDVI, with self-reported health. 47,55,56,65−67 Associations between greenspace exposure and self-reported health that we identified were consistent with such studies, with better health outcomes observed for residents in greener areas. 10,55,58,59 However, the impact of greenspace exposure on self-reported health may differ by greenspace metric, including our GVI exposure metric. Therefore, we cannot directly compare our results with the previous literature.
General health is investigated in the Irish census based on participants' response to a single qualitative question: "how is your health in general?" Although it is a subjective assessment of health, studies have found that self-reported health is a strong predictor of objective health measures and mortality. 68−72 However, other studies determined that self-reported health underestimates the magnitude of health inequalities by socioeconomic status. 73,74 Under Irish legislation, it is compulsory for everyone in Ireland to complete or be included on a census form. 75 Ninety-nine percent of the population were accounted for by completed census forms, while basic demographic information was included for the remaining 1% to ensure completeness of the data. 76
By using novel computer vision methods to identify greenspace, this research has made significant progress in examining urban environmental engineering problems using computer science-derived methodologies. Although computationally intensive, automatic processing of GSV images is cost- and time-efficient. Moreover, GSV images are obtained virtually, supporting research safety by eliminating physical surveying of potentially dangerous areas. Additionally, it is an unobtrusive method of data collection. Most GSV images are captured by a panoramic camera, which is mounted onto a car. 77 Google have also developed a Street View Three-Wheeler to gather imagery in cities with narrow streets and a Street View Trekker, which is a wearable backpack that captures images in locations only accessible by foot. 37 Previously, GSV images were freely available. However, Google introduced a pricing plan for GSV API of $0.0056 per image. The total cost for this study is approximately $4200. 78 Additionally, GSV API does not allow historical images to be collected. Therefore, green months were specified to eliminate seasonality effects.
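As a rough check on the cost quoted above, assuming that only the 751 644 retained images (six per location at 125 274 locations) were billed at the quoted per-image rate, which the text does not state explicitly:

$$6 \times 125\,274 = 751\,644, \qquad 751\,644 \times \$0.0056 \approx \$4209$$

This is consistent with the reported total of approximately $4200.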
This research assumes that greenspace exposure is determined based on its availability, which is a key requirement for greenspace use. However, other determinants contribute to greenspace use, 79 for instance, accessibility to greenspace. Another factor that impacts greenspace use is perceived safety and security. High crime rates and antisocial behavior discourage residents from utilizing such spaces. For this large-scale study, data to provide insights for such factors are not readily available or easy to gather.
Many opportunities exist to balance urban environmental and health problems with positive access and exposure to greenspace. The cutting-edge approach adopted in this research identifies a scalable approach to benchmarking street-level and overhead greenspace in high resolution. Examining associations between higher levels of greenspace exposure and self-reported "good or very good" health, we observed positive associations in most cases. While urbanization continues at a rapid rate, the development of greenspace will become increasingly contested. The socioeconomic and health benefits of greenspace observed in this study must be recognized. Safe and accessible greenspace development must be embedded in national and international policies and become part of our global vision. This will help society develop healthy, sustainable, and livable cities of the future.

Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.est.1c01326. PM2.5 concentration level maps; greenspace metric comparison maps; pwe to GVI and pwe to NDVI maps; maps of GVI and NDVI, population, octiles, and pwe to GVI and NDVI; distribution plots; correlation matrix; summary statistics for the lower and upper quartiles of pwe to GVI and NDVI; tables of relative differences and the statistical significance of differences in greenspace, socioeconomic and health variables between the lower and upper quartiles of greenspace metrics; and directed acyclic graph (PDF)