Urban Information Lab at UT Austin

Urban Health Risk Mapping

Estimating health outcomes at the neighborhood scale is important for promoting urban health, but it has traditionally been costly and time-consuming. The Urban Health Risk Mapping project uses crowdsourced data and machine learning to predict census tract-level health outcomes for ten major US cities: Austin, Baltimore, Boston, Dallas, Washington, D.C., Houston, Los Angeles, New York City, San Antonio, and San Francisco. Compared with traditional survey methods, the machine learning approach saves both time and cost.


The project consists of four parts: (1) database development, (2) modeling and analytics, (3) visualization and web development, and (4) community engagement and application. The first two parts cover building, training, and testing the machine learning models.

The targets are health outcomes, namely the prevalence of common non-communicable chronic conditions such as coronary heart disease, cancer, diabetes, poor mental health, obesity, and stroke. The observed health outcomes used to train and test the models come from the CDC’s 500 Cities Project. The features are built from three data sources: the CDC’s Social Vulnerability Index (SVI) dataset, the EPA’s Smart Location Database (SLD), and the 311 service request datasets published by each municipality. Sixty features (i.e., predictor variables) are considered; they characterize the social environment, the physical environment, and the aspects and degrees of neighborhood disorder.

Several machine learning algorithms are applied and compared: Ridge Regression, Lasso Regression, Elastic Net, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, and Gradient Boosting. To improve model performance, the hyperparameters are tuned using 10-fold cross-validation, and different feature sets are also tested.
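The model comparison described above can be sketched with scikit-learn. This is a minimal illustration, not the project's pipeline: the data are synthetic, only four of the eight listed algorithms are included, and the hyperparameter grids, the R² scoring metric, and the random seeds are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for tract-level data (assumption: NOT the project's data).
rng = np.random.default_rng(0)
n_tracts, n_features = 200, 8
X = rng.normal(size=(n_tracts, n_features))
# Hypothetical prevalence-like outcome driven by two features plus noise.
y = 10.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_tracts)

# Candidate models with small, illustrative hyperparameter grids.
candidates = {
    "ridge": (make_pipeline(StandardScaler(), Ridge()),
              {"ridge__alpha": [0.1, 1.0, 10.0]}),
    "lasso": (make_pipeline(StandardScaler(), Lasso(max_iter=10_000)),
              {"lasso__alpha": [0.01, 0.1, 1.0]}),
    "random_forest": (RandomForestRegressor(n_estimators=50, random_state=0),
                      {"max_depth": [3, None]}),
    "gradient_boosting": (GradientBoostingRegressor(n_estimators=50, random_state=0),
                          {"learning_rate": [0.05, 0.1]}),
}

# 10-fold cross-validation, as stated in the text.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=cv, scoring="r2")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Report models from best to worst mean cross-validated R^2.
for name, (score, params) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:18s} mean CV R^2 = {score:.3f}  best params = {params}")
```

Each algorithm gets its own grid search so that the comparison is between tuned models rather than defaults. A separate held-out test set, which the text implies but does not detail, is omitted here for brevity.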

The results show that the tract-level prevalence of common non-communicable chronic diseases can be predicted reasonably well from publicly available datasets. The study yields two major findings: (1) sociodemographic and socioeconomic variables are the strongest predictors of tract-level health outcomes; (2) historical records of 311 service requests are a useful complementary data source, because information distilled from the 311 data often improves the models’ performance.

The datasets and predictive models are published online. Users can explore the models interactively with the web tools we developed. These tools can help the public and city officials evaluate future scenarios and understand how changes in neighborhood conditions can lead to changes in health outcomes.
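The scenario evaluation described above amounts to predicting an outcome twice, once for a tract's current conditions and once for modified conditions, and comparing the two. The sketch below illustrates the idea with an ordinary least squares model fitted to synthetic data. The coefficients, the baseline tract, and the choice of three predictors are arbitrary assumptions for illustration; they are not project results and do not reflect the published models.

```python
import numpy as np

# Predictor abbreviations borrowed from the variable table; values are synthetic.
feature_names = ["P_POV", "P_NOVEH", "D_POP"]

rng = np.random.default_rng(1)
X = rng.uniform(0, 40, size=(300, len(feature_names)))
# Arbitrary coefficients used only to generate a synthetic outcome (assumption).
synthetic_coefs = np.array([0.15, 0.05, -0.02])
y = 8.0 + X @ synthetic_coefs + rng.normal(scale=0.3, size=300)

# Fit ordinary least squares with an intercept column.
design = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)


def predict(tract):
    """Predict the outcome for one tract given a dict of feature values."""
    vector = np.array([tract[name] for name in feature_names], dtype=float)
    return float(coefs[0] + vector @ coefs[1:])


# Hypothetical tract and a what-if scenario lowering the poverty rate by 5 points.
baseline = {"P_POV": 25.0, "P_NOVEH": 12.0, "D_POP": 10.0}
scenario = dict(baseline, P_POV=20.0)

delta = predict(scenario) - predict(baseline)
print(f"baseline prediction: {predict(baseline):.2f}")
print(f"scenario prediction: {predict(scenario):.2f}")
print(f"predicted change:    {delta:+.2f}")
```

A predicted change of this kind describes how the model's output moves with its inputs; it is not by itself evidence of a causal effect.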

Data Sources

The census tract-level health data are drawn from the 500 Cities Project dataset (https://chronicdata.cdc.gov/browse?category=500+Cities).

The built environment variables are calculated based on EPA’s Smart Location Database (SLD) (https://www.epa.gov/smartgrowth/smart-location-mapping#SLD).

The socioeconomic and sociodemographic variables are extracted from CDC’s Social Vulnerability Index (SVI) dataset (https://svi.cdc.gov/data-and-tools-download.html).

The 311 data are accessed from the open data portal of each municipality:

Austin: https://data.austintexas.gov/Utilities-and-City-Services/Austin-311-Public-Data/xwdj-i9he
Baltimore: https://data.baltimorecity.gov/City-Services/311-Customer-Service-Requests/9agw-sxsr
Boston: https://data.boston.gov/dataset/311-service-requests
Dallas (from October 1, 2016): https://www.dallasopendata.com/City-Services/311-Service-Requests-October-1-2016-to-September-3/shgm-yzbp
Dallas (from October 1, 2018): https://www.dallasopendata.com/City-Services/311-Service-Requests-October-1-2018-to-Present-/m36q-vtbr
Washington, D.C.: https://opendata.dc.gov/datasets/311-city-service-requests-in-2019
Houston: http://www.houstontx.gov/311/
Los Angeles: https://data.lacity.org/browse?q=311&sortBy=relevance&page=2
New York City: https://nycopendata.socrata.com/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9
San Antonio: https://data.sanantonio.gov/dataset/service-calls
San Francisco: https://data.sfgov.org/City-Infrastructure/311-Cases/vw6y-z8j6/data
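The page does not describe how raw 311 records are turned into tract-level features. One plausible approach, sketched below purely as an assumption, is to count requests per census tract and request category and normalize by population. The record fields (`tract`, `category`), the tract identifiers, the category names, and the per-1,000-residents normalization are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical 311 records, assumed to be already geocoded to census tracts.
requests = [
    {"tract": "tract_A", "category": "graffiti"},
    {"tract": "tract_A", "category": "graffiti"},
    {"tract": "tract_A", "category": "illegal_dumping"},
    {"tract": "tract_B", "category": "illegal_dumping"},
]

# Hypothetical tract populations used for normalization.
population = {"tract_A": 4000, "tract_B": 2500}


def tract_request_rates(requests, population):
    """Return requests per 1,000 residents, keyed by tract and then category."""
    counts = defaultdict(Counter)
    for record in requests:
        counts[record["tract"]][record["category"]] += 1

    rates = {}
    for tract, by_category in counts.items():
        residents = population.get(tract)
        if not residents:
            continue  # skip tracts with unknown or zero population
        rates[tract] = {
            category: 1000.0 * n / residents
            for category, n in by_category.items()
        }
    return rates


rates = tract_request_rates(requests, population)
for tract, by_category in sorted(rates.items()):
    print(tract, by_category)
```

Because each municipality publishes 311 data with its own schema and category labels, a real pipeline would also need a per-city mapping onto a shared set of categories before counts can be compared across cities.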

Abbreviations and Descriptions of Variables

Outcome variables

| Variable | Abbreviation | Data Source |
|---|---|---|
| Arthritis among adults aged ≥ 18 years (%) | ARTHRITIS | CDC’s 500 Cities Project |
| High blood pressure among adults aged ≥ 18 years (%) | BPHIGH | CDC’s 500 Cities Project |
| Cancer (excluding skin cancer) among adults aged ≥ 18 years (%) | CANCER | CDC’s 500 Cities Project |
| Current asthma prevalence among adults aged ≥ 18 years (%) | CASTHMA | CDC’s 500 Cities Project |
| Coronary heart disease among adults aged ≥ 18 years (%) | CHD | CDC’s 500 Cities Project |
| Chronic obstructive pulmonary disease among adults aged ≥ 18 years (%) | COPD | CDC’s 500 Cities Project |
| Diagnosed diabetes among adults aged ≥ 18 years (%) | DIABETES | CDC’s 500 Cities Project |
| High cholesterol among adults aged ≥ 18 years who have been screened in the past 5 years (%) | HIGHCHOL | CDC’s 500 Cities Project |
| Chronic kidney disease among adults aged ≥ 18 years (%) | KIDNEY | CDC’s 500 Cities Project |
| Mental health not good for ≥ 14 days among adults aged ≥ 18 years (%) | MHLTH | CDC’s 500 Cities Project |
| Physical health not good for ≥ 14 days among adults aged ≥ 18 years (%) | PHLTH | CDC’s 500 Cities Project |
| Stroke among adults aged ≥ 18 years (%) | STROKE | CDC’s 500 Cities Project |
| All teeth lost among adults aged ≥ 65 years (%) | TEETHLOST | CDC’s 500 Cities Project |
| Binge drinking prevalence among adults aged ≥ 18 years (%) | BINGE | CDC’s 500 Cities Project |
| Current smoking among adults aged ≥ 18 years (%) | CSMOKING | CDC’s 500 Cities Project |
| No leisure-time physical activity among adults aged ≥ 18 years (%) | LPA | CDC’s 500 Cities Project |
| Obesity among adults aged ≥ 18 years (%) | OBESITY | CDC’s 500 Cities Project |
| Sleeping less than 7 hours among adults aged ≥ 18 years (%) | SLEEP | CDC’s 500 Cities Project |

Note: The column name for each predicted health outcome is formed by prefixing a lowercase ‘p’ to the corresponding outcome variable abbreviation. For example, ‘ARTHRITIS’ becomes ‘pARTHRITIS’.
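The naming rule in the note can be applied programmatically when pairing observed and predicted columns. The sketch below is illustrative only: the helper name `predicted_column` and the sample row values are hypothetical, and only a few of the outcome abbreviations are listed.

```python
# A few outcome abbreviations from the variable list (not exhaustive).
OUTCOMES = ["ARTHRITIS", "BPHIGH", "CANCER", "DIABETES", "OBESITY"]


def predicted_column(outcome):
    """Return the predicted-value column name for an observed outcome column."""
    return "p" + outcome


# Map each observed column to its predicted counterpart.
column_pairs = {name: predicted_column(name) for name in OUTCOMES}

# Hypothetical row from a published dataset; the values are made up.
row = {"ARTHRITIS": 21.4, "pARTHRITIS": 20.9, "OBESITY": 30.2, "pOBESITY": 31.0}

# Residual (observed minus predicted) for every outcome present in the row.
residuals = {
    observed: round(row[observed] - row[predicted], 2)
    for observed, predicted in column_pairs.items()
    if observed in row and predicted in row
}
print(column_pairs)
print(residuals)
```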

Predictor variables

| Variable | Abbreviation | Data Source |
|---|---|---|
| Percentage of persons below poverty | P_POV | CDC’s SVI data |
| Percentage of civilian (age 16+) unemployed estimate | P_UNEMP | CDC’s SVI data |
| Per capita income (US$) | PCI | CDC’s SVI data |
| Percentage of persons with no high school diploma (age 25+) | P_NOHSDP | CDC’s SVI data |
| Percentage of persons aged 65 and older | P_AGE65P | CDC’s SVI data |
| Percentage of persons aged 17 and younger | P_AGE17M | CDC’s SVI data |
| Percentage of civilian noninstitutionalized population with a disability | P_DISABL | CDC’s SVI data |
| Percentage of single parent households with children under 18 | P_SNGPNT | CDC’s SVI data |
| Percentage minority (all persons except white, non-Hispanic) | P_MINRTY | CDC’s SVI data |
| Percentage of persons (age 5+) who speak English “less than well” | P_LIMENG | CDC’s SVI data |
| Percentage of housing in structures with 10 or more units | P_MUNIT | CDC’s SVI data |
| Percentage of mobile homes | P_MOBILE | CDC’s SVI data |
| Percentage of occupied housing units with more people than rooms | P_CROWD | CDC’s SVI data |
| Percentage of households with no vehicle available | P_NOVEH | CDC’s SVI data |
| Percentage of persons in institutionalized group quarters | P_GROUPQ | CDC’s SVI data |
| Percentage uninsured in the total civilian noninstitutionalized population | P_UNINSUR | CDC’s SVI data |
| Percent of population that is working aged | P_WRKAGE | EPA’s Smart Location Database |
| Percent of one-car households | P_AO1 | EPA’s Smart Location Database |
| Percent of two-plus-car households | P_AO2P | EPA’s Smart Location Database |
| Percentage of low-wage workers (earning $1250/month or less) among total workers (home location) | P_LOWWAGEr | EPA’s Smart Location Database |
| Percentage of low-wage workers (earning $1250/month or less) among total workers (work location) | P_LOWWAGEe | EPA’s Smart Location Database |
| Gross residential density (HU/acre) on unprotected land | D_HH | EPA’s Smart Location Database |
| Gross population density (people/acre) on unprotected land | D_POP | EPA’s Smart Location Database |
| Gross employment density (jobs/acre) on unprotected land | D_EMP | EPA’s Smart Location Database |
| Gross activity density (employment + HUs) on unprotected land | D_HUEMP | EPA’s Smart Location Database |
| Gross retail (5-tier) employment density (jobs/acre) on unprotected land | D_EMP_RET | EPA’s Smart Location Database |
| Gross office (5-tier) employment density (jobs/acre) on unprotected land | D_EMP_OFF | EPA’s Smart Location Database |
| Gross industrial (5-tier) employment density (jobs/acre) on unprotected land | D_EMP_IND | EPA’s Smart Location Database |
| Gross service (5-tier) employment density (jobs/acre) on unprotected land | D_EMP_SVC | EPA’s Smart Location Database |
| Gross entertainment (5-tier) employment density (jobs/acre) on unprotected land | D_EMP_ENT | EPA’s Smart Location Database |
| Jobs per household | JOBSPERHH | EPA’s Smart Location Database |
| 5-tier employment entropy (denominator set to observed employment types in the census tract) | EMPMIX | EPA’s Smart Location Database |
| Employment and household entropy | EMPHHMIX | EPA’s Smart Location Database |
| Employment and household entropy (based on vehicle trip production and trip attractions including all 5 employment categories) | TRIPMIX | EPA’s Smart Location Database |
| Trip productions and trip attractions equilibrium index | TRIPEQ | EPA’s Smart Location Database |
| Household workers per job, by census tract | WRKSPERJOB | EPA’s Smart Location Database |
| Household workers per job equilibrium index | HHWRKJOBEQ | EPA’s Smart Location Database |
| Total road network density | D_RD | EPA’s Smart Location Database |
| Network density in terms of facility miles of auto-oriented links per square mile | D_RD_AO | EPA’s Smart Location Database |
| Network density in terms of facility miles of multi-modal links per square mile | D_RD_MM | EPA’s Smart Location Database |
| Network density in terms of facility miles of pedestrian-oriented links per square mile | D_RD_PO | EPA’s Smart Location Database |
| Street intersection density (auto-oriented intersections eliminated) | D_X_EXCLAO | EPA’s Smart Location Database |
| Intersection density in terms of auto-oriented intersections per square mile | D_X_AO | EPA’s Smart Location Database |
| Intersection density in terms of multi-modal intersections having three legs per square mile | D_X_MM3 | EPA’s Smart Location Database |
| Intersection density in terms of multi-modal intersections having four or more legs per square mile | D_X_MM4 | EPA’s Smart Location Database |
| Intersection density in terms of pedestrian-oriented intersections having three legs per square mile | D_X_PO3 | EPA’s Smart Location Database |
| Intersection density in terms of pedestrian-oriented intersections having four or more legs per square mile | D_X_PO4 | EPA’s Smart Location Database |
| Proportion of census tract employment within ¼ mile of fixed-guideway transit stop | P_EMP025 | EPA’s Smart Location Database |
| Proportion of census tract employment within ½ mile of fixed-guideway transit stop | P_EMP050 | EPA’s Smart Location Database |
| Aggregate frequency of transit service per square mile | D_TRANSIT | EPA’s Smart Location Database |
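Several of the predictors above are entropy-based mix measures. As a worked illustration of what "5-tier employment entropy (denominator set to observed employment types in the census tract)" could mean, the sketch below computes a normalized Shannon entropy over five employment categories, dividing by the logarithm of the number of categories that actually have jobs. This is one reading of the variable description, offered as an assumption; the exact formula should be checked against the Smart Location Database documentation.

```python
import math


def employment_entropy(jobs_by_type):
    """Normalized Shannon entropy of job counts across employment categories.

    The denominator is ln(N), where N is the number of categories with a
    nonzero job count (the "observed" types), so the result lies in [0, 1].
    """
    observed = [jobs for jobs in jobs_by_type if jobs > 0]
    n_types = len(observed)
    if n_types <= 1:
        return 0.0  # no mix with zero or one employment type present
    total = sum(observed)
    shares = [jobs / total for jobs in observed]
    return -sum(p * math.log(p) for p in shares) / math.log(n_types)


# Hypothetical job counts ordered as retail, office, industrial, service, entertainment.
even_mix = employment_entropy([20, 20, 20, 20, 20])
single_type = employment_entropy([10, 0, 0, 0, 0])
two_types_even = employment_entropy([50, 50, 0, 0, 0])
two_types_skewed = employment_entropy([90, 10, 0, 0, 0])
print(even_mix, single_type, two_types_even, two_types_skewed)
```

Under this reading, a tract with only two employment types split evenly scores as high as a tract with all five types split evenly, because the normalization uses the observed number of types rather than a fixed five.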


© The University of Texas at Austin 2025