Introduction

Health literacy enables individuals to comprehend and apply health-related knowledge, adopt healthier lifestyles, and maintain or enhance their overall well-being. It is widely recognized as a key determinant of an individual’s ability to manage health effectively1,2. In recent years, health literacy has become a focal point of global health, with many countries acknowledging it as a crucial indicator for evaluating the success of national health initiatives3,4,5,6,7,8,9,10. Among community-dwelling residents, inadequate health literacy has been independently associated with poorer physical and mental health outcomes11. Previous studies have shown that low health literacy is linked to a limited understanding of health information, insufficient disease knowledge, and poor medication adherence12, all of which contribute to increased rates of hospitalization and mortality13,14. This issue is particularly concerning for individuals with chronic diseases, as over 40% are at risk of misunderstanding, forgetting, or neglecting healthcare advice15. Therefore, improving health literacy could represent one of the most effective and cost-efficient strategies for mitigating the growing burden of non-communicable chronic diseases16,17.

Following the first national health literacy survey in 2008, the China Health Literacy Survey (CHLS) was established in 2012 by the National Health Commission of China, with technical support from the Chinese Center for Health Education. The survey is conducted annually among Chinese residents aged 15–69, making it a continuous, nationally representative surveillance system. The data show that the national health literacy level of Chinese residents has steadily increased: the percentage of residents achieving a CHLS score of 80% or above rose from 8.8% in 2008 to 25.4% in 2021, reflecting ongoing interventions18. The CHLS has been widely used for national health literacy surveillance, and factors such as the social development index, age, and education level have been found to be strongly associated with health literacy18.

Although health literacy has improved nationally, significant disparities remain at the local level. National averages often mask regional variations, where certain districts may lag behind due to unique demographic, socioeconomic, or infrastructural challenges. Baiyun District in Guangzhou exemplifies such a case. Despite being part of a major metropolitan area, Baiyun faces persistent issues such as low awareness of chronic disease prevention, limited access to reliable health information, and pronounced disparities across age and education levels. These challenges, combined with its distinctive demographic structure, rapid urbanization, and the coexistence of urban and semi-rural communities, make Baiyun District an important and representative site for examining local health literacy inequalities. This study contributes to the existing body of knowledge by providing a rare multi-year (2019–2024) perspective on health literacy trends at the community level, based on data from a representative sample using a multi-stage sampling design. Unlike prior cross-sectional studies, this work captures temporal patterns and identifies key influencing factors within a dynamic urbanizing district. The findings aim to support the design of more tailored and sustainable health education and promotion strategies in Guangzhou, Guangdong Province.

Methods

Study setting

Baiyun District, located in Guangzhou, was chosen as the study site due to its unique social and demographic characteristics. It is one of the largest districts in Guangzhou and is undergoing rapid urbanization, with a population comprising both urban and semi-rural residents. Previous reports have indicated persistent health literacy gaps in this region, particularly in chronic disease prevention awareness and equitable access to health information. These factors make Baiyun District a representative area for identifying local disparities and informing targeted public health strategies.

Participants

A multistage, stratified, probability-proportional-to-size (PPS) sampling design was used19. The sample size was calculated using the standard formula for such sampling methods, \(n=\frac{\mu_{\alpha/2}^{2}\times p(1-p)}{\delta^{2}}\times deff\). The key parameters used for estimation were as follows:

  (1) Confidence level and margin of error: A 95% confidence level was used, corresponding to a Z value of 1.96. The margin of error (δ) was set at 5%. The design effect (deff) for the cluster sampling design was set at 1.05.

  (2) Expected health literacy rate and non-response rate: Based on the 2023 health literacy level in Baiyun District (38.89%), and assuming a non-response rate of 10%, the required sample size was calculated using PASS 15.0 statistical software, resulting in a minimum required sample of n = 426 participants.
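The calculation can be checked by hand with a minimal sketch. It plugs the stated parameters into the formula above and then inflates the result for non-response by dividing by (1 − non-response rate). That inflation rule is an assumption (the text does not say how PASS applied the 10%), as is the function name `required_sample_size`; under this assumption the result rounds to the reported 426.

```python
def required_sample_size(p, delta, z=1.96, deff=1.05, nonresponse=0.10):
    """Sample size for estimating a proportion under a complex design.

    p           expected proportion (here, the 2023 health literacy level)
    delta       margin of error
    z           normal quantile for the confidence level (1.96 for 95%)
    deff        design effect
    nonresponse assumed non-response rate; dividing by (1 - nonresponse)
                is an assumption, not a rule stated in the text
    """
    base = (z ** 2) * p * (1 - p) / delta ** 2   # simple random sampling size
    design_adjusted = base * deff                # inflate for the cluster design
    return design_adjusted / (1 - nonresponse)   # inflate for non-response


n = required_sample_size(p=0.3889, delta=0.05)
print(round(n))  # rounds to 426 under the assumptions above
```

Rounding up instead of to the nearest integer would give 427, so the reported minimum appears to use conventional rounding.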

In the first stage, 3 townships (streets) were randomly selected from the Baiyun district using the PPS method. In the second stage, 3 villages (residential committees) were randomly selected using the PPS method. In the third stage, 50 households were randomly selected from each chosen village (residential committee). In the fourth stage, investigators randomly selected one resident aged 15–69 from each selected household using the KISH20 table method.
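The first-stage PPS draw can be illustrated with a short sketch. The text does not state which PPS variant was used, so systematic PPS selection is an assumption here, and the township names and population sizes are hypothetical placeholders rather than study data.

```python
import random
from itertools import accumulate


def pps_systematic_sample(sizes, n_select, rng):
    """Select n_select units with probability proportional to size.

    sizes maps unit name -> population size. Equally spaced selection
    points are laid over the cumulative size scale from a random start.
    A unit larger than the sampling interval could be hit more than once.
    """
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] / n_select
    start = rng.random() * interval              # random start in [0, interval)
    selected = []
    idx = 0
    for k in range(n_select):
        point = start + k * interval
        # Advance to the unit whose cumulative range contains the point.
        while idx < len(names) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(names[idx])
    return selected


# Hypothetical townships and population sizes, for illustration only.
township_sizes = {
    "Township A": 120_000,
    "Township B": 90_000,
    "Township C": 150_000,
    "Township D": 60_000,
    "Township E": 80_000,
    "Township F": 100_000,
}
chosen = pps_systematic_sample(township_sizes, n_select=3, rng=random.Random(2024))
print(chosen)
```

The same routine could be reused for the second stage by passing the sizes of the villages (residential committees) within each selected township.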

The resident population was defined as individuals who had lived in the survey area for at least 6 months within the year prior to the survey. All participants were assigned unique identification codes, and no personally identifiable information was recorded. For face-to-face interviews, particularly with illiterate participants, trained interviewers ensured privacy by conducting the interviews in a quiet and secure setting, away from others. Participants were informed that their responses would be anonymized and used solely for research purposes. The study protocol was approved by the Ethics Committee of the Baiyun Center for Disease Control and Prevention (GZBYJK-2025-001). Written informed consent was obtained from all participants, as well as from a parent or guardian for participants under the age of 17. These measures helped protect the privacy and dignity of all participants, including those with limited literacy. All methods were performed in accordance with relevant ethical guidelines and regulations. A total of 2630 participants were included in the analysis for this study.

Measures

The survey collected information on sociodemographic characteristics, including age, gender, ethnicity, education level, marital status, and other relevant factors, as well as health literacy. Health literacy was measured with a scale developed by the Chinese Center for Health Education based on the Health Literacy of Chinese Citizens: Basic Knowledge and Skills. This 50-item scale evaluates knowledge and skills essential for addressing real-world health problems and is divided into 6 dimensions: scientific views of health (8 items; score range, 0–11), infectious disease literacy (6 items; score range, 0–7), chronic disease literacy (9 items; score range, 0–12), safety and first aid literacy (10 items; score range, 0–14), medical care literacy (11 items; score range, 0–14), and health information literacy (6 items; score range, 0–8). Additionally, the scale assesses health knowledge, attitudes, behaviors, and skills necessary for managing health-related issues, which are categorized into three dimensions: (1) knowledge and attitudes: basic knowledge and concepts related to health (22 items); (2) behavior and lifestyle: health-related behaviors and lifestyle (16 items); and (3) health-related skills: basic health-related skills (12 items). The scale consists of four types of questions: true-or-false, single-answer, multiple-answer, and situational questions. For multiple-answer questions, a response was counted as correct only if all correct options, and no incorrect options, were selected. Situational questions followed a paragraph of instruction or medical information. According to the scoring rules of the 2018 edition of the scale, participants received 1 point for each correct response, except for multiple-answer questions, for which a correct response earned 2 points. The total score ranges from 0 to 66 points. Based on the official classification standard, participants were grouped into three health literacy levels: low (0–41), intermediate (42–52), and adequate (53–66). This classification has been widely adopted in national health literacy monitoring efforts. The scale has shown strong internal consistency (Cronbach’s α = 0.931) and split-half reliability (Spearman-Brown correlation coefficient = 0.808)19. Participants independently completed the scale, while illiterate and physically disabled individuals who were unable to read or write were interviewed by investigators.
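The scoring and classification rules described above can be summarized in a minimal sketch. The function names and the `(is_correct, is_multiple_answer)` input format are hypothetical conveniences, not part of the official scoring tool; the point values and cut-offs are those stated in the text.

```python
def total_score(responses):
    """Sum item scores from (is_correct, is_multiple_answer) pairs.

    A correct multiple-answer item earns 2 points; any other correct
    item earns 1 point; incorrect items earn 0.
    """
    score = 0
    for is_correct, is_multiple_answer in responses:
        if is_correct:
            score += 2 if is_multiple_answer else 1
    return score


def classify_health_literacy(score):
    """Map a total score (0-66) to the three reported levels."""
    if not 0 <= score <= 66:
        raise ValueError("total score must be between 0 and 66")
    if score <= 41:
        return "low"
    if score <= 52:
        return "intermediate"
    return "adequate"


# Example: 30 correct single-point items and 12 correct multiple-answer items.
example = [(True, False)] * 30 + [(True, True)] * 12 + [(False, False)] * 8
print(total_score(example), classify_health_literacy(total_score(example)))
```

Because 50 items yield a maximum of 66 points under these rules, the scale implies 16 two-point (multiple-answer) items, and the adequate cut-off of 53 corresponds to roughly 80% of the maximum score.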

To accommodate illiterate or physically disabled participants, trained interviewers provided assistance by reading questions aloud and recording responses. All interviewers underwent standardized training sessions, which included instruction on ethical conduct, neutral questioning techniques, avoidance of suggestive prompts, and accurate recording procedures. Quality control was ensured through field supervision by senior staff, random checks of completed questionnaires, and double-entry validation. These measures were implemented to minimize interviewer bias and ensure the reliability and validity of the data.

Covariates included sociodemographic characteristics and physical conditions. The sociodemographic characteristics assessed were gender, age, ethnicity, household registration, education level, and annual household income. “Local household registration” refers to the Hukou system, a household registration system in China that ties individuals to their place of official residence. This designation affects eligibility for local public services, including healthcare and education. Education level was grouped into three categories: less than junior high school, junior/senior high school, and college or above, consistent with prior national classification standards. Annual household income per capita was collected as a continuous variable, retained in continuous form for analysis, and described using the median and interquartile range because of its skewed distribution. This approach aligns with the handling of income in similar studies. The presence of chronic conditions was determined by asking participants whether they had been diagnosed with any physician-confirmed chronic diseases, such as high blood pressure, heart disease, stroke, diabetes, or cancer. Self-rated health (SRH) was assessed using the following question: “How would you rate your overall health during the past year?” The response options were: 1 = very good, 2 = good, 3 = fair, 4 = bad, and 5 = very bad.

Statistical analysis

Data were analyzed using the SPSS Complex Samples procedure (version 25), in accordance with the study’s sampling design. Normally distributed quantitative data are presented as mean ± standard deviation (\(\bar{x} \pm s\)), while non-normally distributed data are described using the median (M) and interquartile range (Q1, Q3). Categorical data are summarized as frequencies and percentages (%). Ordered logistic regression (OLR) was applied to model the relationship between health literacy (as an ordinal dependent variable) and the sociodemographic and physical-condition covariates. Multiple correspondence analysis (MCA)21,22 was used to visualize the relationships among categorical variables (e.g., health literacy, education level, and age), with closer points on the map indicating stronger associations. The cumulative contribution rate of the first two dimensions reflects the proportion of variance explained. MCA reduces the dimensionality of categorical data and projects them into a low-dimensional space, enabling patterns, clusters, and associations between variable categories to be more intuitively interpreted. This approach is particularly suitable for identifying population subgroups with similar response patterns and helps to uncover underlying structures within complex survey data. A P-value of < 0.05 was considered statistically significant.

Results

Characteristics of the participants

A total of 2,630 participants aged 15–69 years from Baiyun District, Guangzhou, were included in the final analysis. The mean age was 44.73 years (SD = 12.43), and 40.4% were male. Most participants (86.2%) had local household registration. The average CHLS score was 47.31 (SD = 11.41), with 25.2% classified as having low health literacy, 40.3% intermediate, and 34.5% adequate. Key characteristics included:

  (1) Age: 51.6% were 15–44 years old, 34.8% were 45–60, and 13.6% were 61–69.

  (2) Education: 6.2% had less than junior high school, 50.4% completed junior/senior high school, and 43.3% had college or higher education.

  (3) Chronic disease: 17.5% reported at least one chronic condition.

  (4) Marital status: 83.4% were married.

  (5) Annual per capita household income: Median 22,500 yuan (IQR: 35,000 yuan).

  (6) Self-rated health: 56% rated their health from “very good” to “fair”, with 34.2% of responses missing.

  (7) Smoking behavior: 11.9% were current smokers; 34.2% of responses were missing.

  (8) Sick leave in past year: 6.2% had taken sick leave, 55.6% had not, and 4.0% were unsure; 34.2% of data were missing. (See Table 1 for full details.)

Key demographic characteristics of the study sample, such as gender, age distribution, and education level, were compared with those of the general population of Baiyun District, based on the 2020 data from the Seventh National Population Census of China (see Table S1).

Table 1 Characteristics of the participants by the CHLS (N = 2630).

The result of the ordinal logistic regression for health literacy

Table 2 presents the model summary of the ordinal logistic regression for health literacy. Model fitting information revealed a −2 log-likelihood of 134.38, a Chi-square value of 474.05, and a P-value < 0.001. The goodness-of-fit statistics showed a Pearson Chi-square value of 42.91 (P < 0.001) and a Nagelkerke pseudo-R-square of 0.186. In Table 3, the “threshold” refers to the constant terms of the two binary logistic regression models. Regarding education, participants with less than junior high school education (\(\hat{\beta}\) = −1.71, 95% CI: −2.06, −1.37, Wald χ² = 94.18, P < 0.001) and those with junior or senior high school education (\(\hat{\beta}\) = −1.24, 95% CI: −1.40, −1.07, Wald χ² = 207.49, P < 0.001) were significantly less likely to have adequate health literacy compared to those with college or higher education (reference group). Age was also significantly associated with health literacy. Compared with the reference group (aged 61–69 years), participants aged 15–44 years were more likely to have adequate health literacy (\(\hat{\beta}\) = 0.82, 95% CI: 0.58, 1.06, P < 0.001), as were those aged 45–60 years (\(\hat{\beta}\) = 0.32, 95% CI: 0.08, 0.55, P = 0.008).
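To make the coefficients above easier to read, the following sketch converts them to odds ratios by exponentiation. It assumes a proportional-odds model in which a positive coefficient means higher odds of falling in a higher health literacy category, which matches the direction of interpretation used in the text; the dictionary labels and variable names are illustrative.

```python
import math

# (beta, lower 95% CI, upper 95% CI) as reported in the text.
coefficients = {
    "education < junior high (ref: college or above)": (-1.71, -2.06, -1.37),
    "education junior/senior high (ref: college or above)": (-1.24, -1.40, -1.07),
    "age 15-44 (ref: 61-69)": (0.82, 0.58, 1.06),
    "age 45-60 (ref: 61-69)": (0.32, 0.08, 0.55),
}

# Exponentiate each estimate and its confidence limits.
odds_ratios = {
    label: tuple(math.exp(value) for value in triple)
    for label, triple in coefficients.items()
}

for label, (or_est, or_low, or_high) in odds_ratios.items():
    print(f"{label}: OR = {or_est:.2f} (95% CI {or_low:.2f}, {or_high:.2f})")
```

Under this reading, residents with less than junior high school education have roughly one fifth the odds of a higher health literacy level compared with college-educated residents, while those aged 15–44 have a little over twice the odds of residents aged 61–69.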

Table 2 Model summary of the ordinal logistic regression.
Table 3 The result of ordinal logistic regression for health literacy.

The result of the MCA

We conducted MCA to explore the relationships between age, education, and health literacy. The analysis revealed two main dimensions that together explained 96.06% of the variance, with the first dimension explaining 58.48% and the second 37.58% (Table 4). Dimension 1 showed a strong association with age and education, suggesting that these factors were closely linked to health literacy levels. Dimension 2 explained less variance but still contributed meaningful differentiation.

The two-dimensional MCA plot (Fig. 1) clearly demonstrated these patterns: younger individuals with higher education were associated with better health literacy and appeared in the upper-left quadrant, while older individuals with lower education were linked to lower health literacy and clustered in the lower-right quadrant.

Fig. 1 Visualization of MCA: relationships between categorical variables. (Note: Each ellipse highlights groups with similar characteristics in age, education level, and health literacy.)

In Table 5, health literacy showed moderate correlations with education (r = 0.38) and age (r = 0.28), while the correlation between education and age was stronger (r = 0.47). Discrimination measures showed that education had the highest overall contribution (mean = 0.64), followed by age (mean = 0.52), while health literacy had a lower contribution, especially in Dimension 2 (mean = 0.29) (Table 6).

Table 4 Model summary of the MCA.
Table 5 Correlations of transformed variables through principal normalization.
Table 6 Discrimination measures of the MCA.

The discrimination scores were highest for education in Dimension 1 (value = 0.68), followed by age (value = 0.59). The active total scores were 1.75 for Dimension 1 and 1.13 for Dimension 2, with an overall mean score of 1.44. These are visualized in Fig. 2.

Fig. 2 Discrimination measure plot of MCA. (Each arrow represents a variable, with its direction and length indicating its contribution to Dimension 1 and Dimension 2. A longer arrow reflects a higher contribution to the explanation of total variance. In this plot, Education shows the strongest discriminatory power across dimensions, followed by Age and Health literacy.)
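The reported MCA figures can be cross-checked with simple arithmetic. The sketch below assumes that each dimension's percentage of variance equals its active total divided by the number of active variables (three here: age, education, and health literacy), as in SPSS-style MCA output; this relationship is an assumption about how the percentages were derived, and the variable names are illustrative.

```python
# Values reported in the text.
active_totals = {"Dimension 1": 1.75, "Dimension 2": 1.13}
reported_pct = {"Dimension 1": 58.48, "Dimension 2": 37.58}
n_active_variables = 3  # age, education, health literacy (assumed)

# Inertia per dimension expressed as a percentage.
inertia_pct = {
    dim: 100 * total / n_active_variables
    for dim, total in active_totals.items()
}
mean_total = sum(active_totals.values()) / len(active_totals)

for dim in active_totals:
    print(f"{dim}: computed {inertia_pct[dim]:.2f}%, reported {reported_pct[dim]:.2f}%")
print(f"Mean active total: {mean_total:.2f}")
```

The computed percentages agree with the reported 58.48% and 37.58% to within the rounding of the two-decimal active totals, and the mean of the two totals matches the reported 1.44.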

Discussion

This study aimed to explore the factors influencing health literacy among residents aged 15–69 years in Baiyun District, Guangzhou, based on data from the China Health Literacy Survey (CHLS). The findings revealed significant associations between health literacy and key socio-demographic variables, including education level and age. These results provide valuable insights into the determinants of health literacy in this population, which is crucial for informing public health interventions and policies.

The mean health literacy score in this study was 47.31 (SD = 11.41), with only 34.5% of participants categorized as having adequate health literacy. This highlights a significant need for targeted efforts to improve health literacy in the region, especially among vulnerable populations. The proportion of residents with adequate health literacy observed in this study was higher than in previous findings in China, where disparities in health literacy are evident across different demographic groups18,19,23,24. Nevertheless, these results indicate that there is still a substantial gap in achieving higher levels of health literacy, emphasizing the importance of ongoing public health initiatives to address this issue25.

Education level showed the strongest association with health literacy among the factors examined, with individuals having lower levels of education (less than junior high school or junior/senior high school) being significantly less likely to possess adequate health literacy compared to those with higher education levels. This finding aligns with existing literature, which consistently shows that higher educational attainment is associated with better health literacy26. Education plays a fundamental role in shaping an individual’s cognitive capacity and critical thinking skills, which are essential for processing, evaluating, and applying health-related information27. Higher educational attainment equips individuals with enhanced cognitive skills and critical thinking abilities necessary for effective health decision-making28. It also facilitates better employment opportunities and higher income levels, which are associated with improved access to health information and services29. Policies aimed at improving educational infrastructure and accessibility could therefore have far-reaching implications for public health, contributing to sustained socioeconomic development.

Age was also found to influence health literacy, with younger individuals (aged 15–44 years) being more likely to have adequate health literacy compared to older individuals (aged 61–69 years). This is consistent with other studies that have found health literacy to decrease with age, possibly due to cognitive decline, reduced access to information, or decreased familiarity with modern health-related technologies30,31. Social isolation, which is more prevalent in older populations, may also limit opportunities to discuss and clarify health information with others. The association between age and health literacy suggests that targeted interventions for older populations, such as health education programs32,33 tailored to their specific needs, may be crucial in improving their health literacy levels. Midlife health education programs might focus on chronic disease prevention34, whereas those for seniors could incorporate training in digital literacy and comprehension of complex medical information35,36. Such targeted approaches can help mitigate the effects of aging on health literacy, thereby promoting lifelong health management and reducing healthcare costs.

Digital literacy—the ability to access, evaluate, and use information via digital platforms—is increasingly essential for navigating modern healthcare systems. With the growing reliance on online health resources, telemedicine, electronic health records, and mobile health apps, individuals are expected to engage with complex technologies to manage their health. However, older adults often face barriers such as limited prior exposure to digital tools, reduced cognitive or motor function, and lower confidence in using technology. These challenges may hinder their ability to obtain accurate health information, communicate with providers, and make informed decisions. Recognizing digital literacy as a facet of health literacy is crucial for designing inclusive health interventions. Tailored strategies, such as user-friendly interfaces, digital training sessions, or support from caregivers, could help bridge the digital divide and enhance health outcomes for older populations.

The results of MCA further supported these findings, revealing distinct clusters of participants based on their age, education, and health literacy levels. Younger individuals with higher education levels were more likely to exhibit better health literacy, while older individuals with lower education levels were positioned in the lower-right quadrant of the MCA plot, indicating lower health literacy. These patterns highlight the compounded effects of age and education on health literacy, suggesting that both factors should be prioritized in the design of public health interventions. Targeted strategies addressing the specific needs of older, less-educated populations may help reduce health literacy disparities in this region37,38.

The MCA also revealed that education exhibited the highest discriminatory power across the dimensions, followed by age. This underscores the pivotal role of education in shaping health literacy, while also highlighting the importance of considering age when addressing health literacy gaps. For example, Dimension 1 appears to capture a socioeconomic and educational gradient, while Dimension 2 may reflect differences in age-related and digital access–related factors affecting health literacy, consistent with patterns observed in the Baiyun District population. The weak association between health literacy and Dimension 2 in the MCA suggests that other factors, not captured in this analysis, may further influence health literacy. To gain a more comprehensive understanding of health literacy disparities and inform policies aimed at mitigating these gaps, future research should explore additional determinants, such as income39,40, employment status41, and access to healthcare services42,43, which may contribute to the observed disparities. Stable income provides the financial resources necessary for sustained health education44, while favorable employment conditions offer opportunities for skill development and career advancement that enhance health literacy45. Although inadequate health literacy is associated with increased acute healthcare utilization46,47, the role of health literacy in reducing such utilization warrants further investigation48. Lastly, cultural norms shape beliefs, communication styles, and trust in medical systems. For example, in some communities, health information may be shared more through informal networks or influenced by traditional practices, which may or may not align with mainstream health guidance.

Several international studies conducted in urban populations have similarly identified education level and age as key determinants of health literacy. For instance, a study49 in Berlin, Germany, found that younger adults with higher educational attainment demonstrated significantly better health literacy scores, mirroring our observations. Likewise, research from metropolitan areas in Canada50 and Australia51 reported that older adults faced greater difficulties in understanding and applying health-related information, especially in contexts requiring digital engagement. These parallels suggest that the social determinants of health literacy, such as education access and age-related barriers, are consistent across diverse urban contexts. However, cultural, policy, and healthcare system differences may moderate the strength and implications of these associations. By situating our findings within a global framework, we emphasize the universal importance of targeted interventions to address health literacy disparities in aging urban populations.

To address the disparities in health literacy identified in this study, particularly among older adults and individuals with lower educational attainment, practical and targeted interventions are needed. Educational programs should be designed with consideration for varying literacy levels and cultural backgrounds. These programs can be delivered in accessible venues such as community health centers, adult education institutions, and neighborhood associations32,33. Teaching strategies should emphasize visual aids, interactive learning, and repetition, and could benefit from involving trained peer educators to foster relatability and trust.

Improving digital literacy is also critical, especially in an era where digital health technologies are becoming increasingly prevalent. Older adults may face unique challenges in using smartphones, online portals, or telehealth services. Thus, hands-on digital literacy workshops should be offered, focusing on basic skills such as navigating health websites, accessing online records, and using mobile health applications35. These workshops can be implemented through partnerships with local libraries, elderly service centers, and community-based organizations, and should provide ongoing support rather than one-time instruction.

In addition, broad public health campaigns can be leveraged to raise awareness and promote health literacy across the population. These campaigns should utilize multiple communication channels—including social media, traditional media, and community outreach—and should be linguistically and culturally adapted to resonate with specific subgroups36. Collaborating with local influencers, health professionals, and grassroots leaders can help amplify messaging and ensure relevance.

Collectively, these multi-level interventions—educational programming, digital skills training, and culturally sensitive public communication—can build health literacy capacity and reduce inequalities in health outcomes, especially among vulnerable populations in urban settings.

Although previous studies have reported significant associations of annual household income and chronic health conditions with health literacy18,39, our findings did not reveal such associations after adjusting for other sociodemographic variables. Specifically, neither income nor chronic disease status emerged as a significant predictor in the multivariable models. One possible explanation is that these variables may share overlapping variance with more dominant factors, such as education and employment status. In addition, self-reported income and health conditions may have limited accuracy or variability in our sample, which could attenuate their apparent influence. Despite their lack of statistical significance in this study, these variables remain important contextual factors and warrant further investigation in future research with more detailed or objective measures.

This study has several limitations. First, the cross-sectional design limits the ability to infer causality between the factors and health literacy. Longitudinal studies are needed to better understand the temporal relationships between these variables. Second, the reliance on self-reported data for variables such as smoking behavior and health status may introduce biases, although these measures are commonly used in health surveys. Finally, the study was conducted in a single district of Guangzhou, which may limit the generalizability of the findings to other regions in China or internationally.

Conclusions

The findings from this study underscore the importance of education and age in determining health literacy in the Baiyun District of Guangzhou. Public health initiatives aimed at improving health literacy should prioritize educational interventions, especially for older individuals and those with lower educational attainment. Additionally, future research should explore the broader socio-economic and cultural factors that influence health literacy, as well as the development of tailored strategies to address these disparities.