A Statistical Analysis of Immigration and Population Growth Effects on Canadian Housing Prices

Published:

Preparing to run the code

The repository is private. Please contact jaechoi.dev@gmail.com for more information.

conda create --name cmpt353-jca589
conda activate cmpt353-jca589
conda install --file requirements.txt
python run.py --raw # downloads the raw data files
python run.py # runs the analysis on the processed data

0. Summary

This project analyzes the relationship between immigration patterns and real estate price growth in Canada from 2006 to 2024. First, I explore population data to examine the relationship between immigrants and the population of each Canadian province. By analyzing population growth components such as births, deaths, inflows, and outflows, I develop hypotheses about the impact of immigrants on urbanization and rising real estate prices in Canada. I then examine these hypotheses using both statistical analysis and machine learning techniques, including Random Forest, K-means clustering, and PCA. The results are limited by the lack of long-term immigration data and by my limited background in economics and data analysis, but they suggest that immigration’s impact is complex and needs to be viewed from multiple perspectives.

1. Defining Questions

1.1 Context

Canada has been recognized as a nation of immigrants. Immigrants have consistently comprised a significant portion of Canada’s population. However, the aggressive immigration policies of the past decade have been cited as a cause of various social issues, including rising real estate prices. This project examines data on Canada’s population growth and immigration increases, exploring the correlation with rising real estate prices as a prominent social issue. The analysis aims to interpret the significance of this relationship.

1.2 Questions

  • Has immigration accelerated Canadian urbanization?
  • Does immigration directly cause real estate price increases?
  • Is urban concentration the key mechanism linking immigration to housing prices?

2. Data Cleaning and Preparation

2.1 Data Collection

2.2 Data Cleaning and Preparation

Statistics Canada provides tools to filter data before downloading CSV files, but the filters cannot always isolate the relevant data. I therefore downloaded the complete CSV file for each topic and processed only the columns I needed.
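
The column-selection step can be sketched as follows. This is a minimal illustration, not the project's actual pre-processing code: the column names (REF_DATE, GEO, Components, VALUE), the helper load_needed_columns, and the sample rows are all assumptions made for the example.

```python
import io

import pandas as pd

# Tiny stand-in for a full Statistics Canada export. The column names and
# values are illustrative assumptions, not the project's real data.
RAW_CSV = io.StringIO(
    "REF_DATE,GEO,DGUID,Components,UOM,VALUE\n"
    "2020,Ontario,X1,Immigrants,Persons,100\n"
    "2020,Ontario,X1,Births,Persons,50\n"
    "2021,Ontario,X1,Immigrants,Persons,120\n"
)


def load_needed_columns(source, columns, component):
    """Read only the required columns, then keep rows for one component."""
    df = pd.read_csv(source, usecols=columns)
    return df[df["Components"] == component].reset_index(drop=True)


immigrants = load_needed_columns(
    RAW_CSV, ["REF_DATE", "GEO", "Components", "VALUE"], "Immigrants"
)
print(immigrants)
```

Passing usecols keeps memory use low when the full export is large, because the unused columns are never loaded.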

For Excel data, I manually removed some rows and columns. All pre-processing is handled by individual Python files, which write the processed data to data/processed.

Raw data cannot be uploaded to GitHub, so run.py has an option (--raw) to download the raw datasets for anyone who wants to follow the data processing steps. Otherwise, the analysis step uses the processed data.

3. Data Analysis and Methods

3.1 Change of population components in Canada (between 1972 and 2024)

This stacked area chart shows the components of Canada’s population growth from 1972 to 2024. It shows that immigration has been filling the gap left by the decline in natural increase.

  • Light Blue (Natural Increase): Births minus deaths. This has been steadily declining from ~200K in the 1970s to near zero by 2024, reflecting Canada’s aging population and declining birth rates.
  • Light Green (Immigrants): Number of permanent residents admitted annually. This shows the actual immigrant contribution, which is reduced when net NPR is negative. It has generally increased from ~100-150K in the 1970s to ~250-500K in recent years.
  • Orange (Net NPR): Net non-permanent residents (temporary workers, students, refugees). Only visible when positive.
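
One way to derive the three chart layers described above is sketched below. The column names and the numbers are placeholders chosen for illustration, and the handling of negative net NPR is one reading of the description, not necessarily how the original chart was built.

```python
import pandas as pd

# Placeholder figures for illustration only; not the project's data.
components = pd.DataFrame(
    {
        "year": [2000, 2001, 2002],
        "births": [300, 290, 280],
        "deaths": [200, 210, 220],
        "immigrants": [150, 160, 170],
        "net_npr": [20, -10, 30],
    }
)

# Natural increase is births minus deaths.
components["natural_increase"] = components["births"] - components["deaths"]

# A stacked area chart can only draw non-negative layers, so positive net NPR
# gets its own layer and is hidden (zero) when negative.
components["npr_layer"] = components["net_npr"].clip(lower=0)

# Negative net NPR is subtracted from the immigrant layer instead.
components["immigrant_layer"] = (
    components["immigrants"] + components["net_npr"].clip(upper=0)
)

print(components[["year", "natural_increase", "immigrant_layer", "npr_layer"]])
```

The three derived columns can then be passed to any stacked area plotting function in that order.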

3.2 Places to be chosen by immigrants in Canada

Retention Rate Statistics by Province (2012-2022)

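| Province | Mean | Min | Max | Std | Trend |
| --- | --- | --- | --- | --- | --- |
| Ontario | 92.2 | 90.7 | 95.4 | 1.3 | Declining |
| Alberta | 92.0 | 89.1 | 96.4 | 2.4 | Declining |
| British Columbia | 87.8 | 85.3 | 92.9 | 2.3 | Declining |
| Quebec | 82.2 | 79.9 | 89.4 | 3.1 | Declining |
| Manitoba | 75.9 | 67.5 | 88.7 | 6.6 | Declining |
| Territories | 73.7 | 63.5 | 87.3 | 7.6 | Declining |
| Saskatchewan | 73.5 | 63.5 | 88.1 | 7.7 | Declining |
| Nova Scotia | 67.7 | 59.9 | 83.8 | 7.5 | Declining |
| Newfoundland and Labrador | 52.7 | 37.5 | 79.0 | 13.3 | Declining |
| New Brunswick | 51.1 | 38.3 | 76.6 | 12.5 | Declining |
| Prince Edward Island | 30.1 | 22.6 | 57.8 | 10.9 | Declining |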
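
A summary table of this shape can be produced with a grouped aggregation, as in the sketch below. The data frame layout, the trend_label helper, and the rule of labelling the trend by the sign of a least-squares slope are assumptions for illustration; the source does not state how the trend was determined. Note that pandas computes the sample standard deviation (ddof=1) by default.

```python
import numpy as np
import pandas as pd

# Placeholder retention rates (%) in long format; values are illustrative only.
rates = pd.DataFrame(
    {
        "province": ["A"] * 3 + ["B"] * 3,
        "year": [2012, 2013, 2014] * 2,
        "retention": [90.0, 88.0, 86.0, 50.0, 55.0, 60.0],
    }
)


def trend_label(group):
    """Label the trend by the sign of a least-squares slope over the years."""
    slope = np.polyfit(group["year"].to_numpy(), group["retention"].to_numpy(), 1)[0]
    return "Declining" if slope < 0 else "Not declining"


# Mean, min, max, and sample standard deviation per province.
summary = rates.groupby("province")["retention"].agg(["mean", "min", "max", "std"])

# Attach the trend label computed separately for each province.
trends = {name: trend_label(group) for name, group in rates.groupby("province")}
summary["trend"] = pd.Series(trends)

print(summary.sort_values("mean", ascending=False))
```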

Correlation between Province GDP and the number of immigrants

correlation coefficient: 0.9554

| Province | Correlation | P-value | N Years |
| --- | --- | --- | --- |
| Nova Scotia | 0.964 | 3.37e-11 | 19 |
| New Brunswick | 0.948 | 6.94e-10 | 19 |
| Alberta | 0.875 | 9.49e-07 | 19 |
| Saskatchewan | 0.872 | 1.15e-06 | 19 |
| Yukon | 0.829 | 1.16e-05 | 19 |
| Prince Edward Island | 0.810 | 2.64e-05 | 19 |
| Ontario | 0.790 | 5.66e-05 | 19 |
| Manitoba | 0.723 | 4.73e-04 | 19 |
| British Columbia | 0.710 | 6.61e-04 | 19 |
| Nunavut | 0.699 | 8.74e-04 | 19 |
| Quebec | 0.284 | 2.38e-01 | 19 |
| Newfoundland and Labrador | -0.276 | 2.52e-01 | 19 |
| Northwest Territories | -0.329 | 1.69e-01 | 19 |

  • Strong positive correlations (>0.7) found in 9 provinces and territories
  • Weak/negative correlations in Quebec, Newfoundland and Labrador, and Northwest Territories
  • All significant correlations have p-values < 0.001
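
The per-province figures above pair a correlation coefficient with a p-value. A minimal sketch of that calculation is shown below, assuming a Pearson correlation (the source does not name the method) and using placeholder data with hypothetical column names.

```python
import pandas as pd
from scipy.stats import pearsonr

# Placeholder yearly series for two hypothetical provinces; illustrative only.
panel = pd.DataFrame(
    {
        "province": ["A"] * 5 + ["B"] * 5,
        "gdp": [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0],
        "immigrants": [10, 21, 29, 41, 50, 30, 10, 40, 20, 30],
    }
)

rows = []
for province, group in panel.groupby("province"):
    # pearsonr returns the coefficient and a two-sided p-value.
    r, p_value = pearsonr(group["gdp"], group["immigrants"])
    rows.append(
        {
            "province": province,
            "correlation": r,
            "p_value": p_value,
            "n_years": len(group),
        }
    )

result = (
    pd.DataFrame(rows)
    .sort_values("correlation", ascending=False)
    .reset_index(drop=True)
)
print(result)
```

With only 19 yearly observations per province, both series trending upward over time can produce a high coefficient on its own, so the p-values are best read alongside the time trend.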

All provinces show declining retention rates, which means immigrants have become more mobile after landing. However, the slope of the decline differs between provinces. Ontario, Alberta, BC, and Quebec still averaged retention rates above 80% between 2012 and 2022. The number of immigrants also correlates with provincial GDP. Not every province shows a strong correlation, but 9 provinces and territories show strong positive correlations. This suggests that large provinces are able to retain immigrants, while small and rural provinces receive fewer immigrants and see more of them move elsewhere after landing.

3.3 Correlation between immigrants and housing prices in Canada

  • Datasets: data/processed/hpi_yearly.csv, data/processed/immigrants_by_cities.csv
  • Cities analyzed: 26 (the HPI dataset covers 26 major cities; the immigrant dataset covers more)
  • Significant relationships (p < 0.05): 22
  • Strong fits (R^2 > 0.7): 10
  • Moderate fits (0.5 < R^2 <= 0.7): 6
  • Weak fits (R^2 <= 0.5): 10

The graph clearly shows which cities have strong correlations between immigration and housing prices, with Kitchener-Cambridge-Waterloo, Kelowna, Greater Sudbury, and Windsor showing the strongest relationships.
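
The per-city counts above can be reproduced with a fit-and-bucket step like the sketch below. It assumes a simple linear regression of HPI on immigrant counts for each city, which the source does not state explicitly; the helper names and the sample series are hypothetical.

```python
from scipy.stats import linregress


def classify_fit(r_squared):
    """Bucket R^2 using the thresholds listed above."""
    if r_squared > 0.7:
        return "strong"
    if r_squared > 0.5:
        return "moderate"
    return "weak"


def fit_city(immigrants, hpi):
    """Fit HPI against immigrant counts for one city and summarize the fit."""
    fit = linregress(immigrants, hpi)
    r_squared = fit.rvalue ** 2
    return {
        "slope": fit.slope,
        "r_squared": r_squared,
        "p_value": fit.pvalue,
        "significant": fit.pvalue < 0.05,
        "fit": classify_fit(r_squared),
    }


# Placeholder series for one hypothetical city; values are illustrative only.
example = fit_city([100, 200, 300, 400, 500], [101.0, 104.0, 108.0, 111.0, 116.0])
print(example)
```

Running fit_city once per city and counting the "fit" and "significant" fields gives a breakdown in the same form as the list above.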

3.4 Important Features of real estate prices

Model Performance

  • Training R^2: 0.8948
  • Testing R^2: 0.8269
  • Testing MAE: 3.03
  • Testing RMSE: 4.02
  • Cross-validation R^2 (mean +- std): -2.7461 +- 1.4248

Feature Importance Ranking:

  • Province GDP per Capita: 0.4307
  • Retention Rate (%): 0.1792
  • Population: 0.1158
  • Interest Rate: 0.0943
  • Number of Immigrants: 0.0740
  • Vacancy Rate (%): 0.0689
  • Employment Rate (%): 0.0373

The feature importance ranking shows that the number of immigrants is not a major factor in real estate prices. Compared with the immigrant-HPI correlations in section 3.3, this indicates that the correlation does not make immigration a causal factor. Rather, immigrants tend to move to large cities, even though housing prices in these economically wealthy areas are higher.
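
The metrics and ranking in this section can be computed along the lines of the sketch below. It assumes a Random Forest regressor (the model named in the summary) from scikit-learn; the synthetic features, the hyperparameters, and the 80/20 split are assumptions for illustration and do not reproduce the project's numbers.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in features; the target depends mostly on the first column.
X = pd.DataFrame(
    {
        "gdp_per_capita": rng.normal(size=n),
        "immigrants": rng.normal(size=n),
        "interest_rate": rng.normal(size=n),
    }
)
y = 3.0 * X["gdp_per_capita"] + 0.5 * X["immigrants"] + rng.normal(scale=0.1, size=n)

# Single random train/test split, as in the training and testing scores above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

predictions = model.predict(X_test)
test_r2 = r2_score(y_test, predictions)
test_mae = mean_absolute_error(y_test, predictions)

# 5-fold cross-validation on the full data; folds are contiguous, not shuffled.
cv_scores = cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=0), X, y, cv=5, scoring="r2"
)

# Impurity-based importances, which sum to 1 across features.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(
    ascending=False
)
print(test_r2, test_mae, cv_scores.mean(), cv_scores.std())
print(importances)
```

A negative cross-validation R^2 means the model predicts the held-out folds worse than a constant mean would. With cv=5, scikit-learn splits a regression dataset into contiguous, unshuffled folds, so if the rows are ordered by year or by province, each fold can hold out a period or region the model never saw. That is one possible explanation for the gap between the testing R^2 and the cross-validation R^2 reported above, and a reason to read the importance ranking with some caution.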

3.5 Clustering Cities in Canada

4. Conclusion and Limitations

4.1 Findings

Immigration differs from natural population increase. Because immigrants usually arrive as adults, often with children, they are looking for a new place to live from the start. They therefore tend to seek “better” places to live, a choice tied to the economic conditions of cities and provinces.

Therefore, immigration correlates with housing prices, but the relationship does not appear to be one of direct cause and effect. Rather, the shift in population growth from natural increase to immigration accelerates urbanization, which in turn increases housing prices.

4.2 Limitations

  • Time periods of data differ from each other, and immigration-related data has a relatively short timeframe. It would have been beneficial to compare pre-1990 immigration datasets, but most analysis had to be conducted based on post-2000 data.

  • There are countries in similar situations, such as Australia, New Zealand, and the US. Comparing policies between countries could help extend this analysis.

  • HPI does not represent rental prices. I strongly suspect that rental prices correlate with both housing prices and the number of non-permanent residents, but due to time constraints, I couldn’t find a way to filter the different rental types and reflect this in the analysis.

  • Little data is available at the city level. Provincial data blends too many different areas together because Canada is a geographically huge country.