stats_can package

Submodules

stats_can.api_class module

Define a class wrapper for Stats Can functions.

class stats_can.api_class.StatsCan(data_folder=None)[source]

Bases: object

Load Statistics Canada data and metadata into python.

Parameters
  • data_folder (Path/str, default None) – The location to save/search for locally stored Statistics Canada data tables. Defaults to the current working directory.

delete_tables(tables)[source]

Remove locally stored tables.

Parameters

tables (str or [str]) – tables to delete

Returns

the tables that were deleted

Return type

list

property downloaded_tables

Check which tables you’ve downloaded.

Checks the file “stats_can.h5” in the instantiated data folder and lists all tables stored there.

Returns

IDs of the tables stored in the data folder

Return type

list

static get_code_sets()[source]

Get code sets.

Code sets provide additional metadata to describe variables and are grouped into scales, frequencies, symbols etc.

Returns

code_sets – one dictionary for each group of information

Return type

[dict]

static get_tables_for_vectors(vectors)[source]

Find which table(s) a V# or list of V#s is from.

Parameters

vectors (str or [str]) – V#(s) to look up tables for

Returns

dictionary of vector: table pairs, plus an “all_tables” key with a list of all tables containing the input V#s

>>> from stats_can.api_class import StatsCan
>>> StatsCan.get_tables_for_vectors("v39050")
{39050: '10100139', 'all_tables': ['10100139']}
>>> StatsCan.get_tables_for_vectors(["v39050", "v1074250274"])
{39050: '10100139', 1074250274: '16100011', 'all_tables': ['10100139', '16100011']}

table_to_df(table)[source]

Read a table to a dataframe.

Parameters

table (str) – The ID of the table of interest, e.g. “271-000-22”

Returns

pandas.DataFrame – Dataframe of the requested table

If the table has previously been loaded to the file in self.data_folder, the locally stored dataframe is retrieved. If it is unavailable, the table is downloaded and then returned. To update a locally stored table, call StatsCan.update_tables(), optionally passing just the table number of interest.
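
The local-first lookup described above can be pictured with a small stand-in. This is only a sketch: TableCache and its download callable are hypothetical names, a plain dict stands in for the file in the data folder, and nothing here is taken from the package's implementation.

```python
class TableCache:
    """Toy model of a local-first table lookup (hypothetical, not stats_can code)."""

    def __init__(self, download):
        self._download = download  # callable: table id -> table data
        self._store = {}           # stands in for the locally stored file

    def table(self, table_id):
        # Serve the stored copy when there is one; otherwise download and keep it.
        if table_id not in self._store:
            self._store[table_id] = self._download(table_id)
        return self._store[table_id]


calls = []


def fake_download(table_id):
    # Records each download so repeated lookups can be checked.
    calls.append(table_id)
    return {"table": table_id}


cache = TableCache(fake_download)
first = cache.table("27100022")
second = cache.table("27100022")  # served from the store, no second download
```

Only the first lookup reaches the downloader; refreshing a stored copy is a separate, explicit step, which matches the role update_tables plays in the description.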

static tables_updated_on_date(date)[source]

Get a list of tables that were updated on a given date.

Parameters

date (str or datetime.date) – The date to check for updated tables

Returns

changed_tables – one dictionary for each table with its update date

Return type

[dict]

static tables_updated_today()[source]

Get a list of tables that were updated today.

Returns

changed_tables – one dictionary for each table with its update date

Return type

[dict]

update_tables(tables=None)[source]

Update locally stored tables.

Compares the latest available reference period in locally stored tables to the latest available on Statistics Canada and updates any tables as necessary.

Parameters

tables (str or [str], default None) – Optional subset of tables to check for updates, defaults to update all downloaded tables

Returns

tables that were updated, empty list if no updates were made

Return type

[str]
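
A refresh-then-read pattern built on the two methods documented above might look like the following sketch. refresh_and_load and StubClient are hypothetical names; the stub exists only so the helper can be exercised without the package or network access.

```python
def refresh_and_load(client, table_id):
    """Update one stored table if needed, then read it.

    `client` is any object exposing update_tables(tables) and table_to_df(table),
    which is the interface documented for StatsCan on this page.
    """
    updated = client.update_tables(table_id)  # tables that were refreshed
    frame = client.table_to_df(table_id)
    return updated, frame


class StubClient:
    """Stand-in used only to exercise the helper without network access."""

    def update_tables(self, tables=None):
        return []  # pretend nothing needed updating

    def table_to_df(self, table):
        return f"dataframe for {table}"


updated, frame = refresh_and_load(StubClient(), "271-000-22")
```

With the package installed, a StatsCan instance would presumably take the place of the stub.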

static vector_metadata(vectors)[source]

Get metadata on vectors.

Parameters

vectors (str or [str]) – V#(s) to retrieve metadata

Returns

vector_metadata – list of dictionaries with one dict for each vector

Return type

[dict]

vectors_to_df(vectors, start_date=None)[source]

Get a dataframe of V#s.

Parameters
  • vectors (str or [str]) – the V#s to retrieve

  • start_date (datetime.date, optional) – earliest reference period to return, defaults to all available history

Returns

pandas.DataFrame – Dataframe indexed on reference date, with columns for each V# input

Note that any V#s in tables that are not currently stored locally will have their tables downloaded prior to returning the dataframe.

static vectors_to_df_remote(vectors, periods=1, start_release_date=None, end_release_date=None)[source]

Retrieve V# data directly from Statistics Canada.

Parameters
  • vectors (str or [str]) – V#(s) to retrieve data for

  • periods (int, default 1) – Number of periods to retrieve data for. Note that this will be ignored if start_release_date and end_release_date are set

  • start_release_date (datetime.date, default None) – earliest release date to retrieve data

  • end_release_date (datetime.date, default None) – latest release date to retrieve data

Returns

pandas.DataFrame – Dataframe indexed on reference (not release) date, with columns for each V# input

Note that start and end release date refer to the dates the data was released, not the reference period they cover. For example, October labour force survey data is released on the first or second Friday of November.
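
To make the release-date versus reference-period distinction concrete, the sketch below computes a release window for the labour force example. first_two_fridays is a hypothetical helper, and 2023 is an arbitrary illustrative year; the resulting dates are the kind of values that could be passed as start_release_date and end_release_date.

```python
import datetime as dt


def first_two_fridays(year, month):
    """Return the first and second Friday of the given month."""
    first_day = dt.date(year, month, 1)
    # date.weekday(): Monday is 0, so Friday is 4.
    offset = (4 - first_day.weekday()) % 7
    first_friday = first_day + dt.timedelta(days=offset)
    return first_friday, first_friday + dt.timedelta(days=7)


# Release window for October 2023 data, which per the note above comes out in November.
start_release_date, end_release_date = first_two_fridays(2023, 11)
```

The returned dataframe would still be indexed on the October reference period, not on these November dates.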

static vectors_updated_today()[source]

Get a list of all V#s that were updated today.

Returns

changed_series – one dictionary for each vector with its update date

Return type

[dict]

stats_can.helpers module

Helper functions that shouldn’t need to be directly called by an end user.

stats_can.helpers.check_status(results)[source]

Make sure list of results succeeded.

Parameters

results (list of dicts, or dict) – JSON from an API call parsed as a dictionary

Returns

results – JSON from an API call parsed as a dictionary

Return type

list of dicts, or dict
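
A check of this kind could be sketched as follows. The "status" key, the "SUCCESS" value and the choice of RuntimeError are assumptions about the response shape rather than details confirmed on this page, and check_status_sketch is a hypothetical name.

```python
def check_status_sketch(results):
    """Raise if any parsed API response does not report success; else return the input."""
    # Accept a single dict or a list of dicts, as the documented function does.
    items = [results] if isinstance(results, dict) else results
    for item in items:
        # Assumption: responses carry a "status" field equal to "SUCCESS" on success.
        if item.get("status") != "SUCCESS":
            raise RuntimeError(f"API call failed: {item}")
    return results


ok = check_status_sketch([{"status": "SUCCESS", "object": {}}])
```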

stats_can.helpers.chunk_vectors(vectors)[source]

Break vectors into chunks small enough for the API (300 limit).

Parameters

vectors (list of str or str) – A string or list of strings of vector names to be parsed

Returns

chunks – lists of vectors in chunks

Return type

list of lists of str
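
The chunking described here amounts to slicing the input into groups of at most 300. A minimal sketch, with chunk_vectors_sketch and its size parameter as hypothetical names:

```python
def chunk_vectors_sketch(vectors, size=300):
    """Split a vector or list of vectors into lists of at most `size` items."""
    # A single string is treated as a one-item list.
    if isinstance(vectors, str):
        vectors = [vectors]
    return [vectors[i:i + size] for i in range(0, len(vectors), size)]


chunks = chunk_vectors_sketch([f"v{n}" for n in range(650)])
```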

stats_can.helpers.parse_tables(tables)[source]

Parse string of table or tables to numeric.

Strip out hyphens or other non-numeric characters from a list of tables or a single table. Table names in StatsCan often have a trailing -01 which isn’t necessary, so also take just the first 8 characters. This function by no means guarantees you have a clean list of valid tables, but it’s a good start.

Parameters

tables (list of str or str) – A string or list of strings of table names to be parsed

Returns

tables with unnecessary characters removed

Return type

list of str
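
The two steps described above (drop non-numeric characters, keep the first 8) can be sketched with a regular expression. parse_tables_sketch is a hypothetical name and this is not the package's own code.

```python
import re


def parse_tables_sketch(tables):
    """Reduce table names to their first 8 digits."""
    # A single string is treated as a one-item list.
    if isinstance(tables, str):
        tables = [tables]
    # Remove every non-digit, then drop anything past 8 characters (e.g. a trailing -01).
    return [re.sub(r"\D", "", str(table))[:8] for table in tables]


parsed = parse_tables_sketch(["10-10-0139-01", "271-000-22"])
```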

stats_can.helpers.parse_vectors(vectors)[source]

Parse string of vector or vectors to numeric.

Strip out V from V#s. Similar to parse_tables, this by no means guarantees a valid entry; it just helps with some standard input formats.

Parameters

vectors (list of str or str) – A string or list of strings of vector names to be parsed

Returns

vectors with unnecessary characters removed

Return type

list of str
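
A sketch of the same idea for vectors, assuming the only cleanup needed is removing a leading upper- or lower-case V; parse_vectors_sketch is a hypothetical name.

```python
def parse_vectors_sketch(vectors):
    """Strip a leading v/V from each vector name."""
    # A single string is treated as a one-item list.
    if isinstance(vectors, str):
        vectors = [vectors]
    return [str(vector).lstrip("vV") for vector in vectors]


cleaned = parse_vectors_sketch(["v39050", "V1074250274", "74804"])
```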

stats_can.sc module

Functionality that extends on what the base StatsCan api returns in some way.

stats_can.sc.code_sets_to_df_dict()[source]

Get all code sets.

Code sets provide additional metadata to describe information. Code sets are grouped into scales, frequencies, symbols etc. and returned as dictionary of dataframes.

Returns

dictionary of dataframes, one for each group of code sets

Return type

dict of pandas.DataFrame

stats_can.sc.delete_tables(tables, path=None, h5file='stats_can.h5', csv=True)[source]

Delete downloaded tables.

Parameters
  • tables (list) – list of tables to delete

  • path (str or path object, default None) – where to look for the tables to delete

  • h5file (str, default stats_can.h5) – h5file to remove from, set to None to remove zips

  • csv (boolean, default True) – if h5file is None this specifies whether to delete zipped csv or SDMX

Returns

to_delete – list of deleted tables

Return type

list

stats_can.sc.download_tables(tables, path=None, csv=True)[source]

Download a json file and zip of data for a list of tables to path.

Parameters
  • tables (list of str) – tables to be downloaded

  • path (str or path object, default: None (will do current directory)) – Where to download the table and json

  • csv (boolean, default True) – download in CSV format, if not download SDMX

Returns

downloaded – list of tables that were downloaded

Return type

list

stats_can.sc.get_tables_for_vectors(vectors)[source]

Get a dictionary mapping vectors to tables.

Parameters

vectors (list of str or str) – Vectors to find tables for

Returns

tables_list – keys for each vector number return the table, plus a key for ‘all_tables’ that has a list of unique tables used by vectors

Return type

dict

stats_can.sc.h5_included_keys(h5file='stats_can.h5', path=None)[source]

Return a list of keys in an h5 file.

Parameters
  • h5file (str, default stats_can.h5) – name of the h5file to store the tables in

  • path (str or path, default = current working directory) – path to the h5file

Returns

keys – list of keys in the hdf5 file

Return type

list

stats_can.sc.h5_update_tables(h5file='stats_can.h5', path=None, tables=None)[source]

Update any stats_can tables contained in an h5 file.

Parameters
  • h5file (str, default stats_can.h5) – name of the h5file to store the tables in

  • path (str or path, default = current working directory) – path to the h5file

  • tables (str or list of str, optional, default None) – If included will only update the subset of tables already in the file and in the tables parameter

Returns

update_table_list – List of tables that were updated

Return type

[str]

stats_can.sc.list_downloaded_tables(path=None, h5file='stats_can.h5')[source]

Return a list of metadata for StatsCan tables.

Wrapper for list_zipped_tables and list_h5_tables.

Parameters
  • path (str or path, default = current working directory) – path to the h5 file

  • h5file (str, default stats_can.h5) – name of the h5file to read table data from

Returns

jsons – list of available tables json data

Return type

list

stats_can.sc.list_h5_tables(path=None, h5file='stats_can.h5')[source]

Return a list of metadata for StatsCan tables from an hdf5 file.

Parameters
  • path (str or path, default = current working directory) – path to the h5 file

  • h5file (str, default stats_can.h5) – name of the h5file to read table data from

Returns

jsons – list of available tables json data

Return type

list

stats_can.sc.list_zipped_tables(path=None)[source]

List StatsCan tables available.

Defaults to looking in the current working directory and for zipped CSVs.

Parameters

path (string or path, default None) – Where to look for zipped tables

Returns

tables – list of available tables json data

Return type

list

stats_can.sc.metadata_from_h5(tables, h5file='stats_can.h5', path=None)[source]

Read table metadata from h5.

Parameters
  • tables (str or list of str) – name of the tables to read

  • h5file (str, default stats_can.h5) – name of the h5file to retrieve the table from

  • path (str or path, default = current working directory) – path to the h5file

Returns

local table metadata

Return type

list

stats_can.sc.table_from_h5(table, h5file='stats_can.h5', path=None)[source]

Read a table from h5 to a dataframe.

Parameters
  • table (str) – name of the table to read

  • h5file (str, default stats_can.h5) – name of the h5file to retrieve the table from

  • path (str or path, default = current working directory) – path to the h5file

Returns

df – table in dataframe format

Return type

pd.DataFrame

stats_can.sc.table_subsets_from_vectors(vectors)[source]

Get a list of dicts mapping tables to vectors.

Parameters

vectors (list of str or str) – Vectors to find tables for

Returns

tables_dict – keys for each table used by the vectors, matched to a list of vectors

Return type

list of dict
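
The table-to-vectors grouping can be illustrated by inverting the lookup shown in the StatsCan.get_tables_for_vectors example earlier on this page. This is a sketch of the data shape only; group_vectors_by_table is a hypothetical name and the function's actual output format may differ.

```python
from collections import defaultdict


def group_vectors_by_table(vector_to_table):
    """Invert a vector -> table mapping into table -> list of vectors."""
    grouped = defaultdict(list)
    for vector, table in vector_to_table.items():
        # The summary key is not a vector, so it is skipped.
        if vector == "all_tables":
            continue
        grouped[table].append(vector)
    return dict(grouped)


lookup = {
    39050: "10100139",
    1074250274: "16100011",
    "all_tables": ["10100139", "16100011"],
}
by_table = group_vectors_by_table(lookup)
```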

stats_can.sc.table_to_df(table, path=None, h5file='stats_can.h5')[source]

Read a table to a dataframe.

Wrapper for table_from_h5 and zip_table_to_dataframe

Parameters
  • table (str) – name of the table to read

  • h5file (str, default stats_can.h5) – name of the h5file to retrieve the table from, None for zip

  • path (str or path, default = current working directory) – path to the table data

Returns

df – table in dataframe format

Return type

pd.DataFrame

stats_can.sc.tables_to_h5(tables, h5file='stats_can.h5', path=None)[source]

Take a table and its metadata and put it in an hdf5 file.

Parameters
  • tables (list of str) – tables to add to the h5file

  • h5file (str, default stats_can.h5) – name of the h5file to store the tables in

  • path (str or path, default = current working directory) – path to the h5file

Returns

tables – list of tables loaded into the file

Return type

list

stats_can.sc.update_tables(path=None, h5file='stats_can.h5', tables=None, csv=True)[source]

Update downloaded tables where required.

Reads local metadata, either from json files or stored in an h5 file, compares it to the metadata on the StatsCan website and downloads those tables that don’t have matching metadata.

Just a wrapper for the zip_update_tables and h5_update_tables functions.

Parameters
  • path (str or path, default None) – Path where local tables are stored, assumes current directory if None

  • h5file (str, default 'stats_can.h5') – Name of the h5 file storing StatsCan tables, set to None for zips

  • tables (list of str, default None) – For hdf5 only, update only a subset of tables that are both in the file and specified by this argument, None means update all tables

  • csv (boolean, default True) – If updating zips this determines whether to update zipped CSV or SDMX

Returns

update_table_list – list of updated tables

Return type

list of str

stats_can.sc.vectors_to_df(vectors, periods=1, start_release_date=None, end_release_date=None)[source]

Get DataFrame of vectors with n periods data or over range of release dates.

Wrapper on the get_bulk_vector_data_by_range and get_data_from_vectors_and_latest_n_periods functions to turn the resulting list of JSONs into a DataFrame.

Parameters
  • vectors (str or list of str) – vector numbers to get info for

  • periods (int) – number of periods to retrieve data for

  • start_release_date (datetime.date) – start release date for the data

  • end_release_date (datetime.date) – end release date for the data

Returns

df – vectors as columns and ref_date as the index (not release date)

Return type

DataFrame
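
Since this function wraps two different requests, one plausible way to choose between them is sketched below. It assumes the range request is used only when both release dates are supplied, in line with the note elsewhere on this page that periods is ignored in that case; pick_request is a hypothetical name.

```python
import datetime as dt


def pick_request(periods=1, start_release_date=None, end_release_date=None):
    """Return the name of the scwds function a call would be routed to."""
    # Assumption: both dates must be set for the range request to apply.
    if start_release_date is not None and end_release_date is not None:
        return "get_bulk_vector_data_by_range"
    # Otherwise fall back to the latest `periods` observations.
    return "get_data_from_vectors_and_latest_n_periods"


by_range = pick_request(5, dt.date(2023, 11, 3), dt.date(2023, 11, 10))
by_periods = pick_request(periods=5)
```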

stats_can.sc.vectors_to_df_local(vectors, path=None, start_date=None, h5file='stats_can.h5')[source]

Make a dataframe with vector columns indexed on date from local data.

Parameters
  • vectors (list) – list of vectors to be read in

  • path (str or path object, default None) – path to StatsCan tables

  • start_date (datetime, optional, default None) – optional earliest reference date to include

  • h5file (str, default stats_can.h5) – if specified will extract dataframes from an hdf5file instead of zipped csv tables

Returns

final_df – DataFrame of the vectors

Return type

pandas.DataFrame

stats_can.sc.zip_table_to_dataframe(table, path=None)[source]

Read a StatsCan table into a pandas DataFrame.

If a zip file of the table does not exist in path, downloads it

Parameters
  • table (str) – the table to load to dataframe from zipped csv

  • path (str or pathlib.Path, default: current working directory when module is loaded) – where to download the tables or load them

Returns

df – the table as a dataframe

Return type

pandas.DataFrame

stats_can.sc.zip_update_tables(path=None, csv=True)[source]

Check local json, update zips of outdated tables.

Grabs the json files in path, checks them against the metadata on StatsCan and grabs updated tables where there have been changes. There isn’t actually a “last modified date” part to the metadata. What I’m doing is comparing the latest reference period. Almost all data changes will at least include incremental releases, so this should capture what I want.

Parameters
  • path (str or pathlib.Path, default: None) – where to look for tables to update

  • csv (boolean, default: True) – Downloads updates in CSV form by default, SDMX if false

Returns

update_table_list – list of the tables that were updated

Return type

list
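
The comparison described above can be sketched with plain dictionaries. The metadata key "cubeEndDate" is an assumption about where the latest reference period is stored, the dates are made up for illustration, and tables_needing_update is a hypothetical name.

```python
def tables_needing_update(local_meta, remote_meta, key="cubeEndDate"):
    """Return table ids whose latest reference period differs locally and remotely."""
    outdated = []
    for table_id, local in local_meta.items():
        remote = remote_meta.get(table_id)
        # Tables missing from the remote metadata are left alone.
        if remote is not None and remote[key] != local[key]:
            outdated.append(table_id)
    return outdated


# Illustrative values only.
local = {
    "10100139": {"cubeEndDate": "2023-09-01"},
    "16100011": {"cubeEndDate": "2023-10-01"},
}
remote = {
    "10100139": {"cubeEndDate": "2023-10-01"},
    "16100011": {"cubeEndDate": "2023-10-01"},
}
stale = tables_needing_update(local, remote)
```

A revision that changes past values without adding a new reference period would not be caught by this comparison, which is the trade-off the description acknowledges.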

stats_can.scwds module

Functions that allow the package to return exactly what the api gives.

https://www.statcan.gc.ca/eng/developers/wds/user-guide

Note: StatsCan uses cube/table interchangeably. I’m going to keep cube in my function names where it maps to their api but otherwise I will use table. Hence functions with cube in the function name will take tables as an argument. I’m not sure which is less confusing; it’s annoying they weren’t just consistent.

stats_can.scwds.SC_URL

URL for the Statistics Canada REST api

Type

str

stats_can.scwds.get_bulk_vector_data_by_range(vectors, start_release_date, end_release_date)[source]

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-5

Parameters
  • vectors (str or list of str) – vector numbers to get info for

  • start_release_date (datetime.date) – start release date for the data

  • end_release_date (datetime.date) – end release date for the data

Returns

data for each vector

Return type

list of dicts

stats_can.scwds.get_changed_cube_list(date=None)[source]

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a10-2

Parameters

date (datetime.date) – Date to check for table changes, defaults to current date

Returns

one for each table and when it was updated

Return type

list of dicts

stats_can.scwds.get_changed_series_data_from_cube_pid_coord()[source]

Not implemented yet

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-1

stats_can.scwds.get_changed_series_data_from_vector()[source]

Not implemented yet

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-2

stats_can.scwds.get_changed_series_list()[source]

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a10-1

Gets all series that were updated today.

Returns

one for each vector and when it was released

Return type

list of dicts

stats_can.scwds.get_code_sets()[source]

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a13-1

Gets all code sets, which provide additional metadata to describe information and are grouped into scales, frequencies, symbols etc.

Returns

one dictionary for each group of information

Return type

list of dicts

stats_can.scwds.get_cube_metadata(tables)[source]

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a11-1

Take a list of tables and return a list of dictionaries with their metadata

Parameters

tables (str or list of str) – IDs of tables to get metadata for

Returns

one for each table with its metadata

Return type

list of dicts

stats_can.scwds.get_data_from_cube_pid_coord_and_latest_n_periods()[source]

Not implemented yet

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-3

stats_can.scwds.get_data_from_vectors_and_latest_n_periods(vectors, periods)[source]

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-4

Parameters
  • vectors (str or list of str) – vector numbers to get info for

  • periods (int) – number of periods (starting at latest) to retrieve data for

Returns

data for each vector

Return type

list of dicts
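
For orientation, the request body for this call might be assembled as in the sketch below. The field names "vectorId" and "latestN" follow my reading of the linked user guide and should be treated as assumptions, as should the helper name latest_n_payload; no request is actually sent.

```python
import json


def latest_n_payload(vectors, periods):
    """Build one request entry per vector for a latest-N-periods query."""
    # A single vector (string or integer) is treated as a one-item list.
    if isinstance(vectors, (str, int)):
        vectors = [vectors]
    # Assumption: the service expects numeric ids without the leading "v".
    ids = [int(str(vector).lstrip("vV")) for vector in vectors]
    return [{"vectorId": vector_id, "latestN": periods} for vector_id in ids]


body = json.dumps(latest_n_payload(["v39050", "v1074250274"], 3))
```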

stats_can.scwds.get_full_table_download(table, csv=True)[source]

Take a table name and return a url to a zipped file of that table.

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-6

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-7

Parameters
  • table (str) – table name to download

  • csv (boolean, default True) – download in CSV format, if not download SDMX

Returns

path to the file download

Return type

str
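
The csv flag selects between two service methods. The sketch below shows how the request URL might be formed; the base URL and the two method paths are taken from my reading of the user guide linked above rather than from this page (which documents SC_URL without showing its value), and full_table_request_url is a hypothetical name.

```python
# Assumed base URL for the REST service; not shown on this page.
BASE_URL = "https://www150.statcan.gc.ca/t1/wds/rest/"


def full_table_request_url(table, csv=True):
    """Build the URL that asks the service for a table's zipped download link."""
    if csv:
        # Assumption: the CSV method takes a language suffix.
        return f"{BASE_URL}getFullTableDownloadCSV/{table}/en"
    return f"{BASE_URL}getFullTableDownloadSDMX/{table}"


csv_request = full_table_request_url("10100139")
sdmx_request = full_table_request_url("10100139", csv=False)
```

The response to such a request, not this URL itself, would carry the path to the zipped file that the function returns.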

stats_can.scwds.get_series_info_from_cube_pid_coord()[source]

Not implemented yet

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a11-2

stats_can.scwds.get_series_info_from_vector(vectors)[source]

https://www.statcan.gc.ca/eng/developers/wds/user-guide#a11-3

Parameters

vectors (str or list of str) – vector numbers to get info for

Returns

metadata for each V#

Return type

list of dicts

Module contents

Read StatsCan Data into python, mostly pandas dataframes.