stats_can package
Submodules
stats_can.api_class module
Define a class wrapper for Stats Can functions.
-
class
stats_can.api_class.
StatsCan
(data_folder=None)[source]¶ Bases:
object
Load Statistics Canada data and metadata into python.
- Parameters
data_folder (Path/str, default None) – The location to save/search for locally stored Statistics Canada data tables. Defaults to the current working directory.
-
delete_tables
(tables)[source]¶ Remove locally stored tables.
- Parameters
tables (str or [str]) – tables to delete
- Returns
- Return type
[deleted tables]
-
property
downloaded_tables
¶ Check which tables you’ve downloaded.
Checks the file “stats_can.h5” in the instantiated data folder and lists all tables stored there.
- Returns
- Return type
[table_ids]
-
static
get_code_sets
()[source]¶ Get code sets.
Code sets provide additional metadata to describe variables and are grouped into scales, frequencies, symbols etc.
- Returns
code_sets – one dictionary for each group of information
- Return type
[dict]
-
static
get_tables_for_vectors
(vectors)[source]¶ Find which table(s) a V# or list of V#s is from.
- Parameters
vectors (str or [str]) – V#(s) to look up tables for
- Returns
dictionary of vector: table pairs, plus an “all_tables” key with a list of all tables containing the input V#s
>>> StatsCan.get_tables_for_vectors("v39050")
{39050: '10100139', 'all_tables': ['10100139']}
>>> StatsCan.get_tables_for_vectors(["v39050", "v1074250274"])
{39050: '10100139', 1074250274: '16100011', 'all_tables': ['10100139', '16100011']}
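As a small illustration of working with a return value of this shape, the sketch below separates the per-vector entries from the "all_tables" key. The dictionary literal is copied from the documented example output rather than fetched live, and the variable names are only illustrative.

```python
# Result copied from the documented example output (not fetched live).
result = {
    39050: "10100139",
    1074250274: "16100011",
    "all_tables": ["10100139", "16100011"],
}

# Everything except the "all_tables" key maps a vector number to its table.
vector_to_table = {k: v for k, v in result.items() if k != "all_tables"}

# The unique tables are already collected under "all_tables".
unique_tables = result["all_tables"]

print(vector_to_table)
print(unique_tables)
```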
-
table_to_df
(table)[source]¶ Read a table to a dataframe.
- Parameters
table (str) – The ID of the table of interest, e.g. “271-000-22”
- Returns
pandas.DataFrame – Dataframe of the requested table
If the table has been previously loaded to the file in self.data_folder,
it will retrieve that locally stored dataframe. If it’s unavailable, it will
download it and then return the table. To update a locally stored table,
call StatsCan.update_tables(), optionally passing just the table number of interest.
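The load-or-download behaviour described here follows a common caching pattern. The sketch below illustrates that pattern with a plain dictionary standing in for the local file. `load_with_cache` and `fake_download` are hypothetical names, not part of stats_can, and the package's real storage is the file in the data folder.

```python
def load_with_cache(table, cache, download):
    """Return the cached entry for `table`, downloading it only on a miss."""
    if table not in cache:
        cache[table] = download(table)
    return cache[table]


download_calls = []


def fake_download(table):
    # Hypothetical stand-in for a real download; it records each call.
    download_calls.append(table)
    return f"data for {table}"


cache = {}
first = load_with_cache("10100139", cache, fake_download)
second = load_with_cache("10100139", cache, fake_download)  # served from cache
```

Only the first call triggers the download. Refreshing a cached entry is a separate, explicit step, which mirrors the split between table_to_df and update_tables.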
-
static
tables_updated_on_date
(date)[source]¶ Get a list of tables that were updated on a given date.
- Parameters
date (str or datetime.date) – The date to check tables
- Returns
changed_tables – one dictionary for each table with its update date
- Return type
[dict]
-
static
tables_updated_today
()[source]¶ Get a list of tables that were updated today.
- Returns
changed_tables – one dictionary for each table with its update date
- Return type
[dict]
-
update_tables
(tables=None)[source]¶ Update locally stored tables.
Compares the latest available reference period in locally stored tables to the latest available on Statistics Canada and updates any tables as necessary.
- Parameters
tables (str or [str], default None) – Optional subset of tables to check for updates, defaults to update all downloaded tables
- Returns
- Return type
[str] list of tables that were updated, empty list if no updates made
-
static
vector_metadata
(vectors)[source]¶ Get metadata on vectors.
- Parameters
vectors (str or [str]) – V#(s) to retrieve metadata
- Returns
vector_metadata – list of dictionaries with one dict for each vector
- Return type
[dict]
-
vectors_to_df
(vectors, start_date=None)[source]¶ Get a dataframe of V#s.
- Parameters
vectors (str or [str]) – the V#s to retrieve
start_date (datetime.date, optional) – earliest reference period to return, defaults to all available history
- Returns
pandas.DataFrame – Dataframe indexed on reference date, with columns for each V# input
Note that any V#s in tables that are not currently locally stored will
have their tables downloaded prior to returning the dataframe
-
static
vectors_to_df_remote
(vectors, periods=1, start_release_date=None, end_release_date=None)[source]¶ Retrieve V# data directly from Statistics Canada.
- Parameters
vectors (str or [str]) – V#(s) to retrieve data for
periods (int, default 1) – Number of periods to retrieve data. Note that this will be ignored if start_release_date and end_release_date are set
start_release_date (datetime.date, default None) – earliest release date to retrieve data
end_release_date (datetime.date, default None) – latest release date to retrieve data
- Returns
pandas.DataFrame – Dataframe indexed on reference (not release) date, with columns for each V# input
Note that start and end release date refer to the dates the data was released,
not the reference period they cover. For example, October labour force survey
data is released on the first or second Friday of November.
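To make the release date versus reference date distinction concrete, here is a minimal pure-Python sketch. The observations, their values and the `released_between` helper are all hypothetical and only illustrate the filtering idea; they are not the package's implementation.

```python
import datetime as dt

# Hypothetical observations: each has a reference period and a release date.
observations = [
    {"ref_date": dt.date(2020, 9, 1), "release_date": dt.date(2020, 10, 9), "value": 1.0},
    {"ref_date": dt.date(2020, 10, 1), "release_date": dt.date(2020, 11, 6), "value": 2.0},
]


def released_between(rows, start_release_date, end_release_date):
    """Filter on release date, but key the result on reference date."""
    return {
        row["ref_date"]: row["value"]
        for row in rows
        if start_release_date <= row["release_date"] <= end_release_date
    }


november_releases = released_between(
    observations, dt.date(2020, 11, 1), dt.date(2020, 11, 30)
)
print(november_releases)
```

Asking for November releases returns a row whose reference period is October, which is the behaviour the note above warns about.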
stats_can.helpers module
Helper functions that shouldn’t need to be directly called by an end user.
-
stats_can.helpers.
check_status
(results)[source]¶ Make sure list of results succeeded.
- Parameters
results (list of dicts, or dict) – JSON from an API call parsed as a dictionary
- Returns
results – JSON from an API call parsed as a dictionary
- Return type
list of dicts, or dict
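A status check of this kind might look like the sketch below. It assumes each result dictionary carries a "status" field equal to "SUCCESS" on success, and it raises a RuntimeError otherwise. Both the field name and the error type are assumptions for illustration, and `sketch_check_status` is a hypothetical name rather than the package's function.

```python
def sketch_check_status(results):
    """Raise if any result is not marked successful, else return the input."""
    # Accept either a single dict or a list of dicts.
    items = [results] if isinstance(results, dict) else results
    for item in items:
        # Assumed convention: a "status" field equal to "SUCCESS".
        if item.get("status") != "SUCCESS":
            raise RuntimeError(f"API call did not succeed: {item}")
    return results
```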
-
stats_can.helpers.
chunk_vectors
(vectors)[source]¶ Break vectors into chunks small enough for the API (300 limit).
- Parameters
vectors (list of str or str) – A string or list of strings of vector names to be parsed
- Returns
chunks – lists of vectors in chunks
- Return type
list of lists of str
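The chunking described above can be sketched with simple list slicing. The name `sketch_chunk_vectors` and the `max_chunk` parameter are hypothetical; only the 300-item limit comes from the description.

```python
def sketch_chunk_vectors(vectors, max_chunk=300):
    """Split vectors into consecutive lists of at most `max_chunk` items."""
    if isinstance(vectors, str):
        vectors = [vectors]
    return [vectors[i:i + max_chunk] for i in range(0, len(vectors), max_chunk)]


chunks = sketch_chunk_vectors([f"v{i}" for i in range(650)])
print([len(chunk) for chunk in chunks])
```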
-
stats_can.helpers.
parse_tables
(tables)[source]¶ Parse string of table or tables to numeric.
Strip out hyphens or other non-numeric characters from a list of tables or a single table. Table names in StatsCan often have a trailing -01, which isn’t necessary, so also take just the first 8 characters. This function by no means guarantees you have a clean list of valid tables, but it’s a good start.
- Parameters
tables (list of str or str) – A string or list of strings of table names to be parsed
- Returns
tables with unnecessary characters removed
- Return type
list of str
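The parsing rule described above (drop non-numeric characters, keep the first 8) can be sketched as follows. `sketch_parse_tables` is a hypothetical name, and this is an illustration of the rule rather than the package's code.

```python
import re


def sketch_parse_tables(tables):
    """Remove non-digits from each table name and keep the first 8 characters."""
    if isinstance(tables, str):
        tables = [tables]
    return [re.sub(r"\D", "", str(table))[:8] for table in tables]


print(sketch_parse_tables("10-10-0139-01"))
```

As the description notes, this only normalizes formatting; it does not check that the resulting ID refers to a real table.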
-
stats_can.helpers.
parse_vectors
(vectors)[source]¶ Parse string of vector or vectors to numeric.
Strip out V from V#s. Similar to parse_tables, this by no means guarantees a valid entry; it just helps with some standard input formats.
- Parameters
vectors (list of str or str) – A string or list of strings of vector names to be parsed
- Returns
vectors with unnecessary characters removed
- Return type
list of str
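A comparable sketch for vectors strips a leading "v" or "V" and leaves the digits. `sketch_parse_vectors` is a hypothetical name, and returning strings (rather than integers) is an assumption based on the documented return type.

```python
import re


def sketch_parse_vectors(vectors):
    """Strip a leading 'v' or 'V' from each vector name."""
    if isinstance(vectors, str):
        vectors = [vectors]
    return [re.sub(r"^[vV]", "", str(vector)) for vector in vectors]


print(sketch_parse_vectors(["v39050", "V1074250274"]))
```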
stats_can.sc module
Functionality that extends on what the base StatsCan api returns in some way.
-
stats_can.sc.
code_sets_to_df_dict
()[source]¶ Get all code sets.
Code sets provide additional metadata to describe information. Code sets are grouped into scales, frequencies, symbols etc. and returned as a dictionary of dataframes.
- Returns
pandas.DataFrame – dictionary of dataframes
- Return type
list
-
stats_can.sc.
delete_tables
(tables, path=None, h5file='stats_can.h5', csv=True)[source]¶ Delete downloaded tables.
- Parameters
tables (list) – list of tables to delete
path (str or path object, default None) – where to look for the tables to delete
h5file (str, default stats_can.h5) – h5file to remove from, set to None to remove zips
csv (boolean, default True) – if h5file is None this specifies whether to delete zipped csv or SDMX
- Returns
to_delete – list of deleted tables
- Return type
list
-
stats_can.sc.
download_tables
(tables, path=None, csv=True)[source]¶ Download a json file and zip of data for a list of tables to path.
- Parameters
tables (list of str) – tables to be downloaded
path (str or path object, default: None (will do current directory)) – Where to download the table and json
csv (boolean, default True) – download in CSV format, if not download SDMX
- Returns
downloaded – list of tables that were downloaded
- Return type
list
-
stats_can.sc.
get_tables_for_vectors
(vectors)[source]¶ Get a list of dicts mapping vectors to tables.
- Parameters
vectors (list of str or str) – Vectors to find tables for
- Returns
tables_list – keys for each vector number return the table, plus a key for ‘all_tables’ that has a list of unique tables used by vectors
- Return type
list of dict
-
stats_can.sc.
h5_included_keys
(h5file='stats_can.h5', path=None)[source]¶ Return a list of keys in an h5 file.
- Parameters
h5file (str, default stats_can.h5) – name of the h5file to store the tables in
path (str or path, default = current working directory) – path to the h5file
- Returns
keys – list of keys in the hdf5 file
- Return type
list
-
stats_can.sc.
h5_update_tables
(h5file='stats_can.h5', path=None, tables=None)[source]¶ Update any stats_can tables contained in an h5 file.
- Parameters
h5file (str, default stats_can.h5) – name of the h5file to store the tables in
path (str or path, default = current working directory) – path to the h5file
tables (str or list of str, optional, default None) – If included will only update the subset of tables already in the file and in the tables parameter
- Returns
update_table_list – List of tables that were updated
- Return type
[str]
-
stats_can.sc.
list_downloaded_tables
(path=None, h5file='stats_can.h5')[source]¶ Return a list of metadata for StatsCan tables.
Wrapper for list_zipped_tables and list_h5_tables.
- Parameters
path (str or path, default = current working directory) – path to the h5 file
h5file (str, default stats_can.h5) – name of the h5file to read table data from
- Returns
jsons – list of available tables json data
- Return type
list
-
stats_can.sc.
list_h5_tables
(path=None, h5file='stats_can.h5')[source]¶ Return a list of metadata for StatsCan tables from an hdf5 file.
- Parameters
path (str or path, default = current working directory) – path to the h5 file
h5file (str, default stats_can.h5) – name of the h5file to read table data from
- Returns
jsons – list of available tables json data
- Return type
list
-
stats_can.sc.
list_zipped_tables
(path=None)[source]¶ List StatsCan tables available.
Defaults to looking in the current working directory for zipped CSVs.
- Parameters
path (string or path, default None) – Where to look for zipped tables
- Returns
tables – list of available tables json data
- Return type
list
-
stats_can.sc.
metadata_from_h5
(tables, h5file='stats_can.h5', path=None)[source]¶ Read table metadata from h5.
- Parameters
tables (str or list of str) – name of the tables to read
h5file (str, default stats_can.h5) – name of the h5file to retrieve the table from
path (str or path, default = current working directory) – path to the h5file
- Returns
- Return type
list of local table metadata
-
stats_can.sc.
table_from_h5
(table, h5file='stats_can.h5', path=None)[source]¶ Read a table from h5 to a dataframe.
- Parameters
table (str) – name of the table to read
h5file (str, default stats_can.h5) – name of the h5file to retrieve the table from
path (str or path, default = current working directory) – path to the h5file
- Returns
df – table in dataframe format
- Return type
pd.DataFrame
-
stats_can.sc.
table_subsets_from_vectors
(vectors)[source]¶ Get a list of dicts mapping tables to vectors.
- Parameters
vectors (list of str or str) – Vectors to find tables for
- Returns
tables_dict – keys for each table used by the vectors, matched to a list of vectors
- Return type
list of dict
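Grouping vectors by table amounts to inverting a vector-to-table mapping. The sketch below shows that inversion on a hand-written mapping. `sketch_table_subsets` is a hypothetical name, the second vector for table 10100139 is made up for illustration, and the sketch returns a single dict for simplicity.

```python
from collections import defaultdict


def sketch_table_subsets(vector_to_table):
    """Invert {vector: table} into {table: [vectors]}."""
    subsets = defaultdict(list)
    for vector, table in vector_to_table.items():
        subsets[table].append(vector)
    return dict(subsets)


# Hand-written mapping; 39051 is a made-up entry for illustration.
mapping = {39050: "10100139", 39051: "10100139", 1074250274: "16100011"}
subsets = sketch_table_subsets(mapping)
print(subsets)
```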
-
stats_can.sc.
table_to_df
(table, path=None, h5file='stats_can.h5')[source]¶ Read a table to a dataframe.
Wrapper for table_from_h5 and zip_table_to_dataframe
- Parameters
table (str) – name of the table to read
h5file (str, default stats_can.h5) – name of the h5file to retrieve the table from, None for zip
path (str or path, default = current working directory) – path to the table data
- Returns
df – table in dataframe format
- Return type
pd.DataFrame
-
stats_can.sc.
tables_to_h5
(tables, h5file='stats_can.h5', path=None)[source]¶ Take a table and its metadata and put it in an hdf5 file.
- Parameters
tables (list of str) – tables to add to the h5file
h5file (str, default stats_can.h5) – name of the h5file to store the tables in
path (str or path, default = current working directory) – path to the h5file
- Returns
tables – list of tables loaded into the file
- Return type
list
-
stats_can.sc.
update_tables
(path=None, h5file='stats_can.h5', tables=None, csv=True)[source]¶ Update downloaded tables where required.
Reads local metadata, either from json files or stored in an h5 file, compares it to the metadata on the StatsCan website and downloads those tables that don’t have matching metadata
Just a wrapper for the zip_update_tables and h5_update_tables functions.
- Parameters
path (str or path, default None) – Path where local tables are stored, assumes current directory if None
h5file (str, default 'stats_can.h5') – Name of the h5 file storing StatsCan tables, set to None for zips
tables (list of str, default None) – For hdf5 only, update only a subset of tables that are both in the file and specified by this argument, None means update all tables
csv (boolean, default True) – If updating zips this determines whether to update zipped CSV or SDMX
- Returns
update_table_list – list of updated tables
- Return type
list of str
-
stats_can.sc.
vectors_to_df
(vectors, periods=1, start_release_date=None, end_release_date=None)[source]¶ Get DataFrame of vectors with n periods data or over range of release dates.
Wrapper on the get_bulk_vector_data_by_range and get_data_from_vectors_and_latest_n_periods functions to turn the resulting list of JSONs into a DataFrame.
- Parameters
vectors (str or list of str) – vector numbers to get info for
periods (int) – number of periods to retrieve data for
start_release_date (datetime.date) – start release date for the data
end_release_date (datetime.date) – end release date for the data
- Returns
df – vectors as columns and ref_date as the index (not release date)
- Return type
DataFrame
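Since this function wraps two different calls, the choice between them presumably depends on whether release dates are supplied. The sketch below shows one plausible dispatch rule, consistent with the note elsewhere in this reference that periods is ignored when both release dates are set. `choose_endpoint` is a hypothetical helper and the exact rule is an assumption.

```python
import datetime as dt


def choose_endpoint(periods=1, start_release_date=None, end_release_date=None):
    """Pick the wrapped function name based on which arguments are set."""
    # Assumed rule: a complete release-date range takes priority over periods.
    if start_release_date is not None and end_release_date is not None:
        return "get_bulk_vector_data_by_range"
    return "get_data_from_vectors_and_latest_n_periods"


by_periods = choose_endpoint(periods=6)
by_range = choose_endpoint(
    periods=6,
    start_release_date=dt.date(2020, 11, 1),
    end_release_date=dt.date(2020, 11, 30),
)
print(by_periods, by_range)
```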
-
stats_can.sc.
vectors_to_df_local
(vectors, path=None, start_date=None, h5file='stats_can.h5')[source]¶ Make a dataframe with vector columns indexed on date from local data.
- Parameters
vectors (list) – list of vectors to be read in
path (str or path object, default None) – path to StatsCan tables
start_date (datetime, optional, default None) – optional earliest reference date to include
h5file (str, default stats_can.h5) – if specified will extract dataframes from an hdf5file instead of zipped csv tables
- Returns
final_df – DataFrame of the vectors
- Return type
pandas.DataFrame
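The output shape (one row per reference date, one column per vector, optionally trimmed by start_date) can be illustrated without pandas. The sketch below pivots hypothetical long-format rows into a nested dict. `sketch_pivot`, the row layout and the values are assumptions for illustration only.

```python
import datetime as dt


def sketch_pivot(rows, start_date=None):
    """Build {ref_date: {vector: value}} from (ref_date, vector, value) rows."""
    wide = {}
    for ref_date, vector, value in rows:
        # Drop reference periods earlier than the requested start date.
        if start_date is not None and ref_date < start_date:
            continue
        wide.setdefault(ref_date, {})[vector] = value
    # Sort by reference date so rows read chronologically.
    return dict(sorted(wide.items()))


# Hypothetical long-format data.
rows = [
    (dt.date(2020, 1, 1), "v1", 10.0),
    (dt.date(2020, 1, 1), "v2", 20.0),
    (dt.date(2020, 2, 1), "v1", 11.0),
    (dt.date(2020, 2, 1), "v2", 21.0),
]
wide = sketch_pivot(rows, start_date=dt.date(2020, 2, 1))
print(wide)
```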
-
stats_can.sc.
zip_table_to_dataframe
(table, path=None)[source]¶ Read a StatsCan table into a pandas DataFrame.
If a zip file of the table does not exist in path, downloads it
- Parameters
table (str) – the table to load to dataframe from zipped csv
path (str or pathlib.Path, default: current working directory when module is loaded) – where to download the tables or load them
- Returns
df – the table as a dataframe
- Return type
pandas.DataFrame
-
stats_can.sc.
zip_update_tables
(path=None, csv=True)[source]¶ Check local json, update zips of outdated tables.
Grabs the json files in path, checks them against the metadata on StatsCan and grabs updated tables where there have been changes. There isn’t actually a “last modified date” part to the metadata, so what I’m doing is comparing the latest reference period. Almost all data changes will at least include incremental releases, so this should capture what I want.
- Parameters
path (str or pathlib.Path, default: None) – where to look for tables to update
csv (boolean, default: True) – Downloads updates in CSV form by default, SDMX if false
- Returns
update_table_list – list of the tables that were updated
- Return type
list
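The comparison described above can be sketched as a check of the latest reference period in local versus remote metadata. The key names "productId" and "cubeEndDate", the function name `tables_needing_update` and the sample records are all assumptions for illustration; the package's actual metadata handling may differ.

```python
def tables_needing_update(local_meta, remote_meta):
    """Return IDs whose latest reference period differs between local and remote."""
    remote_end = {meta["productId"]: meta["cubeEndDate"] for meta in remote_meta}
    outdated = []
    for meta in local_meta:
        table_id = meta["productId"]
        # Only flag tables the remote side knows about and reports differently.
        if table_id in remote_end and remote_end[table_id] != meta["cubeEndDate"]:
            outdated.append(table_id)
    return outdated


# Hypothetical metadata records.
local = [
    {"productId": "10100139", "cubeEndDate": "2020-01-01"},
    {"productId": "16100011", "cubeEndDate": "2020-01-01"},
]
remote = [
    {"productId": "10100139", "cubeEndDate": "2020-02-01"},
    {"productId": "16100011", "cubeEndDate": "2020-01-01"},
]
print(tables_needing_update(local, remote))
```

A revision that changes historical values without adding a new reference period would not be detected by a check like this, which is the trade-off the description acknowledges.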
stats_can.scwds module
Functions that allow the package to return exactly what the api gives.
https://www.statcan.gc.ca/eng/developers/wds/user-guide
Note: StatsCan uses cube/table interchangeably. I’m going to keep cube in my function names where it maps to their api, but otherwise I will use table. Hence functions with cube in the function name will take tables as an argument. I’m not sure which is less confusing; it’s annoying they weren’t just consistent.
-
stats_can.scwds.
SC_URL
¶ URL for the Statistics Canada REST api
- Type
str
-
stats_can.scwds.
get_bulk_vector_data_by_range
(vectors, start_release_date, end_release_date)[source]¶ https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-5
- Parameters
vectors (str or list of str) – vector numbers to get info for
start_release_date (datetime.date) – start release date for the data
end_release_date (datetime.date) – end release date for the data
- Returns
- Return type
List of dicts containing data for each vector
-
stats_can.scwds.
get_changed_cube_list
(date=None)[source]¶ https://www.statcan.gc.ca/eng/developers/wds/user-guide#a10-2
- Parameters
date (datetime.date) – Date to check for table changes, defaults to current date
- Returns
one for each table and when it was updated
- Return type
list of dicts
-
stats_can.scwds.
get_changed_series_data_from_cube_pid_coord
()[source]¶ Not implemented yet
https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-1
-
stats_can.scwds.
get_changed_series_data_from_vector
()[source]¶ Not implemented yet
https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-2
-
stats_can.scwds.
get_changed_series_list
()[source]¶ https://www.statcan.gc.ca/eng/developers/wds/user-guide#a10-1
Gets all series that were updated today.
- Returns
one for each vector and when it was released
- Return type
list of dicts
-
stats_can.scwds.
get_code_sets
()[source]¶ https://www.statcan.gc.ca/eng/developers/wds/user-guide#a13-1
Gets all code sets, which provide additional metadata to describe information and are grouped into scales, frequencies, symbols etc.
- Returns
one dictionary for each group of information
- Return type
list of dicts
-
stats_can.scwds.
get_cube_metadata
(tables)[source]¶ https://www.statcan.gc.ca/eng/developers/wds/user-guide#a11-1
Take a list of tables and return a list of dictionaries with their metadata
- Parameters
tables (str or list of str) – IDs of tables to get metadata for
- Returns
one for each table with its metadata
- Return type
list of dicts
-
stats_can.scwds.
get_data_from_cube_pid_coord_and_latest_n_periods
()[source]¶ Not implemented yet
https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-3
-
stats_can.scwds.
get_data_from_vectors_and_latest_n_periods
(vectors, periods)[source]¶ https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-4
- Parameters
vectors (str or list of str) – vector numbers to get info for
periods (int) – number of periods (starting at latest) to retrieve data for
- Returns
- Return type
List of dicts containing data for each vector
-
stats_can.scwds.
get_full_table_download
(table, csv=True)[source]¶ Take a table name and return a url to a zipped file of that table.
https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-6 https://www.statcan.gc.ca/eng/developers/wds/user-guide#a12-7
- Parameters
table (str) – table name to download
csv (boolean, default True) – download in CSV format, if not download SDMX
- Returns
path to the file download
- Return type
str
-
stats_can.scwds.
get_series_info_from_cube_pid_coord
()[source]¶ Not implemented yet
https://www.statcan.gc.ca/eng/developers/wds/user-guide#a11-2
-
stats_can.scwds.
get_series_info_from_vector
(vectors)[source]¶ https://www.statcan.gc.ca/eng/developers/wds/user-guide#a11-3
- Parameters
vectors (str or list of str) – vector numbers to get info for
- Returns
- Return type
List of dicts containing metadata for each v#
Module contents
Read StatsCan Data into python, mostly pandas dataframes.