API Reference
RainLoader (Unified Access)
- class filter_stations.datasets_loader.RainLoader(auth_session, repo_id='DeKUT-DSAIL/weather-data')
Bases: object
Examples
>>> from filter_stations import RainLoader
>>> read_token = ''  # Request a token from dsail-info@dkut.ac.ke to access the data
>>> loader = RainLoader(token=read_token)
- get_dataset(dataset, start_date=None, end_date=None)
Main entry point to retrieve climate datasets (Gridded, Station, or Static).
- Parameters:
dataset (str) – Name of the dataset. Options include:
- Gridded: ‘imerg’, ‘chirps’, ‘era5’, ‘tamsat’
- Station: ‘tahmo’
- Static: ‘topography’, ‘nasadem’
start_date (str, optional) – Start date (YYYY-MM-DD). Required for time-series datasets.
end_date (str, optional) – End date (YYYY-MM-DD). Required for time-series datasets.
- Returns:
The requested dataset.
- Return type:
xarray.Dataset
Examples
>>> # User gets TAHMO (Stations)
>>> ds_stations = loader.get_dataset('TAHMO', '2024-01-01', '2024-01-30')
>>> # User gets gridded products (TAMSAT, ERA5, IMERG, CHIRPS) - exact same interface
>>> ds_tamsat = loader.get_dataset('tamsat', '2024-01-01', '2024-01-20')
>>> ds_era5 = loader.get_dataset('era5', '2024-01-01', '2024-01-20')
>>> ds_imerg = loader.get_dataset('imerg', '2024-01-01', '2024-01-20')
>>> ds_chirps = loader.get_dataset('chirps', '2024-01-01', '2024-01-20')
>>> # User gets Topography (Static)
>>> ds_topo = loader.get_dataset('topography')
TAHMO Data Access and Cleaning
- class filter_stations.filter_stations.RetrieveData(auth_session)
Bases: object
- aggregate_qualityflags(dataframe, freq='1D')
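Aggregate quality flags in a DataFrame at the given frequency (daily by default).
Parameters:
dataframe (pd.DataFrame): The DataFrame containing the measurements.
freq (str, optional): Frequency to aggregate by. Defaults to ‘1D’ (daily).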
Returns:
pd.DataFrame: A DataFrame with aggregated quality flags, where values greater than 1 are rounded up.
- aggregate_variables(dataframe, freq='1D', method='sum')
Aggregates a pandas DataFrame of weather variables.
- Parameters:
dataframe (pandas.DataFrame) – DataFrame containing weather variable data.
freq (str, optional) – Frequency to aggregate by. Defaults to ‘1D’. Examples: ‘1H’ (hourly), ‘1D’ (daily), ‘1W’ (weekly).
method (str or callable, optional) –
Method to use for aggregation. Defaults to ‘sum’. Options: ‘sum’, ‘mean’, ‘min’, ‘max’.
Example of a custom method:
def custom_median(x):
    return np.nan if x.isnull().all() else x.median()

data = aggregate_variables(df, freq='1D', method=custom_median)
- Returns:
pandas.DataFrame – DataFrame containing aggregated weather variable data.
Usage
To aggregate data hourly:
hourly_data = aggregate_variables(dataframe, freq='1H')
To use a custom aggregation method:
def custom_median(x):
    return np.nan if x.isnull().all() else x.median()

daily_median = aggregate_variables(df, freq='1D', method=custom_median)
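The effect of freq and method can be illustrated with plain pandas. This is a minimal sketch that assumes the aggregation behaves like a pandas resample over a time-indexed DataFrame; it does not call aggregate_variables itself, and the station columns and values are made up.

```python
import numpy as np
import pandas as pd


def custom_median(x):
    # Return NaN when a whole period is missing, otherwise the median
    return np.nan if x.isnull().all() else x.median()


# Two days of made-up 12-hourly readings for two hypothetical stations
index = pd.to_datetime([
    "2024-01-01 00:00", "2024-01-01 12:00",
    "2024-01-02 00:00", "2024-01-02 12:00",
])
df = pd.DataFrame(
    {"TA00001": [1.0, 2.0, np.nan, np.nan], "TA00002": [0.5, 0.5, 4.0, 1.0]},
    index=index,
)

# Built-in method: an all-NaN day sums to 0.0 under pandas defaults
daily_sum = df.resample("1D").sum()

# Custom method: an all-NaN day stays NaN instead of becoming 0.0
daily_median = df.resample("1D").agg(custom_median)
```

The difference on the second day of TA00001 (0.0 for the sum, NaN for the custom median) is the usual reason to pass a callable that checks for fully missing periods.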
- create_neighbor_graph(ds, threshold_km)
Creates a weighted graph of stations based on geographic proximity.
Parameters:
ds (xarray.Dataset): Dataset containing station coordinates.
threshold_km (float): Maximum connection distance in kilometers.
Returns:
- nx.Graph: NetworkX graph with:
Nodes: Station IDs with latitude/longitude attributes
Edges: Connections between stations within threshold distance
Edge weights: Haversine distances in kilometers
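The proximity rule can be illustrated without the library. The following is a minimal sketch, not the package's implementation: it assumes stations are given as a plain dict of (latitude, longitude) pairs, uses a mean Earth radius of 6371 km, treats a pair at exactly the threshold distance as connected, and stores edges in a dict rather than an nx.Graph. The names haversine_km and neighbor_edges, the station IDs and the coordinates are all hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbor_edges(stations, threshold_km):
    """Return {(id_a, id_b): distance_km} for station pairs within threshold_km."""
    edges = {}
    for (id_a, (lat_a, lon_a)), (id_b, (lat_b, lon_b)) in combinations(stations.items(), 2):
        distance = haversine_km(lat_a, lon_a, lat_b, lon_b)
        if distance <= threshold_km:
            edges[(id_a, id_b)] = distance
    return edges


# Made-up coordinates for illustration only
stations = {
    "A": (0.0, 36.0),
    "B": (0.0, 36.5),  # half a degree east of A, on the equator
    "C": (0.0, 40.0),  # several hundred kilometers from both
}
edges = neighbor_edges(stations, threshold_km=100)
```

With a 100 km threshold only the A-B pair is connected, and its weight is the Haversine distance between the two points.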
- get_coordinates(station_sensor, normalize=False)
Retrieve longitudes, latitudes for a list of station_sensor names.
- Parameters:
station_sensor (list) – List of station_sensor names.
normalize (bool) – If True, normalize the coordinates using MinMaxScaler to the range (0,1).
- Returns:
pd.DataFrame – DataFrame containing longitude and latitude coordinates.
Usage
To retrieve coordinates:
start_date = '2023-01-01'
end_date = '2023-12-31'
country = 'KE'

# get the precipitation data
ke_pr = filt.filter_pr(start_date=start_date, end_date=end_date, country='Kenya')

# get the coordinates
xs = ret.get_coordinates(ke_pr.columns, normalize=True)
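With normalize=True the coordinates are scaled to the range (0, 1). As a rough illustration of what min-max scaling does to a single column, here is a stdlib-only sketch. The library is documented as using MinMaxScaler; the helper min_max_scale, its handling of a constant column, and the sample longitudes are assumptions made for this example.

```python
def min_max_scale(values):
    """Scale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


longitudes = [34.0, 36.0, 38.0, 42.0]  # made-up values
scaled = min_max_scale(longitudes)
```

Scaling each coordinate column independently keeps the relative ordering of stations while placing longitude and latitude on the same numeric range.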
- get_measurements(station, startDate=None, endDate=None, variables=None, dataset='controlled', aggregate='5min', quality_flags=False, quality_flags_filter=[1], method='sum')
Retrieve measurements from a station with fine control over time and quality.
Parameters:
- station (str): Station ID.
- startDate (str): Start datetime (supports full ISO format).
- endDate (str): End datetime (inclusive).
- variables (list): List of variable codes to retrieve.
- dataset (str): ‘controlled’ or ‘raw’.
- aggregate (str): Aggregation frequency (e.g., ‘5min’, ‘30min’).
- quality_flags (bool): If True, return quality flags instead of values.
- quality_flags_filter (list of int): Optional list of quality flag codes [1-4] to keep.
Returns:
- pd.DataFrame: Time-indexed data.
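The quality_flags_filter parameter keeps only readings whose flag code is in the given list. A minimal sketch of that idea on plain Python lists follows. The readings, the flags and the helper keep_flagged are made up, and whether the library drops rejected readings or marks them as missing is not stated here, so this sketch masks them with None as an assumption.

```python
def keep_flagged(values, flags, allowed=(1,)):
    """Keep values whose quality flag is in `allowed`; mask the rest with None."""
    allowed = set(allowed)
    return [v if f in allowed else None for v, f in zip(values, flags)]


values = [0.2, 0.0, 5.1, 0.4]  # made-up readings
flags = [1, 2, 4, 1]           # made-up quality flag codes in the range 1-4

strict = keep_flagged(values, flags, allowed=[1])
relaxed = keep_flagged(values, flags, allowed=[1, 2])
```

Widening the filter from [1] to [1, 2] retains the second reading while the reading flagged 4 is still masked.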
- get_stations_info(station=None, multipleStations=[], countrycode=None, list_coords=None)
Retrieves information about weather stations from an API endpoint and returns relevant information based on the parameters passed to it.
Parameters:
station (str, optional): Code for a single station to retrieve information for. Defaults to None.
multipleStations (list, optional): List of station codes to retrieve information for multiple stations. Defaults to [].
countrycode (str, optional): Country code to retrieve information for all stations located in the country. Defaults to None.
list_coords (list, optional): List of coordinates to filter stations within a certain bounding box. Defaults to None.
Returns:
pandas.DataFrame: DataFrame containing information about the requested weather stations.
Usage:
To retrieve information about a single station:
station_info = ret.get_stations_info(station='TA00001')
To retrieve information about multiple stations:
station_info = ret.get_stations_info(multipleStations=['TA00001', 'TA00002'])
To retrieve information about all stations in a country:
station_info = ret.get_stations_info(countrycode='KE')
- get_variables()
Retrieves information about available weather variables from an API endpoint.
Returns:
dict: Dictionary containing information about available weather variables, keyed by variable shortcode.
- get_variables_xarray(startDate, endDate, variables=None, stations_metadata=None, aggregate='5min', method='mean', quality_flags_filter=[1])
- k_neighbours(station, number=5)
- multiple_measurements(stations_list, startDate, endDate, variables, dataset='controlled', csv_file=None, aggregate='1D', quality_flags=False, num_workers=4)
Retrieves measurements for multiple stations within a specified date range.
- Parameters:
stations_list (list) – A list of strings containing the codes of the stations to retrieve data from.
startDate (str) – The start date for the measurements, in the format ‘yyyy-mm-dd’.
endDate (str) – The end date for the measurements, in the format ‘yyyy-mm-dd’.
variables (list) – A list of strings containing the names of the variables to retrieve.
dataset (str, optional) – The name of the database to retrieve the data from. Default is ‘controlled’, alternatively ‘raw’.
csv_file (str, optional) – Pass the name of the csv file to save the data, otherwise it will return the dataframe.
aggregate (str, optional) – Aggregation frequency. If ‘1D’, aggregate per day.
quality_flags (bool, optional) – If True, return quality flags instead of values.
num_workers (int, optional) – Number of parallel workers. Defaults to 4.
- Returns:
A DataFrame containing the aggregated data for all stations.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If stations_list is not a list.
Example
To retrieve precipitation data for stations in Kenya for the last week and save it as a csv file:
# Import the necessary modules
from datetime import datetime, timedelta
from filter_stations import RetrieveData

# An instance of the RetrieveData class
ret = RetrieveData(apiKey, apiSecret)

# Get today's date and the date one week earlier
today = datetime.now()
last_week = today - timedelta(days=7)

# Format the dates as strings
last_week_str = last_week.strftime('%Y-%m-%d')
today_str = today.strftime('%Y-%m-%d')

# Define the list of stations and variables
stations = ['TA00001', 'TA00002']
variables = ['pr']

# Call the multiple_measurements method
aggregated_data = ret.multiple_measurements(
    stations,
    last_week_str,
    today_str,
    variables,
    dataset='raw',
    csv_file='Kenya_precipitation_data',
    aggregate='1D'
)
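The num_workers parameter suggests that stations are fetched in parallel. How the library does this is not documented here, so the following is only a sketch of one common pattern, using a thread pool from the standard library. The functions fetch_station and fetch_many are hypothetical, and fetch_station returns fixed fake values instead of calling any API.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_station(station):
    """Hypothetical stand-in for one station request; returns fake daily totals."""
    return {"station": station, "pr": [0.0, 1.2, 3.4]}


def fetch_many(stations_list, num_workers=4):
    """Fetch several stations concurrently; return results keyed by station code."""
    if not isinstance(stations_list, list):
        raise ValueError("stations_list must be a list")
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves the order of stations_list in its results
        results = list(pool.map(fetch_station, stations_list))
    return {item["station"]: item["pr"] for item in results}


data = fetch_many(["TA00001", "TA00002"], num_workers=2)
```

Threads suit this kind of workload because each request spends most of its time waiting on the network; a process pool would be an alternative if per-station processing were CPU-bound.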
- multiple_qualityflags(stations_list, startDate, endDate, csv_file=None)
Retrieves and aggregates quality flag data for multiple stations within a specified date range.
- Parameters:
stations_list (list) – A list of station codes for which to retrieve data.
startDate (str) – The start date in ‘YYYY-MM-DD’ format.
endDate (str) – The end date in ‘YYYY-MM-DD’ format.
csv_file (str, optional) – The name of the CSV file to save the aggregated data. Default is None.
- Returns:
A DataFrame containing the aggregated quality flag data for the specified stations, or None if an error occurs.
- Return type:
pandas.DataFrame or None
- Raises:
ValueError – If stations_list is not a list.
- station_status()
Retrieves the status of all weather stations.
Returns:
pandas.DataFrame: DataFrame containing the status of all weather stations.
- trained_models(columns=None)
Retrieves trained models from MongoDB.
Parameters:
- columns (list of str, optional): List of column names to include in the returned DataFrame.
If None, all columns are included. Defaults to None.
Returns:
pandas.DataFrame: DataFrame containing trained models with the specified columns.
- filter_stations.filter_stations.process_station_wrapper(args)
Kieni Weather Station Data
- class filter_stations.kieni_data_access.Kieni(api_key, api_secret)
Bases: object
- kieni_weather_data(start_date=None, end_date=None, variable=None, method='sum', freq='1D')
Retrieves weather data from the Kieni API endpoint and returns it as a pandas DataFrame after processing.
- Parameters:
start_date (str, optional) – The start date for retrieving weather data in ‘YYYY-MM-DD’ format. Defaults to None (returns from the beginning of the data).
end_date (str, optional) – The end date for retrieving weather data in ‘YYYY-MM-DD’ format. Defaults to None (returns to the end of the data).
variable (str, optional) – The weather variable to retrieve (same as the weather shortcodes by TAHMO e.g., ‘pr’, ‘ap’, ‘rh’).
method (str, optional) – The aggregation method to apply to the data (‘sum’, ‘mean’, ‘min’, ‘max’ and custom functions). Defaults to ‘sum’.
freq (str, optional) – The frequency for data aggregation (e.g., ‘1D’ for daily, ‘1H’ for hourly). Defaults to ‘1D’.
- Returns:
DataFrame containing the weather data for the specified parameters, with columns containing NaN values dropped.
- Return type:
pandas.DataFrame
Examples
To retrieve daily rainfall data from January 1, 2024, to January 31, 2024:
from filter_stations.kieni_data_access import Kieni

# Instantiate the Kieni class
api_key, api_secret = '', ''  # Request the API key and secret from DSAIL
kieni = Kieni(api_key, api_secret)

kieni_weather_data = kieni.kieni_weather_data(
    start_date='2024-01-01',
    end_date='2024-01-31',
    variable='pr',
    freq='1D',
    method='sum'
)
To retrieve hourly temperature data from February 1, 2024, to February 7, 2024:
kieni_weather_data = kieni.kieni_weather_data(
    start_date='2024-02-01',
    end_date='2024-02-07',
    variable='te',
    method='mean',
    freq='1H'
)
Forecasting and Seasonality
Medium-range forecasts will be extracted from the Google Weather API. Seasonal forecasts will be extracted from IRI.
- class filter_stations.Forecasting.MediumForecaster(auth_session)
Bases: object
Handles medium-range (1 to 10 days) weather forecasting by securely retrieving credentials and querying the Google Weather API.
- get_weather_api_forecast(lat, lon, days=None, hours=None)
Retrieves raw forecasting data from the Google Weather API. Automatically handles pagination. You must specify either ‘days’ (max 10) OR ‘hours’ (max 240).
- Parameters:
lat (float) – Latitude of the location.
lon (float) – Longitude of the location.
days (int, optional) – Number of days to forecast (1 to 10). Defaults to None.
hours (int, optional) – Number of hours to forecast (1 to 240). Defaults to None.
- Returns:
A dictionary containing the forecast data and timezone information.
- Return type:
dict
- Raises:
ValueError – If both days and hours are specified, or if values are out of range.
ConnectionError – If the API request fails.
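The days/hours rule can be expressed as a small validation step. This is a sketch under stated assumptions, not the library's code: the helper validate_horizon is hypothetical, and raising ValueError when neither argument is given is an assumption inferred from the requirement to specify one of them.

```python
def validate_horizon(days=None, hours=None):
    """Return ("days", n) or ("hours", n) after checking the documented limits."""
    if days is not None and hours is not None:
        raise ValueError("Specify either days or hours, not both")
    if days is None and hours is None:
        # Assumption: omitting both is treated as an error
        raise ValueError("Specify one of days or hours")
    if days is not None:
        if not 1 <= days <= 10:
            raise ValueError("days must be between 1 and 10")
        return ("days", days)
    if not 1 <= hours <= 240:
        raise ValueError("hours must be between 1 and 240")
    return ("hours", hours)


horizon = validate_horizon(days=3)
```

Checking the arguments before any request is made means an invalid call fails with ValueError locally rather than surfacing later as a ConnectionError from the API.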
- class filter_stations.Forecasting.SeasonalForecaster(auth_session=None, catalog_path=None)
Bases: object
- get_forecast(lat, lon, models=None, forecast_months=3, hourly=None, daily=None, weekly=None, monthly=None)
Fetches seasonal forecast data across any requested temporal resolution.
- Parameters:
lat (float) – Latitude of the location.
lon (float) – Longitude of the location.
models (list[str], optional) – List of weather models to use for the forecast. Defaults to None.
forecast_months (int, optional) – Number of months to forecast. Defaults to 3.
hourly (list[str], optional) – List of hourly variables to request.
daily (list[str], optional) – List of daily variables to request.
weekly (list[str], optional) – List of weekly variables to request.
monthly (list[str], optional) – List of monthly variables to request.
- Returns:
A list of dictionaries, where each dictionary represents the forecast data for a specific model and location. Keys include ‘latitude’, ‘longitude’, ‘model_index’, and DataFrames for requested timeframes.
- Return type:
list[dict]
- Raises:
ValueError – If no variables (hourly, daily, weekly, or monthly) are requested.
- list_aggregations()
Returns the raw FlatBuffer aggregation types supported by Open-Meteo.
- list_all_models()
Returns every model identifier defined in the Open-Meteo SDK.
- Returns:
A list of model names sorted alphabetically.
- Return type:
list[str]
- list_api_variables()
Returns the pre-formatted variable strings to pass directly to the API.
- Returns:
A dictionary of API variables available in the catalog.
- Return type:
dict
- list_base_variables()
Returns the raw physical variables defined in the Open-Meteo FlatBuffers.
- list_probabilities()
Returns all supported probability thresholds.
- list_units()
Returns all supported measurement units.