API Reference

RainLoader (Unified Access)

class filter_stations.datasets_loader.RainLoader(auth_session, repo_id='DeKUT-DSAIL/weather-data')

Bases: object

Examples

>>> from filter_stations import RainLoader
>>> read_token = '' # Request dsail-info@dkut.ac.ke to get a token to access the data
>>> loader = RainLoader(token=read_token)

get_dataset(dataset, start_date=None, end_date=None)

Main entry point to retrieve climate datasets (Gridded, Station, or Static).

Parameters:

dataset (str) – Name of the dataset. Options include:
- Gridded: ‘imerg’, ‘chirps’, ‘era5’, ‘tamsat’
- Station: ‘tahmo’
- Static: ‘topography’, ‘nasadem’
start_date (str, optional) – Start date (YYYY-MM-DD). Required for time-series datasets.
end_date (str, optional) – End date (YYYY-MM-DD). Required for time-series datasets.

Returns:

The requested dataset.

Return type:

xarray.Dataset
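The argument rules above (dates required for time-series datasets, none for static ones) can be checked locally before a request is made. The sketch below is not part of filter_stations: `check_request`, `TIME_SERIES` and `STATIC` are hypothetical names that only mirror the dataset options listed in the parameter description.

```python
from datetime import date

# Dataset names copied from the options listed above. The grouping into two
# sets is an assumption of this sketch, not something filter_stations exports.
TIME_SERIES = {"imerg", "chirps", "era5", "tamsat", "tahmo"}
STATIC = {"topography", "nasadem"}


def check_request(dataset, start_date=None, end_date=None):
    """Validate get_dataset-style arguments; return (name, start, end)."""
    if dataset in STATIC:
        # Static datasets need no dates
        return dataset, None, None
    if dataset not in TIME_SERIES:
        raise ValueError(f"Unknown dataset: {dataset!r}")
    if start_date is None or end_date is None:
        raise ValueError("start_date and end_date are required for time-series datasets")
    # date.fromisoformat raises ValueError on malformed dates
    start, end = date.fromisoformat(start_date), date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return dataset, start, end
```

Calling `check_request('chirps', '2024-01-01', '2024-01-20')` returns the name with two `date` objects, while `check_request('imerg')` raises `ValueError` because the dates are missing.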

Examples

>>> # Station data (TAHMO)
>>> ds_stations = loader.get_dataset('tahmo', '2024-01-01', '2024-01-30')

>>> # Gridded data (TAMSAT, ERA5, IMERG, CHIRPS) uses exactly the same interface
>>> ds_tamsat = loader.get_dataset('tamsat', '2024-01-01', '2024-01-20')
>>> ds_era5 = loader.get_dataset('era5', '2024-01-01', '2024-01-20')
>>> ds_imerg = loader.get_dataset('imerg', '2024-01-01', '2024-01-20')
>>> ds_chirps = loader.get_dataset('chirps', '2024-01-01', '2024-01-20')

>>> # Static data (topography) needs no dates
>>> ds_topo = loader.get_dataset('topography')

TAHMO Data Access and Cleaning

class filter_stations.filter_stations.RetrieveData(auth_session)

Bases: object

aggregate_qualityflags(dataframe, freq='1D')

Aggregate quality flags in a DataFrame at the given frequency (daily by default).

Parameters:

dataframe (pd.DataFrame): The DataFrame containing the measurements.
freq (str, optional): Aggregation frequency. Defaults to ‘1D’.

Returns:

pd.DataFrame: A DataFrame with aggregated quality flags, where values greater than 1 are rounded up.

aggregate_variables(dataframe, freq='1D', method='sum')

Aggregates a pandas DataFrame of weather variables.

Parameters:

dataframe (pandas.DataFrame) – DataFrame containing weather variable data.
freq (str, optional) – Frequency to aggregate by. Defaults to ‘1D’. Examples: ‘1H’ (hourly), ‘1D’ (daily), ‘1W’ (weekly).
method (str or callable, optional) –
Method to use for aggregation. Defaults to ‘sum’. Options: ‘sum’, ‘mean’, ‘min’, ‘max’.

Example of a custom method:
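```
import numpy as np

def custom_median(x):
    # Return NaN when every value in the window is missing
    return np.nan if x.isnull().all() else x.median()

# ret is a RetrieveData instance
data = ret.aggregate_variables(df, freq='1D', method=custom_median)
```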

Returns:

pandas.DataFrame – DataFrame containing aggregated weather variable data.

Usage

To aggregate data hourly:

hourly_data = ret.aggregate_variables(dataframe, freq='1H')

To use a custom aggregation method:

def custom_median(x):
    return np.nan if x.isnull().all() else x.median()

daily_median = ret.aggregate_variables(df, freq='1D', method=custom_median)
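To make the freq and method semantics concrete, here is a pandas-free sketch of daily aggregation that keeps a day as NaN when all of its values are missing, the same rule custom_median applies. `aggregate_daily` is a hypothetical helper, not the library's implementation, and the readings are made-up illustration values.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import median


def aggregate_daily(readings, method=sum):
    """Group (timestamp, value) pairs by calendar day and reduce each group.

    A day whose values are all NaN yields NaN; otherwise NaNs are skipped.
    """
    by_day = defaultdict(list)
    for stamp, value in readings:
        by_day[stamp.date()].append(value)

    result = {}
    for day, values in sorted(by_day.items()):
        valid = [v for v in values if not math.isnan(v)]
        result[day] = method(valid) if valid else math.nan
    return result


# Made-up sub-daily readings, for illustration only
readings = [
    (datetime(2024, 1, 1, 0, 0), 0.5),
    (datetime(2024, 1, 1, 0, 5), 1.5),
    (datetime(2024, 1, 2, 0, 0), math.nan),
]
daily_sum = aggregate_daily(readings)                    # default: sum
daily_median = aggregate_daily(readings, method=median)  # custom callable
```

Passing a callable instead of a fixed name is what lets the same function serve ‘sum’, ‘mean’, or a custom reducer.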

create_neighbor_graph(ds, threshold_km)

Creates a weighted graph of stations based on geographic proximity.

Parameters:

ds (xarray.Dataset): Dataset containing station coordinates
threshold_km (float): Maximum connection distance in kilometers

Returns:

nx.Graph: NetworkX graph with:

Nodes: Station IDs with latitude/longitude attributes
Edges: Connections between stations within threshold distance
Edge weights: Haversine distances in kilometers
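The edge rule described above can be sketched with the standard library alone. This is an illustration of the Haversine-plus-threshold idea, not the library's code: `haversine_km`, `neighbor_edges`, the Earth radius constant, the inclusive `<=` comparison, and the station IDs are all assumptions of the sketch, and a plain dict stands in for the NetworkX graph.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbor_edges(stations, threshold_km):
    """Return {(id_a, id_b): distance_km} for station pairs within the threshold."""
    edges = {}
    for (id_a, (lat_a, lon_a)), (id_b, (lat_b, lon_b)) in combinations(stations.items(), 2):
        dist = haversine_km(lat_a, lon_a, lat_b, lon_b)
        if dist <= threshold_km:
            edges[(id_a, id_b)] = dist
    return edges


# Hypothetical station IDs and coordinates, for illustration only
stations = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (10.0, 10.0)}
edges = neighbor_edges(stations, threshold_km=150)
```

With these points only the A and B pair falls within 150 km (one degree of longitude at the equator is roughly 111 km), so `edges` holds a single weighted connection.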

get_coordinates(station_sensor, normalize=False)

Retrieve longitudes, latitudes for a list of station_sensor names.

Parameters:

station_sensor (list) – List of station_sensor names.
normalize (bool) – If True, normalize the coordinates using MinMaxScaler to the range (0,1).

Returns:

pd.DataFrame – DataFrame containing longitude and latitude coordinates.

Usage

To retrieve coordinates:

start_date = '2023-01-01'
end_date = '2023-12-31'
country = 'KE'

# get the precipitation data
ke_pr = filt.filter_pr(start_date=start_date, end_date=end_date, country='Kenya')

# get the coordinates
xs = ret.get_coordinates(ke_pr.columns, normalize=True)
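The normalize option rescales each coordinate column to the range (0, 1). A minimal sketch of that min-max rule follows, written without scikit-learn; `min_max_scale` is a hypothetical helper and the coordinates are made up for illustration.

```python
def min_max_scale(values):
    """Scale a sequence of numbers linearly so min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to 0.0 to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical coordinates, for illustration only
longitudes = [36.0, 37.0, 38.0]
latitudes = [-1.0, 0.0, 1.0]
scaled_lon = min_max_scale(longitudes)
scaled_lat = min_max_scale(latitudes)
```

Each column is scaled independently, so longitude and latitude end up on the same 0 to 1 range regardless of their original spans.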

get_measurements(station, startDate=None, endDate=None, variables=None, dataset='controlled', aggregate='5min', quality_flags=False, quality_flags_filter=[1], method='sum')

Retrieve measurements from a station with fine control over time and quality.

Parameters:

station (str): Station ID.
startDate (str): Start datetime (supports full ISO format).
endDate (str): End datetime (inclusive).
variables (list): List of variable codes to retrieve.
dataset (str): ‘controlled’ or ‘raw’. Defaults to ‘controlled’.
aggregate (str): Aggregation frequency (e.g., ‘5min’, ‘30min’). Defaults to ‘5min’.
quality_flags (bool): If True, return quality flags instead of values.
quality_flags_filter (list of int): Optional list of quality flag codes (1 to 4) to keep. Defaults to [1].
method (str): Aggregation method. Defaults to ‘sum’.

Returns:

pd.DataFrame: Time-indexed data.
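One way to picture quality_flags_filter is as a mask over the measurements: values whose flag is in the list are kept and the rest are treated as missing. The sketch below shows that idea only. `keep_flagged` is a hypothetical helper, whether the library masks or drops rejected values is an assumption, and the numbers are made up.

```python
import math


def keep_flagged(values, flags, quality_flags_filter=(1,)):
    """Replace values whose quality flag is not in the filter with NaN."""
    allowed = set(quality_flags_filter)
    return [v if f in allowed else math.nan for v, f in zip(values, flags)]


# Made-up values and flags, for illustration only
values = [0.2, 0.0, 5.1, 0.4]
flags = [1, 1, 4, 2]
cleaned = keep_flagged(values, flags)                # keeps flag 1 only
relaxed = keep_flagged(values, flags, (1, 2))        # keeps flags 1 and 2
```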

get_stations_info(station=None, multipleStations=[], countrycode=None, list_coords=None)

Retrieves information about weather stations from an API endpoint and returns relevant information based on the parameters passed to it.

Parameters:

station (str, optional): Code for a single station to retrieve information for. Defaults to None.
multipleStations (list, optional): List of station codes to retrieve information for multiple stations. Defaults to [].
countrycode (str, optional): Country code to retrieve information for all stations located in the country. Defaults to None.
list_coords (list, optional): List of coordinates to filter stations within a certain bounding box. Defaults to None.

Returns:

pandas.DataFrame: DataFrame containing information about the requested weather stations.

Usage:

To retrieve information about a single station:

station_info = ret.get_stations_info(station='TA00001')

To retrieve information about multiple stations:

station_info = ret.get_stations_info(multipleStations=['TA00001', 'TA00002'])

To retrieve information about all stations in a country:

station_info = ret.get_stations_info(countrycode='KE')

get_variables()

Retrieves information about available weather variables from an API endpoint.

Returns:

dict: Dictionary containing information about available weather variables, keyed by variable shortcode.

get_variables_xarray(startDate, endDate, variables=None, stations_metadata=None, aggregate='5min', method='mean', quality_flags_filter=[1])

k_neighbours(station, number=5)

multiple_measurements(stations_list, startDate, endDate, variables, dataset='controlled', csv_file=None, aggregate='1D', quality_flags=False, num_workers=4)

Retrieves measurements for multiple stations within a specified date range.

Parameters:

stations_list (list) – A list of strings containing the codes of the stations to retrieve data from.
startDate (str) – The start date for the measurements, in the format ‘yyyy-mm-dd’.
endDate (str) – The end date for the measurements, in the format ‘yyyy-mm-dd’.
variables (list) – A list of strings containing the names of the variables to retrieve.
dataset (str, optional) – The name of the database to retrieve the data from. Default is ‘controlled’, alternatively ‘raw’.
csv_file (str, optional) – Pass the name of the csv file to save the data, otherwise it will return the dataframe.
aggregate (str, optional) – Aggregation frequency. If ‘1D’, aggregate per day.
quality_flags (bool, optional) – If True, return quality flags instead of values.
num_workers (int, optional) – Number of parallel workers. Defaults to 4.

Returns:

A DataFrame containing the aggregated data for all stations.

Return type:

pandas.DataFrame

Raises:

ValueError – If stations_list is not a list.

Example

To retrieve precipitation data for stations in Kenya for the last week and save it as a csv file:

# Import the necessary modules
from datetime import datetime, timedelta
from filter_stations import RetrieveData

# An instance of the RetrieveData class
ret = RetrieveData(apiKey, apiSecret)

# Get today's date
today = datetime.now()
last_week = today - timedelta(days=7)

# Format date as a string
last_week_str = last_week.strftime('%Y-%m-%d')
today_str = today.strftime('%Y-%m-%d')

# Define the list of stations
stations = ['TA00001', 'TA00002']
variables = ['pr']

# Call the multiple_measurements method
aggregated_data = ret.multiple_measurements(
    stations, last_week_str, today_str, variables,
    dataset='raw', csv_file='Kenya_precipitation_data', aggregate='1D'
)
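
The num_workers parameter implies that stations are fetched in parallel. A minimal sketch of that pattern is shown below, assuming a thread pool; the library may use processes or another mechanism instead. `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real per-station request.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(stations_list, fetch_one, num_workers=4):
    """Call fetch_one(station) for every station using a pool of worker threads."""
    if not isinstance(stations_list, list):
        raise ValueError("stations_list must be a list")
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves the order of stations_list
        results = list(pool.map(fetch_one, stations_list))
    return dict(zip(stations_list, results))


def fake_fetch(station):
    # Stand-in for a real per-station request; returns a placeholder string
    return f"data for {station}"


data = fetch_all(["TA00001", "TA00002"], fake_fetch, num_workers=2)
```

Threads suit this kind of work because each request spends most of its time waiting on the network rather than computing.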

multiple_qualityflags(stations_list, startDate, endDate, csv_file=None)

Retrieves and aggregates quality flag data for multiple stations within a specified date range.

Parameters:

stations_list (list) – A list of station codes for which to retrieve data.
startDate (str) – The start date in ‘YYYY-MM-DD’ format.
endDate (str) – The end date in ‘YYYY-MM-DD’ format.
csv_file (str, optional) – The name of the CSV file to save the aggregated data. Default is None.

Returns:

A DataFrame containing the aggregated quality flag data for the specified stations, or None if an error occurs.

Return type:

pandas.DataFrame or None

Raises:

ValueError – If stations_list is not a list.

station_status()

Retrieves the status of all weather stations.

Returns:

pandas.DataFrame: DataFrame containing the status of all weather stations.

trained_models(columns=None)

Retrieves trained models from MongoDB.

Parameters:

columns (list of str, optional): List of column names to include in the returned DataFrame.
If None, all columns are included. Defaults to None.

Returns:

pandas.DataFrame: DataFrame containing trained models with the specified columns.

filter_stations.filter_stations.process_station_wrapper(args)

Kieni Weather Station Data

class filter_stations.kieni_data_access.Kieni(api_key, api_secret)

Bases: object

kieni_weather_data(start_date=None, end_date=None, variable=None, method='sum', freq='1D')

Retrieves weather data from the Kieni API endpoint and returns it as a pandas DataFrame after processing.

Parameters:

start_date (str, optional) – The start date for retrieving weather data in ‘YYYY-MM-DD’ format. Defaults to None (returns from the beginning of the data).
end_date (str, optional) – The end date for retrieving weather data in ‘YYYY-MM-DD’ format. Defaults to None (returns to the end of the data).
variable (str, optional) – The weather variable to retrieve (same as the weather shortcodes by TAHMO e.g., ‘pr’, ‘ap’, ‘rh’).
method (str, optional) – The aggregation method to apply to the data (‘sum’, ‘mean’, ‘min’, ‘max’ and custom functions). Defaults to ‘sum’.
freq (str, optional) – The frequency for data aggregation (e.g., ‘1D’ for daily, ‘1H’ for hourly). Defaults to ‘1D’.

Returns:

DataFrame containing the weather data for the specified parameters, with columns containing NaN values dropped.

Return type:

pandas.DataFrame

Examples

To retrieve daily rainfall data from January 1, 2024, to January 31, 2024:

# Import and instantiate the Kieni class
from filter_stations.kieni_data_access import Kieni

api_key, api_secret = '', ''  # Request the API key and secret from DSAIL
kieni = Kieni(api_key, api_secret)

kieni_weather_data = kieni.kieni_weather_data(
    start_date='2024-01-01',
    end_date='2024-01-31',
    variable='pr',
    freq='1D',
    method='sum'
)

To retrieve hourly temperature data from February 1, 2024, to February 7, 2024:

kieni_weather_data = kieni.kieni_weather_data(
    start_date='2024-02-01',
    end_date='2024-02-07',
    variable='te',
    method='mean',
    freq='1H'
)

Forecasting and Seasonality

Medium-range forecasting will be extracted from the Google Weather API.
Seasonal forecasting will be extracted from IRI.

class filter_stations.Forecasting.MediumForecaster(auth_session)

Bases: object

Handles medium-range (1 to 10 days) weather forecasting by securely retrieving credentials and querying the Google Weather API.

get_weather_api_forecast(lat, lon, days=None, hours=None)

Retrieves raw forecasting data from the Google Weather API. Automatically handles pagination. You must specify either ‘days’ (max 10) OR ‘hours’ (max 240).

Parameters:

lat (float) – Latitude of the location.
lon (float) – Longitude of the location.
days (int, optional) – Number of days to forecast (1 to 10). Defaults to None.
hours (int, optional) – Number of hours to forecast (1 to 240). Defaults to None.

Returns:

A dictionary containing the forecast data and timezone information.

Return type:

dict

Raises:

ValueError – If both days and hours are specified, or if values are out of range.
ConnectionError – If the API request fails.
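
The days/hours rule can be expressed as a small validation step. This is a sketch of the documented constraints, not the library's code: `resolve_horizon` is a hypothetical helper, and raising when neither argument is given is an assumption based on the "must specify either" wording.

```python
def resolve_horizon(days=None, hours=None):
    """Return ('days', n) or ('hours', n) after validating the request."""
    if (days is None) == (hours is None):
        # Covers both "neither given" and "both given"
        raise ValueError("Specify exactly one of 'days' or 'hours'")
    if days is not None:
        if not 1 <= days <= 10:
            raise ValueError("days must be between 1 and 10")
        return "days", days
    if not 1 <= hours <= 240:
        raise ValueError("hours must be between 1 and 240")
    return "hours", hours
```

For example, `resolve_horizon(days=3)` returns `('days', 3)`, while `resolve_horizon(days=3, hours=12)` and `resolve_horizon(hours=300)` both raise `ValueError`.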

class filter_stations.Forecasting.SeasonalForecaster(auth_session=None, catalog_path=None)

Bases: object

get_forecast(lat, lon, models=None, forecast_months=3, hourly=None, daily=None, weekly=None, monthly=None)

Fetches seasonal forecast data across any requested temporal resolution.

Parameters:

lat (float) – Latitude of the location.
lon (float) – Longitude of the location.
models (list[str], optional) – List of weather models to use for the forecast. Defaults to None.
forecast_months (int, optional) – Number of months to forecast. Defaults to 3.
hourly (list[str], optional) – List of hourly variables to request.
daily (list[str], optional) – List of daily variables to request.
weekly (list[str], optional) – List of weekly variables to request.
monthly (list[str], optional) – List of monthly variables to request.

Returns:

A list of dictionaries, where each dictionary represents the forecast data for a specific model and location. Keys include ‘latitude’, ‘longitude’, ‘model_index’, and DataFrames for requested timeframes.

Return type:

list[dict]

Raises:

ValueError – If no variables (hourly, daily, weekly, or monthly) are requested.
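
The ValueError above follows from the four optional resolution arguments: at least one must carry variables. A small sketch of that check, under the assumption that empty lists count as "not requested"; `requested_resolutions` is a hypothetical helper and the variable name is a placeholder (list_api_variables() is the documented source of valid names).

```python
def requested_resolutions(hourly=None, daily=None, weekly=None, monthly=None):
    """Return only the resolutions that were asked for, as {name: variables}."""
    candidates = {"hourly": hourly, "daily": daily, "weekly": weekly, "monthly": monthly}
    requested = {name: list(names) for name, names in candidates.items() if names}
    if not requested:
        raise ValueError("Request at least one hourly, daily, weekly, or monthly variable")
    return requested


# 'temperature_2m_mean' is a placeholder variable name used for illustration
selection = requested_resolutions(daily=["temperature_2m_mean"])
```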

list_aggregations()

Returns the raw FlatBuffer aggregation types supported by Open-Meteo.

list_all_models()

Returns every model identifier defined in the Open-Meteo SDK.

Returns:

A list of model names sorted alphabetically.

Return type:

list[str]

list_api_variables()

Returns the pre-formatted variable strings to pass directly to the API.

Returns:

A dictionary of API variables available in the catalog.

Return type:

dict

list_base_variables()

Returns the raw physical variables defined in the Open-Meteo FlatBuffers.

list_probabilities()

Returns all supported probability thresholds.

list_units()

Returns all supported measurement units.