Further information on processing#
Generalities#
Processing consists of identifying significant rain event periods and evaluating the reflectivity differences between the DCR and the disdrometer (at a relevant range for the studied station/radar) over these periods, so that these differences can be monitored.
Processed files are produced daily, whether there is a rain event identified or not.
The processing step is run with the command ccres_disdrometer_processing process, which executes the file preprocessed_file2processed.py located in the processing sub-directory.
Block diagram of the processing#
Focus on the preprocessing step (needs to be slightly modified)
Recap of the inputs and outputs for the processing :#
Recap of the inputs for the processing : [To process day D ]
3 daily outputs from preprocessing for days D-1, D, D+1
configuration file for the station to process
Path where to save the output file
option no_meteo to downgrade the processing (i.e. to ignore weather station data, even when it is available and was used for preprocessing)
The output is a netCDF file, with three dimensions :
range, the vector of altitudes at which DCR data and Delta between DCR and disdrometer-modeled reflectivity data are provided
two time bases :
time, the same time vector as the one used in preprocessing output files, from 0:00 to 23:59 (UTC) with a 1-minute resolution
events, a dimension dedicated to the storage of identified significant rain events (see later). Its length is the number of events identified for the day to process, and is 0 if no event is identified.
Content of the output file#
The following variables can be found in the output processing file :
| Variable | Dimensions | Description |
|---|---|---|
| time | time | time |
| range | range | ranges at which to consider DCR/disdrometer data comparison |
| Zdcr | time, range | DCR reflectivity at the ranges specified in configuration file (3 in general) |
| DVdcr | time, range | DCR Doppler velocity at the ranges specified in configuration file |
| Zdd | time | Disdrometer forward-modeled reflectivity to use for comparison with DCR data |
| fallspeed_dd | time | Average droplet fall speed seen by the disdrometer |
| Delta_Z | time, range | Difference between DCR and disdrometer-modeled reflectivity (dBZ) |
| flag_event | time | Flag to describe if a timestep belongs to a detected rainfall event |
| ams_cp_since_event_begin | time | Pluviometer rain accumulation (in mm) since last event start |
| disdro_cp_since_event_begin | time | Disdrometer rain accumulation (in mm) since last event start |
| QF_rainfall_amount | time | Quality flag for minimum rainfall amount |
| QC_pr | time | Quality check for rainfall rate |
| QC_vdsd_t | time | Quality check for coherence between fall speed and DSD |
| QC_ta | time | Quality check for air temperature |
| QC_ws | time | Quality check for wind speed |
| QC_wd | time | Quality check for wind direction |
| QC_hur | time | Quality check for relative humidity |
| QF_rg_dd | time | Quality flag for discrepancy between rain gauge and disdrometer |
| QC_overall | time | Overall quality check |
| events | events | Dimension for the storage of identified events |
| start_event | events | Event start epoch |
| end_event | events | Event end epoch |
| event_length | events | Event duration |
| rain_accumulation | events | AMS rain accumulation (in mm) over the whole event |
| QF_rain_accumulation | events | Flag on event rain accumulation |
| QF_rg_dd_event | events | Flag on deviation between rain gauge and disdrometer |
| nb_dz_computable_pts | events | Number of timesteps at which Delta Z can be computed, for a given event |
| QC_vdsd_t_ratio | events | Ratio of timesteps where check on relationship between fall speed and DSD is good |
| QC_pr_ratio | events | Ratio of timesteps where precipitation rate QC is good |
| QC_ta_ratio | events | Ratio of timesteps where air temperature QC is good |
| QC_ws_ratio | events | Ratio of timesteps where wind speed QC is good |
| QC_wd_ratio | events | Ratio of timesteps where wind direction QC is good |
| QC_hur_ratio | events | Ratio of timesteps where relative humidity QC is good |
| QC_overall_ratio | events | Ratio of timesteps where all checks are good |
| good_points_number | events | Number of timesteps where all checks are good |
| dZ_mean | events | Average value of Delta Z for good timesteps |
| dZ_med | events | Median value of Delta Z for good timesteps |
| dZ_q1 | events | First quartile of Delta Z distribution for good timesteps |
| dZ_q3 | events | Third quartile of Delta Z distribution for good timesteps |
| dZ_min | events | Minimum value of Delta Z for good timesteps |
| dZ_max | events | Maximum value of Delta Z for good timesteps |
| reg_slope | events | Slope of the linear regression Zdd/Zdcr for each event |
| reg_intercept | events | Intercept of the linear regression Zdd/Zdcr for each event |
| reg_score | events | R-squared of the linear regression Zdd/Zdcr for each event |
| reg_rmse | events | RMSE of the linear regression Zdd/Zdcr for each event |
| weather_data_used | | [0 or 1] : use of weather data for processing ? |
Processing steps#
Step 0 : concatenation of the preprocessing data files used as input (days D-1, D, D+1)
This operation is made by the function merge_preprocessed_data().
Step 1 : significant rain event selection
This selection is made by the function rain_event_selection().
The preprocessed data from day D-1 is included in the input data for rain event selection so that a subset of an event already detected during the processing of day D-1 is not detected again during the processing of day D. Two variants of the rain event selection are used, depending on whether weather data is available :
If weather data is available after preprocessing, the algorithm is based on the rain accumulation data “seen” by the pluviometer.
Otherwise, or if the no_meteo option is True, we use disdrometer rain accumulation data. However, we have no insight into the relevance of this data source (in general, pluviometer rain accumulation data is more accurate).
The criteria for rain event selection are the following :
| Variables | Thresholds | Objective(s) |
|---|---|---|
| Event duration | > 3 h | Ensure that we have a significant event, i.e. robust statistics on Delta Z |
| Rain accumulation | > 3 mm | Same |
| Maximum time between two consecutive rain records | 60 min | Ensure rain event continuity |
The efficiency of the algorithm as currently implemented is probably low. The implementation could be reviewed to improve it ; fortunately, this step is not very time-consuming, so the problem is not critical.
The algorithm outputs two lists, containing begin and end dates of the identified events. It outputs empty lists if no events satisfying the criteria are identified.
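The three criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of rain_event_selection() : the function name select_rain_events, the constant names and the input layout (a list of timestamps and a list of per-timestep rain amounts) are hypothetical.

```python
from datetime import datetime, timedelta

# Thresholds from the criteria table; the constant names are hypothetical.
MIN_DURATION = timedelta(hours=3)
MIN_ACCUMULATION_MM = 3.0
MAX_GAP = timedelta(minutes=60)


def select_rain_events(times, rain_mm):
    """Return (starts, ends) of the events that satisfy the three criteria.

    times   : sorted list of datetimes (1-minute steps)
    rain_mm : rain amount recorded at each timestep, in mm
    """
    starts, ends = [], []
    # Keep only the timesteps where rain was actually recorded.
    wet = [(t, r) for t, r in zip(times, rain_mm) if r > 0]
    if not wet:
        return starts, ends

    def close_event(first, last, total):
        # Keep the candidate only if it is long enough and rainy enough.
        if last - first > MIN_DURATION and total > MIN_ACCUMULATION_MM:
            starts.append(first)
            ends.append(last)

    first = last = wet[0][0]
    total = wet[0][1]
    for t, r in wet[1:]:
        if t - last > MAX_GAP:
            # Dry gap too long: the current candidate event stops here.
            close_event(first, last, total)
            first, total = t, 0.0
        last = t
        total += r
    close_event(first, last, total)
    return starts, ends


# Usage: 4 hours of light rain (0.02 mm per minute) followed by a dry period.
t0 = datetime(2024, 1, 1)
times = [t0 + timedelta(minutes=i) for i in range(600)]
rain = [0.02 if i < 240 else 0.0 for i in range(600)]
starts, ends = select_rain_events(times, rain)
```

As in the description above, the sketch returns two lists (begin and end dates), both empty when no event satisfies the criteria.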
Step 2 : computation of output variables
Time variables produced from preprocessing file variables :
Zdcr, DVdcr : extraction of DCR data on a small subset of range values (specified in configuration files)
Zdd : \(10 * log(Zdcr)\)
fallspeed_dd : 1-minute average droplet fall speed
Delta_Z := Zdcr - Zdd (i.e. computed for the range values specified in the configuration)
flag_event : 1 if the timestep belongs to an identified event, 0 otherwise
ams_cp_since_event_begin / disdro_cp_since_event_begin : rain accumulation since the last start of an event (useful for visualizations, notably). ams_cp_since_event_begin is a series of NaN if no weather station data is provided.
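A minimal sketch of three of these time variables is given here for a single range gate, using plain Python lists. The function names and the representation of events as (start_index, end_index) pairs are assumptions made for the illustration ; the behaviour of the accumulation before the first event of the series is also an assumption.

```python
import math


def delta_z(zdcr_dbz, zdd_dbz):
    """Delta_Z = Zdcr - Zdd; NaN inputs propagate to the result."""
    return [a - b for a, b in zip(zdcr_dbz, zdd_dbz)]


def flag_event(n_steps, events):
    """1 where the timestep belongs to an event, 0 otherwise.

    events : list of (start_index, end_index) pairs, end index inclusive.
    """
    flag = [0] * n_steps
    for start, end in events:
        for i in range(start, end + 1):
            flag[i] = 1
    return flag


def accumulation_since_event_begin(rain_mm, events):
    """Running rain total, restarted at each event start.

    Assumption: before the first event the total simply runs from the
    beginning of the series.
    """
    start_indices = {start for start, _ in events}
    total, out = 0.0, []
    for i, r in enumerate(rain_mm):
        if i in start_indices:
            total = 0.0
        total += r
        out.append(total)
    return out


# Usage on a 6-minute toy series with one event covering indices 2 to 4.
dz = delta_z([10.0, 12.5, math.nan], [8.0, 13.0, 1.0])
flags = flag_event(6, [(2, 4)])
acc = accumulation_since_event_begin([0.1, 0.0, 0.2, 0.3, 0.0, 0.1], [(2, 4)])
```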
Computation of “Quality Checks” time series
These variables are filters used to determine at which time steps it is relevant to keep Delta_Z values for the monitoring of the calibration. They have a 1-minute sampling.
QC_pr : Control on the rainfall rate. The aim is to remove time periods with heavy rain, to limit the risk of DCR saturation or wet radome. Whether pluviometer data is provided or not, the computation is the same and is based on the disdrometer rainfall rate, which is available at 1-minute sampling. The threshold is given in the configuration file, by default 3 mm/h.
QC_vdsd_t : Control on the precipitation size distribution (PSD) given by the disdrometer. The aim is to check the consistency between droplet fall speed and drop size distribution, to remove snow situations. The values compared are :
on one hand, the weighted average of the droplet fall speed seen by the disdrometer at 1-minute resolution, computed from the drop size distribution and a fall speed model (Gunn and Kinzer) ;
on the other hand, the average droplet fall speed computed from the disdrometer speed distribution.
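The comparison can be illustrated as follows. This sketch rests on several assumptions : the fall speed model is the analytic fit of Atlas et al. (1973) to the Gunn and Kinzer measurements, which may differ from the formulation used in the package ; the 30% relative tolerance is the default limit listed in the summary table of quality controls ; the function names and the inputs (diameter classes and drop counts) are hypothetical.

```python
import math


def model_fall_speed(d_mm):
    """Terminal fall speed (m/s) of a drop of diameter d_mm (mm).

    Assumption: analytic fit of Atlas et al. (1973) to the Gunn and Kinzer
    data; the package may use another formulation.
    """
    return 9.65 - 10.3 * math.exp(-0.6 * d_mm)


def qc_vdsd_t(diameters_mm, counts, measured_speed, tolerance=0.3):
    """True when measured and modelled mean fall speeds agree within tolerance."""
    n = sum(counts)
    if n == 0:
        return False
    # Mean modelled fall speed, weighted by the number of drops per size class.
    modelled = sum(c * model_fall_speed(d) for d, c in zip(diameters_mm, counts)) / n
    if modelled <= 0:
        return False
    return abs(measured_speed - modelled) / modelled < tolerance


# Usage: same drop size distribution, two different measured mean speeds.
diameters = [0.5, 1.0, 2.0]
counts = [10, 20, 5]
rain_like = qc_vdsd_t(diameters, counts, measured_speed=4.0)
snow_like = qc_vdsd_t(diameters, counts, measured_speed=1.5)
```

With this distribution the modelled mean speed is close to 3.8 m/s, so a measured 4.0 m/s passes the check while a measured 1.5 m/s, typical of slowly falling hydrometeors, does not.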
If weather data is provided, further quality controls can be performed on weather variables : QC_ta, QC_hur, QC_ws, QC_wd are checks based on air temperature, relative humidity, wind speed and wind direction. The thresholds used are given in the configuration file for each station ; for the moment the default values are :
air temperature > 2 °C (remove snow cases)
relative humidity ∈ [80,99] % : avoid cases with evaporation (which induces a modification of the droplet size distribution between the ground and the radar range) and fog
wind speed < 10 m/s
wind direction : ± 45° from the normal to the disdrometer optical axis (because the disdrometer operates optimally for droplets moving perpendicular to the optical axis)
To compute these quality controls :
(1) if weather data has a 1-minute sampling, the computation is a direct comparison with the threshold
(2) if weather data has a coarser sampling (e.g. 10 minutes at Lindenberg), we use the value of the corresponding variable that is closest in time.
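Case (2) can be sketched with a nearest-neighbour lookup. In this illustration times are expressed in minutes since midnight, and the function names, the tie-breaking rule (the earlier record wins) and the example values are assumptions.

```python
from bisect import bisect_left


def nearest_value(weather_times, weather_values, t):
    """Return the weather value whose timestamp is closest to t.

    weather_times must be sorted. On a tie, the earlier record is used.
    """
    i = bisect_left(weather_times, t)
    if i == 0:
        return weather_values[0]
    if i == len(weather_times):
        return weather_values[-1]
    before, after = weather_times[i - 1], weather_times[i]
    return weather_values[i] if after - t < t - before else weather_values[i - 1]


def qc_ta(weather_times, temperatures, minute_grid, threshold=2.0):
    """Air temperature check evaluated on the processing time grid."""
    return [nearest_value(weather_times, temperatures, t) > threshold
            for t in minute_grid]


# Usage: 10-minute temperature records checked on a 5-minute grid.
weather_times = [0, 10, 20]
temperatures = [1.0, 2.5, 4.0]
minute_grid = list(range(0, 21, 5))
flags = qc_ta(weather_times, temperatures, minute_grid)
```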
Global quality control to sum up all the quality checks : QC_overall
with weather data : QC_overall = QC_pr & QC_vdsd_t & QC_ta & QC_ws & QC_wd & QC_hur
without weather data : QC_overall = QC_pr & QC_vdsd_t
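The combination is a timestep-wise logical AND. A short sketch, where the function name and the optional weather_qcs argument are hypothetical :

```python
def qc_overall(qc_pr, qc_vdsd_t, weather_qcs=None):
    """Timestep-wise AND of all available quality checks.

    weather_qcs : optional list of boolean series (QC_ta, QC_ws, QC_wd, QC_hur),
                  left to None when no weather data is used.
    """
    checks = [qc_pr, qc_vdsd_t]
    if weather_qcs is not None:
        checks.extend(weather_qcs)
    return [all(step) for step in zip(*checks)]


# Usage on four timesteps, with and without a weather-based check.
qc_pr = [True, True, False, True]
qc_vdsd_t = [True, False, True, True]
qc_ta = [True, True, True, False]
without_weather = qc_overall(qc_pr, qc_vdsd_t)
with_weather = qc_overall(qc_pr, qc_vdsd_t, weather_qcs=[qc_ta])
```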
The table below sums up the quality controls and thresholds implemented (WS : weather station, DD : disdrometer) :
| Variables | Limits | With WS and DD | Only with DD | Objectives |
|---|---|---|---|---|
| Air temperature | > 2 °C | ✅ | ❌ | Remove solid precipitation |
| Relative humidity | > 80% and < 98% | ✅ | ❌ | Avoid fog cases, evaporation |
| Wind speed | < 10 m/s | ✅ | ❌ | Quality control on disdrometer measurement |
| Wind direction | ± 45° from the normal to the disdrometer optical axis | ✅ | ❌ | Quality control on disdrometer measurement |
| Relationship fall speed/drop size | < 30% vs Gunn and Kinzer | ✅ | ✅ | Check consistency between fall speed and drop size distribution (remove snow situations) |
| Precipitation rate | < 3 mm/h | ✅ | ✅ | Remove heavy rain cases |
Storage of macroscopic information on detected rain event
Two scenarios are possible :
at least 1 event ending at day D is detected ;
no event ending at day D is detected.
In both cases, a dimension “events” is initialized in the output Dataset, whose length is the number of detected events.
In the second scenario, events is empty and the related variables (indexed on this dimension) are empty as well.
The following information is computed and stored :
start_event, end_event : begin and end dates of detected events
event_length : event duration in minutes
rain_accumulation / QF_rain_accumulation : cumulated rainfall amount over the event / flag to ensure that the rainfall amount is significant enough (default threshold : 3 mm). In the latest version of the code, this control on rainfall amount is implemented directly in the rain event selection algorithm, so that detected events with an amount below the threshold are discarded automatically. The variable QF_rain_accumulation could therefore be deleted.
QF_rg_dd_event : boolean, checks if the relative error between disdrometer and pluviometer rainfall amounts is lower than a defined threshold (default : 30%, in configuration file). The aim is to flag events with high discrepancies between pluviometer and disdrometer when weather data is provided, to allow a critical look at disdrometer data reliability.
nb_dz_computable_pts : number of timesteps for which the variable Delta_Z has a finite value at the comparison range (DCR_DZ_RANGE) given in the configuration file for each station and instrument setup. The choice of this comparison range follows a logic : it must not be “too high”, to ensure representativeness of the droplet size distribution and minimize Delta_Z variability, and it must not be “too low”, to avoid radar antenna near-field effects. One needs to look at the DCR and disdrometer-modeled reflectivity data during significant rain periods in order to find a good trade-off for this comparison range.
QC_[var]_ratio: ratio of timesteps for which the quality control for the variable [var] is satisfied.
good_points_number : number of timesteps for which QC_overall is satisfied and Delta_Z has a finite value (so, necessarily, good_points_number \(\leq\) nb_dz_computable_pts). It corresponds to the number of data points kept for Delta_Z analysis and monitoring, i.e. the number of timesteps used to compute Delta_Z statistics over the event.
dZ_min / dZ_max / dZ_med / dZ_mean / dZ_q1 / dZ_q3 : min and max values, mean and quartiles for Delta_Z distribution over the event
reg_slope / reg_intercept / reg_score / reg_rmse : statistics of the linear regression Zdcr vs. Zdd (for the subset of points kept for the analysis i.e. with QC OK)
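The per-event statistics can be sketched with the standard library only. This is an illustration under assumptions, not the package code : the function name is hypothetical, the regression is taken as an ordinary least squares fit of Zdcr against Zdd, and the quartiles use the "inclusive" method of statistics.quantiles, which may differ from the convention used in the processing.

```python
import math
import statistics


def event_statistics(zdd, zdcr, qc_overall):
    """Delta_Z statistics and regression of Zdcr against Zdd for one event."""
    # Keep timesteps that pass QC_overall and where both reflectivities are finite.
    good = [
        (x, y)
        for x, y, ok in zip(zdd, zdcr, qc_overall)
        if ok and math.isfinite(x) and math.isfinite(y)
    ]
    n = len(good)
    if n < 2:
        return {"good_points_number": n}

    xs = [x for x, _ in good]
    ys = [y for _, y in good]
    dz = [y - x for x, y in good]  # Delta_Z = Zdcr - Zdd
    q1, med, q3 = statistics.quantiles(dz, n=4, method="inclusive")

    # Ordinary least squares fit: Zdcr = slope * Zdd + intercept.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in good)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx if sxx > 0 else math.nan
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in good)
    return {
        "good_points_number": n,
        "dZ_mean": statistics.fmean(dz),
        "dZ_med": med,
        "dZ_q1": q1,
        "dZ_q3": q3,
        "dZ_min": min(dz),
        "dZ_max": max(dz),
        "reg_slope": slope,
        "reg_intercept": intercept,
        "reg_score": 1.0 - sse / syy if syy > 0 else math.nan,
        "reg_rmse": math.sqrt(sse / n),
    }


# Usage: the last timestep fails QC_overall and is excluded from the statistics.
zdd = [10.0, 12.0, 14.0, 16.0, 18.0]
zdcr = [11.0, 13.0, 15.0, 17.0, 40.0]
stats = event_statistics(zdd, zdcr, [True, True, True, True, False])
```

In this toy event the four retained points all have Delta_Z = 1 dB, so the outlier at the last timestep has no influence on the result.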
Addition of attributes
See function add_attributes()
Considerations for the monitoring :#
The first aim of the code is to provide long-term monitoring of the DCR calibration (over the whole labeling period of a station).
We can sweep through all the events detected in the daily processing files to gather all the significant rain events over the period for which we want to monitor the DCR calibration. Within this set of events saved in the daily processing outputs, we only keep the subset of events which satisfy two conditions :
QF_rg_dd_event is verified (quality flag on consistency between disdrometer and pluviometer rainfall amounts, when weather data is provided)
good_points_number \(\geq 50\) : ensure robustness of Delta_Z distribution over an event.
Remark : We may add a variable to flag this criterion directly in the processing files. A field called MIN_POINTS is already set in the station configuration files but is not used in the code yet.
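The event filter described above can be sketched as follows, with each event represented as a dictionary of the per-event variables. The function name and the dictionary layout are hypothetical, and skipping the rain gauge check when weather_data_used is 0 is an assumption about how events without weather data should be handled.

```python
MIN_POINTS = 50  # value quoted above; the name mirrors the configuration field


def keep_for_monitoring(events, min_points=MIN_POINTS):
    """Keep the events that satisfy the two monitoring conditions."""
    kept = []
    for event in events:
        # Assumption: the gauge/disdrometer flag only applies when weather
        # data was used for the processing of this event.
        gauge_ok = event["QF_rg_dd_event"] if event["weather_data_used"] else True
        if gauge_ok and event["good_points_number"] >= min_points:
            kept.append(event)
    return kept


# Usage with three toy events gathered from daily processing files.
events = [
    {"dZ_med": 0.8, "QF_rg_dd_event": True, "good_points_number": 120, "weather_data_used": 1},
    {"dZ_med": 2.5, "QF_rg_dd_event": False, "good_points_number": 200, "weather_data_used": 1},
    {"dZ_med": 1.1, "QF_rg_dd_event": True, "good_points_number": 30, "weather_data_used": 1},
]
kept = keep_for_monitoring(events)
```

Only the first event is kept here : the second fails the rain gauge/disdrometer consistency flag and the third has fewer than 50 good points.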
Get tables for Quality checks and flags from CCRES presentations
Display a list of variable, dimensions, units in the netCDF daily output files
Section for monitoring products