Conversion Tools (netCDF)

Evan Thomas edited this page Jan 21, 2019 · 2 revisions

Original Author: R. J. Barnes (JHU/APL)

Converting to netCDF

The netCDF format has no concept of a "record". A netCDF dataset is simply a collection of arrays whose dimensions are named and of fixed length. One dimension can be defined as "unlimited". The unlimited dimension does not have a fixed length, and its index can effectively be thought of as the record number.

In translating dmap files, each record in the file is mapped to an index of the unlimited dimension; scalars become one-dimensional arrays, and arrays gain one extra dimension.
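
As a rough illustration of this mapping, the Python sketch below stacks a few made-up records into per-variable columns. It is not how dmaptoncdf is implemented: the record contents, the stack_records helper, and the use of None as padding for short arrays are all assumptions made for the example.

```python
def stack_records(records, sizes, fill=None):
    """Stack dmap-like records along a leading "record" axis.

    records: list of dicts, one per record (hypothetical stand-in for dmap data)
    sizes:   fixed length for each array variable, e.g. {"slist": 3}
    fill:    padding for arrays shorter than the fixed length (an assumption)
    """
    columns = {}
    for rec in records:
        for name, value in rec.items():
            if isinstance(value, list):
                # Arrays gain one extra (record) dimension; pad to the fixed length.
                row = value + [fill] * (sizes[name] - len(value))
            else:
                # Scalars become entries of a one-dimensional array.
                row = value
            columns.setdefault(name, []).append(row)
    return columns


records = [
    {"bmnum": 0, "slist": [4, 5, 9]},
    {"bmnum": 1, "slist": [2, 7]},
]
columns = stack_records(records, sizes={"slist": 3})
print(columns["bmnum"])  # one value per record
print(columns["slist"])  # one fixed-length row per record
```

The index into each column plays the role of the unlimited dimension, so record n of the dmap file ends up at position n of every variable.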

NetCDF has a simple text-based description language called CDL (Common Data form Language) that can be used to describe the layout of a dataset. The utility ncgen can convert a CDL file to an empty dataset ready to store data.

The tool dmaptocdl first analyses a dmap file to determine the size of the arrays required and produces an outline CDL file. The user can edit this CDL file to correct any problems and add any metadata (attributes) before creating an empty netCDF file using ncgen. The user then populates the dataset using dmaptoncdf.
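
The size analysis can be pictured with the minimal Python sketch below. It assumes the outline is sized to the largest array seen for each variable across all records, and it handles only one-dimensional arrays. The outline_dimensions helper and the sample records are hypothetical and are not taken from the dmaptocdl source.

```python
def outline_dimensions(records):
    """Return CDL dimension lines sized to the largest array seen per variable."""
    sizes = {}
    for rec in records:
        for name, value in rec.items():
            if isinstance(value, list):
                # Keep the largest length seen so every record will fit.
                sizes[name] = max(sizes.get(name, 0), len(value))
    # "block" is the name the generated CDL uses for the unlimited dimension.
    lines = ["dimensions:", "\tblock=unlimited;"]
    lines += [f"\t{name}={size};" for name, size in sizes.items()]
    return lines


sample = [
    {"nrang": 75, "slist": [1, 2, 3]},
    {"nrang": 75, "slist": [1, 2, 3, 4, 5], "pwr0": [0.0, 0.0, 0.0, 0.0]},
]
dimension_lines = outline_dimensions(sample)
print("\n".join(dimension_lines))
```

Because the sizes come from a single input file, they may be smaller than what other files need, which is one reason the outline usually has to be edited by hand.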

Let's go through this process step by step:

dmaptocdl inputfile cdlfile mapfile

The first step is to analyse the data file using dmaptocdl:

dmaptocdl 20021219.kap.fitacf fitacf.cdl fitacf.cdfmap

The program takes one input file (20021219.kap.fitacf in this example) and produces two output files (fitacf.cdl and fitacf.cdfmap). The first output file is the CDL framework for the dataset. The second maps the dmap variables to their netCDF equivalents in the CDL file. For the example above, the CDL file looks like this:

cat fitacf.cdl
netcdf dmap {

dimensions:
	block=unlimited;
	combf=256;
	ptab=7;
	ltab=2;
	ltab002=18;
	pwr0=70;
	slist=67;
	nlag=67;
	qflg=67;
	gflg=67;
	p_l=67;
	p_l_e=67;
	p_s=67;
	p_s_e=67;
	v=67;
	v_e=67;
	w_l=67;
	w_l_e=67;
	w_s=67;
	w_s_e=67;
	sd_l=67;
	sd_s=67;
	sd_phi=67;
	x_qflg=58;
	x_gflg=58;
	x_p_l=58;
	x_p_l_e=58;
	x_p_s=58;
	x_p_s_e=58;
	x_v=58;
	x_v_e=58;
	x_w_l=58;
	x_w_l_e=58;
	x_w_s=58;
	x_w_s_e=58;
	phi0=58;
	phi0_e=58;
	elv=58;
	elv_low=58;
	elv_high=58;
	x_sd_l=58;
	x_sd_s=58;
	x_sd_phi=58;
variables:
	char radar_revision_major(block);
	char radar_revision_minor(block);
	short cp(block);
	short stid(block);
	short time_yr(block);
	short time_mo(block);
	short time_dy(block);
	short time_hr(block);
	short time_mt(block);
	short time_sc(block);
	short time_us(block);
	short txpow(block);
	short nave(block);
	short atten(block);
	short lagfr(block);
	short smsep(block);
	short ercod(block);
	short stat_agc(block);
	short stat_lopwr(block);
	float noise_search(block);
	float noise_mean(block);
	short channel(block);
	short bmnum(block);
	short scan(block);
	short offset(block);
	short rxrise(block);
	short intt_sc(block);
	short intt_us(block);
	short txpl(block);
	short mpinc(block);
	short mppul(block);
	short mplgs(block);
	short nrang(block);
	short frang(block);
	short rsep(block);
	short xcf(block);
	short tfreq(block);
	int mxpwr(block);
	int lvmax(block);
	int fitacf_revision_major(block);
	int fitacf_revision_minor(block);
	char combf(block,combf);
	float noise_sky(block);
	float noise_lag0(block);
	float noise_vel(block);
	short ptab(block,ptab);
	short ltab(block,ltab,ltab002);
	float pwr0(block,pwr0);
	short slist(block,slist);
	short nlag(block,nlag);
	char qflg(block,qflg);
	char gflg(block,gflg);
	float p_l(block,p_l);
	float p_l_e(block,p_l_e);
	float p_s(block,p_s);
	float p_s_e(block,p_s_e);
	float v(block,v);
	float v_e(block,v_e);
	float w_l(block,w_l);
	float w_l_e(block,w_l_e);
	float w_s(block,w_s);
	float w_s_e(block,w_s_e);
	float sd_l(block,sd_l);
	float sd_s(block,sd_s);
	float sd_phi(block,sd_phi);
	char x_qflg(block,x_qflg);
	char x_gflg(block,x_gflg);
	float x_p_l(block,x_p_l);
	float x_p_l_e(block,x_p_l_e);
	float x_p_s(block,x_p_s);
	float x_p_s_e(block,x_p_s_e);
	float x_v(block,x_v);
	float x_v_e(block,x_v_e);
	float x_w_l(block,x_w_l);
	float x_w_l_e(block,x_w_l_e);
	float x_w_s(block,x_w_s);
	float x_w_s_e(block,x_w_s_e);
	float phi0(block,phi0);
	float phi0_e(block,phi0_e);
	float elv(block,elv);
	float elv_low(block,elv_low);
	float elv_high(block,elv_high);
	float x_sd_l(block,x_sd_l);
	float x_sd_s(block,x_sd_s);
	float x_sd_phi(block,x_sd_phi);
}

The converter has done the best job it can, but the CDL file needs some work. The major problem is that dmap files can have independent dimensions for each variable, and those dimensions can vary between records. The converter therefore assumes that each variable has its own set of dimensions and names them accordingly. In a fitacf file the dmap dimensions are not independent; only one dimension, used for the range, is required. The other problem is that the array sizes reflect only this particular file: the largest range dimension in the generated CDL is 70, whereas the theoretical maximum number of ranges is 75. For consistency, the dimensions of the pulse table and lag table have also been adjusted, and the lag table dimensions have been renamed to make more sense.

Below is the corrected CDL file:

cat fitacf.cdl.corrected
netcdf dmap {

dimensions:
	block=unlimited;
	combf=256;
	ptab=16;
	ltabX=2;
	ltabY=22;
	range=75;
variables:
	char radar_revision_major(block);
	char radar_revision_minor(block);
	short cp(block);
	short stid(block);
	short time_yr(block);
	short time_mo(block);
	short time_dy(block);
	short time_hr(block);
	short time_mt(block);
	short time_sc(block);
	short time_us(block);
	short txpow(block);
	short nave(block);
	short atten(block);
	short lagfr(block);
	short smsep(block);
	short ercod(block);
	short stat_agc(block);
	short stat_lopwr(block);
	float noise_search(block);
	float noise_mean(block);
	short channel(block);
	short bmnum(block);
	short scan(block);
	short offset(block);
	short rxrise(block);
	short intt_sc(block);
	short intt_us(block);
	short txpl(block);
	short mpinc(block);
	short mppul(block);
	short mplgs(block);
	short nrang(block);
	short frang(block);
	short rsep(block);
	short xcf(block);
	short tfreq(block);
	int mxpwr(block);
	int lvmax(block);
	int fitacf_revision_major(block);
	int fitacf_revision_minor(block);
	char combf(block,combf);
	float noise_sky(block);
	float noise_lag0(block);
	float noise_vel(block);
	short ptab(block,ptab);
	short ltab(block,ltabX,ltabY);
	float pwr0(block,range);
	short slist(block,range);
	short nlag(block,range);
	char qflg(block,range);
	char gflg(block,range);
	float p_l(block,range);
	float p_l_e(block,range);
	float p_s(block,range);
	float p_s_e(block,range);
	float v(block,range);
	float v_e(block,range);
	float w_l(block,range);
	float w_l_e(block,range);
	float w_s(block,range);
	float w_s_e(block,range);
	float sd_l(block,range);
	float sd_s(block,range);
	float sd_phi(block,range);
	char x_qflg(block,range);
	char x_gflg(block,range);
	float x_p_l(block,range);
	float x_p_l_e(block,range);
	float x_p_s(block,range);
	float x_p_s_e(block,range);
	float x_v(block,range);
	float x_v_e(block,range);
	float x_w_l(block,range);
	float x_w_l_e(block,range);
	float x_w_s(block,range);
	float x_w_s_e(block,range);
	float phi0(block,range);
	float phi0_e(block,range);
	float elv(block,range);
	float elv_low(block,range);
	float elv_high(block,range);
	float x_sd_l(block,range);
	float x_sd_s(block,range);
	float x_sd_phi(block,range);
}
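
The edit from the generated CDL to the corrected one was made by hand, but the range-related part of it could also be scripted. The Python sketch below is one possible approach, not part of the toolkit: the merge_range_dimensions helper is hypothetical, and it assumes the tab-indented, one-declaration-per-line layout shown in the listings above. It does not handle the ptab and ltab changes.

```python
import re


def merge_range_dimensions(cdl_text, merged, new_name="range", size=75):
    """Replace the per-variable dimensions named in `merged` with one shared dimension."""
    out = []
    for line in cdl_text.splitlines():
        dim = re.fullmatch(r"\s*(\w+)=(\w+);", line)
        if dim and dim.group(1) in merged:
            continue  # drop the per-variable dimension declaration
        if line.strip() == "variables:":
            # Declare the shared dimension at the end of the dimensions section.
            out.append(f"\t{new_name}={size};")
        var = re.fullmatch(r"(\s*\w+ \w+\(block,)(\w+)\);", line)
        if var and var.group(2) in merged:
            # Point the variable at the shared dimension instead of its own.
            line = f"{var.group(1)}{new_name});"
        out.append(line)
    return "\n".join(out)


cdl = (
    "netcdf dmap {\n\ndimensions:\n\tblock=unlimited;\n\tcombf=256;\n"
    "\tslist=67;\n\tv=67;\nvariables:\n\tshort cp(block);\n"
    "\tchar combf(block,combf);\n\tshort slist(block,slist);\n"
    "\tfloat v(block,v);\n}"
)
result = merge_range_dimensions(cdl, merged={"slist", "v"})
print(result)
```

In practice the set passed as merged would list every variable that is indexed by range, from pwr0 through x_sd_phi in the listing above.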

If we wanted to, we could also define attributes for the variables in the CDL file:

v:units="meters per second";
p_l:units="decibels";
w_l:units="meters per second";
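
In CDL, variable attributes such as these belong in the variables section. The short Python sketch below shows one way they could be appended to an outline file programmatically. The add_attributes helper is hypothetical, and it assumes the CDL has no data section, so the variables section runs up to the final closing brace.

```python
def add_attributes(cdl_text, attributes):
    """Insert attribute lines just before the final closing brace of a CDL file.

    attributes maps (variable, attribute_name) to a string value.
    """
    body, brace, tail = cdl_text.rpartition("}")
    lines = [f'\t{var}:{key}="{value}";' for (var, key), value in attributes.items()]
    return body + "\n".join(lines) + "\n" + brace + tail


outline = (
    "netcdf dmap {\ndimensions:\n\tblock=unlimited;\n"
    "variables:\n\tfloat v(block);\n}\n"
)
with_units = add_attributes(outline, {("v", "units"): "meters per second"})
print(with_units)
```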

The corresponding cdfmap file looks like this:

cat fitacf.cdfmap
	char 0 "radar.revision.major"=radar_revision_major;
	char 0 "radar.revision.minor"=radar_revision_minor;
	short 0 "cp"=cp;
	short 0 "stid"=stid;
	short 0 "time.yr"=time_yr;
	short 0 "time.mo"=time_mo;
	short 0 "time.dy"=time_dy;
	short 0 "time.hr"=time_hr;
	short 0 "time.mt"=time_mt;
	short 0 "time.sc"=time_sc;
	short 0 "time.us"=time_us;
	short 0 "txpow"=txpow;
	short 0 "nave"=nave;
	short 0 "atten"=atten;
	short 0 "lagfr"=lagfr;
	short 0 "smsep"=smsep;
	short 0 "ercod"=ercod;
	short 0 "stat.agc"=stat_agc;
	short 0 "stat.lopwr"=stat_lopwr;
	float 0 "noise.search"=noise_search;
	float 0 "noise.mean"=noise_mean;
	short 0 "channel"=channel;
	short 0 "bmnum"=bmnum;
	short 0 "scan"=scan;
	short 0 "offset"=offset;
	short 0 "rxrise"=rxrise;
	short 0 "intt.sc"=intt_sc;
	short 0 "intt.us"=intt_us;
	short 0 "txpl"=txpl;
	short 0 "mpinc"=mpinc;
	short 0 "mppul"=mppul;
	short 0 "mplgs"=mplgs;
	short 0 "nrang"=nrang;
	short 0 "frang"=frang;
	short 0 "rsep"=rsep;
	short 0 "xcf"=xcf;
	short 0 "tfreq"=tfreq;
	long 0 "mxpwr"=mxpwr;
	long 0 "lvmax"=lvmax;
	long 0 "fitacf.revision.major"=fitacf_revision_major;
	long 0 "fitacf.revision.minor"=fitacf_revision_minor;
	string 0 "combf"=combf;
	float 0 "noise.sky"=noise_sky;
	float 0 "noise.lag0"=noise_lag0;
	float 0 "noise.vel"=noise_vel;
	short 1 "ptab"=ptab;
	short 2 "ltab"=ltab;
	float 1 "pwr0"=pwr0;
	short 1 "slist"=slist;
	short 1 "nlag"=nlag;
	char 1 "qflg"=qflg;
	char 1 "gflg"=gflg;
	float 1 "p_l"=p_l;
	float 1 "p_l_e"=p_l_e;
	float 1 "p_s"=p_s;
	float 1 "p_s_e"=p_s_e;
	float 1 "v"=v;
	float 1 "v_e"=v_e;
	float 1 "w_l"=w_l;
	float 1 "w_l_e"=w_l_e;
	float 1 "w_s"=w_s;
	float 1 "w_s_e"=w_s_e;
	float 1 "sd_l"=sd_l;
	float 1 "sd_s"=sd_s;
	float 1 "sd_phi"=sd_phi;
	char 1 "x_qflg"=x_qflg;
	char 1 "x_gflg"=x_gflg;
	float 1 "x_p_l"=x_p_l;
	float 1 "x_p_l_e"=x_p_l_e;
	float 1 "x_p_s"=x_p_s;
	float 1 "x_p_s_e"=x_p_s_e;
	float 1 "x_v"=x_v;
	float 1 "x_v_e"=x_v_e;
	float 1 "x_w_l"=x_w_l;
	float 1 "x_w_l_e"=x_w_l_e;
	float 1 "x_w_s"=x_w_s;
	float 1 "x_w_s_e"=x_w_s_e;
	float 1 "phi0"=phi0;
	float 1 "phi0_e"=phi0_e;
	float 1 "elv"=elv;
	float 1 "elv_low"=elv_low;
	float 1 "elv_high"=elv_high;
	float 1 "x_sd_l"=x_sd_l;
	float 1 "x_sd_s"=x_sd_s;
	float 1 "x_sd_phi"=x_sd_phi;

The only thing to note here is that dmap places no restrictions on the characters in a variable name, so the period "." in the dmap variable names has been converted to an underscore "_" in the netCDF names. The same conversion has been applied in the CDL file.
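
To make the layout of a cdfmap line concrete, here is a minimal Python sketch that parses one line and reproduces the period-to-underscore renaming. The helper names are hypothetical. The integer field is read here as the number of array dimensions (0 for scalars, 1 for ptab, 2 for ltab), which is an inference from the listing above rather than something the tools document.

```python
import re

# type, dimension count, quoted dmap name, netCDF name
MAP_LINE = re.compile(r'(\w+)\s+(\d+)\s+"([^"]+)"=(\w+);')


def parse_cdfmap_line(line):
    """Split one cdfmap line into (type, rank, dmap_name, netcdf_name)."""
    match = MAP_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a cdfmap line: {line!r}")
    dtype, rank, dmap_name, netcdf_name = match.groups()
    return dtype, int(rank), dmap_name, netcdf_name


def default_netcdf_name(dmap_name):
    """Mimic the renaming seen in the listing: periods become underscores."""
    return dmap_name.replace(".", "_")


entry = parse_cdfmap_line('\tshort 0 "time.yr"=time_yr;')
print(entry)
print(default_netcdf_name(entry[2]) == entry[3])
```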

ncgen [-o cdffile] cdlfile

The next step is to generate an empty netCDF file using ncgen:

ncgen -o 20021219.kap.netcdf fitacf.cdl.corrected

You can inspect the empty netCDF file using ncdump.

dmaptoncdf inputfile cdlmapfile cdffile

The final step is to populate the netCDF file using dmaptoncdf:

dmaptoncdf 20021219.kap.fitacf fitacf.cdfmap 20021219.kap.netcdf

The CDL and cdfmap files only need to be generated once and can be reused for any other fitacf file, provided that the file format does not change.
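
Reusing the two files for many inputs might look like the Python sketch below, which only builds the ncgen and dmaptoncdf command lines shown earlier and does not run them. The conversion_commands helper is hypothetical, and the output naming simply follows the example above, where the .fitacf suffix is replaced by .netcdf.

```python
from pathlib import Path


def conversion_commands(fitacf_path, cdl_file, map_file):
    """Build the two commands that convert one fitacf file to netCDF."""
    source = Path(fitacf_path)
    target = source.with_suffix(".netcdf")  # naming follows the example above
    return [
        # Step 1: create an empty netCDF file from the (corrected) CDL.
        ["ncgen", "-o", str(target), str(cdl_file)],
        # Step 2: populate it from the dmap file using the cdfmap file.
        ["dmaptoncdf", str(source), str(map_file), str(target)],
    ]


commands = conversion_commands(
    "20021219.kap.fitacf", "fitacf.cdl.corrected", "fitacf.cdfmap"
)
for command in commands:
    print(" ".join(command))
```

Each command list could then be passed to subprocess.run(command, check=True) inside a loop over the input files.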

Hopefully the SuperDARN community will agree on "standard" CDL and cdfmap files so that we can provide standard netCDF versions of our fitacf and rawacf files.

Performance and Disk Usage

The dmap file format was designed to be very efficient in terms of disk usage. However, the current version of the software is not optimized for speed, and reading and writing dmap files is somewhat slow (converting one day of data on a 3.2 GHz P4 Linux machine takes roughly a minute).

The netCDF format was designed for fast access and incorporates many features missing from the dmap format, but it is also much less efficient in terms of disk usage. For the example shown, the day of data occupied 84 MB when stored as a dmap file but required 236 MB when converted to netCDF, roughly 2.8 times as much.