How does Madrigal organize data?

Understanding the organization of Madrigal data will help you understand the logic behind the way data is accessed through the web. In this section of the tutorial, a brief overview of the organization of Madrigal data is given, all the way from the highest level (a Madrigal site holding data from one group's instrument(s)) to the lowest level (a single measured value).

Figure: A high-level view of the Madrigal data model.

Madrigal Site

The highest level of Madrigal is a Madrigal site. A Madrigal site is a single web site, controlled by one group, that holds all of that group's data. At the moment, there are Madrigal sites at Millstone Hill, USA; EISCAT, Sweden; Arecibo, Puerto Rico; SRI International, USA; Cornell University, USA; Jicamarca, Peru; the Institute of Solar-Terrestrial Physics, Russia; and the Wuhan Ionospheric Observatory, Chinese Academy of Sciences. While each Madrigal site stores its own data locally, it also shares metadata with all the other sites. This makes it possible for you to search for data at all the Madrigal sites at once, no matter which site you visit, and simply follow links to the Madrigal site that has the data you are interested in.

Instrument

The next layer of the Madrigal data model is the instrument. All data in Madrigal is associated with one and only one instrument. Any given Madrigal site will hold data from one or more instruments. Since Madrigal focuses on ground-based instruments, most instruments have a particular location associated with them. However, some Madrigal data is based on measurements from multiple instruments, and so has no particular location. Some examples are "EISCAT Scientific Association IS Radars", which combines data from the multiple EISCAT radars, and "World-wide GPS Receiver Network", which consists of over a thousand individual GPS receivers distributed around the globe.

Experiment

All the data from a given instrument is organized into experiments. An experiment consists of data from a single instrument covering a limited period of time, and, as a rule, is meant to address a particular scientific goal. Madrigal makes the assumption that instruments may be run in different modes, and so the data generated may vary from one experiment to another. By organizing one instrument's data into experiments, the purpose and limitations of each experiment can be made clearer. In Madrigal, you can navigate to a page that contains a particular experiment for a given instrument. This page may contain notes more fully describing the unique features of this experiment, or may contain plots of the data customized to the type of experiment. This level of organization is not presently maintained in the Cedar database.

For simpler instruments that run constantly in the same mode, it is also possible that all data is put in one experiment. For example, the "DST index" instrument consists of a single experiment that is repeatedly updated.

Experiment Files

The data from a given experiment is stored in one or more experiment files. There are two reasons there may be more than one file for a given experiment. The first is that the experimental data may be analyzed in more than one way, leading to files with different sets of measured parameters. The second is that older, historical files can be kept on-line for reference purposes. By default, you will only access the most recent, default file through the web, unless you choose "Show History Files" when navigating Madrigal.

The format of these files is the Cedar database format, but this is not important, since the data you download from the web is in ASCII format. Note that once you choose a particular file, you will be directed to the Madrigal site that has that file. Madrigal does not share experiment files between sites, only higher-level metadata about those files.
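The levels described so far (site, instrument, experiment, experiment file) can be pictured as nested containers. The following is a minimal sketch only: every class, field, and file name in it is hypothetical and chosen for illustration, and it does not reflect Madrigal's actual metadata schema or API. It also shows how history files might be hidden unless explicitly requested.

```python
from dataclasses import dataclass, field
from typing import List

# All names below are hypothetical; they mirror the levels described in
# the text, not Madrigal's real metadata tables.

@dataclass
class ExperimentFile:
    name: str
    is_default: bool = True  # False marks an older "history" file

@dataclass
class Experiment:
    title: str
    files: List[ExperimentFile] = field(default_factory=list)

    def visible_files(self, show_history: bool = False) -> List[ExperimentFile]:
        """Return default files only, unless history files are requested."""
        return [f for f in self.files if show_history or f.is_default]

@dataclass
class Instrument:
    name: str
    experiments: List[Experiment] = field(default_factory=list)

@dataclass
class Site:
    name: str
    instruments: List[Instrument] = field(default_factory=list)

# One experiment with a current file and an older history file.
exp = Experiment("Example experiment", [
    ExperimentFile("analysis_v2.dat", is_default=True),
    ExperimentFile("analysis_v1.dat", is_default=False),
])
site = Site("Example site", [Instrument("Example radar", [exp])])

default_names = [f.name for f in exp.visible_files()]
all_names = [f.name for f in exp.visible_files(show_history=True)]
```

Here `default_names` holds only the default file, while `all_names` also includes the history file, which loosely corresponds to choosing "Show History Files".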

File parameters

Any given file is made up of a series of records holding measured parameters. Note that, based on which parameters are in the file, Madrigal will automatically derive a large number of other parameters, such as Kp and magnetic field strength, that aren't in the file itself. In the web browser, measured parameters are shown in bold and derived parameters in normal font.

File data

The bottom level of the Madrigal data model is of course the data itself. A Madrigal file is made up of a series of records, each with a start and stop time, representing the integration period of measurement (Madrigal tries to enforce the idea that all measurements take a finite time, but sometimes the start time = the stop time). To get data from a file, simply specify the parameters you want (and optionally, any filters to apply to the data). More details are given later in this tutorial.
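As a rough illustration of "specify the parameters you want, and optionally filters", the sketch below treats a file as a list of records and applies a selection to it. It is not the Madrigal web interface or any Madrigal API: the `select` function, the dictionary layout, and the parameter names are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical records: each has a start and stop time plus some parameters.
records = [
    {"start": datetime(2000, 1, 1, 0, 0), "stop": datetime(2000, 1, 1, 0, 1),
     "azimuth": 90.0, "elevation": 45.0},
    {"start": datetime(2000, 1, 1, 0, 1), "stop": datetime(2000, 1, 1, 0, 2),
     "azimuth": 120.0, "elevation": 45.0},
    # A record whose start time equals its stop time is also allowed.
    {"start": datetime(2000, 1, 1, 0, 2), "stop": datetime(2000, 1, 1, 0, 2),
     "azimuth": 150.0, "elevation": 30.0},
]

def select(records, parameters, filters=()):
    """Keep records that pass every filter; return only the named parameters."""
    rows = []
    for rec in records:
        if all(keep(rec) for keep in filters):
            rows.append({name: rec[name] for name in parameters})
    return rows

# Request azimuth only, for records with elevation of at least 40 degrees.
rows = select(records, ["azimuth"], filters=[lambda r: r["elevation"] >= 40.0])
```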

Each Madrigal record has two parts: scalar parameters and vector parameters. For historical reasons these two parts are sometimes called one-dimensional and two-dimensional parameters. Scalar parameters are easy to explain: each scalar parameter has one measurement per record. An example might be the azimuth of a radar making a measurement. Vector parameters have multiple values in a given record. The Cedar file format specifies that all vector parameters must have the same number of measurements. One or more of the vector parameters represent the independent spatial variable(s). For radars this variable is typically range, but latitude, longitude, and altitude could just as easily be used as the three independent spatial variables. The dependent vector variables must all have the same length as the independent variable(s). The independent parameter should never represent time, since the Cedar format specifies that one record should cover one period of time.

For example, a radar might store azimuth and elevation as scalar parameters, and range as the independent vector variable. If the electron density and ion temperature are dependent vector variables, and there are ten range measurements, then there must be ten measurements of electron density, and ten measurements of ion temperature. If at certain ranges it is impossible to determine the ion temperature, the Cedar format defines a special value to represent missing data to fill the gap.
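The radar example above can be sketched as a small data structure. This is an illustration only: the `Record` class and the parameter names are hypothetical (they are not Cedar parameter codes), and NaN is used as a stand-in for the special missing-data value, whose actual definition in the Cedar format is not given here.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

MISSING = float("nan")  # stand-in only; not the real Cedar missing-data value

@dataclass
class Record:
    start: datetime
    stop: datetime
    scalars: Dict[str, float]        # one value per record
    vectors: Dict[str, List[float]]  # one value per step of the independent variable

    def __post_init__(self):
        if self.stop < self.start:
            raise ValueError("stop time precedes start time")
        # All vector parameters must have the same number of measurements.
        lengths = {len(values) for values in self.vectors.values()}
        if len(lengths) > 1:
            raise ValueError("all vector parameters must have the same length")

rec = Record(
    start=datetime(2000, 1, 1, 0, 0, 0),
    stop=datetime(2000, 1, 1, 0, 1, 0),
    scalars={"azimuth": 90.0, "elevation": 45.0},
    vectors={
        "range": [100.0 + 50.0 * i for i in range(10)],  # independent variable
        "electron_density": [1.0e11] * 10,
        # Ion temperature could not be determined at the last two ranges.
        "ion_temperature": [900.0] * 8 + [MISSING] * 2,
    },
)

valid_ti = [t for t in rec.vectors["ion_temperature"] if not math.isnan(t)]
```

Filling the gaps with a missing-data marker, rather than dropping the two values, is what keeps every vector parameter at ten entries.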

The Cedar file format defines the physical meaning of almost every parameter to be found in a Cedar file. The only exceptions are parameters defined by individual groups. Any parameter found in a Cedar file that is not defined in the Cedar file format should be fully defined in the header record of the file. See the experiment page for a description of how to view a Cedar file's header record.

Each Cedar parameter can also have an associated error value. This error value can have the special values "missing", "assumed", or "known bad". If an error value is "assumed", the implication is that the value itself is assumed rather than measured. If the error value is "known bad", the measured data is known to have a problem.
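One way to picture the special error values is as flags that take the place of a numeric error. The sketch below is an assumption-laden illustration: the `ErrorFlag` and `Measurement` names are hypothetical, and how the Cedar format actually encodes these special values is not shown here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union

class ErrorFlag(Enum):
    MISSING = "missing"
    ASSUMED = "assumed"
    KNOWN_BAD = "known bad"

@dataclass
class Measurement:
    value: float
    error: Union[float, ErrorFlag]  # a numeric error, or one of the special flags

def describe(m: Measurement) -> str:
    """Summarize how far a value can be trusted, based on its error field."""
    if m.error is ErrorFlag.ASSUMED:
        return "value is assumed, not measured"
    if m.error is ErrorFlag.KNOWN_BAD:
        return "value is known to have a problem"
    if m.error is ErrorFlag.MISSING:
        return "error value is missing"
    return f"measured, error {m.error}"

samples = [
    Measurement(900.0, 25.0),
    Measurement(1000.0, ErrorFlag.ASSUMED),
    Measurement(750.0, ErrorFlag.KNOWN_BAD),
]
summaries = [describe(m) for m in samples]
```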