INSTITUTE FOR WATER QUALITY STUDIES


A SELF-MAINTAINING METADATA SYSTEM FOR GEOGRAPHICAL DATA

Michael Silberbauer: SilberbauerM@dwaf.gov.za
Last updated: 2000-11-28


INTRODUCTION

The Institute for Water Quality Studies (IWQS) maintains a set of coverages for its geographic information system (GIS) projects. For any particular coverage, users need to know the copyright restrictions, the date of mapping and the accuracy. Initially, this information was simply stored in a text file called source in the coverage directory. The source file consists of key titles (e.g. .Description, .Scale), each followed by the corresponding information. The first problem with this system was that GIS copy and project functions frequently lose any text files within the coverage directory. To overcome this, the system administrator added an instruction to the server's crontab to run a script that makes backup copies of each source file every night. The other main drawback was that the metadata was not easily cross-referenced: IWQS uses the abbreviated three-letter coverage naming convention recommended by Jonck et al., which is rather obscure.
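The nightly backup described above could look roughly like the sketch below. It is not the SOURCE script itself, which is not listed in this document: the function name backup_sources, the layout of the backup directory and the demonstration paths are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a nightly metadata backup; NOT the actual SOURCE script.
# backup_sources COVER_ROOT BACKUP_ROOT
#   copies COVER_ROOT/<coverage>/source to BACKUP_ROOT/<coverage>.source
backup_sources() {
    cover_root=$1
    backup_root=$2
    mkdir -p "$backup_root" || return 1
    for src in "$cover_root"/*/source; do
        [ -f "$src" ] || continue          # skip coverages that have no source file
        coverage=$(basename "$(dirname "$src")")
        cp -p "$src" "$backup_root/$coverage.source" || return 1
    done
}

# Demonstration on a throwaway directory tree (all paths are made up).
demo=$(mktemp -d)
mkdir -p "$demo/cover/bvg_acks" "$demo/cover/hwq_wqmr"
printf '.Description\nAcocks veld types of South Africa\n' > "$demo/cover/bvg_acks/source"
backup_sources "$demo/cover" "$demo/backup"
ls "$demo/backup"
```

Keeping the copies outside the coverage directories means that a GIS copy or project operation cannot remove them along with the original.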

Our original cataloguing system, catalog.aml, produced a one-page summary of each coverage in a directory, but it was slow and few people bothered to use it. A further disadvantage of catalog.aml was that the user needed to start Arc/Info to run it. We needed a system that would be largely self-maintaining, cross-referenced, self-explanatory and usable from the desktop.

Internet browsers, which have become fairly commonplace on PCs, provided a solution to our problem. Our catalog.aml generates HTML (hypertext markup language) files that Internet browsers can interpret.

PROCEDURES

The metadata system requires that the system administrator set up some processes to run automatically using cron (cron tables are edited with crontab -e).

This is the crontab on our server:


0 4 * * * /prjws8/users/michael/script/SOURCE > /dev/console
45 4 * * * rm /hri/db/cover/s-africa/catalog.log
0 5 * * * /prjws8/users/michael/script/catalog.bat > /hri/db/cover/s-africa/catalog.log
0 6 * * * /prjws8/users/michael/script/catitle.bat > /dev/console       

Notes:


The first five fields on each line are the time at which the instruction should execute, e.g. 0 4 * * * means that the instruction should be executed at 04:00 every day (zero minutes, four hours).
SOURCE runs every day at 04:00 to back up metadata ("source") files.
At 04:45 the system removes the old log file for catalog.
At 05:00 the system executes catalog.bat (see below).
At 06:00 the system executes catitle.bat (see below).
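
As a small illustration of how the five time fields are read, the sketch below splits a crontab entry and labels each field. The helper describe_cron_time is hypothetical and is not part of cron.

```shell
#!/bin/sh
# Split the five time fields of a crontab entry and label them.
# describe_cron_time is a hypothetical helper, not part of cron.
describe_cron_time() {
    set -f                      # stop the shell expanding "*" as a filename pattern
    set -- $1                   # split the entry on whitespace into $1, $2, ...
    set +f
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

describe_cron_time '0 4 * * * /prjws8/users/michael/script/SOURCE > /dev/console'
describe_cron_time '45 4 * * * rm /hri/db/cover/s-africa/catalog.log'
```

An asterisk in a field means "every value", so both entries run on every day of every month.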

Listing of /prjws8/users/michael/script/catalog.bat:
#!/bin/csh
source /export/home/BATCH
echo Starting arc batch job catalog.bat

/opt/arcexe71/bin/arc << eoc
&type Now starting arc

w /hri/db/cover/s-africa
&type [date -vmsfull]
&message &off
&run /prjws8/users/michael/aml/catalog /hri/db/cover/s-africa html a4
quit
eoc
echo Ending arc batch job catalog.bat

Notes:

The apparent comment on the first line (#!/bin/csh) is an essential part of the script: it tells the system to run the script with the C shell.
The source on the second line has nothing to do with our metadata source files, but tells the system to read the .login and .cshrc commands in /export/home/BATCH.
The lines between the eoc markers are Arc/Info commands.
Note that catalog.aml takes command-line parameters.
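
The technique used between the eoc markers is a shell "here-document". The sketch below shows the same mechanism in isolation, assuming a Bourne-type shell; sh stands in for arc so that the example runs without Arc/Info, and the variable names are invented.

```shell
#!/bin/sh
# Here-document sketch: the lines up to the closing marker become the
# standard input of the command on the first line. In catalog.bat that
# command is arc; here sh stands in for it.
workspace=${TMPDIR:-/tmp}
out_file=$(mktemp)

sh > "$out_file" << eoc
cd "$workspace"
echo "Now working in \$(pwd)"
eoc

cat "$out_file"
```

Because the marker is unquoted, the calling shell substitutes $workspace before the text is passed on, whereas the escaped \$(pwd) is left for the inner command to evaluate.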


Listing of /prjws8/users/michael/script/catitle.bat:


\rm /hri/db/cover/s-africa/titles.htm
cd /prjws8
date > /hri/db/cover/s-africa/catitle.date
nawk -f /prjws8/users/michael/data/catitle.awk /hri/db/cover/s-africa/*/source \
  > /hri/db/cover/s-africa/catitle.htm
sort -d -f -o/hri/db/cover/s-africa/catitle.htm \
  /hri/db/cover/s-africa/catitle.htm
cat /prjws8/users/michael/data/catitle.begin \
  /hri/db/cover/s-africa/catitle.date /prjws8/users/michael/data/catitle.table \
  /hri/db/cover/s-africa/catitle.htm /prjws8/users/michael/data/catitle.end > \
  /hri/db/cover/s-africa/titles.htm

Notes:

This script creates an index to the coverage metadata files, in HTML format. Long commands are continued onto the next line with a trailing backslash; each is a single command.
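
The sort-and-concatenate step can be tried in isolation with the sketch below. All file names and contents are invented for the demonstration; only the sort options and the use of cat follow the script above.

```shell
#!/bin/sh
# Sketch of the assembly step: sort the table rows, then wrap them in
# fixed header and footer fragments. File names and contents are invented.
work=$(mktemp -d)
printf '<HTML><BODY><TABLE BORDER>\n' > "$work/begin"
printf '</TABLE></BODY></HTML>\n'     > "$work/end"
printf '%s\n' \
    '<TR><TD>Water Chemistry Management Regions</TD></TR>' \
    '<TR><TD>Acocks veld types of South Africa</TD></TR>' > "$work/rows"

# -d: dictionary order, -f: ignore case, -o: write the result back to the input file
sort -d -f -o "$work/rows" "$work/rows"

cat "$work/begin" "$work/rows" "$work/end" > "$work/titles.htm"
cat "$work/titles.htm"
```

Because every row starts with the same tags, sorting whole lines has the effect of sorting the rows by title.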


Listing of /prjws8/users/michael/data/catitle.awk:


BEGIN {title = 0;}
{
if (substr($1,2,4) == "Titl" || substr($1,2,4) == "Desc")
  {
   title = 1;
  }
else
  {
  if (title == 1)
    {
      title = 0;
      print "<TR><TD>"$0"</TD><TD><A HREF = " \
        substr(FILENAME,1,4) substr(FILENAME,6,132) ".htm>" \
        substr(FILENAME,1,length(FILENAME)-7) "</A></TD></TR>";
    }
  }
}

Note:

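This is an awk script, which uses pattern matching to reformat text files. Here it prints the line that follows each .Title or .Description key as an HTML table row that links to the metadata page for that coverage.


Listing of /hri/db/cover/s-africa/*/source: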
.Scale
.Date
.Description
.Owner
.Owner_address
.Owner_contact
.Owner_country
.Owner_phone
.Owner_fax
.Owner_email
.Disclaimer
.Copyright_message
.Copyright_warning
.History
.Logo

Note:

This is a skeleton file, showing the title lines. Metadata is entered after each title.
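
To see how a filled-in source file becomes a row of the index, the sketch below runs the same state machine as catitle.awk on a throwaway file. The directory, the metadata values and the simplified link target (FILENAME with .htm appended, rather than the substr() arithmetic used on our server) are assumptions for illustration, and plain awk is used in place of nawk.

```shell
#!/bin/sh
# Build a throwaway coverage directory with a filled-in source file
# (the directory name and the metadata values are made up).
root=$(mktemp -d)
mkdir -p "$root/bvg_acks"
cat > "$root/bvg_acks/source" << 'EOF'
.Scale
1:250 000
.Description
Acocks veld types of South Africa
.Owner
Example owner
EOF

# Same logic as catitle.awk: remember that a .Title or .Description key
# was seen, then print the next line as a table row.
row=$(awk '
BEGIN { title = 0 }
{
    if (substr($1, 2, 4) == "Titl" || substr($1, 2, 4) == "Desc") {
        title = 1
    } else {
        if (title == 1) {
            title = 0
            link = FILENAME ".htm"                               # simplified link target
            name = substr(FILENAME, 1, length(FILENAME) - 7)     # strip "/source"
            print "<TR><TD>" $0 "</TD><TD><A HREF = " link ">" name "</A></TD></TR>"
        }
    }
}' "$root"/*/source)
echo "$row"
```

Only the line after .Description produces a row; the lines after .Scale and .Owner are ignored.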


Listing of /prjws8/users/michael/data/catitle.begin:
<HTML>
<TITLE>IWQS GIS Files</TITLE>
<BODY BGCOLOR = BBBBBB>
<H2>List of GIS coverages at IWQS, with links to the
metadata.<HR></H2>
<BR><H3>Please read the copyright details carefully</H3><BR>
<BR>Hint: to see the geographic extent of a coverage in latitude / longitude
coordinates,
look at the file with extension <TT>geo</TT> (the other files are in Albers
Equal Area
 projection, central meridian 24<sup>o</sup>E, standard parallels
 18<sup>o</sup>S and 32<sup>o</sup>S, spheroid
Clarke1880)<BR>
<HR>

Listing of /hri/db/cover/s-africa/catitle.date:
Tue Sep 30 06:00:01 GMT 1997

Listing of /prjws8/users/michael/data/catitle.table:
<HR>
<TABLE BORDER>
<TR><TH>Title</TH><TH>File</TH></TR>

Listing of /hri/db/cover/s-africa/catitle.htm:
<TR><TD>Acocks veld types of South Africa</TD>
<TD><A HREF = /hridb/cover/s-africa/bvg_acks.geo/source.htm>
/hri/db/cover/s-africa/bvg_acks.geo</A></TD></TR>
etc...
<TR><TD>Water Chemistry Management Regions</TD><TD>
<A HREF = /hridb/cover/s-africa/hwq_wqmr/source.htm>
/hri/db/cover/s-africa/hwq_wqmr</A></TD></TR>

PDF listing of /prjws8/users/michael/aml/catalog.aml
and text listing of the AML.