PD/A CRSP Fourteenth Annual Technical Report
Interim Work Plan, Database Management
Doug Ernst and John Bolte
Department of Bioresource Engineering
Oregon State University
(Printed as Submitted)
This document discusses the Central Database status at time of transfer to Oregon State University and the current Database status as of December 1996. Publication of the Database on the Internet and through additional collaborative efforts is also discussed. Data table formats, definitions, and guidelines for submitting new data are discussed under Central Database Organization and Guidelines for Data Submission (1996). For a comprehensive review of the mission, mechanisms, and use of the Database, see the Final Report by the Special Committee on Database Management (Batterson et al., December 1991).
Prior Status of Database
The PD/A CRSP Central Database was maintained by the Management Entity from 1985 until Spring 1993, when it was transferred to Kevin Hopkins at the University of Hawaii at Hilo. The Database was received by John Bolte (DAST PI) and Doug Ernst (Database Manager) at Oregon State University from Kevin Hopkins in May 1996. Over the period 1985-1996, the Database migrated through a series of computer operating systems and database software packages. When received at OSU, the Database contained data from 82 experiments, which were completed under Work Plans 1 through 7 from 1983 to 1995. The data were organized by experiment (site and time) and by data type (e.g., weather, water, fish) into 720 separate files, maintained in dbf format using FoxPro database software.
Considerable reorganization and editing of the Database were required to implement relational data structures and provide efficient mechanisms for data access and publication. Most of the underlying problems had been recognized by the prior Database Manager and were brought to the attention of the new Database Manager when the Database was transferred. Prior problems with the Database included:
In addition, the Database required better linkage to the greater PD/A CRSP literature and information base, through cross referencing and shared data files. Specific areas for improvement included:
Figure 1. PD/A CRSP Central Database - Components and Relationships
Current Status of Central Database
The Database is now managed under Microsoft Access and consists of one computer file (crsp.mdb, approx. 60 MB) containing multiple data tables (Figure 1; also see Central Database Organization and Guidelines for Data Submission). Data tables are made up of records (rows) and fields (columns) of data cells, analogous to a computerized spreadsheet. These data tables are related through relational rules and primary keys that, together with data field rules, prevent entry of incompletely defined, duplicate, or conflicting data. A list of all experiments currently in the Database is given in Table 1.
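The way relational rules, primary keys, and field rules reject incomplete, duplicate, or conflicting records can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than Microsoft Access, and the table names, column names, and sample values are hypothetical and are not taken from crsp.mdb.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two related tables (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(
        """
        CREATE TABLE experiment (
            site      TEXT    NOT NULL,
            work_plan INTEGER NOT NULL,
            exp_code  TEXT    NOT NULL,
            PRIMARY KEY (site, work_plan, exp_code)        -- no duplicate experiments
        );
        CREATE TABLE pond_record (
            site         TEXT    NOT NULL,
            work_plan    INTEGER NOT NULL,
            exp_code     TEXT    NOT NULL,
            pond         TEXT    NOT NULL,
            fish_stocked INTEGER NOT NULL CHECK (fish_stocked >= 0),  -- field rule
            PRIMARY KEY (site, work_plan, exp_code, pond),
            FOREIGN KEY (site, work_plan, exp_code)        -- relational rule
                REFERENCES experiment (site, work_plan, exp_code)
        );
        """
    )
    return con


def try_insert(con: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Attempt one insert; return False if a constraint rejects it."""
    try:
        with con:  # commits on success, rolls back on error
            con.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, inserting the same experiment twice, a pond record for an undefined experiment, or a pond record with a missing stocking number all fail, while well-defined records are accepted.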
Table 1. List of experiments in the PD/A CRSP Central Database
| Site Name | Site Country | WP | Exp | Date Start | Date End |
While efforts to solve problems with the Database continue, the major problems have been eliminated. Data submission and data entry protocols are now in place to keep past problems from redeveloping. New data submissions for the Database will not be accepted until experimental treatments are defined. Continuing efforts to better define experimental treatments include the addition of site and personnel information and of references to PD/A CRSP and other publications. Experiments missing from past Work Plans and the status of data submission from the current Work Plan remain to be determined. Investigators are encouraged to review the Database experiment list (Table 1) and to ensure that all data through Work Plan Seven (Sept. 1, 1993 to Aug. 31, 1995) have been submitted. Two further tasks remain: determining the completeness of the data that have been submitted, and notifying investigators of conflicting data that have been deleted by the Database Manager.
It is important to realize that the pond-specific fish performance and water quality data contained in the Database are relatively meaningless without knowledge of how the ponds were managed (i.e., experimental treatments). It is also useful to know how ponds were organized within experiment-treatment-replicate hierarchies. Some pond treatment codes were present in the Database, but they were only partially used and defined. The PD/A CRSP Work Plans and Technical/Annual Reports were of limited use for defining treatment specifications in the Database, especially after Work Plan 3, because they describe more work than the Database contains and lack references linking them to Database records.
To generate experiment treatment specifications to the best degree possible, a multistage procedure is being used: 1) compile all existing treatment data (Database and PD/A CRSP literature); 2) compile all fish stocking and pond application data in the Database and summarize as rate values; 3) develop a table of pond-specific treatment specifications; and 4) have investigators review treatment specifications for completeness and accuracy. To develop fish stocking and pond application treatment specifications from the Database, the Database was first searched to find all ponds (sometimes cages) associated with each experiment and then searched to compile all fish stocking and material application data for each pond that was found. These treatment data (Database table ExpTreat) currently show a grand total of 1311 replicates for the 82 experiments in the Database, with an approximate average of four treatments per experiment and four replicates per treatment. Of these 1311 replicates, 226 had no fish stocked and 355 had no applications of feed or fertilizer. Whether these results represent actual pond management or data errors/omissions needs to be determined. It was also found that the pond areas required to convert total fish stocking numbers to fish per unit area were inadequately reported (Database table PondSpecs). Investigators are asked to submit any past pond depth-area data not yet submitted to the Central Database and to bring all pond depth-area data up to date with new submissions.
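Step 2 of the procedure above, and the checks that produced the counts of unstocked and untreated replicates, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Replicate` record and its field names are hypothetical and do not reproduce the ExpTreat or PondSpecs table layouts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replicate:
    """One pond (or cage) replicate; field names are hypothetical."""
    pond: str
    fish_stocked: int               # total number of fish stocked
    pond_area_m2: Optional[float]   # None when pond area was not reported
    applications_kg: float          # total feed plus fertilizer applied


def stocking_density(rep: Replicate) -> Optional[float]:
    """Convert total fish stocked to fish per m2; None if area is unusable."""
    if rep.pond_area_m2 is None or rep.pond_area_m2 <= 0:
        return None
    return rep.fish_stocked / rep.pond_area_m2


def review_flags(rep: Replicate) -> List[str]:
    """List the issues an investigator would be asked to confirm or correct."""
    flags = []
    if rep.fish_stocked == 0:
        flags.append("no fish stocked")
    if rep.applications_kg == 0:
        flags.append("no feed or fertilizer applied")
    if stocking_density(rep) is None:
        flags.append("pond area missing")
    return flags


def percent(part: int, total: int) -> float:
    """Share of the total, in percent, rounded to one decimal place."""
    return round(100.0 * part / total, 1)
```

Applied to the totals reported above, `percent(226, 1311)` and `percent(355, 1311)` give the shares of replicates with no fish stocked (about 17%) and with no feed or fertilizer applied (about 27%), and 1311 / 82 is roughly 16 replicates per experiment, consistent with four treatments of four replicates each.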
The need to add data tables for sociological and economic data is recognized. Appropriate investigators will be contacted to determine what these socio-economic data would consist of and what is currently available. Collaboration will also be sought with the PD/A CRSP Management Entity to coordinate and standardize data management regarding aquaculture research sites and facilities, technical and annual report references, external publication references, and investigator references. The goal is for a Database user to be able to easily determine all associated publications (CRSP and external) of a given data subset at the time the data are queried. Investigator reference data are required to properly cite data subsets and to provide contact information to interested Database users, analogous to the information given in journal articles.
Central Database Publication
A user and investigator interface to the Central Database is now provided at the designated Internet Web site of the Database (http://biosys.bre.orst.edu/crspDB/). This Database interface utilizes Cold Fusion software under a Windows NT operating environment. Tabular data presentation is available and graphical data presentation is under development. A log maintained at the Web site is used to document Database users, including optional contact information, objectives in data use, and comments on data use. The Database Web site is linked to the PD/A CRSP Web site and to other aquaculture-related Web sites. For intensive users of the Database, the entire Database could also be made available on electronic media, e.g., CD or 100 MB Zip Disk, given access to required hardware at the Database and user sites.
Through December 1996, the Central Database served mainly as a data repository, with relatively few requests for data (about 30). This lack of use by the aquaculture community was likely due to a combination of factors, including lack of awareness, difficulties in Database access, and lack of the necessary Database infrastructure to facilitate the search and extraction of specific data. Certainly, the Database problems itemized above would have been a major hindrance to anyone using the Database. If these assumptions are true, then publication of the Database at a Web site with a user-oriented interface should clearly improve utilization of the Database. It may also be necessary, however, to better advertise the Database Web site beyond the PD/A CRSP community. A particularly promising audience that has so far been little reached is the aquaculture education community.
In addition to the Database Web site, the Database will be available at a world-wide environmental data Web site maintained by the Consortium for International Earth Science Information Network (CIESIN). Inclusion of some areas of the Database in FISHBASE, maintained by ICLARM/FAO, is being investigated. Potential collaboration with the aquaculture database maintained by the Network of Aquaculture Centers in Asia Pacific (NACA) will also be investigated. Other related databases, such as the Tropsoils CRSP database and the SADC Small Water Body database, will be contacted for the purpose of establishing a network of aquatic-related databases and globally standardized data reporting.
The data search strategy supported by the Database Web site interface is based on a site, time, and production-methods approach for defining and extracting data subsets. In this context, the "experimental treatment protocols" of investigators are analogous to the "fish production methods" of fish culturists. The components of this methods-based database query include: 1) experimental site; 2) inclusive dates; 3) fish/shrimp species; 4) fish/shrimp stocking density (fish/m²) and existence of polyculture; 5) initial fish/shrimp size (g); and 6) pond application fertilizers and feeds, including application frequency (days) and rates (kg/ha/wk).
These management components may be specified and applied in any order, so that a user can define and extract a given subset of data either with a single set of specifications or through a series of iterative refinements. Investigators may go directly to a specific study by simply specifying site name and inclusive dates.
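The order-independent, iteratively refinable query described above can be sketched with composable filters. This is only an illustration: the record layout, field names, and sample values are hypothetical, and the sketch does not reproduce the Cold Fusion interface or the actual Database tables.

```python
from datetime import date
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Filter = Callable[[Record], bool]


def by_site(site: str) -> Filter:
    """Keep records from one experimental site."""
    return lambda r: r["site"] == site


def by_dates(start: date, end: date) -> Filter:
    """Keep records whose experiment ran entirely within the inclusive window."""
    return lambda r: start <= r["date_start"] and r["date_end"] <= end


def by_species(species: str) -> Filter:
    """Keep records for one fish or shrimp species."""
    return lambda r: r["species"] == species


def by_density(low: float, high: float) -> Filter:
    """Keep records with stocking density (fish per m2) in [low, high]."""
    return lambda r: low <= r["density_per_m2"] <= high


def refine(records: List[Record], *filters: Filter) -> List[Record]:
    """Apply any number of filters; a record must pass all of them."""
    return [r for r in records if all(f(r) for f in filters)]


# Hypothetical records for illustration only, not actual Database contents.
RECORDS: List[Record] = [
    {"site": "Site A", "date_start": date(1990, 1, 1), "date_end": date(1990, 6, 30),
     "species": "species X", "density_per_m2": 1.0},
    {"site": "Site A", "date_start": date(1991, 1, 1), "date_end": date(1991, 6, 30),
     "species": "species X", "density_per_m2": 3.0},
    {"site": "Site B", "date_start": date(1990, 1, 1), "date_end": date(1990, 6, 30),
     "species": "species Y", "density_per_m2": 1.0},
]
```

Because each filter only narrows the current subset, `refine(RECORDS, by_site("Site A"), by_density(0.5, 2.0))` returns the same records as applying the two filters one at a time in either order.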
This approach to data queries provides the Database user with the ability to extract and compare any combination of fish production treatments. This ability is a unique characteristic of the PD/A CRSP research effort, with great potential for extending the usefulness of these data beyond report and journal publication. Further work is required to provide statistical tools for the user, including statistical summaries (e.g., minimum-maximum, mean, and standard deviation) and treatment comparisons (e.g., analysis of variance).
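The statistical tools named above could take a form like the following minimal sketch, which uses only the Python standard library. The function names are hypothetical, and the treatment comparison shown is a plain one-way analysis of variance F statistic; the report does not specify which designs or tests the planned tools would support.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Minimum, maximum, mean, and sample standard deviation of one data subset."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation; needs at least 2 values
    }


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic comparing treatment means (one group of replicates per treatment)."""
    k = len(groups)                       # number of treatments
    n = sum(len(g) for g in groups)       # total number of replicates
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of treatment means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own treatment mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F value indicates that differences among treatment means are large relative to the variation among replicates within treatments; converting F to a significance level would additionally require the F distribution, which is not included in this sketch.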
Principal investigator reference information is considered critical to properly acknowledge data subsets extracted from the Database. Data citations will appear automatically as users extract specific data sets, with the following format:
Pond Dynamics/Aquaculture Collaborative Research Support Program Central Database, (Diskette or Internet). (Inclusive dates of data used). Bioresource Engineering Dept., Oregon State University, Corvallis, OR USA. Available: Research country and site, principal investigator(s), inclusive dates of data used.
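As an illustration of how such a citation might be assembled automatically, the sketch below fills the format above from the properties of an extracted data subset. The function and parameter names are hypothetical, and the way the country, site, and investigator names are joined is an assumption, since the format does not specify separators.

```python
from typing import Sequence


def format_citation(
    medium: str,
    dates: str,
    country: str,
    site: str,
    investigators: Sequence[str],
) -> str:
    """Build a data citation following the format given in the text.

    medium: "Diskette" or "Internet"; dates: inclusive dates of data used.
    """
    if medium not in ("Diskette", "Internet"):
        raise ValueError("medium must be 'Diskette' or 'Internet'")
    if not investigators:
        raise ValueError("at least one principal investigator is required")
    # Assumption: multiple investigators are separated by semicolons.
    names = "; ".join(investigators)
    return (
        "Pond Dynamics/Aquaculture Collaborative Research Support Program "
        f"Central Database, ({medium}). ({dates}). "
        "Bioresource Engineering Dept., Oregon State University, "
        "Corvallis, OR USA. "
        f"Available: {country}, {site}, {names}, {dates}."
    )
```

For example, `format_citation("Internet", "1990-1991", "Country X", "Site A", ["A. Investigator"])` produces one citation string whose placeholder values here are invented for illustration.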
It is hoped that publication of data in the Database will be held in the same regard as their publication in reports and journals, and that equal incentives for investigators will apply. Publication of data in the Database should be viewed as an opportunity for additional research outreach, impact, and recognition.