REVIEW OF ANNEX 1
OF THE
GREAT LAKES WATER QUALITY AGREEMENT
A workshop sponsored by the
Parties Implementation Work Group
of the Science Advisory Board
of the International Joint Commission
in collaboration with the
Great Lakes Commission
held at The Michigan League
Ann Arbor, Michigan
March 21, 2001
Introduction - How We Got Here and What We Will Do Today. Jay Unwin
Review of Annex 1 and its History. Joel L. Fisher, International Joint Commission
Presentation of Background Report. Joe DePinto and Wendy Larson, Limno-Tech, Inc.
The Science of Standard Setting. Jim Whitaker, EA Engineering, Science and Technology
The Science of Compliance Assessment. Abdel El-Shaarawi, National Water Research Institute
Options Panel.
Isobel Heathcote
Doug Spry, Environment Canada
Paul Horvatin, U.S. Environmental Protection Agency
George Kuper, Council of Great Lakes Industries
Neil Kagan, National Wildlife Federation
Plenary Discussion. Isobel Heathcote
Note: This transcript, prepared by Karen Ure and Marty Bratzel, has been lightly edited to ensure that the substance is clear and correct.
Introduction - How We Got Here and What We Will Do Today
Jay Unwin
Unwin: My name is Jay Unwin. I am the U.S. co-chair of the IJC Science Advisory Board Parties Implementation Work Group. I would like to welcome you on behalf of the Work Group and the SAB, and also on behalf of the Great Lakes Commission who helped us arrange this workshop.
What I want to do is get us started off by explaining a little bit of how we came to the point of holding this workshop and what we hope to accomplish today. In the fall of 1999 we started planning our work for the next biennium, the work cycle we are in now, and tried to think what we could do as a Work Group, given the fairly limited resources available to us. I had just been to a conference in Muskegon, Michigan where I had heard some of the first results from EPA's Lake Michigan Mass Balance Project and I found that very exciting.
Shortly after that meeting, I went to one of our planning meetings for the Work Group and said, why don't we see if we can figure out something we can do with these newly available data from Lake Michigan. We decided that we would try to obtain the data and look at how they compared with the Specific Objectives in the Great Lakes Water Quality Agreement, Annex 1. We did that. It took some time to do it but, by mid-summer, we realized that what we were looking at were data from Lake Michigan that were - for water concentrations anyway - literally orders of magnitude below the Specific Objectives for the compounds we looked at. We met with folks at EPA and learned a little bit more about the data, had some discussions back and forth, and came to the conclusion that what we really needed to look at was not the data but the objectives themselves. One person at the meeting in Chicago made a statement that I remembered: the mercury number for Lake Michigan is orders of magnitude below the Specific Objective, yet we still have problems with mercury, including fish consumption advisories. So that led to the trail we are on now.
Shortly after that meeting, we prepared a white paper which we presented to the Commissioners at their semi-annual meeting in Ottawa, and proposed to carry out a review of the Specific Objectives in Annex 1. Part of that proposal was to hold a workshop to hear from stakeholders about what should be done about Annex 1, if anything. So we're glad that you stakeholders are here and we hope you will share your thoughts with us, especially this afternoon.
Another part of the proposal was to have a background report prepared by a contractor. We did do that. Those of you who registered in advance received notice that a draft of that report was available on line. [Note: The final version of the report may be accessed at http://www.ijc.org ] That report will be summarized and presented later today. It compares the data, not just the few data we looked at but data from a number of different sources, with the Specific Objectives. It also looks at regulations compared to the Objectives, and some other matters that we'll discuss later.
This morning, we have a series of context presentations. This morning is more like a seminar than a workshop, that is, we will learn what's in Annex 1, where it came from. We will hear a presentation of the background report I just mentioned. We have a couple of scientific presentations - after all, we are the Science Advisory Board - and we thought it would be appropriate to look at how, these days, water quality standards are set or should be set, given the science that's available. And we're also going to have a presentation on compliance assessment, because there is language in the Agreement that talks about how to judge achievement of the Specific Objectives. After lunch, we have a panel of folks who are going to share their perspectives on what they think should be done with Annex 1. We're hoping that will spur a good discussion this afternoon which we're going to capture and make part of our report to the Commission.
The heart of this workshop - the real workshop - is this afternoon where we want people to open up and share with us their opinions of how Annex 1 should be revised, if it should be revised - just what should be done with Annex 1.
After that panel, we have a good amount of time set aside for general discussion. I hope everyone will participate in that.
We put together some questions that are going to be the focus of that discussion. They are:
Please think about them and share with us your answers this afternoon.
I would like to proceed to our first speaker, who will give us some background on the history of Annex 1. Joel Fisher is the senior scientist of the U.S. Section of the IJC and an environmental advisor to the U.S. Commissioners. He has over 35 years of professional experience in the environmental field, having begun his work during military service where he was assigned to research projects on destruction of chemical and other munitions with minimal environmental impact. He worked on early biotic surveys for the Alaska pipeline before it was built, and he was the charter secretary of EPA's Science Advisory Board when it was established in 1974. He holds degrees from Cooper Union, Vanderbilt University, and University of Pennsylvania and, most importantly for this workshop, he holds a great amount - maybe the most of anyone - of organizational memory in the IJC. So we would like to welcome Dr. Fisher to the podium to share with us his knowledge of Annex 1.
Review of Annex 1 and its History
Joel L. Fisher, International Joint Commission
Fisher: The conveners of this workshop have asked me to give a history of Annex 1. The original Annex 1 was in the 1972 Agreement. Everything proceeds as either an evolution or change from that. That is probably the one thing that has been lost in history, the original Annex 1 and what it looked like and, really, a lot of what we have now is nothing more than the same old thing.
There are actually three such agreements -- the 1972 Agreement, the 1978 Agreement, and the 1987 Protocol to the 1978 Agreement. All of these Agreements reflected the convergence of two separate themes: a political one related to the 1972 Water Quality Amendments of the Clean Water Act (United States Public Law 92-500), and a scientific one related to a constantly changing and evolving state of aquatic toxicology at the time any version of the Agreement came into force.
The 1972 Clean Water Act Amendments called for the promulgation of water quality criteria for the navigable waters of the United States, and the achievement of said quality by July 1983. The 1972 Agreement came into force as an international agreement shortly after the passage of this act. Although the 1972 Agreement contained language that referred to the Boundary Waters Treaty of 1909, President Nixon chose not to submit the 1972 Agreement to the U.S. Senate for ratification as a treaty. Without treaty ratification, the agreement became an international agreement with no force in domestic U.S. law.
However, under Canadian domestic law, where the treaty powers are different, there was a force to the Agreement. So we immediately had a disjunction of how the two countries viewed the Agreement from the start. Later, the 1978 Agreement and the 1987 Protocol followed the same pattern -- the international agreement came into force without any status in domestic law of the U.S.
Under the circumstances, how does a party to an international agreement implement that agreement, if it does not have any force? In the U.S., the Office of Management and Budget, under the president's direction, has the authority to review the terms of an international agreement to make sure that nothing in the agreement would commit the United States to any financial or other requirements that are not in keeping with the existing authorities, appropriations, or legislative priorities in the current laws. The overall effect is that the United States did not have to spend any money on the Great Lakes Agreement unless those programs matched existing requirements under U.S. law.
More relevant to this workshop is, how does one deal with Annex 1 of the Agreement? The answer lies in Annex 1 itself. Each Specific Objective that coincides with, or is more lenient than, an existing domestic U.S. objective, criterion, regulation, program requirement, or whatever, theoretically becomes part of the U.S. commitment to the Agreement. If a Specific Objective differs from the associated domestic requirement in the U.S. in a manner which is more stringent, or sometimes more costly, then the United States has no legal obligation to meet the differing objective. Even the choice of language makes any formal commitment tenuous because objectives are criteria or voluntary guidelines, not enforceable under law, that represent desirable goals to be achieved at some future time. In U.S. law, only standards have legal status, as they are converted readily into regulations.
In 1990, the passage of the Great Lakes Protection Act gave the 1987 Protocol version of the Agreement some status under U.S. domestic law. In a very rare appearance before Congress, both co-chairs of the IJC testified before Senator Levin's committee in support of that law, in a presentation carefully crafted so that it could not be quoted as, or taken to establish, any precedent with regard to Congressional testimony. The law ultimately passed.
So much for the political and legislative background of Annex 1, and now for its scientific concerns. Annex 1 can be considered as a synoptic example of continuous catch-up. It may have been a marvel of diplomacy to have obtained any environmental agreement that looks systematically at water quality and its various physical, chemical, and biological parameters both as needed objectives to support aquatic life and, later, aquatic ecosystems. But almost as soon as pen was put to paper, and "de foi en quoi" (which means "equally faithful and valid" in both French and English) was sealed on the Agreement, much of the material in Annex 1 was arguably obsolete.
At the time of the 1972 Agreement, there was a certain state of science and practice with regard to water quality criteria, emphasizing microbiological pollutants and some early toxicology for chemical pollutants. Annex 1 was cobbled together according to what was available in federal, state / provincial, and local practices. The slide [to be inserted] shows what Annex 1 looked like.
"Final" means that the knowledge of water quality criteria at the time was sufficiently developed for these parameters that it was possible to agree on objectives of a form -- subject to future findings from research at least five years away -- that you could freeze "final" at least in time for that period.
"Interim" implied that revised or new objectives for these parameters would be forthcoming in the very near future. These included temperature, mercury, and other toxic metals, persistent organic contaminants, settleable and suspended materials, oil, etc.
With respect to "compliance" with specific objectives -- a policy consideration -- and the need for it to be based on a statistical sampling plan, I don't know of a statistician worth his pedigree who understands the concept of "statistical validity" with regard to sampling. Sampling is more art than science. One wishes to validate or verify specific findings or hypotheses, but sampling is performed in a manner appropriate to the problem, with due concern to avoid bias and other known undesirable situations.
The section dealing with mixing zones was an issue. EPA considered them illegal under the Clean Water Act, a variant of "dilution as the solution for pollution." This component of Annex 1 was not even considered by the Parties until 1983, when it became necessary to consider the formulation of a Great Lakes toxics strategy and come to grips once and for all with the mixing zone issue. It has since become a non-issue, as attention was focused elsewhere under the RAP [Remedial Action Plan] and LAMP [Lakewide Management Plan] programs.
In the period of 1972 to 1976, several major references of water quality came on the scene, including the 1974 famous "blue book," which is the National Academy of Sciences Water Quality Criteria, the EPA "red book," which was called Quality Criteria for Water, and the "orange book," which was Eutrophication Causes Correction, etc. from the National Academy of Sciences. The colors refer to the fact that these were all hard-cover books, a rarity for the U.S. Government Printing Office, since everything used to be paperback. So, these really had some status, even within GPO.
The 1978 Agreement objectives basically referred back to the 1976 "red book" which was supposed to update the "blue book." The Specific Objectives for metallic elements, for example, copper, lead, and nickel, were based on aquatic toxicity test data "with an appropriate local species." The state of bioassay technology at the time was the flow-through chronic bioassay, and endpoints of reproduction and morbidity were also included, as well as mortality. Within the next four years, bioassay technology advanced so that we could do life cycle studies of certain species, and even call on certain life stages as bioassay test organisms. We could selectively do work on eggs, adults, neonates, smolts, swim-up stage, whatever. Also, the number of fresh-water species expanded. We went beyond the sheepshead minnow and the bluegill to include brown trout, beyond Daphnia pulex to include Daphnia magna, and we now have Ceriodaphnia and associated species. We started looking at fresh water clams and oysters, the gastropod snails, and several species of algae, including one which became the standard bioassay test to determine if water had eutrophication potential. It even also had toxicity potential. At the macrophyte level, the duckweed became the macrophyte of choice. To this was added Atlantic and Pacific salmon, which had previously been used exclusively for physiological testing, mainly by Wall and Brett out of Nanaimo, British Columbia, who pioneered the temperature standards that later became part of both United States and Canadian thermal pollution requirements.
Another change in Annex 1 between the 1972 and the 1978 Agreement was a limited attempt to tailor an objective to a geographical reality by making the lead requirement different for Lakes Superior, Huron, and the others, starting with 10 ug/L in Lake Superior, moving up to 25 ug/L in Erie and Ontario. This reflected basically the higher water hardness, which is a protective factor against lead toxicity in these other lakes. This type of geographical tinkering was not done for any of the other parameters.
Most of the changes pertained to Specific Objectives for organic compounds. That is somewhat of a nightmare when one realizes that all organic compounds -- all 10 million of the Beilstein compendium -- were theoretically included. The organic compounds listed in 1972 were mainly the hard pesticides. These were chemicals of long degradation time in the environment and, for all practical purposes, were essentially non-degradable. They were also responsible for most of the documented fish kills due to pesticides. They included such things as aldrin, dieldrin, chlordane, DDT and its metabolites, endrin, heptachlor and heptachlor epoxide, lindane, methoxychlor, mirex, and toxaphene.
Although EPA had other hard pesticides on its list, notably kepone, which is a congener of mirex, and phosphyl, which was banned from commerce, the 1978 Agreement reflected only those chemicals in current use or previous use in the Great Lakes region. The major problem with kepone was in the James River in southern Virginia. Phosphyl was only allowed to be exported until it was discovered by a researcher at the University of North Carolina that it caused multiple sclerosis and a syndrome in a whole variety of animal species and people, and then it was taken off the list entirely. But that did not stop the possibility that some of these things would be re-imported into North America because what was banned for local use did not necessarily mean that it was banned from being put on crops or seeds and could come back somehow and get back into the Great Lakes.
Also added to the list of organic compounds in 1978 was a new group called "other compounds." This consisted of phthalic acid esters, PCBs, and "unspecified organics." Concerns about phthalates related to their use as plasticizers and their widespread occurrence in polymer materials used in everything from automobile seat covers to hospital blood transfusion bags. Phthalates are very soluble in organic materials, including body fluids. They have a slow degradation time, although their bioaccumulation potential has never really been documented. More worrisome is that some phthalates are mildly carcinogenic to laboratory animals and, presumably, to humans. The reason to include phthalates on the list was sparked by the Delaney Amendment, a portion of the Food and Drug Act which specified that an animal carcinogen could not be deliberately added to either the packaging or the foodstuffs or any other aspect of a food material, where it could get into a food supply.
PCBs have since come under bans and phaseouts in the United States and Canada, but they still represent major contaminants because of their widespread use in transformers and electrical equipment and their presence in hazardous waste disposal sites.
"Unspecified organic compounds" is a catchall designation, originally designed to consider chemicals due to commerce but which might pose a threat because of their toxicity, lack of biodegradability, or some other factor. The idea behind this category was to provide a future mechanism to set objectives for chemicals which have bioaccumulation potential, despite their biodegradability.
The non-persistent or biodegradable organic compounds became a new category in Annex 1 which included the three soft pesticides -- guthion, diazinon, and parathion. Noticeably absent was malathion, a congener to parathion. Malathion was widely found in some 90% of the herbicide and weed-killing compounds sold in garden supply stores and hardware stores in North America, and was the standard herbicide test used in aquatic bioassays during the 1970s and 1980s. In fact, it was often the control toxicant in certain tests. One can infer or derive an objective for malathion from one on parathion, because the common mode of degradation has known chemical rate constants.
The use of parathion has virtually disappeared. Its use in a particular form is almost exclusively given over to the destruction of marijuana plants by aerial spraying by DEA officers overseas.
The most widely used pesticide now is atrazine, which would come under the "other" category, because it is biodegradable. However, this pesticide has nitrogen in three valence forms within the compound and that confers a combination of chemical and biological properties not anticipated by the Agreement at the time. First, it is almost equally soluble in aqueous and lipid systems. Second, it can cross biological membranes because the secondary and tertiary amine groups of the parent molecules are found in standard biological carrier molecules. Third, because of its modes of solubility, it can interact with many other soluble compounds that would otherwise not react with pesticides.
The objective for the "other" category is the appropriate 96-hour TLM. However, in this case, the appropriate test species turns out to be a terrestrial mammal, the vole, because it turns out that atrazine causes behavioural abnormalities and is synergistic with the nitrates in drinking water. For more information on this, one is referred to the work of Warren Porter at the University of Wisconsin.
Also revised was the Specific Objective for asbestos. The 1978 Agreement retained the 1972 objective but now treated the substance as a "physical parameter." At the time, EPA regulations for asbestos called for a particle count of asbestiform fibres in a water sample. Although the test for asbestos was for a physical property, namely a colligative property of the substance, asbestos is a mineral and really a chemical parameter. The regulation virtually required every water laboratory to have on call an electron microscope to perform the particle count. EPA wound up supporting the purchase of large numbers of electron microscopes at the time, without any understanding that it takes about $50,000 a year to maintain one, not only to pay a special technician but also to power the air conditioning that keeps the room cool when it is used.
The reason for including asbestos in Annex 1 related to the asbestos mineral content of taconite tailings in Lake Superior and the possibility that asbestos particles contaminated the source water used in the bioassay work of EPA's Duluth laboratory. EPA was facing the possibility that every bioassay study that it had performed for water quality criteria would have to be revised because of the possible asbestos contamination of the source water.
Another important change between 1972 and 1978 related to the Specific Objective on phosphorus. The 1978 Agreement not only looked at aquatic concentrations but also introduced mass loadings. The control of phosphorus is related to the control of eutrophication of freshwater systems, a condition stimulated by the levels of phosphate in waters acting as a nuisance nutrient.
If we now go to the 1987 Protocol to revise the Agreement, the Protocol in theory was not a revision. The word "protocol" is kind of a diplomatic trick. However, one of the things the Protocol did was add a Supplement to Annex 1.
The Protocol actually addressed several loose ends in the Agreement. First, it addressed the definition of something that was absent from the system, as something not detectable using the best available technology, including biological indicators. Further, the concept would be revised when detection technology became more sensitive.
It authorized the keeping of lists on substances that were known or believed to cause harm to aquatic, terrestrial, or human life. In the 13 years of the Protocol, I have not seen any lists. They may be out there, but I have not seen them.
At the time of the 1987 Protocol, there was considerable discussion about the temperature objective. The EPA had proposed a thermal standard appropriate for salmonid streams subject to thermal pollution, and the entrainment of organisms in the cooling water systems of large power plants subject to run-of-the-river cooling water capacity. This standard was not appropriate as a general temperature objective for the open waters of the Great Lakes. The relevancy of thermal criteria to streams, tributaries, and the thermal effluents from nuclear plants on Lake Michigan were covered by the domestic United States and Canadian regulations. The temperature objective remained, fortunately, unchanged in the various versions of the Agreement.
Although the 1987 Agreement changed the radioactivity objective to one based on alpha, beta, and gamma radiation exposure through drinking water, this was consistent with the International Commission on Radiological Protection (ICRP) standards at the time. It did not, and the 1987 Protocol still does not, consider specific isotopes in the Great Lakes that may be a problem on their own, in particular, tritium. Between 1978 and 1987, the nuclear power plants in the Great Lakes region had, on the average, one "unscheduled" release of radioactive material into the lake. An unscheduled release is a euphemism for a legally allowed, otherwise illegal discharge of an otherwise dangerous material, because the amount that is held in storage exceeds the storage capacity of the nuclear fuel cycle processes. Several groups have asked for a tritium objective for the Great Lakes, without success, and the tritium concern remains simply because a large number of the reactors on the Great Lakes are heavy-water-based systems.
The 1987 Protocol added a new concept of "ecosystem objectives" -- two of them, one on lake trout and one on Pontoporeia hoyi, an arthropod. The lake trout has years of statistical and field testing behind it. Not so for the one for the arthropod. The work that was originally done -- very fine work -- looked at its abundance as a function of depth in the Great Lakes as a food species for the trout. However, no monitoring attempts were made to match the abundance curves of the original research with what was found out in the field. Beware of organisms that are either so fast or so smart that you cannot sample them. They just get away from you. And that's the problem with Pontoporeia hoyi. People do not see it because they do not use equipment that goes fast enough or nets that are fine enough, and those animals are very smart. They can go up and down the water column at will and, to me, the whole thing simply did not make any sense. Besides which, again, in the 13 years of the Protocol, I have not seen any data on Pontoporeia hoyi in any of the monitoring studies either.
Revisions of Annex 1 have not been made for selenium or silver, although the Commission-sponsored study groups recommended new or revised Specific Objectives for these parameters. The Commission did not recommend these parameters for new Specific Objectives because of other factors. In the case of selenium, it was discovered that in marine fishes, notably tuna and swordfish, selenium acted as a biological protection against mercury and other heavy metals. The Food and Drug Administration at the time was petitioned to change the mercury standard in tuna and swordfish to prevent bankruptcy in the fishing fleets. Biochemical studies with selenium showed that the high selenium content in tuna and swordfish acted as an antagonist to the action of mercury. While the mechanisms of this protection have not been completely determined, it is known that it involves vitamin B-12 and vitamin C. It prompted the FDA to change the standard from mercury to that of methyl mercury, the neurologically active form. The Agreement did not make such a change. The Agreement still talks in terms of plain mercury and, therefore, it becomes a problem if one wishes to look at selenium.
The work on silver was interrupted by the release of the 1987 Protocol which eliminated the Commission-sponsored task forces charged with assembling the information. Further, silver is sufficiently valuable that industrial sources, primarily in the dental industry and photographic industry, found it very profitable to recover silver from waste streams, even mine it from their wastes, and so it has become essentially almost a non-problem, at least as an industrial discharge.
There have been no changes or additions to Annex 1 since the 1987 Protocol, but the state of toxicology and eco-toxicology has moved on. Most agencies now use a risk assessment-based procedure in the regulatory processes or the administrative aspects of determining priorities for regulation. There is no risk assessment built into Annex 1. Life cycle studies and flow-through bioassays, except as the most elementary of testing, have largely been replaced with molecular probes: gene-splicing techniques, cell lines, and a variety of techniques that are very effective but never even touch whole organisms. The fads about microcosms, mesocosms, and macrocosms have come and gone, with some of them remaining, but multi-tiered bioassays and microcosms also are not built in. The only advanced-level study that is built in is effluent bioassays, and those are strictly toxicity-based TLM testing, albeit for a mixture.
In closing, I wish to note that I personally feel that Annex 1, however pioneering it was in 1972, has become somewhat fossilized over time, thanks to the status of the science. Some people may feel that it is worth keeping, perhaps as a statement of principles rather than as a compendium of numerical goals. Monitoring and surveillance seem to put little store in the numbers anyway, and I hope this workshop will come to grips with this uncomfortable reality. Thank you.
Unwin: Will you answer the first question, "Is Annex 1 still relevant and useful?"
Fisher: My feeling in the matter is that I tend to agree that maybe it is a statement of principles, but I don't think it works as a compendium of numbers. There would need to be too many things to bring it up to date on that score, and I don't know whether it's worth trying to make it scientifically up-to-date, only to watch it go through a rapid obsolescence process with the next evolution in techniques that are looking at these numbers. But perhaps a statement of valid long-term principles that could last beyond the numbers themselves might be the better approach.
Unwin: Thank you. As I said in my introduction, we commissioned a background report by a contractor, that contractor being Limno-Tech Inc., here in Ann Arbor. We will now hear a presentation of that report. I want to note though that I think the contractors validated our choice in that we presented them with a fairly daunting task, a rather short schedule, and a rather short budget to accomplish it. They accomplished the task and they accomplished it ahead of time. We saw this three weeks ago and it's an excellent report, a very good background report. If you are interested in this topic, you should get a copy of it [final version is available on the web at app_d.pdf ]. Appendix D to the report stands alone as a wonderful reference piece for the rules and regulations on water quality in the Great Lakes basin.
Speaking to us today will be Wendy Larson and Joe DePinto. Wendy is a senior project scientist at Limno-Tech, having joined them in 1991 after four years as an environmental scientist with the Metropolitan Waste Control Commission in St. Paul, Minnesota. She's responsible for managing a wide variety of projects in the Great Lakes region and nationwide relating to conventional and toxic chemical modeling, sediment assessment and management, waste-load allocation, TMDLs [total maximum daily loads], and permit negotiations, for both public and private clients.
Joe DePinto, who will wrap up the presentation, doesn't need nearly as much introduction. Most people here have at least heard of Joe if they haven't met him. He is a senior scientist at Limno-Tech and has been since June of 2000. Before that, he spent 27 years in the academic realm throughout the Great Lakes basin. During that time, he played an active part in the Great Lakes research community and he's continuing to do so in his role at Limno-Tech. He is currently a member of the IJC Council of Great Lakes Research Managers, he chairs the International Association for Great Lakes Research's Publications Committee, and he is an associate editor for the Journal of Great Lakes Research. So we'll start with Wendy. Thank you.
Presentation of Background Report
Wendy Larson and Joe DePinto, Limno-Tech, Inc.
Larson: We appreciate the opportunity to be part of this review of Annex 1. We've spoken with some of you over the past two months and, if we did, I want to say how much we appreciate your help. We realized early on when we began this project that we were going to need a lot of data and a lot of information in a really short period of time from a lot of busy people. So if you were one of those busy people, or your staff, please know we really appreciated your support and cooperation; it made our job a lot easier. If you don't have a copy of the report, it's on the web.
First, I am going to give you an overview of what Joe and I are going to cover in this half-hour presentation. We're going to start out going through our charge from the Science Advisory Board -- what were the objectives of the review, what questions did they ask us to answer, and then present the approach that we took to accomplish that, and also give you highlights of our findings and refer you to the report for more details. There's an awful lot in there so I'll only be able to give some general overview of the findings. After that, Joe DePinto is going to go through some of the issues that came to light when we were conducting the review, just to stimulate some discussion in this afternoon's session.
The objectives of the review were four-fold. We were asked to assess the current status of the Great Lakes relative to the chemicals listed in Annex 1. The second objective was to assess the relationship of current policy values in the Great Lakes region to the specific objectives. The third was to gather information on the conceptual basis and rationale for those current policy values, and then finally determine if and how each agency assesses compliance with their policy values and this would be for the open waters of the Great Lakes, because that's generally what the Annex 1 objectives are assumed to apply to.
First, a caveat. This is an important one. In the time and resources we had available, we did not conduct an exhaustive compilation of all the data that are out there. We know there are plenty of data sets that we didn't obtain and review. We do, however, feel that the data we obtained were sufficient and adequate to answer the questions that we were asked to answer. It was a screening-level comparison. The data do not represent comprehensive spatial or temporal coverage in each lake. We did assume that they were QA/QC'd [quality assurance / quality control]. We did not interview every agency with a regulatory mandate in the Great Lakes although we feel that we did a pretty good job with the contacts that we did make. Your comments are welcome related to any key omissions that you see in this report. It is a draft. [Note: The comments received were incorporated into the final version of the report, posted on the web site noted above.]
The Annex 1 Specific Objectives are categorized in four categories: chemical, physical, microbiological, and radiological. This review focused primarily on the chemical objectives. The chemical objectives are broken down into persistent toxic substances, non-persistent toxic substances, and a category of other substances which applies to conventional pollutants, like dissolved oxygen. This simplified version of Table 2-20 and the following two slides show that the specific objectives are written in terms of three media. It's a little hard when you look at the actual document to break down which are in which so we created these tables to show which were in which. This was important because it led to what data we needed to collect.
The first slide shows the organic persistent toxic substances, which include pesticides and PCBs. There are specific objectives for water in this category and, for some of them such as DDT and its metabolites, not only is there a specific objective for water but also for whole-body fish tissue. For four of the pesticides, there's a specific objective in edible fish.
The next slide shows the same type of information for inorganic persistent toxic substances and, in this case, this is primarily the metals which are in terms of water concentrations and then for mercury, there's a whole-body fish tissue concentration specified.
The next table shows the non-persistent toxic substances. This would include some pesticides and these unspecified non-persistent toxic substances and complex effluents. There are also other substances not shown on the slide but that are part of this category. These are all written in terms of water concentrations. There are no fish objectives in this category. For a couple of them there are toxicity-based objectives for water.
The first question we were asked was, how do the specific objectives compare to recent data in the Great Lakes. This was a screening-level comparison. We looked at data collected over the past approximately five years. Most of the data we collected, we were able to stick to that time frame. We selected data for the open waters of the Great Lakes only, avoiding as much as possible, the near-shore areas because the Annex 1 objectives are commonly assumed to apply to ambient open waters of the Great Lakes.
We selected data that are representative and sufficient to make these comparisons. We were trying to determine if the data are less than, equal to, or greater than the objectives. We primarily contacted federal agencies that monitor the open waters, but we also got a great deal of information, particularly with respect to the fish data, from Ontario Ministry of the Environment as well as some of the states. When we went to get data, we only collected and reviewed data that related to the way the objective was written. For example, for PCBs, the specific objective is written in terms of whole-body fish tissue. So we went and looked for whole-body fish tissue for PCBs. We didn't collect and review edible fish tissue for PCBs, even though we understand that's important information.
For water, our primary sources of data were the Great Lakes National Program Office [GLNPO] and Environment Canada. These two agencies have routinely monitored the open waters of the Great Lakes for a variety of purposes, including monitoring long-term trends, and they use consistent protocols within their own sampling programs. They're both very robust data sets, so we felt these would be sufficient for the comparisons that we were trying to make. For whole-body fish tissue, we obtained data from GLNPO, the Canada Department of Fisheries and Oceans, and the State of Michigan which had quite a bit of data available for the four lakes that border on Michigan. Then, for edible fish tissue, we requested data from GLNPO, the Ontario Ministry of the Environment sport-fish contaminant monitoring program, and also the State of Michigan.
So what did we do when we had multiple data sets available and we were trying to make some sense out of them and decide what to use? If we had the data, we did present them in this report. You'll see in Appendix B, there are some very detailed tables where we provided all the information that we had, the statistics, and quite a bit of background information, contact information, if you are interested in any particular data sets. There we provided means and ranges, that kind of information.
In the report, for clarity, we also wanted to pick a number that we felt would be representative for screening level comparisons, an average value. We needed to make some decisions about which data sets we wanted to use for that.
For the water data, interestingly enough, we usually found that if data for a particular parameter in a particular lake were not available from GLNPO, they were available from Environment Canada. So, we were able to put together a fairly complete data set, although we did find some Annex 1 parameters where there were no data available from the agencies that we contacted.
For fish, we were given a lot of data for a variety of species and different types of processing, composited by age, composited by length - a wide range of protocols. Since Annex 1 does not specify the species of fish, we selected adult, top-predator species preferentially, looking for the most conservative species that we anticipated would have the highest concentrations. We chose lake trout for Lake Superior, Michigan, Huron, and Ontario and walleye for Lake Erie, because there are not that many trout in Lake Erie to get a robust data set.
When we had edible fish data, we had skin-on fillet data and dorsal plug data. Which did we use? We looked at the data sets that we had and, in those data sets, we found that, generally, the concentrations were higher in the skin-on fillet data and we chose to use those because they were considered to be more conservative for this comparison.
What did we find? For water, with really very few exceptions, the pesticide concentrations in all of the lakes were less than the Specific Objectives. For metals, we found the same thing -- in water, the concentrations were less than the Specific Objectives in all lakes. Not all parameters were monitored in all lakes. One example that comes to mind, when I was listening to Joel's presentation, was phthalate esters. That was one category where we were not able to find any water data from the sources that we contacted.
The next slide shows the results of the data comparisons in whole fish. This category is for the protection of fish-consuming birds in the objectives. For DDT and metabolites, the data from Lake Michigan indicated that the concentrations in whole fish exceeded the Specific Objective of 1 ug/g wet weight and, in the other lakes, the average concentrations were less than the Objective.
For mirex, in all the lakes but Lake Ontario, the Objective was not exceeded, but it was in Lake Ontario. For mirex, the Objective is "shall be substantially absent" and it specifies less than detection, but the detection limit is a moving target. It is different for different data sets, and it does not specify a detection limit, but we did find that, with the detection limits we were provided, this was our result.
For PCBs in whole fish, in all the lakes, the concentrations exceeded the Objective of 0.1 ug/g. For mercury in whole fish, in all lakes, the concentrations were less than the Specific Objective of 0.5 ug/g.
These are the data comparisons for edible fish tissue. There are four pesticides listed for edible fish tissue in Annex 1. For all four, the Objective is 0.3 ug/g. In all cases, we found that the concentrations were less than the Specific Objective.
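[Note: The following illustration is not part of the presentation. It is a minimal sketch of the kind of screening-level comparison described above, in which a lake-average concentration is classed as less than, equal to, or greater than an objective. The whole-fish objectives are the values quoted by the speaker; the lake averages, the function name `screen`, and the 5 percent band treated as "equal" are assumptions made only for illustration.]

```python
# Whole-fish Specific Objectives as quoted in the presentation (ug/g wet weight).
WHOLE_FISH_OBJECTIVES = {
    "DDT and metabolites": 1.0,
    "PCBs": 0.1,
    "mercury": 0.5,
}


def screen(mean_conc, objective, rel_tol=0.05):
    """Classify an average concentration against an objective.

    rel_tol is an assumed band around the objective that is treated
    as "equal to"; the report does not define such a band.
    """
    if abs(mean_conc - objective) <= rel_tol * objective:
        return "equal to"
    return "less than" if mean_conc < objective else "greater than"


# Hypothetical lake averages, for illustration only (not data from the report).
example_means = {"DDT and metabolites": 0.4, "PCBs": 1.2, "mercury": 0.15}

results = {
    name: screen(example_means[name], objective)
    for name, objective in WHOLE_FISH_OBJECTIVES.items()
}

for name, verdict in results.items():
    print(f"{name}: average is {verdict} the objective")
```

A comparison like this says nothing about spatial or temporal coverage; it only sorts each parameter into one of three classes.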
I'm going to move on now to the next question that we were asked, which was, how do the Specific Objectives compare to current policy values? You are probably wondering what I mean by that term. Because criteria, guidelines, objectives, standards are different names used by the various agencies, we needed a term, and we decided to call them all policy values, so that we did not have to say all those words every time we referred to them. So, a policy value is any of those that have been promulgated by an agency in the Great Lakes region.
Why are we interested in policy values? They tell us about the current state of the knowledge related to exposure to, and effects of contaminants in the environment. They reflect an interest in protecting multiple uses of water, sediment, and tissue, and they also reflect improvements in analytical methods since Annex 1 was adopted, such as lower detection limits.
In terms of policy values for water, there are some Specific Objectives for water for which it is unspecified what they are protecting. There are also some Specific Objectives that specifically protect aquatic life - this is stated in the Objective. Other current policy values that are protective of aquatic life include the Canadian Water Quality Guidelines, Ontario Provincial Water Quality Objectives, and U.S. EPA's Great Lakes Water Quality Guidance [GLI] criteria. All of the states have adopted criteria that are at least as stringent as the GLI, so we have put those into the same category. You can find information on that in the report, showing what the criteria are for each state, compared to the GLI.
There are also policy values for the protection of human health that are written in terms of water, under the Great Lakes Water Quality Guidance. Those are based on fish tissue targets. And there are water quality policy values for the protection of wildlife under the GLI, again based on fish tissue targets.
What did we find when we compared these? Well, it was kind of all over the board. It is hard to make general statements, but I pulled some out. There were a lot of inconsistencies between the Specific Objectives and the policy values. The report contains detailed tables showing how the Objectives compared to each of the guidelines, criteria, and standards that we located.
Compared to the policy values for the protection of aquatic life, the Specific Objectives in terms of protecting aquatic life were often the most stringent value. The Great Lakes Water Quality Guidance criteria for the protection of human health and wildlife usually are the lowest values overall, compared to all other values for the protection of all uses. Some policy values for metals are hardness dependent, in addition to lead. Finally, policy values have been promulgated for many substances that are not listed in Annex 1.
In terms of policy values for whole fish, for the protection of wildlife consumers of fish, there are four parameters under Annex 1 that are specifically for the protection of fish-consuming birds, written in terms of whole fish, and these are for DDT, mirex, PCBs, and mercury. The Canadian tissue residue guidelines specify, for DDT, PCBs, and toxaphene, guidelines for the protection of wildlife consumers of aquatic biota. The Ontario fish tissue residue criteria specify, for DDT and mercury, values for the protection of fish-consuming birds and, in the case of mercury, aquatic life. Under the GLI and in the states, there are water criteria, not whole-body fish tissue numbers, for the protection of wildlife, and these have been derived from fish-tissue triggers for DDT, mercury, PCBs, and dioxin.
For edible fish tissue, these policy values are for the protection of human health. There are five pesticides specified under Annex 1 and, since Annex 1, we have the Uniform Sport Fish Consumption Advisory protocol for PCBs; an abundance of state trigger values, action levels, and consumption guidelines for PCBs, pesticides, mercury, and many other contaminants; FDA action levels for PCBs, pesticides, mercury, other contaminants; and, finally, there are GLI criteria for the protection of human health that are based on fish tissue triggers -- those are in terms of water and are for PCBs, pesticides, mercury, and other contaminants. Details can be found in the report.
This would be lacking if I did not mention the sediment quality policy values that are out there, including the Canadian Sediment Quality Guidelines, Ontario Provincial Sediment Quality Guidelines, U.S. EPA's draft Freshwater Sediment Quality Criteria, and New York state's Sediment Criteria. We have included those in the report as well because there are sediment quality criteria for many of the parameters that are listed in Annex 1.
These sediment quality criteria generally are used in Areas of Concern to assess the impacts of sediment contamination. To remind everybody why we are here today, I show this very depressing picture of a carp collected in the Ottawa River this past summer where sediment contamination is quite a bit of a problem. It is amazing that this adult fish lived as long as it did, considering its mouth and body. This highlights the importance of considering sediment, although this is a topic for discussion because the Annex 1 objectives are commonly for open waters.
The last question we were asked related to the procedures for assessing compliance being used by Great Lakes agencies. We made quite a number of phone calls and talked with some of you. We spoke with the states and Ontario and asked the question, "Do you have a program in place to routinely compare open-lake data to your state or province's standards that apply to those waters?" The answer was no, they primarily focus on nearshore areas. Efforts are directed at sediment contamination, like in the Ottawa River, and the effects on aquatic life. There is no systematic program in place to look at the open waters. They generally turn to the federal agencies to collect those data, although these agencies do collect some open-water data.
In terms of federal agencies, Environment Canada told us that they do routinely review their data and flag parameters for which the 90th percentile value is greater than their own guidelines. They also look at the most sensitive policy values in the U.S. and Canada. There is no formal reporting process for this information. It is found in internal documents such as the Lakewide Management Plans.
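[Note: The following illustration is not part of the presentation. It is a minimal sketch of a 90th-percentile flagging check like the one attributed to Environment Canada above. The percentile method, the parameter names, the sample values, and the guideline values are all assumptions; the agency's actual procedure is not described in the transcript.]

```python
from statistics import quantiles


def percentile_90(values):
    """Return the 90th percentile of a list of at least two values.

    quantiles(..., n=10) returns the nine decile cut points; the last
    one is the 90th percentile. The "inclusive" method is an assumption.
    """
    return quantiles(values, n=10, method="inclusive")[-1]


def flag_parameters(samples, guidelines):
    """List the parameters whose 90th percentile exceeds the guideline."""
    return [
        name
        for name, values in samples.items()
        if percentile_90(values) > guidelines[name]
    ]


# Hypothetical monitoring results and guideline values, in arbitrary units.
samples = {
    "parameter A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "parameter B": [0.5] * 11,
}
guidelines = {"parameter A": 8.0, "parameter B": 1.0}

print(flag_parameters(samples, guidelines))
```

Using a high percentile rather than the mean makes the check sensitive to occasional high values, which is one plausible reason for the choice.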
U.S. EPA, at least from the folks we spoke with, does not have a systematic program currently in place to review the data and compare them to criteria.
The last slide has contact information, including e-mail and telephone. Limno-Tech would welcome any questions and comments.
DePinto: As Wendy mentioned earlier, we were basically charged with doing a fact-finding project. We were specifically asked, do not make any value judgements. However, we did run across a number of issues that we felt were necessary to bring to you. Most of them are mentioned at some point in the report but, for the purposes of our discussions and deliberations, it would be useful to summarize those issues or questions relative to the review of Annex 1 and what we are going to do with it.
First of all, there are a number of issues related to the overall process of collecting and managing the data that would be used to compare with Annex 1. I was a member of the IJC's Indicators Implementation Task Force, and a lot of these issues are the same as we ran into when we tried to compile data sets that could be valuable in identifying our status relative to indicators in the Great Lakes. It's a very similar type of situation.
First of all, we ran into a variety of ways to deal with censored data. Many of these compounds that we are looking at have, of course, quite low concentrations, particularly in the water, and one oftentimes ends up with a lot of non-detects. It was difficult, particularly when we were just given an average as opposed to being given all the raw data, to know whether that censored data or less-than-detects were given the value of zero, or the value of the detection limit, or half the detection limit, or whether there was a real systematic statistical approach, something like a maximum likelihood, to use to determine what those non-detects might most likely be, and then take that into account in coming up with an average. In most cases, we tried to follow up with the agency or with the contact, to determine exactly what they did with non-detects and, in many of the cases, they were actually set to zero and that, of course, would affect the average, if there were a lot of them in the data set.
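[Note: The following illustration is not part of the presentation. It is a minimal sketch of how the treatment of non-detects changes an average, using the three simple substitutions mentioned above (zero, half the detection limit, and the detection limit). The sample values and detection limit are hypothetical, and the maximum likelihood approach also mentioned is not shown.]

```python
def substituted_mean(results, detection_limit, fraction):
    """Mean with each non-detect (None) replaced by fraction * detection_limit."""
    filled = [fraction * detection_limit if r is None else r for r in results]
    return sum(filled) / len(filled)


# Hypothetical results in arbitrary units; None marks a non-detect.
results = [None, None, None, 0.4, 0.6]
detection_limit = 0.2

means = {
    label: substituted_mean(results, detection_limit, fraction)
    for label, fraction in [("zero", 0.0), ("half DL", 0.5), ("DL", 1.0)]
}

for label, value in means.items():
    print(f"non-detects set to {label}: mean = {value:.2f}")
```

With three of five results censored, the reported average in this made-up case ranges from 0.20 to 0.32 depending only on the substitution rule, which is why a bare average without the rule is hard to interpret.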
Another issue was, there are two sides to the border, two federal agencies, a number of states and provinces, and all of them have various kinds of monitoring programs and their own protocols in general, because they have their own objectives, and the question oftentimes came up, of whose data do we use, or how do we combine from two different sources that maybe don't use exactly the same protocols.
A big issue with the data, of course, was the spatial and temporal coverage of the data. What is the definition of open water? Where do we draw the line? What about fish that are traditionally open-water fish but maybe were caught in an Area of Concern or in a nearshore area? How does that factor in? Those kinds of things.
Open-water monitoring programs don't cover all seasons. They usually go out in the spring and again in the fall, or something like that, late summer. How does that relate to some sort of annual average, if you will, of the system? When is sampling done and how does that factor in?
Of course, how much spatial coverage there is in the system, the number of stations, is it statistically significant in terms of spatial variability in the system? That may be a lake-specific answer, which adds another level of complexity to the issue.
Finally, with regard to data management, we found variations in the sampling and the analytical protocols, particularly for the persistent organic substances, the hydrophobic organics, where there are differences in the analytical protocols for EPA versus Environment Canada. They have looked at these and have talked about them before, but the fact is that there are differences, and does this make a difference? I'm not sure.
The question of whole fish versus edible fish -- what should it be? Of course, that depends on what one's target is and what one is trying to protect. We ran into the whole issue, as Wendy mentioned, of species of fish, their size and age and, again, there are differences between the sampling programs in terms of what size fish, or whether one wants to categorize the data -- all adults, or all fish greater than a certain age or a certain length, versus actually putting them into age categories and doing an analysis that depends on the length or the age of the fish in terms of the concentrations.
The actual policy values, at least for fish consumption advisories, are different in the two countries, because of the difference in how they put the data together that they collect.
That is a summary of the data management issues and questions. There are also a number of policy-related issues that we need to resolve, if we are going to move forward with any revision or review of Annex 1. There are a number of Objectives that specify both water and fish tissue concentrations. In going through those where there are both, the question comes up, and in some cases we have answered it, there is really an inconsistency between the water and the fish objective. In the GLI, the guidelines are given as water concentrations, but actually they were derived from a bioaccumulation factor for fish, so there is an internal consistency in those numbers, if one believes the BAFs that were used to make the conversion. It is not clear that that was done in all cases with the Annex 1 Objectives. We need to look at that. It is also possible that there are different things you are protecting with a water number versus a fish number, particularly if it is an edible-fish number, in which case one may not need that internal consistency. But, in terms of people who have to set or specify regulations, or set targets or develop monitoring programs, they need to know at least what that number is for.
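[Note: The following illustration is not part of the presentation. It is a minimal sketch of the internal-consistency idea described above, using the general definition of a bioaccumulation factor as the ratio of tissue concentration to water concentration. It is deliberately simplified and is not the GLI derivation procedure. All numbers, the function names, and the factor-of-two tolerance are hypothetical.]

```python
def water_conc_from_tissue(tissue_target_ug_per_kg, baf_l_per_kg):
    """Water concentration (ug/L) implied by a tissue target and a BAF (L/kg)."""
    return tissue_target_ug_per_kg / baf_l_per_kg


def implied_tissue_conc(water_conc_ug_per_l, baf_l_per_kg):
    """Tissue concentration (ug/kg) implied by a water concentration and a BAF."""
    return water_conc_ug_per_l * baf_l_per_kg


def is_consistent(water_objective, tissue_objective, baf, factor=2.0):
    """Check whether a water objective and a fish objective agree via the BAF.

    The pair is called consistent if the tissue level implied by the water
    objective is within an assumed factor of the tissue objective.
    """
    ratio = implied_tissue_conc(water_objective, baf) / tissue_objective
    return 1.0 / factor <= ratio <= factor


# Hypothetical values: tissue target 100 ug/kg, BAF 1,000,000 L/kg.
tissue_target = 100.0
baf = 1.0e6

derived_water = water_conc_from_tissue(tissue_target, baf)
print(f"derived water concentration: {derived_water:.1e} ug/L")
print(is_consistent(derived_water, tissue_target, baf))  # derived from the BAF
print(is_consistent(0.01, tissue_target, baf))           # set independently
```

A water number derived from the tissue target passes the check by construction; a water number set on some other basis may not, which is the kind of mismatch the speaker suggests looking for in Annex 1.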
On the specification of fish "policy values": we used this term to cover a large breadth of different things, but basically it includes criteria, targets, management objectives, and the like, so it is not necessarily a standard which has a legal basis or foundation. There are whole fish versus edible fish. The whole fish generally is used for ecosystem and wildlife protection versus human health protection where one would look at the edible portions of fish. Of course, in the actual targets and the resulting advisories that are established, there are variations among the agencies, as we all know, in the Great Lakes as to how they apply these data.
We also found a number of discrepancies -- not surprising, since Annex 1 was established in 1978, the actual numbers that were used were earlier than 1978 -- between the Annex 1 objectives and the Parties' listing of what they called parameters of concern. There were a number of parameters with "policy values" or some sort of criteria but that were not listed in Annex 1. There were also, in a few cases, parameters that are listed in Annex 1 -- the phthalate esters are an example -- that really are not monitored to any extent in the system. For example, we found, in the Canadian Water Quality Guidelines, 44 additional organic substances or groups of substances that are not listed in Annex 1. We found 5 metals and 5 other organic substances or physical properties. With the GLI, we found 11 additional substances not in Annex 1, and there are a whole slew of Tier II options that some states have adopted.
This in itself speaks to the suggestion by Dr. Fisher that maybe we ought not to try to be quantitative about every chemical that could come along with regard to what we do with Annex 1 but, rather, have some sort of statement of general purpose or perspective on this.
Lastly, there are a couple of additional policy issues that we think ought to be mentioned. First of all, in looking at the conceptual basis for the Annex 1 objectives, this was not specified for all of the parameters. In many cases, the conceptual basis that was specified was not necessarily consistent with the policy values of the states or the agencies. How to deal with the variation among those would really be an issue.
Finally, it is important in our deliberations with regard to the Annex 1 objectives, that we recognize and attempt to have some consistency with several other ongoing programs in the Great Lakes, to name a few, the Lakewide Management Plan process, the indicators that are being developed and applied with the SOLEC (State of the Lakes Ecosystem Conference) process, and the Great Lakes Binational Toxics Strategy. Make sure that we consider those sorts of things when we consider revising or reviewing Annex 1.
Pupp: Regarding the monitoring aspect of the work that you did, did you consider what that would contribute to the question of what to do with Annex 1? Of course, we would like to know the state of the lakes, what are the levels and, of course, we would like to compare these with the goals, but if we are questioning whether the goals set in Annex 1 are valid, how would one use the monitoring data that have been collected to answer the question?
DePinto: I think that all we can do with the existing data is what we have done, that is, a screening-level analysis to determine if we are consistently and significantly below or above an existing objective. That is where the monitoring comes in. If we were going to actually or literally apply Annex 1 or some revision of Annex 1 as a hard set of numbers to shoot for, then I am quite sure that Dr. El-Shaarawi will tell us that we really need to design a monitoring program that will allow that to be done in a "statistically valid" way, if such exists, but at least in some sort of a well thought out and un-biased way to make those comparisons.
Pupp: If we are going to use the monitoring data at this stage, should we figure which are the quantitative numbers to shoot for, and then look and compare the monitoring data?
DePinto: I think we can do that, perhaps revise the monitoring program, based on this screening-level comparison to a certain extent but, on the other hand -- I'm a modeller, and people always ask, what's wrong with the model, what's the model error, but I turn around and say, there is error with the data as well, and that has to be taken into consideration. That sort of principle is valid here. We first have to decide whether these Objectives we set in 1978 are still valid and whether we even want to make the comparison. In some cases, the concentrations, in the water in particular, for some of these compounds, toxaphene, for example, are well above what we would set, if we were setting today. Those things have to be looked at as well.
Larson: There were also contaminants, we were told, that the agencies stopped monitoring for, because the levels were so low that they did not see any point. Endrin was an example?
Fisher: To what extent did you find that those parameters for which the objectives came from the Food and Drug Administration originally, were really dominating the ones that were not being monitored? In my experience, of all the parameters on the Great Lakes lists that received the least monitoring, the ones that originally came from the Food and Drug Administration were among them, almost universally. It sort of made me wonder that maybe that was one of -- In terms of the phthalates, originally tainting was in the Food and Drug list. By the way, I wouldn't expect to find any fresh-water data. All of the data I've seen is for the marine systems, and not the marine systems here. The most recent data I've seen is for the Houston ship channel. But the origin of a lot of the parameters, you compare the ones that show up on the Food and Drug Administration are the ones that are not being monitored.
DePinto: We have not made that comparison specifically, but it would be a good thing to follow up on. It is possible that those numbers are really well above what maybe they should be for the lake protection objectives, and maybe that's why they are not being monitored any more.
Pupp: Joel said that it would be worthwhile to drop the numbers from the Specific Objectives, but we need some numbers. Would we use the GLI values, would we call them environmental quality guidelines? We do not agree ...
Fisher: What I'm afraid of is, if we get too married or too stuck in the numbers, the state of the science is continuously changing, and those targets, even if we chose them well, become obsolete almost as soon as we put pen to paper. When we have chemicals like endocrine disruptors, techniques like molecular probe methods, radio-immunoassay techniques which look at excruciatingly small amounts, literally one can detect one molecule of a given substance in a cell by some of these techniques, what does it mean to put an objective on it? The science itself so boggles the interpretation of an objective in philosophical terms, that I would be very leery. I don't think we should throw them all out. We have to be very careful not to run into the trap of continuously obsoleting oneself. When a colleague told me that she could detect a single molecule of insulin in a cell, that's when I knew that Specific Objectives were off the wall.
Unwin: I want to clarify -- when I said that the Work Group looked at a couple of parameters and that they were orders of magnitude below the Specific Objectives, what we looked at was mercury and trans-nonachlor, both from the Lake Michigan Mass Balance data base. We compared mercury with the mercury objective and, indeed, it is about three orders of magnitude below, at least in Lake Michigan. We compared trans-nonachlor, a component of chlordane, with the chlordane objective and, again, it is about three orders of magnitude below. It turns out that those two are among the largest discrepancies that were identified. That's how we got to this review of Annex 1 and this workshop today.
Our next speaker is Jim Whitaker, who is a senior technical consultant and the manager of the Knoxville, Tennessee Office of EA Engineering, Science and Technology. He is a nationally, at least U.S. nationally, recognized leader in state and federal water-quality-based regulations under the Clean Water Act. He works in the area of permitting - whole effluent toxicity, toxicity reduction evaluations, and water quality assessments. Jim served as a member of the public participation group when the Great Lakes Water Quality Guidance, the GLI, was being formulated. Jim is a good choice for this presentation on water quality standard setting, because he showed during those deliberations the deep knowledge of how water quality standards should be set, and had a real influence on how the GLI was finally put together. He is going to talk to us about the science of standard setting. The Work Group felt that it would be appropriate background for this workshop to touch on this topic as a setting for discussions this afternoon which, as the question we had earlier, may involve talking about, "How should you do that? And what standards should you set if you're going to revise the Specific Objectives?" I will turn it over to you now Jim.
The Science of Standard Setting
Jim Whitaker, EA Engineering, Science and Technology
Whitaker: Thank you for inviting me to speak today, and I should point out that I was told to share my opinions so you're stuck with that. The topic of this workshop is certainly a timely one. In recent years there has been a great deal of debate and sometimes heated disagreement about what benchmarks to use in determining the health of aquatic ecosystems. One thing we all can agree on is that we need such benchmarks. Now, ideally, we have one number for each chemical that we know that, if it was met, would protect aquatic life, human health, and wildlife. And this number would be based on an extensive database and the latest scientific information on the toxicity of that chemical to all important species in the ecosystem, as well as to humans. Unfortunately, there are few or no data for most chemicals for many important species, and even fewer data on effects to humans directly. So recognizing these limitations, we look for a reasonable scientific path to find benchmarks that best approximate our goals. But the first thing we run into is a roadblock that is created by a communication problem. This was already touched on this morning. What was the term that was used? Policy value. You could use that to cover this whole blanket here. And we've already talked a little bit about the confusion introduced by the terminology. The terms water quality standards and water quality criteria are often used almost interchangeably. As Wendy pointed out earlier, water quality standards are enforceable, whereas criteria are simply guidance. Of course our topic today is the Specific Objectives but, you also see terms like guidelines, action levels and you could probably come up with 10 other terms you've seen that fit this category.
Well, do all these terms mean the same thing? You see different numerical values associated with these different terms. Are they interchangeable? Can you compare one to another? Can you really take a single value and use that for each chemical and say, "We meet this and we know we're okay." The reason I'm here is to talk about - what is the best approach based on the current science?
I want to talk first about water quality standards and spend a little time on what a standard is, what it means. A true water quality standard consists of two parts, and it is often forgotten that this is a two-part term. The first is a designated use (we'll look at designated uses in a moment). The second part is the water quality criteria that are necessary in order to protect that use. Often all we think about is the second part.
Designated uses, I could not really come up with a definition. It's best defined I think by example, because it just kind of makes sense to us what these uses are. Most water bodies will have an aquatic life use assigned to them, based on the habitat that is present and what can be expected to live there. That may be a warm water fishery, it may be a cold water fishery, it may be a trout stream. It may have some kind of special protection status associated with it. Sometimes a habitat is limited to a use where only survival of aquatic organisms is expected; the fish will not live there long enough to reproduce. However, in most cases you are looking at a lifetime use of the water by those organisms, so that they can reproduce and thrive. Of course there are also human health uses. Obviously drinking water is an important use, but non-drinking uses as well - being able to safely eat the fish that you catch from those waters. And wildlife uses, not just birds that were discussed earlier but also piscivorous mammals that feed on the fish that live in these waters.
A couple of other uses we don't think about as often - recreational uses. And this is human recreation, be it wading or swimming, or whatever. And typically these are health-based, they are bacteriological - fecal coliform levels, E. coli, that sort of thing. There are also agricultural uses, and a lot of these arose out of the Red Book that Joel talked about this morning for safe levels for irrigation of crops and for watering livestock. And there are even industrial uses of water that require certain quality. The important thing in designated uses is we are not just talking about what currently can be achieved in a given body of water, but what is attainable, what is possible, based on the habitat and the natural conditions. If there is a man-made problem that is causing the uses not to be attained right now, what would be possible in the absence of that environmental insult? And the process for going about determining this is called Use Attainability Analysis.
Moving on to the criteria - there are two types. The first is the narrative criterion that appears in nearly every state's water quality standards in the United States, something to the effect of "no toxics in toxic amounts." There are also a wide number of numeric criteria. These are the chemical concentrations that have been determined to be necessary to protect different designated uses I talked about a minute ago.
When you start looking at the different water quality criteria for a variety of different uses, you can see a very wide range. I just kind of picked this one out almost at random from the state of Ohio - the various water quality criteria for total chromium for different designated uses. There are aquatic life uses. This is for warm water fishery. These are based on a hardness of 100 mg per litre, and there are aquatic life criteria for both acute protection for short-term exposures and chronic for long-term exposures. You see the huge difference between acute and chronic criteria. There are also two different human health criteria for drinking and non-drinking uses. They differ by two orders of magnitude. And there is an agricultural water supply criterion. In many cases, it is the agricultural use that actually drives the whole permitting process, that number back from the Red Book. When I look at this example, I look back to the Specific Objectives which, frankly, I have to admit I don't use on a daily basis in water-quality-based permitting in the U.S. But the number for total chromium is 50 parts per billion and it is stated that that is the number for protection of raw water for drinking water supply. So, it is a very different number, there for a very different reason, but it is more stringent than all the others.
Which of these numbers is right? That's the question. There are three components to a proper water quality criterion and all too often we only pay attention to the first one and that is magnitude. How much of a chemical can be present before you expect to see an effect on the organisms using that use. But the other two are important as well. Duration - how long can organisms be exposed to that concentration before you expect to see an effect. And finally, frequency - how often can that number be exceeded before you would expect to see any kind of effect. Magnitude in and of itself means nothing if these other things are not taken into account.
I want to talk about two particular types of criteria in looking at the U.S. EPA approach, just to give examples of the type of scientific process that goes on in deriving criteria. I'm not going to do this in an exhaustive sense, but just kind of an introductory overview. For aquatic life criteria, EPA guidelines have been pretty much established for many years. They have been virtually unchanged since 1985, except for a little bit of tweaking. It is a very rigorous statistical procedure and the data requirements are pretty stringent. They require that you have toxicity data for at least eight different families of aquatic life - a real diversity. And these procedures are the basis for the aquatic life procedures in the GLI and virtually all the states in the U.S.
The way the process works is basically EPA or the state will assemble all the aquatic toxicity data for a given chemical and then rank the species for sensitivity, from the most sensitive all the way up or down, depending on how you want to look at it, to the least sensitive species. The procedures are designed to protect the 95th percentile of the most sensitive species out of that database. The procedures develop both acute and chronic criteria, as I said before. In many cases, particularly for the metals, there is a strong relationship between water hardness and toxicity, so that the criteria are not just a single number, they are expressed as a function of hardness - the softer the water, the more toxic the metal.
What really drives the numbers? What is the procedure sensitive to? Well, the way the procedures are set up, the calculations actually only use the data for the four most sensitive species. You are only really looking at the data for four species. The rest of it is pretty much ignored, except for the fact that it enters into the total number of species (which is the second bullet). If those four most sensitive species are kind of consistent, you have a continuum of data. That is one case. But if your four most sensitive species are far more sensitive than the rest, then they may not be as representative of the remainder of the database.
The number of species is also important. The way the procedures work, the more species for which you have data, the less conservative the procedures are - and it drives a higher criterion, if you will - less safety factor built in, whatever terminology you want to use. But if you have only a handful of species, it tends to make the criteria much more stringent.
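[Editor's illustration, not part of the presentation.] The two sensitivities described above, the four most sensitive values and the total number of species, can be seen in a minimal sketch of the four-point calculation from EPA's 1985 aquatic life guidelines as the editor understands it. The function name and the toxicity values are hypothetical; the guidelines themselves work with genus mean acute values, and this shortcut of always taking the four lowest applies only when fewer than 59 values are available.

```python
import math

def final_acute_value(lowest_four, n_total):
    """Estimate the 5th-percentile acute value from the four lowest values.

    lowest_four: the four lowest mean acute values (e.g. in ug/L)
    n_total:     total number of values in the whole database
    """
    if len(lowest_four) != 4:
        raise ValueError("exactly four values are required")
    if n_total < 4:
        raise ValueError("n_total must be at least 4")
    values = sorted(lowest_four)
    # Cumulative probability of each rank: P = R / (N + 1)
    p = [rank / (n_total + 1) for rank in (1, 2, 3, 4)]
    ln_v = [math.log(v) for v in values]
    sqrt_p = [math.sqrt(x) for x in p]
    # Slope (squared) of ln(value) against sqrt(P) through the four points
    s_squared = (sum(x * x for x in ln_v) - sum(ln_v) ** 2 / 4) / (
        sum(p) - sum(sqrt_p) ** 2 / 4
    )
    s = math.sqrt(s_squared)
    intercept = (sum(ln_v) - s * sum(sqrt_p)) / 4
    # Evaluate the fitted line at a cumulative probability of 0.05
    return math.exp(s * math.sqrt(0.05) + intercept)

# Hypothetical data: the same four sensitive values in a small and a large database
four_lowest = [10.0, 20.0, 40.0, 80.0]
fav_small_database = final_acute_value(four_lowest, n_total=8)
fav_large_database = final_acute_value(four_lowest, n_total=20)
print(fav_small_database, fav_large_database)
```

With identical sensitive values, the smaller database gives the lower (more stringent) result, because the calculation has to extrapolate further below the lowest observed value to reach the 5th percentile.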
Acute-chronic ratio - typically for most chemicals you have acute data, short-term exposure data for a lot of different species, but very few for chronic exposures. So you look at the relationship between acute and chronic toxicity for the handful of species for which you have both, and then assume that that relationship will be the same for all of the other species. There is generally not any data to support that, one way or the other.
What about duration? EPA's guidance has traditionally said that acute criteria should be expressed as a one-hour average concentration, and the chronic criteria as four-day average concentrations. How do these compare to the actual exposures in the toxicity tests? Acute tests are typically anywhere between 24 hours to maybe a week long, depending on species that you are dealing with but, generally, they are in the magnitude of a couple to four days. The chronic can be as long as several months, depending on whether you might be doing a life-cycle test. Yet we reduce those down to express the criteria as a one-hour or four-day average. Why is that? There is not really anything magical about the one hour and the four day. It is an assumption that all chemicals act very quickly and, if you are exposed in that first short critical period of time, that is when the effect is going to be seen - even if you continue the exposure over a long period of time, or put them back in clean water. If they are exposed for that short period in the beginning, you are going to see the same effect. The assumption that all chemicals are that fast acting really is not true. Chemicals act in many different ways. EPA has begun to recognize this, and has begun to back away somewhat from the one-hour average and has allowed the states more flexibility in saying it needs to be short in duration - whatever that means.
Frequency is a bit of a mystery also. EPA guidance has always said, "This criterion can be exceeded once every three years without expecting there to be any problem." Where does that come from? There is really not an ecological basis to that exceedence frequency. It is kind of similar to what they expect the frequency to be for the designed stream flows that they use for water quality modeling, but it doesn't really have a relationship to ecological recovery. It would seem intuitive that, in order to look at frequency, you would have to know how great was that exceedence. If your criterion is 100 ppb, if you measure 101 versus measuring 500, it would seem you could have 101 more often than you could have 500, but that magnitude difference isn't something that is built into the guidance. And again, would the relationship be the same for all chemicals?
One thing that EPA provides is a set of established procedures for developing site-specific criteria. There are three tools in the toolbox that they give us. One is the recalculation procedure. As I said, you start out with this long list of acute toxicity data for all these different species, and EPA or the state will allow someone to go into that database for a particular water body and say, this one, this one, this one and this one, these are examples - it's a flag fish that is found in streams in very arid areas - it is in a lot of the databases. There are other things that just clearly could never be present in that water body. You could try to make the case that that ought to be removed from the database. One stumbling block to that, and rightfully so, is that EPA says, "We need to know that that sensitive species that is not present in this particular water body might not be representative of other sensitive species that are present but you don't have data for." If there is any way which they are representative of other organisms, they need to be kept in. That is a hard demonstration to make.
The other thing is, as I said, these procedures are very sensitive to the number of species that you test. And as you pull these data out, and drive that total "n" down, you are going to also be tending to drive the criterion to be more stringent. I should point out that this is site-specific - it is not biased to reducing or increasing the criteria, but most of the people that are going out to use this are using it because they have a water-quality-based permit limit that they cannot comply with because of the criterion and they are trying to see if the criterion may be adjustable. Typically that is what people are looking at - bringing the criterion up.
Water-effect ratio is simply going back and doing the toxicity test in site water, collected from the water body you're concerned about, and comparing it to results in laboratory water. Often in natural waters the toxicity would be somewhat lower, and you would need a higher criterion.
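[Editor's illustration, not part of the presentation.] A minimal sketch of the arithmetic behind a water-effect ratio, assuming the ratio is taken as the toxicity endpoint measured in site water divided by the same endpoint in laboratory water, and that the site-specific criterion is the general criterion multiplied by that ratio. All names and numbers are hypothetical.

```python
def water_effect_ratio(site_water_lc50, lab_water_lc50):
    """Ratio of the LC50 measured in site water to the LC50 in laboratory water."""
    return site_water_lc50 / lab_water_lc50

def site_specific_criterion(general_criterion, wer):
    """Scale a general criterion by the water-effect ratio."""
    return general_criterion * wer

# Hypothetical values in ug/L: the chemical is half as toxic in site water
wer = water_effect_ratio(site_water_lc50=60.0, lab_water_lc50=30.0)
adjusted = site_specific_criterion(general_criterion=13.0, wer=wer)
print(wer, adjusted)
```

A ratio above 1 means the chemical is less toxic in the site water than in laboratory water, which is the case in which the criterion is adjusted upward.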
The resident species procedure is hardly ever done. You collect the organisms of the species that are present there in that water body, conduct a test on them, and derive the criteria. Very data intensive, difficult to do, and expensive.
Human health criteria. Again, just talking about EPA guidelines. These were updated very recently. I did some review of some of those procedures, and the guidance was final last fall [2000]. In fact, the procedures as they ended up, a lot of the things that they changed from the earlier national guidance were similar to what was done in the GLI. Of course, there are procedures for both carcinogens as well as non-carcinogens.
What drives the numbers of human health criteria? I know you are familiar with some of these issues. Fish consumption rate is a huge issue. The default consumption rate now at the national level in the U.S. is about three times what it had been before. But even more importantly, the guidance recognizes that there are particular subgroups, be they subsistence fishermen or whatever, that may have much higher consumption rates, and protection needs to be provided for these sub-populations as well. Also, there may be particularly sensitive groups such as women of child-bearing age or young children. If the toxicity data really points at that being the key group to protect, then consumption levels for that particular group can be used to derive the criteria. It also strongly emphasizes that, whenever you can, you should use site-specific fish consumption rates and not national defaults.
The guidance now calls for using bioaccumulation factors rather than bioconcentration factors, and clearly for the bioaccumulative chemicals that leads to much more stringent criteria. When there are good field validated BAFs it certainly is the right direction to go. The problem is there are not really good site-specific field validated BAFs for all the chemicals. There are models available for predicting BAFs however.
Cancer risk level is a policy decision. EPA allows the states to choose from within a range. But what is the appropriate level? Somewhere between one additional cancer in ten thousand to one additional cancer in one million. There are uncertainty factors put in there to extrapolate from mammalian data - mice or whatever - to humans. Relative source contribution is something that is built into the procedures now, recognizing that for a lot of chemicals it is not just the exposure you're getting from the water and the fish, but other environmental sources as well - from the air and so forth.
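[Editor's illustration, not part of the presentation.] A minimal sketch of how fish consumption rate, bioaccumulation factor, and relative source contribution enter a human health criterion for a non-carcinogen, assuming the general form criterion = RfD x RSC x body weight / (drinking water intake + fish intake x BAF). The reference dose, BAF, and RSC below are hypothetical, and the two fish intake rates were chosen by the editor only to reflect the roughly threefold increase mentioned above.

```python
def noncancer_criterion(rfd, rsc, body_weight, drinking_intake, fish_intake, baf):
    """Water concentration (mg/L) that keeps lifetime intake at or below the reference dose.

    rfd:             reference dose, mg per kg body weight per day
    rsc:             relative source contribution, fraction of the dose
                     allotted to water and fish (0 to 1)
    body_weight:     kg
    drinking_intake: L of water per day
    fish_intake:     kg of fish per day
    baf:             bioaccumulation factor, L per kg of fish tissue
    """
    return rfd * rsc * body_weight / (drinking_intake + fish_intake * baf)

# Hypothetical bioaccumulative chemical
common = dict(rfd=1e-4, rsc=0.2, body_weight=70.0, drinking_intake=2.0, baf=10_000.0)
criterion_older_rate = noncancer_criterion(fish_intake=0.0065, **common)
criterion_newer_rate = noncancer_criterion(fish_intake=0.0175, **common)
ratio = criterion_older_rate / criterion_newer_rate
print(criterion_older_rate, criterion_newer_rate, ratio)
```

For a strongly bioaccumulative chemical the fish term dominates the denominator, so the criterion tightens almost in proportion to the consumption rate; for a chemical with a small BAF the drinking water term dominates and the same change has little effect.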
The duration for human health criteria is quite a bit different from aquatic life. We are talking about a lifetime of exposure here. Lifetime as defined by EPA is 70 years. If you live better than that, I am not sure what EPA says, but you're protected for 70 years anyway. Which right away says, if you are looking at a human health criterion in a given water body and you compare it to a chronic aquatic life criterion - exposure in one case is 70 years the other is four days - how do you compare the numbers? Frequency is something that I have never seen addressed with human health criteria yet. If EPA has said it, I've missed it. How often can that number be expected to be exceeded in a 70-year exposure before you expect there to be a problem? I'm not sure anybody has the answer to that.
Wildlife criteria are so key in this region. We talked earlier about what wildlife criteria are and the fact that there are only limited data for the key target species. That is why, as was pointed out earlier, there are only four wildlife criteria so far. Again, these are the water concentrations we are talking about, derived from BAFs. It is important to point out that the Specific Objectives include something you would call wildlife criteria. However, they are expressed as tissue concentrations - and here I am strictly talking about water.
What is going on now? What is expected in the near future? At least in the U.S., something which received a great deal of emphasis the last few years is the aquatic life criteria for metals. The key issue here is what is commonly called bioavailability. What concentration does the organism actually see at the point that causes the toxicity? In most cases with metals you are talking about it at the gill. It was recognized several years ago by EPA that dissolved metal much more closely approximates a true concentration than does total metal. Total metal is much over-estimated in many cases. How much metal is really present to cause toxicity? EPA is working on a model now with help from many outside the agency to look even more closely at how the metal behaves around that gill surface. There are going to be even more and more changes in how that is dealt with, but it just points out the importance of exposure. What is the actual concentration which the organism sees, not what you measure analytically. There may be a very big difference.
Tier II values, everybody heard a lot about during the GLI, a lot of talk about that. In many cases we don't have data to a whole lot of species for some of these chemicals, and yet we were concerned that there may be a problem. What do you do in the absence of those data? Tier II provides a shortcut procedure, if you will, with some safety factors built in, that allows you to derive a value that you can at least try to compare to the true Tier I criteria. What do you do with those numbers? Can you use them in the same sense that you use Tier I? Would it be appropriate to take them and combine them with a use and call it an enforceable water quality standard?
Nutrient criteria on eco-region level is something that EPA has recently been coming out with. Just two months ago EPA came out with its first tissue-based criterion for methyl mercury. It is interesting to note how long the Specific Objectives have had tissue criteria; this is the first one that EPA has published. The other things you see down there at the bottom [sediment criteria / guidelines, biocriteria] are things that are receiving a lot of attention.
A few issues that come to my mind, and some of these questions are things to keep in the back of your mind in the discussions this afternoon. First of all, what is it that you are trying to protect? It is not a question that can be tossed up really easily. It needs to be considered carefully. Criteria in the absence of designated uses and appropriate and attainable uses really are just numbers; they don't mean anything else. Are you trying to protect every individual - for humans, clearly that's appropriate - but when you are talking about wildlife and fish, are you trying to protect every fish that is out there, or are you trying to protect the populations of those fish? What is the actual exposure? That gets to what I was talking about with bioavailability a minute ago. What is an acceptable level of risk? I mentioned cancer risk, but there is also an assumption built into those aquatic life procedures. When you use the 95th percentile of the most sensitive species, you are saying right away, there's 5% you're not protecting. Is that considered to be acceptable?
Where do the criteria apply? That is a biggie. Obviously it applies when you start thinking about, "Should we or should we not have mixing zones?" It also applies when you start thinking about, "Do you use the same criteria for open waters as you do in harbors, as you do in tributaries?" Habitats are very different. Can you use a one-size-fits-all criterion for all of those cases? Another example is with drinking water criteria. Do you need to meet the drinking water criterion everywhere in the lake, or do you need to ensure that it is met at the point of drinking water withdrawal, or even after whatever treatment occurs at drinking water withdrawal? What is the appropriate point?
Are you better off with national one-size-fits-all numbers so there is consistency from state to state? And a lot of the states in the U.S. are very concerned about that. They don't want economic disadvantage in their state, to have more stringent criteria, and have business move out. Do you do them on a regional level like the GLI? Do you do them on a watershed basis, or do you try to make them as site specific as possible? Clearly, the more site specific you make it, the more confidence you have that they are appropriate to your use and your area. However, it is very data intensive and very expensive to try to derive criteria for all of these chemicals on a site-specific level. What chemicals do you set criteria for? This was another one that was discussed earlier. There are thousands and thousands of chemicals out there. How do we generate all the data that is needed, and how do we go through the laborious process of calculating all of these criteria? What do you do when those minimum data requirements are not met? Do you have Tier II procedures put in place, like the GLI did?
Finally, how do you assess compliance? I know that's the next topic and it is a nice segue. The reason that I put it up here is just to point out again the example of the human health criterion. If you go out and collect a grab sample in Lake Michigan and it exceeds the human health criterion, what does that tell you? If the exposure assumption is a 70-year lifetime exposure, does the grab sample exceedence mean anything? I know many of these are policy, rather than scientific issues, so I will not pontificate without any data. And I always appreciate that.
And again, my opportunity to state my opinion, if I were to just state three characteristics of a "quality" water quality criterion. The first would be that it would be robust. My definition of robust is not a statistical definition but based on extensive database across a range of species in order to minimize the effects of your statistical analyses and to minimize also the use of uncertainty factors. To the greatest extent possible criteria should be localized. They must be appropriate for that ecosystem or whatever level of which you are setting your criteria, and appropriate for the designated use.
Finally they must be flexible, not only as I say here, to be adjustable using reasonable but defensible site-specific procedures, but also to be flexible in the sense, as new data become available that the procedures can be used to constantly update criteria. Again - all the speakers have been great, I can't remember who said what - but it was pointed out that, as you get more and more data, criteria are sort of our moving target. That shouldn't be seen as a bad thing. That is a good thing, that we are learning more, and can come up with better and better criteria as time goes on. Thank you very much.
Unidentified: I want to point out, on the issue of having a multiple species ... drive water quality criteria for protection of wildlife. I know that they had four or five species that they looked at for this for birds, such as kingfisher and eagles and others. And all of them were in a ... or range of a methyl mercury criterion of about 50-100 parts per quadrillion. When it was scaled up to a total mercury criterion it was about 0.9 parts per trillion. There were four or five species and all of them were within not a huge range, so I think having more species by and of itself does not guarantee that it is going to lead to kind of a less protective criterion which I thought was your implication.
Whitaker: That is true for wildlife criteria or human health criteria. However, aquatic life criteria was the example I was giving. Those procedures specifically have "n" in the calculations because those are done to protect populations. Aquatic life criteria are done looking at population-level effects whereas even the wildlife procedures are really looking at what is necessary to protect an individual kingfisher or mink or whatever. You are taking the best toxicity data you have for sets of species, taking that as a reference dose, if you will, and filling in your bioaccumulation factor and so forth. It is more driven by the sensitive species than it is by how many species or data. That's a good point.
Unwin: Article IV of the Great Lakes Water Quality Agreement contains the following statement. Article IV deals with the Specific Objectives. It says, "The determination of the achievement of Specific Objectives shall be based on statistically valid sampling data." Part of that quote was in Dr. Fisher's presentation, and I think he said something to the effect that no one really knows what that means. I have been looking forward to doing this introduction since I got the biographical write-up. Dr. Abdel El-Shaarawi is a research scientist at the National Water Research Institute in Burlington, Ontario, also a professor of Statistics at McMaster University. I am just going to go through a few of his achievements, but these are just the highlights. He is the cofounder and past president of the International Environmetrics Society, cofounder and editor-in-chief of Environmetrics, co-editor-in-chief of the Encyclopedia of Environmetrics, a Fellow of the American Statistical Association, an elected member of the International Statistical Institute. He received a distinguished achievement medal from the American Statistical Association's section on Statistics in the Environment. He has authored or co-authored more than 120 papers and co-edited or edited eight books and journal special issues. He serves on numerous editorial boards for statistical environmental journals. He may be the only person I have ever met who has a Bachelors, a Masters, and a PhD in statistics. What stamina! If anybody in this room knows what statistically valid sampling data is, I suspect Dr. El-Shaarawi does. Would you please tell us?
The Science of Compliance Assessment
Abdel El-Shaarawi, National Water Research Institute
El-Shaarawi: I was told not to talk about statistics even if I know what some of the statistical issues are. But this is the kind of bag that I always dabble with, and I hope that you will have some patience if I present some statistics which you do not really want to hear. What I intend to do is to give some general statistical concepts and then supplement them with some real examples that I have been looking at in the last week or so, which are connected with the accumulation of PCB in fish. I used the same data sets that Joe DePinto has presented and I think this will be just used as guidance for what I intend to present as a concept.
The first issue that one should consider is to define what we call the "target population." The previous talk mentioned that when you want something specific for a specific area, what would be your target population? In our case is it related to some media - the sediment, or water or biota? Is it related to the nearshore zone, is it related to the open water? What is the target population? It is very important to determine what is your intention to see that you are meeting the criteria or not.
The second thing that you have to determine - what are your objectives? You are dealing with ecosystem health or dealing with human health. As the previous speaker has mentioned, ecosystem health could be a little bit more complicated because you are dealing with multiple species, you are dealing with multi-chemicals. It is a quite complicated issue and you have to look at really stating exactly what is your objective, and supplement it with some kind of robustness in the way that you are going to come up with setting your criteria or setting your sampling design, and so on and so forth.
If we talk about these things, then we come into specific issue of what is the target characteristic that you are interested in measuring. Are you interested in measuring endocrine disruptors? Are you interested in seeing the fish with a tumour - in the last slide that was presented by Joe and Wendy. Are you interested in characterizing that? What is really the characteristic that you are trying to study, or trying to see if the regulation or the criteria is meeting the condition we have. These are the questions that are very important to address.
The next slide gives the PCB concentration as it is accumulated in the lake trout tissue using 1992 data. What you see here is two aspects that are quite interesting. The sport fish has been released in a specific year so the age of fish is completely determined. You know how old the fish is. The thing that you notice is that there is variability, and the variability is connected with two things. If you are looking at the range of concentration in the fish for a fixed age, here you will see this wide range. You have to take this variability into consideration when you are setting a limit. You cannot really just say that this is a number 0.3 the fish should not exceed in the concentration, 0.3 of the PCB accumulated, or whatever it is. You need to look at the precision.
The second thing, when you are dealing with the Great Lakes, you have a series of systems. You have Lake Superior at the upper end and then you have the other lakes. And then you have to look at the level in the different lakes and the uncertainty associated with these different lakes. Two kinds of symbols have been indicated here. The open circle indicates Lake Ontario and the closed circle indicates the concentration in Lake Superior. You can see that the distribution has been limited up to eight years of age for Lake Superior, and we have the distribution going up to 14 years of age. If you are comparing, are you comparing age specific, or are you not comparing age specific? What are the limits that we have? Then if you set some sort of limits - I put here 2 ug/g or 1 ug/g. So this really has split the population into two parts, one which is above and one below. What are you going to do? Are you going to calculate a summary statistic and try to compare it with this? What really is the means by which you are going to express that you are exceeding? By looking at the regulation in the Great Lakes Water Quality Agreement, I do not really see the mode of the calculations. I see a number - shall not exceed. Is this sufficient? It is my opinion that we have to consider the issue of calculation, we have to consider the issue of where we are going to collect the samples, and all these aspects have to be fed in.
I decided to list for you a number of regulations that I have seen in the Great Lakes Water Quality Agreement. The first one, which is related to what I am talking about - the concentration of total PCBs in fish tissues should not exceed 0.1 ug/g for protection of birds and animals which consume the fish. Then I would add a thing - for myself, I call it absolute. What is the meaning of absolute? Absolute means that you cannot really verify that it is actually happening in reality. What you are doing, you have one number and this number shall not be exceeded. Or what you can do is take samples and then, if it is exceeded, you say, "We have a problem." But if it doesn't exceed, it doesn't mean that you do not have a problem. So you are not improving things. So this is what I meant by the absolute. Here, for example one, you have to have the entire population with its variability separated on this side to accept and on that side to reject. So this is a little bit of a problem because it is really not a complete definition, in my opinion. It is not a verifiable sample or a verifiable limit. You have to think about the variety of ways that these limits have been indicated in the system that we have.
The second one, the concentration of unspecified non-persistent pesticides should not exceed 0.05 of the median lethal concentration of the 96-hour test for any sensitive local species. What you are dealing with here, which I indicate there, is experimental, and then you have random effects. If you get the laboratory at CCIW, or a laboratory on the U.S. side of the Great Lakes, producing some data, then you get some numbers. These numbers would vary from one level right up to the other. What is the meaning of this? You have to take this variability into account. So, you have a population which is coming from that. You have a number of data that people gleaned from an experiment, and you generate a sampling population, and then you go to the field and you have another sampling population, and you go to another field and get another sampling population. So if you have the 0.05 by the median, which is represented by this dotted line, then you are okay this way and not okay the other way. What is the probability of these exceedances? How do you take this probability into account? This specification is not sufficient, in my opinion. You have to determine the degree of risk. When people are talking about a nuclear accident or something like that, they say, well I need the probability to be 10 E-79 or 10 E-50, or 10 E-1, or something like that. Then what you are doing is specifying a level of risk, because you do not have the population. You are doing it through the collection of the data, and the data have variability in them, they have error in them. I think this has to be included.
The third example is determining something which deals with ratio. What you have there, the pH value should not be outside the range of 6.5 to 9. So instead of extrema, you are specifying a range and, within this range, you try to compare it. In addition to this, you see how complicated it is. It changes the pH at the boundary of a limited use zone. You have to define what is a limited use zone and what kind of distribution is there. This will be my range and then, for the ability to accept, you have to have this overlap. It is basically by putting a picture you figure out how you are going to determine what we call acceptable region and rejection region. The basic thing is you are making a decision - accept or reject - and how this is going to be determined.
Sorry for putting the sample of probability there. What I am saying, when you determine a limit, and the limit is indicated by the red line there, then this limit shall not be exceeded. This limit will determine for you a critical population distribution. I call this the critical distribution. Any distribution that falls below this limit is acceptable. If you move in the other direction, it is unacceptable. The thing that one should realize, that the distribution is not only a mean value, it has many different things. If you have the same mean value but different variance, then the acceptable area becomes larger or smaller. One has to try to look at more than one number. One number is a dangerous thing to do. What we are talking about is the probability of meeting an objective when you confront it with data sets. This is the issue: how we verify or how we come up with the conclusion that we are meeting the regulation or we are not meeting the regulation.
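As a rough illustration of the point that one number is not enough, the following minimal Python sketch computes the probability of exceeding a limit for distributions with the same mean but different spread. It assumes a normal distribution purely for convenience, and the mean, standard deviations, and limit are hypothetical values chosen for the illustration, not figures from the talk.

```python
import math


def exceedance_probability(mean, sd, limit):
    """Return P(X > limit) for X ~ Normal(mean, sd).

    Normality is an assumption made for this sketch only.
    """
    z = (limit - mean) / sd
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical values: same mean, three different standard deviations.
limit = 2.0
mean = 1.5
for sd in (0.25, 0.5, 1.0):
    p = exceedance_probability(mean, sd, limit)
    print(f"mean={mean}, sd={sd}: P(exceed {limit}) = {p:.3f}")
```

With the mean held fixed below the limit, the exceedance probability grows as the spread grows, which is why a limit stated as a single number leaves the acceptable region undefined.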
What is the precision of the estimates? It depends on a number of things - the number of samples (how many samples you are getting), the sample design (where you are taking, what are the locations of these samples), various sources of variability (spatial, temporal, field, analytical) - all these aspects are why it is important. And it depends on the way that you are looking at the result. If you are interested in estimating an absolute of quantities, the loading to the system, it is very different from comparing one with another. If you have a reference population - you are comparing Lake Superior with Lake Ontario, it is different. You either do it in the framework of hypothesis testing or you do it in the framework of absolute quantities that you are interested in estimating. These things have to be taken into account when you are setting this.
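To make the dependence on the number of samples concrete, here is a minimal Python sketch of how the standard error of a mean shrinks with sample size, and how many samples a given precision would require. It assumes independent samples from a single source of variability with a known standard deviation, which is a simplification: it ignores the separate spatial, temporal, field, and analytical components mentioned above. The function names and numbers are hypothetical.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean for n independent samples."""
    return sd / math.sqrt(n)


def samples_needed(sd, half_width, z=1.96):
    """Samples needed so that an approximate 95% interval for the mean
    has the requested half-width (z = 1.96 assumes normality)."""
    return math.ceil((z * sd / half_width) ** 2)


# Hypothetical standard deviation of 0.5 concentration units.
for n in (5, 20, 80):
    print(f"n={n}: standard error = {standard_error(0.5, n):.3f}")
print("samples for half-width 0.1:", samples_needed(0.5, 0.1))
```

Halving the standard error takes four times as many samples, so the precision that a limit implicitly demands has a direct bearing on the sampling design.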
The other aspect which I think is quite important - having universal criteria is quite dangerous in my opinion - because if you have an area which geologically releases those particular substances, by setting this, all that you are doing is showing non-compliance when the source is a natural source. So you have to probably change the criteria to suit the target population. It should not be a constant criteria all over. It is the form of criteria, when you try to protect, let us say the people, don't eat the fish that are from this particular part, because locally the fish have adapted to the environment where they are and then accumulation is okay. But you do not set criteria for water related to something like this in this regard.
Now I go into the examples that I started with and give you some information that might help you in setting your design. The data that I looked at is the 1992 data, and here is a graph where I put the variance, the means, and the fitted curve. We have here the mean PCB, and what we see is that the variability increases as the age of the fish increases. You will be more uncertain at higher age than at lower age. The variability at Lake Superior which is setting those at the particular age group so we can see that even the two lakes are different. They both fall on the same curve but one belongs to lower levels and the other belongs to the higher levels. You have to take these into consideration. How the variability of the contaminant in the fish tissue, or whatever the substance, changes as the level increases is important.
Next you develop a model. I developed some sort of a model for the accumulation of the concentration of the contaminant in the fish tissue. This is for Lake Ontario and this is for Lake Superior. The model allows you to interpolate in between, and it allows you to do a little bit of extrapolation if you are not going very far. It is basically the form of the model that I found for this particular data set that is quite interesting. It has two components. One is fish-specific, the other concentration-specific in that aquatic system. The one component is independent of the lake - it is the same for Lake Superior and for Lake Ontario. The other one is concentration-specific. The other thing, when you develop these ways of looking at the analysis, is to try to get things which are biologically meaningful. I tried to incorporate some biological information into this. So basically the data help you in setting the model, then you use the model for setting your sampling design, and you use the model for looking at all the possible consequences in estimating risk and things like that, which I will show in the next set of graphs.
One thing that we have to pay attention to, look for normality of the residual. This is not normal. They have to fall on a straight line. They don't like straight lines. What happens when you have variables that are not normally distributed? What would you do? One thing the statisticians can do is to do a transformation, looking for something that can transform your data to get this kind of thing. The other possibility is to analyze the data on the scale of measurement that you have. One technique which I think would be quite useful here is to look at some sort of bootstrapping type of distribution. I will just give you some of the results very quickly of what I want you to see. I am not talking about the technical aspects here. I show three distributions. I look at what are the predicted distributions of the concentration for different age groups. I have here selected the three age groups - 3, 7 and 14 year old fish. Using the model, I generated 1000 samples from this bootstrap distribution to generate the data that I have. This gives you a measure of the degree of risk. If you are looking at the tail area of the distribution you can determine it - if I am interested in 5% or 2% or 1% at a specific age, what are the limits that you would do? This helps you in setting the limit for the particular distribution. The limit will have higher variability if you go with higher age groups, because you can see the distribution changes quite drastically there. You can see there is a major separation between those distributions. The difference between 4, 7 and 10 is so substantial that you do not have to worry about the overlap.
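The bootstrap idea can be sketched in a few lines of Python. This is a plain nonparametric bootstrap of the fraction of fish exceeding a limit within one age group, not the model-based bootstrap described in the talk, whose form is not given here. The sample values and the limit are hypothetical numbers invented for the illustration.

```python
import random


def bootstrap_exceedance(values, limit, n_boot=1000, seed=0):
    """Resample `values` with replacement n_boot times and return the
    sorted list of fractions of each resample that exceed `limit`."""
    rng = random.Random(seed)
    n = len(values)
    fractions = []
    for _ in range(n_boot):
        resample = [rng.choice(values) for _ in range(n)]
        fractions.append(sum(v > limit for v in resample) / n)
    return sorted(fractions)


def percentile(sorted_values, q):
    """Crude empirical percentile of an already sorted list."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]


# Hypothetical concentrations (ug/g) for a single age group.
sample = [0.8, 1.1, 1.3, 1.4, 1.6, 1.9, 2.3, 2.8, 3.5, 4.9]
fractions = bootstrap_exceedance(sample, limit=2.0)
low = percentile(fractions, 0.025)
high = percentile(fractions, 0.975)
print(f"observed exceedance fraction: {sum(v > 2.0 for v in sample) / len(sample)}")
print(f"approximate 95% bootstrap interval: {low} to {high}")
```

Because the resampling works on the original measurement scale, no transformation to normality is needed, and the tail area at any chosen risk level can be read directly from the bootstrap distribution.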
If you are setting the regulations for the contaminants at this particular level, the PCB concentration, if you are setting it at 2, then you can safely say one can consume a fish which is 3 years old. You can discriminate. That is quite clear. There is a big separation between the distributions. So this kind of analysis is quite useful and helpful in setting what is the degree of risk, what is the probability of exceedance in this situation. I think work like this would be useful for looking at these kinds of variations.
Here we are looking at the changes in different limits. If we accept the limit as 0.1 nanograms/gram then the variability and the distribution for the number of years - I am reversing the problem. I am trying to estimate the age that corresponds to a specific limit, and what kind of distribution you get. Here you have a very precise distribution when you are dealing with very low limits. It is about 0.1. But if you move to a limit of 1 nanogram/gram, then the distribution you get is between two and three years. If you use a limit of 2, the distribution is very far. So you get this kind of diagnostic using the same data set. It will help you in understanding the mechanism of how things will look and work in this regard.
Here is another good example. If I am using the limit as 0.1 and then, instead of using 0.1, I use the limit 0.15. You can see there is overlap between the number of years that you have. It is very important to study the probability of overlap because you are interested in looking at the probability of detecting at specific levels. This has to do with what we call in statistics the power of the test. If you have the power of the test you are interested in two things: you are looking for meeting the regulation, which is type 1 - you specify your risk and then you are meeting the probability of it. If you are on the opposite side, what is the probability of accepting the sample as being in compliance when it is not in compliance? The overlap will be very important in this kind of thing.
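The two risks can be put into numbers with a short Python sketch. It assumes a one-sided test on a sample mean with a known standard deviation and normally distributed sample means, which is a textbook simplification and not necessarily the framing used in the talk. The limit of 0.1 and the alternative of 0.15 echo the values mentioned above, while the standard deviation and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def power_one_sided(limit, true_mean, sd, n, alpha=0.05):
    """Probability of declaring an exceedance when the true mean is
    `true_mean`, using a one-sided z-test of 'mean <= limit' at level
    alpha. Known sd and normal sample means are assumptions."""
    se = sd / math.sqrt(n)
    # Declare an exceedance when the sample mean is above this value.
    critical = limit + NormalDist().inv_cdf(1.0 - alpha) * se
    return 1.0 - NormalDist(true_mean, se).cdf(critical)


# Hypothetical standard deviation of 0.05 in the same units as the limit.
for n in (5, 10, 20):
    power = power_one_sided(limit=0.1, true_mean=0.15, sd=0.05, n=n)
    print(f"n={n}: power = {power:.3f}, type 2 risk = {1.0 - power:.3f}")
```

The type 1 risk is fixed by alpha, while the type 2 risk, the chance of accepting a non-compliant sample as compliant, depends on how much the two distributions overlap and falls as the number of samples rises.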
These are a number of issues that I think would be quite important to consider and I have tried to present it to you with a real data set. Thank you for your attention. I really appreciate the opportunity of coming here.
Unwin: Are there any questions?
Heathcote: If we were to revise the numbers in Annex 1, based on what you have said, would you recommend that we say, this number shall not be exceeded 95% of the time? Would that be a better kind of statement? Or for a