Does Hansen’s Error “Matter”? - guest post by Steve McIntyre
11 08 2007
Hansen’s error and the resulting change in the U.S. leaderboard (by which
1934 is the new warmest U.S. year) have been widely discussed
in the right-wing blogosphere. In contrast,
realclimate has dismissed it as a triviality and the climate blogosphere is
doing its best to ignore the matter entirely.
My own view has been that
the matter is certainly not the triviality that Gavin Schmidt would have you
believe, but neither is it any magic bullet. I think that the point is
significant for reasons that have mostly eluded commentators on both sides.
Station Data
First, let’s start with the impact of Hansen’s error on individual station
histories (and my examination of this matter arose from examination of
individual station histories and not because of the global record.) GISS
provides an excellent and popular online service
for plotting temperature histories of individual stations. Many such
histories have been posted up in connection with the ongoing examination of
surface station quality at surfacestations.org. Here’s an example of this
type of graphic:

Figure 1. Plot of Detroit Lakes MN using GISS software
But it’s presumably not just Anthony Watts and surfacestations.org
readers that have used these GISS station plots; presumably scientists and
other members of the public have used this GISS information. The Hansen
error is far from trivial at the level of individual stations. Grand Canyon
was one of the stations previously discussed at climateaudit.org in
connection with Tucson urban heat island. In this case, the Hansen error was
about 0.5 deg C. Some discrepancies are 1 deg C or higher.

Figure 2. Grand Canyon Adjustments
Not all station errors lead to positive steps. There is a bimodal
distribution of errors reported earlier at CA here, with many
stations having negative steps. There is a positive skew so that the net
impact of the step error on the U.S. average is about 0.15 deg C according
to Hansen. However, as you
can see from the distribution, the impact on the majority of stations is
substantially higher than 0.15 deg. For users of information regarding
individual stations, the changes may be highly relevant.
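To make the point about skew concrete, here is a minimal Python sketch. The ten step values are invented for illustration (they are not GISS data) and are chosen only so that a negative and a positive cluster net out to 0.15 deg C. The function name summarize_steps is hypothetical.

```python
from statistics import mean, median


def summarize_steps(steps, reference=0.15):
    """Return (net step, median absolute step, share of stations whose
    absolute step exceeds `reference`)."""
    magnitudes = [abs(s) for s in steps]
    share = sum(m > reference for m in magnitudes) / len(magnitudes)
    return mean(steps), median(magnitudes), share


# Hypothetical step errors in deg C for ten stations (NOT actual GISS values):
# a negative cluster and a somewhat heavier positive cluster.
steps = [-0.6, -0.5, -0.4, -0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

net, typical, share = summarize_steps(steps)
print(f"net step: {net:.2f}")
print(f"median absolute step: {typical:.2f}")
print(f"share of stations with |step| > 0.15: {share:.0%}")
```

With these made-up numbers the net step is 0.15 deg C even though no individual station moves by less than 0.3 deg C, which is the sense in which an average can understate what a user of a single station record sees.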
GISS recognized that the error had a significant impact on individual
stations and took rapid steps to revise their station data (indeed, the
form of their revision seems far from ideal, which suggests how hastily it
was done). However, GISS failed to provide any explicit notice or warning on their
station data webpage that the data had been changed, or an explicit notice
to users who had downloaded data or graphs in the past that there had been
significant changes to many U.S. series. This obligation existed regardless
of any impact on world totals.

Figure 3. Distribution of Step Errors
GISS has emphasized recently that the U.S. constitutes only 2% of global
land surface, arguing that the impact of the error is negligible on the
global average. While this may be so for users of the GISS global average,
U.S. HCN stations constitute about 50% of active (with values in 2004 or
later) stations in the GISS network (as shown below). The sharp downward
step in station counts after March 2006 in the right panel marks the last
month for which USHCN data is presently included in the GISS system. The
Hansen error affects all the USHCN stations and, to the extent that users of
the GISS system are interested in individual stations, the number of
affected stations is far from insignificant, regardless of the impact on
global averages.
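The two ways of counting can be put side by side with a back-of-envelope calculation. The Python sketch below assumes a simple area-weighted mean, which is a simplification and not the actual GISS gridding procedure. The function name diluted_shift is hypothetical, and the inputs are the round figures quoted in this post.

```python
def diluted_shift(regional_shift, area_fraction):
    """Shift in an area-weighted mean when only one region changes.

    Assumes a plain area-weighted average (a simplification, not the
    actual GISS procedure)."""
    return regional_shift * area_fraction


us_shift = 0.15          # deg C, approximate U.S. correction for 2000 and later
us_land_fraction = 0.02  # U.S. share of global land surface, per GISS
ushcn_share = 0.50       # approximate USHCN share of active GISS stations

land_shift = diluted_shift(us_shift, us_land_fraction)
print(f"shift in an area-weighted land average: {land_shift:.3f} deg C")
print(f"share of active station records affected: {ushcn_share:.0%}")
```

On these assumptions the same error is a few thousandths of a degree for a user of the land average, while touching roughly half of the active station records.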

Figure 4. Number of Time Series in GISS Network. This includes all versions
in the GISS network and exaggerates the population in the 1980s as several
different (and usually similar) versions of the same data are often
included.
U.S. Temperature History
The Hansen error also has a significant impact on the GISS estimate of U.S.
temperature history with estimates for 2000 and later being lowered by about
0.15 deg C (2006 by 0.10 deg C). Again, GISS moved quickly to revise their
online information, changing their US temperature
data on Aug 7, 2007. Even though Gavin Schmidt of GISS and realclimate
said that changes of 0.1 deg C in individual years were “significant”,
GISS did not explicitly announce these changes or alert readers that a
“significant” change had occurred for values from 2000-2006. Obviously they
would have been entitled to observe that the changes in the U.S. record did
not have a material impact on the world record, but it would have been
appropriate for them to have provided explicit notice of the changes to the
U.S. record given that the changes resulted from an error.
The changes in the U.S. history were not brought to the attention of
readers by GISS itself, but in
this post at climateaudit. As a result of the GISS revisions, there was
a change in the “leader board” and 1934 emerged as the warmest U.S. year and
more warm years were in the top ten from the 1930s than from the past 10
years. This has been widely discussed in the right-wing blogosphere and has
been acknowledged at
realclimate as follows:
The net effect of the change was to reduce mean US anomalies by
about 0.15 ºC for the years 2000-2006. There were some very minor knock
on effects in earlier years due to the GISTEMP adjustments for rural vs.
urban trends. In the global or hemispheric mean, the differences were
imperceptible (since the US is only a small fraction of the global
area). There were however some very minor re-arrangements in the various
rankings (see data). Specifically, where 1998 (1.24 ºC anomaly compared
to 1951-1980) had previously just beaten out 1934 (1.23 ºC) for the top
US year, it now just misses: 1934 1.25ºC vs. 1998 1.23ºC. None of these
differences are statistically significant.
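The re-arrangement described in this quotation can be reproduced from the four numbers it gives. A short Python sketch follows; the helper name leader is hypothetical, and the anomalies are those quoted above, in deg C relative to 1951-1980.

```python
def leader(anomalies):
    """Return the year with the largest anomaly."""
    return max(anomalies, key=anomalies.get)


# U.S. anomalies (deg C vs. 1951-1980) as quoted from realclimate.
before = {1998: 1.24, 1934: 1.23}
after = {1934: 1.25, 1998: 1.23}

for label, table in (("before correction", before), ("after correction", after)):
    first = leader(table)
    margin = round(table[first] - min(table.values()), 2)
    print(f"{label}: warmest year {first}, margin {margin:.2f} deg C")
```

The margins at the top of the table are a few hundredths of a degree in both cases, so a correction of about 0.15 deg C to the recent years is more than enough to reorder them.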
In my opinion, it would have been more appropriate for Gavin Schmidt of
GISS (who was copied on the GISS correspondence to me) to ensure that a
statement like this was on the caption to the U.S. temperature history on
the GISS webpage, rather than after the fact at realclimate.
Obviously much of the blogosphere delight in the leader board changes is
a reaction to many fevered press releases and news stories about year x
being the “warmest year”. For example, on Jan 7, 2007, NOAA
announced that
The 2006 average annual temperature for the contiguous U.S. was
the warmest on record.
This press release was widely covered as you can determine by googling
“warmest year 2006 united states”. Now NOAA and NASA are different
organizations and NOAA, not NASA, made the above press release, but members
of the public can surely be forgiven for not making fine distinctions
between different alphabet soups. I think that NASA might reasonably have
foreseen that the change in rankings would catch the interest of the public
and, had they made a proper report on their webpage, they might have
forestalled much subsequent criticism.
In addition, while Schmidt describes the changes atop the leader board as
“very minor re-arrangements”, many followers of the climate debate are aware
of intense battles over 0.1 or 0.2 degree (consider the satellite battles.)
Readers might perform a little thought experiment: suppose that Spencer and
Christy had published a temperature history in which they claimed that 1934
was the warmest U.S. year on record and then it turned out that there had
been a computer programming error opposite to the one that Hansen made, that
Wentz and Mears discovered there was an error of 0.15 deg C in the Spencer
and Christy results and, after fixing this error, it turned out that 2006
was the warmest year on record. Would realclimate simply describe this as a
“very minor re-arrangement”?
So while the Hansen error did not have a material impact on world
temperatures, it did have a very substantial impact on U.S. station data and
a “significant” impact on the U.S. average. Both of these surely “matter”
and both deserved formal notice from Hansen and GISS.
Can GISS Adjustments “Fix” Bad Data?
Now my original interest in GISS adjustments did not arise abstractly,
but in the context of surface station quality. Climatological stations are
supposed to meet a variety of quality standards, including the relatively
undemanding requirement of being 100 feet (30 meters) from paved surfaces.
Anthony Watts and volunteers of surfacestations.org have documented one
defective site after another, including a weather station in a parking lot
at the University of Arizona where MBH coauthor Malcolm Hughes is employed,
shown below.

Figure 5. Tucson University of Arizona Weather Station
These revelations resulted in a variety of aggressive counter-attacks in
the climate blogosphere, many of which argued that, while these individual
sites may be contaminated, the “expert” software at GISS and NOAA could fix
these problems, as, for example, here:
they [NOAA and/or GISS] can “fix” the problem with math and
adjustments to the temperature record.
This assumes that contaminating influences can’t be and aren’t
being removed analytically.. I haven’t seen anyone saying such
influences shouldn’t be removed from the analysis. However I do see
professionals saying “we’ve done it”
“Fixing” bad data with software is by no means an easy thing to do (as
witness Mann’s unreported modification of principal components methodology
on tree ring networks.) The GISS adjustment schemes (despite protestations
from Schmidt that they are “clearly outlined”) are not at all easy to
replicate using the existing opaque descriptions. For example, there is
nothing in the methodological description that hints at the change in data
provenance before and after 2000 that caused the Hansen error. Because many
sites are affected by climate change, a general urban heat island effect and
local microsite changes, adjustment for heat island effects and local
microsite changes raises some complicated statistical questions that are
nowhere discussed in the underlying references (Hansen et al 1999, 2001). In
particular, the adjustment methods are not techniques that can be looked up
in statistical literature, where their properties and biases might be
discerned. They are rather ad hoc and local techniques that may or may not
be equal to the task of “fixing” the bad data.
Making readers run the gauntlet of trying to guess the precise data sets
and precise methodologies obviously makes it very difficult to achieve any
assessment of the statistical properties. In order to test the GISS
adjustments, I requested that GISS provide me with details on their
adjustment code. They refused. Nevertheless, there are enough different
versions of U.S. station data (USHCN raw, USHCN time-of-observation
adjusted, USHCN adjusted, GHCN raw, GHCN adjusted) that one can compare GISS
raw and GISS adjusted data to other versions to get some idea of what they
did.
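As an illustration of that kind of comparison, the following Python sketch differences two versions of one station record and estimates the size of a step at a chosen year. The data are synthetic, with a 0.5 deg C step inserted from 2000 on. The function name step_at and the year-to-value dictionary layout are assumptions for the example, not the GISS or USHCN file formats.

```python
from statistics import mean


def step_at(series_a, series_b, break_year):
    """Estimate a step in (series_a - series_b) at break_year.

    Each series maps year -> annual mean temperature (deg C). The result is
    the mean difference from break_year onward minus the mean difference
    before it, using only years present in both series."""
    common = sorted(set(series_a) & set(series_b))
    before = [series_a[y] - series_b[y] for y in common if y < break_year]
    after = [series_a[y] - series_b[y] for y in common if y >= break_year]
    if not before or not after:
        raise ValueError("need overlapping data on both sides of break_year")
    return mean(after) - mean(before)


# Synthetic example (NOT real station data): version_b is a smooth record,
# version_a is identical except for a 0.5 deg C offset from 2000 onward.
version_b = {y: 10.0 + 0.01 * (y - 1990) for y in range(1990, 2007)}
version_a = {y: t + (0.5 if y >= 2000 else 0.0) for y, t in version_b.items()}

print(f"estimated step at 2000: {step_at(version_a, version_b, 2000):.2f} deg C")
```

Because both versions share the same underlying climate signal, differencing removes it and leaves only what the processing added. A real comparison would also need to deal with missing months and with adjustments that drift gradually instead of stepping.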
In the course of reviewing quality problems at various surface sites,
among other things, I compared these different versions of station data,
including a comparison of the Tucson weather station shown above to the
Grand Canyon weather station, which is presumably less affected by urban
problems. This comparison demonstrated a very odd pattern discussed
here. The adjustments showed that the trend in the problematic Tucson site
was reduced in the course of the adjustments, but they also showed that the
Grand Canyon data was adjusted as well, so that, instead of the 1930s being
warmer than the present as in the raw data, the 2000s were warmer than the
1930s, with a sharp increase in the 2000s.


Figure 6. Comparison of Tucson and Grand Canyon Versions
Now some portion of the post-2000 jump in adjusted Grand Canyon values
shown here is due to Hansen’s Y2K error, but it only accounts for a 0.5 deg
C jump after 2000 and does not explain why Grand Canyon values should have
been adjusted so much. In this case, the adjustments are primarily at the
USHCN stage. The USHCN station history adjustments appear particularly
troublesome to me, not just here but at other sites (e.g. Orland CA). They
end up making material changes to sites identified as “good” sites and my
impression is that the USHCN adjustment procedures may be adjusting some of
the very “best” sites (in terms of appearance and reported history) to
better fit histories from sites that are clearly non-compliant with WMO
standards (e.g. Marysville, Tucson). There are some real and interesting
statistical issues with the USHCN station history adjustment procedure and
it is ridiculous that the source code for these adjustments (and the
subsequent GISS adjustments - see bottom panel) is not available.
Closing the circle: my original interest in GISS adjustment procedures
was not an abstract interest, but a specific interest in whether GISS
adjustment procedures were equal to the challenge of “fixing” bad data. If
one views the above assessment as a type of limited software audit (limited
by lack of access to source code and operating manuals), one can say firmly
that the GISS software had not only failed to pick up and correct fictitious
steps of up to 1 deg C, but that GISS actually introduced this error in the
course of their programming.
According to any reasonable audit standards, one would conclude that the
GISS software had failed this particular test. While GISS can patch (and has
patched) the particular error that I reported to them, their patching hardly
proves the merit of the GISS (and USHCN) adjustment procedures. These need
to be carefully examined. This was a crying need prior to the identification
of the Hansen error and would have been a crying need even without the
Hansen error.
One practical effect of the error is that it surely becomes much harder
for GISS to continue the obstruction of detailed examination of their source
code and methodologies after the embarrassment of this particular incident.
GISS itself has no policy against placing source code online and, indeed, a
huge amount of code for their climate model is online. So it’s hard to
understand their present stubbornness.
The U.S. and the Rest of the World
Schmidt observed that the U.S. accounts for only 2% of the world’s land
surface and that the correction of this error in the U.S. has “minimal
impact on the world data”, which he illustrated by comparing the U.S. index
to the global index. I’ve re-plotted this from original data on a common
scale. Even without the recent changes, the U.S. history contrasts with the
global history: the U.S. history has a rather minimal trend if any since the
1930s, while the ROW has a very pronounced trend since the 1930s.


Re-plotted from GISS Fig A and GFig D data.
These differences are attributed to “regional” differences and it is
quite possible that this is a complete explanation. However, this conclusion
is complicated by a number of important methodological differences between
the U.S. and the ROW. In the U.S., despite the criticisms being rendered at
surfacestations.org, there are many rural stations that have been in
existence over a relatively long period of time; while one may cavil at how
NOAA and/or GISS have carried out adjustments, they have collected metadata
for many stations and made a concerted effort to adjust for such metadata.
On the other hand, many of the stations in China, Indonesia, Brazil and
elsewhere are in urban areas (such as Shanghai or Beijing). In some of the
major indexes (CRU, NOAA), there appears to be no attempt whatever to adjust
for urbanization. GISS does report an effort to adjust for urbanization in
some cases, but their ability to do so depends on the existence of nearby
rural stations, which are not always available. Thus, there is a real
concern that the need for urban adjustment is most severe in the very areas
where adjustments are either not made or not accurately made.
In its consideration of possible urbanization and/or microsite effects,
IPCC has taken the position that urban effects are negligible, relying on a
very few studies (Jones et al 1990, Peterson et al 2003, Parker 2005, 2006),
each of which has been discussed at length at this site. In my opinion, none
of these studies can be relied on for concluding that urbanization impacts
have been avoided in the ROW sites contributing to the overall history.
One more story to conclude. Non-compliant surface stations were reported
in the formal academic literature by Pielke and Davey (2005) who described a
number of non-compliant sites in eastern Colorado. In NOAA’s official
response to this criticism, Vose et al (2005) said in effect -
it doesn’t matter. It’s only eastern Colorado. You
haven’t proved that there are problems anywhere else in the United
States.
In most businesses, the identification of glaring problems, even in a
restricted region like eastern Colorado, would prompt an immediate
evaluation to ensure that problems did not actually exist. However, that
does not appear to have taken place and matters rested until Anthony Watts
and the volunteers at surfacestations.org launched a concerted effort to
evaluate stations in other parts of the country and determined that the
problems were not only just as bad as in eastern Colorado, but in some cases
were much worse.
Now in response to problems with both station quality and adjustment
software, Schmidt and Hansen say in effect, as NOAA did before them -
it doesn’t matter. It’s only the United States.
You haven’t proved that there are problems anywhere else in the world.