Showing posts with label irreproducible results. Show all posts

Thursday, April 29, 2010

Three-Valued Logic And Irreproducible Results In Science, Part II

Introduction
In the previous post in this series we discussed how Two-Valued and Three-Valued Logic work. In this post we take a look at how Three-Valued Logic (3VL) interacts with the results of science. We'll be using the properties of 3VL's three truth values, True, False, and Unknown, to do this.

In particular we'll be examining how the truth value Unknown of 3VL spreads through science like a virus. In fact, we'll make the claim that scientific results resting on unverifiable claims, meaning claims that cannot be independently reproduced, are viruses. Any further work that builds upon them is infected and becomes unverifiable and irreproducible itself.

Scientific Claims Must Be Verified
In science, it's not enough to simply make a claim. Claims must be independently verifiable before they are accepted.

When the procedures of the claim are followed and the results of the claim are reproduced, the claim is said to be True. When the procedures of the claim are followed and the results of the claim are not reproduced, the claim is said to be False. An example of a scientific claim being shown False is the original Cold Fusion hypothesis by Martin Fleischmann and Stanley Pons.

However, in situations where the procedures of the claim cannot be followed, we cannot show the claim to be True or False. It is in these cases that the need for 3VL arises, and the truth of the claim is Unknown.
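The three outcomes described above can be sketched as a small function. This is only an illustration: the function name and its parameters are made up for this example, and Python's None is used to stand in for Unknown.

```python
from typing import Optional


def classify_claim(procedure_followed: bool, results_reproduced: bool) -> Optional[bool]:
    """Return True, False, or None (None stands in for Unknown)."""
    if not procedure_followed:
        # Nobody could follow the procedure, so the claim can be
        # neither confirmed nor refuted.
        return None
    # The procedure was followed: the claim stands or falls on the results.
    return results_reproduced


print(classify_claim(True, True))    # True
print(classify_claim(True, False))   # False
print(classify_claim(False, False))  # None
```

Note that when the procedure cannot be followed, the second argument is ignored entirely. That is the whole point: there are no results to compare.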

Technical Aside: Cold fusion research has continued since the days of Fleischmann and Pons. Today it is funded by several governments around the world and many researchers in the field believe that the original claims of Fleischmann and Pons have been vindicated. However, several mainstream scientific organizations still disagree, most notably the U.S. Department of Energy and Science magazine. For more information on the ongoing cold fusion research, see the LENR-CANR website.

Example: Aqua Satellite Channel 4 Virus
It helps to have an example, so we'll be using channel 4 of the AMSU on the Aqua satellite. Channel 4 failed completely around December, 2007. In response to this, NASA created a new algorithm and has used it to synthetically create channel 4 data from October 1st, 2007 onward.

While NASA publishes the algorithm used to create synthetic channel 4 values, that algorithm requires certain data that is not available to anyone outside of NASA. Even the folks at NASA's JPL, who are in charge of the Aqua satellite, have said they don't have access to the data.

Without this data it's impossible to verify if the algorithm for synthesizing channel 4 data is correct, even though the algorithm itself is published. Similarly, we cannot demonstrate that the algorithm fails to correctly synthesize channel 4 data. Therefore, the ability of the algorithm to correctly synthesize data must be classified as Unknown because the statement that the algorithm is accurate cannot independently be shown to be True or False.

How The Virus Spreads
To qualify as a virus, the Unknown values must be capable of spreading to other works. To see how this occurs, let's first take a look at how False research is capable of spreading.

We take the example of research attempting to build upon claims that have been demonstrated False, in this case Cold Fusion. The diagram above shows new research that is correct being combined with the results of Cold Fusion. Because Cold Fusion has been shown to be False, the overall conclusions of the research must be False because they require Cold Fusion.

Logically, this situation is captured by a simple predicate: True AND False = False.

The same situation occurs in 3VL when using Unknown, rather than False, values. Because the creation of synthetic data for Aqua's AMSU Channel 4 cannot be shown to be True or False, it is Unknown. Any research combined with it, no matter how good it is, produces a final result that is also Unknown.

This too is captured by a simple 3VL predicate: True AND Unknown = Unknown.
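Both predicates can be checked with a few lines of code. The sketch below assumes the 3VL convention used in this series, represents Unknown with Python's None, and uses a made-up helper name, combine, that ANDs together every claim a result depends on.

```python
from typing import Iterable, Optional

Truth = Optional[bool]  # None stands in for Unknown


def combine(claims: Iterable[Truth]) -> Truth:
    """3VL AND over all the claims a piece of research depends on."""
    claims = list(claims)
    if any(c is False for c in claims):
        return False   # one refuted claim sinks the whole result
    if any(c is None for c in claims):
        return None    # nothing refuted, but something is unverifiable
    return True        # every claim has been verified


print(combine([True, False]))  # False
print(combine([True, None]))   # None
print(combine([True, True]))   # True
```

However many True claims are added to the list, a single Unknown keeps the combined result from ever becoming True.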

A concrete example of Unknown results spreading through published research is provided by NASA's claims of increased yield due to synthetic channel 4 data. We'll assume that these claims are True and that yields are in fact increasing. However, even with this assumption, we cannot demonstrate that yields should be increasing. Because it cannot be verified that the synthetic channel 4 data is valid, we cannot determine whether the synthetic data causes bad data to pass QA or good data to fail QA. The quality of the data in these increased yields is Unknown because the quality of the synthetic data is Unknown.

This cascading of the Unknown value continues through anything that uses the data from these increased yields. In practice, it turns out that all processes referred to by NASA as "Level 2" or higher that use Aqua AMSU data will be infected by the Unknown values. That is, all such data sets have an Unknown truth value themselves due to their dependence upon the increased yield data. These "Level 2" products include:

● Temperature profile from 3 mbar (45 km) to the surface.
● Water vapor profiles.
● Snow and ice coverage.
● Cloud liquid water.
● Cloud-cleared IR radiances.
● Rain Rate.
● Ozone.
● Carbon Dioxide Support Products.
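The cascade can be sketched as a walk over a dependency chain. Everything in this example is illustrative: the product names, the dependency table, and the assumption that each downstream step is sound in itself are mine, not a description of NASA's actual processing pipeline. None stands in for Unknown.

```python
from typing import Dict, List, Optional

Truth = Optional[bool]  # None stands in for Unknown


def and3(a: Truth, b: Truth) -> Truth:
    """3VL AND: False dominates, then Unknown, otherwise True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


# Validity of each step taken on its own (illustrative values).
OWN_VALIDITY: Dict[str, Truth] = {
    "synthetic_channel_4": None,   # cannot be verified, hence Unknown
    "increased_yield": True,       # assumed True, as in the text
    "temperature_profile": True,   # assumed sound in itself
}

# What each product is built from (illustrative chain).
DEPENDS_ON: Dict[str, List[str]] = {
    "synthetic_channel_4": [],
    "increased_yield": ["synthetic_channel_4"],
    "temperature_profile": ["increased_yield"],
}


def status(name: str) -> Truth:
    """A product's own validity ANDed with everything upstream of it."""
    result = OWN_VALIDITY[name]
    for upstream in DEPENDS_ON[name]:
        result = and3(result, status(upstream))
    return result


for product_name in DEPENDS_ON:
    print(product_name, "->", status(product_name))
```

Every product in the chain comes out as None, even though two of the three steps were assumed to be True on their own.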

Other Examples Of The Virus Spreading
GHCN Data
A while back Willis Eschenbach made the claim that GHCN data at Darwin station was being manipulated to show a warming where none existed in the raw data. Pro-AGW bloggers jumped on this claiming the adjustments were valid.

The problem is the Australian CSIRO Atmospheric Research Center provided no reason why the adjustments were made, stating only that Darwin is an urban site (which should make adjustments go down, not up).

This is an illustrative example of the problem with GHCN adjustments. Even though GHCN provides its raw data and describes its adjustment procedures, its adjustments cannot be replicated when reasons for the adjustments aren't given.

For this reason, the validity of GHCN data must be classified as Unknown. This Unknown value spreads to anything using GHCN data. This includes the Intergovernmental Panel on Climate Change 4th Assessment Report.

Deep Impact
Deep Impact was a NASA mission to study a comet by slamming a probe into the comet Tempel 1 and analyzing the impact. The Deep Impact team at NASA JPL released a photo of the comet with water photoshopped onto its surface (seen at left), a series of medium resolution images of the event, and a chart of the thermal emission spectra of the debris. The chart is shown below.


However, the chart is made of data that's been modified by NASA, and the raw data used to generate the Tempel 1 spectra has never been released. This makes it impossible to verify that the scanners actually produced the results shown in the chart. The resulting unverifiable claims are therefore Unknown.

Curing The Virus
As far-reaching as the consequences of the examples provided here are, we've covered only a small handful of examples. Many more could be provided. Their flow through related work could be tracked and we'd discover that a significant portion of modern science rests upon unverifiable claims.

I think most people wouldn't consider such science to be science at all, but rather a problem that stands in the way of science. Fortunately, it's a problem that's easy to fix.

Simply make the claims verifiable.

By making the raw data and computer code used to generate the claims publicly available, and by noting why changes are made to raw data, claims that are currently unverifiable can be demonstrated to be True or False.

And that is the whole reason science exists in the first place.

References
Three-Valued Logic And Irreproducible Results In Science, Part I
Cold Fusion claims by Martin Fleischmann and Stanley Pons - Wikipedia Entry
NASA Responds To FOIA Request
AIRS/AMSU/HSB Version 5 Modification of Algorithm to Account for Increased NeDT in AMSU Channel 4
AMSU - Wikipedia Entry
AIRS/Aqua Level 2 Carbon Dioxide Support Products
The Smoking Gun At Darwin Zero
Willis Eschenbach caught lying about temperature trends
Updating Australia’s high-quality annual temperature dataset
GHCN V.2 Raw Data
GHCN Quality Control, Homogeneity Testing, and Adjustment Procedures
GHCN-Monthly Version 2 Introduction
Deep Impact
LENR-CANR website

Saturday, April 24, 2010

Three-Valued Logic And Irreproducible Results In Science, Part I

Introduction
This is a short series of two posts that discusses Three-Valued Logic (3VL) and uses it to demonstrate how irreproducible results corrupt not only the scientific work in which they appear, but also spread that corruption to any related work.

In short, this series of posts demonstrates that irreproducible results are viral. My current work on the Aqua satellite is used as an illustrative example in the second of these posts.

In this post we look at Two-Valued Logic and Three-Valued Logic and briefly discuss how they work.

Two-Valued Logic
To start off, let's take a quick look at the more common Two-Valued Logic system. This system gets its name from the fact that there are only two possible values for any statement: True or False. Two-Valued Logic is also called Boolean logic.

With regard to this post, we're most concerned with how these logic values of True and False can move through a system of logical predicates. For example, if we have a True statement and a False statement and a logical predicate requiring at least one True statement, then that predicate transforms the one True statement and one False statement into a single statement that is True. This is because the requirement of at least one True statement has been met and the predicate is therefore True.

Such a predicate is commonly referred to as an OR statement. For an OR statement to be True, either statement 1 or statement 2 must be True. There are several types of predicates in Boolean logic. The most common are OR, AND, and NOT. One way to define predicates is using a truth table. A truth table shows all possible inputs to the predicate and what the output of the predicate is. The truth tables for OR, AND, and NOT in Boolean logic are shown below.

OR Truth Table

AND Truth Table

NOT Truth Table

In a nutshell, the OR predicate accepts two inputs and converts them to a single output. That output will be True if either of its inputs is True, or False if both inputs are False.

The AND predicate also accepts two inputs, but returns True only if both inputs are True. In all other cases it returns False.

The NOT predicate simply flips True inputs to False outputs and False inputs to True outputs.
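For readers who prefer to see the tables generated rather than drawn, here is a minimal sketch that builds them by case analysis using Python's built-in Boolean operators. The helper name binary_table is made up for this example.

```python
from itertools import product


def binary_table(predicate):
    """List (p, q, output) for every combination of two Boolean inputs."""
    return [(p, q, predicate(p, q)) for p, q in product([True, False], repeat=2)]


for name, predicate in [("OR", lambda p, q: p or q),
                        ("AND", lambda p, q: p and q)]:
    print(f"{name} Truth Table")
    for p, q, out in binary_table(predicate):
        print(f"  {p!s:5} {name:3} {q!s:5} = {out}")

print("NOT Truth Table")
for p in (True, False):
    print(f"  NOT {p!s:5} = {not p}")
```

Each binary table has four rows because two inputs with two possible values each give 2 x 2 combinations. The NOT table has only two rows.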

TECHNICAL ASIDE: Truth tables are one way to define logical predicates. They use "case analysis", a method of listing every possible case and the corresponding outcome. However, there are other, more elegant ways to define predicates. In his book Predicate Calculus And Programming Semantics, the late Edsger Dijkstra defined the AND predicate using what is known as The Golden Rule, shown below, with the equivalence operator (==) having the lowest binding and meaning "Is The Same As":

      The Golden Rule: p AND q == p == q == p OR q

This type of definition is considered superior to case analysis. Once the properties of the various predicates are captured in this manner, they can be used as building blocks to far more sophisticated theorems. Case analysis, on the other hand, can never demonstrate anything more than what is provided by the cases themselves.
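The Golden Rule can at least be sanity-checked by machine. The sketch below is my own check, not something taken from Dijkstra's book, and the function names are made up. It is itself a piece of case analysis, so it only confirms the rule for Boolean inputs rather than demonstrating the calculational style the book advocates. One caution: Python's chained == does not mean the same thing as the equivalence operator in the rule, so the equivalence is spelled out as a function and grouped explicitly.

```python
from itertools import product


def equiv(a: bool, b: bool) -> bool:
    """Logical equivalence: "Is The Same As"."""
    return a == b


def golden_rule_holds(p: bool, q: bool) -> bool:
    # p AND q == p == q == p OR q, grouped from the right.
    right_grouped = equiv(p and q, equiv(p, equiv(q, p or q)))
    # Equivalence is associative, so a different grouping must agree.
    pair_grouped = equiv(equiv(p and q, p), equiv(q, p or q))
    return right_grouped and pair_grouped


# Check all four combinations of Boolean inputs.
print(all(golden_rule_holds(p, q) for p, q in product([True, False], repeat=2)))
```

Because equivalence is associative, the rule can be read in several ways, for example as a definition of AND in terms of equivalence and OR.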

Three-Valued Logic
Three-Valued Logic (3VL) builds upon Boolean Logic. There are several conventions for 3VL. We'll be looking at a particular convention, one in which the third truth value is Unknown. So 3VL, as discussed here, has three truth values: True, False, and Unknown.

TECHNICAL ASIDE: Relational database products make heavy use of 3VL, and refer to the Unknown truth value as NULL. The late E.F. Codd invented relational databases while working for IBM. He introduced the convention of using the word NULL to represent Unknown, and assigned NULL the symbol of the Greek lowercase omega, shown in the image at the beginning of this post.


Like Boolean Logic, 3VL has predicates that can be defined using truth tables. The truth tables for 3VL AND, OR, and NOT predicates are shown below.

When the only values in use are True and False, 3VL gives the exact same answers as Boolean logic. However, when an Unknown truth value is used as an input, it can also show up as the output of a predicate. The result of the predicate "False OR Unknown" is Unknown. The result of the predicate "NOT Unknown" is always Unknown. The result of an AND predicate with an Unknown input is always Unknown or False, never True.
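These properties can be confirmed with a short sketch of the three predicates. It assumes the convention described in this post, uses Python's None to stand in for Unknown, and the names and3, or3, and not3 are made up for this example.

```python
from itertools import product
from typing import Optional

Truth = Optional[bool]  # None stands in for Unknown
VALUES = (True, False, None)


def not3(a: Truth) -> Truth:
    return None if a is None else (not a)


def and3(a: Truth, b: Truth) -> Truth:
    if a is False or b is False:
        return False          # one False input settles the matter
    if a is None or b is None:
        return None
    return True


def or3(a: Truth, b: Truth) -> Truth:
    if a is True or b is True:
        return True           # one True input settles the matter
    if a is None or b is None:
        return None
    return False


def show(v: Truth) -> str:
    return "Unknown" if v is None else str(v)


for name, predicate in (("AND", and3), ("OR", or3)):
    print(f"3VL {name} Truth Table")
    for a, b in product(VALUES, repeat=2):
        print(f"  {show(a):7} {name:3} {show(b):7} = {show(predicate(a, b))}")

print("3VL NOT Truth Table")
for a in VALUES:
    print(f"  NOT {show(a):7} = {show(not3(a))}")
```

Each binary table now has nine rows instead of four. Restricting the inputs to True and False gives back the Boolean tables unchanged.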

The Point Of All This
It is this last result that interests us here. The result of the predicate "X AND Unknown" can never be True, no matter what the value of X is.

This is our stopping point for now. The next post in this series will discuss the relationship between Three-Valued Logic and science.

References
Boolean Logic - Wikipedia Entry
Ternary Logic - Wikipedia Entry
SQL NULL - Wikipedia Entry
Predicate Calculus And Programming Semantics - Edsger Dijkstra
A Relational Model of Data for Large Shared Data Banks - Edgar F. Codd