Thursday, March 18, 2010

QAing AMSUExtract

In A NutShell
So, I've been QAing AMSUExtract the last couple days. The end results is that several bugs were found and fixed in my code. But even with these fixes, a lot of data was still failing limb validation. Even after increasing the limb validation tolerance from the original 0.5 to 10.0 (giving over 100 degrees tolerance in some cases), there were still more failures than I would consider acceptable.

This told me that having a limb validation in AMSUExtract was pointless. To get a decent number of records to pass this QA check, I'd need to make the check so loose that nearly any imaginable value would pass.

So limb validation, the -q switch of AMSUExtract, has been taken out. It's been replaced with a -FLOOR switch that lets you set a lower bound for a valid temperature. The default lower bound when you use -FLOOR is 150 degrees K (-123.15° C, -189.67° F), but you can pass any numeric value you want. Scans that are marked as bad by NASA or that fail the optional -FLOOR check are the only ones that will now fail in AMSUExtract.

Limb vaidation is still available as an option for the AMSUQA program.

I want to thank Malaga View for suggesting the feature to save not only scans that pass QA, but also scans that fail QA. This feature is now part of AMSUExtract and was very useful for this QA check. BTW, should anyone else have ideas for features, please feel free to suggest them. If I have the time, I'll add them.

The end result of all this is AMSUExtract is now working and has new features to boot. This will be released in Update 4. Now I move on to QAing AMSUSummary.



Final Results With -FLOOR switch. 
The file is clean like this all the way through.

The Painful Details

The part of this post can be skipped. It's the details on how the QA was carried out. It's here for people who just love to read about QA procedures on blogs or who may need to know the details for something they're doing on their own.

Setup For QA
I setup the QA runs by removing all but one folder for the directory I was working in. That folder contained .hdf files for a single month. That folder, in turn, contained a text folder that contained the text extract files created by ncdump from the .hdf files.

As always, ncdump was run with no command line arguments, just the file name of the file being converted.

You can download ncdump from here.


Running Release 3 Version Of AMSUExtract
The amsu_extract script was used to extract data from all the files for the month. The exact command line was:

amsu_extract -extension C5F1-30_QA_TEST -c 5 -f 1-30 -q

This extracts channel 5, footprints 1 through 30 and performs QA checks on them. The -extension argument tells the script how to name the output file. In this case the output extract file will be named 200912_extract_C5F1-30_QA_TEST.csv, where 200912 is the name of the folder storing the monthly data. This file is created in the same directory from which the script is run.

The version of AMSUExtract called by the amsu_extract script was the same as the one in Update 3 with one exception: scans that fail QA are sent to stderr. The amsu_extract script, in turn, saves stderr to a separate file.


Results Of Running Release 3 Version Of AMSUExtract
Step 1: Check Values Being Extracted
The first thing to do is verify that AMSUExtract is pulling the correct data. To do this, use HDFView and open up the original .hdf files downloaded from NASA. You can download HDFView from here.

Use the File/Open command to open the .hdf files you want to look at. Use the tree control on the left to navigate to brightness_temp. Double click on brightness_temp. At the to of the grid window that opens up, use the page scroller to move to page 4. Page 4 is the data for channel 5, the numbering starts at zero. 

You'll notice the rows in the window are numbered 0-44. These are the scan lines. The columns are numbered 0-29. These are the footprints. 

Open up the extract file (200912_extract_C5F1-30_QA_TEST.csv in my case) in a text editor. Visually compare the values in the extract file to the values in HDFView. You don't have to check every value, but you want to check enough cases to convince yourself the correct values are being pulled.

Step 2: Compare File Sizes Of Passed And Failed Data Files
Checking the file sizes of the passed and failed data shows that the failed file is about 2/3s the size of the passed file. Now, a fair portion of that is header information, but most of it is actually failed data. This means a lot of data is failing QA.

Step 3: Look At Failed Data:
Opening up the failed data file, I saw a lot of data that looked reasonable to me. So I was failing data that should have passed. 

To fix this problem, I tracked down a couple of bugs in the QA checks, expanded the number of footprints that don't get a limb validation check, and increased the tolerance of the limb validation from 0.5 to 10.0.

Even with these changes, a lot of data was still failing. I'll talk about that in the next section.

Results Of Running Modified Version Of AMSUExtract
To try to cut down on the number of scans failing, I gradually increased the tolerance. Eventually, I had it all the way up to 10.0 and still had more records failing than I expected. Screen shots of running with a 10.0 tolerance are below.

The error file looks a lot better, but further down the file there are still quite a few scans listed as failing.
At this point I realized it was hopeless to try and salvage the concept of running a limb validation in the extract. So I took it out and replaced it with the -FLOOR option.



References:
Download ncdump
Download HDFView

No comments:

Post a Comment