Author: David Hulslander
I was on vacation last week, visiting family and enjoying a change of scenery. When I left on the trip, it was summer here in the Northern Hemisphere. By the time I returned on Monday, that had all changed. Basically, if you’re not ready for fall now, you’re already very far behind. Classes at CU Boulder start in 18 days, and the Parade of U-Hauls has begun as students move back and leases expire. Another sign of the looming end of summer: the deadline for submitting an abstract to the 2014 AGU Fall Meeting (in San Francisco 15-19 December) is the end of today. It was extended 24 hours after overwhelming demand crashed the submission system. So, it’s not just summer that’s speeding along. In spite of meager funding and support in the US, progress in the earth sciences is rapid, too. As of this morning, AGU had already received 19,500 abstracts. We’ll see if there’s a new record when the deadline passes tonight.
Time has an extra bit of importance compared to other variables because it has a habit of slipping away so quickly, and you can never get it back. (That was a week of vacation? Felt like two days, three tops…) It has extra significance in geospatial work because how things work and change over time is where the interesting and important results are found. Sea ice extent or current climate conditions are interesting, but how they’re changing and why is the critical part. For #AGU14 I’m a co-author with my colleague Robert Schafer on an abstract highlighting a new set of time series tools for geospatial data that will be in our new ENVI release this fall. My first-author submission, “Using Advanced Remote Sensing Data Fusion Techniques for Studying Earth Surface Processes and Hazards: A Landslide Detection Case Study”, presents a new algorithm I’ve developed that is showing promising results for landslide, hazard, and change detection on multi-temporal and multi-modal datasets. Vacations and summer might slip by quickly, and there’s not much we can do to stop that. But what we can now do with geospatial data over time will at least let us make the most of it!
There is at least one instance when time moves slowly. I’ll have to wait until October to find out if I’ll be giving a talk or a poster at the Fall Meeting. Have you submitted an abstract? I hope to see you there! Until then, here’s a sneak preview of that algorithm’s results, run on some Landsat 8 data over Sicily.
Categories: ENVI Blog | Imagery Speaks
Tags: Remote Sensing, AGU, Landsat, data analysis, conference
In geospatial work we’re trying to answer questions about where things are on the earth and how they work. Exact scales and applications vary, and there are limits to how many measurements we can take and how much data we can get. As a result, a lot of our work becomes gathering as much information as we can and then trying to get all that different data to work together, hopefully resulting in a clear picture that answers our question. Data transforms are an excellent set of tools for making lots of data help us.
Too often, information on tools and analyses is aimed at the wrong audience, assuming the user wants to be an expert and derive the algorithm from first principles. It is important that the underpinnings and mathematical derivations of an analysis be open and available to anyone who needs to see them. Often, however, what is needed is a clear description of how to use the tools sensibly. You don’t have to know how to make wine to enjoy a glass with dinner.
There is a lot of detailed information on transforms available; this post is a summary of the important parts of, and differences between, the most common data transforms.
Principal Components Analysis (PCA) has been around since the early 20th century. PCA assumes we have some measurements of the points we’re interested in; in image analysis, that means some number of spectral band brightnesses for each of the pixels in our image. With no prior knowledge of an answer, the smart bet is “average,” and PCA assumes this: the bands’ histograms should be classic bell/Gaussian/normal-distribution-ish curves. Here are the histograms for Landsat 5 multispectral bands over a part of coastal Alabama:
Taking it a step further and plotting the brightness of each pixel in two bands creates a scatterplot:
PCA looks at that scatterplot and asks, “Why do we need two bands to describe each pixel when one number could capture most of the information?” So a new axis is drawn through the average and along the longest axis of the cloud of data points, and every pixel is scored by its position (shortest distance) along that new axis. That’s the First Principal Component. A second axis is drawn perpendicular to the first, also through the average, to capture the remaining information. Roughly, it looks like this:
You always end up with as many Principal Components as the number of bands you started with. While we can’t draw in 4 (or more) dimensions, creating those axes works the same way. In the case of Landsat TM data, we get 6 Principal Components.
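That axis-drawing-and-scoring procedure can be sketched in a few lines of numpy. This is a toy illustration on synthetic two-band data (the band values and array names are made up for the example; this is not ENVI’s implementation):

```python
import numpy as np

# Toy "image": 1000 pixels with brightness in two correlated bands,
# standing in for the two-band scatterplot described above.
rng = np.random.default_rng(42)
band1 = rng.normal(100, 20, 1000)
band2 = 0.8 * band1 + rng.normal(0, 5, 1000)
pixels = np.column_stack([band1, band2])       # shape (1000, 2)

# 1. Center the data on the average pixel.
centered = pixels - pixels.mean(axis=0)

# 2. Eigendecompose the covariance matrix; the eigenvectors are the new axes.
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)         # returned in ascending order

# 3. Sort components by eigenvalue (most variance first) and score each pixel.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
pc_scores = centered @ eigvecs                 # pixel scores on the new axes

# The first component carries most of the variance (information).
explained = eigvals / eigvals.sum()
```

Because the two toy bands are highly correlated, almost all of the variance lands in the first component, which is exactly why one PCA band can stand in for two original bands.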
There are several very good reasons why you would go to all this effort. First, because PCA packs as much independent information as possible into the components, the first ones have the most information. This means you can make an RGB display of the first three PCA bands and have an image containing the maximum amount of information you can put on the screen at one time. In the case of our Alabama Landsat scene, we go from a scene that has a lot of information but can be hard to interpret:
To a PCA composite that maximizes the amount of information and the visual separation of what’s going on in the image. Here’s what we get when we put the first three PCA bands into an RGB composite:
The image content shows up much more distinctly because PCA is packing as much signal as possible into those three bands. You can see this with the eigenvalue plot that gets generated when you run PCA:
The short story on the plot is that high eigenvalues (y-axis) mean lots of information in the corresponding PCA band (x-axis). Here we’re really not getting much after about the third component. This brings up a second benefit of PCA: “reducing data dimensionality.” We can get almost all of the information from a 6-band image in just 3 well-crafted PCA bands. This cuts down data processing, especially with hyperspectral data, where it can take you from hundreds of bands to tens of bands.
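Picking how many components to keep can even be automated from the eigenvalues. A small sketch, using illustrative eigenvalues rather than the actual values from the Alabama scene:

```python
import numpy as np

# Hypothetical eigenvalues like those from a 6-band PCA, largest first
# (illustrative numbers only, not measured from the Landsat scene).
eigvals = np.array([850.0, 120.0, 45.0, 3.0, 1.5, 0.5])

# Fraction of total variance (information) captured by each component...
explained = eigvals / eigvals.sum()
# ...and cumulatively by the first k components.
cumulative = np.cumsum(explained)

# Keep the fewest components that capture, say, 99% of the variance.
k = int(np.searchsorted(cumulative, 0.99) + 1)
```

With these numbers the first three components clear the 99% threshold, matching the “not much after the third component” reading of the eigenvalue plot.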
With most content in the first 3 bands, a third benefit of PCA appears: de-noising. Those later bands are mostly noise, or faint signal indistinguishable from noise. Note that I did not say they are only noise. They are worth a look. Some interesting sensor artifacts reside in the 5th and 6th PCA bands from our Landsat scene. There is some signal, but a grid pattern appears in the otherwise noisy-looking image, an artifact of the sensor and processing:
PCA helps us get as much information as possible from our data and make it as easy to view as possible. With more advanced work, we could use it for noise filtering or diagnosing sensor problems. But we can build on PCA, which brings us to our second data transform.
MNF, which stands for Minimum Noise Fraction or Maximum Noise Fraction depending on the publication, is two PCA transforms in a row. One is based on the data statistics, just like PCA, but the other is based on noise statistics. Using the same idea of drawing new component axes to capture as much signal as possible, but doing it with an eye toward the noise information, MNF does a better job of pushing signal into the first components and noise into the later ones. It is more work, but worth it for the same reasons PCA is a good idea. Here are the first 3 MNF components in an RGB composite:
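A rough numpy sketch of the two-transform idea, on synthetic data. The shift-difference noise estimate and all the numbers here are illustrative assumptions, not ENVI’s MNF implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_bands = 2000, 6

# Toy flattened image: a spatially smooth signal plus band-dependent noise.
profile = np.sin(np.linspace(0, 4 * np.pi, n_pixels))    # smooth "scene"
loadings = np.array([1.0, 0.5, -0.8, 0.3, 0.9, -0.4])    # per-band signal strength
noise_sd = np.array([0.1, 0.2, 0.5, 1.0, 0.3, 0.15])     # per-band noise level
data = profile[:, None] * loadings + rng.normal(0, 1, (n_pixels, n_bands)) * noise_sd

# First transform: estimate the noise covariance from differences of
# neighboring pixels (the smooth signal mostly cancels), then whiten the noise.
noise = np.diff(data, axis=0) / np.sqrt(2)
nvals, nvecs = np.linalg.eigh(np.cov(noise, rowvar=False))
whitener = nvecs @ np.diag(1.0 / np.sqrt(nvals)) @ nvecs.T
whitened = (data - data.mean(axis=0)) @ whitener

# Second transform: ordinary PCA on the noise-whitened data.
wvals, wvecs = np.linalg.eigh(np.cov(whitened, rowvar=False))
order = np.argsort(wvals)[::-1]
mnf = whitened @ wvecs[:, order]    # first components carry the highest SNR
```

Because the PCA step runs in noise-whitened coordinates, components are ordered by signal-to-noise ratio rather than raw variance, which is why a noisy-but-bright band can’t crowd out a clean one.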
MNF improves on PCA by doing two transforms and including information about noise. Independent Components Analysis (ICA) improves on it by questioning the assumption of normally distributed data, all the way back in our first graph. We can see those curves aren’t ideal normal distributions; perfect bell curves don’t usually happen. ICA accounts for that messiness, or clumping, in the data by looking at more advanced statistics than just the variance when it draws new axes. The results are great for separating signal and noise. Same scene, first three ICA components:
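The core idea, that you can find hidden sources by rotating whitened data until the components look as non-Gaussian (clumpy) as possible, can be sketched on a toy two-source example. The brute-force angle search here stands in for a real ICA algorithm such as FastICA:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two non-Gaussian sources (uniform histograms, not bell curves), mixed together.
sources = rng.uniform(-1, 1, (5000, 2))
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
mixed = sources @ mixing.T

# Whitening (a PCA-style step) handles the variance part...
centered = mixed - mixed.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(centered, rowvar=False))
white = centered @ vecs @ np.diag(1.0 / np.sqrt(vals))

def excess_kurtosis(x):
    """How far a distribution is from a bell curve (0 = Gaussian)."""
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

# ...then ICA's extra step: rotate the whitened axes to maximize
# non-Gaussianity (a brute-force angle search is workable in 2-D).
angles = np.linspace(0, np.pi / 2, 500)
best = max(angles, key=lambda a: abs(excess_kurtosis(
    white[:, 0] * np.cos(a) + white[:, 1] * np.sin(a))))
ic1 = white[:, 0] * np.cos(best) + white[:, 1] * np.sin(best)
```

The recovered component lines up with one of the original uniform sources, something PCA alone cannot guarantee, because mixtures of independent sources look more Gaussian than the sources themselves.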
Capturing and including some of that more subtle signal can make the image harder to interpret than the distinct colors of our MNF results, but it is often an improvement for further processing.
The next time you’re trying to pull information out of an image, give transforms a try. You can get more information on screen, clean up noise, reduce data volumes, and maximize results in further processing. Best of all, you don’t have to be an expert in math and stats to use transforms!
Tags: Image Processing, Landsat, data processing, PCA, data transforms
2014 is a big year for Earth observation satellite launches. It’d be hard to pick a favorite from the many missions, but perhaps the most unusual one is due to fly this Thursday. Sentinel-1 is an imaging Synthetic Aperture Radar (SAR) mission. Though SAR offers a lot to the earth science community, as well as commercial and defense users, there are relatively few sensors compared to the choices for optical imagery.
It’s not too surprising. SAR is more difficult to work with, from a user’s perspective, than optical imagery. Optical sensors can rely on the sun for their illumination source, but SAR sensors have to provide their own. This means SAR sensors are heavier and consume much more power, making them more difficult to launch, shortening the spacecraft’s life, and increasing the difficulty of the aerospace engineering part of the mission. These factors drive up costs, making SAR data harder to find and more expensive than traditional imagery.
That is, until now. The European Space Agency (ESA) has an ambitious environmental monitoring earth science program, Copernicus, which will eventually include 6 major earth observing satellites. Day and night, over land, ice, and ocean, Sentinel-1 will provide high-quality SAR imagery data. The top feature, however, is that ESA will be making the data freely available to all.
This has never been done before. We’ve had successful C-band SAR missions before (ERS-1, ERS-2, ENVISAT, and Radarsat). And we’ve had wildly successful open earth imaging data missions (Landsat, among others). But there hasn’t ever been free-to-all SAR data. The explosion of research and discoveries when the Landsat archives were made available has shown us that open access to earth science data is a clearly superior model. I can hardly wait to see what progress and discoveries are made when Sentinel data come online!
Sentinel-1 will launch Thursday. If you haven’t pre-registered for data access, do it for free here. Another highly anticipated SAR mission, PALSAR-2 / ALOS-2, will launch May 24th. I plan on using the data to extend some of my natural hazards mapping and research. What’s on your radar? How will you be making this new level of SAR data work for you?
Tags: SAR data, European Space Agency, SAR
For some reason, water has been a big theme for me lately. Two weeks ago, it was a snow storm causing problems for a class I was teaching in Virginia and delaying my flight home to Colorado. Then I got a bunch of follow-up work for the arctic coastal erosion and bathymetric projects I’ve been working on. The Winter Olympics started and brought the usual concerns about snow quantities. Now, I’m back in the DC area again this week to teach another class, and sure enough another snow storm is forecast to make a mess of roads and air traffic.
Precipitation, like most of nature, has a habit of following its own rules and systems which are at best loosely coupled to what we’d like to see. We get too much in some places, and not enough in others. But one project I get to work on in a small way promises to help us work with the water we have a lot more effectively. The first step in understanding an earth system is getting a decent map of it, and that’s not particularly easy. There have been some great earlier missions to develop and test the technology, like TRMM. The new missions, SMAP and GPM, however, will give us frequent global maps of where precipitation is falling, and where that water goes when it hits the ground. My little contribution is to make sure we can help get that data on screen in the ways scientists and end users want. When I get some more of the code finished, I’ll post it as a blog on making use of global data systems through HDF5 and map routines in IDL. But for now, here’s a sneak peek of where I’m at:
There aren’t many geospatial fields that don’t have a heavy dependency on precipitation and water. How will you use the new data from precipitation and soil moisture missions?
Tags: GIS, Environmental Science, geospatial, weather forecasting, data analysis, Climate observation, precipitation
Online access will never be able to replace traditional education and collaboration. But traditional classrooms and labs desperately need the versatility and value online tools can offer. Like so many other things, it's not either-or. It's about finding the right combination.
There are a lot of things I like about living in Boulder, like the mountains, the weather, and so on. But a lot of its great features, like being a real center of action for science, are because it is a university town, with CU Boulder right in the center of things. It’s the first week of classes for spring semester here, so all of a sudden there are a lot more people in town and most of them are very excited to start a new year, new semester, and new classes. They’re probably also excited about all the fresh powder for skiing, too, but on the work side of things people are definitely ready to get going. There will even be a federal budget, for the first time in years, to help get science moving again.
While there’s no substitute for face-to-face, in-person interactions, it’s not always possible to have everyone in one place. Often a researcher or student in one location will come up with something that needs to be shared with another location or with the wider community, or even publicly. Sharing research, discoveries, data, tools, and education online can be an important part of getting the most out of science and making it available to as many people as possible.
Remote learning and networking isn’t new. Correspondence courses date back as far as 1840. Granted, it’s a lot faster with a decent laptop and connection than with envelopes and stamps, but it’s the same thing with a different delivery system. Cloud-based learning, enterprise training systems, and MOOCs are the latest incarnations and are great improvements over previous generations. They’re not a magic replacement for other means of research, learning, and sharing, and they do have their problems. Campuses and offices aren’t going to go away, technology hype and marketing notwithstanding. Online and enterprise applications for education and science are a fantastic additional venue and tool, however, and do offer some unique advantages.
Does anyone else remember having to buy an expensive textbook for one class in one semester, maybe not even in your major, and then getting only a small fraction back on re-sale, if you could sell it at all? Software was like that for a while, too, with students buying full commercial packages at full retail price. Students are an amazing, smart, diverse group, but the adjective “wealthy” applies to very, very few in academia. The retail, permanent-sale model doesn’t win much for vendors, does very little for academics, and prices some promising students out of the market before they can even start. That isn’t good for anyone.
The latest generation of tools for online collaboration in education and research can fill many of the previously too-expensive needs. Online access to a set of software analyses and some datasets can be provided to all the students in many classes. Students can buy one piece of tech, such as a laptop or tablet, and access information and resources as necessary per class per semester. Access to the tools can be set up with any combination of per class, department, campus, university, semester or year, and logins limited to students only, addressing the concerns of vendors about giving away value outside of the research or educational setting. Faculty and staff would only need to coordinate purchasing access to a site rather than administering a lab full of machines and software. And students wouldn’t be stuck with expensive resources that they used for four months and then never again.
It’s not a panacea. Advanced researchers, students, and professionals will still need their data on their machines with their tools, and the majority of the hard work of research, teaching, and learning will remain in classrooms on campuses. There’s no complete substitute for that, as we have seen again and again over the years. Anyone who’s suffered through 8 hours of an online PowerPoint for some corporate “training certification” will attest to that. But the sort of apps and online access available now can save a lot of money, create future business and customers for vendors, and make science and learning available to many more people than before.
I’m thinking of revisiting some of the courses and materials I’ve taught over the last 15 to 20 years, and packaging up some of the key parts as online apps. The first one I’d like to do is an exploration of different approaches to processing multispectral and hyperspectral data for land cover and material identification. What apps or lectures or exercises would you want to see? I’m hoping some of my friends and colleagues will give them a whirl in their classes and labs, especially here at CU Boulder. And then I’ll see what they think, when we meet over beer at one of our great local breweries. Because some things just have to be done in person.
Tags: GIS, Academic, EDU, Remote Sensing, Hyperspectral, Apps, multispectral