Number Watch Web Forum

This forum is about wrong numbers in science, politics and the media. It respects good science and good English.

Averaging -- BEST Data

A different thread on the same subject. In school, one of the ideas we had to explore was deciding the periodicity of data acquisition. If you sampled the data every second, that made for 3600 data points an hour. Before you knew it you were attempting to store more data than the storage systems of the time could handle. You had to place limits using reason and experimentation: set up the experiment, record for several days, and see whether important data would be lost if you lengthened the interval.

We still need to do this today. If you visit element14.com, sparkfun.com, jameco.com, or makershed.com you will discover that the amateur enthusiast can, for a small amount of money, instrument and data-log himself into terabytes of data in a very short time. The sample rate on an Arduino pin is in the millisecond range, with 16 pins on an Arduino Due (IIRC). For less than $300, you can be sampling room temperature, light levels, sound levels, barometric pressure, GPS position, and acceleration in three axes, and recording it all to a data file at millisecond intervals. We still need someone like our bending author to thwap us across the wrists when we gleefully start saving billions of points an hour just because we can....
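A minimal sketch of that school exercise, for those who want to try it: log a signal at one-second resolution, thin it out, and see what survives. The one-day "room temperature" signal and the intervals here are entirely made up for illustration.

    # Sketch of the sampling-interval experiment: record finely for a while,
    # then check what is lost at coarser intervals.
    import math
    import random

    random.seed(1)

    # One day of 1-second samples: a slow daily cycle plus a little noise.
    fine = [20 + 5 * math.sin(2 * math.pi * t / 86400) + random.gauss(0, 0.1)
            for t in range(86400)]

    def downsample(samples, step):
        """Keep every `step`-th sample, as if the logger ran more slowly."""
        return samples[::step]

    for step, label in [(1, "1 s"), (60, "1 min"), (3600, "1 hr")]:
        coarse = downsample(fine, step)
        print(f"{label:>6}: {len(coarse):6d} points, "
              f"min={min(coarse):.2f}, max={max(coarse):.2f}")

If the minima and maxima barely move as the interval lengthens, the coarser rate is probably good enough for that signal.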

The flip side to this, though, is that I can save one data point per day for 40,000 locations without even a hiccup. It doesn't take excessive resources any more to analyse trillion-record datasets. You can get that level of resource for less than $150 a month.
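For a sense of scale, here is the back-of-the-envelope arithmetic for that daily case; the 40-byte row size is an assumption, not a measurement.

    # Row counts for "40,000 locations, one point per day".
    stations = 40_000
    rows_per_year = stations * 365
    rows_per_century = rows_per_year * 100

    bytes_per_row = 40   # assumed: station id, date, value, quality flag
    size_gb = rows_per_century * bytes_per_row / 1e9

    print(f"{rows_per_year:,} rows/year, {rows_per_century:,} rows/century")
    print(f"~{size_gb:.0f} GB for a century of daily records")

About 14.6 million rows a year, and well under 100 GB for a century of it.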

Each recording of a thermometer is a discrete indication of the conditions at the location of the instrument. There is an error associated with that measurement, but at that time, that instrument recorded that value. Store it. Database it. Move on. The data engine can handle the amalgamation into an interim table if you need it. The primary data should still be there, inviolate, in its unadulterated form. I should be able to access the daily historical record of any station. We have the tools available to us to make that happen.
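A minimal sketch of that arrangement, using SQLite for illustration: the raw readings go into one table exactly as measured, and the daily summary is just a view derived from it. Table and column names are invented, not taken from any real station database.

    import sqlite3

    con = sqlite3.connect("readings.db")
    con.execute("""CREATE TABLE IF NOT EXISTS raw_reading (
                       station_id  TEXT NOT NULL,
                       observed_at TEXT NOT NULL,   -- ISO-8601 timestamp
                       temp_c      REAL NOT NULL    -- what the instrument said
                   )""")

    # The raw record, exactly as measured -- good or bad, it stays.
    con.execute("INSERT INTO raw_reading VALUES (?, ?, ?)",
                ("STN042", "2013-02-01T09:00:00", 3.7))

    # An interim daily summary, derived from (never replacing) the raw table.
    con.execute("""CREATE VIEW IF NOT EXISTS daily_mean AS
                   SELECT station_id,
                          date(observed_at) AS day,
                          AVG(temp_c)       AS mean_temp_c,
                          COUNT(*)          AS n_readings
                   FROM raw_reading
                   GROUP BY station_id, date(observed_at)""")
    con.commit()

    for row in con.execute("SELECT * FROM daily_mean"):
        print(row)

Drop the view and nothing is lost; drop the raw table and everything is.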

Part of the reason we plot averages is that attempting to plot a trillion data points by hand, or with Excel, is not really feasible. The data is the data, though. The measured point is what was measured, good or bad. It should still be there in the set.
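In the same spirit, a sketch of averaging only for the plot, assuming pandas is available; the hourly series is synthetic, and the raw series is never overwritten.

    import numpy as np
    import pandas as pd

    # One synthetic year of hourly "temperatures": seasonal cycle plus noise.
    hours = 24 * 365
    idx = pd.date_range("2013-01-01", periods=hours, freq="h")
    temps = (10 + 8 * np.sin(2 * np.pi * np.arange(hours) / hours)
             + np.random.normal(0, 1, hours))
    raw = pd.Series(temps, index=idx)

    # Average only for the plot/report; `raw` itself stays untouched.
    daily = raw.resample("D").mean()
    print(len(raw), "raw points ->", len(daily), "daily means")

The daily means are a convenience for the eye; anyone who wants to can still go back and plot the 8,760 measured points.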

The computer is here so that we don't have to be as finicky. We can save more data. We can process the data in more ways. There should always be a path back to the raw data. Staples (the office supply store) had a bin filled with 4GB thumb drives for $4.99. Disk space for daily data, even hourly data, is no longer an issue. Plotting the data discretely is also possible.
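To put the thumb-drive point in numbers (the 50-byte row size is an assumed CSV-ish figure, not a measurement):

    # Disk-space sanity check for hourly readings.
    bytes_per_row = 50            # assumed: "station,timestamp,value\n"
    rows_per_year = 24 * 365      # hourly readings for one station

    one_station_mb = rows_per_year * bytes_per_row / 1e6
    all_stations_gb = 40_000 * rows_per_year * bytes_per_row / 1e9

    print(f"one station, hourly, one year: ~{one_station_mb:.2f} MB")
    print(f"40,000 stations, hourly, one year: ~{all_stations_gb:.1f} GB")

One station's hourly record for a year fits in under half a megabyte; even the whole 40,000-station network, hourly, is only a handful of cheap drives per year.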

There are reasons to look at averages. There is also a reason to have the raw data staring you in the face to keep you grounded.