# 365 Days of Climate Awareness 70 – Intro to Statistics

Statistics is the branch of mathematics that deals with data. As a fluid dynamics professor of mine put it years ago: when reality is too complicated, and we can’t track every individual event, we resort to statistics. In theory we could model the motion of every molecule in the ocean and atmosphere from the principles of energy and momentum, but there are far too many molecules for any computer to track them all. Whenever we can’t assess the full population of events, we take samples and apply statistics.

***

The science falls into two categories: descriptive statistics, which summarizes data, and inferential statistics, which draws conclusions from it. The first requirement for statistical methods to have value is that the data sample be as random as possible. Bias, a concern endemic to science, is any influence on a data set that comes from the sampling and processing methods themselves. It can appear at any stage of gathering information, from designing a poll and wording its questions, to the design and setup of instruments for use in the field, to the processing of the results; human error is very much a potential cause. Bias can never be eliminated, only reduced: as far as possible, the methods of gathering and processing data must not affect the data itself.
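The two categories can be illustrated with a small sketch. This is a hypothetical example, not from the original post: simulated temperature readings stand in for field data, a mean and standard deviation summarize the sample (descriptive), and a rough confidence interval estimates the unseen population mean (inferential).

```python
import random
import statistics

# Hypothetical data: 100 simulated temperature readings (deg C)
# scattered around a "true" value of 15.0.
random.seed(42)
readings = [15.0 + random.gauss(0, 2) for _ in range(100)]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(readings)
stdev = statistics.stdev(readings)

# Inferential statistics: estimate the population mean from the sample.
# A rough 95% confidence interval, assuming the sample is roughly normal.
margin = 1.96 * stdev / (len(readings) ** 0.5)
print(f"sample mean: {mean:.2f} C")
print(f"95% confidence interval: [{mean - margin:.2f}, {mean + margin:.2f}] C")
```

The descriptive numbers say only what this sample looks like; the confidence interval is the inferential step, a claim about the whole population we never measured.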

Part of controlling bias is removing confounding variables, so that only the quantity under study varies. When conducting a random poll, selecting the test population carefully is paramount, so the data are not skewed toward one demographic group. When testing a drug’s efficacy, a control group which did not take the drug is critical, to properly evaluate the drug’s effectiveness against no treatment at all. (And the “double blind” design, in which neither the patients nor the treating doctors know who received the drug and who the placebo, prevents the doctors’ quality of care, influenced by that knowledge, from becoming a confounding variable.) Bias also exists in physical measurements in the field, such as temperature, depth or humidity, so instruments are carefully calibrated to reduce their inaccuracy as far as possible.

The central limit theorem states that when the measurement methods are adequate, the averages of repeated samples tend to cluster around the true mean, with a symmetrical spread toward both greater and lesser values. The normal distribution, or “bell curve,” is the mathematical model of this behavior: an estimate of how frequently the average value, and the values around it, occur. When data conform to it, the normal distribution can serve as the basis for a number of statistical tests.
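The theorem can be seen in a quick simulation, a minimal sketch not drawn from the original post. Even when the underlying population is flat (uniform) rather than bell-shaped, the averages of repeated samples pile up symmetrically around the true mean, with a much tighter spread than the raw values.

```python
import random
import statistics

random.seed(0)

# Underlying population: uniform on [0, 1] -- not bell-shaped at all.
# Its true mean is 0.5, and its standard deviation is about 0.289.
def sample_mean(n):
    """Average of one sample of n draws from the uniform population."""
    return statistics.mean(random.random() for _ in range(n))

# Take 1,000 samples of 30 values each, and record each sample's average.
means = [sample_mean(30) for _ in range(1000)]

# The sample means cluster around 0.5 with a much smaller spread
# than the original uniform values -- the shape of a bell curve.
print(f"mean of sample means:  {statistics.mean(means):.3f}")
print(f"spread of sample means: {statistics.stdev(means):.3f}")
```

The spread of the averages shrinks roughly with the square root of the sample size, which is why larger samples give tighter estimates.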

Tomorrow: the bell curve.

(Image: the normal distribution)

Be well!

This post was previously published on Dailykos.com.
