This is difficult to put into non-mathematical language. I will do my best to keep it simple.
While R2 is an index of the entire regression, the t statistic is an index of the significance of each individual independent variable. For example, if we are tuning an electric meter to both cooling degree days and a user variable such as occupancy, the t statistic tells whether each variable is a significant predictor of electric consumption. In statistical terminology, t is the ratio of the regression coefficient for the variable to the standard error of that coefficient. As a rule of thumb, if the absolute value of t is less than 2.0, that variable should not be used in the tuning regression.
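The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names and the coefficient and standard error values are made up for the example and are not taken from the software.

```python
def t_statistic(coefficient, standard_error):
    """t = regression coefficient / standard error of that coefficient."""
    return coefficient / standard_error


def keep_variable(coefficient, standard_error, threshold=2.0):
    """Rule of thumb: keep a variable only if |t| is at least the threshold."""
    return abs(t_statistic(coefficient, standard_error)) >= threshold


# Hypothetical regression output for an electric meter:
# variable name -> (coefficient, standard error)
results = {
    "CDD": (200.0, 15.0),
    "occupancy": (3.0, 2.5),
}

for name, (coef, se) in results.items():
    t = t_statistic(coef, se)
    verdict = "keep" if keep_variable(coef, se) else "drop"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

With these made-up numbers, CDD has a t of about 13.3 and would be kept, while occupancy has a t of 1.2 and would be dropped.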
The R2 value describes how much of the variation the fit explains. That is, given an R2 value of 90%, 90% of the variation in the dependent variable (kWh or therms) can be explained by the weather data and other variables considered in the regression, and the remaining 10% is unexplained.
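As a rough sketch of that idea, R2 can be computed as one minus the ratio of unexplained variation to total variation. This uses the usual textbook definition, which may differ in detail from what the software reports, and the bill values below are invented for illustration.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the fitted values explain."""
    mean_actual = sum(actual) / len(actual)
    # Total variation of the bills around their own average.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    # Variation left over after the regression line is applied.
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical billed kWh and the kWh predicted by a regression line.
actual_kwh = [1000, 2000, 3000, 4000]
predicted_kwh = [1100, 1900, 3100, 3900]

print(f"R2 = {r_squared(actual_kwh, predicted_kwh):.1%}")  # R2 = 99.2%
```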
Suppose that, rather than monthly bills, you got weekly electricity bills. You would then have 52 bills per year and, consequently, 52 points with which to make a regression. Suppose you chose 12 at random and got a good R2 value and the following regression equation:
kWh = 0 * #Days + 200 * #CDD (notice in this simple example we assume there is no base load)
For this discussion, let's call 200 coefficient2 (we are not concerned here with coefficient1, which is 0). Also, for now, let's assume that the first coefficient is always zero.
Now suppose you chose 12 other points at random. You would probably get a different set of coefficients. Suppose you continued regressing 12 points at random until you had 1000 combinations of 12 points. Each time you would generate a different set of coefficients. If you made a histogram (bar graph) of the number of times you got each different value of coefficient2, chances are that you would have a bell curve.
About two-thirds of the values (roughly 68% for a true bell curve) are located within a given distance of the center of the bell curve. This distance is called the standard deviation. So if the coefficient at the center of the bell curve was 200, and about two-thirds of the values were within the range of 185 to 215, then 15 would be the standard deviation. For a regression coefficient, this is also called the standard error of the coefficient.
The T statistic is simply the coefficient (200) in the center of the bell curve divided by the standard deviation (15). So in this case we would have a T statistic of 13.3.
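The thought experiment above can be tried out with a small simulation. This is a sketch under stated assumptions, not how the software computes its T statistic: the 52 weekly bills are synthetic (a true slope of 200 kWh per CDD plus random noise), the noise level and CDD range are arbitrary, and the regression has no base load, as in the simplified example.

```python
import random
import statistics


def slope_through_origin(points):
    """Least-squares slope for kWh = slope * CDD (no base load)."""
    sum_xy = sum(cdd * kwh for cdd, kwh in points)
    sum_xx = sum(cdd * cdd for cdd, _ in points)
    return sum_xy / sum_xx


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical year of 52 weekly bills: 200 kWh per CDD plus noise.
weekly_bills = []
for _ in range(52):
    cdd = rng.uniform(20, 100)
    kwh = 200 * cdd + rng.gauss(0, 300)
    weekly_bills.append((cdd, kwh))

# Regress 1000 random sets of 12 bills, recording coefficient2 each time.
slopes = [slope_through_origin(rng.sample(weekly_bills, 12))
          for _ in range(1000)]

center = statistics.mean(slopes)   # center of the bell curve
spread = statistics.stdev(slopes)  # standard deviation of the coefficients
share_within = sum(abs(s - center) <= spread for s in slopes) / len(slopes)
t_stat = center / spread

print(f"center of bell curve: {center:.1f}")
print(f"standard deviation:   {spread:.2f}")
print(f"share within 1 sigma: {share_within:.0%}")
print(f"T statistic:          {t_stat:.1f}")
```

The exact numbers depend on the synthetic data, but the center should land near 200 and roughly two-thirds of the 1000 coefficients should fall within one standard deviation of it.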
Suppose there were 100 CDD in a billing period. Given the above regression, the line predicts 200 * 100 = 20,000 kWh for this particular bill, and there is a 67% chance that the actual billed kWh will fall within 15 kWh/CDD * 100 CDD of the line (that is, within 1,500 kWh of the line, or between 18,500 and 21,500 kWh).
If you have a base load, there is an error associated with that too. And if you are regressing based upon more than one variable, for example HDD and CDD, then there is an error associated with each of the coefficients. It's unfortunately quite complex. On top of that, the errors don't simply add together to give a total error, but this topic is best avoided here.
To sum it up in a few words, for the simplified case (where you have no base load and regress based upon only one variable), if you divide the coefficient by the T statistic, you will get the standard deviation (σ). You have a 67% chance that:
(coefficient - σ) * CDD < actual billed kWh < (coefficient + σ) * CDD.
For example, if your coefficient was 200 (as in the above regression equation) and your T statistic was 2, then σ would be 100, and you would have a 67% chance that the actual billed kWh falls between 100 and 300 kWh per CDD. If, using the same coefficient, your T statistic was 10, then σ would be 20, and you would have a 67% chance that the actual billed kWh falls between 180 and 220 kWh per CDD.
(* σ = "Sigma")
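The summary above can be written as a short helper. This is a minimal sketch for the simplified case only (no base load, one variable); the function name kwh_range is made up for this example.

```python
def kwh_range(coefficient, t_stat, cdd):
    """Return the (low, high) kWh range for the simplified one-variable case.

    sigma is recovered by dividing the coefficient by the T statistic.
    """
    sigma = coefficient / t_stat
    return (coefficient - sigma) * cdd, (coefficient + sigma) * cdd


# Per-CDD ranges from the examples in the text.
print(kwh_range(200, 2, 1))     # (100.0, 300.0)
print(kwh_range(200, 10, 1))    # (180.0, 220.0)

# The same T statistic of 10 applied to a bill with 100 CDD.
print(kwh_range(200, 10, 100))  # (18000.0, 22000.0)
```

The higher the T statistic, the narrower the range around the regression line.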