First off, the concept of finding a least-squares fit can be understood simply as finding a linear function that most closely fits a set of points in (x,y) where for all (x,y), the sum of the squares of the differences of y=f(x) to the corresponding point (which is to say the errors) is a minimum.
This turns out to be something natural to solve using matrices, because the set of errors which is y - (mx + b) for all y's and x's turns out to be just a set of matrices:
[y0] - (m[x0] + b) = [e0]
[y1] - (m[x1] + b) = [e1]
. . . .
. . . .
[yn] - (m[xn] + b) = [en]
Some more links:
An example of a C program to compute standard deviation: http://alstatr.blogspot.com/2013/09/cc-variance-and-standard-deviation.html
An example of C code for computing the mean (as well as median and mode): http://stackoverflow.com/questions/20636914/c-mean-median-and-mode
A fairly obtuse presentation of least-squares-fitting basic concepts: http://mathworld.wolfram.com/LeastSquaresFitting.html