Hi Joey,
I will confess to only having basic knowledge of this subject and suggest a thorough review of a good textbook. I'll try to give a brief outline.
Both periodogram and pwelch estimate "the power spectral density (PSD) of a wide-sense stationary random process." It's important to keep in mind that a random process has infinite duration into the past and into the future. In other words, it never decays to zero and stays zero, either in the past or future.
Let x[n] be a discrete-time signal over -inf < n < inf.
The instantaneous power in the signal at time n is
p[n] = |x[n]|^2
The energy in the signal is
E = sum_{n=-inf}^{inf} p[n] = sum_{n=-inf}^{inf} |x[n]|^2
Energy is the sum of instantaneous power.
If x[n] are samples of a random process, it should be apparent that E = inf: p[n] >= 0 for all n and, because the process never dies out, p[n] > 0 for infinitely many samples without decaying toward zero.
The average power in x[n] is defined as
P = lim_{N->inf} 1/(2N+1) * sum_{n=-N}^{N} |x[n]|^2
Note that if our random process is zero mean, the average power is basically the sample variance of all of the elements in our signal. We'll assume that our signal has finite average power, so a finite second moment (or variance if a zero mean process).
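As a rough numerical illustration of the difference between energy and average power, here is a minimal sketch. The unit-variance white noise, the seed, and the snapshot lengths are my own assumptions, picked only for illustration. The energy of a snapshot keeps growing as the snapshot gets longer, while its average power settles near the variance of the process.

```matlab
rng(0);                      % assumed seed, only for repeatability
N = [1e2 1e3 1e4 1e5];       % assumed snapshot lengths
x = randn(1,max(N));         % zero-mean, unit-variance white noise (assumed process)
E = zeros(size(N));
P = zeros(size(N));
for k = 1:numel(N)
    p = abs(x(1:N(k))).^2;   % instantaneous power p[n] over the snapshot
    E(k) = sum(p);           % energy of the snapshot, grows with its length
    P(k) = mean(p);          % average power of the snapshot
end
disp([N(:) E(:) P(:)])       % columns: length, energy, average power
```

The second column should grow roughly in proportion to the snapshot length, while the third should hover around 1 for this particular process.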
Of course, we cannot collect an infinite number of samples of our random process. But, because the process is wide-sense stationary (WSS) we know, by definition, that it has a finite mean and second moment, and that, loosely speaking, the properties of the process are captured in and can be estimated from a finite-length window, or snapshot, of samples of the process, and it doesn't matter where in time we take that snapshot. Of course, the more samples, the better. Both periodogram and pwelch take advantage of the WSS assumption to estimate the PSD of the random process based on a finite number of samples of the process.
The "average power" is a property of the underlying random process. It does not refer to any averaging in the estimation of the PSD. The estimate of the PSD is used to estimate the average power in the process, or the fraction of average power over a frequency range, based on a finite number of samples of the process.
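To make "fraction of average power over a frequency range" concrete, here is a small sketch. The signal model, the seed, the sample index n, and the band edges are assumptions chosen for illustration. It integrates a one-sided periodogram over the whole frequency axis and over a narrow band around the tone, then takes the ratio.

```matlab
rng(0);                                      % assumed seed
n = 0:999;                                   % assumed sample index
x = cos(pi/4*n) + randn(size(n));            % tone plus unit-variance noise (assumed model)
[pxx,w] = periodogram(x);                    % one-sided PSD, w in rad/sample on [0,pi]
Ptotal = trapz(w,pxx);                       % estimate of the total average power
inband = (w >= pi/4-0.1) & (w <= pi/4+0.1);  % assumed band around the tone
Pband = trapz(w(inband),pxx(inband));        % estimate of the average power in that band
fracInBand = Pband/Ptotal                    % fraction of average power in the band
```

If you have the Signal Processing Toolbox, I believe bandpower can do the band integration for you, but the manual version above shows what is being computed.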
Here's an example:
Generate a snapshot of samples of a random process
n = 0:999; % assumed sample index; the definition of n is not shown here
x = cos(pi/4*n)+randn(size(n));
Estimate the PSD and plot for comparison
[pxxper2,wper2] = periodogram(x,'centered');
[pxxwel2,wwel2] = pwelch(x,[],[],'centered');
plot(wper2,10*log10(pxxper2),wwel2,10*log10(pxxwel2))
The sample mean is small, suggesting we actually have a zero mean process.
Estimate the variance of the process. Because we assume WSS, adding additional previous or future samples shouldn't change this estimate very much as long as we collected enough samples in the first place.
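The two checks just described could look like the sketch below. It regenerates the snapshot so that it runs on its own; the seed and the sample index n are assumptions, so the exact numbers will differ from any other run.

```matlab
rng(0);                             % assumed seed
n = 0:999;                          % assumed sample index
x = cos(pi/4*n) + randn(size(n));   % same signal model as in the example
m = mean(x)                         % sample mean, expected to be close to 0
v = var(x)                          % sample variance, estimate of the average power
```

For this signal model the tone contributes an average power of 0.5 and the noise contributes 1, so v should land somewhere near 1.5.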
Compute the power by integrating the PSD, which should be the variance, because the process is, or appears to be, zero mean.
[sum(pxxper2)*(wper2(2)-wper2(1)) sum(pxxwel2)*(wwel2(2)-wwel2(1))]
[trapz(wper2,pxxper2) trapz(wwel2,pxxwel2)]
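A self-contained way to see how the integrated PSD lines up with the time-domain numbers is sketched below (seed and sample index n are again assumptions). As far as I understand it, with the default rectangular window the rectangle-rule sum of the periodogram reproduces mean(x.^2) to within rounding, which is just Parseval's relation, while the pwelch result is only approximately equal because of the windowing and segment averaging.

```matlab
rng(0);                                       % assumed seed
n = 0:999;                                    % assumed sample index
x = cos(pi/4*n) + randn(size(n));             % same signal model as in the example
[pxxper,wper] = periodogram(x,'centered');    % two-sided PSD estimate
[pxxwel,wwel] = pwelch(x,[],[],'centered');   % two-sided Welch PSD estimate
Pper = sum(pxxper)*(wper(2)-wper(1));         % rectangle-rule integral of the periodogram
Pwel = sum(pxxwel)*(wwel(2)-wwel(1));         % rectangle-rule integral of the Welch estimate
[mean(x.^2) var(x) Pper Pwel]                 % mean square, variance, and the two integrals
```

The mean square and the variance differ only by the square of the sample mean, which is why all four numbers end up close to each other for a zero-mean process.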
This was only an outline. Again, I strongly recommend a good textbook, or at least a good set of course notes, to get a full understanding of the concepts, if not the details of the underlying math.
Will be interested to see others' perspectives on this Question.