This is one reason why spectrograms and similar processes split the data into windows.
To prevent artifacts where the windows join, overlapping windows are typically used.
For some kinds of processes, 50% overlap is used. For example, with the data [1 2 3 4 5 6] and a window length of 4, the first window would be [1 2 3 4] and the second window would be [3 4 5 6].
For other kinds of processes, a 10% overlap is common.
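Here is a minimal sketch of splitting data into overlapping windows. The function name `frame_signal` and the choice to drop any trailing samples that do not fill a whole window are my own assumptions for illustration. Real spectrogram code often pads the end instead.

```python
def frame_signal(samples, window_len, overlap_fraction):
    """Split samples into windows of window_len that overlap by overlap_fraction.

    Hypothetical helper: trailing samples that do not fill a full window are dropped.
    """
    overlap = int(round(window_len * overlap_fraction))  # overlap in samples
    hop = window_len - overlap                           # step between window starts
    if hop <= 0:
        raise ValueError("overlap must be smaller than the window")
    frames = []
    start = 0
    while start + window_len <= len(samples):
        frames.append(samples[start:start + window_len])
        start += hop
    return frames

print(frame_signal([1, 2, 3, 4, 5, 6], 4, 0.5))  # [[1, 2, 3, 4], [3, 4, 5, 6]]
```

With `overlap_fraction=0.1` and a window of 10 samples, the overlap is 1 sample, so each new window starts 9 samples after the previous one.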
For audio, instead of a fixed fraction of overlap, it sometimes makes sense to calculate the overlap from a particular duration. To make up a number, there might be cases where some kinds of distortion tend to become perceptible at around 5 milliseconds, so the overlap might be chosen as the number of samples that fit in 5 milliseconds.
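Converting a duration into a sample count is just the sampling rate times the duration. This sketch uses the made-up 5 ms figure from above. The helper name `overlap_samples` is hypothetical, and rounding to the nearest whole sample is an assumption.

```python
def overlap_samples(sample_rate_hz, overlap_ms):
    """Number of samples that fit in overlap_ms at the given rate (rounded)."""
    return int(round(sample_rate_hz * overlap_ms / 1000))

print(overlap_samples(48000, 5))  # 240
print(overlap_samples(22050, 5))  # 110
```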
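The size of the window in samples, together with the sampling frequency, determines the frequency resolution: the spacing between FFT frequency bins is the sampling frequency divided by the window length.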
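As a quick illustration, the bin spacing for a given sampling rate and window length is computed below. This sketch gives only the bin spacing. The window shape (Hann, rectangular, etc.) also affects how well nearby frequencies can be separated in practice. The helper name `bin_spacing_hz` is hypothetical.

```python
def bin_spacing_hz(sample_rate_hz, window_len):
    """Spacing between adjacent FFT bins for a window of window_len samples."""
    return sample_rate_hz / window_len

print(bin_spacing_hz(48000, 1024))   # 46.875
print(bin_spacing_hz(48000, 48000))  # 1.0
```

Longer windows give finer frequency spacing, but each window then covers more time, which is the basic trade-off when choosing a window size.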
If you have 48000 samples, then you just might be working with one second of sound at 48000 samples per second. 48000 is one of the "magic numbers" in audio: 48000 samples per second is used in some kinds of professional audio and for DVD audio, but 24000 samples per second is much less likely to be used. Instead, near that range, 22050 samples per second is more likely, as it is half of the CD-quality rate of 44100 samples per second.
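The same number of samples means a different duration depending on which sampling rate you assume. This sketch compares the common rates mentioned above (plus 44100, the CD rate) for a buffer of 48000 samples. The dictionary layout is purely illustrative.

```python
n_samples = 48000

# Duration in seconds of the same buffer at several common audio sampling rates.
durations = {rate: n_samples / rate for rate in (48000, 44100, 22050)}

for rate, seconds in durations.items():
    print(f"{rate} Hz -> {seconds:.3f} s")
```

Only at 48000 samples per second do those 48000 samples amount to exactly one second. At 22050 samples per second they cover a little over two seconds.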