The nuances caused by changing components can't really be recognized until you have about 20 samples at a bare minimum. Even that data can be off by enough that I wouldn't compare it to other data with confidence.
For example, I recently tested .223 bolt gun loads with Fed 205, CCI 400, and CCI BR4 primers, keeping everything else the same and shooting 35 shots of each. Velocity averages varied slightly, but the differences in SD and ES were negligible. What is cool, however, is to watch various tests progress with a "running" average and SD plotted in Excel. You see a TON of noise up to about 25-30 samples, then everything starts to level out and the numbers move much more slowly.
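If you'd rather not build the running plot in Excel, here is a rough Python sketch of the same idea: recompute the average and sample SD after every shot. The function name `running_stats` is my own, and the velocities in the example are made-up placeholders, not my chronograph data.

```python
import statistics

def running_stats(velocities):
    """Return (shot count, running average, running sample SD) after each shot.

    Starts at shot 2, because a sample SD needs at least two values.
    """
    rows = []
    for n in range(2, len(velocities) + 1):
        shots = velocities[:n]  # everything fired so far
        rows.append((n, statistics.mean(shots), statistics.stdev(shots)))
    return rows

# Made-up placeholder velocities in fps, NOT real test data.
example_fps = [3012, 2998, 3021, 3005, 2990, 3015, 3008, 2995]

for n, avg, sd in running_stats(example_fps):
    print(f"{n:2d} shots: avg {avg:7.1f} fps, SD {sd:5.1f} fps")
```

Paste in your own string in the order it was fired and watch how much the two columns move early on compared to later.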
It's not what anyone wants to hear, but if I pluck any 5-shot sample out of the testing I've done, the results are ENTIRELY inconclusive as to what the final result will be at 35+ shots.
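To get a feel for why 5 shots tells you so little, here is a small simulation sketch. It rests on assumptions that are mine and not measured: velocities are treated as normally distributed, and the "true" average (3000 fps) and "true" SD (15 fps) are hypothetical numbers picked for illustration. It fires a lot of pretend strings and shows the range the measured SD lands in most of the time.

```python
import random
import statistics

def sd_spread(n_shots, true_mean=3000.0, true_sd=15.0, trials=2000, seed=1):
    """Simulate many strings of n_shots and return the 5th and 95th
    percentile of the measured sample SD.

    true_mean and true_sd are hypothetical values, and normally
    distributed velocities are an assumption, not a measured fact.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sds = []
    for _ in range(trials):
        shots = [rng.gauss(true_mean, true_sd) for _ in range(n_shots)]
        sds.append(statistics.stdev(shots))
    sds.sort()
    return sds[int(0.05 * trials)], sds[int(0.95 * trials)]

for n in (5, 35):
    low, high = sd_spread(n)
    print(f"{n:2d}-shot strings: measured SD mostly between {low:.1f} and {high:.1f} fps")
```

Same pretend load every time, and the 5-shot strings still report a much wider range of SDs than the 35-shot strings do.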