"The more pools, the more it will look like the one-pool model."

Umm, no, a mixture of normals does not look like a normal distribution, except in degenerate cases. If you go to 80 pools (e.g. each pool has only one recipe) and have enough data to justify that, then you will see a multimodal distribution (multiple density peaks), with each peak corresponding to a different recipe probability. The only way it will look like a one-pool model (a regular normal model with one peak) is if all the underlying recipe probabilities are indeed the same, which so far does not look particularly convincing.
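The multimodality claim is easy to check by simulation. Below is a minimal sketch, not the thread's actual data: a hypothetical two-pool setup where 40 "common" recipes are twice as likely as 40 "rare" ones. With enough draws, the per-recipe probability estimates separate into two peaks rather than one.

```python
import random

random.seed(1)

# Hypothetical two-pool setup: 40 "common" recipes are twice as likely
# to appear as 40 "rare" ones.  The probabilities sum to 1.
probs = [2 / 120] * 40 + [1 / 120] * 40
n_obs = 100_000  # enough data for the two peaks to separate cleanly

# Draw n_obs recipe appearances and tally them per recipe.
draws = random.choices(range(80), weights=probs, k=n_obs)
counts = [0] * 80
for i in draws:
    counts[i] += 1

# Per-recipe probability estimates cluster around 1/120 and 2/120,
# forming two density peaks instead of the single peak a one-pool
# (uniform 1/80) model would produce.
est = [c / n_obs for c in counts]
below = sum(1 for q in est if q < 1 / 80)   # rare pool, near 1/120
above = sum(1 for q in est if q >= 1 / 80)  # common pool, near 2/120
print(below, above)  # -> 40 40
```

Only when the two pool probabilities are set equal does the histogram of estimates collapse into a single peak, which is the degenerate case mentioned above.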
"One more thing to keep in mind is that crafting a recipe past the time that the recipes refresh lowers the probability for that recipe, as it cannot appear among the remaining options while it is being crafted."

It's true, but it is unlikely to have an appreciable impact on the high-level results. There are too many recipes and too many samples already. Even if we expect bias in crafting selection (i.e. @PaNonymeB always crafts certain recipes and not others), it may somewhat shift the estimated probabilities of those recipes, but it won't change the fact that the results so far do not support a single probability for all recipes. The biggest and most obvious problems here are the significant skew (you'd expect almost symmetrical results by now), and the fact that the expected mode/mean - about 9 for the latest data - sits exactly where the histogram has a trough. The exact opposite of what you'd expect there.
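To make the "about 9" figure concrete: under a one-pool model, each of the 80 recipes appears with probability 1/80 in each of the 723 observations, so a given recipe's appearance count follows Binomial(723, 1/80). The 723 and 1/80 are from the thread; the rest is standard binomial arithmetic:

```python
from math import comb, sqrt

n, p = 723, 1 / 80  # 723 observed recipes, uniform 1/80 chance apiece

def binom_pmf(k):
    """P(a given recipe appears exactly k times among the n observations)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

mean = n * p                                # ~9.04 appearances per recipe
mode = max(range(30), key=binom_pmf)        # single peak of the one-pool model
skew = (1 - 2 * p) / sqrt(n * p * (1 - p))  # ~0.33, i.e. close to symmetric
print(round(mean, 2), mode, round(skew, 2))  # -> 9.04 9 0.33
```

A near-symmetric single peak at 9 is what the one-pool model predicts, so a trough at exactly that spot in the observed histogram is evidence against it.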
"Subtract the 425 recipe result from the 723 recipe result and examine the 298 recipe addition to the 425 original. The 298 is also heavily skewed, but many of its focuses and dearths do not line up with the 425 skews."

Sure they do. The histograms plotted so far are not normalized for the number of observations, so the X-axis numbers will drift upwards with more observations. You're looking at splits of 425 and 298, and a total of 723 - of course they will produce different absolute sum numbers. After you normalize for the number of observations, all 3 would look similar, and all have a trough in the 0.0125 area (= 1/80) where you'd expect the peak.
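A sketch of the normalization point, using made-up counts for a single recipe (not the thread's real per-recipe numbers): raw counts grow with the sample, but dividing by the number of observations puts the 425, 298, and 723 splits on the same scale, directly comparable against the uniform benchmark 1/80 = 0.0125.

```python
# Hypothetical appearance counts for one recipe in each split.
splits = {"first 425": (6, 425), "next 298": (4, 298), "all 723": (10, 723)}

uniform = 1 / 80  # 0.0125: where a one-pool model expects every recipe
for name, (count, n_obs) in splits.items():
    freq = count / n_obs  # normalized frequency, comparable across splits
    print(f"{name}: count={count}, freq={freq:.4f}, uniform={uniform}")
```

The raw counts (6, 4, 10) look quite different, but the normalized frequencies all land in the same 0.013-0.014 band.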
To make that judgement, there is not nearly enough supporting evidence in these two samples. I am not looking at absolute sums; I am looking at which elements of the recipe result are favored and which are not. The 425 shows favoritism toward different elements than the later 298 does.
"When recipes that are thought more probable or more improbable are deemed the opposite in a later data collection, that puts even the assumption of multiple data pools on shaky ground. And speculating a third and possibly more data pools from a small amount of data becomes useless extrapolation, as the complexity of what is being suggested is not something that can be discovered in such a small amount of data."

OK, this is getting way out of scope for this forum, but that is incorrect. You may want to read up on cluster analysis and Gaussian mixture models. As much as I would like to take credit for them, these are established statistical techniques, not something I invented. Clustering is used all over the place for classification in unsupervised contexts. And yes, there is recipe classification into pools, and it is also probabilistic. The probabilistic differences in classification between these two sample sizes are not statistically significant. So while we cannot say with certainty which pool each recipe belongs to (for most recipes), we can observe that a model with 2 or 3 pools (clusters) is a significantly better fit to the existing data than a 1-pool model. This uses standard information criteria such as AIC and BIC, both of which penalize extra variables/pools to minimize overfitting. There is plenty of data to make that determination.
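As a sketch of the AIC/BIC argument (on simulated data, not the thread's; a two-component Poisson mixture fitted by EM stands in for the Gaussian mixture): generate 80 per-recipe counts from two assumed pools, then score a 1-pool model against a 2-pool model. Both criteria penalize the extra parameters, yet still prefer the 2-pool fit when the pools are real. The pool means (4 and 14) are made up for illustration; 723 observations and 80 recipes are the thread's figures.

```python
import math
import random

random.seed(7)

# Assumed two-pool data: 40 recipes averaging ~4 appearances and 40
# averaging ~14 across 723 total observations.
counts = [sum(random.random() < m / 723 for _ in range(723))
          for m in [4] * 40 + [14] * 40]

def log_pmf(k, lam):
    """Poisson log-probability -- a convenient model for per-recipe counts."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

# --- 1-pool model: one shared rate, 1 free parameter ----------------------
lam = sum(counts) / len(counts)
ll1 = sum(log_pmf(c, lam) for c in counts)

# --- 2-pool model: EM fit of a 2-component mixture, 3 free parameters -----
w, lo, hi = 0.5, 0.5 * lam, 1.5 * lam
for _ in range(100):
    # E-step: responsibility of the low-rate pool for each count
    r = []
    for c in counts:
        a = w * math.exp(log_pmf(c, lo))
        b = (1 - w) * math.exp(log_pmf(c, hi))
        r.append(a / (a + b))
    # M-step: re-estimate the mixing weight and the two rates
    s = sum(r)
    w = s / len(counts)
    lo = sum(ri * c for ri, c in zip(r, counts)) / s
    hi = sum((1 - ri) * c for ri, c in zip(r, counts)) / (len(counts) - s)
ll2 = sum(math.log(w * math.exp(log_pmf(c, lo))
                   + (1 - w) * math.exp(log_pmf(c, hi))) for c in counts)

k = len(counts)
aic1, aic2 = 2 * 1 - 2 * ll1, 2 * 3 - 2 * ll2
bic1, bic2 = 1 * math.log(k) - 2 * ll1, 3 * math.log(k) - 2 * ll2
print(aic2 < aic1, bic2 < bic1)  # both criteria favor the 2-pool model
```

Lower AIC/BIC is better; the 2-pool model wins here despite the penalty for its two extra parameters, because the likelihood gain from modeling two genuinely different rates is far larger. On data actually generated from one pool, the penalty would tip both criteria the other way.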
Thanks. I started to collect this data as well, but it will take a while before mine adds up to something meaningful.

Updated data :
Updated with my own data, so the total is slightly more than 1,500 non-special recipes.

Since probability seems not to be linked with rarity, I came back to a sorting by type :
- Coin rain % : 100 ×35, 50 ×21, 33 ×15, 25 ×22
- Supply Windfall % : 100 ×20, 50 ×21, 33 ×27, 25 ×29, 20 ×11, 15 ×12, 10 ×10, 5 ×6
- Portal Profit % : 20 ×5, 15 ×10, 10 ×8, 5 ×8
- Ancient Knowledge : 20 ×7, 15 ×13, 10 ×6, 7 ×11, 5 ×27, 3 ×23
- Ancient Knowledge costing Runeshards : 3 ×25, 10 ×8, 15 ×12
- Knowledge Points : 15 ×5, 10 ×12, 6 ×9, 3 ×14, 1 ×24
- Runeshards : 3 ×11, 2 ×12, 1 ×8
- Relics : marble ×20, steel ×21, planks ×18, crystal ×26, scrolls ×16, silk ×17, elixir ×28, dust ×19, gems ×30
- Spells : PoP ×23, EE ×30, MM ×26, IM ×24
- Pet food costing relics : elixir ×30, dust ×12, gems ×26, spell fragments ×14
- Royal Restorations : 10 ×24, 20 ×8, 30 ×8, 30(costing spell fragments) ×9
- Time boosts : 10m ×22, 15m ×19, 30m ×11, 45m ×9, 1h ×4, 2h ×9, 5h ×11, 8h ×9, 14h ×5, 20h ×7
- Culture/population buildings : rainbow flower cage ×3, lava codex ×24, unicorns : rainbow ×6, silver ×18, crystal ×20
- Military buildings : ELR ×12, MMM ×6, UUU ×7
- Traveling Merchants : I ×9, II ×7, III ×8
- Other buildings : festival merchant ×5, mana sawmill ×9, orc nest ×7, orc strategist ×10, vallorian valor ×10
- Special recipes (evolving buildings, chess set, artifacts) : 100
- Total : 1283 (average for non-special recipes : 14.8)
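The stated totals are easy to re-check. The sketch below transcribes the per-type counts from the list above (the dictionary keys are just shorthand labels) and verifies the figures: 1283 recipes overall, 80 distinct non-special recipes, and 1183 / 80 ≈ 14.8 average appearances per non-special recipe.

```python
# Appearance counts transcribed from the per-type list above.
by_type = {
    "coin_rain": [35, 21, 15, 22],
    "supply_windfall": [20, 21, 27, 29, 11, 12, 10, 6],
    "portal_profit": [5, 10, 8, 8],
    "ancient_knowledge": [7, 13, 6, 11, 27, 23],
    "ak_costing_runeshards": [25, 8, 12],
    "knowledge_points": [5, 12, 9, 14, 24],
    "runeshards": [11, 12, 8],
    "relics": [20, 21, 18, 26, 16, 17, 28, 19, 30],
    "spells": [23, 30, 26, 24],
    "pet_food": [30, 12, 26, 14],
    "royal_restorations": [24, 8, 8, 9],
    "time_boosts": [22, 19, 11, 9, 4, 9, 11, 9, 5, 7],
    "culture_population": [3, 24, 6, 18, 20],
    "military": [12, 6, 7],
    "traveling_merchants": [9, 7, 8],
    "other_buildings": [5, 9, 7, 10, 10],
    "special": [100],
}

non_special = {k: v for k, v in by_type.items() if k != "special"}
n_recipes = sum(len(v) for v in non_special.values())      # 80 distinct recipes
n_appearances = sum(sum(v) for v in non_special.values())  # 1183 appearances
total = n_appearances + sum(by_type["special"])            # 1283 with specials
print(n_recipes, n_appearances, total, round(n_appearances / n_recipes, 1))
# -> 80 1183 1283 14.8
```

The 80 distinct non-special recipes falling out of the transcription matches the 1/80 uniform benchmark used throughout the thread, which is a nice consistency check.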