Discussion about this post

Old Lady Katie

*SUPER* interesting analysis! I love this approach. While I don't think the value of *~vibes~* will ever be completely irrelevant to these distinctions, a quantitative classification like this should absolutely serve as a sanity check *at least*, and it's great to finally see one!!

This was so exciting to me that I put on my journal club hat, and I have a few follow-up questions. Apologies for going into obnoxious academic mode!

1) I'd love to know more about feature importance and to see a list of all the features used in your PCA analysis. If game size/credit count are the most important, how much less important was your third feature? Game length is quantitative and seems super relevant to classification from a *~vibes~* perspective, so I'm curious where it fell in feature importance. (I love the SHAP technique for feature importance, but that's me.)

2) If k-means prefers spherical clusters, what if less-spherical clusters yield a better fit? That would likely mean using a different score to choose k, but I'd be super curious to see it: can we identify a III cluster and prove that it's a real phenomenon?? If you used DBSCAN/Davies-Bouldin instead of k-means/CH, do you get more clusters?

3) There is so much more variance in log game size as credit count goes down. That totally makes sense: a smaller game means optimization is less important, and you have fewer people to do it. Is there potentially some additional meaning you could get out of that spread? Does a low credit count with a high game size mean anything? Is there another feature that gets more important as the size/credit-count correlation starts to fall apart?

4) Now this one gets political: I wonder if country of origin has a meaningful impact on this classification. Do games from countries with better labor laws require more people to make, thus skewing them toward midi/AA when the vibe is more kei/midi? What if you encoded country of origin as its World Justice Project Rule of Law Index score? I wonder if that has a meaningful negative correlation with the cluster number.

5) Like other comments already say, I'd love a publicly available dataset ❤️
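For question 2, here's a rough, dependency-free sketch of how you might compare the two k-selection scores side by side. Assumptions, all mine rather than the post's: the points are made-up 2-D blobs (not the post's data), k-means uses a deterministic farthest-point seeding instead of k-means++, and Davies-Bouldin uses the usual mean-distance-to-centroid scatter. In scikit-learn the same scores are `sklearn.metrics.calinski_harabasz_score` and `sklearn.metrics.davies_bouldin_score`; DBSCAN (`sklearn.cluster.DBSCAN`) would replace k-means entirely, since it doesn't take k at all.

```python
import math
import random


def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not k-means++): start at points[0],
    then repeatedly add the point farthest from all centroids chosen so far."""
    cents = [points[0]]
    while len(cents) < k:
        cents.append(max(points, key=lambda p: min(math.dist(p, c) for c in cents)))
    return cents


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    cents = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, cents[j])) for p in points]
        # Recompute centroids, keeping the old one if a cluster ends up empty.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(centroid(members) if members else cents[j])
        if new == cents:  # converged: assignments no longer change
            break
        cents = new
    return labels


def groups(points, labels):
    """Split points into lists by label (empty clusters simply don't appear)."""
    out = {}
    for p, lab in zip(points, labels):
        out.setdefault(lab, []).append(p)
    return list(out.values())


def calinski_harabasz(points, labels):
    """Between- vs within-cluster dispersion ratio; higher is better."""
    clusters = groups(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(c) * math.dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Average worst-case cluster similarity; lower is better."""
    clusters = groups(points, labels)
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


# Made-up data: three round blobs standing in for (log game size, credit count).
rng = random.Random(1)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
data = [(cx + rng.gauss(0, 0.7), cy + rng.gauss(0, 0.7)) for cx, cy in blob_centers for _ in range(40)]

scores = {}
for k in range(2, 7):
    labels = kmeans(data, k)
    scores[k] = (calinski_harabasz(data, labels), davies_bouldin(data, labels))

best_ch = max(scores, key=lambda k: scores[k][0])
best_db = min(scores, key=lambda k: scores[k][1])
print("best k by Calinski-Harabasz:", best_ch)
print("best k by Davies-Bouldin:", best_db)
```

On well-separated round blobs like these, both scores should land on the same k. The interesting case for a possible III cluster would be elongated or uneven clusters, where a centroid-based score like CH tends to favor compact, roughly spherical groups.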
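For question 4, a cluster number only means something as an ordering (and only if the clusters are numbered by size, which I'm assuming here), so a rank correlation like Spearman's rho fits better than plain Pearson. Every game from the same country would share one index score, so ties get averaged ranks. A minimal stdlib sketch; the index scores and cluster labels below are invented for illustration. `scipy.stats.spearmanr` computes the same statistic and also returns a p-value.

```python
import math


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for m in range(i, j + 1):
            r[order[m]] = (i + j) / 2 + 1  # average rank of the tied run
        i = j + 1
    return r


def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(ranks(a), ranks(b))


# Hypothetical games: (country-level index score, cluster number 0=kei ... 3=AAA).
index_score = [0.89, 0.85, 0.81, 0.72, 0.66, 0.55, 0.79, 0.70]
cluster_num = [1, 2, 2, 1, 0, 0, 3, 1]
print(f"Spearman rho = {spearman(index_score, cluster_num):.3f}")
```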

Super excited to read about the game budget prediction model!!

Kaldrin

Very interesting!

I also feel like the AAA of yesterday is not the AAA of today. For example, a small team today could remake something like Ocarina of Time on a vastly smaller budget, which I imagine could skew the dataset.
