Discussion about this post

Old Lady Katie

*SUPER* Interesting Analysis! I love this approach. While I don't think the value of *~vibes~* will ever be completely irrelevant to these distinctions, providing a quantitative classification like this should absolutely be a sanity check *at least* and it's great to finally see one!!

This was so exciting to me that I put on my journal club hat, and I have a few follow-up questions. Apologies for going obnoxious academic mode!

1) Would love to know more about feature importance and see a list of all the features used in your PCA. If game size/credit count are most important, how much less important was your third feature? Game length is quantitative and seems super relevant to classification from a *~vibes~* perspective, so I'm curious where it fell in feature importance. (I love the SHAP technique for feature importance, but that's me)

2) If k-means prefers spherical clusters, what if less-spherical clusters yield a better fit? That would likely mean using a different k-optimization score, but I would be super curious to see: can we identify a III cluster and prove that it's a real phenomenon?? If you used DBSCAN/Davies-Bouldin instead of k-means/CH, do you get more clusters?

3) There is so much variance in log game size as credit count goes down, which totally makes sense: smaller game = optimization is less important and you have fewer people to do it. Is there potentially some additional meaning you could get out of that spread? Does a low credit count/high game size mean anything? Is there another feature that gets more important as the size/credit count correlation starts to fall apart?

4) Now this one gets political: I wonder if country of origin has a meaningful impact on this classification. Do countries with better labor laws require more people to make a game, thus skewing their games towards midi/AA when the vibe is more kei/midi? What if you encoded country of origin as its World Justice Index? I wonder if that has a meaningful negative correlation to the cluster number.

5) Like other comments already say, would love a publicly available dataset ❤️
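
To make question 2 a bit more concrete, here is a minimal sketch of what comparing the two k-selection scores could look like. Everything in it is an assumption for illustration: the twelve toy points stand in for the real PCA-projected games, the `kmeans` helper is a bare-bones Lloyd's loop with farthest-point initialization (not whatever the post actually used), and DBSCAN is left out entirely. The only point is to show that Calinski-Harabasz (higher is better) and Davies-Bouldin (lower is better) can be computed on the same labelings and compared for agreement on k.

```python
import math

def centroid(pts):
    """Mean of a list of equal-length coordinate tuples."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans(points, k, iters=100):
    """Bare-bones Lloyd's algorithm with deterministic farthest-point init."""
    centers = [points[0]]
    while len(centers) < k:
        # next center: the point farthest from all centers chosen so far
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = centroid(members)
    return labels

def groups(points, labels):
    """Split points into one list per cluster label."""
    out = {}
    for p, l in zip(points, labels):
        out.setdefault(l, []).append(p)
    return list(out.values())

def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion; higher is better."""
    clusters = groups(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(c) * math.dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(points, labels):
    """Average worst-case overlap between each cluster and any other; lower is better."""
    clusters = groups(points, labels)
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Toy stand-in data (hypothetical): three tight 2x2 blobs.
blob_corners = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in blob_corners for dx in (0, 1) for dy in (0, 1)]

scores = {}
for k in range(2, 5):
    labels = kmeans(points, k)
    scores[k] = (calinski_harabasz(points, labels), davies_bouldin(points, labels))

best_ch = max(scores, key=lambda k: scores[k][0])
best_db = min(scores, key=lambda k: scores[k][1])
print("k chosen by Calinski-Harabasz:", best_ch)
print("k chosen by Davies-Bouldin:", best_db)
```

On clean, round blobs like these the two scores agree, so the interesting case is the real data: if they disagree there, that would be a hint that the cluster shapes matter.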

Super excited to read about the game budget prediction model!!

Chase

Will you be sharing this data as interactive graphs at some point? Curious to see where some games lie, particularly Stardew Valley, which may be the biggest Kei game of them all.
