How to handle imbalanced datasets effectively in classification problems?

Naomi Teng
Updated 6 hours ago

I’m working on a classification problem where one class heavily outweighs the other (roughly a 90:10 ratio). My model achieves high accuracy, but it’s clearly biased toward the majority class.

Here’s a simplified version:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# X and y are my feature matrix and labels (loaded earlier, omitted here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))


Accuracy looks good, but recall and precision for the minority class are poor.
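To make the symptom concrete, here's a toy sketch (synthetic labels, not my actual data) of how plain accuracy hides a degenerate majority-class predictor at a 90:10 ratio:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score

# Toy 90:10 labels, and a "model" that always predicts the majority class
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)

accuracy = (y_true == y_pred).mean()
print(accuracy)  # 0.9 -- looks fine

# Imbalance-aware metrics expose the failure
print(balanced_accuracy_score(y_true, y_pred))    # 0.5 (chance level)
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0 for the minority class
```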

What I want to understand:

  • What are the best techniques to handle imbalance (SMOTE, class weights, etc.)?
  • When should I prefer resampling vs adjusting model parameters?
  • Which evaluation metrics should I focus on in such cases?
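For context, the class-weights variant I've been experimenting with looks roughly like this (a sketch on synthetic data standing in for mine; whether `class_weight='balanced'` is the right knob is part of what I'm asking):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my data with the same ~90:10 imbalance
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)

# stratify=y keeps the 90:10 ratio intact in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# class_weight='balanced' weights each class inversely to its
# frequency, so errors on the minority class cost more
model = RandomForestClassifier(class_weight="balanced", random_state=42)
model.fit(X_train, y_train)

print(recall_score(y_test, model.predict(X_test)))  # minority-class recall
```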

Would appreciate practical advice based on real-world experience.
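On the resampling side, SMOTE itself lives in the `imbalanced-learn` package (`imblearn.over_sampling.SMOTE`); as a dependency-free stand-in, plain random oversampling of the minority class, applied to the training split only so nothing leaks into the test set, looks like:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my ~90:10 data
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

rng = np.random.default_rng(0)
minority = np.flatnonzero(y_train == 1)
majority = np.flatnonzero(y_train == 0)

# Draw minority rows with replacement until both classes are the same size
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
keep = np.concatenate([np.arange(len(y_train)), extra])

X_bal, y_bal = X_train[keep], y_train[keep]
print(np.bincount(y_bal))  # both classes now equal in size
```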
