I’m working on a binary classification problem where one class heavily outweighs the other (roughly a 90:10 ratio). My model achieves high accuracy, but it’s clearly biased toward the majority class.
Here’s a simplified version:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
Accuracy looks good, but recall and precision for the minority class are poor.
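Concretely, here’s how I’m isolating the minority-class numbers. This is a sketch: `make_classification` with a 90:10 weighting stands in for my real `X`, `y`, and the rest mirrors the setup above.

```python
# Sketch: synthetic 90:10 data standing in for the real X, y.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support, average_precision_score

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Per-class precision/recall/F1; index 1 is the minority class.
prec, rec, f1, support = precision_recall_fscore_support(
    y_test, model.predict(X_test)
)
print(f"minority precision={prec[1]:.2f} recall={rec[1]:.2f} f1={f1[1]:.2f}")

# Average precision (area under the PR curve) uses predicted
# probabilities, so it doesn't depend on the 0.5 decision threshold.
ap = average_precision_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"average precision (PR-AUC): {ap:.2f}")
```

On synthetic data the minority recall already lags well behind overall accuracy, which matches what I see on my real data.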
What I want to understand:
- What are the best techniques to handle imbalance (SMOTE, class weights, etc.)?
- When should I prefer resampling vs adjusting model parameters?
- Which evaluation metrics should I focus on in such cases?
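For context, here’s a sketch of the two families of approaches I’m weighing on the same synthetic 90:10 data: class weights (reweight the loss, data untouched) versus oversampling the minority class in the training set. I’ve used plain random oversampling via `sklearn.utils.resample` to keep the sketch dependency-free; SMOTE from `imbalanced-learn` would slot into the same place.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from sklearn.utils import resample

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Option 1: class weights -- no change to the data; the model penalizes
# minority-class mistakes more heavily.
weighted = RandomForestClassifier(class_weight="balanced", random_state=0)
weighted.fit(X_train, y_train)

# Option 2: oversample the minority class in the TRAINING set only
# (never resample the test set). SMOTE would replace resample() here.
minority = y_train == 1
X_min_up, y_min_up = resample(
    X_train[minority], y_train[minority],
    n_samples=int((~minority).sum()), random_state=0
)
X_bal = np.vstack([X_train[~minority], X_min_up])
y_bal = np.concatenate([y_train[~minority], y_min_up])
oversampled = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)

for name, m in [("class_weight", weighted), ("oversampled", oversampled)]:
    print(name, "minority recall:",
          round(recall_score(y_test, m.predict(X_test)), 2))
```

Both run, but I don’t have a good intuition for when one should be preferred over the other, or how they interact with threshold tuning.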
Would appreciate practical advice based on real-world experience.
