I came across this pipeline setup where feature engineering is being added before a ColumnTransformer, but the new features don’t seem to flow correctly through the pipeline:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin

class FeatureAdder(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame in place
        X['new_feature'] = X['col1'] * X['col2']
        return X

pipeline = Pipeline([
    ('feature_add', FeatureAdder()),
    ('preprocess', ColumnTransformer([
        ('num', StandardScaler(), ['col1', 'col2']),
        ('cat', OneHotEncoder(), ['col3'])
    ]))
])
The issue is:

- The newly created `new_feature` is not listed in any of the ColumnTransformer's column selections.
- Because ColumnTransformer defaults to `remainder='drop'`, the engineered feature is silently dropped during transformation.
In a setup like this:

- Should the ColumnTransformer be dynamically updated to include new features?
- Or is it better to handle feature engineering outside the pipeline altogether?
- How do you ensure feature consistency without breaking pipeline modularity?
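For context, here is a minimal sketch of the simplest workaround I can think of: listing the engineered column explicitly in the ColumnTransformer so it survives the transform. The sample DataFrame and its values are made up for illustration.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin

class FeatureAdder(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        X['new_feature'] = X['col1'] * X['col2']
        return X

pipeline = Pipeline([
    ('feature_add', FeatureAdder()),
    ('preprocess', ColumnTransformer([
        # 'new_feature' is listed explicitly, so remainder='drop' no longer discards it
        ('num', StandardScaler(), ['col1', 'col2', 'new_feature']),
        ('cat', OneHotEncoder(), ['col3'])
    ]))
])

# Toy data just to show the shapes
df = pd.DataFrame({'col1': [1.0, 2.0, 3.0],
                   'col2': [4.0, 5.0, 6.0],
                   'col3': ['a', 'b', 'a']})
out = pipeline.fit_transform(df)
print(out.shape)  # 3 scaled numeric columns + 2 one-hot columns -> (3, 5)
```

This works, but it hard-codes the engineered column name in two places, which is exactly the coupling I'd like to avoid.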
