Why does my neural network overfit despite using dropout and early stopping?

Tariq
Updated on March 17, 2026

I’m training a simple deep learning model, but it still overfits even after applying dropout and early stopping. Training accuracy is high, but validation performance drops.

 
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(20,)),
    layers.Dropout(0.5),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=50,
                    batch_size=32)

 

What are the common reasons this still happens in practice, and how can it be mitigated beyond basic regularization?

March 29, 2026

Overfitting can still happen even with dropout and early stopping if the model capacity is too high or the data is limited.

You can combine multiple strategies instead of relying on just those two:

import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

model = models.Sequential([
    layers.Dense(128, activation='relu', 
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.BatchNormalization(),
    layers.Dropout(0.5),

    layers.Dense(64, activation='relu', 
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.BatchNormalization(),
    layers.Dropout(0.5),

    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    batch_size=32,
    callbacks=[early_stop]
)
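
To confirm the changes actually help, compare the final train/validation accuracy gap in the returned `History` object. A minimal sketch, using a made-up history dict as a stand-in for `history.history` from the fit call above:

```python
# Sketch: quantify the generalization gap from a Keras-style history dict.
# The dict below is invented for illustration; in practice pass history.history.
def generalization_gap(hist):
    """Return final train accuracy, final val accuracy, and their gap."""
    train_acc = hist['accuracy'][-1]
    val_acc = hist['val_accuracy'][-1]
    return train_acc, val_acc, train_acc - val_acc

train_acc, val_acc, gap = generalization_gap(
    {'accuracy': [0.70, 0.95], 'val_accuracy': [0.68, 0.80]}
)
print(round(gap, 2))  # → 0.15
```

A gap that shrinks across reruns is a better signal than validation accuracy alone, since both numbers can move together.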

Key things added here:

  • L2 regularization to penalize large weights

  • Batch normalization for more stable learning

  • A second dropout layer, plus an EarlyStopping callback that restores the best weights
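
For intuition, the L2 term just adds λ·Σw² to the loss, so large weights become expensive. A minimal sketch, with λ matching the 0.001 passed to `regularizers.l2` above:

```python
# Minimal sketch of the penalty L2 regularization adds to the loss.
def l2_penalty(weights, lam=0.001):
    """lam * sum of squared weights, as added to the training loss."""
    return lam * sum(w * w for w in weights)

print(round(l2_penalty([3.0, -2.0, 0.5]), 5))  # → 0.01325
```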

Also worth checking:

  • Data leakage between train and validation

  • Feature quality and noise

  • Whether your dataset is large enough for the model
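
Data leakage in particular can be checked mechanically. A rough sketch, assuming `X_train` and `X_val` are NumPy arrays with the same dtype and column order:

```python
import numpy as np

def leaked_rows(X_train, X_val):
    """Count validation rows that appear verbatim in the training set.
    Assumes both arrays share dtype and column order; near-duplicates
    (e.g. after float noise) would need a fuzzier comparison."""
    train_rows = {row.tobytes() for row in np.asarray(X_train)}
    return sum(row.tobytes() in train_rows for row in np.asarray(X_val))

# Tiny example: the first validation row duplicates a training row.
X_train = np.array([[1.0, 2.0], [3.0, 4.0]])
X_val = np.array([[3.0, 4.0], [5.0, 6.0]])
print(leaked_rows(X_train, X_val))  # → 1
```

Any count above zero means the validation score is optimistic and overfitting is being partly masked, not measured.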

Sometimes the fix isn’t more regularization but a simpler model or better data.
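
On the "simpler model" point, it helps to see how quickly dense layers accumulate parameters. A back-of-the-envelope sketch comparing the 20→128→64→1 network above with a hypothetical 20→16→1 baseline:

```python
# Weights + biases for a chain of fully connected (Dense) layers.
def dense_params(sizes):
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(dense_params([20, 128, 64, 1]))  # → 11009 (the model above)
print(dense_params([20, 16, 1]))       # → 353 (a far smaller baseline)
```

If the training set has only a few thousand rows, an 11k-parameter model has ample capacity to memorize it, which is exactly when regularization alone stops being enough.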
