Titanic

Titanic - Machine Learning from Disaster Walkthrough with Excel
kaggle
Machine Learning
Author

Ezra Safi

Published

November 23, 2025

Introduction

Today I’ll be walking through the famous Titanic - Machine Learning from Disaster competition on Kaggle.
This is my first-ever Kaggle competition, and I wanted to document everything I learned. I hope this helps anyone starting in a similar position.

If you haven’t already, check out the competition page:
👉 https://www.kaggle.com/competitions/titanic/overview


Table of Contents

  1. Downloading the Training Data
  2. Preparing the Data
  3. Setting Up the Parameters
  4. Loss Function
  5. Training a Linear Regression Model
  6. Training a Neural Network
  7. Final Thoughts

Downloading the Training Data

Download both train.csv and test.csv from the Kaggle data section:

👉 https://www.kaggle.com/competitions/titanic/data


Preparing the Data

Start by loading the CSV into Excel.

Excel:
Data → Get Data → From File → From Text/CSV → select train.csv


Data Cleaning

“Garbage in, garbage out.”
Clean data is essential for model accuracy.

1. Remove Unnecessary Columns

Identifiers like PassengerId or Ticket do not help predict survival.

We will keep:

Survived, Pclass, Age, Sex, SibSp, Parch, Fare, Embarked

Excel Action
Right-click a column → Delete → Table Columns


2. Remove Rows with Missing Values

We can see Age and Embarked contain blanks.
For simplicity, we remove rows with missing entries.

Excel Action
Filter columns → uncheck Blank
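For readers who want to sanity-check these two cleaning steps outside Excel, here is a minimal pandas sketch. The inline rows are made up for illustration; in practice you would load the real train.csv with pd.read_csv.

```python
import pandas as pd

# Tiny sample standing in for train.csv (hypothetical rows, same columns)
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, None],   # third row has a missing Age
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282"],
    "Fare": [7.25, 71.2833, 7.925],
    "Embarked": ["S", "C", "S"],
})

# Step 1: drop identifier columns that carry no predictive signal
df = df.drop(columns=["PassengerId", "Ticket"])

# Step 2: drop rows with any missing values (mirrors un-checking "Blank")
df = df.dropna()

print(len(df))  # 2 — the row with the blank Age is gone
```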


Data Scaling

Some features (like Fare) have much larger numerical ranges than others.
We scale them to prevent misleading the model.

Create new columns:

Column     Formula
Age_Ratio  =[@Age] / MAX([Age])
Fare_Log   =LOG10([@Fare] + 1)

Excel Action:
Create the new scaled columns
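The same two transformations can be checked in plain Python. The age and fare values below are made up; the formulas mirror the Excel ones above.

```python
import math

ages  = [22.0, 38.0, 26.0]     # hypothetical Age values
fares = [7.25, 71.2833, 7.925] # hypothetical Fare values

max_age = max(ages)
age_ratio = [a / max_age for a in ages]          # =[@Age] / MAX([Age])
fare_log  = [math.log10(f + 1) for f in fares]   # =LOG10([@Fare] + 1)

print(age_ratio[1])  # 1.0 — the oldest passenger maps to exactly 1
```

The +1 inside the log keeps a zero fare from producing LOG10(0), which is undefined.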


One-Hot Encoding

The model's formulas operate on numbers, not text.
We convert the categorical columns (Sex, Embarked, Pclass) into binary indicator variables.

Create the following columns:

Column    Formula
Male      =IF([@Sex]="male", 1, 0)
Pclass_1  =IF([@Pclass]=1, 1, 0)
Pclass_2  =IF([@Pclass]=2, 1, 0)
Embark_S  =IF([@Embarked]="S", 1, 0)
Embark_C  =IF([@Embarked]="C", 1, 0)
Ones      =1

Note that Pclass_3 and Embark_Q need no columns of their own: they are implied whenever the other indicators in their group are all 0.


Action: Apply all the one-hot transformations
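The IF() formulas above can be expressed as a small Python function, handy for spot-checking a few rows of your sheet. The input dictionary format is just an illustration.

```python
def one_hot(row):
    # row: dict with Sex, Pclass, Embarked — mirrors the IF() formulas above
    return {
        "Male":     1 if row["Sex"] == "male" else 0,
        "Pclass_1": 1 if row["Pclass"] == 1 else 0,
        "Pclass_2": 1 if row["Pclass"] == 2 else 0,
        "Embark_S": 1 if row["Embarked"] == "S" else 0,
        "Embark_C": 1 if row["Embarked"] == "C" else 0,
        "Ones":     1,
    }

# A second-class female passenger who boarded at Cherbourg
print(one_hot({"Sex": "female", "Pclass": 2, "Embarked": "C"}))
```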


Setting Up the Parameters

Now for the fun part—creating trainable parameters.

Use: =RAND() * 2 - 1

This generates values in [-1, 1]. One caveat: RAND() is volatile and recalculates on every sheet change, so after filling the parameter cells, copy them and use Paste Special → Values to freeze the starting weights.

Action: Create the parameters table
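The Python equivalent of filling the parameter cells with =RAND()*2-1 is shown below. The feature count of 9 is hypothetical; match it to the number of columns in your own table (including Ones).

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
n_features = 9  # hypothetical count; use your own column count

# Each parameter is the equivalent of one =RAND()*2-1 cell
params = [random.random() * 2 - 1 for _ in range(n_features)]

print(all(-1 <= w <= 1 for w in params))  # True
```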


Loss Function

We’ll use Mean Squared Error (MSE):

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Where:

  • y = actual (Survived)
  • ŷ = model prediction

How Do We Compute the Prediction?

To calculate the prediction in a linear regression model, we multiply each input feature by its respective weight (parameter) and sum the products — a dot product. The Ones column multiplied by its weight acts as the bias term.

Column        Formula
Prediction    =SUMPRODUCT(data[@[SibSp]:[Ones]], Params)
Loss          =(data[@Survived]-[@Prediction])^2
Average Loss  =AVERAGE(Calculations[Loss])
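The SUMPRODUCT-plus-squared-error pipeline for one row looks like this in Python. The feature values and weights below are made up purely to show the arithmetic.

```python
# One row of features (SibSp … Ones) and matching weights — hypothetical values
features = [1, 0, 0.28, 0.92, 1, 0, 1, 0, 1]
weights  = [0.1, -0.2, 0.5, -0.4, 0.3, 0.0, 0.2, -0.1, 0.05]
survived = 1  # the actual label for this row

# SUMPRODUCT(data[@[SibSp]:[Ones]], Params)
prediction = sum(x * w for x, w in zip(features, weights))

# (Survived - Prediction)^2
loss = (survived - prediction) ** 2

print(round(prediction, 4), round(loss, 4))
```

Averaging this loss over every row gives the Average Loss cell that Solver will minimize.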


Training a Linear Regression Model

Now that we’ve set up everything, we can train the model.

In Excel, enable the Solver add-in and configure it as follows:

  1. Set Objective → Average Loss
  2. To → Min
  3. By Changing Variables → Parameters
  4. Non-Negative → Uncheck

Click Solve and watch the Average Loss decrease. Congratulations, you’ve just trained a linear regression model in Excel!
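Under the hood, Solver is doing something loosely analogous to the gradient-descent loop sketched below: repeatedly nudging the weights downhill on the average loss. The toy dataset and learning rate are invented for illustration, not taken from the competition data.

```python
# Toy data: 4 rows, 3 features (last feature plays the role of Ones)
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
y = [1, 0, 1, 0]
w = [0.0, 0.0, 0.0]  # start from zero instead of RAND(), for determinism

def mse(w):
    preds = [sum(xi * wi for xi, wi in zip(row, w)) for row in X]
    return sum((yi - p) ** 2 for yi, p in zip(y, preds)) / len(y)

lr = 0.1  # learning rate (step size)
for _ in range(500):
    grad = [0.0] * len(w)
    for row, yi in zip(X, y):
        p = sum(xi * wi for xi, wi in zip(row, w))
        for j, xj in enumerate(row):
            grad[j] += 2 * (p - yi) * xj / len(y)
    # step downhill on the average loss
    w = [wj - lr * gj for wj, gj in zip(w, grad)]

print(mse(w) < 0.01)  # True — the loop drives the loss toward zero
```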


Training a Neural Network

To improve the model, we can extend it to a Neural Network by adding an additional layer and more neurons. This is similar to our linear regression model but with added complexity.

Here’s an illustration of the model architecture:

[Diagram: input nodes Feature 1 … Feature N each connect with weight w to two hidden nodes, Neuron 1 and Neuron 2, which both feed a single Output node.]
Figure 1: Neural Network Architecture

Adding a Second Neuron

After defining the network structure, we need to adjust our Excel setup to support multiple neurons, each with its own set of parameters and activation outputs.

Since our neural network contains two hidden neurons, expand the parameters table accordingly:

  • Add a second row of weights (one weight per input feature).
  • Include a second bias term.

A typical expanded parameter table looks like this:

Neuron Weight 1 Weight 2 Weight N Bias
1 w_11 w_12 w_1N b_1
2 w_21 w_22 w_2N b_2

Applying the Activation Function

Each neuron’s output must be passed through a sigmoid activation function, which compresses values into the 0–1 range:

\sigma(y) = \frac{1}{1 + e^{-y}}

In Excel, implement this as: =1/(1+EXP(-[Linear Output]))
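The same function in Python, for checking a cell or two of your sheet:

```python
import math

def sigmoid(y):
    # Excel: =1/(1+EXP(-[Linear Output]))
    return 1 / (1 + math.exp(-y))

print(sigmoid(0))  # 0.5 — a zero linear output maps to the midpoint
```

Large positive inputs approach 1 and large negative inputs approach 0, which is exactly the squashing behavior we want for a survival probability.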

Training the Neural Network

The loss function is still the Mean Squared Error (MSE):

L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2

To train the full neural network in Excel, configure Solver:

  • Set Objective: Average Loss
  • To: Minimize
  • By Changing Variables:
    • All weights for Neuron 1
    • All weights for Neuron 2
    • All bias terms
  • Non-Negative Constraint: Off (leave unchecked)

Click Solve, and Solver will iteratively optimize all parameters.
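To make the wiring concrete, here is a forward pass through a two-hidden-neuron network of the kind described above. All weights and feature values are hypothetical stand-ins, with the bias folded into the Ones input as in the spreadsheet.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical values for a 3-input, 2-hidden-neuron network
features = [0.28, 1.0, 1.0]        # e.g. Age_Ratio, Male, Ones
hidden_w = [[0.4, -0.7, 0.1],      # Neuron 1 weights (bias folded into Ones)
            [-0.3, 0.9, -0.2]]     # Neuron 2 weights
out_w    = [1.2, -0.8]             # output-layer weights

# Forward pass: each hidden neuron is a SUMPRODUCT followed by the sigmoid
hidden = [sigmoid(sum(x * w for x, w in zip(features, wrow)))
          for wrow in hidden_w]
y_hat  = sum(h * w for h, w in zip(hidden, out_w))

loss = (1 - y_hat) ** 2  # MSE contribution for a survivor (Survived = 1)
print(0 <= hidden[0] <= 1 and 0 <= hidden[1] <= 1)  # True
```

Solver's job is to adjust every number in hidden_w and out_w so that the average of this per-row loss is as small as possible.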

After training, check that the Average Loss cell has dropped from its starting value.

Final Thoughts

Working through this challenge brought me from being completely unfamiliar with machine learning to being able to train a fully working model. I'm simply mind-blown that something like this can be done in Excel. I would highly recommend this Kaggle challenge to anyone wanting to learn more about machine learning.

Download the Excel Workbook

Download Excel File