Q&A 7 How do you train a decision tree classifier?

7.1 Explanation

A decision tree is a simple yet powerful classification model that makes predictions by recursively splitting the data on feature values. It’s easy to interpret and works well as a baseline model.

We’ll use the Titanic dataset, with the same cleaning and encoding steps applied to its features, to predict the Survived outcome.


7.2 Python Code

# Train a Decision Tree Classifier in Python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load and preprocess the data
df = pd.read_csv("data/titanic.csv")
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
df = pd.get_dummies(df, columns=['Embarked'], drop_first=True)

# Features and target
X = df[['Pclass', 'Sex', 'Age', 'Fare', 'Embarked_Q', 'Embarked_S']]
y = df['Survived']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
Accuracy: 0.7877094972067039
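Because a fitted tree is just a sequence of threshold splits, you can print the learned rules directly. Here is a minimal sketch using scikit-learn’s export_text, assuming the clf and X objects defined above; a fully grown tree produces many rules, so only the first part is printed.

# Inspect the learned splits as plain-text rules (sketch; assumes clf and X from above)
from sklearn.tree import export_text

rules = export_text(clf, feature_names=list(X.columns))
print(rules[:500])  # truncate the output; an unconstrained tree can be very deep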

✅ Takeaway: Decision trees offer a quick way to test your features and modeling setup. They are interpretable and serve as a great baseline before trying more complex models.
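A single train/test split gives a somewhat noisy accuracy estimate, and an unconstrained tree can memorize the training data. As a quick sanity check on the baseline, you could cross-validate a depth-limited tree; the sketch below assumes the X and y defined above, and the max_depth value is only an illustrative choice, not a tuned one.

# Cross-validated accuracy for a depth-limited tree (sketch; max_depth=4 is illustrative)
from sklearn.model_selection import cross_val_score

pruned = DecisionTreeClassifier(max_depth=4, random_state=42)
scores = cross_val_score(pruned, X, y, cv=5, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))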