Python Flight Price Prediction

Flight price prediction is a challenging task because ticket prices depend on many factors, such as travel date, departure time, destination, airline, seasonality, and the balance of demand and supply. With Python, you can build a machine learning model that predicts flight prices from historical booking data and features derived from it.

Here are some examples of flight price prediction using Python:

Using a Kaggle dataset of flight booking data obtained from the “Ease My Trip” website to perform exploratory data analysis and build a regression model with the scikit-learn library.

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

# Read the dataset
data = pd.read_csv("flight_price_prediction.csv")

# Explore the dataset
data.head()

# Plot a boxplot of price vs airline
sns.boxplot(x="Airline", y="Price", data=data)
plt.xticks(rotation=90)
plt.show()

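# Train a linear regression model using numerical features
# (both columns must already be numeric for this to work)
X = data[["Duration", "Total_Stops"]]
y = data["Price"]
model = LinearRegression()
model.fit(X, y)

# Predict the price for a flight with a duration of 500 minutes and 1 stop
# (pass a DataFrame so the column names match those used in training)
new_flight = pd.DataFrame({"Duration": [500], "Total_Stops": [1]})
y_pred = model.predict(new_flight)
print("Predicted price:", y_pred[0])
# The printed value depends on the data the model was fitted on.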
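The linear regression example above fits and predicts on the same data, so it gives no indication of how well the model generalizes. The following is a minimal sketch of a hold-out evaluation. It assumes a synthetic stand-in dataset: the column names mirror the example, but the values and the price rule are made up for illustration and do not come from the Kaggle data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (all values are invented)
rng = np.random.default_rng(42)
n = 500
demo = pd.DataFrame({
    "Duration": rng.integers(60, 900, size=n),    # flight duration in minutes
    "Total_Stops": rng.integers(0, 4, size=n),    # number of stops (0 to 3)
})
# Hypothetical price rule plus noise, only so the model has something to learn
demo["Price"] = (
    2000 + 5 * demo["Duration"] + 800 * demo["Total_Stops"]
    + rng.normal(0, 300, size=n)
)

X = demo[["Duration", "Total_Stops"]]
y = demo["Price"]

# Hold out 20% of the rows to measure performance on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.1f}, R^2: {r2:.3f}")

# Predict a single flight: 500 minutes, 1 stop
new_flight = pd.DataFrame({"Duration": [500], "Total_Stops": [1]})
predicted_price = float(model.predict(new_flight)[0])
print(f"Predicted price: {predicted_price:.2f}")
```

On real booking data the scores will usually be much lower than on this synthetic set, because two features rarely explain most of the price variation.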

Using a Kaggle notebook on flight fare prediction to perform feature engineering and build a random forest model with the scikit-learn library.

# Import libraries
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Read the dataset
train_data = pd.read_excel("Data_Train.xlsx")
test_data = pd.read_excel("Test_set.xlsx")

# Feature engineering
# Extract day and month from the date of journey, then drop the original column
train_data["Journey_day"] = pd.to_datetime(train_data.Date_of_Journey, format="%d/%m/%Y").dt.day
train_data["Journey_month"] = pd.to_datetime(train_data.Date_of_Journey, format="%d/%m/%Y").dt.month
train_data.drop(["Date_of_Journey"], axis=1, inplace=True)

# Extract hour and minute from the departure time, then drop the original column
train_data["Dep_hour"] = pd.to_datetime(train_data.Dep_Time).dt.hour
train_data["Dep_min"] = pd.to_datetime(train_data.Dep_Time).dt.minute
train_data.drop(["Dep_Time"], axis=1, inplace=True)

# Extract hour and minute from the arrival time, then drop the original column
train_data["Arrival_hour"] = pd.to_datetime(train_data.Arrival_Time).dt.hour
train_data["Arrival_min"] = pd.to_datetime(train_data.Arrival_Time).dt.minute
train_data.drop(["Arrival_Time"], axis=1, inplace=True)

# Apply same feature engineering steps to test data

# Encode categorical features using one-hot encoding or label encoding

# Split the train data into X and y
X = train_data.drop(["Price"], axis=1)
y = train_data["Price"]

# Train a random forest model using scikit-learn library
reg_rf = RandomForestRegressor()
reg_rf.fit(X,y) 

# Predict the price for test data
# (test_data must contain the same engineered and encoded columns as X)
y_pred = reg_rf.predict(test_data)
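The two placeholder steps above (applying the same feature engineering to the test data and encoding categorical features) could be sketched as follows. This is only an illustration under stated assumptions: the helper name `add_time_features` is hypothetical, the tiny frames are made up, and one-hot encoding with `pd.get_dummies` is just one of the two encoding options mentioned.

```python
import pandas as pd

def add_time_features(df):
    """Return a copy with journey day/month and departure hour/minute added."""
    out = df.copy()
    journey = pd.to_datetime(out["Date_of_Journey"], format="%d/%m/%Y")
    out["Journey_day"] = journey.dt.day
    out["Journey_month"] = journey.dt.month
    dep = pd.to_datetime(out["Dep_Time"], format="%H:%M")
    out["Dep_hour"] = dep.dt.hour
    out["Dep_min"] = dep.dt.minute
    # The raw text columns are no longer needed once the parts are extracted
    return out.drop(columns=["Date_of_Journey", "Dep_Time"])

# Tiny made-up frames standing in for Data_Train.xlsx and Test_set.xlsx
train_demo = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "IndiGo"],
    "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019"],
    "Dep_Time": ["22:20", "05:50", "09:25"],
    "Price": [3900, 7700, 13900],
})
test_demo = pd.DataFrame({
    "Airline": ["SpiceJet", "IndiGo"],
    "Date_of_Journey": ["06/06/2019", "12/05/2019"],
    "Dep_Time": ["17:30", "06:20"],
})

# Apply the same helper to both sets, then one-hot encode the airline
X_train = pd.get_dummies(
    add_time_features(train_demo).drop(columns=["Price"]), columns=["Airline"]
)
X_test = pd.get_dummies(add_time_features(test_demo), columns=["Airline"])

# Align test columns to the training columns: categories unseen in training
# are dropped, and categories missing from the test set are filled with 0
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

print(X_train.columns.tolist())
print(X_test)
```

Putting the steps in one function keeps the training and test sets consistent, and the `reindex` call guards against the model receiving columns it was not trained on.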

Using a Medium article on flight ticket price prediction to perform data cleaning and build a decision tree model with the scikit-learn library.

# Import libraries
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Read the dataset
df = pd.read_excel('Data_Train.xlsx')

# Data cleaning
# Check for missing values
df.isnull().sum()

# Drop rows with missing values
df.dropna(inplace=True)
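The decision tree step named in this example is not shown in the listing above. A minimal sketch of how it might continue is given below. It assumes a small made-up frame with hypothetical numeric columns (`Total_Stops`, `Duration_min`) rather than the real `Data_Train.xlsx`, and the `max_depth` value is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up data with one missing value, standing in for the real file
demo = pd.DataFrame({
    "Total_Stops": [0, 1, np.nan, 2, 1, 0],
    "Duration_min": [170, 445, 300, 1140, 325, 145],
    "Price": [3900, 7700, 6000, 13900, 6200, 3800],
})

# Count missing values per column, then drop incomplete rows
missing_counts = demo.isnull().sum()
print(missing_counts)
clean = demo.dropna()

X = clean[["Total_Stops", "Duration_min"]]
y = clean["Price"]

# Limiting the depth is one simple way to reduce overfitting
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

train_pred = tree.predict(X)
print(train_pred)
```

Decision trees need numeric inputs in scikit-learn, so any text columns in the real dataset would have to be encoded before this step.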

If you have any questions about this code, feel free to leave a comment.
