Python Flight Price Prediction
Flight price prediction is a challenging task because ticket prices depend on many factors, including travel date, departure time, destination, airline, seasonality, and the balance of demand and supply. With Python, you can build a machine learning model that predicts flight prices from historical data and the features derived from it.
Here are some examples of flight price prediction using Python:
Using a Kaggle dataset of flight booking data obtained from the “Ease My Trip” website to perform exploratory data analysis and build a regression model with the scikit-learn library.
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
# Read the dataset
data = pd.read_csv("flight_price_prediction.csv")
# Explore the dataset
data.head()
# Plot a boxplot of price vs airline
sns.boxplot(x="Airline", y="Price", data=data)
plt.xticks(rotation=90)
plt.show()
# Train a linear regression model on two numerical features
# (assumes Duration is stored in minutes and Total_Stops as a number)
X = data[["Duration", "Total_Stops"]]
y = data["Price"]
model = LinearRegression()
model.fit(X,y)
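# Predict the price for a flight with a duration of 500 minutes and 1 stop
# (a DataFrame keeps the column names consistent with the training data)
new_flight = pd.DataFrame({"Duration": [500], "Total_Stops": [1]})
y_pred = model.predict(new_flight)
print("Predicted price:", y_pred[0])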
The output has the form Predicted price: <value>, where the value depends on the data the model was fitted on.
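The example above fits and predicts on the same data, so it says nothing about how well the model generalizes. A minimal sketch of a hold-out evaluation follows. It uses a synthetic table in place of the real CSV, and the price formula, the sample size and the variable name demo are assumptions made up for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): price grows with
# duration in minutes and with the number of stops, plus random noise.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "Duration": rng.integers(60, 900, size=n),
    "Total_Stops": rng.integers(0, 4, size=n),
})
demo["Price"] = (
    2000 + 8 * demo["Duration"] + 1500 * demo["Total_Stops"]
    + rng.normal(0, 500, size=n)
)

X = demo[["Duration", "Total_Stops"]]
y = demo["Price"]

# Hold out 20% of the rows so the model is scored on data it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Score the model on the held-out rows
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.1f}, R^2: {r2:.3f}")

# Predict a single flight: 500 minutes, 1 stop
new_flight = pd.DataFrame({"Duration": [500], "Total_Stops": [1]})
single_pred = model.predict(new_flight)
print("Predicted price:", round(float(single_pred[0]), 2))
```

On real booking data the scores will usually be much worse than on this synthetic table, because price also depends on features such as airline, route and booking date that a two-column linear model cannot see.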
Using a Kaggle notebook on flight fare prediction to perform feature engineering and build a random forest model with the scikit-learn library.
# Import libraries
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
# Read the dataset
train_data = pd.read_excel("Data_Train.xlsx")
test_data = pd.read_excel("Test_set.xlsx")
# Feature engineering
train_data["Journey_day"] = pd.to_datetime(train_data.Date_of_Journey, format="%d/%m/%Y").dt.day # Extract day from date of journey
train_data["Journey_month"] = pd.to_datetime(train_data.Date_of_Journey, format="%d/%m/%Y").dt.month # Extract month from date of journey
train_data.drop(["Date_of_Journey"], axis=1, inplace=True) # Drop date of journey column
train_data["Dep_hour"] = pd.to_datetime(train_data.Dep_Time).dt.hour # Extract hour from departure time
train_data["Dep_min"] = pd.to_datetime(train_data.Dep_Time).dt.minute # Extract minute from departure time
train_data.drop(["Dep_Time"], axis=1,inplace=True) # Drop departure time column
train_data["Arrival_hour"] = pd.to_datetime(train_data.Arrival_Time).dt.hour # Extract hour from arrival time
train_data["Arrival_min"] = pd.to_datetime(train_data.Arrival_Time).dt.minute # Extract minute from arrival time
train_data.drop(["Arrival_Time"], axis=1,inplace=True) # Drop arrival time column
# Apply same feature engineering steps to test data
# Encode categorical features using one-hot encoding or label encoding
# Split the train data into X and y
X = train_data.drop(["Price"], axis=1)
y = train_data["Price"]
# Train a random forest model using scikit-learn library
reg_rf = RandomForestRegressor()
reg_rf.fit(X,y)
# Predict the price for test data
# (test_data must first go through the same feature engineering and encoding as X)
y_pred = reg_rf.predict(test_data)
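The two placeholder comments in the code above (applying the same steps to the test data and encoding categorical features) are where this example most often breaks, because the random forest needs the test columns to match the training columns exactly. The sketch below shows one possible way to do it, using one-hot encoding with pandas. The helper add_time_features and the small made-up tables train_demo and test_demo are hypothetical and only stand in for the real Excel files, and the departure times are assumed to be in HH:MM format.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with date and time columns expanded into numeric parts."""
    out = df.copy()
    journey = pd.to_datetime(out["Date_of_Journey"], format="%d/%m/%Y")
    out["Journey_day"] = journey.dt.day
    out["Journey_month"] = journey.dt.month
    dep = pd.to_datetime(out["Dep_Time"], format="%H:%M")
    out["Dep_hour"] = dep.dt.hour
    out["Dep_min"] = dep.dt.minute
    return out.drop(columns=["Date_of_Journey", "Dep_Time"])


# Tiny made-up tables standing in for Data_Train.xlsx and Test_set.xlsx
train_demo = pd.DataFrame({
    "Airline": ["IndiGo", "Air India", "IndiGo"],
    "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019"],
    "Dep_Time": ["22:20", "05:50", "09:25"],
    "Price": [4000, 7500, 12000],
})
test_demo = pd.DataFrame({
    "Airline": ["Jet Airways", "IndiGo"],
    "Date_of_Journey": ["06/06/2019", "12/05/2019"],
    "Dep_Time": ["17:30", "06:20"],
})

# Same feature engineering for both tables, then one-hot encode the airline
X_train = pd.get_dummies(
    add_time_features(train_demo).drop(columns=["Price"]), columns=["Airline"]
)
y_train = train_demo["Price"]
X_test = pd.get_dummies(add_time_features(test_demo), columns=["Airline"])

# Align the test columns to the training columns: airlines unseen during
# training are dropped, and training airlines absent from the test set
# are added as all-zero columns.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

reg = RandomForestRegressor(n_estimators=10, random_state=0)
reg.fit(X_train, y_train)
preds = reg.predict(X_test)
print(preds)
```

Wrapping the steps in a function keeps the training and test transformations identical. The reindex call is a simple way to handle categories that appear in only one of the two tables, at the cost of silently ignoring airlines the model has never seen.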
Using a Medium article on flight ticket price prediction to perform data cleaning and build a decision tree model with the scikit-learn library.
# Import libraries
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
# Read the dataset
df = pd.read_excel("Data_Train.xlsx")
# Data cleaning
df.isnull().sum() # Check for missing values
df.dropna(inplace=True) # Drop rows with missing values
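The snippet above stops after the cleaning step, before any model is built. As a rough sketch of how the decision tree part could look, the following example cleans a small made-up table and fits a DecisionTreeRegressor on it. The table df_demo, its column Duration_min and all values in it are assumptions for illustration, not the contents of Data_Train.xlsx, and the article's own modelling code may differ.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up data with two missing values, standing in for the real file
df_demo = pd.DataFrame({
    "Duration_min": [170, 445, 1140, 325, 285, np.nan, 150, 930],
    "Total_Stops": [0, 2, 2, 1, 1, 0, np.nan, 1],
    "Price": [4000, 7500, 14000, 6000, 9000, 4000, 4200, 11000],
})

# Data cleaning: count missing values per column, then drop incomplete rows
missing_per_column = df_demo.isnull().sum()
print(missing_per_column)
clean = df_demo.dropna()

X = clean[["Duration_min", "Total_Stops"]]
y = clean["Price"]

# A shallow tree is less likely to memorize the training rows
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

# Predict the price of one new flight: 300 minutes, 1 stop
new_flight = pd.DataFrame({"Duration_min": [300], "Total_Stops": [1]})
pred = tree.predict(new_flight)
print("Predicted price:", pred[0])
```

A decision tree predicts the average price of the training rows that fall into the same leaf, so its predictions always stay within the range of prices seen during training.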
If you have any questions about this code, feel free to leave a comment.