汽车价格的回归预测项目

发布于:2024-03-21 ⋅ 阅读:(78) ⋅ 点赞:(0)

注意:本文引用自专业人工智能社区Venus AI

更多AI知识请参考原站 ([www.aideeplearning.cn])

问题描述

汽车价格预测是一个旨在预估二手车市场中汽车售价的问题。这个问题涉及到分析各种影响汽车价格的因素,如品牌、车龄、性能参数等。准确的价格预测对于卖家定价和买家预算规划都非常重要。

项目目标

此项目的主要目标是开发一个预测模型,该模型能够根据汽车的各种特征准确预测其市场价值。这个模型应能处理不同类型的数据,包括数值数据和类别数据,并在预测准确度和计算效率之间取得平衡。

项目应用

  • 二手车交易:帮助买家和卖家了解特定车辆的公平市场价值。
  • 汽车评估:为汽车评估公司提供自动化的价值评估工具。
  • 市场分析:分析市场趋势,预测未来价值。
  • 个人决策支持:帮助个人用户在购买或出售汽车时做出更明智的决策。

数据集描述

这个数据集包含以下特征:

汽车ID,符号,汽车名称,燃油类型,吸气,门号,车身,驱动轮,发动机位置,轴距,车长,车宽,车高,整备质量,发动机类型,气缸数,发动机尺寸,燃油系统,硼比,冲程,压缩比,马力,峰值转速,城市英里数,高速公路英里数。

模型选择和科学计算库依赖

本项目使用的模型:

  •         线性回归
  •         决策树回归
  •         随机森林回归

本项目依赖的科学计算库

  • matplotlib==3.7.1
  • pandas==2.0.2
  • scikit_learn==1.2.2
  • seaborn==0.13.0

项目详细代码

#imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
data = pd.read_csv('car_price.csv')
data.head(10)

1. 探索数据特性

print("Rows: ",data.shape[0])
print("Columns: ",data.shape[1])
Rows:  205
Columns:  26
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   car_ID            205 non-null    int64  
 1   symboling         205 non-null    int64  
 2   CarName           205 non-null    object 
 3   fueltype          205 non-null    object 
 4   aspiration        205 non-null    object 
 5   doornumber        205 non-null    object 
 6   carbody           205 non-null    object 
 7   drivewheel        205 non-null    object 
 8   enginelocation    205 non-null    object 
 9   wheelbase         205 non-null    float64
 10  carlength         205 non-null    float64
 11  carwidth          205 non-null    float64
 12  carheight         205 non-null    float64
 13  curbweight        205 non-null    int64  
 14  enginetype        205 non-null    object 
 15  cylindernumber    205 non-null    object 
 16  enginesize        205 non-null    int64  
 17  fuelsystem        205 non-null    object 
 18  boreratio         205 non-null    float64
 19  stroke            205 non-null    float64
 20  compressionratio  205 non-null    float64
 21  horsepower        205 non-null    int64  
 22  peakrpm           205 non-null    int64  
 23  citympg           205 non-null    int64  
 24  highwaympg        205 non-null    int64  
 25  price             205 non-null    float64
dtypes: float64(8), int64(8), object(10)
memory usage: 41.8+ KB
data.isna().sum()
# 没有空值
car_ID              0
symboling           0
CarName             0
fueltype            0
aspiration          0
doornumber          0
carbody             0
drivewheel          0
enginelocation      0
wheelbase           0
carlength           0
carwidth            0
carheight           0
curbweight          0
enginetype          0
cylindernumber      0
enginesize          0
fuelsystem          0
boreratio           0
stroke              0
compressionratio    0
horsepower          0
peakrpm             0
citympg             0
highwaympg          0
price               0
dtype: int64
data.duplicated().sum()
# 没有重复值

0

data.groupby("CarName").sum(numeric_only=True)

# 删除 CarName, CarID 因为它不会给回归任务增加太多价值
data = data.drop(['car_ID','CarName'],axis=1)
data.head(1)

sns.histplot(data=data, x="price")

<Axes: xlabel='price', ylabel='Count'>

                            

plt.figure(figsize=(15,7))
sns.heatmap(data.corr(numeric_only=True), annot=True)
plt.title("Data Correlation",size=15)
plt.show()

                        

#燃料类型对价格的影响
sns.barplot(x="fueltype", y="price", data=data)
<Axes: xlabel='fueltype', ylabel='price'>

#车型对价格的影响
sns.boxplot(x ="carbody", y ="price", data = data)
<Axes: xlabel='carbody', ylabel='price'>

                         

#门数对价格的影响
sns.boxplot(x ="doornumber", y ="price", data = data)
<Axes: xlabel='doornumber', ylabel='price'>

驱动器(FWD、RWD、AWD)对价格的影响
sns.boxplot(x ="drivewheel", y ="price", data = data)
<Axes: xlabel='drivewheel', ylabel='price'>

#绘制热图中最相关属性之间的成对关系
columns=data[['wheelbase','carlength','carwidth','curbweight','price']]
sns.pairplot(columns)
plt.show()
#linear relationship

                          

columns=data[['horsepower','citympg','highwaympg','price']]
sns.pairplot(columns)
plt.show()
#linear relationship

                       

以下属性集具有线性关系:

1.轴距、车长、车宽、整备质量和价格(基本上是所有物理属性)

2.马力、城市英里数、高速公路英里数和价格(基本上是与车辆功率相关的所有属性)

2. 训练模型

encoder = LabelEncoder()
data['fueltype'] = encoder.fit_transform(data['fueltype'])
fueltype = {index : label for index, label in enumerate(encoder.classes_)}
data['aspiration'] = encoder.fit_transform(data['aspiration'])
aspiration = {index : label for index, label in enumerate(encoder.classes_)}
data['doornumber'] = encoder.fit_transform(data['doornumber'])
doornumber = {index : label for index, label in enumerate(encoder.classes_)}
data['carbody'] = encoder.fit_transform(data['carbody'])
carbody = {index : label for index, label in enumerate(encoder.classes_)}
data['drivewheel'] = encoder.fit_transform(data['drivewheel'])
drivewheel = {index : label for index, label in enumerate(encoder.classes_)}
data['enginelocation'] = encoder.fit_transform(data['enginelocation'])
enginelocation = {index : label for index, label in enumerate(encoder.classes_)}
data['fuelsystem'] = encoder.fit_transform(data['fuelsystem'])
fuelsystem = {index : label for index, label in enumerate(encoder.classes_)}
data['enginetype'] = encoder.fit_transform(data['enginetype'])
enginetype = {index : label for index, label in enumerate(encoder.classes_)}
data['cylindernumber'] = encoder.fit_transform(data['cylindernumber'])
cylindernumber = {index : label for index, label in enumerate(encoder.classes_)}
data['fuelsystem'] = encoder.fit_transform(data['fuelsystem'])
fuelsystem = {index : label for index, label in enumerate(encoder.classes_)}
x = data.drop('price', axis=1)
y = data['price']
scaler = MinMaxScaler(copy=True, feature_range=(0, 1))
X = scaler.fit_transform(x)
#train, test split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=30,random_state=0)
  1. 随机森林回归
    rf = RandomForestRegressor(n_estimators=100,max_depth=5, random_state=33)
    rf.fit(x_train, y_train)

print("Training r2_score: ",rf.score(x_train, y_train))
print("Testing r2_score: ",rf.score(x_test, y_test))
Training r2_score:  0.9753559007565417
Testing r2_score:  0.87367804775233
  1. 决策树回归
    dt = DecisionTreeRegressor( max_depth=5,random_state=33)
    dt.fit(x_train, y_train)

print('Training r2_score: ' , dt.score(x_train, y_train))
print('Testing r2_score: ' , dt.score(x_test, y_test))
Training r2_score:  0.9735394081185511
Testing r2_score:  0.8226507572837073

2.线性回归

def evaluate(model,x_train , y_train, x_test , y_test, y_predict):
    print(f'train r2_score:{r2_score(y_train, model.predict(x_train))}' )
    print(f'test r2_score : {r2_score(y_test, y_predict)}')
model = LinearRegression()
model.fit(x_train,y_train)
y_predict=model.predict(x_test)
evaluate(model,x_train , y_train, x_test , y_test, y_predict)
train r2_score:0.889157847638672
test r2_score : 0.7289860743863041

 项目资源下载

详情请见汽车价格的回归预测项目-VenusAI (aideeplearning.cn)

本文含有隐藏内容,请 开通VIP 后查看