[Application Development] ROI Prediction Model Based on the Random Forest Algorithm

Modified April 28



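I ran into this requirement a while ago: on always-on conversion ad campaigns, clients often ask how they should allocate the daily budget, and how many days one set of creatives should run on a given ad platform. In the past, without any detailed calculation, the operations team would answer with a rough rule-of-thumb figure (for example, 800,000 per day, with one set of creatives running for 14 days). Once a certain amount of data has accumulated, though, we can try using an algorithm to make this prediction (or call it a summary) and give more precise answers.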

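For this task I also tried Gaussian process regression and polynomial regression (extended to degree 3), but in practice the random forest algorithm performed better, so for now it is the algorithm I mainly recommend for this prediction. The script is given step by step in this post. As before, it runs on Colab; anyone who needs to run it locally only has to adjust the file-reading and file-saving path code.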
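Before the script itself, here is a minimal sketch of what such a three-way comparison could look like. It is not the comparison actually run for this project: the data is synthetic, and the feature names (`budget`, `days`), the `compare_models` helper, and all hyperparameters are assumptions made for illustration. On this toy data the ranking of the models says nothing about how they rank on real campaign data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def compare_models(X, y, random_state=0):
    """Fit three regressors on the same split and return the test RMSE of each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = {
        # Hyperparameters here are illustrative, not tuned
        "random_forest": RandomForestRegressor(
            n_estimators=200, random_state=random_state
        ),
        "gaussian_process": make_pipeline(
            StandardScaler(),
            GaussianProcessRegressor(
                kernel=Matern(nu=2.5) + WhiteKernel(),
                normalize_y=True,
                random_state=random_state,
            ),
        ),
        "polynomial_deg3": make_pipeline(
            StandardScaler(), PolynomialFeatures(degree=3), LinearRegression()
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        scores[name] = float(np.sqrt(mean_squared_error(y_test, pred)))
    return scores


# Synthetic stand-in data (hypothetical): daily budget and days a creative set has run
rng = np.random.default_rng(0)
budget = rng.uniform(10, 100, size=200)
days = rng.integers(1, 30, size=200)
roi = 2.0 + 0.01 * budget - 0.03 * days + rng.normal(0, 0.1, size=200)

X = np.column_stack([budget, days])
scores = compare_models(X, roi)
for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE={rmse:.3f}")
```

Using the same train/test split for all three models keeps the RMSE values comparable; with real data, cross-validation would give a steadier picture than a single split.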

Step 1: Load dependencies

Code block

import os  # needed in step 2 to delete a previously uploaded file

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from google.colab import files  # Colab only; remove when running locally

Step 2: Upload the file

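The file is assumed to be named "data.xlsx". To avoid duplicate uploads on Colab, the script first deletes any previously uploaded copy; this part is not needed when running locally.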

Code block

# Delete the previously saved file so a stale copy is not left behind
file_path = 'data.xlsx'
if os.path.exists(file_path):
    os.remove(file_path)
    print(f"Deleted existing file: {file_path}")

# Upload the Excel file
uploaded = files.upload()

# Read the uploaded Excel file (expected name: 'data.xlsx')
if not uploaded:
    print("No file was uploaded. Please try again.")
else:
    # Take the first uploaded file and check its extension
    file_name = list(uploaded.keys())[0]
    if not file_name.endswith('.xlsx'):
        print("Uploaded file is not an Excel file. Please upload a file with '.xlsx' extension.")
    else:
        df = pd.read_excel(file_name)
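For anyone running locally, the upload step can be replaced by a direct read from disk. The following is a minimal sketch, assuming the file sits at a path you supply; the helper name `load_excel_local` is hypothetical and not part of the original script. Reading `.xlsx` files with pandas also requires the `openpyxl` package to be installed.

```python
from pathlib import Path

import pandas as pd


def load_excel_local(path="data.xlsx"):
    """Read a local .xlsx file into a DataFrame, failing early on obvious mistakes."""
    p = Path(path)
    # Same extension check as the Colab version, but raising instead of printing
    if p.suffix.lower() != ".xlsx":
        raise ValueError(f"Expected a file with '.xlsx' extension, got: {p.name}")
    if not p.is_file():
        raise FileNotFoundError(f"No such file: {p}")
    # pandas uses openpyxl under the hood for .xlsx files
    return pd.read_excel(p)
```

With this in place, `df = load_excel_local("data.xlsx")` takes the place of the delete-and-upload block, and the `google.colab` import from step 1 can be dropped.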
