ChatGPT Predicts Real Estate Prices!

Is it possible to predict real estate rental prices without writing a single line of code? I asked ChatGPT to generate a complete solution and was stunned by its answers.

Artificial Intelligence has once again pushed the boundaries of what we thought was possible. It has given us a stunning demonstration of its ability not only to assist in writing code, but to create a complete, working Random Forest Regression solution from plain-language instructions. Yes, you read that right. ChatGPT, an AI model developed by OpenAI, is now capable of developing a complex web application that predicts real estate prices, while the human only describes the requirements and reports the errors. And the most exciting part? You don’t need to write a single line of code!

How Does it Work?

In a fascinating experiment (described at the end of this post), ChatGPT was tasked with developing a complete web solution on PythonAnywhere. The brief was complex: create a web page where users can upload a CSV file with geographical coordinates and real estate rental prices per square meter, clean the data, generate multiple scatter plot maps, fit a Random Forest Regression model, display statistics for the model, and predict prices for an address entered by the user. The result? After four rounds of improvement and debugging, a fully functioning solution written entirely by the AI.

The Implications of AI Co-Programming

The potential of ChatGPT for co-programming complex solutions is staggering. Think about it: the development of complex solutions no longer rests solely in the hands of those proficient in a given programming language. Anyone can now provide the instructions and let the AI do the coding. This democratizes programming and makes it accessible to a much larger pool of people.

Imagine the vast research potential this opens up. Researchers across various fields can now build and customize applications or analyze data without needing to learn to code or hire a software developer. This can significantly speed up the process of testing hypotheses and obtaining results.

No-Code Solutions: A Game-Changer

The emergence of no-code solutions is profoundly shaping the tech industry. It breaks down barriers and empowers everyone to create. For businesses, this means rapid prototyping and reduced time-to-market. It also democratizes innovation, as the ability to create technology is no longer restricted to the tech department.

For research, no-code solutions offer the potential for cross-disciplinary collaborations like never before. A biologist with a breakthrough idea for an AI model no longer has to wait for the computer science department to have time to collaborate. They can instead work directly with ChatGPT to create a prototype.

And the implications for education are equally significant. Learning to code is a valuable skill, but it can be time-consuming and may not be necessary for everyone. With AI like ChatGPT, students can focus on understanding computational thinking and algorithmic logic, and leave the syntax learning to the AI.

Conclusion: A New Era

Indeed, ChatGPT's ability to generate complex solutions like the Random Forest Regression model is not just a technical feat; it's the harbinger of a new era. An era where everyone, regardless of their coding ability, can create and innovate. It's a giant leap towards making coding and the creation of technological solutions accessible to all, and we can't wait to see what the future holds.

In the words of Sir Arthur C. Clarke, "Any sufficiently advanced technology is indistinguishable from magic." And with AI like ChatGPT, we're indeed witnessing magic unfold.

This post was co-written with ChatGPT. Be aware that AI is fascinating and can hold your full attention for hours. Get the balance right!

The experiment

Let's start with the result:

And here is how ChatGPT made it:

Me:

Please write a complete Web solution for PythonAnywhere which corresponds to the following requirements: Display a Web page where the user can upload a csv file with data in the following columns (titles in first row): lat, long and pm2. lat and long are the geographical coordinates and pm2 is the price per square meter. Then the python program should treat this file with the following algorithm: 1. Read the data from the csv 2. Clean the data by removing incomplete rows and by removing 5% of the min and max outliers for the pm2 3. Generate and display a scatter plot map with the original pm2 values placed on the lat and long coordinates. 4. Find a Random Forest Regression model to calculate pm2 values from lat and long. 5. Display complete statistics for the regression model. 6. Predict pm2 values for each row with this model 7. Generate and display a map with the actual pm2 values on the y-axis and the predicted pm2 values on the x-axis and a red line for the perfect fit. 8. Generate and display a scatter plot map with the predicted pm2 values placed on the lat and long coordinates. 9. Generate and display a scatter plot map with the mean absolute errors between the actual and the predicted pm2 values placed on the lat and long coordinates. Generate all maps as scatter plot mapboxes with heat colors according to pm2, a legend scale and the following mouse-over information: lat, long, pm2. Display all the maps, status messages etc on the same Web page. Then let the user input below all this a street address and city in a text area and click on a 'Predict' button. Transform the address to lat and long, calculate the predicted pm2 and display it on the web page. Repeat this (and clear the output section of the predicted pm2) each time the user inputs a new address and clicks on the button. Comment the major parts directly in the Python code. And finally explain how to deploy this to my PythonAnywhere account.

ChatGPT-4:

This is quite a complex task and involves several different libraries and technologies, including Flask, pandas, Plotly, scikit-learn and geopy (for geocoding addresses). Here is the Python code for the Flask application that can do this. Please note that for this to work, you should have the following Python libraries installed in your PythonAnywhere environment: pandas, plotly, scikit-learn, geopy, and flask.

After four iterations for improving and debugging the code, here is what ChatGPT programmed. Note that I have not written a single line of code!

(content of file flask_app.py)

import pickle

import numpy as np
import pandas as pd
import plotly.express as px
from flask import Flask, request, render_template
from geopy.geocoders import GoogleV3
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

app = Flask(__name__)

regr = None

px.set_mapbox_access_token("my token")  # Add your Mapbox token here


def clean_data(df):
    # Remove incomplete rows
    df = df.dropna()
    # Remove the lowest 5% and the highest 5% of pm2 values (outliers)
    q_low = df["pm2"].quantile(0.05)
    q_hi = df["pm2"].quantile(0.95)
    # .copy() so that new columns can be added later without a SettingWithCopyWarning
    return df[(df["pm2"] < q_hi) & (df["pm2"] > q_low)].copy()


@app.route('/', methods=['GET', 'POST'])
def upload_file():
    global regr
    if request.method == 'POST':
        file = request.files.get('file')
        if not file:
            return render_template('index.html', message='No file')

        # Read and clean the uploaded CSV
        df = pd.read_csv(file)
        df = clean_data(df)

        # Map of the original pm2 values
        fig = px.scatter_mapbox(df, lat="lat", lon="long", color="pm2",
                                color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=10)

        # Train the Random Forest on 80% of the rows, keep 20% for testing
        X = df[['lat', 'long']]
        y = df['pm2']
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        regr = RandomForestRegressor(max_depth=2, random_state=0)
        regr.fit(X_train, y_train)

        # Model statistics on the test set
        y_pred = regr.predict(X_test)
        r2 = r2_score(y_test, y_pred)
        mae = mean_absolute_error(y_test, y_pred)
        mse = mean_squared_error(y_test, y_pred)
        rmse = np.sqrt(mse)

        # Save the model and its metrics for the /predict route
        with open('model.pkl', 'wb') as f:
            pickle.dump(regr, f)
        with open('metrics.pkl', 'wb') as f:
            metrics = {"r2": r2, "mae": mae, "mse": mse, "rmse": rmse}
            pickle.dump(metrics, f)

        # Predict pm2 for every row and map the predictions
        df['predicted_pm2'] = regr.predict(X)
        fig_pred = px.scatter_mapbox(df, lat="lat", lon="long", color="predicted_pm2",
                                     color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=10)

        # Map of the absolute errors between actual and predicted pm2
        df['errors'] = (df['pm2'] - df['predicted_pm2']).abs()
        fig_err = px.scatter_mapbox(df, lat="lat", lon="long", color="errors",
                                    color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=10)

        # Embed the figures as HTML fragments (not full pages), loading plotly.js from a CDN
        return render_template('index.html',
                               plot=fig.to_html(full_html=False, include_plotlyjs='cdn'),
                               plot_pred=fig_pred.to_html(full_html=False, include_plotlyjs='cdn'),
                               plot_err=fig_err.to_html(full_html=False, include_plotlyjs='cdn'),
                               r2=r2, mae=mae, mse=mse, rmse=rmse)
    else:
        return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    global regr
    address = request.form.get('address', '').strip()

    # Convert the address to latitude and longitude with the Google geocoding API
    geolocator = GoogleV3(api_key='my google maps API key')  # Add your Google Maps API key here
    location = geolocator.geocode(address) if address else None
    if location is None:
        return render_template('index.html', message='Address not found')
    lat, long = location.latitude, location.longitude

    # Reload the model and the metrics saved by upload_file()
    try:
        with open('model.pkl', 'rb') as f:
            regr = pickle.load(f)
        with open('metrics.pkl', 'rb') as f:
            metrics = pickle.load(f)
    except FileNotFoundError:
        return render_template('index.html', message='Please upload a CSV file first')

    # Predict with the same column names as used for training
    X_new = pd.DataFrame([[lat, long]], columns=['lat', 'long'])
    predicted_pm2 = round(float(regr.predict(X_new)[0]), 2)

    # Show the prediction together with the model metrics
    return render_template('index.html', prediction=predicted_pm2, **metrics)


if __name__ == '__main__':
    app.run(debug=True)

(content of the template file templates/index.html)

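<!doctype html>
<html>
<head>
<title>PM2 Predictor</title>
</head>
<body>
<h1>Upload CSV</h1>
<form method="post" action="/" enctype="multipart/form-data">
<input type="file" name="file" accept=".csv">
<input type="submit" value="Upload">
</form>
{% if message %}<h1>{{ message }}</h1>{% endif %}
{% if plot %}
<h1>Original PM2 Map</h1>
{{ plot|safe }}
<h1>Predicted PM2 Map</h1>
{{ plot_pred|safe }}
<h1>Error Map</h1>
{{ plot_err|safe }}
{% endif %}
{% if r2 is defined %}
<h1>Model Metrics</h1>
<p>R-squared: {{ r2 }}</p>
<p>Mean Absolute Error: {{ mae }}</p>
<p>Mean Squared Error: {{ mse }}</p>
<p>Root Mean Squared Error: {{ rmse }}</p>
{% endif %}
<h1>Predict PM2 for an Address</h1>
<form method="post" action="/predict">
<textarea name="address" rows="2" cols="50"></textarea>
<input type="submit" value="Predict">
</form>
{% if prediction %}<h1>Predicted PM2: {{ prediction }}</h1>{% endif %}
</body>
</html>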

And ChatGPT told me how to deploy this on my PythonAnywhere account:

For deployment on PythonAnywhere:

Create an account on PythonAnywhere and go to the Dashboard.
Click on the Web tab and then Add a new web app.
Choose Flask and the Python version you are using. The Python version should be the same as the one you used for local development.
In the Code section, enter the directory that contains flask_app.py in the Source code field, and the same directory in the Working directory field (the relative paths model.pkl and metrics.pkl in the code are resolved against it).
In the WSGI configuration file (linked in the Code section), make sure the Flask app is imported under the name application, for example: from flask_app import app as application
Make sure that index.html is in a templates directory next to flask_app.py.
Open a Bash console from the Consoles tab and install all necessary libraries with pip, using the --user flag and the pip version that matches your Python version. For example: pip3.7 install --user pandas plotly scikit-learn geopy flask
After all these changes, hit the Reload button on the Web tab.
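For reference, here is roughly what the relevant part of a Flask app's WSGI configuration file on PythonAnywhere looks like. This is a sketch, not the exact file PythonAnywhere generates: the project path is a placeholder for the directory that holds flask_app.py, and the important detail is that the WSGI server looks for a callable named application.

```python
# WSGI configuration file (sketch); the path below is a placeholder
import sys

project_home = '/home/yourusername/mysite'  # directory containing flask_app.py
if project_home not in sys.path:
    sys.path.insert(0, project_home)

# The WSGI server looks for a callable named "application"
from flask_app import app as application  # noqa: E402,F401
```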

ChatGPT explained how it made it...

Me:

Would you mind explaining the code (last version) for documentation purposes?

ChatGPT-4:

Of course! Here is an explanation of the main components of the code.

1. Libraries Import

These are the libraries necessary to run the application. They include:

Flask: A web framework for Python to create web applications.

Pandas: A data analysis library.

Numpy: A library for numerical computations in Python.

Scikit-learn: A machine learning library in Python.

Geopy: A library for performing geocoding (converting an address into latitude and longitude).

Plotly: A library for creating interactive plots.

2. Initialization of Flask Application

app = Flask(__name__)

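This initializes a Flask web application. __name__ is a special Python variable that holds the string "__main__" when you run the script directly, and the module name (here flask_app) when the file is imported, as the WSGI server on PythonAnywhere does.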

3. Definition of Routes

Routes in Flask are used to tell the application which URLs should trigger a response.

'/' (root): This is the default URL. When the server receives a request for this URL, it runs the upload_file() function.

'/predict': This route is used to make predictions based on an address that a user enters.
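The routing idea can be tried in isolation with Flask's built-in test client. This stripped-down sketch keeps only the two URLs; the function bodies are placeholders, not the real upload and prediction logic, and the address is an arbitrary example.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route('/', methods=['GET', 'POST'])
def upload_file():
    # GET: the user opened the page; POST: the upload form was submitted
    if request.method == 'POST':
        return 'upload received'
    return 'upload form'


@app.route('/predict', methods=['POST'])
def predict():
    # Read the address field posted by the prediction form (placeholder logic)
    address = request.form.get('address', '')
    return f'prediction for: {address}'


client = app.test_client()
get_root = client.get('/')
post_root = client.post('/')
post_predict = client.post('/predict', data={'address': '1 Main Street, Springfield'})
get_predict = client.get('/predict')
```

A GET request to /predict is rejected with status 405 (Method Not Allowed) because that route only accepts POST.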

4. The upload_file() Function

This function does a lot:

If it's a GET request (i.e., the user has navigated to the webpage but hasn't uploaded a file yet), it simply returns the index.html page.

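If it's a POST request (i.e., the user has uploaded a file), it reads the file into a pandas DataFrame, cleans the data, splits it into a training set (80%) and a test set (20%), and fits a Random Forest Regression model on the training set. It then calculates statistics for the model on the test set (R-squared, MAE, MSE and RMSE) and generates three interactive maps with Plotly: the original, the predicted and the error values.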
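The cleaning step can be checked on a small, made-up DataFrame. This sketch mirrors the logic of clean_data() (drop incomplete rows, then keep pm2 values strictly between the 5th and 95th percentiles); the coordinates and prices are hypothetical. Note that pandas' quantile() interpolates linearly by default, so the cut-offs need not be values that occur in the data.

```python
import pandas as pd


def clean_data(df):
    # Drop rows with a missing value in any of the three expected columns
    df = df.dropna(subset=["lat", "long", "pm2"])
    # Keep only pm2 values strictly between the 5th and 95th percentiles
    q_low = df["pm2"].quantile(0.05)
    q_hi = df["pm2"].quantile(0.95)
    return df[(df["pm2"] > q_low) & (df["pm2"] < q_hi)]


# Hypothetical data: 20 complete rows with pm2 = 0..19, plus one incomplete row
raw = pd.DataFrame({
    "lat": [48.85] * 21,
    "long": [2.35] * 21,
    "pm2": [float(v) for v in range(20)] + [None],
})
cleaned = clean_data(raw)
```

With 20 complete rows (pm2 = 0 to 19), the cut-offs are 0.95 and 18.05, so exactly one value is removed at each end and 18 rows remain.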

It also saves the trained model and its metrics as pickle files (model.pkl and metrics.pkl), so that the /predict route can reuse them later.
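For documentation purposes it may help to see what the four statistics actually compute. The following pure-Python sketch writes out the standard definitions that scikit-learn's r2_score, mean_absolute_error and mean_squared_error implement (RMSE is the square root of MSE); the function name and the four sample values are illustrative only.

```python
import math


def regression_metrics(y_true, y_pred):
    """R-squared, MAE, MSE and RMSE written out by hand.
    Assumes y_true is not constant (otherwise R-squared is undefined)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "mse": mse, "rmse": rmse}


metrics = regression_metrics([10, 12, 14, 16], [11, 12, 13, 18])
# → r2 ≈ 0.7, mae = 1.0, mse = 1.5, rmse ≈ 1.2247
```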

5. The predict() Function

This function uses the trained model to make predictions. It reads the address that the user has entered, converts it into latitude and longitude using geocoding (Google's geocoding API via geopy), reloads the saved model and metrics from disk, and uses the model to predict the price per square meter for that location. It then renders the index.html template again with the prediction.

6. Run the App

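The app.run(debug=True) call at the end starts Flask's built-in development server, but only when the script is run directly (for example on your own computer). On PythonAnywhere the application is imported and served through the WSGI configuration instead, so this line is not executed there.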

7. HTML Template

The index.html file is a template for the web page. It uses the Jinja2 templating engine to include variables (like the model metrics and prediction) in the HTML.

{% if prediction %}: This is an example of a conditional statement in Jinja2. If the prediction variable is set (defined and not empty or None), the prediction is included in the web page.

{{ plot|safe }}: This inserts the HTML of a Plotly map, generated in flask_app.py with to_html(), into the page. The safe filter prevents Jinja2 from escaping that HTML.
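Both template mechanisms can be tried outside Flask with Jinja2 directly. The snippet below is an illustration with made-up values; it enables autoescaping because Flask does so for .html templates, which is why the Plotly HTML must be marked with the safe filter to be rendered as markup.

```python
from jinja2 import Environment

# Flask enables autoescaping for .html templates; mimic that here
env = Environment(autoescape=True)

prediction_block = env.from_string(
    "{% if prediction %}<h1>Predicted PM2: {{ prediction }}</h1>{% endif %}"
)
map_block = env.from_string("{{ plot }}|{{ plot|safe }}")

shown = prediction_block.render(prediction=12.5)   # prediction present
hidden = prediction_block.render(prediction=None)  # section left out
# Left of "|": escaped text; right of "|": raw HTML thanks to |safe
escaped_vs_raw = map_block.render(plot="<div>map</div>")
```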

8. Deployment on PythonAnywhere

PythonAnywhere is a cloud platform for running Python applications. The code and HTML template need to be uploaded to PythonAnywhere and a web application needs to be created to serve them.

Please note that this explanation is a high-level overview and omits some details for the sake of brevity. If you need further clarification on any specific part of the code, feel free to ask!

And finally, ChatGPT explained the theory behind the model...

Me:

Can you explain the Random Forest Regression method we have used above?

ChatGPT-4:

Certainly! Random Forest Regression is a machine learning method that's used for regression tasks - predicting continuous quantities, like the price per square meter in this case.

Here's an overview of how Random Forest works:

1. Ensemble of Decision Trees:

A Random Forest is an ensemble of Decision Trees. A Decision Tree is a model that makes predictions by splitting the input features into different branches based on certain conditions, similar to a flowchart.

However, a single Decision Tree tends to overfit to the training data (it captures the noise in the data), and as a result, doesn't generalize well to unseen data. Random Forest overcomes this problem by training many Decision Trees and having them work together.
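To make the "splitting" concrete: a regression tree chooses, for one feature, the threshold that minimizes the squared error of the two resulting groups, each predicted by its own mean (squared error is also scikit-learn's default criterion for regression trees). The sketch below finds one such split on a single hypothetical feature; a real tree repeats this for every feature and then recursively for each branch.

```python
def best_split(xs, ys):
    """Return (threshold, cost) of the split on one feature that minimizes
    the summed squared error of the two groups around their own means."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    pairs = sorted(zip(xs, ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical feature (e.g. a distance in km) and prices per square meter
threshold, cost = best_split([1, 2, 3, 10, 11, 12], [20, 21, 19, 35, 36, 34])
# → threshold = 6.5, cost = 4.0
```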

2. Bagging:

The key idea behind a Random Forest is bagging, which stands for Bootstrap Aggregating. The Random Forest algorithm creates multiple subsets of the original dataset by sampling with replacement (i.e., some samples may be repeated in each subset). Each of these subsets is used to train a Decision Tree. This introduces variety into the ensemble, which makes the model more robust and less prone to overfitting.
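Sampling with replacement is easy to illustrate with Python's standard library. In this sketch, rows stands in for the training rows and the seed is arbitrary; each tree in the forest would receive one such sample.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: some rows repeat, others are left out."""
    return [rng.choice(rows) for _ in range(len(rows))]


rows = list(range(10))  # stand-ins for the rows of the training set
rng = random.Random(0)  # fixed seed for reproducibility
samples = [bootstrap_sample(rows, rng) for _ in range(3)]  # one sample per tree
```

Because rows are drawn with replacement, each sample typically contains only about 63% of the distinct rows (the expected fraction tends to 1 - 1/e for large datasets); the rows left out of a sample are its so-called out-of-bag rows.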

3. Random Subspaces:

In addition to bagging, Random Forest also randomly selects a subset of features at each split in the Decision Tree learning process. This introduces even more variety into the ensemble and further reduces overfitting.
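In scikit-learn, the size of this random feature subset is controlled by the max_features parameter of RandomForestRegressor. By default the regressor considers all features at every split, which is also the case in flask_app.py; with only lat and long as inputs, the only other option is one feature per split. A sketch on synthetic, made-up data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic (lat, long) -> pm2 data, for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(low=[48.80, 2.25], high=[48.90, 2.42], size=(200, 2))
y = 30 + 50 * (X[:, 0] - 48.80) + 40 * (X[:, 1] - 2.25) + rng.normal(0, 0.5, 200)

# max_features=1: each split considers only one randomly chosen feature
# (either lat or long) instead of both
forest = RandomForestRegressor(n_estimators=50, max_features=1, random_state=0)
forest.fit(X, y)
```

Whether max_features=1 improves the predictions for a given dataset is an empirical question; comparing the test-set metrics of both settings would tell.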

4. Aggregation:

When making a prediction, each Decision Tree in the Random Forest gives its own prediction, and the final prediction of the Random Forest is the average of these predictions. This is why the method is called a "Forest" - it's a whole bunch of Decision Trees working together!
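The averaging can be checked against scikit-learn itself: a fitted RandomForestRegressor exposes its individual trees in estimators_, and averaging their predictions should reproduce regr.predict() up to floating-point rounding. The data below is synthetic and only for illustration; the forest uses the same settings as in flask_app.py (max_depth=2, random_state=0, default of 100 trees).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic (lat, long) -> pm2 data, for illustration only
rng = np.random.default_rng(1)
X = rng.uniform(low=[48.80, 2.25], high=[48.90, 2.42], size=(200, 2))
y = 30 + 50 * (X[:, 0] - 48.80) + 40 * (X[:, 1] - 2.25) + rng.normal(0, 0.5, 200)

regr = RandomForestRegressor(max_depth=2, random_state=0).fit(X, y)

X_new = X[:5]
# One row of predictions per tree: shape (number of trees, number of points)
per_tree = np.array([tree.predict(X_new) for tree in regr.estimators_])
manual_average = per_tree.mean(axis=0)
forest_prediction = regr.predict(X_new)
```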

Random Forest Regression has several advantages:

It can handle large datasets with high dimensionality effectively.
Depending on the implementation, it can handle missing values (scikit-learn added native support only in recent versions; this application simply drops incomplete rows).
It performs well even with the default hyperparameters.

However, it also has some drawbacks:

Random Forest models are more complex and computationally intensive than simple models like linear regression. They may not be suitable for real-time predictions or when computational resources are limited.
They may overfit datasets that are particularly noisy.
They are not as easily interpretable as a single Decision Tree.

In this application, we've used the Random Forest Regression model to predict the price per square meter (pm2) based on latitude and longitude. The trained model can then be used to predict the pm2 for any given pair of latitude and longitude.
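As a sketch of that last step, assuming a (lat, long) pair has already been obtained from geocoding: passing the pair as a one-row DataFrame with the training column names avoids scikit-learn's warning about missing feature names. The training rows and coordinates are invented for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Invented training data with the same column names as the uploaded CSV
train = pd.DataFrame({
    "lat":  [48.85, 48.86, 48.87, 48.83, 48.84, 48.88],
    "long": [2.35, 2.34, 2.36, 2.30, 2.31, 2.38],
    "pm2":  [32.0, 34.5, 30.0, 25.0, 26.5, 28.0],
})
regr = RandomForestRegressor(max_depth=2, random_state=0)
regr.fit(train[["lat", "long"]], train["pm2"])

# One (lat, long) pair, e.g. from a geocoded address, as a one-row DataFrame
new_point = pd.DataFrame([[48.855, 2.345]], columns=["lat", "long"])
predicted_pm2 = float(regr.predict(new_point)[0])
```

Since every tree predicts an average of training prices, a Random Forest's prediction always lies within the range of the pm2 values it was trained on; it cannot extrapolate beyond them.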

good-morning.ai
