Automated Valuation Models for Real Estate

How to leverage noisy incomplete data from MLS

When we decide to buy or rent a property (apartment, room, house, etc.), one of the most important search criteria is the price. The price depends mostly on the property's characteristics, such as location, year of construction, number of rooms, area, and central heating.

However, two properties with the same characteristics can be sold at very different prices, and there are deeper reasons for that difference. The urgency of the seller or buyer to complete the deal, the market context, the real estate agency managing the deal, and the time of the year all contribute to these differences.

Thus, it can be particularly challenging to determine the real selling price of a given property. Listing prices on real estate websites can give a misleading idea of the true value of a place, because listings often overestimate the realistic value for selling purposes. As a result, we may end up buying or renting a place for far more than it is realistically worth.

As such, we will explore an approach to determine the real selling price of a place, by taking into consideration different aspects considered relevant when making an offer.

An investment in real estate can be the purchase or rental of a house, an apartment, or a room, for either private or commercial use. Here we will assume the scenario of purchasing an apartment for private use, although the same considerations apply in all of these contexts.



Besides the property characteristics, other factors may influence the selling price. We should therefore look at other types of indicators and data when making an evaluation, namely:

  • Demographics and geo-spatial data: a recent population boost in a given neighborhood may indicate higher demand for that area, and therefore a higher price. As such, population growth in the surrounding neighborhood may be an indicator of the property price. The local infrastructure, such as the number and variety of stores, malls, public transportation, and schools, may also help determine the right price.
  • Unstructured data: pictures, descriptions, and opinions, given in the form of images and text, can help us capture the condition of the property, which can impact its price.
  • Market behavior: current market conditions also have an impact on selling prices. The demand for, and the number of, similar houses on sale affect the house value, following the law of supply and demand. Thus, the average price of similar houses bought recently can be a good indication of the price. Additionally, the length of time the property has been on the market, compared to the average for similar properties, may raise a red flag regarding the current price.
  • Economic indicators: economic factors also influence house pricing. For example, an increase in the employment rate or in wage growth can lead to an increase in the property listing price. Changes in the interest rate or financial incentives can also encourage buying a property on credit.
  • Selling aspects: urgency in the selling process, payment conditions, and the expertise of the selling agency are other aspects that may make the selling price differ from the property's real value.


In terms of the data available, we can assume we know the apartment characteristics (e.g., number of rooms, location, area, energy efficiency) and some indicators, such as the number of infrastructure facilities near the place, the employment rate, the average price of similar houses, the urgency of the sale, and the images and texts describing the apartments.

Furthermore, in terms of pricing, we will assume that we know the listing price of all apartments, and the selling price of some apartments (e.g., the selling price of deals made by a single real estate agency). This information can be structured as follows:
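One hypothetical way to lay out such a dataset is sketched below in Python. The field names and the three toy records are illustrative assumptions, not a schema or data from a real source; the only point is that the listing price is always present while the selling price may be missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    # All field names are illustrative assumptions, not a real schema.
    n_rooms: int
    area_m2: float
    neighborhood: str
    listing_price: float            # known for every apartment
    selling_price: Optional[float]  # known only for some apartments


# Toy records with made-up values, for illustration only.
listings = [
    Listing(2, 70.0, "A", 200_000.0, 190_000.0),
    Listing(3, 95.0, "B", 310_000.0, None),
    Listing(1, 45.0, "A", 150_000.0, None),
]

# Split into samples with and without a known selling price.
labeled = [x for x in listings if x.selling_price is not None]
unlabeled = [x for x in listings if x.selling_price is None]
```

Indicators such as employment rate or encoded text and image features would simply become additional fields in the same record.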

For clarification, the selling price is the value at which a given property is effectively sold, and the listing price is the price at which the property was originally listed on the market.

Price Prediction

The selling price prediction has several challenges, namely the following two:

  • The real selling price is often missing in our dataset, as it is not always available
  • The listing price may help us predict the selling price, although the distribution of the ratio between listing price and selling price is not given


Semi-supervised approach

As the selling price is only available for a small set of samples, a fully supervised approach is not suitable.

A first option is a semi-supervised approach, with the goal of predicting the selling price based on the few labeled samples, as follows:

F(apt features) -> selling price

Here, apt features includes, besides the apartment characteristics, all the aspects previously described, such as demographics and geo-spatial data, market behavior, and economic indicators. The text or image data could be encoded so that it can be used in a tabular data format.

There are different semi-supervised techniques we could explore for modeling (transductive methods, inductive methods, wrapper methods, etc.).

However, this approach would be biased towards the agency from which we gathered the real selling prices. Furthermore, we would not be explicitly taking advantage of the listing price, which is available for every apartment and can be used as a weak label.
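As an illustration of what a wrapper method could look like, here is a minimal self-training sketch. It is not the post's model: a nearest-neighbor regressor is swapped in purely for brevity, "confidence" is approximated by the distance to already-labeled points, and the function names and toy values are assumptions.

```python
import math


def knn_predict(x, pool, k=1):
    """Mean target of the k nearest points in pool, plus their mean distance."""
    nearest = sorted(pool, key=lambda item: math.dist(x, item[0]))[:k]
    pred = sum(y for _, y in nearest) / len(nearest)
    spread = sum(math.dist(x, f) for f, _ in nearest) / len(nearest)
    return pred, spread


def self_train(labeled, unlabeled, k=1, batch=1):
    """Repeatedly pseudo-label the unlabeled points closest to the current
    pool (a crude confidence proxy) and add them to the training pool."""
    pool = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        scored = [(knn_predict(x, pool, k), x) for x in remaining]
        scored.sort(key=lambda s: s[0][1])  # smallest mean distance first
        for (pred, _), x in scored[:batch]:
            pool.append((x, pred))
            remaining.remove(x)
    return pool


# Toy features (area in m2, number of rooms) with made-up selling prices.
# In practice the features would need to be scaled before measuring distances.
labeled = [((70.0, 2.0), 190_000.0), ((95.0, 3.0), 300_000.0)]
unlabeled = [(72.0, 2.0), (90.0, 3.0)]

pool = self_train(labeled, unlabeled, k=1)
pseudo_labels = dict(pool)
```

After running, `pseudo_labels` maps every apartment, labeled or not, to a real or pseudo selling price. Note that the pseudo-labels inherit whatever bias the few labeled samples carry, which is exactly the limitation described above.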


Semi-supervised + weakly supervised approach

As such, another approach is to treat the listing price as a weak label and use it to predict the selling price. To make a direct mapping, we would need to determine the distribution of the difference between the real selling price and the listing price.

Thus, we can combine both semi-supervised learning and weakly supervised learning, in order to:

  • Adapt our approach to the fact that we have few labeled samples (semi-supervised approach)
  • Use a noisy and weak label, the listing price, as a starting point to compute the real selling price (weakly supervised approach)

To achieve that, we will customize a loss function that can help us solve this task, taking these challenges into consideration.

Generically, we can model our problem as follows:

F(apt features, listing price) -> selling price

Again, apt features consists of all the aspects mentioned before, not only the apartment characteristics.

Distribution of the ratio between selling price and listing price

We will determine the relationship between the listing price and selling price by calculating the distribution of the ratio between them.

A possible example of the price ratio distribution could be:
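A hypothetical example with toy numbers can be computed as in the sketch below. It assumes the ratio is defined as selling price divided by listing price and that the distribution is discretized into fixed bins; the bin edges, prices, and function name are made up for illustration.

```python
def ratio_histogram(selling, listing, edges):
    """Discrete distribution of selling/listing ratios over the bins
    [edges[i], edges[i + 1]). Ratios outside all bins are ignored."""
    counts = [0] * (len(edges) - 1)
    for s, l in zip(selling, listing):
        ratio = s / l
        for i in range(len(counts)):
            if edges[i] <= ratio < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no ratio falls inside the given bin edges")
    return [c / total for c in counts]


# Made-up prices (in thousands) for the few apartments with a known selling price.
selling = [190.0, 285.0, 150.0, 240.0]
listing = [200.0, 300.0, 150.0, 250.0]
edges = [0.8, 0.9, 1.0, 1.1]

distribution = ratio_histogram(selling, listing, edges)
# → [0.0, 0.75, 0.25]
```

In this toy case, three of the four apartments sold between 90% and 100% of their listing price and one sold at the listing price.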

Loss function/Optimization

The loss function will be customized to combine two terms: a comparison between the price ratio distribution obtained from the model predictions and the real price ratio distribution (computed with the known selling prices), and an evaluation of the selling price predictions themselves.

To achieve this, we can use the Kullback-Leibler Divergence, which quantifies the difference between probability distributions using the following formula:

     $$D_{KL}(p \parallel q) = \sum_{i=1}^{N} p(x_i) \left ( \log p(x_i) - \log q(x_i) \right )$$

Where p and q correspond to the two probability distributions to be compared.
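For discrete distributions given as lists of bin probabilities, the formula can be sketched as follows. The small `eps` added to q is an assumption introduced here to avoid taking the logarithm of zero for empty bins; terms with p(x_i) = 0 are skipped, following the usual convention that they contribute nothing.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions over the same bins."""
    return sum(
        pi * (math.log(pi) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # convention: 0 * log(0) contributes 0
    )


same = kl_divergence([0.5, 0.5], [0.5, 0.5])      # approximately 0
forward = kl_divergence([0.5, 0.5], [0.9, 0.1])   # approximately 0.511
backward = kl_divergence([0.9, 0.1], [0.5, 0.5])  # approximately 0.368
```

The last two values differ because the divergence is not symmetric, so the order of the two distributions matters.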

For evaluating the selling price predictions we can use the Mean Absolute Error (MAE):

     $$MAE(x, y) = \frac{1}{N} \sum_{i=1}^{N} \left | x_i - y_i \right |$$

Where x represents the selling price predictions and y represents the real selling prices.

Thus, our loss function would be:

     $$\min \left ( D_{KL}(r_p \parallel r_g) + MAE(selling\_price_{predicted}, selling\_price_{real}) \right )$$

Where r_p refers to the price ratio distribution using the selling price predictions of the model and r_g refers to the real price ratio distribution, using the samples in which we know the real selling price. selling_price_predicted represents the selling prices predicted by the model and selling_price_real represents the real selling prices.
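The pieces can be put together as in the following sketch, which evaluates the combined objective for a fixed set of predictions. Several choices here are assumptions rather than part of the approach as stated: the ratio is taken as selling price over listing price, the distributions are fixed-bin histograms, r_p is computed over all apartments, the MAE only over those with a known selling price, and `mae_weight` is a hypothetical knob (1.0 reproduces the formula above).

```python
import math


def histogram(values, edges):
    """Normalized histogram over the bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts) or 1  # avoid dividing by zero if nothing was binned
    return [c / total for c in counts]


def kl(p, q, eps=1e-9):
    return sum(pi * (math.log(pi) - math.log(qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def mae(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def combined_loss(pred, listing, real, edges, mae_weight=1.0):
    """pred and listing cover all apartments; real[i] is None when unknown."""
    r_p = histogram([p / l for p, l in zip(pred, listing)], edges)
    known = [(p, l, s) for p, l, s in zip(pred, listing, real) if s is not None]
    r_g = histogram([s / l for _, l, s in known], edges)
    error = mae([p for p, _, _ in known], [s for _, _, s in known])
    return kl(r_p, r_g) + mae_weight * error


# Made-up prices in thousands; the second apartment has no known selling price.
edges = [0.8, 0.9, 1.0, 1.1]
loss = combined_loss(
    pred=[190.0, 285.0, 150.0],
    listing=[200.0, 300.0, 150.0],
    real=[190.0, None, 150.0],
    edges=edges,
)
```

Two practical caveats: a hard histogram is not differentiable, so gradient-based training would need a smooth alternative such as soft binning, and the MAE is expressed in currency units while the divergence is not, so some rescaling or weighting of the two terms is likely needed.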


Purchasing a property can have a major impact on our financial life. Therefore, we should make an extra effort to get the best deal in terms of value and quality for the price.

This post discusses an approach for determining the correct selling price, based on the different factors considered relevant. Many aspects influence a property's value, and even more determine the selling price. Thus, we started with an overview of the different aspects that may influence the selling price, including market behavior, demographics and geo-spatial data, unstructured data (reviews, pictures, and descriptions), and economic indicators.

Based on the data that is normally available online, we described an approach that combines both weakly supervised and semi-supervised learning, together with a customized loss function that focuses on learning the real price ratio distribution, i.e., the ratio between the listing price and selling price.

This can be a realistic approach for predicting the real selling price. As usual, if you have any comments or ideas about Automated Valuation Models for Real Estate, make sure to reach out to us!
