How to leverage noisy incomplete data from MLS
Daniel Azevedo on Jun 14, 2021
When we decide to buy or rent real estate (an apartment, a room, a house, etc.), one of the most important search criteria is the price. Its value depends mostly on characteristics such as location, year of construction, number of rooms, area, and central heating.
However, two properties with the same characteristics can be sold at very different prices, and there are deeper reasons for that difference. The urgency of the seller or buyer to complete the deal, the market context, the real estate agency managing the deal, and the time of year all contribute to it.
Thus, it can be particularly challenging to determine the real selling price of a given property. Analyzing the listing prices on real estate websites can give us an incorrect idea of the true value of a place, because listing prices tend to overestimate the realistic value for selling purposes. This may lead us to buy or rent a place for a price well above the realistic one.
As such, we will explore an approach to determine the real selling price of a place, by taking into consideration different aspects considered relevant when making an offer.
Investing in real estate can mean purchasing or renting a house, an apartment, or a room, for either private or commercial use. Here we will assume the scenario of purchasing an apartment for private use, although the same considerations apply in all of these contexts.
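Besides the property characteristics, other factors may influence the selling price. Therefore, we should look at other types of indicators and data when making an evaluation, namely market behavior, demographics and geo-spatial data, unstructured data (reviews, pictures, and descriptions), and economic indicators.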
In terms of the data available, we can assume we know the apartment characteristics (e.g., number of rooms, location, area, energy efficiency) and some indicators, such as the number of infrastructures near the place, the employment rate, the average price of similar houses, the urgency of the sale, and the images and texts describing the apartments.
Furthermore, in terms of pricing, we will assume that we know the listing price of all apartments, and the selling price of some apartments (e.g., the selling price of deals made by a single real estate agency). This information can be structured as follows:
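As a minimal sketch of such a structure, assuming hypothetical field names and made-up values (none of them come from a real MLS schema), each apartment can carry a listing price that is always present and a selling price that may be missing:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    # Field names are illustrative assumptions, not a real MLS schema.
    rooms: int
    area_m2: float
    location: str
    listing_price: float                   # known for every apartment
    selling_price: Optional[float] = None  # known only for some apartments


# Made-up records, for illustration only.
listings = [
    Listing(2, 70.0, "city_a", 200_000.0, 184_000.0),
    Listing(3, 95.0, "city_a", 310_000.0),
    Listing(1, 45.0, "city_b", 180_000.0, 174_600.0),
    Listing(2, 80.0, "city_b", 290_000.0),
]

# Split into the small labeled set and the larger unlabeled set.
labeled = [item for item in listings if item.selling_price is not None]
unlabeled = [item for item in listings if item.selling_price is None]
```

The split at the end separates the few samples with a known selling price from those where only the listing price is available.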
To clarify, the selling price is the value at which a given property is effectively sold, and the listing price is the price at which the place was first listed on the market.
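Predicting the selling price poses several challenges, most notably the following two: selling prices are known for only a small subset of apartments, and listing prices, while available for all of them, tend to overestimate the real value.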
As the selling price is only available for a small set of samples, a fully supervised approach is not suitable.
A first option is a semi-supervised approach, with the goal of predicting the selling price from the few labeled samples, as follows:
F(apt features) -> selling price
Here, apt features includes all the aspects previously described, such as demographics and geo-spatial data, market behavior, and economic indicators, in addition to the apartment characteristics. The text or image data could be encoded so that it can be used in a tabular format.
There are different semi-supervised techniques we could explore (transductive, inductive, wrapper methods, etc) for modeling.
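To make the wrapper idea concrete, here is a deliberately crude self-training sketch. It assumes a single numeric feature (area) and a 1-nearest-neighbor predictor, both chosen only for brevity; the function name and the data are hypothetical. At each step, the unlabeled sample closest to an already labeled one is treated as the most confident prediction and receives a pseudo-label:

```python
def self_train_1nn(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbor predictor.

    labeled: list of (feature, selling_price) pairs
    unlabeled: list of features without a selling price
    Returns the labeled list extended with pseudo-labeled samples.
    """
    labeled = list(labeled)
    pending = list(unlabeled)
    while pending:
        # Most confident candidate: the pending sample nearest to any
        # labeled (or already pseudo-labeled) sample.
        _, idx, price = min(
            (abs(x - lx), i, ly)
            for i, x in enumerate(pending)
            for lx, ly in labeled
        )
        # Adopt the neighbor's price as a pseudo-label and move on.
        labeled.append((pending.pop(idx), price))
    return labeled


# Made-up data: (area in square meters, selling price).
known = [(50.0, 150_000.0), (100.0, 300_000.0)]
result = dict(self_train_1nn(known, [60.0, 90.0, 70.0]))
```

A realistic setup would use the full feature set and a proper regressor, and would only accept pseudo-labels above a confidence threshold.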
However, this approach would be biased towards the agency from which we gathered the real selling prices. Furthermore, we would not be explicitly taking advantage of the listing price, which is available for every apartment and can be used as a weak label.
As such, another approach is to treat the listing price as a weak label and use it to predict the selling price. To make a direct mapping, we would need to determine the distribution of the difference between the real selling price and the listing price.
Thus, we can combine semi-supervised and weakly supervised learning, in order to exploit both the few known selling prices and the listing prices available for every apartment.
To achieve that, we will customize a loss function that can help us solve this task, taking these challenges into consideration.
Generically, we can model our problem as follows:
F(apt features, listing price) -> selling price
Again, the apt features would consist of all the aspects mentioned before and not only the apartment characteristics.
We will determine the relationship between the listing price and selling price by calculating the distribution of the ratio between them.
A possible example of the price ratio distribution could be:
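As a minimal sketch with made-up numbers, the distribution can be estimated as a normalized histogram over the labeled samples. The ratio is taken here as selling price divided by listing price, which is an assumption about the direction, and the bin edges are arbitrary:

```python
def ratio_histogram(selling, listing, edges):
    """Normalized histogram of selling/listing ratios.

    edges: increasing bin edges; ratios outside them are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for s, l in zip(selling, listing):
        ratio = s / l
        for b in range(len(counts)):
            is_last = b == len(counts) - 1
            # Bins are half-open, except the last one, which is closed.
            if edges[b] <= ratio < edges[b + 1] or (is_last and ratio == edges[-1]):
                counts[b] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Made-up prices for four apartments with a known selling price.
selling = [184_000.0, 174_600.0, 240_000.0, 88_000.0]
listing = [200_000.0, 180_000.0, 250_000.0, 100_000.0]
edges = [0.80, 0.90, 0.95, 1.00, 1.10]

hist = ratio_histogram(selling, listing, edges)
```

In this toy data most of the mass falls just below 1.0, which is what we would expect if listing prices tend to sit above the final selling price.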
The loss function will be customized to compare the price ratio distribution obtained from the model predictions with the real price ratio distribution (computed with the known selling prices), combined with an evaluation of the selling price predictions themselves.
To achieve this, we can use the Kullback-Leibler Divergence, which quantifies the difference between probability distributions using the following formula:
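In its standard discrete form, with x ranging over the bins of the two distributions, the definition reads:

$$D_{KL}(p \parallel q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$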
Where p and q correspond to the two probability distributions to be compared.
For evaluating the selling price predictions we can use the Mean Absolute Error (MAE):
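Using the usual definition over the n samples with a known selling price:

$$\mathrm{MAE}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \lvert x_i - y_i \rvert$$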
Where x represents the selling price predictions and y represents the real selling prices.
Thus, our loss function would be:
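One way to write it, assuming a plain unweighted sum of the two terms (both the relative weighting and the argument order of the divergence are assumptions here):

$$\mathcal{L} = D_{KL}(r_p \parallel r_g) + \mathrm{MAE}(\text{selling\_price}_{\text{predicted}}, \text{selling\_price}_{\text{real}})$$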
Where r_p refers to the price ratio distribution computed from the model's selling price predictions and r_g refers to the real price ratio distribution, computed from the samples for which we know the real selling price. selling_price_predicted represents the selling prices predicted by the model and selling_price_real represents the real selling prices.
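The following sketch shows how such a loss could be evaluated on plain Python lists. It assumes both ratio distributions are already binned on the same edges; the function names, the eps guard against empty bins, and the mae_weight parameter are illustrative assumptions rather than part of the approach described above:

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(p || q); eps guards against empty bins."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mae(predicted, real):
    """Mean absolute error between predicted and real selling prices."""
    return sum(abs(a - b) for a, b in zip(predicted, real)) / len(predicted)


def combined_loss(r_p, r_g, predicted, real, mae_weight=1.0):
    """KL term on the ratio distributions plus a weighted MAE term.

    mae_weight is a hypothetical knob; 1.0 gives the plain sum.
    """
    return kl_divergence(r_p, r_g) + mae_weight * mae(predicted, real)


# Made-up inputs: identical distributions, so only the MAE term remains.
loss = combined_loss([0.5, 0.5], [0.5, 0.5], [100.0, 200.0], [110.0, 190.0])
```

Because the MAE is expressed in currency units while the divergence is dimensionless, some rescaling of the two terms would probably be needed in practice, which is what mae_weight stands in for.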
The task of purchasing a property can have a considerable impact on our financial life. Therefore, we should put in extra effort to get the best deal in terms of value and quality versus price.
This post discussed an approach for determining the correct selling price based on the different factors considered relevant. Many aspects influence a property's value, and even more determine the selling price. Thus, we started with an overview of the aspects that may influence the selling price, including market behavior, demographics and geo-spatial data, unstructured data (reviews, pictures, and descriptions), and economic indicators.
Based on the data that is normally available online, we described an approach that combines weakly supervised and semi-supervised learning with a customized loss function that focuses on learning the real price ratio distribution, i.e., the ratio between the listing price and the selling price.
This can be a realistic approach for predicting the real selling price. As usual, if you have any comments or ideas about Automated Valuation Models for Real Estate, make sure to reach out to us!