Web Scraping Realtor.com

Realtor.com is one of the most popular real estate listing platforms in the United States and the second-largest of its kind. Scraping real estate data can help you perform proper market research before making any decision. It can also be used to identify potential future trends in the market and adjust your positions accordingly.

Extracting data from Realtor.com is easy, and in this article, we will take advantage of Python and its dedicated libraries for scraping real estate data to scrape all the property listings available on the target web page.

If you are a beginner and want to get a deeper understanding of web scraping, please check out this guide: Web Scraping With Python

Requirements

Before scraping Realtor.com, we need to install some libraries to proceed with this tutorial. I assume you have already installed the latest version of Python on your device.

You can start by creating a new directory to store the scraping files:

mkdir realtor_scraper

Next, we will create a new Python file in this folder to hold our scraping code.

Then, install the libraries with the following commands.

pip install requests 
pip install beautifulsoup4

Requests — To fetch the HTML data from the Realtor.com website.
Beautiful Soup — For parsing the extracted HTML data.

What To Scrape From Realtor.com

It is good practice to decide what you want to scrape from the website in advance. In this tutorial, we will be extracting the following data points from the target page:

  1. Address
  2. Bath Count
  3. Bed Count
  4. Sqft
  5. Plot Size
  6. Pricing
Data Points from Realtor.com

Scraping Property Data From Realtor

We will use BeautifulSoup select and select_one methods to access the DOM elements. Before we start coding our scraper, we need to understand the HTML structure of the web page.

You can easily do this by right-clicking on any element of interest and selecting Inspect from the menu that appears. This will open the developer tools panel on your screen, which you can use to identify the tags containing the required information.

Realtor.com Search Results

From the above image, we can conclude that each property card is a div tag with the class BasePropertyCard_propertyCardWrap__J0xUj.

Let us first start with extracting the HTML data.

import requests
from bs4 import BeautifulSoup

l = []    # will hold one dictionary per property
obj = {}  # temporary dictionary for the current property

# A browser-like User-Agent makes the request look less like a bot
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
}

# Fetch the search results page and parse the returned HTML
resp = requests.get("https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home", headers=headers).text
soup = BeautifulSoup(resp, "html.parser")

Step-by-step explanation:

  1. In the first two lines, we import the Requests and BeautifulSoup libraries.
  2. Next, we declare two variables: a list l to collect the results and a dictionary obj to hold the data for a single property.
  3. It is important to make our bot mimic a human visitor, so we initialize a header with a browser User-Agent, which will be passed with the GET request.
  4. Then, with the help of Requests, we send a GET request to the target URL.
  5. Finally, we create an instance of BeautifulSoup to navigate through the HTML and extract the required information.
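Before parsing, it can also help to confirm that the request actually succeeded. The check below is a small optional variant that is not part of the original script; it keeps the Response object so the status code can be inspected before handing the HTML to BeautifulSoup.

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
}

# Keep the Response object so we can check the status code before parsing
response = requests.get(
    "https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home",
    headers=headers,
)

if response.status_code != 200:
    # A 403 or a CAPTCHA page usually means the request was blocked
    raise SystemExit(f"Request failed with status {response.status_code}")

soup = BeautifulSoup(response.text, "html.parser")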

Now, we will use the select method to capture all the elements with the class BasePropertyCard_propertyCardWrap__J0xUj.

for el in soup.select(".BasePropertyCard_propertyCardWrap__J0xUj"):

This for loop will allow us to iterate over all the elements with the given class and extract the data present inside each listing. 
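As a quick sanity check (not part of the original code), you can print how many cards the selector matched before looping; a count of zero usually means the class name has changed or the request was blocked.

cards = soup.select(".BasePropertyCard_propertyCardWrap__J0xUj")
print(f"Found {len(cards)} property cards")  # 0 usually means a blocked request or a changed class name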

Let us now locate the tags for each data point we discussed above.

Realtor.com Property Pricing

In the above image, we can see that the price of the property is contained inside a div tag with the attribute data-testid="card-price", nested under a div with the class price-wrapper. Now, inside the for loop, add the following code.

for el in soup.select(".BasePropertyCard_propertyCardWrap__J0xUj"):
    try:
        obj["pricing"] = el.select_one(".price-wrapper div[data-testid=card-price]").text
    except:
        obj["pricing"] = None

Then, we will get the bed, bath, and sqft information from the HTML page.

Realtor.com Property Specifications

The above image shows that all this information is stored inside individual li tags, each with a different data-testid attribute. The bed count has the attribute data-testid=property-meta-beds, and similarly, the bath count has the attribute data-testid=property-meta-baths.

Copy the following code to extract this information.

    try:
        obj["bed"] = el.select_one("li[data-testid=property-meta-beds]").text
    except:
        obj["bed"] = None
    try:
        obj["bath"] = el.select_one("li[data-testid=property-meta-baths]").text
    except:
        obj["bath"] = None
    try:
        obj["sqft"] = el.select_one("li[data-testid=property-meta-sqft]").find_next().text
    except:
        obj["sqft"] = None
    try:
        obj["plot_size"] = el.select_one("li[data-testid=property-meta-lot-size]").find_next().text
    except:
        obj["plot_size"] = None

However, some listings may not include every data point, and calling .text on a missing element would raise an error. That is why we wrap each extraction in try and except, so the scraper can continue when a field is absent.
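If you prefer not to repeat the same try/except block for every field, one option is a small helper that returns None when a selector does not match. This is just an alternative sketch, not how the original code is written; safe_text is a hypothetical helper name.

def safe_text(element, selector):
    # Return the text of the first match, or None if the selector finds nothing
    match = element.select_one(selector)
    return match.text if match else None

# Usage inside the loop, equivalent to the try/except version above:
# obj["bed"] = safe_text(el, "li[data-testid=property-meta-beds]")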

Finally, we are left with the address data point. The process of finding the address is the same as the methods we have followed so far, so you can try to extract the property address yourself before reading on.

Realtor.com Property Address

So, we can get the address from the div tags with the data-testid attributes card-address-1 and card-address-2.

You can integrate the following code to extract the address.

    try:
        obj["address"] = el.select_one("div[data-testid=card-address-1]").text + " " + el.select_one("div[data-testid=card-address-2]").text
    except:
        obj["address"] = None

So, we are done extracting each data point. We will now append this object to the l list to store the data for each property, and reset obj for the next listing.

    l.append(obj)
    obj={}
  
print(l)

Execute the program in your project terminal. You will get the following results:

[
 {
  'pricing': '$185,000,000',
  'bed': '10bed',
  'bath': '14.5+bath',
  'sqft': '34,380sqft',
  'plot_size': '2.65acre lot',
  'address': '869 Tione Rd Los Angeles, CA 90077'
 },
 {
  'pricing': '$54,995,000$5M',
  'bed': '9bed',
  'bath': '18bath',
  'sqft': '21,000sqft',
  'plot_size': '3.6acre lot',
  'address': '10066 Cielo Dr Beverly Hills, CA 90210'
 },
 {
  'pricing': '$9,500,000$1.5M',
  'bed': '5bed',
  'bath': '7bath',
  'sqft': '9,375sqft',
  'plot_size': '0.74acre lot',
  'address': '13320 Mulholland Dr Beverly Hills, CA 90210'
 }
 ....
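If you want to keep these results instead of just printing them, one option is to write the list of dictionaries to a CSV file with Python's built-in csv module. This is an optional addition on top of the tutorial code; the realtor_data.csv filename is just an example.

import csv

fieldnames = ["pricing", "bed", "bath", "sqft", "plot_size", "address"]

with open("realtor_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(l)  # l is the list of property dictionaries built above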

Complete Code:

You can make some changes to this code according to your needs, for example extracting images and links, or implementing pagination (a short pagination sketch follows the complete code below). But for now, our code will look like this:

import requests
from bs4 import BeautifulSoup

l = []    # list of scraped properties
obj = {}  # data for the current property

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
}

# Fetch the search results page and parse the HTML
resp = requests.get("https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home", headers=headers).text
soup = BeautifulSoup(resp, "html.parser")

# Iterate over every property card on the page
for el in soup.select(".BasePropertyCard_propertyCardWrap__J0xUj"):
    try:
        obj["pricing"] = el.select_one(".price-wrapper div[data-testid=card-price]").text
    except:
        obj["pricing"] = None
    try:
        obj["bed"] = el.select_one("li[data-testid=property-meta-beds]").text
    except:
        obj["bed"] = None
    try:
        obj["bath"] = el.select_one("li[data-testid=property-meta-baths]").text
    except:
        obj["bath"] = None
    try:
        obj["sqft"] = el.select_one("li[data-testid=property-meta-sqft]").find_next().text
    except:
        obj["sqft"] = None
    try:
        obj["plot_size"] = el.select_one("li[data-testid=property-meta-lot-size]").find_next().text
    except:
        obj["plot_size"] = None
    try:
        obj["address"] = el.select_one("div[data-testid=card-address-1]").text + " " + el.select_one("div[data-testid=card-address-2]").text
    except:
        obj["address"] = None

    l.append(obj)
    obj = {}

print(l)
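The complete code only covers the first page of results. If you want to experiment with pagination, as mentioned above, Realtor.com search pages typically append a /pg-2, /pg-3, ... segment to the URL; the sketch below assumes that pattern and simply repeats the same parsing loop per page, so treat it as a starting point rather than a tested implementation.

import time

base_url = "https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home"

for page in range(1, 4):  # first three pages as an example
    url = base_url if page == 1 else f"{base_url}/pg-{page}"  # assumed pg-N URL pattern
    resp = requests.get(url, headers=headers).text
    soup = BeautifulSoup(resp, "html.parser")
    # ...run the same card-parsing loop as above on this soup...
    time.sleep(2)  # small delay between pages to be polite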

Scraping Realtor Using Serpdog

If you continue extracting data from Realtor at scale, your IP may get blocked, and you will encounter a CAPTCHA screen every time you visit the site.

To avoid blocking, you can use Serpdog’s Web Scraping API to scrape data from any website. Serpdog is backed by a massive amount of rotating residential and data center proxies, allowing businesses to bypass any CAPTCHA and focus on the data extraction and product development part.

You can register on Serpdog to claim your 1000 free credits to start scraping Realtor.com without getting blocked.

Let’s see how you can use Serpdog.

After successfully signing up, you will be redirected to our dashboard, where you will get your API Key.

Serpdog Dashboard

Copy the API key from the dashboard and embed this in the below code to scrape data from Realtor quickly and easily.

import requests
from bs4 import BeautifulSoup

l = []
obj = {}

# Replace APIKEY with the API key from your Serpdog dashboard
resp = requests.get("https://api.serpdog.io/scrape?api_key=APIKEY&url=https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home&render_js=false").text
soup = BeautifulSoup(resp, "html.parser")

for el in soup.select(".BasePropertyCard_propertyCardWrap__J0xUj"):
    try:
        obj["pricing"] = el.select_one(".price-wrapper div[data-testid=card-price]").text
    except:
        obj["pricing"] = None
    try:
        obj["bed"] = el.select_one("li[data-testid=property-meta-beds]").text
    except:
        obj["bed"] = None
    try:
        obj["bath"] = el.select_one("li[data-testid=property-meta-baths]").text
    except:
        obj["bath"] = None
    try:
        obj["sqft"] = el.select_one("li[data-testid=property-meta-sqft]").find_next().text
    except:
        obj["sqft"] = None
    try:
        obj["plot_size"] = el.select_one("li[data-testid=property-meta-lot-size]").find_next().text
    except:
        obj["plot_size"] = None
    try:
        obj["address"] = el.select_one("div[data-testid=card-address-1]").text + " " + el.select_one("div[data-testid=card-address-2]").text
    except:
        obj["address"] = None

    l.append(obj)
    obj = {}

print(l)
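One thing to watch when passing the target URL as a query parameter: characters such as commas and slashes may need to be URL-encoded, depending on how the API parses the url parameter. A cautious approach is to encode the target URL with urllib.parse.quote before building the request, as sketched below; this is an assumption about the API's behaviour, so check Serpdog's documentation for the exact requirement.

from urllib.parse import quote

import requests

target = "https://www.realtor.com/realestateandhomes-search/Los-Angeles_CA/type-single-family-home,multi-family-home"
api_url = (
    "https://api.serpdog.io/scrape"
    f"?api_key=APIKEY&url={quote(target, safe='')}&render_js=false"
)
resp = requests.get(api_url).text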

If any API call fails, you will not be charged for it.

Conclusion

In this tutorial, we learned how to scrape property data from Realtor.com, including pricing, address, and other features. If your demand grows and you want to extract more data, you can consider our Web Scraping API, which features proxy rotation and headless browsers.

I hope this tutorial gave you a basic overview of how to scrape Realtor.com using Python.

If you think we can complete your web scraping tasks and help you collect data, please don’t hesitate to contact us.

Please do not hesitate to message me if I missed something. Follow me on Twitter. Thanks for reading!

Frequently Asked Questions

Does Realtor.com provide a free API?

No, Realtor.com does not provide any free API, but you can try Serpdog's Web Scraping API, which offers 1,000 free request credits to users on registration.

Additional Resources

I have prepared a complete list of blogs to learn web scraping that can give you an idea and help you in your web scraping journey.

  1. Web Scraping Zillow
  2. Web Scraping Yelp
  3. Web Scraping Booking.com
  4. Web Scraping Walmart