This question comes up a lot, and given the growth of web scraping and the many recent legal cases on the topic, that's no surprise. First of all, let me tell you what's not legal. 🚫
A clear example of illegal web scraping is scraping private user data. Private data is not accessible to everyone on the internet; examples include data obtained from behind the login of a personal Facebook or LinkedIn account.
How Often Will You Scrape The Website?
Depending on your project's requirements, and especially when hiring a web scraping consultant, it's vital to determine how often you will scrape the target websites. For example, scraping uk.hotels.com 10,000 times per day is far more likely to be unethical than scraping it 100 times per day.
The reason is that the more frequently you scrape a website, the more of its resources you're using to power your product or service. Do this too much and you can rack up high server costs for the website owner, which can be incredibly expensive for small to medium-sized businesses.
Therefore, reviewing the volume of daily requests your web scraping project would make can help you gauge whether the project is ethical or unethical.
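To make the frequency comparison concrete, here is a minimal sketch of spreading a daily request budget evenly over the day. `RateLimiter` is a hypothetical helper written for illustration, not a library API, and the request volumes are the example figures from above:

```python
import time

class RateLimiter:
    """Enforce a minimum delay between consecutive requests (illustrative helper)."""

    def __init__(self, max_per_day: int):
        # Spread the daily budget evenly across 86,400 seconds.
        self.min_interval = 86400 / max_per_day
        self._last = 0.0

    def wait(self) -> None:
        # Sleep just long enough to honour the interval, then record the time.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

# At 100 requests/day the scraper pauses ~864 seconds between requests;
# at 10,000/day it pauses only ~8.6 seconds, a far heavier load on the target.
print(RateLimiter(100).min_interval)     # 864.0
print(RateLimiter(10_000).min_interval)  # 8.64
```

Seeing the interval shrink from minutes to seconds makes it obvious how much more of the site's capacity the higher-volume project consumes.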
How To Reduce The Chances That Your Web Scraping Will Be Unethical / Illegal
Web scraping is rarely an easy process: websites often have their own unique design and functionality, and creating an ethical web scraping solution can be tricky.
Therefore we created a simple process for ensuring that your next web scraping project is less likely to be unethical/illegal.
- Step 1: Minimise the number of unknowns within your data gathering process.
- Step 2: Run a legal review.
Minimising The Number Of Unknowns In Data Gathering
The primary objective of the data gathering process is to minimise the number of unknowns, making as few assumptions as possible about any aspect, so you can construct the best solution for your business needs.
Speak To Your Developers
Effectively describing your project requirements to a developer not only improves the chances of your project being built without revisions, it also makes you more aware of which data sources you'll need for the final solution.
This is important because by thinking about the initial data sources (public vs private APIs vs web scraping), you can decide upon how ethical this project might be before fully committing to building the first minimum viable product.
One example might be a tool that analyses social media data from Facebook or Instagram. Although this is a cool idea, the data is heavily centralised and you would need to scrape it with a headless browser while logged into a personal account.
This would make the end product and scraping operation less scalable, as you would have to create multiple personal accounts and use VPNs to scrape the data at scale.
Simply thinking about the initial sources of data, and following through what the 'scalable solution' would look like, will often show you whether the technology stack falls on the ethical or unethical side.
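One quick way to test whether a data source is genuinely public is to check the site's robots.txt, which states what the owner allows automated clients to fetch. A minimal sketch using Python's standard-library `urllib.robotparser`; the policy text and bot name here are made up for illustration, and a real scraper would fetch the live file instead:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Parse an example policy inline; against a real site you would call:
#   rp.set_url("https://example.com/robots.txt"); rp.read()
rp.parse("""\
User-agent: *
Disallow: /private/
Allow: /
""".splitlines())

# Public product pages are allowed; anything under /private/ is not.
print(rp.can_fetch("MyScraperBot", "https://example.com/products"))      # True
print(rp.can_fetch("MyScraperBot", "https://example.com/private/data"))  # False
```

A source your bot is disallowed from fetching, or one that requires a logged-in account, is a strong early signal that the 'scalable solution' will drift toward the unethical side.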
Request A Legal Review
With the increased awareness of data privacy and web scraping in recent years, ensuring that your project is legally compliant is now a must. Otherwise, you could land yourself and your business in a lot of trouble.
On the other hand, there are cases of collecting and scraping private data that sit in a completely different area of lawfulness.
First, when discussing the legality of web scraping, clearly describe the accessibility of the data to your legal team: for example, crawling public pages across the web (public) versus data obtained from a logged-in LinkedIn account (private/personal data).
So, is the whole process legal or not? Yes, unless you’re using it unethically.
Web scraping is just like any other tool: some people will use the technology for bad things, and others will use that same technology for good.
As a matter of fact, web scraping – or web crawling – has historically been linked to well-known search engines such as Google or Bing. Because these search engines built trust and brought traffic and visibility back to the sites they crawled, their bots created a favorable view of web scraping.
However, web scraping itself isn't illegal, and technology giants such as Google and Microsoft crawl the web every day to power their search engines.
One thing is clear: this is a powerful tool that lets businesses make use of internet data, and it should always be done with integrity and respect.
It is all about what data you collect via web scraping and what you do with that data that ultimately matters.