Screen Scraping: the hidden truth

Neonomics
Oct 1, 2020

Today, a lot of companies use so-called “screen scraping”. Is it a viable solution, and how does it work? To gain better insight into this, we sat down with our CTO, Kenneth, and asked him to explain things in a little more detail…

Hi Kenneth, firstly: why the need for screen scraping?

Financial institutions preparing to meet the Regulatory Technical Standards (RTS) of Open Banking are currently facing a shortage of live Open APIs. For any bank that wishes to get ready for the Open Banking era, the only viable choice is to use account aggregators, many of which provide solutions based on “screen scraping”.

How does screen scraping differ from someone simply reading information from a webpage and re-using it elsewhere?

Web pages are sources of information. To understand screen scraping, however, we need to remember that whenever a webpage is accessed there are always two “readers” of that information: the computer and yourself. The way humans prefer to digest information and the way computers read it are very different. As humans, we like clear layouts and eye-catching design in the form of fonts and colours to make the information more digestible and appealing.

For this reason, webpages are designed in three layers: the first exists to be understood by the computer, and the last to present the information in a readable fashion for human readers.

Understood. So, can you explain the layers to us?

The raw data itself provides the base layer. Next comes the structure layer, which comes from the HTML code and whose function is to arrange where the data appears on the page. Finally there is the presentation layer, which is the only layer available to humans in an easily readable format.

How then, does screen scraping work?

A computer has no need for the structure layer or the presentation layer. Essentially, screen scraping involves creating a filter to subtract the “irrelevant” layers and reveal only the raw data, which can then be used as desired.
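To make the idea of a “filter” concrete, here is a minimal sketch in Python using only the standard library. The page, the class names and the account figures are all hypothetical, made up for illustration; a real banking page would be far more complex and would sit behind a login.

```python
from html.parser import HTMLParser

# Hypothetical page: raw data (names and balances) wrapped in structure
# (HTML tags) and presentation (inline styles, CSS classes).
PAGE = """
<html><body>
  <h1 style="color: navy">My Bank</h1>
  <table class="accounts">
    <tr><td class="name">Savings</td><td class="balance">1,250.00</td></tr>
    <tr><td class="name">Current</td><td class="balance">310.45</td></tr>
  </table>
</body></html>
"""


class AccountScraper(HTMLParser):
    """Keeps the text of the name and balance cells, ignores everything else."""

    def __init__(self):
        super().__init__()
        self.rows = []        # collected (name, balance) pairs
        self._field = None    # which cell we are currently inside, if any
        self._current = {}    # cells collected for the current table row

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "td" and css_class in ("name", "balance"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._current:
            self.rows.append(
                (self._current.get("name"), self._current.get("balance"))
            )
            self._current = {}


def scrape_accounts(html):
    """Strip structure and presentation, return only the raw data."""
    parser = AccountScraper()
    parser.feed(html)
    return parser.rows


print(scrape_accounts(PAGE))
```

The heading, the styling and the table markup are all discarded; only the name and balance pairs survive, ready to be reused elsewhere.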

In what ways could a person or company use this data?

Once you have access to the raw data you can recycle it and present it as your own page with your own colours, logo, structure, etc. You could even combine the data from the page with other sources of data, which will allow you to improve on it. There is virtually no limit to what you can do with it!

Is screen scraping an easy solution?

Screen scrapers need to be very finely tuned to filter out the “useless” information whilst retaining the information you desire, which means they are often designed to be used specifically for individual pages. As a result, if those who provide the page you are filtering change something in their upper two layers, your filter might break, meaning you suddenly lose access to the information you need.

You must then scramble to recalibrate your scraper to the new presentation and structure so that you can get your solution up and running again quickly, which can mean downtime for your users!
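The fragility is easy to demonstrate. In the sketch below, which uses hypothetical markup and a deliberately simple regular expression as the filter, the page owner renames a single CSS class and the scraper stops finding anything, even though the data itself has not changed.

```python
import re

# Hypothetical markup before and after a redesign; only the class name changed.
OLD_PAGE = '<td class="balance">310.45</td>'
NEW_PAGE = '<td class="acct-amount">310.45</td>'

# The filter is tied to one specific structure: a cell with class "balance".
BALANCE_CELL = re.compile(r'<td class="balance">([^<]+)</td>')


def extract_balances(html):
    """Return every balance found in the page, as text."""
    return BALANCE_CELL.findall(html)


def extract_or_fail(html):
    """Raise instead of silently returning nothing when the layout changes."""
    balances = extract_balances(html)
    if not balances:
        raise ValueError("no balances found: page layout may have changed")
    return balances


print(extract_balances(OLD_PAGE))  # the filter matches the old layout
print(extract_balances(NEW_PAGE))  # the same filter finds nothing after the redesign
```

A check like `extract_or_fail` does not prevent the breakage; it only makes sure the failure is noticed rather than passed on to users as missing data.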

Is screen scraping a cost-effective solution?

Scrapers are run by servers to provide data, and often must update the information at regular intervals, ranging from a few times a day to a few times a minute. The webpages that provide the information are aimed at human users, and consequently are not built or scaled for the demands of a computer. This makes running a webpage that is being scraped by one or many other computers a lot more expensive than originally intended. Many companies therefore discourage scraping by adjusting their presentation and structure layers on a regular basis to thwart the efforts of would-be screen scrapers.
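A rough back-of-the-envelope calculation shows how quickly the load adds up. The figures below are assumptions chosen purely for illustration (10,000 scraped accounts, three polls per interval), not measurements from any real bank.

```python
def requests_per_day(accounts: int, polls_per_day: int) -> int:
    """Total page loads per day generated by one scraper."""
    return accounts * polls_per_day


ACCOUNTS = 10_000                    # hypothetical number of scraped accounts
A_FEW_TIMES_A_DAY = 3                # 3 polls per account per day
A_FEW_TIMES_A_MINUTE = 3 * 60 * 24   # 3 polls per minute = 4,320 per day

low = requests_per_day(ACCOUNTS, A_FEW_TIMES_A_DAY)
high = requests_per_day(ACCOUNTS, A_FEW_TIMES_A_MINUTE)

print(f"{low:,} vs {high:,} page loads per day")
```

Under these assumptions the same scraper generates either 30,000 or 43,200,000 page loads a day, a factor of 1,440, against a site that was sized for people logging in occasionally.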

Is screen scraping a legal and viable alternative?

Not at all. PSD2, the most recent payment services directive in Europe, requires banks to provide APIs for access to their systems. Alongside this, screen scraping has officially been prohibited.

If it is not allowed, then why/how is screen scraping still used?

Whilst screen scraping is not allowed, there are a couple of temporary exceptions made for banks that do not yet have an accessible production-level PSD2 API.

One exception is a six-month grace period for third-party apps to transition from screen scraping to PSD2 usage. Once this period is up, however, there are no further exceptions.

The second exception is for banks that struggle to meet the PSD2 deadline: they can allow screen scraping whilst they prepare, provided they have applied for a license to do so. Even in this case, the SCA (strong customer authentication) element for consent must still be managed, so screen scraping cannot be performed in the same way as before. Many of the mechanisms can nevertheless still be used until the bank has a proper PSD2 interface.

Finally, it is worth noting that there are other banking services outside the PSD2 scope. Whether a TPP (third-party provider) should be allowed to scrape these, and whether doing so is a good idea given the security concerns, is still being discussed in many forums and has yet to be concluded.
