Comunidad de Madrid

Anonymization of rulings through OCR and NLP

Sector: Public · Client: Comunidad de Madrid

Project duration
3 months

Technologies used

  1. Optical Character Recognition (OCR)
  2. AI Models: LLM, entity tagger
  3. Machine Learning Models

What did we solve?

In this success story, we addressed the challenge of managing and analyzing a large set of rulings from the Regional Institute for Consumer Arbitration of the Autonomous Community of Madrid. The project aimed to transform 4,677 rulings from 2015, 2016, and 2017 into accessible, structured data. The difficulty lay in the diversity of formats, including scanned documents, and in the need to classify and anonymize the rulings for subsequent analysis.

How did we solve it?

To tackle this challenge, we implemented a solution that combines Optical Character Recognition (OCR) models with natural language processing techniques. We used OCR to convert scanned documents into digital text, which made it possible to classify and label the rulings based on their content. We also used language models to identify sensitive entities that required anonymization, such as the names of individuals or companies involved.

The process consisted of several phases:

  1. Text Extraction: We utilized OCR technology to extract text from scanned documents, ensuring accurate conversion even for documents of varying quality.
  2. Automated Classification: Machine learning models were employed to automatically classify the rulings based on their content and type.
  3. Data Anonymization: We identified and anonymized sensitive entities present in the documents, ensuring compliance with data protection regulations.
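The anonymization phase can be illustrated with a minimal sketch. It is not the project's actual implementation: the `Entity` type, the `detect_ids` and `tag_known_names` helpers, and the placeholder format are all hypothetical, and a simple regular expression for DNI-style identifiers (eight digits followed by a letter) stands in for the entity tagger and language models mentioned above. The sketch only shows how detected spans might be replaced with consistent placeholders so that the same person or company maps to the same token throughout a ruling.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A detected sensitive span (hypothetical structure for this sketch)."""
    start: int
    end: int
    label: str  # e.g. "PERSON", "ORG", "ID"


# Assumption: a regex stands in for a real entity tagger.
# It matches DNI-style identifiers: eight digits followed by one letter.
DNI_PATTERN = re.compile(r"\b\d{8}[A-Za-z]\b")


def detect_ids(text):
    """Return ID entities found by the stand-in regex detector."""
    return [Entity(m.start(), m.end(), "ID") for m in DNI_PATTERN.finditer(text)]


def tag_known_names(text, names, label):
    """Return entities for every occurrence of the given names (illustrative only)."""
    entities = []
    for name in names:
        for m in re.finditer(re.escape(name), text):
            entities.append(Entity(m.start(), m.end(), label))
    return entities


def anonymize(text, entities):
    """Replace each entity span with a placeholder such as [PERSON_1].

    The same surface text with the same label always receives the same
    placeholder, so references stay consistent within a document.
    """
    placeholders = {}
    counters = {}
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        if ent.start < cursor:
            continue  # skip spans that overlap one already replaced
        surface = text[ent.start:ent.end]
        key = (ent.label, surface.lower())
        if key not in placeholders:
            counters[ent.label] = counters.get(ent.label, 0) + 1
            placeholders[key] = f"[{ent.label}_{counters[ent.label]}]"
        pieces.append(text[cursor:ent.start])
        pieces.append(placeholders[key])
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    # Invented sample text, not taken from any real ruling.
    sample = "Claimant Ana Ruiz (DNI 12345678Z) filed against Acme SL. Ana Ruiz attended."
    spans = (
        detect_ids(sample)
        + tag_known_names(sample, ["Ana Ruiz"], "PERSON")
        + tag_known_names(sample, ["Acme SL"], "ORG")
    )
    print(anonymize(sample, spans))
```

Keeping detection separate from replacement means the stand-in detectors could be swapped for a trained entity tagger without touching the replacement logic.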

What results did we achieve?

Improved Accessibility:
The 4,677 rulings were transformed into structured digital data that was easy to access and query.

Operational Efficiency:
Automating the classification and anonymization process significantly reduced the time and resources required for analyzing the rulings.

Regulatory Compliance:
We ensured the protection of sensitive data through anonymization, thereby complying with privacy regulations.
