Comunidad de Madrid

Anonymization of rulings through OCR and NLP

Sector: Public · Client: Comunidad de Madrid

Region
Spain
Project duration
3 months

Technologies used

  1. Optical Character Recognition (OCR)
  2. AI Models: LLM, entity tagger
  3. Machine Learning Models

What problem did we solve?

In this success story, we addressed the challenge of managing and analyzing a large set of rulings from the Regional Institute for Consumer Arbitration of the Community of Madrid. The project aimed to turn 4,677 rulings from 2015, 2016, and 2017 into accessible, structured data. The complexity lay in handling diverse formats, including scanned documents, and in the need to classify and anonymize the rulings for subsequent analysis.

How did we solve it?

To tackle this challenge, we built a solution that combines Optical Character Recognition (OCR) with natural language processing techniques. OCR converted the scanned documents into digital text, which made it possible to classify and label the rulings by content. Language models then identified the sensitive entities that required anonymization, such as the names of the individuals and companies involved.

The process consisted of several phases:

  1. Text Extraction: We utilized OCR technology to extract text from scanned documents, ensuring accurate conversion even for documents of varying quality.
  2. Automated Classification: Machine learning models were employed to automatically classify the rulings based on their content and type.
  3. Data Anonymization: We identified and anonymized sensitive entities present in the documents, ensuring compliance with data protection regulations.
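To make the anonymization phase more concrete, here is a minimal sketch in Python. It is not the project's implementation. It assumes the entity tagger returns character spans with a label, and it replaces each span with a stable placeholder so that the same name always maps to the same token. The `Entity` class, the `find_spans` and `tag_ids` helpers, and the DNI-style regex are hypothetical stand-ins for a real tagger, and the sample text is invented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first character of the entity
    end: int    # index one past the last character
    label: str  # entity type, e.g. "PERSON", "ORG" or "ID"


# Hypothetical stand-in for a learned tagger: 8 digits followed by a
# letter, the shape of a Spanish DNI number.
DNI_PATTERN = re.compile(r"\b\d{8}[A-Za-z]\b")


def tag_ids(text: str) -> list[Entity]:
    """Tag DNI-style identifiers with the label "ID"."""
    return [Entity(m.start(), m.end(), "ID") for m in DNI_PATTERN.finditer(text)]


def find_spans(text: str, needle: str, label: str) -> list[Entity]:
    """Illustrative helper: tag every literal occurrence of `needle`."""
    return [Entity(m.start(), m.end(), label)
            for m in re.finditer(re.escape(needle), text)]


def anonymize(text: str, entities: list[Entity]) -> str:
    """Replace each entity span with a placeholder such as [PERSON_1].

    Assumes the spans do not overlap.
    """
    placeholders: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}
    # Number placeholders in reading order; identical strings with the
    # same label share one placeholder.
    for ent in sorted(entities, key=lambda e: e.start):
        key = (ent.label, text[ent.start:ent.end])
        if key not in placeholders:
            counters[ent.label] = counters.get(ent.label, 0) + 1
            placeholders[key] = f"[{ent.label}_{counters[ent.label]}]"
    # Substitute from right to left so earlier offsets stay valid.
    result = text
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        key = (ent.label, text[ent.start:ent.end])
        result = result[:ent.start] + placeholders[key] + result[ent.end:]
    return result


sample = "Claimant Juan Perez (ID 12345678Z) filed against Acme SL. Juan Perez attached an invoice."
entities = (find_spans(sample, "Juan Perez", "PERSON")
            + find_spans(sample, "Acme SL", "ORG")
            + tag_ids(sample))
print(anonymize(sample, entities))
# Claimant [PERSON_1] (ID [ID_1]) filed against [ORG_1]. [PERSON_1] attached an invoice.
```

Reusing one placeholder per distinct entity keeps an anonymized ruling readable, because a reader can still tell which party did what. A production pipeline would take its spans from a trained tagger rather than from literal matches.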

What results did we achieve?

Improved Accessibility:
The 4,677 rulings were transformed into structured digital data that was easy to access and query.

Operational Efficiency:
Automating the classification and anonymization process significantly reduced the time and resources required for analyzing the rulings.

Regulatory Compliance:
We ensured the protection of sensitive data through anonymization, thereby complying with privacy regulations.
