The Atlas Project - Revolutionizing Data Aggregation for Auctions and Tenders

Quintagroup developed an automated data collection system for auctions, offers, and tenders, leveraging Python, Selenium, and AWS technologies

Overview


The Atlas Project was conceived to address a pressing need for our client: the ability to gather and centralize up-to-date information on auctions, commercial offers, tenders, and similar opportunities. By leveraging advanced web scraping and automation techniques, we developed a solution that efficiently compiles relevant data from a vast array of sources.

The Atlas Design


The Challenge


Our client faced several significant challenges:

  • Numerous Sources: With over 4000 websites to monitor, the sheer volume of data sources was overwhelming.
  • Lack of APIs: Most of these websites did not provide APIs, complicating the data extraction process.
  • High Volume of Data: Each website publishes numerous offers daily, requiring extensive filtering and compliance checks against specific criteria.
  • Initial Data Processing: Extracting valuable information from raw data was necessary to make it useful for end-users.

The Solution


To tackle these challenges, we developed a sophisticated automated web-scraping script that enables efficient, rapid data extraction from a continually growing list of websites. Our approach varies by source type:

  • Simple Websites: Capture a snapshot of the webpage and extract necessary data directly.
  • Complex Websites: Utilize browser automation libraries to simulate user actions and retrieve precise data.
  • Websites with APIs: Integrate with available APIs to streamline data acquisition.

We then generated a comprehensive list of pertinent offers by matching the scraped data against targeted keywords.
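The keyword-matching step can be sketched as a simple filter over the scraped offers; the Offer fields and the example keyword list below are illustrative assumptions, not the project's actual criteria:

```python
# Hypothetical sketch of the keyword-matching step: scraped offers are
# filtered against a client-supplied keyword list. Field names and the
# keywords themselves are invented for illustration.
from dataclasses import dataclass

@dataclass
class Offer:
    title: str
    description: str
    url: str

KEYWORDS = {"tender", "auction", "procurement"}  # example criteria

def matches(offer: Offer, keywords: set[str]) -> bool:
    # Case-insensitive substring match over the offer's visible text.
    text = f"{offer.title} {offer.description}".lower()
    return any(kw in text for kw in keywords)

def filter_offers(offers: list[Offer], keywords: set[str]) -> list[Offer]:
    return [o for o in offers if matches(o, keywords)]
```

In practice the criteria would be richer than plain substrings, but the shape of the step is the same: scrape first, then reduce to the offers an analyst actually needs to see.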


Implementation


Our solution comprised several key components and tasks:

1. Developing Python Console Commands

  • Maintain the outdated script that gathers data from a fixed set of websites.
  • Gradually transition the outdated script's functionality to Python.
  • Extract data from the outdated script's database for analyst processing.
  • Periodically add new websites and extract their data for further analyst review.
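These console commands can be sketched with Click, which the project used for its command-line interface; the command and option names below are illustrative, not Atlas's actual CLI:

```python
# Hypothetical Click-based console commands mirroring the tasks above;
# command and option names are invented for illustration.
import click

@click.group()
def cli():
    """Atlas data-collection commands."""

@cli.command()
@click.option("--source", required=True, help="Identifier of the website to scrape.")
@click.option("--since", default=None, help="Only fetch offers newer than this date.")
def extract(source, since):
    """Extract offers from a single source for analyst review."""
    click.echo(f"Extracting offers from {source} (since={since})")
    # ... per-source scraping logic would be dispatched here ...

if __name__ == "__main__":
    cli()
```

Grouping each task under one `cli` entry point keeps the legacy-maintenance, extraction, and update commands discoverable from a single executable.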

2. Custom Instructions with XPath and Selenium

  • Create specific commands to extract information from individual websites.
  • Adopt Robot Framework and Playwright as an alternative way of writing extraction instructions.
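As an illustration of such an instruction, the sketch below pairs XPath selectors with a Selenium fetch for JavaScript-heavy pages; the selectors and page structure are assumptions, not a real target site:

```python
# Hedged sketch: a per-site "instruction" combines XPath selectors with an
# optional Selenium fetch. Selector values and page structure are invented.
from lxml import html

OFFER_XPATH = "//div[@class='offer']"   # one node per published offer
TITLE_XPATH = ".//h2/text()"            # title inside each offer node

def parse_offers(page_source: str) -> list[str]:
    # Apply the site's XPath rules to rendered or static HTML.
    tree = html.fromstring(page_source)
    return [el.xpath(TITLE_XPATH)[0].strip() for el in tree.xpath(OFFER_XPATH)]

def fetch_page(url: str) -> str:
    # For JavaScript-heavy sites, render the page in a real browser first.
    from selenium import webdriver  # deferred import; needs a browser driver
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

For simple sites, the same `parse_offers` function can run on a plain HTTP snapshot instead of a Selenium-rendered page, which is what makes the per-site instructions cheap to write.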

3. Artificial Intelligence Integration

  • Utilize AI to process files related to offers and extract useful information.
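As a hedged sketch, the AI step might send an offer document's text to a chat-completion model with an extraction prompt; the prompt wording, JSON keys, and model name below are assumptions for illustration, not the project's actual configuration:

```python
# Hypothetical sketch of the AI extraction step; the prompt, the JSON
# schema, and the model name ("gpt-4o-mini") are illustrative assumptions.
EXTRACTION_PROMPT = (
    "Extract the deadline, budget, and contracting authority from the "
    "tender document below. Answer as JSON with the keys "
    "deadline, budget, authority.\n\n"
)

def build_prompt(document_text: str, limit: int = 8000) -> str:
    # Truncate very long documents to stay inside the model's context window.
    return EXTRACTION_PROMPT + document_text[:limit]

def extract_fields(document_text: str) -> str:
    from openai import OpenAI  # deferred import; requires the openai package
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_prompt(document_text)}],
    )
    return response.choices[0].message.content
```

Asking for a fixed set of JSON keys keeps the model's answers machine-readable, so the extracted fields can flow straight into the centralized system.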

4. Website-Specific Instructions

  • Develop and maintain tailored instructions for extracting useful data from various websites.
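One way to keep such per-site instructions maintainable is a registry that maps each website to its own extraction function; the sketch below illustrates the pattern with invented site identifiers and parsing rules:

```python
# Illustrative pattern for maintaining per-site instructions: every website
# registers its own extraction function under a common interface. The site
# identifier and parsing rules below are invented for the example.
SCRAPERS = {}

def register(site_id):
    """Decorator that files a parser under its website identifier."""
    def wrap(fn):
        SCRAPERS[site_id] = fn
        return fn
    return wrap

@register("example-auctions")
def parse_example_auctions(page_source: str) -> list[dict]:
    # A real instruction would apply site-specific XPath or CSS rules here.
    return [{"title": line.strip()} for line in page_source.splitlines() if line.strip()]

def scrape(site_id: str, page_source: str) -> list[dict]:
    try:
        parser = SCRAPERS[site_id]
    except KeyError:
        raise ValueError(f"No instructions registered for {site_id!r}")
    return parser(page_source)
```

With this shape, adding a new website means adding one decorated function, which matters when the list of monitored sources keeps growing past 4000.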

5. CI/CD Process Development

  • Establish a continuous integration and deployment pipeline for code testing, analysis, and deployment.


6. Containerization

  • Migrate from a constantly running virtual machine to an on-demand container-based solution.


Business Value


The Atlas Project delivered substantial value to our client:

  • Automation: Streamlined the search process according to predefined criteria, eliminating human error.
  • Centralization: Consolidated data from various sources into an external system for easier access and management.
  • Efficiency: Automated initial data processing to extract and highlight critical information.
  • Cost Reduction: Lowered infrastructure maintenance costs through efficient resource utilization.


Technologies Used

  • Command Line Interface: Python, Click
  • Automation Frameworks: Selenium, Robot Framework, Playwright
  • Cloud Services: AWS EC2, Fargate, CloudWatch
  • Artificial Intelligence: OpenAI
  • Additional Tools: Docker, Git, CI/CD


Conclusion


The Atlas Project stands as a testament to our commitment to innovative solutions and client satisfaction. By leveraging cutting-edge technologies and methodologies, we transformed a complex data aggregation challenge into a streamlined, automated process that delivers reliable and timely information to our client's customers.
