Document management tool
Cloud computing, Data science, Web, Fintech
Table recognition from PDF and other sources
Our client is a large-scale hedge fund based in New York City, USA. The company processes a large volume of documents every day to analyze market trends and improve the accuracy of management decisions. To accelerate this area of analytics, our client was looking to build a tool that would assist in gathering data from PDF documents.
The project’s goal was to develop a backend tool that recognizes text laid out as tables in PDF files, so that the data those files contain can be processed automatically. The tool would convert a PDF into CSV or other formats that can be parsed by the analytics tools our client uses to generate insights from data.
Dedicated project team
PDF is one of the most popular formats for reports today thanks to its compatibility across different applications. We set up a dedicated project team that included one experienced software engineer to help the company boost its analytics capabilities and beat its competition. Our developer analyzed the problem and delivered a Proof of Concept to be added to our client’s analytics solution.
The solution developed by our team consists of two components. The first converts PDF files into a binary format that can be used from a backend programming language and serves as input for table recognition. The second detects when it is parsing a table by applying a range of criteria. Our team also equipped the solution with machine learning capabilities to handle complex cases, for example reports featuring multiple images and tables.
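To make the second component more concrete, here is a minimal Python sketch of one possible recognition criterion. It is an illustration under stated assumptions, not the delivered implementation: the `Word` model, the function names, the tolerance values, and the sample data are all hypothetical, and a simple alignment heuristic stands in for the machine learning part. It assumes the first component has already produced text fragments with coordinates, treats a row as part of a table when its cells line up with a neighboring row, and writes the result as CSV.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Word:
    """A text fragment with the position of its left edge (hypothetical model)."""
    text: str
    x: float  # horizontal offset of the fragment's left edge
    y: float  # vertical offset of the fragment's baseline


def group_rows(words, y_tolerance=2.0):
    """Group fragments whose baselines lie within y_tolerance into rows."""
    rows = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][0].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order the cells of each row from left to right.
    return [sorted(row, key=lambda w: w.x) for row in rows]


def looks_like_table_row(row, neighbor, x_tolerance=3.0, min_cells=2):
    """Assumed criterion: several cells whose left edges match a neighboring row."""
    if neighbor is None or len(row) < min_cells or len(row) != len(neighbor):
        return False
    return all(abs(a.x - b.x) <= x_tolerance for a, b in zip(row, neighbor))


def extract_table(words):
    """Keep only rows that align with the row directly above or below them."""
    rows = group_rows(words)
    table = []
    for i, row in enumerate(rows):
        prev_row = rows[i - 1] if i > 0 else None
        next_row = rows[i + 1] if i + 1 < len(rows) else None
        if looks_like_table_row(row, prev_row) or looks_like_table_row(row, next_row):
            table.append([w.text for w in row])
    return table


def to_csv(table):
    """Serialize the recognized rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


# Made-up sample: a heading line followed by a three-column table.
sample = [
    Word("Quarterly report", 50, 10),
    Word("Ticker", 50, 40), Word("Price", 150, 40), Word("Volume", 250, 40),
    Word("AAA", 50, 55), Word("10.5", 150, 55), Word("1200", 250, 55),
    Word("BBB", 50, 70), Word("7.25", 150, 70), Word("900", 250, 70),
]
print(to_csv(extract_table(sample)))
```

In this sketch the heading line is dropped because it has a single cell and no aligned neighbor, while the three aligned rows are kept. A production system would likely combine several such criteria (ruling lines, whitespace gaps, font changes) with a trained model for the harder layouts.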