Did you know that, by common estimates, around 80% of all data in the world is unstructured? Think about email, office documents, blogs and social media (Twitter). I'm now investigating how to use natural language processing (NLP) technology to extract interesting facts from this kind of data. The project that I'm working on now involves scanning job vacancy requests and determining which specific skills are needed. I'm using the following tools:

  • Open source NLP software from GATE
  • Protégé for building a background knowledge base (ontology)
  • IBM LanguageWare
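
To give a rough idea of what "determining the skills in a job request" means, here is a minimal sketch in plain Python. It does not use GATE, Protégé or LanguageWare; it only imitates the basic idea of looking up terms from a word list. The skill list, the function name extract_skills and the sample posting are all made up for illustration. In a real setup the terms would come from the ontology.

```python
import re

# Hypothetical skill list (assumption): lowercase phrase -> canonical skill name.
SKILL_GAZETTEER = {
    "java": "Java",
    "python": "Python",
    "sql": "SQL",
    "project management": "Project Management",
}


def extract_skills(text):
    """Return the canonical skills mentioned in text, sorted alphabetically."""
    lowered = text.lower()
    found = set()
    for phrase, canonical in SKILL_GAZETTEER.items():
        # Match whole words only, so "java" does not match "javascript".
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(canonical)
    return sorted(found)


# Made-up example posting (assumption, not real project data).
posting = (
    "We are looking for a Java developer with SQL experience "
    "and project management skills."
)
print(extract_skills(posting))  # ['Java', 'Project Management', 'SQL']
```

A plain lookup like this misses synonyms, abbreviations and context, which is where the background knowledge base and the NLP tools come in.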

I will update this article soon. If you want to know more, take a look at the GATE project.

GATE project