Conversion to data format for crfsharp ...

Mar 20, 2014 at 10:29 AM
I have a review data set of about 250,000 hotel reviews, and I'm planning to extract aspects from it using the CRFSharp DLL. However, my data is in plain paragraph form, and I need to convert it into the CRFSharp format given here so I can train and test a model to extract aspects. Can someone tell me the best way to do that? I was thinking of writing a small program for the data format conversion.
Mar 20, 2014 at 1:40 PM
Which features and tags will you use in your task?
Here is the simplest example. Take the sentence "! Tokyo and New York are major financial centers." If you want to extract location names from it and your only feature is the token string, you can generate the training corpus as follows:

! NOR
Tokyo LOCATION
and NOR
New LOCATION
York LOCATION
are NOR
major NOR
financial NOR
centers NOR

The first column is a term of the sentence and the second column is its tag. NOR means a normal term and LOCATION means a location name. You can generate a training corpus in the above format and use CRFSharp to train a model.
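
A small conversion program for this two-column layout could look roughly like the Python sketch below. It is only an illustration, not part of CRFSharp: the function names (tokenize, to_crf_lines, format_corpus) are made up for this example, the regular-expression tokenizer is a simplification, and the tags are assigned from a hand-written set because real training tags have to come from manual annotation. The tab separator and the blank line between sentences are assumptions to check against the CRFSharp documentation.

```python
import re


def tokenize(text):
    """Split text into word runs and single punctuation marks (a simplification)."""
    return re.findall(r"\w+|[^\w\s]", text)


def to_crf_lines(tokens, tags, sep="\t"):
    """Return one 'token<sep>tag' line per token."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags)]


def format_corpus(sentences, sep="\t"):
    """Join tagged sentences into one string, with a blank line between sentences.

    The blank-line separator is an assumption about the expected corpus layout.
    """
    blocks = ["\n".join(to_crf_lines(toks, tags, sep)) for toks, tags in sentences]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    tokens = tokenize("Tokyo and New York are major financial centers")
    # Hand-labeled for illustration only; real tags come from annotated data.
    locations = {"Tokyo", "New", "York"}
    tags = ["LOCATION" if tok in locations else "NOR" for tok in tokens]
    print(format_corpus([(tokens, tags)]), end="")
```

For the hotel reviews, the same functions would be applied to each annotated sentence, with aspect tags in place of LOCATION.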

For more complex cases, such as additional features, templates, or word-position information in the tags, you can refer to the other example on the home page.

Zhongkai Fu
Jun 7, 2016 at 4:42 AM
I am currently working on Urdu word segmentation and want to use this model for it. How should I format the training data for Urdu text in CRFSharp? In other words, I need the data format for Urdu text. Please give me suggestions if possible.
Thank you.
Sadiq Nawaz
Jun 9, 2016 at 4:30 AM
Waiting for a reply.
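
One common way to frame word segmentation for a CRF, which may or may not suit Urdu without adjustment, is to put one character per line and tag it with its position inside the word. The Python sketch below assumes the training text is already segmented by hand with spaces between words, and uses a B/M/E/S tag set (begin, middle, end, single-character word). The function names, the tag names, and the tab separator are assumptions for illustration, not something CRFSharp prescribes. The sketch also iterates by code point, so combining diacritics would land on their own lines and may need to be merged with their base character first.

```python
def tag_word(word):
    """Pair each character of a word with a position tag: B, M, E, or S."""
    if len(word) == 1:
        return [(word, "S")]
    tags = ["B"] + ["M"] * (len(word) - 2) + ["E"]
    return list(zip(word, tags))


def segmented_to_crf_lines(segmented_sentence, sep="\t"):
    """Turn a space-segmented sentence into 'character<sep>tag' lines."""
    lines = []
    for word in segmented_sentence.split():
        for ch, tag in tag_word(word):
            lines.append(f"{ch}{sep}{tag}")
    return lines


if __name__ == "__main__":
    # Latin placeholder text; a hand-segmented Urdu sentence would be passed the same way.
    for line in segmented_to_crf_lines("ab c def"):
        print(line)
```

At prediction time the model would be given the characters of unsegmented text, and word boundaries would be read off after each E or S tag.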