Conversion to data format for CRFSharp ...

Mar 20, 2014 at 11:29 AM
I have a review data set of about 250,000 hotel reviews, and I'm planning to extract aspects from it using the CRFSharp DLL. However, my data is in plain-text paragraph form, and I need to convert it into the CRFSharp format given here so I can train and test a model for aspect extraction. Can someone tell me the best way to do that? I was thinking of writing a small program for the data format conversion.
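A small conversion program along those lines might start by splitting each review into sentences and tokens and writing one token per line. The sketch below is in Python rather than C#, and it rests on its own assumptions: the regex sentence splitter and tokenizer are naive placeholders, columns are tab-separated, a blank line ends each sentence, and every token gets the placeholder tag NOR. Aspect terms would still have to be tagged (for example by hand) before the output is usable for training. review_to_corpus is a hypothetical helper, not part of CRFSharp.

```python
import re

# Naive splitters, assumed for illustration only (not part of CRFSharp).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # split after ., ! or ? followed by whitespace
TOKEN = re.compile(r"\w+|[^\w\s]")           # runs of word characters, or single punctuation marks


def review_to_corpus(review: str, default_tag: str = "NOR") -> str:
    """Split one review paragraph into sentences and tokens, one token per line.

    Every token receives default_tag as a placeholder; aspect terms must be
    re-tagged before the file can serve as training data.
    """
    blocks = []
    for sentence in SENTENCE_END.split(review.strip()):
        tokens = TOKEN.findall(sentence)
        if tokens:
            blocks.append("\n".join(f"{tok}\t{default_tag}" for tok in tokens))
    # Assumed layout: a blank line after each sentence, including the last one.
    return "\n\n".join(blocks) + "\n\n" if blocks else ""


print(review_to_corpus("Great room. The staff was rude!"), end="")
```

Here the sample review becomes two blocks, one per sentence, each followed by a blank line. With 250,000 reviews, a real tokenizer and sentence splitter would likely do better than these regexes.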
Coordinator
Mar 20, 2014 at 2:40 PM
What features and tags will you use in your task?
Here is the simplest example. Take the sentence "! Tokyo and New York are major financial centers." If you want to extract location names from it and your only feature is the token string, you can generate a training corpus as follows:

! NOR
Tokyo LOCATION
and NOR
New LOCATION
York LOCATION
are NOR
major NOR
financial NOR
centers NOR
. NOR

The first column is a term of the sentence, and the second column is its corresponding tag. NOR means a normal term, and LOCATION means a location name. You can generate a training corpus in the above format and use CRFSharp to train a model.
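Once each token has a tag, the corpus above can be produced mechanically. Here is a minimal Python sketch. It assumes the token and tag columns are joined by a single separator character and that a blank line ends each sentence. The example above uses a space, so check the format description on the home page for the separator CRFSharp expects. to_crf_lines is a hypothetical helper name.

```python
def to_crf_lines(tokens, tags, sep="\t"):
    """Format one sentence as training lines: token, separator, tag."""
    if len(tokens) != len(tags):
        raise ValueError("each token needs exactly one tag")
    lines = [f"{tok}{sep}{tag}" for tok, tag in zip(tokens, tags)]
    return "\n".join(lines) + "\n\n"  # blank line closes the sentence


tokens = ["!", "Tokyo", "and", "New", "York", "are", "major", "financial", "centers", "."]
locations = {"Tokyo", "New", "York"}  # illustrative word list only
tags = ["LOCATION" if tok in locations else "NOR" for tok in tokens]
print(to_crf_lines(tokens, tags, sep=" "), end="")
```

The word-list lookup is only for illustration. It would also tag "New" as LOCATION in "New rooms", so real training data would normally come from annotated spans rather than a list of words.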

For a more complex example, with more features, a feature template, and word positions added to the tags, you can refer to the other example on the home page.
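Adding word position to tags usually means marking where each token sits inside an entity. The sketch below uses one common convention, joining the entity label with an underscore to one of four position markers: B (begin), M (middle), E (end), or S (single-token entity). These tag names are an assumption for illustration and may differ from the ones in the home-page example. position_tags is a hypothetical helper.

```python
def position_tags(n_tokens, spans, outside="NOR"):
    """Return one tag per token, marking each token's position inside its entity.

    spans: (start, end, label) tuples with end exclusive, e.g. (3, 5, "LOCATION").
    """
    tags = [outside] * n_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"bad span {(start, end)}")
        if end - start == 1:
            tags[start] = f"S_{label}"  # single-token entity
            continue
        tags[start] = f"B_{label}"  # first token of the entity
        for i in range(start + 1, end - 1):
            tags[i] = f"M_{label}"  # inner tokens
        tags[end - 1] = f"E_{label}"  # last token of the entity
    return tags


tokens = ["!", "Tokyo", "and", "New", "York", "are", "major", "financial", "centers", "."]
tags = position_tags(len(tokens), [(1, 2, "LOCATION"), (3, 5, "LOCATION")])
for tok, tag in zip(tokens, tags):
    print(tok, tag)
```

With these tags, "New" and "York" become B_LOCATION and E_LOCATION. The model can then learn that the two tokens form one location rather than two.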

Thanks
Zhongkai Fu
Jun 7, 2016 at 5:42 AM
I am currently working on Urdu word segmentation and want to implement this model for it. How should I format training data for Urdu text in CRFSharp? In other words, I need the data format for Urdu text. Please give me suggestions if possible.
Thank you.
Sadiq Nawaz
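One common way to frame word segmentation for a CRF, widely used for Chinese, puts one character per line. Each character is tagged with its position in its word, and the model then predicts the boundaries. The sketch below applies that idea to Urdu under assumptions of its own:

- The gold training sentences have words separated by a hypothetical delimiter "|", because spaces in Urdu text do not always mark word boundaries.
- Whitespace inside a word is dropped.
- Combining marks such as diacritics stay attached to their base letter.

segmented_to_crf and units are hypothetical helpers. The B/M/E/S tag set is a convention, not something CRFSharp prescribes.

```python
import unicodedata


def units(word):
    """Split a word into characters, keeping combining marks (e.g. diacritics) on their base letter."""
    out = []
    for ch in word:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def segmented_to_crf(sentence, delimiter="|"):
    """Turn one gold-segmented sentence into character-level training lines.

    Each unit is tagged B (begins a word), M (middle), E (ends a word) or S (one-unit word).
    """
    lines = []
    for word in sentence.split(delimiter):
        chars = units("".join(word.split()))  # assumption: whitespace is not a unit to tag
        if not chars:
            continue
        if len(chars) == 1:
            tags = ["S"]
        else:
            tags = ["B"] + ["M"] * (len(chars) - 2) + ["E"]
        lines.extend(f"{c}\t{t}" for c, t in zip(chars, tags))
    return "\n".join(lines) + "\n\n" if lines else ""  # blank line ends the sentence


print(segmented_to_crf("میں|گھر|جاتا|ہوں"), end="")
```

At prediction time, the unsegmented text would be written one unit per line in the same way. Word boundaries can then be read off wherever the model outputs E or S, with a feature template over neighbouring characters supplying the context.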
Jun 9, 2016 at 5:30 AM
Waiting for a reply.