Why can't regex find the "(" in a Japanese string in C++?
I have a large file of Japanese example sentences. It is laid out so that one line holds a sentence, and the next line holds that sentence broken into words marked with {}, () and []. What I want to do is read a line from the file, find only the word in (), store it in a separate file, and then remove it from the string.
I am trying to do this with a regex. Here is the text I am working with:
    は 二十 歳 (は た ち) {20 歳} に な る [01] {に な り ま し た}

And here's the code I'm using to find the stuff between ():
    std::smatch m;
    std::regex e("\(([^)]+)\)");   // matches what is between ( and )
    if (std::regex_search(components, m, e)) {
        printToTest(m[0].str(), "What we found");        // prints "What we found: " << m[0].str() to a test file
        components = m.prefix().str().append(m.suffix().str());  // join the two pieces into one string
        printToTest(components, "[COMP_AFTER_REMOVAL]"); // prints "[COMP_AFTER_REMOVAL]: " << components to the file
    }

It should print:
    What we found: は た ち
    [COMP_AFTER_REMOVAL]: は 二十 歳 () {20 歳} に な る [01] {に な り ま し た}

Here's what I get instead:
    What we found: は 二十 歳 (は た ち
    [COMP_AFTER_REMOVAL]: ) {20 歳} に な る [01] {に な り ま し た}

It seems the regex is somehow getting confused by は (the match starts at は). I believe this is a problem with the way I read the file line by line; maybe it is not being read as UTF-8. Anyway, here is what I do:
    xml_document finalDoc;
    string sentence;
    string components;
    ifstream infile;
    infile.open("examples.utf");
    unsigned int line = 0;
    string linePos;
    bool eof = infile.eof();
    while (!eof && line < 1) {
        getline(infile, sentence);
        getline(infile, components);
        Mensency(sentence, components, finalDoc);
        line++;
    }

Is something wrong? Any suggestions? Need more code? Please help. Thank you.
You forgot to escape your backslashes. The compiler sees "\(([^)]+)\)" and interprets it as (([^)]+)), which is not the regex you wanted.
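To see why the match starts at the beginning of the line, here is a small sketch that runs the pattern the compiler actually hands to std::regex, (([^)]+)). In C++ an unknown escape such as \( is conditionally supported with implementation-defined meaning; GCC and Clang, for example, warn about it and drop the backslash, which is the behavior assumed here. The helper name firstMatchUnescaped is hypothetical, chosen only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: runs the regex that the question's literal
// "\(([^)]+)\)" turns into once the compiler drops the backslashes,
// i.e. (([^)]+)). Those parentheses are capture groups, not literal
// '(' and ')', so the pattern only says "one or more non-')' characters".
// Returns the first match, or an empty string if there is none.
std::string firstMatchUnescaped(const std::string& text) {
    static const std::regex effective("(([^)]+))");
    std::smatch m;
    if (std::regex_search(text, m, effective)) {
        return m[0].str();
    }
    return std::string();
}
```

Because [^)]+ accepts every character except ')', the search succeeds at the very first character and runs up to just before the first ')', which is the "は 二十 歳 (は た ち" shown in the question. The Japanese text is not the cause: in UTF-8 every byte of a multibyte character is 0x80 or above, so none of them can be mistaken for the ASCII '(' (0x28) or ')' (0x29).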
You need to type "\\(([^)]+)\\)".
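A minimal sketch of the corrected search and removal, assuming the component line is a UTF-8 std::string as in the question. The names Extraction and extractParenthesized are hypothetical. The reading comes from capture group 1 (without the parentheses), and the line is rebuilt from prefix() and suffix() the same way the question's code does it.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for one component line.
struct Extraction {
    bool found = false;
    std::string reading;    // text inside the first "(...)", without the parentheses
    std::string remainder;  // the line with that whole "(...)" cut out
};

// Hypothetical helper using the escaped pattern from the answer.
// In the string literal, "\\(" is a backslash followed by '(', so the regex
// engine receives \( and treats it as a literal '('. Group 1 captures the
// text between the parentheses.
Extraction extractParenthesized(const std::string& line) {
    static const std::regex paren("\\(([^)]+)\\)");
    Extraction result;
    std::smatch m;
    if (std::regex_search(line, m, paren)) {
        result.found = true;
        result.reading = m[1].str();
        // Rebuild the line from the text before and after the match,
        // as the question's prefix()/suffix() code does.
        result.remainder = m.prefix().str() + m.suffix().str();
    } else {
        result.remainder = line;
    }
    return result;
}
```

If the doubled backslashes get hard to read, a C++11 raw string literal spells the same regex: R"(\(([^)]+)\))". Note that m[0] includes the parentheses while m[1] is only their contents, so printing m[0] gives "(は た ち)" rather than "は た ち". If you want to keep the empty "()" as in the expected output, erase only group 1 instead, e.g. with line.erase(m.position(1), m.length(1)).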