ruby - How to match the Unicode character with code point 12288 using a regexp
I found an empty-looking character in user input text (it shows up as a blank space both in the database and on the web page), and I would like to strip it out.
The Unicode code point I get for the character is 12288 (I have a guess at which character this is; am I right?).
How can I match it in Ruby using a Regexp?
Thank you very much for your help.
Update:
Hi guys, thanks a lot. I learned a lot from your answers, but it still does not work.
I have found that the character my user entered is not the one I first guessed. Calling codepoints on it gives 12288, but \u12288 does not work. Why is that?
    > str = note.public_stripped_content
    => "　　权谋 术, 在 古代 称之为 帝王 术 ..."
    > str.encoding
    => #<Encoding:UTF-8>
    > str[0].codepoints
    => [12288]
    > "\u12288"
    => "ረ8"
    > "\u12288"[0]
    => "ረ"
    > "\u12288"[1]
    => "8"
    > "\u12288"[0].codepoints
    => [4648]
    > "\u12288"[1].codepoints
    => [56]

Here is what I tried in the Rails console (you can ignore the Chinese characters; the problematic characters are the leading spaces):

    > str = note.public_stripped_content
    => "　　权谋 术, 在 古代 称之为 帝王 术 ..."
    > str.encoding
    => #<Encoding:UTF-8>
    > str[0].codepoints
    => [12288]
    > str.delete("\u12288")
    => "　　权谋 术, 在 古代 称之为 帝王 术 ..."
    > str[0].codepoints
    => [12288]
    > print /\u12288/.match(str)
    => nil
    > str.gsub(/\p{cuneiform}/u, '')
    => "　　权谋 术, 在 古代 称之为 帝王 术 ..."
codepoints returns an array of integers, which are printed as decimal values. In string literals, you must use the hexadecimal value to specify a character by its code point. You can get the hexadecimal values by mapping over the result of codepoints:

    string = "　　权谋 术, 在 古代 称之为 帝王 术 ..."
    string.codepoints
    # => [12288, 12288, 26435, ...]
    string.codepoints.map { |c| c.to_s(16) }
    # => ["3000", "3000", "6743", ...]

So the code point you actually need is 3000 (hexadecimal). If you just want to remove the character, you do not need a Regexp; a call to delete (or delete!, if appropriate) will do:
    string.delete("\u3000")
    # => "权谋 术, 在 古代 称之为 帝王 术 ..."

Update: \u takes exactly four hexadecimal digits, so "\u12288" is read as \u1228 followed by a literal "8". To specify a code point with more digits, it should be wrapped in braces:

    "\u12288".codepoints
    # => [4648, 56]
    "\u{12288}".codepoints
    # => [74376]