vectorization - Vectorizing List of Unique Words into 0 or 1 using Python

March 15, 2013

I am quite new to Python, and I have recently been working on some text processing in order to compute the cosine similarity between two texts.

So far I have been able to do basic preprocessing on the texts, such as lowercasing them, tokenizing them, and removing stopwords, using the NLTK library. I have also been able to create a list of unique words from all the text files.

Now that I have the list of unique words, only a few of those words should be vectorized to 1 (and the rest to 0), according to a text file that I have.

For example, after vectorizing the list of unique words, it should look like the following:

  terrible | Best | Move | Elephant | Fly | Home | Irresponsible | Vested
  1        | 1    | 0    | 0        | 0   | 1    | 0             | 0

I tried googling and looking through Stack Overflow, but the common solution seems to be using scikit-learn to convert the list into features. However, I only want 0 or 1, and which words get a 1 should be specified by a text file. For example, there is a text file (with all of its entries vectorized to 1) whose similarity to this dictionary I would like to calculate, so it should look something like the following:

Text_to_Compare.txt

  terrible | Fly | Vested
  1        | 1   | 1

And then, I will compare "Text_to_Compare.txt" to the list of unique words and calculate the similarity result.

Could anyone please tell me how I can vectorize the list of unique words into only 0 or 1, and vectorize "Text_to_Compare.txt" to all 1s?

Thank you!

Is this what you want to do?

  text_file = ['hello', 'world', 'test']
  term_dict = {'something': 0, 'word': 0, 'world': 0}

  # Flag every dictionary term that also appears in the text file.
  for word in text_file:
      if word in term_dict:
          term_dict[word] = 1

Once you have tokenized your file (the .split() method in Python), the words will be available as a list. Assuming that you have normalized each word (lowercased, stemmed, stripped of punctuation) in both your dictionary and your text_file, the above code should work. Just set all your values to 0, then loop over your file and check whether each word is in the dict. If so, set that value to 1.
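The loop described above can also be wrapped in a small helper that returns the 0/1 values as a list in vocabulary order. This is only a minimal sketch: the function name `binary_vector` and the sample data are made up for illustration (the words are borrowed from the question's example), and it assumes both lists have already been normalized in the same way.

```python
def binary_vector(vocabulary, tokens):
    """Return one 0/1 flag per vocabulary word, in vocabulary order."""
    present = set(tokens)  # a set makes each membership check cheap
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical data: the unique-word list and the tokens of one text file.
vocabulary = ['terrible', 'best', 'move', 'elephant',
              'fly', 'home', 'irresponsible', 'vested']
tokens = ['terrible', 'fly', 'vested', 'fly']

vector = binary_vector(vocabulary, tokens)
print(vector)  # [1, 0, 0, 0, 1, 0, 0, 1]
```

Repeated tokens (like 'fly' here) still produce a single 1, because only presence matters, not frequency.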

  Here is how you can create a dictionary with all the values set to 0:
   new_dict = {word: 0 for word in text_file}
  That's it. Again, note that my code assumes that you are normalizing all the terms (comparing apples to apples), which is always important when working with text.
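To make the "apples to apples" point concrete, here is one possible normalization step. It is a sketch under assumptions: the helper name `normalize_token` and the sample sentence are hypothetical, and it only lowercases and strips surrounding punctuation. Stemming or stopword removal (for example with NLTK, as in the question) would need to be added separately.

```python
import string


def normalize_token(word):
    """Lowercase a token and strip leading/trailing punctuation."""
    return word.lower().strip(string.punctuation)


# Hypothetical raw text, used only to illustrate the helper.
raw = "Terrible, terrible weather... Fly HOME!"
tokens = [normalize_token(w) for w in raw.split()]
tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
print(tokens)  # ['terrible', 'terrible', 'weather', 'fly', 'home']
```

Whatever normalization is chosen, the same function should be applied to both the unique-word list and the file being compared.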
  Last edit: if you have two lists of unique words (after tokenization and normalization):
   def normalize(word):
       # Do stuff here, i.e. lower; stem; strip punctuation; etc.
       # Lowercasing is only a minimal placeholder.
       return word.lower()

   # text_doc and other_text_doc are the two documents as strings.
   word_list_one = [normalize(word) for word in text_doc.split()]
   word_list_two = [normalize(word) for word in other_text_doc.split()]

   # If you know which of your lists is the longer one, you can build
   # the dictionary of ones and zeros from the two lists:
   word_dict = dict([(word, 1) if word in word_list_one else (word, 0)
                     for word in word_list_two])

   # In the above code, word_list_two should be the longer of your two lists
   # (if I understand your code properly).
   # Someone with more Python experience can definitely improve my code.
   # I just wanted to show you another option.
  Please let me know if this works for you. Hope this helps a bit!
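For the similarity step mentioned in the question, one option is cosine similarity over two 0/1 vectors built against the same unique-word list. The following is a minimal sketch, assuming cosine similarity is the measure wanted; the function name and the sample vectors are illustrative, and only the standard library is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid dividing by zero for an empty vector
    return dot / (norm_a * norm_b)


# Hypothetical 0/1 vectors over the same 8-word vocabulary.
doc_vector = [1, 1, 0, 0, 0, 1, 0, 0]
compare_vector = [1, 0, 0, 0, 1, 0, 0, 1]

score = cosine_similarity(doc_vector, compare_vector)
print(round(score, 3))  # 0.333
```

For binary vectors the dot product is simply the number of words the two texts share, so the score here is 1 shared word divided by sqrt(3) * sqrt(3).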
























Raj T
