This is the ninth article in my series of articles on Python for NLP. In the previous article, we saw how Python's Pattern library can be used to perform a variety of NLP tasks ranging from tokenization to POS tagging, and text classification to sentiment analysis. Before that, we explored the TextBlob library for performing similar natural language processing tasks.

In this article, we will explore the StanfordCoreNLP library, which is another extremely handy library for natural language processing. We will see the different features of StanfordCoreNLP with the help of examples. So without wasting any further time, let's get started.

The installation process for StanfordCoreNLP is not as straightforward as for other Python libraries. As a matter of fact, StanfordCoreNLP is a library that is actually written in Java, so make sure you have Java installed on your system. You can download the latest version of Java freely.

Once you have Java installed, you need to download the JAR files for the StanfordCoreNLP libraries. The JAR file contains models that are used to perform different NLP tasks. To download the JAR files for the English models, download and unzip the folder located at the official StanfordCoreNLP website.

The next thing you have to do is run the server that will serve the requests sent by the Python wrapper to the StanfordCoreNLP library. Navigate to the path where you unzipped the JAR files folder and execute the following command on the command prompt:

```
$ java -mx6g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000
```

The above command initiates the StanfordCoreNLP server. The parameter -mx6g specifies that the memory used by the server should not exceed 6 gigabytes. It is important to mention that you should be running a 64-bit system in order to have a heap as big as 6GB; if you are running a 32-bit system, you might have to reduce the memory size dedicated to the server.

Once you run the above command, you should see the following output:

```
INFO CoreNLP - StanfordCoreNLPServer#main() called.
INFO CoreNLP - setting default constituency parser
INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/
INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/ instead
INFO CoreNLP - to use shift reduce parser download English models jar from:
INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
```

The final step is to install the Python wrapper for the StanfordCoreNLP library. The wrapper we will be using is pycorenlp. The following command downloads the wrapper library:

```
$ pip install pycorenlp
```

Performing NLP Tasks

In this section, we will briefly explore the use of the StanfordCoreNLP library for performing common NLP tasks. Now we are all set to connect to the StanfordCoreNLP server and perform them. To connect to the server, we have to pass the address of the StanfordCoreNLP server that we initialized earlier to the StanfordCoreNLP class of the pycorenlp module. Look at the following script:

```python
from pycorenlp import StanfordCoreNLP

nlp_wrapper = StanfordCoreNLP('http://localhost:9000')
```

The object returned can then be used to perform NLP tasks.

Lemmatization, POS Tagging and Named Entity Recognition

Lemmatization, parts of speech tagging, and named entity recognition are the most basic NLP tasks. The StanfordCoreNLP library supports pipeline functionality that can be used to perform these tasks in a structured way. In the following script, we will create an annotator which first splits a document into sentences and then further splits the sentences into words or tokens. The words are then annotated with the POS and named entity recognition tags. We start with a document containing two sentences:

```python
doc = "Ronaldo has moved from Real Madrid to Juventus. While messi still plays for Barcelona"
```

In the script above we have a document with two sentences. We use the annotate method of the StanfordCoreNLP wrapper object that we initialized earlier. The annotators parameter takes the type of annotation we want to perform on the text. We pass 'ner, pos' as its value, which specifies that we want to annotate our document for POS tags and named entities. The outputFormat parameter defines the format in which you want the annotated text; the possible values are json for JSON objects, xml for XML format, text for plain text, and serialize for serialized data. The final parameter is the timeout in milliseconds, which defines the time that the wrapper should wait for the response from the server before timing out.

In the output, you should see a JSON object. If you look at that output carefully, you can find the POS tags, named entities and lemmatized version of each word.

We'll first print the lemmatizations for the words in the two sentences in our dataset (annot_doc holds the result of the annotate call):

```python
for sentence in annot_doc["sentences"]:
    print([token["lemma"] for token in sentence["tokens"]])
```
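The annotate call discussed above is not reproduced in the text. Below is a minimal sketch of what it could look like, assuming the pycorenlp wrapper and a CoreNLP server on localhost:9000 (the port reported in the server log); the variable names nlp_wrapper and annot_doc follow the surrounding prose, and the guards are only there so the snippet degrades gracefully when the wrapper or server is unavailable:

```python
# Sketch only: assumes `pip install pycorenlp` and a CoreNLP server
# started on localhost:9000 as shown earlier.
try:
    from pycorenlp import StanfordCoreNLP
    nlp_wrapper = StanfordCoreNLP('http://localhost:9000')
except ImportError:
    nlp_wrapper = None  # wrapper not installed; shown for illustration

doc = "Ronaldo has moved from Real Madrid to Juventus."

# 'annotators' selects the annotations to run, 'outputFormat' the response
# format (json, xml, text, or serialize), and 'timeout' how many
# milliseconds to wait for the server before giving up.
properties = {
    'annotators': 'ner, pos',
    'outputFormat': 'json',
    'timeout': 1000,
}

if nlp_wrapper is not None:
    try:
        annot_doc = nlp_wrapper.annotate(doc, properties=properties)
    except Exception:
        annot_doc = None  # server not reachable
```

With a running server and outputFormat set to json, annot_doc should come back as a Python dict parsed from the server's JSON response.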
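The JSON object returned by the server is too large to reproduce here. Purely as an illustration, the following hand-built fragment (not real server output) mimics the 'sentences'/'tokens' nesting that a CoreNLP JSON response uses, and shows how to read the lemma and POS tag of each word out of it:

```python
# Hand-built fragment in the shape of a CoreNLP JSON response; a real
# annot_doc returned by nlp_wrapper.annotate(...) nests the same way.
annot_doc = {
    "sentences": [
        {
            "tokens": [
                {"word": "Ronaldo", "lemma": "Ronaldo", "pos": "NNP", "ner": "PERSON"},
                {"word": "moved", "lemma": "move", "pos": "VBD", "ner": "O"},
            ]
        }
    ]
}

# Walk sentence by sentence, token by token, printing each word
# alongside its lemma and part-of-speech tag.
for sentence in annot_doc["sentences"]:
    for word in sentence["tokens"]:
        print(word["word"], "=>", word["lemma"], "/", word["pos"])
```

Each token dict also carries the "ner" key, so the same loop can be used to pull out named entities (e.g. PERSON for "Ronaldo" above).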