Removing Stop Words
I had a task to count word frequencies for a set of articles. The task was to get the list of IDs from Elasticsearch, fetch the content from ArangoDB, clean the content with some text processing, and then count the word frequencies. After implementing it in Scala, Go, and Python, I found that the Python version was very slow. The Scala and Go versions took only around 3-4 seconds to process 12,563 articles, but the Python version took around 15-18 seconds. After some profiling, I finally found that the slow part was removing stop words from a large number of articles. I was using a common method in Python to remove the stop words. ...
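For illustration, here is a minimal sketch of stop word removal followed by word counting. It is not necessarily the code used in this task: the tiny stop word list, the `tokenize` helper, and the two `count_words_*` functions are all hypothetical names made up for the example. It contrasts filtering against a list with filtering against a set, since a membership test on a list scans the elements one by one, while a set lookup is a hash lookup.

```python
import re
from collections import Counter

# Hypothetical, tiny stop word list for illustration only;
# real stop word lists are typically much longer.
STOP_WORDS_LIST = ["a", "an", "and", "the", "is", "it", "of", "to", "in"]
# Same words as a frozenset, so membership tests are hash lookups.
STOP_WORDS_SET = frozenset(STOP_WORDS_LIST)


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_words_list(text):
    """Count words, filtering against a list (linear scan per token)."""
    return Counter(w for w in tokenize(text) if w not in STOP_WORDS_LIST)


def count_words_set(text):
    """Count words, filtering against a set (hash lookup per token)."""
    return Counter(w for w in tokenize(text) if w not in STOP_WORDS_SET)


article = "The cat and the hat: it is a story of a cat in a hat."
counts = count_words_set(article)
print(counts.most_common())
```

Both functions return the same counts; only the cost of each membership test differs. Whether that difference explains a particular slowdown depends on the real stop word list and on the rest of the pipeline, so it is worth confirming with a profiler.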