Showing posts with label Sentiment-Analysis. Show all posts

Monday, October 21, 2019

Gavagai Sentiment Analysis and Opinion Mining


Sentiment Analysis and Opinion Mining

Sentiment analysis, also known as opinion mining, is the practice of gauging the sentiment expressed in a text, such as a social media post or a Google review. Analysts typically either code their own solution (for example in Python) or use a pre-built analytics product such as Gavagai Explorer.
What is Sentiment Analysis?
Sentiment analysis or opinion mining is a notoriously difficult sub-field of Natural Language Processing and Data Science. At the most fundamental level, the task is to take a piece of text and automatically score it for the opinions and sentiments contained within.

“I had the most wonderful stay” (= positive/satisfaction).
“I’m really disappointed with the battery life of my device” (= negative/dissatisfaction).
These examples are relatively easy to deal with. However, we soon run into problematic cases.

The phone was well packaged but I had to wait a whole week for delivery.
A human easily infers that the customer is dissatisfied with the delivery speed. But, taking a step back, where is it actually stated that waiting a week for delivery is bad? There are no overtly negative words.

It is also important to separate the satisfaction with the packaging from the dissatisfaction with the delivery. These are different, unrelated aspects of the product.

There is an abundance of other difficulties with automatic sentiment analysis, including, but not limited to: lexical ambiguity, domain-dependent model overfitting, lack of training data, and lack of sufficiently varied training data.
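The packaging and delivery example can be made concrete with a naive word-list scorer. This is only a minimal sketch: the word lists and the function name lexicon_score are hypothetical and are not taken from any real product.

```python
import re

# Hypothetical, tiny word lists for illustration only.
POSITIVE = {"wonderful", "well", "delicious"}
NEGATIVE = {"disappointed", "appalling"}


def lexicon_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


examples = [
    "I had the most wonderful stay",
    "I'm really disappointed with the battery life of my device",
    "The phone was well packaged but I had to wait a whole week for delivery.",
]
scores = [lexicon_score(e) for e in examples]
print(scores)
```

The first two sentences are scored as expected, but the third comes out positive because "well" is the only listed word it contains. Nothing in a word list says that waiting a week is bad.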

Why is Sentiment Analysis important?
Automated sentiment analysis is essential for understanding and quantifying the opinions expressed in text at scale. With large amounts of data, making sense of the feedback by hand becomes time-consuming and expensive. On an Internet-wide scale, resorting to manual categorisation is impossible.

For online data, the insight lies in how people online are talking about your brand. For proprietary data, such as customer satisfaction or employee satisfaction reviews, the key business insight is in properly gauging the satisfaction level of respondents.

How does Gavagai handle Sentiment Analysis?
The most common sentiment analysis solutions in the industry use a machine learning (or deep learning) approach. An algorithm generalises from large, annotated sets of data, and the resulting model is applied to customer texts. These models function as a ‘black box’ with no possibility of explanation or interpretation. Such an approach also does not transfer well to unseen data from other domains or industries.

Most services offer a binary classification (positive/negative) or a ternary classification (positive/negative/neutral). At Gavagai, we offer a wide spectrum of eight different sentiments: positivity, negativity, scepticism, love, hate, fear, desire and violence. This provides a more nuanced understanding of texts and comments.

We rely on a heuristic-based method which is explainable, interpretable and scalable. It has also proven to work well on gold standard benchmarks from academia. In experiments for customers, the method performs well across a range of different data types, freeing us from the classic machine learning problem of overfitting. (Overfitting is where a model learns patterns that are too specific to the data it was trained on, at the expense of generalising well to unseen data. Dealing with new data is extremely important for commercial sentiment analysis.)

A more advanced task is to identify how expressed opinions actually relate to the different entities in the text.

The food was delicious but the service was appalling.
In this last example, it is helpful if we can attach the sentiment of ‘delicious’ to ‘food’ and the sentiment of ‘appalling’ to ‘service’. We use a topical sentiment detection algorithm to attach sentiments in the text to the topics they describe. This is sometimes called aspect-based sentiment analysis.
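The idea of attaching sentiments to topics can be sketched with a very simple clause-based rule. This is an illustrative assumption, not Gavagai's topical sentiment detection algorithm: the aspect list, the sentiment list and the function aspect_sentiments are all hypothetical.

```python
import re

# Hypothetical aspect terms and sentiment words, for illustration only.
ASPECTS = {"food", "service"}
SENTIMENT = {"delicious": "positive", "appalling": "negative"}


def aspect_sentiments(text):
    """Pair each aspect term with the first sentiment word in the same clause."""
    pairs = {}
    # Split into clauses at commas, semicolons, full stops and the word "but".
    for clause in re.split(r"[,;.]|\bbut\b", text.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        aspects = [t for t in tokens if t in ASPECTS]
        labels = [SENTIMENT[t] for t in tokens if t in SENTIMENT]
        if aspects and labels:
            for aspect in aspects:
                pairs[aspect] = labels[0]
    return pairs


result = aspect_sentiments("The food was delicious but the service was appalling.")
print(result)
```

Splitting on "but" keeps the two opinions apart, so each aspect only sees the sentiment word in its own clause. Real text needs far more than this, for example handling of negation and of aspects that are never named explicitly.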

Gavagai Explorer works with sentiment analysis in Azerbaijani, Albanian, Arabic, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Farsi, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Korean, Latvian, Lithuanian, Malay, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Ukrainian, Urdu, and Vietnamese.
https://www.gavagai.io/text-analytics/sentiment-analysis-opinion-mining/

Friday, March 15, 2019

A Study on Sentiment Computing and Classification of Sina Weibo with Word2vec

In recent years, Weibo has greatly enriched people's lives. More and more people are actively sharing information with others and expressing their opinions and feelings on Weibo. Analyzing the emotion hidden in this information can benefit online marketing, branding, customer relationship management and the monitoring of public opinion. Sentiment analysis aims to identify the emotional tendencies of microblog messages, that is, to classify users' emotions as positive, negative or neutral. This paper presents a novel model to build a Sentiment Dictionary using the Word2vec tool, based on our Semantic Orientation Pointwise Similarity Distance (SO-SD) model. We then use the Emotional Dictionary to obtain the emotional tendencies of Weibo messages. Experiments validate the effectiveness of our method, which serves as a preliminary exploration of sentiment analysis for Chinese Weibo.
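The abstract does not give the SO-SD formula, so the following is only a generic sketch of seed-based semantic orientation: a word is scored by how much closer its embedding is to positive seed words than to negative ones. The toy two-dimensional vectors stand in for trained Word2vec embeddings, and the function names and seed choices are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def semantic_orientation(word, vectors, pos_seeds, neg_seeds):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    pos = sum(cosine(vectors[word], vectors[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vectors[word], vectors[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


# Toy vectors for illustration; real embeddings would come from a trained model.
vectors = {
    "good": [1.0, 0.1],
    "bad": [-1.0, 0.2],
    "great": [0.9, 0.2],
    "awful": [-0.8, 0.3],
}
so_great = semantic_orientation("great", vectors, ["good"], ["bad"])
so_awful = semantic_orientation("awful", vectors, ["good"], ["bad"])
print(so_great, so_awful)
```

Words with a positive score would be added to the positive side of the dictionary and words with a negative score to the negative side, usually with a margin around zero for neutral words.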
https://www.researchgate.net/publication/286758692_A_Study_on_Sentiment_Computing_and_Classification_of_Sina_Weibo_with_Word2vec

A Joint Model for Chinese Microblog Sentiment Analysis

Topic-based sentiment analysis for Chinese microblogs aims to identify the user's attitude towards specified topics. In this paper, we propose a joint model that incorporates Support Vector Machines (SVM) and a deep neural network to improve the performance of sentiment analysis. First, an SVM classifier is constructed using N-gram, NPOS and sentiment lexicon features. Meanwhile, a convolutional neural network is applied to learn paragraph representation features as the input of another SVM classifier. The classification results output by these two classifiers are merged to give the final classification. Evaluations on the SIGHAN-8 topic-based Chinese microblog sentiment analysis task show that our proposed approach achieves the second rank on micro-average F1 and the fourth rank on macro-average F1 among a total of 13 submitted systems.
https://www.researchgate.net/publication/301449007_A_Joint_Model_for_Chinese_Microblog_Sentiment_Analysis

Towards Building a High-Quality Microblog-Specific Chinese Sentiment Lexicon

Due to the huge popularity of microblogging services, microblogs have become important sources of customer opinions. Sentiment analysis systems can provide useful knowledge to decision support systems and decision makers by automatically aggregating and summarizing the opinions in massive numbers of microblogs. The most important component of a sentiment analysis system is the sentiment lexicon. However, the performance of traditional sentiment lexicons on microblog sentiment analysis is far from satisfactory, especially for Chinese. In this paper, we propose a data-driven approach to build a high-quality microblog-specific sentiment lexicon for Chinese microblog sentiment analysis systems. The core of our method is a unified framework that incorporates three kinds of sentiment knowledge for sentiment lexicon construction, i.e., the word-sentiment knowledge extracted from microblogs with emoticons, the sentiment similarity knowledge extracted from words' associations among all the messages, and the prior sentiment knowledge extracted from existing sentiment lexicons. In addition, in order to improve the coverage of our sentiment lexicon, we propose an effective method to detect popular new words in microblogs, which considers not only words' distributions over texts, but also their distributions over users. The detected new words with strong sentiment are incorporated into our sentiment lexicon. We built a microblog-specific Chinese sentiment lexicon on a large microblog dataset with more than 17 million messages. Experimental results on two microblog sentiment datasets show that our microblog-specific sentiment lexicon can significantly improve the performance of microblog sentiment analysis.
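Of the three knowledge sources named in the abstract, the emoticon-based one is the easiest to illustrate. The sketch below is an assumption about how such word-sentiment knowledge could be extracted and is not the paper's unified framework: messages are assumed to be already tokenised and already labelled "pos" or "neg" from an emoticon they contain, and each word is scored by the difference of its pointwise mutual information with the two labels.

```python
import math
from collections import Counter


def emoticon_polarity(messages, smoothing=1.0):
    """Score words by PMI(word, pos) - PMI(word, neg), with add-one style smoothing.

    `messages` is a list of (tokens, label) pairs, where label is "pos" or "neg"
    and is assumed to have been derived from an emoticon in the message.
    The difference of the two PMI values reduces to log(P(w|pos) / P(w|neg)).
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in messages:
        counts[label].update(tokens)
    totals = {label: sum(c.values()) for label, c in counts.items()}
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for word in vocab:
        p_pos = (counts["pos"][word] + smoothing) / (totals["pos"] + smoothing * len(vocab))
        p_neg = (counts["neg"][word] + smoothing) / (totals["neg"] + smoothing * len(vocab))
        scores[word] = math.log(p_pos / p_neg)
    return scores


# Hypothetical English stand-ins for emoticon-labelled microblog messages.
messages = [
    (["love", "this", "phone"], "pos"),
    (["great", "phone"], "pos"),
    (["hate", "this", "delay"], "neg"),
    (["terrible", "delay"], "neg"),
]
scores = emoticon_polarity(messages)
print(sorted(scores.items(), key=lambda kv: kv[1]))
```

Words that appear equally often with both labels, such as "this" in the toy data, score zero and would not enter the lexicon.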
https://www.researchgate.net/publication/301902793_Towards_Building_a_High-Quality_Microblog-Specific_Chinese_Sentiment_Lexicon

Sentiment Analysis for Chinese Microblog based on Deep Neural Networks with Convolutional Extension Features

Related research on sentiment analysis for Chinese microblogs focuses on the analysis of posts. The short length of microblog text limits feature extraction. Because tweeting is a process of communicating with friends, the comments on a post are important reference information for that post. This paper proposes a content extension framework that combines a post and its related comments into a microblog conversation for feature extraction. A novel convolutional autoencoder is adopted to extract contextual information from the microblog conversation as features for the post. A customized DNN (Deep Neural Network) model, built by stacking several RBM (Restricted Boltzmann Machine) layers, is implemented to initialize the structure of the neural network. The RBM layers take probability distribution samples of the input data to learn hidden structures for better high-level feature representation. A ClassRBM (Classification RBM) layer, stacked on top of the RBM layers, produces the final sentiment classification label for the post. Experimental results show that, with proper structure and parameters, the proposed DNN outperforms state-of-the-art surface learning models such as SVM or NB on sentiment classification, indicating that the proposed DNN model, combined with the proposed feature dimensionality extension method, is suitable for short-document classification.
https://www.researchgate.net/publication/303952937_Sentiment_Analysis_for_Chinese_Microblog_based_on_Deep_Neural_Networks_with_Convolutional_Extension_Features

Context-Aware Chinese Microblog Sentiment Classification with Bidirectional LSTM

Recently, with the fast development of microblogs, analyzing the sentiment orientations of tweets has become a hot research topic for both academic and industrial communities. Most of the existing methods treat each microblog as an independent training instance. However, the sentiments embedded in tweets are usually ambiguous and context-aware. Even a non-sentiment word might convey a clear emotional tendency in microblog conversations. In this paper, we regard the microblog conversation as a sequence, and leverage bidirectional Long Short-Term Memory (BLSTM) models to incorporate preceding tweets for context-aware sentiment classification. Our proposed method could not only alleviate the sparsity problem in the feature space, but also capture the long-distance sentiment dependency in microblog conversations. Extensive experiments on a benchmark dataset show that the bidirectional LSTM models with context information could outperform other strong baseline algorithms.
https://www.researchgate.net/publication/308188542_Context-Aware_Chinese_Microblog_Sentiment_Classification_with_Bidirectional_LSTM

Chinese Microblog Sentiment Analysis Based on Sentiment Features

As the microblog has increasingly become an information platform for netizens to share their ideas, the sentiment analysis of microblogs has attracted wide scholarly attention both at home and abroad. The primary goal of this research is to improve the accuracy of microblog sentiment polarity classification. In view of the characteristics of microblogs, a new method of semantically related feature extraction is proposed. First, Chinese word features are selected using a vector space model (VSM) text representation, with weights computed by TF*IDF. Second, the eight proposed microblog semantic features are extracted, including sentence sentiment judgment based on an emotional dictionary. Finally, three kinds of machine learning methods are used to classify Chinese microblogs using the feature vector that combines the two feature sets. The experimental results indicate that the proposed feature extraction method outperforms the state-of-the-art approaches, and that, for this feature extraction algorithm, classification performance is best when using the Naïve Bayes algorithm.
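The TF*IDF weighting mentioned in the abstract can be sketched as follows. The abstract does not say which TF*IDF variant the paper uses, so this shows one common form (relative term frequency times the logarithm of inverse document frequency). Documents are assumed to be already tokenised, which for Chinese would require a word segmentation step first.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical English stand-ins for segmented microblog posts.
docs = [["good", "phone", "good"], ["bad", "phone"], ["good", "service"]]
w = tf_idf(docs)
print(w[1])
```

In the second document, "bad" gets a higher weight than "phone" because it occurs in fewer documents. Each dict is one row of the vector space model, with absent terms implicitly weighted zero.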
https://www.researchgate.net/publication/308327580_Chinese_Microblog_Sentiment_Analysis_Based_on_Sentiment_Features

Sentiment Target Extraction Based on CRFs with Multi-features for Chinese Microblog

Sentiment target extraction on Chinese microblogs has attracted increasing research attention. Most previous work relies on syntax, such as automatic parse trees, which are subject to noise for informal text such as microblogs. In this paper, we propose a modified CRFs model for Chinese microblog sentiment target extraction. This model treats sentiment target extraction as a sequence-labeling problem, incorporating contextual information, syntactic rules and an opinion lexicon into the model as multiple features. The major contribution of this method is that it can be applied to texts in which the targets are not mentioned in the sequence. Experimental results on benchmark datasets show that our method can consistently outperform the state-of-the-art methods.
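Framing target extraction as sequence labeling means that a model such as a CRF assigns one tag per token, and the target spans are then read off the tag sequence. The sketch below shows only that decoding step. The B-T/I-T/O tag names are an assumption (a common BIO convention), since the abstract does not state the paper's tag set, and the tags here are written by hand rather than predicted.

```python
def decode_targets(tokens, tags):
    """Collect token spans tagged B-T (begin target) followed by I-T (inside target)."""
    targets, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-T":
            # A new target starts; close any target that was still open.
            if current:
                targets.append(" ".join(current))
            current = [token]
        elif tag == "I-T" and current:
            current.append(token)
        else:
            # "O", or an I-T with no open target, ends the current span.
            if current:
                targets.append(" ".join(current))
            current = []
    if current:
        targets.append(" ".join(current))
    return targets


# Hypothetical English example with hand-written tags.
tokens = ["the", "battery", "life", "is", "poor", "but", "screen", "is", "fine"]
tags = ["O", "B-T", "I-T", "O", "O", "O", "B-T", "O", "O"]
targets = decode_targets(tokens, tags)
print(targets)
```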
https://www.researchgate.net/publication/308499279_Sentiment_Target_Extraction_Based_on_CRFs_with_Multi-features_for_Chinese_Microblog

An approach to sentiment analysis of short Chinese texts based on SVMs

... Experimental results have shown that the Naive Bayes classifier performs the best. Approach to analyze the sentiment of short Chinese texts is presented in [9]. By using word2vec tool, sentiment dictionaries from NTU and HowNet are extended. ...
https://www.researchgate.net/publication/308868603_An_approach_to_sentiment_analysis_of_short_Chinese_texts_based_on_SVMs

A Dynamic Conditional Random Field Based Framework for Sentence-Level Sentiment Analysis of Chinese Microblog

https://www.researchgate.net/publication/319051638_A_Dynamic_Conditional_Random_Field_Based_Framework_for_Sentence-Level_Sentiment_Analysis_of_Chinese_Microblog

Sentiment analysis of Chinese micro-blog text based on extended sentiment dictionary

Micro-blog texts contain complex and abundant sentiments which reflect user's standpoints or opinions on a given topic. However, the existing classification method of sentiments cannot facilitate micro-blog topic monitoring. To solve this problem, this paper presents a sentiment analysis method for Chinese micro-blog text based on the sentiment dictionary to support network regulators' work better. First, the sentiment dictionary can be extended by extraction and construction of degree adverb dictionary, network word dictionary, negative word dictionary and other related dictionaries. Second, the sentiment value of a micro-blog text can be obtained through the calculation of the weight. Finally, micro-blog texts on a topic can be classified as positive, negative and neutral. Experimental results show the effectiveness of the proposed method.
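The three steps in the abstract (extended dictionaries, a weighted sentiment value, three-way classification) can be sketched as below. The paper's actual weighting scheme is not given in the abstract, so the rule used here is an assumption: a degree adverb multiplies, and a negation word flips, the value of the sentiment word that directly follows. The dictionaries are tiny hypothetical English stand-ins.

```python
# Hypothetical dictionaries for illustration only.
SENTIMENT = {"good": 1.0, "happy": 1.0, "bad": -1.0, "sad": -1.0}
DEGREE = {"very": 2.0, "slightly": 0.5}
NEGATION = {"not", "never"}


def text_score(tokens):
    """Sum sentiment values, scaled by preceding degree adverbs and negations."""
    total = 0.0
    weight = 1.0
    for token in tokens:
        if token in DEGREE:
            weight *= DEGREE[token]
        elif token in NEGATION:
            weight *= -1.0
        elif token in SENTIMENT:
            total += weight * SENTIMENT[token]
            weight = 1.0
        else:
            # Any other word resets pending modifiers, so modifiers only
            # apply when they directly precede a sentiment word.
            weight = 1.0
    return total


def classify(tokens, threshold=0.0):
    """Map the sentiment value to positive, negative or neutral."""
    score = text_score(tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(classify(["the", "film", "was", "very", "good"]))
print(classify(["not", "very", "good"]))
print(classify(["the", "film"]))
```

A network word dictionary, as mentioned in the abstract, would simply add more entries to the sentiment table. Raising the threshold widens the neutral band.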
https://www.researchgate.net/publication/320682973_Sentiment_analysis_of_Chinese_micro-blog_text_based_on_extended_sentiment_dictionary

Research on sentiment analysis of microblogging based on LSA and TF-IDF

... The anonymity of Weibo makes people willing to express their real sentiments. Many existing techniques of sentiment analysis are based on sentiment lexicons and traditional feature engineering [1]-[8]. Most of these methods need to resort to external resources or manually preprocess features of words. ...
https://www.researchgate.net/publication/324462809_Research_on_sentiment_analysis_of_microblogging_based_on_LSA_and_TF-IDF

Multi-label Chinese Microblog Emotion Classification via Convolutional Neural Network

Recently, analyzing people’s sentiments in microblogs has attracted more and more attention from both academic and industrial communities. Traditional methods usually treat sentiment analysis as a single-label supervised learning problem that classifies the microblog according to sentiment orientation or a single emotion label. In fact, however, multiple fine-grained emotions may coexist in a single tweet or even a single sentence of the microblog. In this paper, we regard emotion detection in microblogs as a multi-label classification problem. We leverage the skip-gram language model to learn distributed word representations as input features, and utilize a Convolutional Neural Network (CNN) based method to solve the multi-label emotion classification problem in Chinese microblog sentences without any manually designed features. Extensive experiments are conducted on two public short text datasets. The experimental results demonstrate that the proposed method outperforms strong baselines by a large margin and achieves excellent performance in terms of multi-label classification metrics.
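The difference between single-label and multi-label prediction shows up in how the network's output scores are turned into labels. The abstract does not describe the paper's output layer, so the sketch below shows one common approach as an assumption: each emotion gets an independent sigmoid probability, and every emotion above a threshold is returned. The label names and scores are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, labels, threshold=0.5):
    """Return every label whose independent sigmoid probability reaches the threshold."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical emotion labels and raw output scores for one sentence.
labels = ["happiness", "sadness", "anger", "surprise"]
logits = [2.1, -1.3, 0.4, -0.2]
predicted = predict_labels(logits, labels)
print(predicted)
```

A single-label classifier would instead pick only the highest-scoring emotion, which here would drop "anger" even though its probability is above one half.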
https://www.researchgate.net/publication/308187960_Multi-label_Chinese_Microblog_Emotion_Classification_via_Convolutional_Neural_Network

Sentiment Analysis of Chinese Microblog Based on Stacked Bidirectional LSTM

Sentiment analysis on Chinese microblogs has received extensive attention recently. Most previous studies focus on identifying sentiment orientation by encoding as many word properties as possible, while failing to consider contextual features (e.g., the long-range dependencies of words), which are essential to sentiment analysis. In this paper, we propose a Chinese sentiment analysis method that combines the Word2Vec model with a Stacked Bidirectional long short-term memory (Stacked Bi-LSTM) model. We first employ the Word2Vec model to capture semantic features of words and transform words into high-dimensional word vectors. We evaluate the performance of two typical Word2Vec models: Continuous Bag-of-Words (CBOW) and Skip-gram. We then use the Stacked Bi-LSTM model to extract features from the sequential word vectors. We next apply a binary softmax classifier to predict the sentiment orientation using the semantic and contextual features. Moreover, we conduct extensive experiments on a real dataset collected from Weibo (one of the most popular Chinese microblogs). The experimental results show that our proposed approach achieves better performance than other machine learning models.
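The two Word2Vec variants compared in the abstract differ in what they predict from what. The sketch below only builds the training examples each variant would see for one sentence. It does not train embeddings, and the function name and window size are assumptions chosen for illustration.

```python
def training_pairs(tokens, window=1):
    """Return (cbow, skipgram) training examples for a tokenised sentence."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        # Context: up to `window` words on each side of the target.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        cbow.append((context, target))                 # context words -> target word
        skipgram.extend((target, c) for c in context)  # target word -> each context word
    return cbow, skipgram


cbow, skipgram = training_pairs(["I", "love", "this", "phone"])
print(cbow[1])
print(skipgram)
```

CBOW produces one example per position, predicting the word from its neighbours. Skip-gram produces one example per (word, neighbour) pair, predicting each neighbour from the word.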
https://www.researchgate.net/publication/331752074_Sentiment_Analysis_of_Chinese_Microblog_Based_on_Stacked_Bidirectional_LSTM