We also varied the recognition features provided to the techniques, using both character and token n-grams.
For all techniques and features, we ran the same 5-fold cross-validation experiments in order to determine how well they could be used to distinguish between male and female authors of tweets.
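As a minimal sketch of the setup described above, the following shows one way to count character and token n-grams for a tweet and to build 5-fold train/test splits. The function names, the whitespace tokenization, and the round-robin fold assignment are illustrative assumptions; the text does not specify how tokens were obtained or how the folds were composed.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in a text (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def token_ngrams(text, n):
    """Count overlapping token n-grams.

    Assumption: tokens are split on whitespace; the original tokenizer
    is not described in this section.
    """
    tokens = text.split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def k_fold_indices(num_items, k=5):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: item i is assigned to fold i % k. Each item is used for
    testing exactly once and for training in the other k - 1 folds.
    """
    for fold in range(k):
        test = [i for i in range(num_items) if i % k == fold]
        train = [i for i in range(num_items) if i % k != fold]
        yield train, test
```

In an experiment of this kind, the n-gram counts would serve as the features given to each technique, and every technique would be trained and tested on the same five splits so that the results are comparable.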
Gender recognition has also already been applied to tweets. (2010) examined various traits of authors from India tweeting in English, combining character n-grams and sociolinguistic features like manner of laughing, honorifics, and smiley use.

An interesting observation is that there is a clear class of misclassified users who have a majority of opposite-gender users in their social network. When adding more information sources, such as profile fields, they reach an accuracy of 92.0%.

For tweets in Dutch, we first look at the official user interface for the TwiNL data set. Among other things, it shows gender and age statistics for the users producing the tweets found for user-specified searches.

In the following sections, we first present some previous work on gender recognition (Section 2). Currently the field is getting an impulse for further development now that vast sets of user-generated data are becoming available. (2012) show that authorship recognition is also possible (to some degree) if the number of candidate authors is as high as 100,000 (as compared to the usually fewer than ten in traditional studies).

Then we describe our experimental data and the evaluation method (Section 3), after which we proceed to describe the various author profiling strategies that we investigated (Section 4).

Gender Recognition

Gender recognition is a subtask in the general field of authorship recognition and profiling, which has reached maturity in the last decades (for an overview, see e.g. Even so, there are circumstances where outright recognition is not an option, but where one must be content with profiling, i.e.

We then experimented with several author profiling techniques, namely Support Vector Regression (as provided by LIBSVM; Chang and Lin 2011), Linguistic Profiling (LP; van Halteren 2004), and TiMBL (Daelemans et al.