DESCRIPTION:
The internship will be co-supervised by Claudia Ignat, Research Director at Inria, and Leo Joubert, Associate Professor (MCF) at Université de Rouen Normandie.
The internship is in the context of the PEPR eNSEMBLE (https://pepr-ensemble.fr/).
Assigned mission
Large-scale collaborative systems, in which many users collaborate on a shared task, are attracting considerable attention from industry and academia. CSCW studies [1,2] have shown that awareness of the behavior of other team members is an important way to compensate for the lack of direct communication. Allowing each member to be aware of what the others are doing helps build trust within the team [3]. Trust is defined as an individual's willingness to become vulnerable to the actions of others with the expectation that others will follow through on their commitments [4]. Trust is even more crucial in open collaborative systems such as Wikipedia, in which members usually do not know each other personally. However, it is difficult for end users to manually assess the level of trust in each partner, that is, the credibility value that a user can attribute to another user based on their past interactions. This internship aims to study the problem of trust evaluation and seeks to design a computational trust model dedicated to collaborative systems.
We are particularly interested in the case of Wikipedia, a collaborative online encyclopedia, because it provides us with a huge database produced by a large number of contributors.
On the platform, users can submit revisions of articles to improve their content. The objective of Wikipedia is to ensure the quality and neutrality of the platform's documents.
We have already studied how the collaborative interaction of one user affects the trust assessed by the other users in the trust game [5] and in contract-based multi-synchronous collaboration [6]. In the trust game [7], the interaction consists of a money transaction between two users, while in contract-based multi-synchronous collaboration the computation of trust is based on the adherence to, or violation of, contracts shared between two users. In the context of the trust game we also showed (i) that presenting a trust score to users encourages collaboration between them in a meaningful way, at a level similar to displaying participants' nicknames, and (ii) that users conform to the trust score in their decision-making regarding monetary exchange [8]. These results suggest that a trust model can be deployed in collaborative systems to assist users. However, in Wikipedia, users do not interact directly, but by means of the article to which they contribute. It is difficult to determine how one user's edits might influence another user's edits.
The scientific literature usually assesses the quality of a contribution by its lifetime on a page: the longer the content of the contribution remains present, the higher its quality is assumed to be. The problem with this measure is that it excludes from the quality judgment both the mutual trust that contributors may have in each other and the fact that the Wikipedia rules justifying the deletion of contributions may apply differently from one page to another.
To address this issue, we want to compute a Wikipedia user's trust level from their past contributions, so that this trust level can predict the quality of the user's future contributions. The trust metric proposed in [5, 6], which predicts the behavior of users from their past interactions and takes fluctuations in user behavior into account, could be applied by treating user contributions to revisions of Wikipedia articles as the interactions between users. The main challenge is to define the quality of a user's contributions. For this we plan to study existing metrics based on the length of contributions (for example, the number of characters added) and on the longevity of contributions (edit longevity, for example how long a contribution persists in the article).
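As an illustration of the two families of metrics mentioned above, here is a minimal sketch, not an existing Wikipedia metric. It assumes revisions are available as plain strings; the function names (`added_fragments`, `contribution_length`, `edit_longevity`) are hypothetical, and longevity is approximated by counting the consecutive later revisions that still contain the added text verbatim.

```python
from difflib import SequenceMatcher


def added_fragments(old: str, new: str) -> list[str]:
    """Return the pieces of text present in `new` but not in `old`."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        new[j1:j2]
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("insert", "replace")
    ]


def contribution_length(old: str, new: str) -> int:
    """Length of a contribution as the number of characters added."""
    return sum(len(fragment) for fragment in added_fragments(old, new))


def edit_longevity(fragment: str, later_revisions: list[str]) -> int:
    """Number of consecutive later revisions that still contain `fragment`.

    Assumption: verbatim survival is used as a simple stand-in for persistence.
    """
    survived = 0
    for text in later_revisions:
        if fragment not in text:
            break
        survived += 1
    return survived


# Hypothetical revision history of one article.
old = "Paris is a city."
new = "Paris is a city in France."
later = [
    "Paris is a city in France. It is large.",
    "Paris is the capital of France.",
]
print(contribution_length(old, new))
print(edit_longevity(" in France", later))
```

Verbatim matching is deliberately strict: a light rewording ends the contribution's lifetime, which is one reason a distance-based notion of longevity may be preferable.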
Our approach relies on a distance (for example, the Levenshtein distance) between the different versions of the document. We would like to compute a measure of longevity based on a semantic distance, using BERT [9, 11] and SMART [10] models, and compare it with existing measures. Wikipedia provides a dataset containing articles that have been manually assessed for quality by experts [12][13]. We therefore wish to validate our algorithms for measuring the quality of user contributions on this data.
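For reference, the syntactic distance mentioned above can be sketched as follows. This is a standard dynamic-programming implementation of the Levenshtein distance; the normalization by the longer string's length (`normalized_distance`) is an assumption made here so that versions of different sizes can be compared, not something specified in the internship description. A semantic distance would replace `levenshtein` with a distance between model embeddings of the two versions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + substitution_cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] (hypothetical normalization)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```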
Bibliography:
[1] Jeremy P. Birnholtz and Steven Ibara. Tracking changes in collaborative writing: edits, visibility and group maintenance. In CSCW 2012. ACM, 809-818.
[2] Chyng-Yang Jang, Charles Steinfield, and Ben Pfaff. Virtual team awareness and groupware support: an evaluation of the TeamSCOPE system. Int. J. Hum.-Comput. Stud. 56, 1 (2002), 109-126.
[3] C. Brad Crisp and Sirkka L. Jarvenpaa. 2013. Swift trust in global virtual teams. Journal of Personnel Psychology (2013).
[4] Roger C. Mayer and Mark B. Gavin. 2005. Trust in management and performance: Who minds the shop while the employees watch the boss? Acad Manage J 48, 5: 874-888.
[5] Quang-Vinh Dang and Claudia-Lavinia Ignat. Computational trust model for repeated trust games. In Proceedings of the IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, pages 34-41, August 2016.
[6] Claudia-Lavinia Ignat and Quang-Vinh Dang. "Users trust assessment based on their past behavior in large scale collaboration". In: The IEEE International Conference on Intelligent Computer Communication and Processing (ICCP 2021). Cluj-Napoca, Romania, Oct. 2021, 19:1-19:8. doi: 10.1109/ICCP53602.2021.9733490. hal: hal-03469344.
[7] Joyce Berg, John Dickhaut, and Kevin McCabe. Trust, reciprocity, and social history. Games and Economic Behavior, 10(1):122-142, 1995.
[8] Claudia-Lavinia Ignat, Quang-Vinh Dang, and Valerie L. Shalin. The influence of trust score on cooperative behavior. ACM Transactions on Internet Technology, 19(4), 22 pages, November 2019.
[9] Liu Zhuang, Lin Wayne, Shi Ya, and Zhao Jun. "A Robustly Optimized BERT Pre-training Approach with Post-training". In: Proceedings of the 20th Chinese National Conference on Computational Linguistics. CCL 2021. Huhhot, China: Springer, Aug. 2021, pp. 471-484. doi: 10.1007/978-3-030-84186-7_31.
[10] Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Tuo Zhao. "SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization". In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, July 2020, pp. 2177-2190. doi: 10.18653/v1/2020.acl-main.197.
[11] Wei Wang et al. "StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding". In: Proceedings of the 8th International Conference on Learning Representations. ICLR 2020. Addis Ababa, Ethiopia: OpenReview.net, Apr. 2020. url: https://openreview.net/forum?id=BJgQ4lSFPH
[12] Morten Warncke-Wang, Dan Cosley, and John Riedl. Tell me more: an actionable quality model for Wikipedia. In Proceedings of OpenSym, 10 pages, August 2013.
[13] Morten Warncke-Wang, English Wikipedia Quality Assessment Dataset. Figshare, Dataset. https://doi.org/10.6084/m9.figshare.1375406.v2
Main activities
* Study the existing trust metrics in collaborative systems
* Study existing work on article quality in Wikipedia
* Propose a metric for the quality of user contributions based on the length and longevity of contributions (using both syntactic and semantic distances)
* Adapt the trust metric proposed in [5] to Wikipedia, treating users' contributions to article revisions as the counterpart of user interactions in the trust game
* Perform measurements using the Wikipedia dataset
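To make the adaptation step more concrete, the sketch below shows one simple way a trust score could be updated from a sequence of contribution quality scores. It is only an illustration: it uses an exponentially weighted moving average, which is NOT the metric of [5, 6], and the names `update_trust` and `trust_after`, the neutral prior of 0.5 and the weight `alpha` are all assumptions. Quality scores are assumed to lie in [0, 1].

```python
def update_trust(trust: float, quality: float, alpha: float = 0.3) -> float:
    """Blend the previous trust value with the latest contribution quality.

    `alpha` (hypothetical parameter) controls how strongly recent behavior
    outweighs older behavior, so fluctuations show up in the score.
    """
    return (1 - alpha) * trust + alpha * quality


def trust_after(qualities: list[float], initial: float = 0.5, alpha: float = 0.3) -> float:
    """Trust level after observing a user's contributions in chronological order."""
    trust = initial
    for quality in qualities:
        trust = update_trust(trust, quality, alpha)
    return trust


# A user whose first contribution was good and whose second was reverted.
print(trust_after([1.0, 0.0], initial=0.5, alpha=0.5))  # 0.375
```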
Education level: Master's degree (Bac+5)
Part time / Full time: Full time
Contract type: Internship / Recent graduate
Skills: Data Analysis, Cognitive Science, Computational Linguistics, Computer Programming, Computer Networks, Databases, Information Technology, Cerner CCL, English, Analytical Skills, Decision Making, Communication Skills, Networking, Team Spirit, Algorithms, Applied Mathematics, Calculations, Economics, Psychology, Maintenance and Troubleshooting, Quality Management, Scientific Documentation, Metrics, Virtual Teams
Email:
claudia.ignat@inria.fr
Phone:
0139635511
Advertiser type: Direct employer