New methods for metadata extraction from scientific literature

Resource metadata

Title	New methods for metadata extraction from scientific literature Title variant: Nowe metody wydobywania metadanych z literatury naukowej
People	Authors: Dominika Beata Tkaczyk Partner: Instytut Badań Systemowych PAN w Warszawie
Description	In the scientific world, ideas are spread and new discoveries and findings are announced mainly by publishing and reading scientific literature. Over the past few decades we have witnessed a digital revolution, which moved scholarly communication to electronic media and substantially increased its volume. Keeping track of the latest scientific achievements now poses a major challenge for researchers. Scientific information overload is a severe problem that slows down scholarly communication and knowledge propagation across academia. Modern research infrastructures facilitate the study of scientific literature by providing intelligent search tools, proposing similar and related documents, building and visualizing interactive citation and author networks, assessing the quality and impact of articles using citation-based statistics, and so on. To provide such high-quality services, a system requires access not only to the text content of stored documents, but also to their machine-readable metadata. Since in practice good-quality metadata is not always available, there is a strong demand for a reliable automatic method of extracting machine-readable metadata directly from source documents. Our research addresses these problems by proposing an automatic, accurate and flexible algorithm for extracting a wide range of metadata directly from scientific articles in born-digital form. The extracted information includes basic document metadata, structured full text and the bibliography section. Designed as a universal solution, the proposed algorithm is able to handle a vast variety of publication layouts with high precision and is thus well suited for analyzing heterogeneous document collections. This was achieved by employing supervised and unsupervised machine-learning algorithms trained on large, diverse datasets. The evaluation we conducted showed good performance of the proposed metadata extraction algorithm. A comparison with similar solutions also showed that our algorithm performs better than the competition for most metadata types. The proposed method is a reliable and accurate solution to the problem of extracting metadata from documents. It allows modern research infrastructures to provide intelligent tools and services that support readers in consuming the growing volume of scientific literature, which in turn facilitates communication among scientists and improves knowledge propagation and the quality of research in the scientific world. (English)
Keywords	"eksploracja danych"@pl, "analiza dokumentów"@pl, "wydobywanie metadanych"@pl, "uczenie maszynowe"@pl, "Machine Learning"@en
Classification	Resource type: thesis Scientific discipline: technical sciences / computer science (2011) Target group: researchers, students, entrepreneurs Harmful content: No
Characteristics	Place of creation: Warsaw Time of creation: 2015 Number of pages: 180 Supervisor: Marek Antoni Niezgódka Resource language: Polish Location: Warsaw
License	CC BY-SA 4.0
Technical information	Depositor: Anna Wasilewska Date made available: 15-10-2018
Collections	Collection of the Instytut Badań Systemowych PAN w Warszawie, e-Biblio IBS PAN Collection

Citation

Dominika Beata Tkaczyk. New methods for metadata extraction from scientific literature. [thesis] Available in Atlas Zasobów Otwartej Nauki, https://azon.e-science.pl/zasoby/new-methods-for-metadata-extraction-from-scientific-literature,21567/. License: CC BY-SA 4.0, https://creativecommons.org/licenses/by-sa/4.0/legalcode.pl. Accessed: 13.04.2025.

Similar resources

Splunk - configuration, recognition and visualization of information about incidents and threats

Arkadiusz Kotynia, Julia Jancelewicz, Urszula Warmińska, other document, Politechnika Wrocławska, engineering and technical sciences / automation, electronics and electrical engineering (2018)

Methods of morphosyntactic tagging and automatic shallow parsing of Polish

Adam Radziszewski, thesis, Politechnika Wrocławska, technical sciences / computer science (2011)

Machine learning from examples in the presence of errors in the data

Grażyna Szkatuła, thesis, Instytut Badań Systemowych PAN w Warszawie, technical sciences / computer science (2011)

Corpus of speech sample recordings for building acoustic models for automatic speech recognition in Polish, part 8

Teresa Sas, collection, database, Politechnika Wrocławska, engineering and technical sciences / technical computer science and telecommunications (2018)

Corpus of speech sample recordings for building acoustic models for automatic speech recognition in Polish, part 5

Teresa Sas, collection, database, Politechnika Wrocławska, engineering and technical sciences / technical computer science and telecommunications (2018)

Sentiment analysis service

Stanisław Markowski, source code, Politechnika Wrocławska, engineering and technical sciences / technical computer science and telecommunications (2018)
