Slovotext - Tutorial

Tutorial

To use Slovotext, do the following:

1. Select or enter a language.

2. Select a .txt file, or paste text from the clipboard into the corresponding field.

3. Click the Run button.

Detailed description

Interface language menu

Use the language icons to switch the interface between Russian and English. Please note that switching the interface language discards all input data!

Input data tab

Use this tab to specify a language, add texts for analysis, and run searches once the analysis is completed. In the default state, i.e. before running text analysis, the Input data tab is always open. It can be hidden after the input texts are processed and the other tabs become populated.

Select or enter language menu field

This drop-down menu offers five language options. The list is only there to save time: you can also type in any language name or description for a set of input texts. The language description is later used in the Source summary, Occurrence stats, and Entity stats tabs.

Cyrillic checkbox

If your texts are Cyrillic-based, make sure this box is checked. If you select Russian from the drop-down menu, this box is checked automatically.

Add column icon

Click this icon to add another column. Each column represents a separate set of texts for analysis, which may be in a different language. The number of added columns is unlimited. If necessary, individual columns can be removed by clicking their respective Remove column icons.

Select file button

Click this button to select a text file for analysis. Please use the .txt format only.

Or copy-paste here field

Alternatively, copy and paste a text into this field. Note: the text must contain at least two words.

Text name field

Use this field to enter a description for a selected or pasted text. This name is later used in the Source summary tab to compare statistics.

Add text icon

Click this icon to add another text for analysis. Follow the selecting/pasting and naming procedure described above. The number of added texts is unlimited. If necessary, individual texts can be removed by clicking their respective Remove text icons.

Run button

Click the Run button to start processing the input texts. The mouse cursor may freeze for some time. Wait a couple of seconds until the completion icon appears. Once it appears, the search function and the tabs below become active.

Search field

Use this field to enter a part of a word, a whole word, or a phrase. The search supports the wildcard character * (asterisk). If necessary, wildcards can be placed at the beginning and end of the search text, e.g. *llusion* or *ood afternoo*. The search input is case-insensitive. To widen or narrow the search, use the Include and Exclude fields below.
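The wildcard behavior described above can be illustrated with a short Python sketch. It is not Slovotext's own code: the function name count_matches is hypothetical, the sketch assumes that a query without wildcards must match a whole word, and it handles single-word queries only (phrases such as *ood afternoo* are not covered).

```python
import re

def count_matches(query, words):
    """Count the words that fully match a query, ignoring case.

    Assumption: each '*' stands for any run of word characters,
    and a query without '*' must match the whole word.
    """
    # Escape the literal parts, then join them with a "\w*" for each asterisk.
    literal_parts = [re.escape(part) for part in query.split("*")]
    pattern = re.compile(r"\w*".join(literal_parts), re.IGNORECASE)
    return sum(1 for word in words if pattern.fullmatch(word))

words = "The illusion of Illusions disillusioned nobody".split()
print(count_matches("*llusion*", words))  # illusion, Illusions, disillusioned
print(count_matches("illusion", words))   # exact word only
```

With this sample, the wildcard query finds three words, while the query without wildcards finds one.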

Search icon

After entering a search query and, if necessary, Include/Exclude criteria, click this icon to start searching through all the texts within the given set.

Include field

Add another search term if the searched word can change its form, e.g. due to vowel interchange. For instance, this parameter can be used to cover all forms within the same paradigm. The results for the Include parameter are added to the search total in the No. occur. box. The number of additional Include fields is unlimited.

Exclude field

If the search text may match homonymous entities unrelated to the paradigm in question, use this parameter to explicitly remove them from the search total. The results for the Exclude parameter are subtracted from the search total in the No. occur. box. The number of additional Exclude fields is unlimited. Please note that multiple Exclude parameters containing phrases (rather than single words) may produce an incorrect search total: the No. occur. output takes only the last Exclude parameter into account.
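How the Include and Exclude fields adjust the search total can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the names count_matches and search_total are hypothetical, matching is done per word, and only single-word terms are handled.

```python
import re

def count_matches(query, words):
    """Count words fully matching a query; '*' is any run of word characters (assumed)."""
    literal_parts = [re.escape(part) for part in query.split("*")]
    pattern = re.compile(r"\w*".join(literal_parts), re.IGNORECASE)
    return sum(1 for word in words if pattern.fullmatch(word))

def search_total(words, query, include=(), exclude=()):
    """Main query count, plus every Include term, minus every Exclude term."""
    total = count_matches(query, words)
    total += sum(count_matches(term, words) for term in include)
    total -= sum(count_matches(term, words) for term in exclude)
    return total

words = "We sing and sang and the singer has sung".split()
# "sing*" finds "sing" and "singer"; Include adds "sang" and "sung";
# Exclude removes the unrelated "singer".
print(search_total(words, "sing*", include=["sang", "sung"], exclude=["singer"]))
```

Here the query alone gives 2, the two Include terms raise the total to 4, and the Exclude term brings it down to 3.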

No. occur. box

This box shows the number of occurrences of a searched element in the texts of the set at hand. The results take into account the Exclude and Include search queries. For search queries containing a phrase (rather than a word or any part of it), only this box will show results. Other boxes, i.e. %, 1/Total, and No. occur./1000, will remain empty as their values are relative and require a set of similar entities to enable comparison with the searched item.

% box

This box shows the occurrences of a searched element as a percentage of all the entities in the text set. Effectively, it is the same value as in 1/Total but expressed as a percentage.

1/Total box

This box shows the ratio of one occurrence of a searched element to the total number of entities in the text set. Effectively, it is the same value as in % but expressed as a fraction.

No. occur./1000 box

This box shows the number of occurrences of a searched element per 1,000 entities. (Note: the value of 1,000 is arbitrary and serves only as a scale for the proportion.)
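The three relative boxes can be related to one another with a small Python sketch. The function name relative_measures and the sample numbers are hypothetical, and the sketch assumes that 1/Total is the occurrence count divided by the total entity count; the tool may compute or round these values differently.

```python
def relative_measures(occurrences, total_entities):
    """Express one occurrence count in the three relative forms described above."""
    return {
        "percent": occurrences * 100 / total_entities,    # % box
        "fraction": occurrences / total_entities,         # 1/Total box (assumed meaning)
        "per_1000": occurrences * 1000 / total_entities,  # No. occur./1000 box
    }

# Hypothetical numbers: an element found 25 times among 5,000 entities.
measures = relative_measures(25, 5000)
print(measures)
```

For these sample numbers, the element accounts for 0.5% of the entities, which is the same as 5 occurrences per 1,000 entities.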

Visualize tab

In the default state, i.e. before running text analysis, the tab is always closed. Once the Run button is clicked and text processing is completed, open this tab to find a link to the generated word cloud. If you see the processing icon, wait until the WordCloud_[Language] link appears. Clicking the link opens the generated word cloud in a new browser tab. The current limit is 1,000 words, i.e. only the 1,000 entities that occur most frequently in the text set are shown. The words in the image are arranged in alphabetical order, Cyrillic or Latin depending on whether the Cyrillic box in the Input data tab is checked. If necessary, hide the Visualize tab by left-clicking it.
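The selection of words for the cloud (most frequent first, then alphabetical order) can be sketched in Python as follows. The function name word_cloud_entries is hypothetical, the tie-breaking among equally frequent words is an assumption, and the image rendering itself is not shown.

```python
from collections import Counter

def word_cloud_entries(words, limit=1000):
    """Keep the `limit` most frequent entities, then order them alphabetically."""
    counts = Counter(words)
    most_frequent = counts.most_common(limit)  # list of (entity, count) pairs
    return sorted(most_frequent, key=lambda pair: pair[0])

words = "pear apple pear fig apple pear".split()
# With a limit of 2, "fig" (one occurrence) is dropped.
print(word_cloud_entries(words, limit=2))
```

With the limit of 2 used here, the result contains only "apple" and "pear", in that order.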

Relevance hierarchy tab

In the default state, i.e. before running text analysis, the tab is always closed. It lists all the entities found in the input texts, sorted by occurrence rate (No. of occur.) from the rarest to the most frequent. For example, if the number of occurrences is 1, the entities listed in the corresponding box occur only once in the given set of input texts. Entities within each box are listed in alphabetical order. If the analyzed texts include both Cyrillic- and Latin-based entities, the Latin-based entities are listed first. More details on every entity, including its occurrence rate and share of the total entity count, can be found in the Entity stats tab.

If necessary, hide the Relevance hierarchy tab by left-clicking it.
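The grouping shown in the Relevance hierarchy tab can be approximated with the Python sketch below. The function name relevance_hierarchy is hypothetical, and lowercasing the words before counting is an assumption; the tool's own tokenization rules are not documented here.

```python
from collections import Counter, defaultdict

def relevance_hierarchy(words):
    """Group entities into boxes by occurrence count, rarest box first."""
    counts = Counter(word.lower() for word in words)  # case folding is assumed
    boxes = defaultdict(list)
    for entity, occurrences in counts.items():
        boxes[occurrences].append(entity)
    # Sort the boxes by occurrence count and the entities inside each box alphabetically.
    return {occurrences: sorted(boxes[occurrences]) for occurrences in sorted(boxes)}

words = "the cat and the dog and the bird".split()
print(relevance_hierarchy(words))
```

Python's default string sorting compares Unicode code points, so in this sketch Latin-based entities also come before Cyrillic-based ones within a box.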

Export to Word button

Click this button to export the generated hierarchy in the .docx format. The output document includes the hierarchies of all added sets.

Source summary tab

In the default state, i.e. before running text analysis, the tab is always closed. It provides statistics on the input texts in the form of a table listing the text names entered in the Text name field (Input data tab) with their entity counts, plus the number of unique entities. The table also refers to the language selected or entered in the Select or enter language menu field (Input data tab).
The Total entities row shows the total number of entities contained in all input texts.
The Incl. unique row shows the number of unique entities, i.e. the total number of entities with each entity counted only once, however many times it occurs.

The pie chart shows each text's share of the total entity count for the given set. Hover the mouse pointer over its slices to see exact values.
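The figures in the Source summary tab can be reproduced in outline with the Python sketch below. The names tokenize and source_summary and the sample texts are hypothetical, and splitting on runs of word characters after lowercasing is an assumption about how entities are identified.

```python
import re

def tokenize(text):
    """Split a text into lowercase entities (assumed rule: runs of word characters)."""
    return re.findall(r"\w+", text.lower())

def source_summary(texts):
    """Per-text entity counts, the set total, unique entities, and pie chart shares."""
    per_text = {name: len(tokenize(body)) for name, body in texts.items()}
    all_entities = [entity for body in texts.values() for entity in tokenize(body)]
    total = len(all_entities)
    return {
        "per_text": per_text,
        "total_entities": total,
        "incl_unique": len(set(all_entities)),
        "share_percent": {name: count * 100 / total for name, count in per_text.items()},
    }

texts = {"Text A": "The cat sat", "Text B": "The dog"}
summary = source_summary(texts)
print(summary)
```

In this sample, the set contains five entities, four of which are unique because "the" occurs in both texts; Text A contributes 60% of the total and Text B 40%.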

If necessary, hide the Source summary tab by left-clicking it.

Entity stats tab

In the default state, i.e. before running text analysis, the tab is always closed. It contains a table of all the entities found in the analyzed texts with their counts, frequencies, and percentages. The table refers to the language selected or entered in the Select or enter language menu field of the Input data tab.
The Count column is the number of occurrences of an entity within the analyzed set of texts.

The Freq. column represents frequency: the ratio of one occurrence of an entity to the total number of entities, i.e. the same as % but expressed as a fraction.
The % column shows the occurrences of an entity as a percentage of the total number of entities, i.e. the same as Freq. but expressed as a percentage.

The entities here are the same as in the Relevance hierarchy tab, but they are shown with individual details and ordered from the most frequent to the rarest. If necessary, hide the Entity stats tab by left-clicking it.
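The Entity stats table can be approximated with the following Python sketch. The function name entity_stats is hypothetical, Freq. is assumed to be the count divided by the total number of entities, and the alphabetical tie-breaking among equally frequent entities is also an assumption.

```python
from collections import Counter

def entity_stats(words):
    """Build (entity, Count, Freq., %) rows, most frequent entity first."""
    counts = Counter(words)
    total = sum(counts.values())
    rows = [
        (entity, count, count / total, count * 100 / total)
        for entity, count in counts.items()
    ]
    # Most frequent first; equally frequent entities in alphabetical order (assumed).
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

words = "a b a c a b d d".split()
for row in entity_stats(words):
    print(row)
```

For this sample of eight entities, "a" comes first with a count of 3, a frequency of 0.375, and a percentage of 37.5.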

Occurrence stats tab

In the default state, i.e. before running text analysis, the tab is always closed. This tab features a table with occurrence rates, the number of entities having each occurrence rate, and the percentage of these entities in the total entity count for this set of texts. Rows are ordered from the rarest to the most frequent. The table refers to the language selected or entered in the Select or enter language menu field (Input data tab).

The No. occur. column shows how many times the relevant entities occur within the given text set. It is the same as the No. of occur. value in the Relevance hierarchy tab.

The Entities column shows how many entities have the occurrence rate given to its left. In effect, it is the number of entities falling within the same box in the Relevance hierarchy tab.

The % column shows the percentage of the entities with the given occurrence rate in the total entity count of the given text set.

If necessary, hide the Occurrence stats tab by left-clicking it.
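The Occurrence stats table can be sketched in Python as shown below. The function name occurrence_stats is hypothetical. The sketch assumes that the % column divides the number of entities in a row by the number of distinct entities, so that the column sums to 100; the tool may instead use the total entity count as the denominator.

```python
from collections import Counter

def occurrence_stats(words):
    """Build (No. occur., Entities, %) rows, rarest occurrence rate first."""
    entity_counts = Counter(words)                 # entity -> number of occurrences
    rate_counts = Counter(entity_counts.values())  # occurrence rate -> number of entities
    distinct = len(entity_counts)                  # assumed denominator for the % column
    return [
        (rate, entities, entities * 100 / distinct)
        for rate, entities in sorted(rate_counts.items())
    ]

words = "a b a c a b d d".split()
for row in occurrence_stats(words):
    print(row)
```

In this sample, one entity occurs once, two entities occur twice, and one entity occurs three times, giving 25%, 50%, and 25% of the four distinct entities.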