3/30/2023

Translatium multilingual

By Eugene Reyes and Jason Boro, with Tina Lieu

Attorneys performing multilingual eDiscovery face the constant challenge of having too much data for humans to review. With particularly large volumes of text, it is prohibitively expensive to translate everything. LSI tackled that challenge in a legal case and achieved astonishing results. (Watch the webinar recording about this project.)

The legal case

The solution combined the AI-powered text analytics of Rosette by Babel Street with Ai Translate by LSI - a translation solution that uses AI in tandem with expert human translators. The client was a computer chip manufacturer who alleged that an employee stole trade secrets about the product design, testing, improvements, and fabrication of the chips, taking proprietary information to a different manufacturer. To begin the multilingual eDiscovery process, the client provided case-relevant exemplary emails that contained terms related to chip design, testing, and fabrication - such as alignment, analysis, correction, defect, design, equipment, evaluation, method, processing, reticle, specification, and wafer. The target data collection for eDiscovery included business emails between engineers, written in English and Japanese, about all stages of the company’s chip design, manufacturing, and testing procedures.

Often eDiscovery cases - multilingual or not - begin with a search of the data collection using case-relevant keywords. However, keyword search cannot match words unless they are exactly the same - thus “accident” will not find “mishap” or “incident.” Furthermore, not all occurrences of a word are relevant. Based on context, a word can have multiple meanings. “Interest” can refer to “fascination” or “payment on money lent.” Irrelevant results waste attorney time, and keyword search alone might only uncover a fraction of relevant documents.

From an English perspective, having to conduct multilingual eDiscovery review with Japanese documents either requires an expensive Japanese-capable attorney or someone who can translate the documents or search queries into Japanese. Human translation of all documents is prohibitively expensive, and more affordable machine translation can be patchy and error-ridden. The end result is that keyword search tends to miss substantial amounts of dispositive evidence (meritorious facts that could be relied upon to resolve a legal dispute).

New paradigm: Natural language processing and machine translation

Better machine translation to enable non-Japanese-speaking attorneys to review relevant culled documents.
A search method that could access more context than keywords provide.

Keywords fail to provide enough context to filter out irrelevant search results. Sentences provide ample context but are too specific to result in a match. Phrases presented the happy medium between keyword terms and sentences. However, phrases alone might not bring back enough exact matches, so instead of regular keyword search, LSI and Rosette used fuzzy search based on meaning (also known as semantics). In doing so, they found that phrases were the key to improving machine translation, too.

The system that they built worked as follows. First, Rosette identified the language of each document from the exemplary emails and the to-be-reviewed 100,000-document data collection containing English and Japanese. Then Rosette identified the sentences in the data collection. The meaning of each exemplary email and sentence in the data collection was encoded in a mathematical representation using the word embeddings technique.
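
To make the difference between exact keyword matching and meaning-based (semantic) matching more concrete, here is a minimal Python sketch. It is not the Rosette or LSI implementation, and it does not use any Rosette API. The three-dimensional vectors in `TOY_EMBEDDINGS` are hand-made, hypothetical values chosen only for illustration (real word embeddings are learned from large text collections and have far more dimensions), and the function names and the `0.95` threshold are likewise assumptions.

```python
import math

# Hypothetical, hand-made "embeddings" for illustration only.
# Real embeddings are learned from data, not written by hand.
TOY_EMBEDDINGS = {
    "accident": (0.90, 0.10, 0.05),
    "mishap":   (0.85, 0.15, 0.10),
    "incident": (0.80, 0.20, 0.15),
    "wafer":    (0.05, 0.90, 0.20),
    "reticle":  (0.10, 0.85, 0.30),
    "invoice":  (0.10, 0.10, 0.95),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_search(query, terms):
    """Exact matching: a term is a hit only if it equals the query."""
    return [term for term in terms if term == query]


def semantic_search(query, embeddings, threshold=0.95):
    """Fuzzy matching on meaning: return (term, score) pairs whose vectors
    lie close to the query's vector, best match first."""
    query_vector = embeddings[query]
    hits = []
    for term, vector in embeddings.items():
        if term == query:
            continue  # skip the query itself
        score = cosine_similarity(query_vector, vector)
        if score >= threshold:
            hits.append((term, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    other_terms = [t for t in TOY_EMBEDDINGS if t != "accident"]
    print("keyword hits: ", keyword_search("accident", other_terms))
    print("semantic hits:", semantic_search("accident", TOY_EMBEDDINGS))
```

With these toy vectors, the exact keyword search for "accident" finds nothing among the other terms, while the similarity-based search returns "mishap" and "incident" and leaves out "wafer", "reticle", and "invoice". The same comparison can be applied to phrases or sentences once each has been encoded as a vector.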