Summary of results and experience in Data Science
Alexander P. Ryjov
Associate Professor, Chair of Mathematical Foundations of Intelligent Systems, Department of Mechanics and Mathematics, Lomonosov Moscow State University; Professor and Head, Chair of Business Processes Management Systems, IT Business School, Russian Presidential Academy of National Economy and Public Administration.

1. Systems for evaluation and monitoring of complex processes.
The information monitoring task includes evaluating the current status of a process and modeling possible ways of its development based on all available information (structured, non-structured, weakly structured).
· system for monitoring and evaluation of a state's nuclear activities (Department of Safeguards, IAEA)
· system for evaluation and monitoring of risks of cardiovascular disease (Center of Preventive Medicine of Ministry of Health of Russia)
· system for evaluation and monitoring of microelectronics design (Cadence Design Systems, Inc.)

A more detailed presentation is available here:
http://intsys.msu.ru/en/staff/ryzhov/Systems%20for%20evaluation%20and%20monitoring%20of%20complex%20processes.pdf

2. Retail
2.1. Profiles
Input: receipts + database of discount program (personal data)
Question: who are our most profitable customers?
Solution: split customers by PROFIT into categories (less than 8200; 8200 - 23300; 23300 - 60500; more than 60500). Build a profile for "more than 60500".
2.2. Customers' behavior
Input: receipts for 1 year
Question: which goods sell well at a particular time (of the year / month / week / day)?
Solution: split goods into categories and time into periods.
Results:
- customers with an average check prefer to buy #10 in summer;
- these customers prefer to buy ## 5, 7, 9 together in winter.
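The profit split described in 2.1 can be illustrated with a minimal Python sketch. The band boundaries come from the text. The customer records, the attribute names (`age_group`, `city`) and the rule that a value lying exactly on a boundary goes to the upper band are assumptions made only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Band boundaries are taken from the text; everything else is illustrative.
BOUNDS = [8200, 23300, 60500]
LABELS = ["less than 8200", "8200 - 23300", "23300 - 60500", "more than 60500"]


def profit_category(profit):
    """Return the label of the profit band that `profit` falls into.

    Assumption: a value exactly on a boundary is assigned to the upper band.
    """
    return LABELS[bisect_right(BOUNDS, profit)]


def build_profile(customers, category):
    """Count attribute values among the customers of one profit band."""
    profile = {}
    for customer in customers:
        if profit_category(customer["profit"]) != category:
            continue
        for key, value in customer.items():
            if key != "profit":
                profile.setdefault(key, Counter())[value] += 1
    return profile


# Hypothetical customers (made-up data, not from the original study).
customers = [
    {"profit": 5000, "age_group": "18-30", "city": "A"},
    {"profit": 61000, "age_group": "31-45", "city": "B"},
    {"profit": 75000, "age_group": "31-45", "city": "A"},
]
profile = build_profile(customers, "more than 60500")
print(profile)
```

The resulting profile is simply a frequency table per attribute; a real profile would likely also compare these frequencies against the whole customer base.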


2.3. Retail/Cross-selling
Input: receipts for 1 year
Question: which goods are most likely to be bought together?
Solution: split goods into categories.
Results: goods #14 are bought together with #5, #4, #7, #9; if we add #6 or #13, we can increase the sales.
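One simple way to approach the cross-selling question is to count how often pairs of goods categories appear on the same receipt. The sketch below assumes each receipt is a list of category numbers; the receipts and the `min_count` threshold are hypothetical, and the original work may have used a different method.

```python
from collections import Counter
from itertools import combinations


def pair_counts(receipts):
    """Count how many receipts contain each unordered pair of categories."""
    counts = Counter()
    for receipt in receipts:
        # A category is counted once per receipt, however many items it has.
        for pair in combinations(sorted(set(receipt)), 2):
            counts[pair] += 1
    return counts


def bought_with(receipts, target, min_count=2):
    """List categories that share at least `min_count` receipts with `target`."""
    partners = {}
    for (a, b), n in pair_counts(receipts).items():
        if n >= min_count and target in (a, b):
            partners[b if a == target else a] = n
    # Most frequent partner first; ties are broken by category number.
    return sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical receipts (made-up data).
receipts = [
    [14, 5, 4],
    [14, 5, 7],
    [14, 9],
    [5, 7],
    [14, 5],
]
print(bought_with(receipts, 14, min_count=1))
```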

3. Telecom
I used similar approaches for telecom (mobile content selling).
[Figure: architecture of the big data-based recommendation engine, and results]

4. Banks: credit scoring system.
Business goal: objectively assess a loan applicant's credit risks and decide whether to grant a loan or not.
Technology used in the product: data mining, associative rules induction.
Product main features:
· Automatic analysis of existing credit histories along with application forms of current borrowers. Identification of common characteristics and building profiles of "good" and "bad" borrowers.
· Instant assessment of a loan applicant's credit risks and recommendation on granting a loan.
· Full integration with banking software.
· Quick integration: ready to use in 2-3 months; 6-9 months for full integration.
· Ease of use: operators do not need to have any science-specific knowledge.
· Reasoning of the recommended decision: why should we deny the application?
Results:
· A client using Score reports that the bank's share of bad loans is 2 times less than the market average.

5. Finance: Suspicious transactions detection
Target group: companies dealing with detection of suspicious or fraudulent transactions (auditors, banks, telecoms); companies with internal audit departments.
Business goal: to reveal suspicious transactions efficiently, with greater accuracy and in less time.
Technology used in the product: neural networks, cluster analysis.
Product main features:
· Automatic detection of non-typical (suspicious) transactions for further investigation.
· Automatic detection of transactions similar to fraudulent ones specified by the user.
Results:
· Tests showed 7 times more accurate detection of suspicious transactions than the currently widespread method.

6. HR: Evaluation of job applicants.
Target group: companies with 200+ employees or companies with high turnover of certain types of employees (call centers, banks, shops).
Business goal: discover what makes the best employees best; estimate a job applicant's potential loyalty.
Method: analysis of company employees' resumes.
Technology we use: data mining, associative rules induction.
Tasks we solve:
· Structure and analyze company employees' resumes, combine with performance data if available.
· Discover what is in common for top-performing employees.
· Discover indicators of loyalty/disloyalty.
· Build profiles of different groups of employees (e.g. top performers, most loyal employees, graduates of a specific university, etc.)
Result: interactive report with employees' profiles and loyalty indicators.
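Sections 4 and 6 both name association rule induction as the underlying technology. As a rough illustration of what such a rule measures, the sketch below computes the support and confidence of a single candidate rule. The records, the attribute names and the rule itself are hypothetical; this is not the product's actual algorithm.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    Each record is a set of "attribute=value" items.
    support    = share of all records containing antecedent and consequent
    confidence = share of antecedent records that also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical credit histories (made-up data).
records = [
    {"income=high", "owns_home=yes", "loan=good"},
    {"income=high", "owns_home=no", "loan=good"},
    {"income=low", "owns_home=no", "loan=bad"},
    {"income=low", "owns_home=yes", "loan=good"},
]
support, confidence = rule_stats(records, {"income=high"}, {"loan=good"})
print(support, confidence)
```

Rules with high support and confidence are natural candidates for a "good borrower" or "top performer" profile; a full induction algorithm would also have to enumerate candidate rules, which this sketch leaves out.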

7. HR: Improving employees' creativity and communications
Target group: companies with 100+ employees.
Business goal: to boost employees' creativity and the invention of new ideas and products.
Method: improving internal communication. The more people communicate, the more creative they are.
Technology we use: social networks analysis.
Tasks we solve:
· Identification of key experts in the company. Who do people go to for advice?
· Identification of initiators. Who starts spreading new ideas and news?
· Identification of "bridges" between communities. Who connects departments?
· What-if analysis. What will happen if ... (key employee leaves, retires, gets sick; connection between teams breaks)?
· Recommendations: how to increase communications sustainability and knowledge sharing between departments?
Employees communicate and share. Here is how:
Who is an expert? Who do people consult with?

Who spreads the knowledge?

Who is the bridge between groups?
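Two of these questions can be sketched with basic graph measures. The example assumes the communication data is a list of directed (asker, advisor) pairs, ranks experts by how many colleagues consult them, and treats a "bridge" as a person whose removal disconnects the network (an articulation point). The names and edges are made up, and the measures actually used in the project are not stated in the text.

```python
from collections import Counter, defaultdict


def experts(advice_edges, top=1):
    """Rank people by how many distinct colleagues ask them for advice."""
    indegree = Counter(advisor for _, advisor in set(advice_edges))
    return sorted(indegree.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


def components(nodes, adj, removed=None):
    """Count connected components of the undirected graph, skipping `removed`."""
    seen, count = set(), 0
    for start in nodes:
        if start == removed or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt != removed and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count


def bridges_between_groups(edges):
    """Return people whose removal splits the network into more parts."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    nodes = sorted(adj)
    base = components(nodes, adj)
    return [n for n in nodes if components(nodes, adj, removed=n) > base]


# Hypothetical "who asks whom" pairs (made-up data).
edges = [
    ("ann", "bob"), ("cat", "bob"), ("ann", "cat"),
    ("dan", "bob"), ("dan", "eve"), ("eve", "fay"), ("fay", "dan"),
]
print(experts(edges))
print(bridges_between_groups(edges))
```

The same `components` helper supports a simple what-if analysis: removing a person and recounting components shows whether the network stays connected when that employee leaves.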

8. Large-scale databases: Adaptive semantic layer.


An adaptive semantic layer for large-scale databases makes it possible to handle a large amount of information effectively. This effect is achieved by letting the user search for information on the basis of generalized concepts or, in other words, linguistic descriptions. These concepts are formulated by the user in natural language and modelled by fuzzy sets defined on the universe of values of the characteristics of the database objects. After adjustment of the user's concepts based on search results, we have "personalized semantics" for all terms which a particular person uses for communication with a database or social networks (for example, "young person" will be different for a teenager and for an old person; "good restaurant" will be different for people with different income, age, etc.).
[Figure: structure of an adaptive semantic layer]
Based on theoretical results (section 1), we can develop an optimal layer which allows the user to:
· define user's concepts;
· search for information by these concepts;
· adjust user's concepts based on search results (GA-based tuning of membership functions and logic).
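The idea of personalized fuzzy concepts can be illustrated with a small sketch. It assumes trapezoidal membership functions and two hypothetical users whose notion of "young" differs. All parameter values, names and the 0.5 threshold are made up, and the GA-based tuning step is not shown.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical per-user definitions of the concept "young" over age in years.
USER_CONCEPTS = {
    "teenager": {"young": (0, 0, 16, 22)},
    "senior": {"young": (0, 0, 40, 55)},
}


def search(objects, user, concept, threshold=0.5):
    """Return (name, degree) for objects matching the user's own concept."""
    params = USER_CONCEPTS[user][concept]
    scored = [(name, trapezoid(value, *params)) for name, value in objects]
    hits = [(name, degree) for name, degree in scored if degree >= threshold]
    # Best match first; ties are broken alphabetically.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical database objects: (name, age).
people = [("Ann", 15), ("Bob", 19), ("Cid", 45)]
print(search(people, "teenager", "young"))
print(search(people, "senior", "young"))
```

The same query, "young person", returns different result sets for the two users, which is the "personalized semantics" effect described above.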

References:
Lyapin B., Ryjov A. A Fuzzy Linguistic Interface for Data Bases in Nuclear Safety Problems. In: Fuzzy Logic and Intelligent Technologies in Nuclear Science. Proceedings of the 1st International FLINS Workshop, Mol, Belgium, September 14-16, 1994. Ed. by Da Ruan, Pierre D'hondt, Paul Govaerts, Etienne E. Kerre. World Scientific, pp. 212-215.
Ryjov A. Personalization of Social Networks: Adaptive Semantic Layer Approach. In: Social Networks: A Framework of Computational Intelligence. Ed. by Witold Pedrycz and Shyi-Ming Chen. Springer Verlag, 2013 (will be published soon).

9. Energy: Smart Grid
In a smart grid we can generate and use a huge amount of information from smart meters and other measurement devices. I have experience in using customer data to optimize consumption and energy quality. We use data mining to extract patterns of customer behavior from large amounts of data; fuzzy logic, artificial neural networks and genetic algorithms (soft computing approach) to develop monitoring and control systems; and machine learning and adaptive systems approaches to optimize monitoring and control systems.
Mini-case: (Regional Energy Co.)
Problem definition: A local electric power substation has an outgoing feeder with a meter installed on it. The switching station has an on-load step voltage control unit with steps (0%, ±2.5%, ±5%, ±7.5%, ±10%). The task is to keep the voltage deviation at the substation buses within a given level of ±5%. Data that can be obtained from the meter: the current values of the phase voltages and currents.
Quality measure: (total time during which the voltage deviation is outside ±5% without control) / (total time during which the voltage deviation is outside ±5% with control).
Results: up to a 10-fold quality improvement on real data (in the figure: real data (top), control, results (bottom)).
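The quality measure can be made concrete with a toy simulation. The sketch below assumes equally spaced meter readings (so counting samples stands in for total time), that a tap step simply adds its percentage to the measured deviation, and that the controller sets the tap from the previous reading. These are simplifying assumptions for illustration, not the control algorithm used in the project, and the readings are made up.

```python
# Available tap steps in percent, as listed in the problem definition.
TAP_STEPS = [-10.0, -7.5, -5.0, -2.5, 0.0, 2.5, 5.0, 7.5, 10.0]
LIMIT = 5.0  # allowed voltage deviation, percent


def choose_tap(deviation):
    """Pick the step that brings the deviation closest to zero."""
    return min(TAP_STEPS, key=lambda step: (abs(deviation + step), abs(step)))


def out_of_band(deviations):
    """Number of samples whose deviation exceeds the allowed limit."""
    return sum(1 for d in deviations if abs(d) > LIMIT)


def simulate(deviations):
    """Apply to each reading the tap chosen from the previous reading."""
    controlled, tap = [], 0.0
    for d in deviations:
        controlled.append(d + tap)
        tap = choose_tap(d)
    return controlled


def quality(deviations):
    """Out-of-band time without control divided by the same time with control."""
    without = out_of_band(deviations)
    with_control = out_of_band(simulate(deviations))
    return without / with_control if with_control else float("inf")


# Hypothetical voltage deviations in percent (made-up data).
readings = [1.0, 3.0, 6.0, 7.0, 7.0, 6.0, 3.0, 1.0, 8.0, 8.0]
print(simulate(readings))
print(quality(readings))
```

A sudden jump still produces one out-of-band sample because the controller reacts one reading late; a predictive or adaptive controller would aim to reduce exactly that residue.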