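[Bilingual translation] Finance foreign-literature translation: Different choices of financial statement analysis models based on data mining techniques (excerpt)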

Length: 4,300 Chinese characters; 2,400 English words; 13,000 English characters

Source: Ishibashi K, Iwasaki T, Otomasa S, et al. Model selection for financial statement analysis: Variable selection with data mining technique [J]. Procedia Computer Science, 2016, 96(C):1681-1690.

English text:

Model selection for financial statement analysis: Variable selection with data mining technique

Ken Ishibashi, Takuya Iwasaki, Shota Otomasa and Katsutoshi Yada

Abstract

The purpose of this study is to verify the effectiveness of a data-driven approach to financial statement analysis. In the area of accounting, variable selection for constructing models that predict a firm's earnings from financial statement data has been addressed from the perspective of corporate valuation theory and related fields, but there has not been enough verification based on data mining techniques. In this paper, we attempt to verify the applicability of variable selection to the construction of an earnings prediction model by using recent data mining techniques. The analysis results suggest that a method that considers the interaction among variables and the redundancy of the model could be effective for financial statement data.

Keywords: Financial statement analysis; earnings prediction model; model selection; variable selection; data mining

1. Introduction

Recent advances in information and communication technology are dramatically improving computational speeds. Under these circumstances, researchers have undertaken studies focused on big data accumulated in various areas. Data mining techniques play an important role in data-driven analysis and modeling. Various data mining methods have been developed so far, and software such as SPSS and Weka has been developed to make them easy to use. However,
for these applications, we generally need to select a method appropriate to the data.

The purpose of this study is to verify the effectiveness of a data-driven approach to financial statement analysis. In the area of accounting, Ou and Penman (1989) [1] addressed the construction of an earnings prediction model based on financial statement data. They constructed a model that predicts the probability of a firm's earnings increasing in the subsequent fiscal year by using stepwise logistic regression analysis. Because of this variable selection, their prediction model used interactions among variables that have not been justified theoretically. That is, it is possible that they constructed an earnings prediction model using unusual information that other people do not have.

The results of Ou and Penman (1989) [1] raise various problems for the practical use of their method.

In that study [1], they did not state why they applied logistic regression analysis to the model construction. Furthermore, follow-up studies [2, 3] pointed out various problems through additional verification of the model of Ou and Penman (1989) [1]. For example, Holthausen and Larcker (1992) [2] applied the strategy of Ou and Penman (1989) [1] to another fiscal period, but could not obtain anomalies of the probability of [...]

Relief is an instance-based attribute ranking scheme proposed by Kira and Rendell (1992) [6] and later improved by Kononenko (1994) [10]. This method estimates a variable's importance for classification.

In the classification of a certain class, Relief decides a variable's importance by focusing on instances located around the border of the class. From these instances, two are selected: the near-miss and the near-hit. The near-miss is the instance that is closest to a randomly selected sample but does not belong to the same class. The near-hit, on the other hand, is the instance that is closest to the sample and belongs to the same class. In Relief, the importance of a variable is decided based on its effectiveness for the classification of the near-miss. Existing research [5] showed that this method has a large tolerance to noise but low redundancy.

When Relief is applied to variable selection, the variables to adopt are generally decided by setting a threshold on their estimated ranks. In this study, the importance of variables is evaluated by 10-fold cross-validation, and we adopt variables for which the "Merit" criterion for the classification is greater than 0.

2.3. Correlation-based feature selection

CFS is a method that evaluates subsets of variables, not individual variables [7]. This method
searches for subsets containing variables that are highly correlated with the class and have low inter-correlation with each other. CFS tends to be computationally cheap and to choose small subsets of variables, but it has difficulty finding solutions if there are strong variable interactions [5].

In this study, we use a greedy algorithm to search for the subset with the best CFS evaluation.

2.4. Consistency-based subset evaluation

CNS evaluates
subsets of variables by using class consistency [8]. This method searches for combinations of variables that divide the data into subsets, each containing a strong single-class majority. Thus, this search tends to be biased in favor of small variable subsets with high class consistency. Compared with CFS, CNS is useful if there are strong variable interactions, but the size of the subset tends to be large [5].

In this study, CNS searches for subsets by using a greedy algorithm, as in CFS.

2.5. C4.5 decision tree learner

C4.5 is a learning algorithm that constructs a decision tree by selecting variables that maximize the mutual information for classification [9]. This method can avoid over-training on the data through a function called "branch pruning", which removes branches that carry little mutual information or classify few instances. In variable selection, the variables contained
in the decision tree are adopted as a subset of variables.

In this study, a decision tree is constructed by using all training data for modeling, and then branches that classify fewer than 50 instances are removed by pruning. In this way, we obtain a subset with a size equivalent to that of the CFS subsets.

2.6. Stepwise method

In existing research, Ou and Penman (1989) [1] constructed an earnings prediction model by using stepwise logistic regression. The stepwise method is a conventional method that sequentially chooses variables to improve an evaluation criterion. In this method, the process of variable selection is very clear. However, because the [...]
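
The sequential choice of variables in the stepwise method can be illustrated with a minimal forward-selection sketch. It is not the procedure used in the paper: the excerpt does not state the evaluation criterion, so the log-likelihood gain threshold `min_gain`, the plain gradient-ascent fit, the function names, and the toy variables `x0` and `x1` are all assumptions made for illustration.

```python
import math

def fit_logistic(rows, labels, cols, lr=0.5, iters=2000):
    """Fit intercept + weights for the columns in `cols` by batch
    gradient ascent and return the training log-likelihood."""
    n = len(rows)
    b = 0.0
    w = [0.0] * len(cols)
    for _ in range(iters):
        grad_b = 0.0
        grad_w = [0.0] * len(cols)
        for x, y in zip(rows, labels):
            z = b + sum(wj * x[c] for wj, c in zip(w, cols))
            p = 1.0 / (1.0 + math.exp(-z))
            grad_b += y - p
            for j, c in enumerate(cols):
                grad_w[j] += (y - p) * x[c]
        b += lr * grad_b / n
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
    ll = 0.0
    for x, y in zip(rows, labels):
        z = b + sum(wj * x[c] for wj, c in zip(w, cols))
        p = 1.0 / (1.0 + math.exp(-z))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

def forward_stepwise(rows, labels, names, min_gain=0.01):
    """Add, one at a time, the variable that most improves the
    log-likelihood; stop when the best gain falls below `min_gain`
    (an assumed stopping rule, not taken from the paper)."""
    selected = []
    remaining = list(range(len(names)))
    best_ll = fit_logistic(rows, labels, [])  # intercept-only model
    while remaining:
        scored = [(fit_logistic(rows, labels, selected + [c]), c)
                  for c in remaining]
        cand_ll, cand = max(scored)
        if cand_ll - best_ll < min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best_ll = cand_ll
    return [names[c] for c in selected], best_ll

# Hypothetical toy data: x0 shifts the chance of an earnings increase
# (1/3 vs 2/3), while x1 is balanced within every (x0, label) group
# and therefore carries no information.
rows = [(0, 0), (0, 0), (0, 1), (0, 1), (0, 0), (0, 1),
        (1, 0), (1, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
labels = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1]

selected, ll = forward_stepwise(rows, labels, ["x0", "x1"])
print(selected, round(ll, 4))  # only x0 is selected on this data
```

On this toy data the first step adds `x0`, and the second step stops because `x1` leaves the log-likelihood essentially unchanged. Every decision is a visible comparison of one criterion, which is the transparency of the selection process that the text attributes to the stepwise method.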
