process

Introduction

Provides common factor-processing operations, such as outlier treatment (winsorization) and neutralization.

standardize

  • jaqs_fxdayu.research.signaldigger.process.standardize(factor_df, index_member=None)

Brief description:

  • Cross-sectional z-score standardization

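Parameters:

Field | Required | Type | Description
factor_df | Yes | pandas.DataFrame | 2-D factor table with dates as the index and securities as the columns
index_member | No | pandas.DataFrame of bool | Index-membership flags. A 2-D bool table with dates as the index and securities as the columns; True means the security is an index constituent on that date. If given, only the securities that are index constituents on each date are included in the cross-section used for standardization. Defaults to None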

Returns:

The standardized factor

Example:

import warnings
warnings.filterwarnings('ignore')
from jaqs_fxdayu.data import DataView
from jaqs_fxdayu.research.signaldigger.process import standardize

# Load a DataView dataset previously saved to ./data
dv = DataView()
dataview_folder = './data'
dv.load_dataview(dataview_folder)

# Cross-sectional z-score standardization of PE, restricted to index constituents
standardize(factor_df=dv.get_ts("pe"), index_member=dv.get_ts("index_member")).head()
Dataview loaded successfully.
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000027.SZ 000039.SZ 000060.SZ 000061.SZ 000063.SZ 000069.SZ ... 601988.SH 601989.SH 601992.SH 601997.SH 601998.SH 603000.SH 603160.SH 603858.SH 603885.SH 603993.SH
trade_date
20170502 -0.363380 -0.340032 -0.106714 0.152518 -0.266414 0.216918 0.086421 0.857408 -0.411592 -0.343106 ... -0.366394 0.891601 NaN NaN -0.361782 0.677455 NaN NaN -0.248940 0.131240
20170503 -0.364271 -0.341856 -0.107757 0.151190 -0.268283 0.219121 0.083804 0.852450 -0.412694 -0.344699 ... -0.367529 0.879934 NaN NaN -0.363002 0.697502 NaN NaN -0.248411 0.128307
20170504 -0.364991 -0.340861 -0.107070 0.154148 -0.267100 0.213994 0.078180 0.849831 -0.412865 -0.344161 ... -0.367343 0.871015 NaN NaN -0.363119 0.674523 NaN NaN -0.248024 0.118993
20170505 -0.364277 -0.339788 -0.116436 0.142003 -0.266276 0.199128 0.080549 0.857999 -0.412033 -0.343666 ... -0.365914 0.858166 NaN NaN -0.362034 0.659895 NaN NaN -0.243558 0.114178
20170508 -0.360932 -0.337663 -0.121213 0.133428 -0.265375 0.197282 0.087274 0.871560 -0.408468 -0.340375 ... -0.361849 0.824399 NaN NaN -0.358094 0.662941 NaN NaN -0.242522 0.121454

5 rows × 330 columns
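To make the computation concrete, here is a minimal pandas sketch of a per-date (row-wise) z-score with optional index-member masking. It is not the library's code: the function name `zscore_cross_section` is hypothetical, and both the use of the sample standard deviation (pandas' default `ddof=1`) and the choice to return NaN for non-members are assumptions.

```python
import numpy as np
import pandas as pd


def zscore_cross_section(factor_df, index_member=None):
    """Hypothetical sketch: z-score each date's cross-section (each row)."""
    df = factor_df.astype(float)
    if index_member is not None:
        # Assumption: non-members (or missing flags) are masked to NaN, so they
        # neither enter the mean/std nor receive a value in the result.
        mask = index_member.reindex(index=df.index, columns=df.columns).eq(True)
        df = df.where(mask)
    mean = df.mean(axis=1)  # per-date mean, NaN skipped
    std = df.std(axis=1)    # per-date sample std (ddof=1, an assumption)
    return df.sub(mean, axis=0).div(std, axis=0)


# Toy usage: three securities, two dates; "C" leaves the index on the second date
dates = ["20170502", "20170503"]
factor = pd.DataFrame({"A": [1.0, 2.0], "B": [2.0, 4.0], "C": [3.0, 6.0]}, index=dates)
members = pd.DataFrame({"A": [True, True], "B": [True, True], "C": [True, False]}, index=dates)
print(zscore_cross_section(factor, members))
```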

winsorize

  • jaqs_fxdayu.research.signaldigger.process.winsorize(factor_df, alpha=0.05, index_member=None)

Brief description:

  • Cross-sectional outlier treatment (quantile-based winsorization)

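Parameters:

Field | Required | Type | Description
factor_df | Yes | pandas.DataFrame | 2-D factor table with dates as the index and securities as the columns
alpha | No | float | Tail threshold for outlier treatment: e.g. 0.05 removes the extreme values in the outer 2.5% on each side, keeping the central 95% of the distribution. Defaults to 0.05
index_member | No | pandas.DataFrame of bool | Index-membership flags. A 2-D bool table with dates as the index and securities as the columns; True means the security is an index constituent on that date. If given, only the securities that are index constituents on each date are included in the cross-section used for outlier treatment. Defaults to None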

Returns:

The factor after outlier treatment

Example:

from jaqs_fxdayu.research.signaldigger.process import winsorize

# Winsorize PE within each date's cross-section of index constituents (alpha=0.05)
winsorize(factor_df=dv.get_ts("pe"),
          alpha=0.05,
          index_member=dv.get_ts("index_member")).head()
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000027.SZ 000039.SZ 000060.SZ 000061.SZ 000063.SZ 000069.SZ ... 601988.SH 601989.SH 601992.SH 601997.SH 601998.SH 603000.SH 603160.SH 603858.SH 603885.SH 603993.SH
trade_date
20170502 6.7925 10.0821 42.9544 79.4778 20.4542 88.5511 70.1653 178.7903 0.0 9.6490 ... 6.3679 183.6078 NaN NaN 7.0177 153.4365 NaN NaN 22.9161 76.4800
20170503 6.7697 9.9035 42.6314 78.8332 20.1893 88.3302 69.4123 176.8719 0.0 9.5060 ... 6.3143 180.7143 NaN NaN 6.9472 155.2097 NaN NaN 22.9674 75.6340
20170504 6.6405 9.9876 42.4161 78.6490 20.2187 86.9501 68.1117 175.1454 0.0 9.5298 ... 6.3143 178.0838 NaN NaN 6.9002 150.8288 NaN NaN 22.8647 73.7727
20170505 6.5570 9.9193 40.5860 76.0703 20.0127 83.9137 67.6325 174.3781 0.0 9.3869 ... 6.3322 174.4011 NaN NaN 6.8649 147.1780 NaN NaN 23.1319 72.2499
20170508 6.5114 9.6988 39.3479 74.2284 19.6007 82.9752 67.9063 175.3372 0.0 9.3273 ... 6.3858 168.8771 NaN NaN 6.9002 146.7608 NaN NaN 22.7311 72.5883

5 rows × 330 columns
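One way to read the `alpha` parameter is as two per-date quantile bounds, `alpha/2` and `1 - alpha/2`. The pandas sketch below clips values outside those bounds to the bounds themselves rather than dropping them; that clipping choice, pandas' default linear quantile interpolation, and the hypothetical name `winsorize_cross_section` are assumptions, not taken from the library source.

```python
import numpy as np
import pandas as pd


def winsorize_cross_section(factor_df, alpha=0.05, index_member=None):
    """Hypothetical sketch: clip each date's cross-section to quantile bounds."""
    df = factor_df.astype(float)
    if index_member is not None:
        # Non-members (or missing flags) become NaN and are ignored below
        df = df.where(index_member.reindex(index=df.index, columns=df.columns).eq(True))
    lower = df.quantile(alpha / 2, axis=1)      # per-date lower bound
    upper = df.quantile(1 - alpha / 2, axis=1)  # per-date upper bound
    # axis=0 aligns the per-date bounds with the row index; NaN stays NaN
    return df.clip(lower=lower, upper=upper, axis=0)


# Toy usage: one date, ten ordinary values and one extreme outlier
row = pd.DataFrame([list(range(10)) + [1000]], index=["20170502"])
print(winsorize_cross_section(row, alpha=0.2))
```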

mad

  • jaqs_fxdayu.research.signaldigger.process.mad(factor_df, index_member=None)

Brief description:

  • Cross-sectional outlier treatment based on the median absolute deviation (MAD)

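Parameters:

Field | Required | Type | Description
factor_df | Yes | pandas.DataFrame | 2-D factor table with dates as the index and securities as the columns
index_member | No | pandas.DataFrame of bool | Index-membership flags. A 2-D bool table with dates as the index and securities as the columns; True means the security is an index constituent on that date. If given, only the securities that are index constituents on each date are included in the cross-section used for outlier treatment. Defaults to None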

Returns:

The factor after outlier treatment

Example:

from jaqs_fxdayu.research.signaldigger.process import mad

# MAD-based outlier treatment of PE among index constituents
mad(factor_df=dv.get_ts("pe"),
    index_member=dv.get_ts("index_member")).head()
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000027.SZ 000039.SZ 000060.SZ 000061.SZ 000063.SZ 000069.SZ ... 601988.SH 601989.SH 601992.SH 601997.SH 601998.SH 603000.SH 603160.SH 603858.SH 603885.SH 603993.SH
trade_date
20170502 6.7925 10.0821 42.9544 79.4778 20.4542 88.5511 70.1653 91.92400 0.0 9.6490 ... 6.3679 91.92400 NaN NaN 7.0177 91.92400 NaN NaN 22.9161 76.4800
20170503 6.7697 9.9035 42.6314 78.8332 20.1893 88.3302 69.4123 91.87230 0.0 9.5060 ... 6.3143 91.87230 NaN NaN 6.9472 91.87230 NaN NaN 22.9674 75.6340
20170504 6.6405 9.9876 42.4161 78.6490 20.2187 86.9501 68.1117 92.15105 0.0 9.5298 ... 6.3143 92.15105 NaN NaN 6.9002 92.15105 NaN NaN 22.8647 73.7727
20170505 6.5570 9.9193 40.5860 76.0703 20.0127 83.9137 67.6325 86.81125 0.0 9.3869 ... 6.3322 86.81125 NaN NaN 6.8649 86.81125 NaN NaN 23.1319 72.2499
20170508 6.5114 9.6988 39.3479 74.2284 19.6007 82.9752 67.9063 86.30405 0.0 9.3273 ... 6.3858 86.30405 NaN NaN 6.9002 86.30405 NaN NaN 22.7311 72.5883

5 rows × 330 columns
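In the output above, several large PE values on the same date collapse to one common upper bound (for instance 91.92400 on 20170502), which is what clipping at a median-based bound produces. A minimal pandas sketch of MAD clipping follows; the width multiplier `n=3` (applied to the raw, unscaled MAD) and the name `mad_clip_cross_section` are hypothetical choices, not the library's documented parameters.

```python
import numpy as np
import pandas as pd


def mad_clip_cross_section(factor_df, n=3.0, index_member=None):
    """Hypothetical sketch: clip each date to median +/- n * MAD."""
    df = factor_df.astype(float)
    if index_member is not None:
        # Non-members (or missing flags) become NaN and are ignored below
        df = df.where(index_member.reindex(index=df.index, columns=df.columns).eq(True))
    median = df.median(axis=1)                         # per-date median
    mad = df.sub(median, axis=0).abs().median(axis=1)  # per-date MAD
    lower = median - n * mad
    upper = median + n * mad
    return df.clip(lower=lower, upper=upper, axis=0)


# Toy usage: median 3, MAD 1, so with n=3 the bounds are [0, 6]
row = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 100.0]], index=["20170502"])
print(mad_clip_cross_section(row))
```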

rank_standardize

  • jaqs_fxdayu.research.signaldigger.process.rank_standardize(factor_df, index_member=None)

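Brief description:

  • Rank standardization. Converts the factor into its cross-sectional rank (ascending) and rescales it to the range 0 to 1, so that only the ordering of the original factor is kept and its distributional shape is discarded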

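Parameters:

Field | Required | Type | Description
factor_df | Yes | pandas.DataFrame | 2-D factor table with dates as the index and securities as the columns
index_member | No | pandas.DataFrame of bool | Index-membership flags. A 2-D bool table with dates as the index and securities as the columns; True means the security is an index constituent on that date. If given, only the securities that are index constituents on each date are included in the cross-section used for rank standardization. Defaults to None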

Returns:

The rank-standardized factor

Example:

from jaqs_fxdayu.research.signaldigger.process import rank_standardize

# Rank-standardize PE among index constituents
rank_standardize(factor_df=dv.get_ts("pe"),
                 index_member=dv.get_ts("index_member")).head()
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000027.SZ 000039.SZ 000060.SZ 000061.SZ 000063.SZ 000069.SZ ... 601988.SH 601989.SH 601992.SH 601997.SH 601998.SH 603000.SH 603160.SH 603858.SH 603885.SH 603993.SH
trade_date
20170502 0.063545 0.117057 0.722408 0.886288 0.361204 0.90301 0.862876 0.966555 0.0 0.107023 ... 0.053512 0.969900 NaN NaN 0.070234 0.943144 NaN NaN 0.408027 0.876254
20170503 0.063545 0.113712 0.722408 0.886288 0.354515 0.90301 0.859532 0.966555 0.0 0.100334 ... 0.053512 0.969900 NaN NaN 0.066890 0.939799 NaN NaN 0.408027 0.872910
20170504 0.063545 0.113712 0.725753 0.886288 0.357860 0.90301 0.852843 0.963211 0.0 0.100334 ... 0.053512 0.969900 NaN NaN 0.066890 0.939799 NaN NaN 0.408027 0.872910
20170505 0.063545 0.113712 0.712375 0.882943 0.351171 0.90301 0.859532 0.963211 0.0 0.100334 ... 0.053512 0.966555 NaN NaN 0.070234 0.943144 NaN NaN 0.424749 0.872910
20170508 0.060201 0.103679 0.719064 0.882943 0.331104 0.90301 0.862876 0.969900 0.0 0.096990 ... 0.053512 0.963211 NaN NaN 0.070234 0.946488 NaN NaN 0.421405 0.879599

5 rows × 330 columns
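The 0-to-1 scaling can be reproduced as `(rank - 1) / (count - 1)` per date, which maps the smallest value to 0 and the largest to 1. This formula is an inference rather than documented behaviour, though it is consistent with the output above (0.063545 is 19/299, i.e. a 300-stock cross-section). The sketch also assumes ties take their minimum rank (`method="min"`); the name `rank_standardize_cross_section` is hypothetical.

```python
import numpy as np
import pandas as pd


def rank_standardize_cross_section(factor_df, index_member=None):
    """Hypothetical sketch: per-date ascending rank rescaled to [0, 1]."""
    df = factor_df.astype(float)
    if index_member is not None:
        # Non-members (or missing flags) become NaN and are not ranked
        df = df.where(index_member.reindex(index=df.index, columns=df.columns).eq(True))
    ranks = df.rank(axis=1, method="min")  # 1 = smallest; NaN stays NaN
    count = df.count(axis=1)               # non-NaN securities per date
    return (ranks - 1).div(count - 1, axis=0)


row = pd.DataFrame([[10.0, 30.0, 20.0, np.nan]],
                   columns=["A", "B", "C", "D"], index=["20170502"])
print(rank_standardize_cross_section(row))
```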

get_disturbed_factor

  • jaqs_fxdayu.research.signaldigger.process.get_disturbed_factor(factor_df)

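Brief description:

  • Adds a tiny perturbation to the factor values so that tied values can be separated when splitting into quantile groups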

Parameters:

Field | Required | Type | Description
factor_df | Yes | pandas.DataFrame | 2-D factor table with dates as the index and securities as the columns

Returns:

The factor with the perturbation added

Example:

from jaqs_fxdayu.research.signaldigger.process import get_disturbed_factor

# Add a tiny perturbation to PE to break ties before quantile grouping
get_disturbed_factor(factor_df=dv.get_ts("pe")).head()
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000027.SZ 000039.SZ 000060.SZ 000061.SZ 000063.SZ 000069.SZ ... 601988.SH 601989.SH 601992.SH 601997.SH 601998.SH 603000.SH 603160.SH 603858.SH 603885.SH 603993.SH
trade_date
20170502 6.7925 10.0821 42.9544 79.4778 20.4542 88.5511 70.1653 178.7903 3.437688e-10 9.6490 ... 6.3679 183.6078 32.7886 9.8565 7.0177 153.4365 50.8349 31.0157 22.9161 76.4800
20170503 6.7697 9.9035 42.6314 78.8332 20.1893 88.3302 69.4123 176.8719 4.412786e-10 9.5060 ... 6.3143 180.7143 30.2450 9.8817 6.9472 155.2097 50.7259 31.0311 22.9674 75.6340
20170504 6.6405 9.9876 42.4161 78.6490 20.2187 86.9501 68.1117 175.1454 4.559244e-10 9.5298 ... 6.3143 178.0838 31.4771 9.8188 6.9002 150.8288 50.3727 30.6805 22.8647 73.7727
20170505 6.5570 9.9193 40.5860 76.0703 20.0127 83.9137 67.6325 174.3781 6.587699e-10 9.3869 ... 6.3322 174.4011 30.8809 9.5609 6.8649 147.1780 49.3963 30.2527 23.1319 72.2499
20170508 6.5114 9.6988 39.3479 74.2284 19.6007 82.9752 67.9063 175.3372 6.254412e-10 9.3273 ... 6.3858 168.8771 27.9399 9.3282 6.9002 146.7608 50.3779 29.5167 22.7311 72.5883

5 rows × 330 columns
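Why the perturbation matters: when many securities share the same factor value (in the output above, 000063.SZ becomes about 3.4e-10, suggesting a raw value of 0), quantile edges can coincide and `pandas.qcut` refuses to build the groups. The sketch below adds uniform noise in `[0, 1e-9)`; the noise distribution, its scale and the helper name `add_tiny_noise` are assumptions chosen to match the magnitude seen above, not the library's documented behaviour.

```python
import numpy as np
import pandas as pd


def add_tiny_noise(factor_df, scale=1e-9, seed=None):
    """Hypothetical sketch: add uniform noise in [0, scale) to every cell."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(0.0, scale, size=factor_df.shape)
    return factor_df.astype(float) + noise  # NaN cells stay NaN


# Four tied zeros make the tercile edges collide
factor = pd.DataFrame([[0.0, 0.0, 0.0, 0.0, 1.0, 2.0]], index=["20170502"])
try:
    pd.qcut(factor.iloc[0], 3, labels=False)
except ValueError as exc:
    print("without noise:", exc)

noisy = add_tiny_noise(factor, seed=0)
print("with noise:", pd.qcut(noisy.iloc[0], 3, labels=False).tolist())
```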

neutralize

  • jaqs_fxdayu.research.signaldigger.process.neutralize(factor_df, group, float_mv=None, index_member=None)

Brief description:

  • Neutralizes the factor with respect to industry and market capitalization

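Parameters:

Field | Required | Type | Description
factor_df | Yes | pandas.DataFrame | The factor. A 2-D table with dates as the index and securities as the columns
group | Yes | pandas.DataFrame | Industry classification (any other grouping also works). A 2-D table with dates as the index and securities as the columns, giving the group each security belongs to in each period
float_mv | No | pandas.DataFrame | Free-float market value. A 2-D table with dates as the index and securities as the columns. Defaults to None; when None, no market-cap neutralization is performed
index_member | No | pandas.DataFrame of bool | Index-membership flags. A 2-D bool table with dates as the index and securities as the columns; True means the security is an index constituent on that date. If given, only the securities that are index constituents on each date are included in the cross-section used for industry and market-cap neutralization. Defaults to None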

Returns:

The industry- and market-cap-neutralized factor

Example:

from jaqs_fxdayu.research.signaldigger.process import neutralize

# Industry neutralization only: float_mv is omitted, so no market-cap adjustment
neutralize(factor_df=dv.get_ts("pe"),
           group=dv.get_ts("sw1")).head()
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000027.SZ 000039.SZ 000060.SZ 000061.SZ 000063.SZ 000069.SZ ... 601988.SH 601989.SH 601992.SH 601997.SH 601998.SH 603000.SH 603160.SH 603858.SH 603885.SH 603993.SH
trade_date
20170502 -2.662629 -7.782230 -27.98201 -38.33350 -4.083109 17.61469 -83.013425 107.385838 -168.217857 -8.215330 ... -3.087229 26.912500 9.87725 0.401371 -2.437429 108.584833 3.357346 -55.150405 -6.266428 -76.698725
20170503 -2.682662 -7.829960 -28.76077 -39.50720 -3.544909 16.93803 -83.589442 105.819463 -168.313357 -8.227460 ... -3.138062 24.588600 8.60545 0.429338 -2.505162 110.440158 3.330400 -55.949523 -6.084489 -77.367742
20170504 -2.815043 -7.733890 -28.67189 -38.74790 -4.016945 15.86211 -82.429800 104.271487 -168.140586 -8.191690 ... -3.141243 26.910367 9.39235 0.363257 -2.555343 106.489883 3.025662 -55.572859 -6.116911 -76.768800
20170505 -2.762233 -7.653145 -28.89854 -37.39795 -3.835882 14.42916 -82.012883 103.912075 -167.959957 -8.185545 ... -2.987033 27.853900 9.18745 0.241667 -2.454333 103.302950 2.387592 -55.251945 -5.618411 -77.395483
20170508 -2.564538 -7.591140 -29.02696 -36.10530 -3.576855 14.60034 -80.613975 104.639975 -167.807429 -7.962640 ... -2.690138 30.356133 7.68590 0.252262 -2.175738 102.756587 4.286408 -54.397259 -5.803628 -75.931975

5 rows × 330 columns
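Neutralization is commonly implemented as a per-date cross-sectional regression of the factor on industry dummies (plus a size term), keeping the residual. The sketch below follows that approach with NumPy least squares; whether the library regresses on raw or log market value, and the name `neutralize_cross_section`, are assumptions. With `float_mv=None` the residual reduces to each security's deviation from its industry mean.

```python
import numpy as np
import pandas as pd


def neutralize_cross_section(factor_df, group, float_mv=None, index_member=None):
    """Hypothetical sketch: per-date OLS residual of the factor on group
    dummies and, optionally, log market value."""
    df = factor_df.astype(float)
    if index_member is not None:
        df = df.where(index_member.reindex(index=df.index, columns=df.columns).eq(True))
    out = pd.DataFrame(np.nan, index=df.index, columns=df.columns)
    for date in df.index:
        y = df.loc[date]
        g = group.loc[date].reindex(df.columns)
        valid = y.notna() & g.notna()
        if float_mv is not None:
            # Assumption: size enters the regression as log market value
            log_mv = np.log(float_mv.loc[date].reindex(df.columns).astype(float))
            valid &= np.isfinite(log_mv)
        if not valid.any():
            continue
        # One 0/1 column per group acts as a group-specific intercept
        x = pd.get_dummies(g[valid]).astype(float)
        if float_mv is not None:
            x["log_mv"] = log_mv[valid]
        beta, *_ = np.linalg.lstsq(x.to_numpy(), y[valid].to_numpy(), rcond=None)
        out.loc[date, valid[valid].index] = y[valid].to_numpy() - x.to_numpy() @ beta
    return out


# Toy usage: industry means are 2 ("bank") and 15 ("tech")
dates = ["20170502"]
factor = pd.DataFrame({"A": [1.0], "B": [3.0], "C": [10.0], "D": [20.0]}, index=dates)
industry = pd.DataFrame({"A": ["bank"], "B": ["bank"], "C": ["tech"], "D": ["tech"]}, index=dates)
print(neutralize_cross_section(factor, industry))
```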