r - strucchange not reporting breakdates -


this first time strucchange bear me. problem i'm having seems strucchange doesn't recognize time series correctly can't figure out why , haven't found answer on boards deals this. here's reproducible example:

require(strucchange) # time series nmreprosuccess <- c(0,0.50,na,0.,na,0.5,na,0.50,0.375,0.53,0.846,0.44,1.0,0.285,                      0.75,1,0.4,0.916,1,0.769,0.357) dat.ts <- ts(nmreprosuccess, frequency=1, start=c(1996,1)) str(dat.ts) 

time-series [1:21] 1996 2016: 0 0.5 na 0 na 0.5 na 0.5 0.375 0.53 ...

to me means time series looks ok work with.

# obtain breakpoints bp.nmsuccess <- breakpoints(dat.ts~1) summary(bp.nmsuccess) 

which gives:

optimal (m+1)-segment partition:    call:  breakpoints.formula(formula = dat.ts ~ 1)   breakpoints @ observation number:   m = 1     6                m = 2   3   7              m = 3   3           14 16  m = 4   3   7       14 16  m = 5   3   7 10    14 16  m = 6   3   7 10 12 14 16  m = 7   3 5 7 10 12 14 16   corresponding breakdates:   m = 1                     0.333333333333333                                                        m = 2   0.166666666666667                   0.388888888888889                                      m = 3   0.166666666666667                                                                          m = 4   0.166666666666667                   0.388888888888889                                      m = 5   0.166666666666667                   0.388888888888889 0.555555555555556                    m = 6   0.166666666666667                   0.388888888888889 0.555555555555556 0.666666666666667  m = 7   0.166666666666667 0.277777777777778 0.388888888888889 0.555555555555556 0.666666666666667   m = 1                                        m = 2                                        m = 3   0.777777777777778 0.888888888888889  m = 4   0.777777777777778 0.888888888888889  m = 5   0.777777777777778 0.888888888888889  m = 6   0.777777777777778 0.888888888888889  m = 7   0.777777777777778 0.888888888888889   fit:   m   0       1       2       3       4       5       6       7        rss  1.6986  1.1253  0.9733  0.8984  0.7984  0.7581  0.7248  0.7226  bic 14.3728 12.7421 15.9099 20.2490 23.9062 28.7555 33.7276 39.4522 

here's start having problem. instead of reporting actual breakdates reports numbers makes impossible plot break lines onto graph because they're not @ breakdate (2002) @ 0.333.

plot.ts(dat.ts, main="natural mating") lines(fitted(bp.nmsuccess, breaks = 1), col = 4, lwd = 1.5) 

nothing shows me in graph (i think because it's small scale of graph).

in addition, when try fixes may possibly work around problem,

fm1 <- lm(dat.ts ~ breakfactor(bp.nmsuccess, breaks = 1)) 

i get:

error in model.frame.default(formula = dat.ts ~ breakfactor(bp.nmsuccess,  :    variable lengths differ (found 'breakfactor(bp.nmsuccess, breaks = 1)') 

i errors because of na values in data length of dat.ts 21 , length of breakfactor(bp.nmsuccess, breaks = 1) 18 (missing 3 nas).

any suggestions?

the problem occurs because breakpoints() can (a) cope nas omitting them, , (b) cope times/date through ts class. creates conflict because when omit internal nas ts loses ts property , hence breakpoints() cannot infer correct times.

the "obvious" way around use time series class can cope this, namely zoo. however, never got round integrate zoo support breakpoints() because break of current behavior.

to cut long story short: best choice @ moment book-keeping times , not expect breakpoints() you. additional work not huge. first, create time series response , time vector , omit nas:

d <- na.omit(data.frame(success = nmreprosuccess, time = 1996:2016)) d ##    success time ## 1    0.000 1996 ## 2    0.500 1997 ## 4    0.000 1999 ## 6    0.500 2001 ## 8    0.500 2003 ## 9    0.375 2004 ## 10   0.530 2005 ## 11   0.846 2006 ## 12   0.440 2007 ## 13   1.000 2008 ## 14   0.285 2009 ## 15   0.750 2010 ## 16   1.000 2011 ## 17   0.400 2012 ## 18   0.916 2013 ## 19   1.000 2014 ## 20   0.769 2015 ## 21   0.357 2016 

then can estimate breakpoint(s) , afterwards transform "number" of observations time scale. note i'm setting minimal segment size h explicitly here because default of 15% small short series. 4 still small possibly enough estimating of constant mean.

bp <- breakpoints(success ~ 1, data = d, h = 4) bp ##   optimal 2-segment partition:  ##  ## call: ## breakpoints.formula(formula = success ~ 1, h = 4, data = d) ##  ## breakpoints @ observation number: ## 6  ##  ## corresponding breakdates: ## 0.3333333  

we ignore break "date" @ 1/3 of observations map original time scale:

d$time[bp$breakpoints] ## [1] 2004 

to re-estimate model nicely formatted factor levels, do:

lab <- c(   paste(d$time[c(1, bp$breakpoints)], collapse = "-"),   paste(d$time[c(bp$breakpoints + 1, nrow(d))], collapse = "-") ) d$seg <- breakfactor(bp, labels = lab) lm(success ~ 0 + seg, data = d) ## call: ## lm(formula = success ~ 0 + seg, data = d) ##  ## coefficients: ## seg1996-2004  seg2005-2016   ##       0.3125        0.6911   

or visualization:

plot(success ~ time, data = d, type = "b") lines(fitted(bp) ~ time, data = d, col = 4, lwd = 2) abline(v = d$time[bp$breakpoints], lty = 2) 

success series breaks

one final remark: such short time series simple shift in mean needed, 1 consider conditional inference (aka permutation tests) rather asymptotic inference employed in strucchange. coin package provides maxstat_test() function purpose (= short series single shift in mean tested).

library("coin") maxstat_test(success ~ time, data = d, dist = approximate(99999)) ##  approximative generalized maximally selected statistics ##  ## data:  success time ## maxt = 2.3953, p-value = 0.09382 ## alternative hypothesis: two.sided ## sample estimates: ##   "best" cutpoint: <= 2004 

this finds same breakpoint , provides permutation test p-value. if however, 1 has more data , needs multiple breakpoints and/or further regression coefficients, strucchange needed.


Comments