version 4.8.8 (13.7 MB) by
E. Cheynet

Numerical implementation of an extended SEIR model with time-dependent death and recovery rates

A generalized SEIR model with seven states [2] is numerically implemented. The implementation is done from scratch except for the fitting, that relies on the function "lsqcurvfit". Therefore, the present implementation likely differs from the one used in ref.[2].

This Matlab implementation includes also some major differences with respect to ref. [2]. Among them is the expression of the death rate and recovery rate, which are analytical and empirical functions of the time. The idea behind this time-dependency is that the death and recovery rate should converge toward a constant value as the time increases. If the death rate is kept constant, the number of death may be overestimated. Births and natural death are not modelled here. This means that the total population, including the number of deceased cases, is kept constant. Note that ref. [2] is a preprint that is not peer-reviewed and I am not qualified enough to judge the quality of the paper.

The present submission contains:

- A function SEIQRDP.m that is used to simulate the time histories of the infectious, recovered and dead cases (among others)

- A function fit_SEIQRDP.m that estimates the eight parameters used in SEIQRDP.m in the least square sense.

- One example file Documentation.mlx, which presents the numerical implementation.

- One example file Example_province_region.mlx, which uses data collected by the Johns Hopkins University for the COVID-19 epidemy [3] for Hubei province (China).

- One example file Example_Country.mlx, which uses data collected by the Johns Hopkins University for the COVID-19 epidemy [3] for a country.

- One file "ItalianRegions.mlx" written by Matteo Secli (https://github.com/matteosecli) that I have modified for a slightly more robust fitting.

- One example file ChineseProvinces.mlx, which illustrates how the function fit_SEIQRDP.m is used in a for loop to be fitted to the data [3] from the different Chinese provinces.

- One example "uncertaintiesIssues.mlx", which illustrates the danger of fitting limited data sets.

- One example "Example_US_cities.mlx" that illustrates the fitting when "recovered" data are not available.

- One example simulateMultipleWaves,mlx that illustrates the fitting for multiple epidemic waves.

- One function getDataCOVID, which read from [3] the data collected by Johns Hopkins University.

- One function getDataCOVID_ITA written by Matteo Secli (https://github.com/matteosecli), that collects the updated data of the COVID-19 pandemic in Italy from the Italian government [4]

- One function getDataCOVID_US that collects the updated data in the USA from [3]

- One function checkRates.m that plots the fitted and computed death and recovery rates (quality check)

- One function getMultipleWaves.m that simulate and fit the SEIRQDP model to the situations where multiple epidemic waves are detected.

If you need to import data that are not included in the present submission, you can interactively import them in the Matlab workspace (https://se.mathworks.com/videos/importing-data-from-text-files-interactively-71076.html)

Any question, comment, or suggestion is welcome.

References

[2] Peng, L., Yang, W., Zhang, D., Zhuge, C., & Hong, L. (2020). Epidemic analysis of COVID-19 in China by dynamical modeling. arXiv preprint arXiv:2002.06563.

Cheynet, E. Generalized SEIR Epidemic Model (Fitting and Computation). Zenodo, 2020, doi:10.5281/ZENODO.3911854.

Created with
R2020b

Compatible with R2018b and later releases

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!Create scripts with code, output, and formatted text in a single executable document.

Felin WiltaHi Cheynet,

Understand. Thanks a lot!

E. CheynetHi Felin,

If the initially reported number of quarantined case is zero, the initial values of E, I and Q will be zero, so the simulation will give "0" at any time step. In January 2020, the number of active cases in many countries was zero. I would recommend not using data before March 2020, except for China and maybe some other countries.

Felin WiltaHi Cheynet,

When I run example_county to simulate the epidemy outbreak based on the fitted parameters, when I use longer Initial and final time (i.e from 22 January 2020 to1st February 2021) I get the values for E,I,Q,R,D to be all zeros. but when I adjust to smaller time period (i.e from January 2021 to February 2021) I get nonzero value. This leads to Comparison of the fitted and real data section in example_country graph to not have the recovered (fitted), active (fitted), deceased (fitted) graph and only have the recovered (reported), active (reported), deceased (reported) graph when values are all zeros for E,I,Q,R,D (longer time period cases). May I know what caused this and how to make the values for E,I,Q,R,D nonzero in longer time period (i.e from 22 January 2020 to1st February 2021)?

Look forward for your clarification, thank you very much!

Allyson ChongHi Cheynet,

Thank you so much for your answers!! You've certainly been a great help towards my understanding of this model! Thank you again!!

E. CheynetHi Allyson,

Sorry for the late reply. I try to answer point by point below:

1 The file Example_Country.mlx includes data from the countries, which are regularly updated. These data are available on Github here: https://github.com/CSSEGISandData/COVID-19. The parameters used as "guess" are an initial guess necessary to carry out the least-square fit. These are given by the user as initial conditions and can influence the fitted values of the parameters. The units depend on which parameters you are looking at. Their unit needs to ensure a dimensional homogeneity of the equations. The unit time is the "day". In the file Documentation.mlx, lambda and kappa are explained in more details.

2 These are initial conditions. So they are values that are "guessed"

3 Yes, I am using a time step of 1 hour. Because the unit time is "day", I take 1/24 days as a time step. it's used mainly for a practical purpose. I am using a slightly modified version of the Runge-Kutta algorithm which relies on the assumption that the parameters kappa and lambda do not change significantly between two time-step. If you use a time step too large, the accuracy of the simulation will drop and you could even have convergence issues. However, the code should still work fairly well if you increase the time step to 12 hours, i.e. dt = 1/2 or even dt = 1, i.e. one day because kappa and lambda changes smoothly with time in the present code

4 You are free to make any change you want. There are several way to fit the model to the data. The fitting procedure I used is not the simplest approach because I fit first lambda and kappa to restrict their ranges of possible values, then apply a general fit with strict lower and upper limits for the parameters. A more straightforward (but not necessarily better) approach would be to fit lambda and kappa first, and do not allow their values to change anymore. A second fit can then be sued to retrieve the remaining parameters.

Allyson ChongHello again, Cheynet,

I managed to solve the problem before this! I have a few more questions and would really appreciate it if you can help me clear my doubts.

1. Under the file Example_Country.mlx and section 'Fitting of the generalized SEIR model to the real data', there is a part where I have to input the alpha_guess, beta_guess, etc. Are these figures obtained from the data for the country that I want to analyze or is it just an educated guess? Also, what are the units used for these parameters? Eg per day/per week/per 1000 population, etc.

2. Under the same section, there is a 'lambda_guess = [0.01,0.001,10]; % recovery rate' and 'kappa_guess = [0.001,0.001,10]; % death rate'. Can I know what do these values mean and how you obtained them?

3. Under the section 'Simulate the epidemy outbreak based on the fitted parameters', you used a time step of 1/24.

dt = 1/24; % time step May I know how you chose this value and how it affects the output?

4. Do I have to make any changes in the values in the functions fit_SEIQRDP, SEIQRDP and checkRates? I am using the global time series data for confirmed, deaths and recovered cases from the github link you shared. I downloaded the csv files and imported them instead of obtaining it directly from the source. That would be the only difference I made. Do I need to make any changes in the parameters for each country I analyze?

I'm having some trouble understanding the codes as I'm still a beginner and would be very grateful for your help!!

Allyson ChongAlright, thank you very much for your help Cheynet!

E. CheynetHi Allyson,

The message you see comes from the fact that indR is empty. Said differently, it means that the function getDataCOVID that you modified failed to read the excel file properly. I don't know exactly how you have modified the file so I cannot help much more. An alternative is that you manually import the excel file in the workspace, let Matlab create a function to read the excel file and re-use this function (after minor modifications).

Allyson ChongHi Cheynet,

I tried to alter the codes in the function 'getDataCOVID.m. Instead of using the data from the Github repository, I downloaded the data and separated them into excel files. I imported the data and altered the codes as below. I then ran this function in the Example_Country file. Then the error 'index exceeds the number of array elements' popped up for the line below

indR = indR(1);

How can I solve this?

%% Import the data

Ndays = floor(now)-datenum(2020,01,22)-1; % minus one day because the data are updated with a delay of 24 h

opts = delimitedTextImportOptions("NumVariables", Ndays+5);

opts.VariableNames = ["ProvinceState", "CountryRegion", "Lat", "Long", repmat("data",1,Ndays+1)];

opts.VariableTypes = ["string", "string", repmat("double",1,Ndays+3)];

% Specify file level properties

opts.ExtraColumnsRule = "ignore";

opts.EmptyLineRule = "read";

status = {'confirmed','deaths','recovered'};

for ii=1:numel(status)

if strcmpi(status{ii},'Confirmed')

tableConfirmed =readtable('data_confirmed.xlsx', opts);

elseif strcmpi(status{ii},'Deaths')

tableDeaths =readtable('data_deaths.xlsx', opts);

elseif strcmpi(status{ii},'Recovered')

tableRecovered =readtable('data_recovered.xlsx', opts);

else

error('Unknown status')

end

end

time = datetime(2020,01,22):days(1):datetime(datestr(floor(now)), 'Locale', 'en_US')-datenum(1);

Md NasseefHi Cheynet,

Thank you. I have installed the latest version of MATLAB R2020b and the code works fine now.

E. CheynetHi Md Nasseef,

I think the error comes from the fact that you need at least Matlab R2018 to use the optional parameter "opts = delimitedTextImportOptions". You can see in the comments by the users possibles solution to overcome this issue (see the comments around the 18 April 2020)

Md NasseefHello Cheynet,

I am getting an error to run getDataCOVID (Error in getDataCOVID (line 14) opts = delimitedTextImportOptions("NumVariables", Ndays+5)); I am using MATLAB R2017b. Can you guide me to fix this problem. Thank you.

E. CheynetThe code is updated, but there were several minor bugs to be corrected after the databases changed. Normally the examples should work properly now.

E. CheynetHi Fayeza,

Thank you for the feedback! I intend to update the code to make it easier for the user to choose the starting and ending time for the fitting. This should be done within one week

Fayeza FayezaHi Cheynetl,

I understand that the reported data begins from 22nd Jan but the code works starting from Jun. Can we do some changes to the code to start from say August ?

Xuhui YangE. CheynetHi Bruno,

the vector [a,b,c,...] groups the different guess for each parameters. That is a compact way to group the different guess values. The lower and upper bound are set inside the function fit_SEIQRDP, so the user do not need to specify them.

There are 10 parameters in total, so the fitting is not done by trying to find the ten best values simultaneously (at least not directly). A first fitting is done for the fatality (3 parameters) and recovery rates (3 parameters). The parameters are estimated using the individual measured rates. This fitting gives the narrow upper and lower boundaries for the parameters kappa and lambda. So after the first step, six of the ten parameters are almost fixed.

Bruno SilvaHi Cheynet,

Thanks for the answer. You clarified part of my doubts. Sorry if I wasn't clear on my lambda and kappa question: When I define the vector [a, b, c], what does that mean? For example: 'a' is a low limit, and 'c' is the upper limit, and 'b' is a rate? Or is the time series defined by [low limit, midpoint, upper limit]?

E. CheynetHi Bruno,

Thank a lot for the feedback!

I am not sure I understood properly your question. So I try a long answer:

lambda_guess and kappa_guess are values used to facilitate the fitting. Let say one wishes to get the minimum of a function. If the values of the first guess provided by the users are close to the absolute minimum, the algorithm will converge appropriately. If the guess values are not adequate, one may end with a local minimum. That is a common problem with least-square fitting algorithms.

If your questions are about the "physical" values of kappa and lambda, the file Documentation.mlx describes them, but I think there is a visual bug in the online display of the file Documentation.mlx because these equations do not appear.

The dispersion in checkRates is from the reported data (the "measured" values). So the dispersion depends on the data source. If you consider data from Italy, the dispersion is quite small for example. The dispersion will be larger if you take for example France or Belgium.

Did I answer appropriately your questions?

Bruno SilvaHello, Cheynet.

First of all, congratulations to your interesting work. I have a few small questions to ask: What is the meaning of choosing vectors for lambda_guess and kappa_guess?

In the checkRates test, the dispersion of the measured values must always present a dispersion similar to the example_Country? What are these measured values?

E. CheynetHi Raj,

Q is the number of active cases. In the present Matlab submission, It is called "quarantined" because it is assumed that each case that tested positive is isolated. This new "state", denoted Q can have two outcomes: recovered or deceased. Note that in the case of the US states and cities, the number of active cases was/is not available, so I am using the number of totals confirmed cases minus the number of deceased cases, instead. It's not ideal, but it's better than nothing.

Raj KishorHi Cheynet, a very small doubt. Is the Q (quarantined cases) is same as total active cases? or it is equal to total confirmed cases??

E. CheynetHi Vikas,

Oh I see! thank you, I did not noticed the typo it for the point (2). I will correct it in the next version.

Vikas SharmaHello Cheynet, great to see you consider my thoughts. As far as (2) is concerned, as I pointed out earlier, the typos exist in

Documentation.mlx illustration where you explain the Numerical solution part.

This typos is not there in any m files or implementation part. So results are not affected by this typos. However, one

should not get confused when he see the matrix A in illustration para of documentation.mlx.

Regards

E. CheynetHi Vikas,

Thank a lot for the feedback!

1) You are right. I also noted that kappa_0 and lambda_0 should have the dimension of the inverse of a time. I have updated the file documentation.mlx with appropriate definition of these constant.

2) It is strange, I still see that the matrix is 7x7. is it in the function SEIQRDP or fit_SEIQRDP?

3) You raised an important point! I have spent a couple of hours looking at E0 and I0. They, indeed, can affect the fit. Ideally, they should also be free parameters, but that makes a lot of unknown... I have updated the examples with a slightly modified version of your suggestions, which seems to work well with data from China.

Vikas SharmaHello E. Cheynet. Many congratulation for wonderful work. Some typos/suggestion need attention such as :

1. Both k1 and lamda1 should have dimension of T^(-1) or other depending upon different formula for recovery and mortality rate. It isdefinitely not having any impact on results but mathematically the exponential term should have zero dimension as a whole.

2. The matrix A should be 7by7, but it is showing 7by8 currently.

3. Current E0 and I0 is making model highly sensitive to perturb data specially for India in terms of final estimation. So setting initial guess for incubation time of 5 days (As reported in most of the literature) should correspond to following assumptions

I0 = 0.1*Q0(1); % Initial number of infectious cases 10% of reported case

E0 = 5*I0; % Initial number of exposed cases is 5 times initial infected.

Regards

E. CheynetHi Javier,

That is a good question. Both E and I are not known. So E0 and I0 are not known either. If the number of confirmed case is not zero, then it is highly unlikely that E0 or I0 is zero. It would be wrong to set them as equal to zero as it would hinder the simulation. The choice of E0 and I0 is arbitrary. I chose to use a value of E0 and I0 equal to Q0, but other choices are possible.

Javier MontoliuHello E. Cheynet! Very interesting project! I am a bit of a beginner and have a question, what crietria did you follow when choosing a value for the initial exposed and infected population (E0 and I0)? I see that Q0 is the initial active cases (Q0=Confirmed(1)-Recovered(1)-Deaths(1)), but why did you set the values E0 and I0 to the initial confirmed cases (Confirmed(1), total repoerted cases at initial time)?

Thank you!

E. CheynetTo avoid any misunderstanding: no machine learning is involved in this model, so "model training" is a non-sense here.

E. CheynetHi Tsardoz,

If you look at the function fit_SEIQRDP, you will see that the fitting algorithm does not fit all the parameters at the same time. For example, for the fatality rate, a first estimate is obtained and constrained to limit the risk to achieve local minima. For the recovery rate, if available, a similar approach is used. This greatly reduce the problem of overfitting. Could it be improved? Without a doubt. Nevertheless, the user is still responsible for assessing whether the fitting is meaningful or not. This is partly the purpose of the function "checkRate.m".

Besides, the fitting algorithm relies on a constrained range of parameters. This is the reason why lsqcurvfit is used instead of nlinfit. Finally, evaluating what are "realistic parameter values" is barely meaningful since there is limited information about which parameter is realistic, what they really mean, and what is the source of data. Example: there was a discussion in this thread that pointed out that the parameter delta used by Peng et al. was incorrectly defined as the "inverse of the quarantine time". The physical interpretation of the parameter was the problem, not its value.

Tsardoz"There will be overfitting if the fatality or recovery rate is constant with time."

Opposite is true I am afraid. The more parameters you try and estimate with limited data the more models overfit and produce poor results.

The goal should be to produce realistic parameter values not LSE on model training set predictions.

E. CheynetHi Tsardoz,

I ignore how you compute the R0 value, which means I cannot tell if you apply an appropriate method. Note that I did not (so far) include any calculation of the basic reproduction number. That is a study on its own. There will be overfitting if the fatality or recovery rate is constant with time. It is explicitly stated that the model deals with time-dependent recovery and mortality rates. The appropriate use of the fitting algorithm is the user's responsibility. In summary: garbage in, garbage out.

E. CheynetHi Md Humayun Kabir,

Check that the function 'getDataCOVID' is in your workspace. The Matlab version should not affect this issue.

Md Humayun KabirHello E. Cheynet,

My MATLAB is the 2018b version. When I run .mlx file, I am facing the following problem:

Undefined function or variable 'getDataCOVID'.

Could you please let me know what wrong am I doing?

Advance Thank You.

E. CheynetHi Lu Krul,

λ0,λ1,κ0,κ1 are empirical coefficients obtained by the least-square fitting. If possible, I calculate a first estimate using the calculated rate of deceased and recovered cases. The final estimate is obtained using the total set of equations.

As the variable "guess" denotes, this is a guess. It's arbitrarily chosen at the beginning.

I did not try to calculate the basic reproduction number. I cannot help much here.

The "tau" variable is a time delay that improves the modelling of the time-dependant recovery rate. λ0,λ1 are non-dimensional coefficients to be adjusted. λ0 is the recovery rate as time becomes large.

κ0 and κ1 are also empirical coefficients. κ0 is the initial death rate. κ1 includes implicitly the information about a time delay for the death rate. I may change him for a more explicit formula in a further version.

Lu KrulHi!

I am a beginner, and I am very interested in your project, which helps me a lot. To better understand your work, I have several questions to ask.

How did you get λ0,λ1,κ0,κ1?

How did you estimate the initial GUESS parameter?

What is the formula for basic reproduction number? Is the formula β*δ*(1-α)^T correct?

What’s the meaning of the three variables in lamda_guess? What’s the meaning of the two variables in kappa_guess?

I would appreciate it if u could give me a reply <3

Fryese39E. CheynetHi Bill,

That is a good question. If the number of recovered is unknown, I think it is a good idea to set the initial conditions with a number of recovered E0 different from zero. I did not try to see how a different value of E0 would affect the computed curves of quarantined and deceased. The number you propose may be realistic. I don't know enough on the topic to judge. I guess the choice of E0 would affect the parameter "lambda". However, the system of ODEs is coupled, so the parameter "lambda" is involved in more than one equation. That implies that E0 has only a partial (but maybe important?) role in the fitting procedure.

Bill BuddHi Etienne, Many thanks for your heroic efforts in handling the case where the number of Recovered is unknown. Your examples highlight the US but I am mainly interested in getting a forward prediction for the UK, where I live (the UK also doesn't publish numbers for Recovered).

I was wondering, for the case where the Recovered is unknown, have you tried seeding the solver with a number of Recovered equal to a simple multiple of the number of Deaths?

I tried this for the UK using Recovered=Deaths*1.5; with no other adjustments to the input arrays and the code returned very similar results to a time-delayed version of Italy, which has good reported data and very plausible curve fits.

I was surprised to see that the ratio of Recovered/Deaths for the fitted curves for my UK example, which was seeded with a scalar of 1.5, is time variant and quite similar to the one for Italy. So your solver seems to have almost magical properties.

No reply needed, but interested in any comments you may have.

Bill ChenE. CheynetHi ZAFAR, there is no .csv file to download. A function read the content of a table from a GitHub repository and store it as a csv. Then the csv is imported in a local workspace, read and deleted. There are alternatives approach that can be used though.

Yerlan AmanbekZAINULABADIN ZAFARNeed to download .csv file

Dale GroutageHi Cheynet, sorry for the posting of the script. I just thought that you could run the script for 'US' and 'Korea' and see that it works. You might be interested in how the plots of the "Active" cases are approaching a Peak ...around the 30th of April. What is interesting is that a lot of scientists are using the data and predictions of the Confirmed Daily Change to predict the Peak, which is a symmetric shaped curve, whereas the prediction curve of the "Active" chases is asymmetric with a long tail, which I believe is more realistic. Thanks for your comments!

E. CheynetHi Dale,

I do not recommend posting an entire Matlab script in the comment section. It is known that there is no "recovered data" for US cities. I have no error with the country "Japan". The data loads properly and the fitting completes successfully with the up-to-date version I uploaded. You may have used an older version that was more prone to bugs.

Dale GroutageE.Cheynet, run the code I posted for Location 'US' , 'Korea', and 'Japan' and you will see the problem. 'US' and 'Korea' work, but 'Japan' does not. When you look at the csv data files, it appears there is data for Confirmed, Deaths and Recovered, but when you use Location 'Japan' your codes does not pick up tableConfired, tableDeaths or tableRecovered.

E. CheynetHi Dale,

I am not sure I understand what you mean. If you talk about the Github page, you cannot modify the master branch, which is the one I use.

Dale GroutageE.Cheynet, note that if you run your code with my additions, the figures need the file folders to be setup. That is, if you run the US data, there must be a file sub folder labeled US in the Figs folder...Figs/US...or if your location is Korea, then there must be a file setup as ...Figs/Korea.

E. CheynetHi Samia,

I cannot reproduce the error you get (with R2019). Check if you use the latest version of the submission. If you still have the problem, you can use the manual debugging tools by Matlab

Samia SarothiHello E. Cheynet,

my MATLAB is 2019 version. I am facing the following problem:

Error using lsqcurvefit (line 262)

Function value and YDATA sizes are not equal.

Error in fit_SEIQRDP (line 90)

[Coeff,~,residual,~,~,~,jacobian] = lsqcurvefit(@(para,t) modelFun1(para,t),...

E. CheynetAnother alternative is to use the function: websave('dummy.csv',yourFileName); which will read the file you want to access on the internet and store it as a csv file. Then you can use a traditional approach to read the csv file in Matlab.

E. CheynetHi M Farooq,

Check if your Matlab version is recent enough. I think you need at least Matlab R2018b to use the function "delimitedTextImportOptions".

Alternatively, you can manually import the table you want and ask Matlab to automatically generate for you a function that you will use to the data.

M FarooqM FarooqHello E. Cheynet,

I'm trying to run

[tableConfirmed,tableDeaths,tableRecovered,time] = getDataCOVID()

and getting following error

Undefined function or variable 'delimitedTextImportOptions'.

Error in getDataCOVID (line 14)

opts = delimitedTextImportOptions("NumVariables", Ndays+5);

Could you plz let me know what wrong am I doing?

Regards

Hamdullah LivaogluE. CheynetAs far as I know, the rate "delta" has the dimension of the inverse of a time so it should not be expressed in percentage but in days^(-1).

E. CheynetThank Dionne. I have redefined the variable delta as the rate at which infectious cases enter in quarantine in the updated code.

Dionne AlemanThanks E. Cheynet for the fast response and all your efforts in creating this fantastic code implementation. Treating delta as a rate makes much more sense. Just to be explicit about delta v delta^(-1) in the papers and in your code: In the code, delta = 1/8 means that people are entering quarantine at a rate of 12.5%, while delta = 1/10 means entering at a rate of 10%. Thus, increasing the denominator would logically decrease the actual number of quarantines and increase infections.

Dale GroutageE.Cheynet...THANKS for including the state of Washington.

E. CheynetHi Dionne,

I agree with you that the definition of delta is weird. There is maybe a better definition if you look at the paper by ref. [2]. In their paper, they do not define it as the inverse of quarantine time but the rate at which people enter quarantine. This make more sense....

[2] Munz, P., Hudea, I., Imad, J., & Smith, R. J. (2009). When zombies attack!: mathematical modelling of an outbreak of zombie infection. Infectious Disease Modelling Research Progress, 4, 133-150.

Dionne AlemanDelta is defined as "inverse of average quarantine time", which I take to mean the inverse time in days an infected individual spends in quarantine. But, increasing the denominator in delta (i.e., increasing the number of days in quarantine) results in more infections. Treating delta as time in quarantine results in increased quarantine time yielding fewer infections, but the overall curves are unrealistic. I cannot tell from the code if I am misunderstanding the meaning of delta (perhaps it is rate of quarantine, though the cited Peng et al paper defines it as quarantine time) or if there is a bug. Has anyone else played with the delta parameter?

Dale GroutageE.Cdheynet...Here is what I have tried...I modified your code with a new Locations statement for Example_US_cities.mlx as follows:

fprintf(['Most recent update: ',datestr(time(end)),'\n'])

Location = ", Washington";

try

indC = find(contains(tableConfirmed.Combined_Key,Location)==1);

indD = find(contains(tableDeaths.Combined_Key,Location)==1);

catch exception

searchLoc = strfind(tableConfirmed.Combined_Key,Location);

indC = find([searchLoc{:}]==1);

searchLoc = strfind(tableConfirmed.Combined_Key,Location);

indD = find([searchLoc{:}]==1);

end

Here is my output:

Most recent update: 04-Apr-2020

Combined_Key

______________________________

"Adams, Washington, US"

"Asotin, Washington, US"

"Benton, Washington, US"

.

.

.

"Whitman, Washington, US"

"Yakima, Washington, US"

"Out of WA, Washington, US"

"Unassigned, Washington, US"

Error Messages:

Matrix index is out of range for deletion.

Error in datetime/parenAssign (line 58)

thisData(rowIndices) = [];

Error in DaleExample_Washington (line 38)

time(Confirmed<=minNum)= [];

E. Cheynet"Example_US_cities.mlx" contains one example for an entire state. A state is here defined as the sum of every city collected by the database associated with the same state name. Apparently it seems good enough if we look at the total population.

E. CheynetHi Dale,

That should be possible for individual states. However, one would need the database to exist and to be up-to-date on e.g. Github. I did not try to find one.

E. CheynetHi Dale,

Thank a lot for your message!

I am not sure I understand your request, but the present Matlab submission contains already an example with cities in the US: "Example_US_cities.mlx"

E. CheynetHi Jacobo,

The main reason is that the script removes confirmed cases with little data, which are not meaningful for the fitting. Depending on the number of confirmed cases, it is possible that the variable "Confirmed" becomes empty. It is also possible to trigger the error if one runs (manually) two times the same lines because the size of the vector "Confirmed" changes. In both cases, the user has the responsibility to check the size of the vector "Confirmed".

Jacobo CHoyHello,

I can not run canada with the example3 file. It says on line 40 : " Matrix index is out of range for deletion".

Dale GroutageHi E.Cheynet, I have been playing with your code...which is terrific! I an now trying to analyze individual states, like the state of Washington. I am not having much luck. Do you have any code like the examples you posted for multiple cities?

E. CheynetThe bad fit for some of the US cities seems to be due to the fact that I modified the recovery rate to make it constant. That was not a good idea. So I decided to come back to a time-dependant recovery rate.

Ronald ManríquezEugene GallagherThe new codes with the example from the US run well with my Matlab 20a with the optimization toolbox. The new US cities live script is very nice (with a few poor fits to some cities), but it is a shame that the US doesn't have accurate data on those that were infected but that are now recovered. It would be interesting to know what China, Italy and France are doing to get those recovered data that the US is not doing.

Nik NHello, Thaks for your nice codes,

I have error in getDatacovid.m in line : opts = delimitedTextImportOptions

Thanks again.

Rejikumar KHello,

I am Reji Kumar.

My MATLAB is older than 2012 version. How can I use the code in it? Anybody Please help.

Ranajoy Guha NeogiHi Etienne,

Thank you very much for explaining. Now my doubt is clear. So in the figure in mlx file 'Confirmed(Reported)'=Confirmed(JHU data) -Recovered(JHU data) -Death(JHU data) ; from JHU data and then matched to Quarntined Q(t). I was confusing 'Confirmed(Reported)' in your code as 'Confirmed' numbers in JHU data. Thank you again.

Ranajoy Guha NeogiRanajoy Guha NeogiHi Etienne,

Thank you very much for explaining. I still have a gap in my understanding. Please ignore my last post. Here is my query. In the Figure generated in mlx files how are the confirmed(reported), recovered(reported), Deaths(reported) are related to actual JHU data? Especially how confirmed(reported) data in the figure is related to the 'confirmed' data in JHU. Since, confirmed(reported) data points decrease after sometime in the figure but 'confirmed' data in JHU never falls.

E. CheynetHi Ranajoy,

You raise a fair point. In the Github depository of John Hopkins University, only three categories are available: Recovered, confirmed and deceased. Their definition of "confirmed" means "has been tested positive". That is why the number of confirmed cannot decrease in this database. I guess the term "confirmed (infectious)" is not the best choice as the infectious cases do not remain infectious forever. It is true that in the examples I uploaded, I also use the term confirmed. The confirmed cases are, in fact, the quarantined cases since I assume that in a real-life situation, as soon as someone is "confirmed", they are put in quarantine, which is likely the closest from the definition "active cases = confirmed-recovered-deaths".

Ranajoy Guha NeogiHello,

If I understood correctly, the data is being extracted from John Hopkins site where there are four categories Confirmed, Deaths, Recovered, Active in the John Hopkins github (where Confirmed=Deaths +Recovered+Active ; at any point of time in data ). Thus 'Confirmed' as per the John Hopkins definition can never fall as it is the sum total of Death, Recovered, Active. But in this generalized SEIR model the 'Confirmed' graph keeps falling after a time as per the model , so should we not take 'Active' data instead of 'Confirmed' when we match with the model?

E. CheynetHi Eugene,

It seems that John Hopkins university has started uploading new dataset with specific focus on the US on their Github page. I will wait they finish uploading everything before looking at the data.

Eugene GallagherAll of the *mlx files are running beautifully now. The plots of the actual data for the Exampl4.mlx are still showing some negative values but with a note that these are ignored. The most amazing graphic is the Lombardi region graph in ItalianRegions.mlx. That flateening of the infected curve is obvious but would be missed if the recovered and dead states were ignored.

It is a shame that you have been unable to fit the US data with your SEIR model because of a lack of data on recovered. Just now (4/1 1 PM EST) Johns Hopkins reports 190740 US confirmed infected, 4127 dead, and 7141 recovered.

E. CheynetUpdate: The data for the Italian regions have been updated after the database format was changed. If you try to run the old version of the code, it will not work well.

E. CheynetHei Eugene,

I have also noticed this warning. Apparently, the number of reported recovered + deaths became higher than the number of "confirmed" in some of the cases. That's weird but can be due to reporting errors. Since it's an issue that comes from the database, I have implemented a quick-and-dirty solution. In the updated the function fit_SEIQRD, I simply forbidding the number of "actual cases" to become negative and set it to zero

Bill BuddHi Etienne, I fixed the issue I mentioned yesterday by desampling the finely sampled Fitted curves before taking the 'diff'. I now get a very nice plot of Fitted vs Reported daily values, at least for Italy. Thanks for your patience and hope you get your wifi working.

E. CheynetHi,

Sorry for the delay in replies. I have temporarly lost access to wifi (only mobile net). I try to reply in a near future

Bill BuddHi Etienne, I'm trying not to be lazy here but I only started using Matlab yesterday (I have experience in IDL). I hope you can give me some pointers to solving a question relating to your SEIR modelling .

I took your Example3 and made it work for several countries including UK, where I live. I then applied the 'diff' operator to the cumulative arrays (Fitted and Reported) to simulate the daily cases.

I get good agreement between diff-ing the cumulative Reported arrays in your code and the Reported daily cases from other sources, so my 'diff' is working ok. However when applying 'diff' to the Fitted curves, they underestimate the Reported daily cases by a long way.

I traced this to the fact that the Fitted curves are more finely sampled than the Reported values (eg 471 elements compared to 37 in the case of Italy).

Please could you give me some advice in very general terms, of how to make the Fitted arrays conformable with the Reported arrays so that the 'diff' returns comparable results, eg to make D array conformable with Deaths in Example3.

Appreciate any advice you can offer and Thanks again for some great work

Jacobo CHoyHello,

I am getting an issue while modeling for canada with the example3 file. It says on line 40 : " Matrix index is out of range for deletion".

Eugene GallagherHi, I just ran Example4 after downloading the latest code and data. Before today, the SEIR model was running nicely for all provinces, but today 9 of the Chinese Provinces are modeling negative confirmed cases. The program prints a warning.

Bill BuddHi, FYI I just downloaded ECheynet-SEIR-aefecf6 and ran all the examples in Matlab R2020a and they all execute fine.

Many thanks and Respect to you Sir for some excellent code!

E. CheynetHi MJ,

You can check the value of time(1). If time(1) >=datetime(2020,04,01), then N= 0. I noted also that the databased by John Hopkins university has been changed yesterday, so the examples won't work well. I need to update examples 1 to 4.

MJSomething is not right here, as it returns N = 0, is this dt correct? Thanks for your help!!!

dt = 0.1; % time step

time1 = datetime(time(1)):dt:datetime(2020,04,01);

N = numel(time1);

t = [0:N-1].*dt;

Eugene GallagherThanks much. It is a shame that the Johns Hopkins site isn't putting recovered data in the csv files. Let's hope that they start putting in the recovered data so that your code can be run with future dates. For the first time, the Johns Hopkins site was listing recovered for the US on the JH website. On their interactive web page, today is the 1st day that they are providing the recovered data for the US (827 dead, 354 recovered Italy: 20,499 dead and 112,982 recovered):

https://coronavirus.jhu.edu/map.html

Joshua AsamoahThank you very much, Cheynet.

E. CheynetHi Joshua and Eugene,

You are right, there was a typo in the url address used. It should now be corrected. The Github account from John Hopkins university uses an updated dataset without the number of "recovered cases". That is a pity because this is a valuable piece of information for the fitting procedure. Therefore I decided to use the data updated up to the 23-03-2020 only. The main reason is that the purpose of the submission was simply to shows how the fitting of a generalized SEIR model can be done in Matlab.

Joshua AsamoahPlease I downloaded the new update but I having this error message. Please, how can I fix this

Error using urlreadwrite (line 92)

The server did not find a resource to match this request.

Error in urlwrite (line 52)

[f,status] = urlreadwrite(mfilename,catchErrors,url,filename,varargin{:});

Error in getDataCOVID (line 36)

urlwrite(fileName,'dummy.csv');

Eugene GallagherEtienne. I downloaded the new files. Example1 runs, but Example2, Example3 and Example4 all stop with the following error. Thanks so much for your work on this code.

Error using urlreadwrite (line 92)

The server did not find a resource to match this request.

Error in urlwrite (line 52)

[f,status] = urlreadwrite(mfilename,catchErrors,url,filename,varargin{:});

Error in getDataCOVID (line 36)

urlwrite(fileName,'dummy.csv');

E. CheynetThe examples files should be working normally now.

E. CheynetHi Eugene,

Thank for the information! I see that Johns Hopkins University has created a new database. However, they do no longer include the number of "recovered" as a function of the time. That is strange... I will try to update the example this evening.

Eugene GallagherEtienne, All 4 examples and the Italy analysis were working last night, but when Johns Hopkins updated their data, Example2, Example3, and Example4 all produce the error that the time variable for 3-24-2020 is invalid (NaT in the time vector). The Italy example still works, but it downloads its data from Italy. Here is the error message, and the last element of the time vector is NaT, not a valid date.

Error using dateformverify (line 18)

DATESTR failed converting date number to date vector. Date number out of range.

Error in datestr (line 200)

S = datef.ormverify(dtnumber, dateformstr, islocal);

Error in datetime/datestr (line 801)

s = datestr(datenum(this),varargin{:});

Angel SuarezHi.

How can I fix the following:

>> fit_SEIQRDP(Q, R, D, Npop, E0, I0, time, guess, varargin)

Attempt to execute SCRIPT varargin as a function:

Thanks

E. CheynetHi Eugene,

Thank for the message! I had not noticed the financial toolbox was required for the function "today". In the updated version of the code I have written: "floor(datenum(now)" instead of "datenum(today)"

Joshua AsamoahThese codes are really great. Thank you Dr E. Cheynet, for helping us with these.

Please, can you help us with how to plot the residuals?

Eugene GallagherThis programm is outstanding. I'll be using it in my graduate probability and statistics course to demonstrate fitting the parameters of a model with data. I needed one edit to get the program running, "today" which retrieves today's date is only available in Mathworks financial toolbox, so I just typed in today's date in the appropriate format.

E. CheynetHi Jose,

Thank you for your suggestion. Unfortunately, I not the qualifications nor the time to achieve or contribute to this work.

E. CheynetHe Jacobo,

I do not think it will work very well for the world due to the "time delays" between the different epidemy locations. It is also important to remember that one should not try to do long-term prediction with the function SEIQRDP.m. This is due to the large uncertainties that exist, especially when dataset are limited.

Jacobo CHoyHello,

Hi Jacobo,

I have updated the example files with more pedagogic description of the variable "guess" and what are the numbers used.

Response:

Seems you have fixed it and made more intuitive! Amazing thanks. Really good code.

___

Further questions:

I am trying to make a SEIR model for the world. I have used your code, but maybe creating a region called "world" adding all cases, deaths, etc... would help?

I went around the issue by adding 4 .mat files which I imported from a .csv where I added the region "world".

Just as a suggestion, it would be a good idea to have a region for the world.

So far I am unable to get a proper prediction for the world, the curve is not steep enough. I am still trying with several values.

Jose ReisHi! I agree.... If you don't mind, although at a slower pace, we can collaborate using this platform..... I already have some good people working on it, so if there is anything you would like to implement and would like to parallelize with other people, let me know.

This is what I am thinking right now:

-> creat a "wrapper" around the "fitter" to allow running all lines of data (countries and regions)

-> output to a .csv or xls the input and output from the Fit

-> add other metrics to the simulated results, such as

---> days until peak

---> total number of infected

---> total number of dead

-> merge the data (maybe with an API) with other demographic information, maybe from the WEF or WHO

-> run some data analytics in the output data, trying to find patterns and correlations

-> run a Correlation Matrix between input and output, so as to assess the sensibility of the fitting parameters into the results both of the fit and the "post proccessed" variables

What do you think?

E. CheynetHi Jose,

I have actually uploaded an updated version where the "contains" function was replaced by "strfind" yesterday, for those who work with older Matlab version. I do not advise you about revealing your email. You will get heavily spammed otherwise.

Jose ReisHello Cheynet,

This is what I did to work around the compatibility issue

%%%%%% Used in 2015 since "contains" wouldnt work

% id=strfind(tableRecovered.ProvinceState,'US');

% Index = find(not(cellfun('isempty',id)));

For the other issue, I simply used the plot command instead of the original one, as follows:

plot(time1,Q,'r',time1,R,'c',time1,D,'g',time1,I,'b','linewidth',2);

hold on

plot(time,Confirmed-Recovered-Deaths,'ro',time,Recovered,'bo',time,Deaths,'go','linewidth',1);

I am leading a group of friends here in Brazil, working on different models, one being one of them. If you could get in contact with us, that would be great. I dont know if I could publish my email here, but you can find me on linkdin as José Roberto Clark Reis.

Thanks,

José

FabianOh that was simultaneously posted. Thanks a lot Mr. Cheynet, that is awesome! I am a scientist in the filed of Electrochemistry but right now i am using my time at home to take a look into other important topics right now:)

E. CheynetHi Fabian,

I have uploaded a new version where the DATA.mat file is no longer required. Using the function "getDataCOVID", the data are directly read from the Github repository mentioned above.

FabianHey Community,

i might have a very basic question. I got the recent statistic data for the disease from GitHub (.csv files). How can i transfer the data to the Data.m file?

I tried to copy paste additional columns into the single tables (recovered, confirmed, deaths, time) directly in Matlab and saving the changes of the Data.m file.

I am getting the error:

Error using strfind

Cell must be a cell array of character vectors.

Which i do not really understand since this function is, as far as i understood, used to find the desired country i want to evaluate. (I did not change anything in the code of example 3)

When i reload the original Data.m file, the code works well.

In general: Thank you for the awesome work!

Best regards,

Fabian

E. CheynetHi Jacobo,

I have updated the example files with more pedagogic description of the variable "guess" and what are the numbers used.

Jacobo CHoy*I solved the issue* => i don't know which the issue was I just ran it in a .mlx instead of a .m

____

question for example 4 => What are the numbers in your guess? what is the order I can't figure it out.

I found them to be

alpha: protection rate

beta: infection rate

gamma: Inverse of the average latent time

delta: inverse of the average quarantine time

lambda0 and lambda1: coefficients used in the time-dependant cure rate

kappa0 and kappa1: coefficient used in the time-dependant mortality rate

Jacobo CHoyI published a request for help here: https://la.mathworks.com/matlabcentral/answers/511876-issue-with-seir-model-for-mathlab

I am looking forward to using your software for my work and I really hope I could use it.

I am getting this mistake:

Error using SEIQRDP

Too many input arguments.

Error in example3 (line 34)

[S,E,I,Q,R,D,P] = SEIQRDP(alpha1,beta1,gamma1,delta1,Lambda1,Kappa1,Npop,E0,I0,Q0,R0,D0,t);

Matlab ExpertE. CheynetHei Jose,

Thank for the feedback!

The function "contains" was introduced in Matlab 2016b and if you have an older version, you will not be able to use it. I think an alternative to it is " strfind"

For the second error, this may also be a compatibility issue, although it is a little more obscure to me. The function "datetime" was introduced in Matlab 2014b, but there have been maybe some updates that only work with more recent versions. I did write the function with Matlab 2019b. Alternatively, you can use "datenum" and the function "datetick".

Jose ReisHi,

I am trying to run your very nice implementation. Example 1 works fine. Example 2, I ran into at least two problems. The first one is with funcion "contains", right in the beginining of the code. I bypassed it and started it with the 156 value corresponding to the Hubei province. It ran ok.

Then, it halts when I come into:

>> figure

semilogy(time1,Q,'r',time1,R,'b',time1,D,'k');

Error using semilogy

A numeric or double convertible argument is expected

I am using 2015a. Could this be a compatibility issue?

Thanks